plastid.readers.bigwig module

Reader for BigWig files, built atop Jim Kent’s utilities.

Summary

BigWig is a binary, indexed, high-performance format that associates quantitative values with genomic positions. Because BigWig files are indexed, they allow both sequential and random access to data, and require substantially less memory than unindexed wiggle or bedGraph files. Like wiggle and bedGraph formats, BigWig is an unstranded format, so data for plus and minus strands must be stored in separate files.
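Because each file holds one strand, calling code usually keeps two readers open and picks one per query. The sketch below is only an illustration of that dispatch: `Region` and `pick_reader` are hypothetical names that are not part of plastid, and the placeholder strings stand in for two open readers.

```python
from collections import namedtuple

# Hypothetical stand-in for a stranded region (chromosome, start, end, strand).
Region = namedtuple("Region", ["chrom", "start", "end", "strand"])


def pick_reader(region, plus_reader, minus_reader):
    """Return the reader that holds data for the strand of `region`."""
    if region.strand == "+":
        return plus_reader
    if region.strand == "-":
        return minus_reader
    raise ValueError("expected strand '+' or '-', got %r" % (region.strand,))


# Placeholder strings stand in for two readers, one per strand-specific file.
chosen = pick_reader(Region("chrI", 50, 2000, "-"), "plus reader", "minus reader")
```

In real use the two placeholder arguments would be readers opened on the plus-strand and minus-strand files respectively.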

Module Contents

BigWigReader(filename[, maxmem])

Reader providing random or sequential access to data stored in BigWig files.

BigWigIterator(reader[, maxmem])

Iterate over records in the BigWig file, sorted lexically by chromosome and position.

Examples

Fetch counts over a Transcript/SegmentChain or GenomicSegment:

>>> count_data = BigWigReader("some_file.bw")

# segment covering positions 50-2000 on chrI
>>> segment_counts = count_data[GenomicSegment("chrI", 50, 2000, "+")]

# a transcript; normally this would come from a BED_Reader or similar
>>> my_transcript = Transcript(GenomicSegment("chrV", 5000, 5500, "+"),
...                            GenomicSegment("chrV", 8000, 9000, "+"),
...                            ID='some_transcript')
>>> transcript_counts = count_data[my_transcript]
>>> transcript_counts
array([ 36, 171,  65,  95, 109, 178, 211,  71,  25, 208,  20, 189, 205,
       182, 102, 159, 154, 148,  15,  65, 237, 104, 211, 162,  22,   4,
       254,  85,  53, 160,  58,  74, 199,  85, 205, 242, 162,  23, 246,
       ...
       (rest of output omitted) ])

Efficiently fetch a numpy.ndarray of counts covering a whole chromosome:

>>> chrI_counts = count_data.get_chromosome_counts("chrI")
>>> chrI_counts
[ numpy array of counts covering chromosome chrI ]

Iterate over a BigWig file (an uncommon use case). Data are returned as tuples of (chromosome name, start coordinate, end coordinate, value over those coordinates):

>>> for chrom, my_start, my_end, value in count_data:
...     pass # do something interesting with those values

See also

BigWigGenomeArray

A GenomeArray for BigWig files.

Kent2010

Description of BigBed and BigWig formats. Especially see supplemental data.

Source repository for Kent utilities

The header files are particularly useful.

class plastid.readers.bigwig.BigWigReader(filename, maxmem=0)

Bases: plastid.readers.bbifile._BBI_Reader

Reader providing random or sequential access to data stored in BigWig files.

Parameters
filename : str

Name of BigWig file

maxmem : float

Maximum desired memory footprint for C objects, in megabytes. May be temporarily exceeded if large queries are requested. Does not include memory footprint of Python objects. (Default: 0, no limit)

Examples

Fetch counts over a Transcript/SegmentChain or GenomicSegment:

>>> count_data = BigWigReader("some_file.bw")

# segment covering positions 50-2000 on chrI
>>> segment_counts = count_data[GenomicSegment("chrI", 50, 2000, "+")]

# a transcript; normally this would come from a BED_Reader or similar
>>> my_transcript = Transcript(GenomicSegment("chrV", 5000, 5500, "+"),
...                            GenomicSegment("chrV", 8000, 9000, "+"),
...                            ID='some_transcript')
>>> transcript_counts = count_data[my_transcript]

Efficiently fetch a numpy.ndarray of counts covering a whole chromosome:

>>> chrI_counts = count_data.get_chromosome_counts("chrI")

Iterate over a BigWig file (an uncommon use case). Data are returned as tuples of (chromosome name, start coordinate, end coordinate, value over those coordinates):

>>> for chrom, my_start, my_end, value in count_data:
...     pass # do something interesting with those values

Attributes
chrom_sizes

DEPRECATED: Use .chroms instead of .chrom_sizes

chromids
chroms

Dictionary mapping chromosome names to lengths

filename

Name of BigWig or BigBed file

uncompress_buf_size

Size of buffer needed to uncompress blocks.

version

Version of BigWig or BigBed file format

Methods

get(self, roi, bool roi_order=True, ...)

Retrieve array of counts from a region of interest.

get_chromosome_counts(self, unicode chrom)

Retrieve values across an entire chromosome more efficiently than using bigwig_reader[chromosome_roi]

sum(self)

Return sum of data in BigWig file, calculating if necessary

get(self, roi, bool roi_order=True, double fill=numpy.nan)

Retrieve array of counts from a region of interest.

Parameters
roi : GenomicSegment or SegmentChain

Region of interest in genome

roi_order : bool, optional

If True (default), return the vector of values 5’ to 3’ relative to roi rather than in genome order.

fill : double, optional

Override fill value to put in positions with no data. (Default: Use value of self.fill)

Returns
numpy.ndarray

vector of numbers, each position corresponding to a position in roi, from 5’ to 3’ relative to roi

See also

plastid.genomics.roitools.SegmentChain.get_counts

Fetch a spliced vector of data covering a SegmentChain
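The interplay of roi_order and fill can be modeled with plain lists. The following is a minimal sketch of the documented return value only, not the library's implementation: `dense_vector` is a hypothetical helper, and the records are made-up values on a single chromosome.

```python
def dense_vector(records, start, end, strand="+", roi_order=True, fill=float("nan")):
    """Model the documented return value using plain lists.

    records: iterable of (rec_start, rec_end, value), zero-indexed and
             half-open, all on the same chromosome as the region.
    """
    values = [fill] * (end - start)        # one slot per position in the region
    for rec_start, rec_end, value in records:
        # Clip each record to the region before writing it.
        lo = max(rec_start, start)
        hi = min(rec_end, end)
        for pos in range(lo, hi):
            values[pos - start] = value
    if roi_order and strand == "-":
        values.reverse()                   # 5' to 3' on the minus strand
    return values


records = [(2, 4, 7.0), (5, 6, 1.0)]
plus = dense_vector(records, 0, 6, strand="+", fill=0.0)
minus = dense_vector(records, 0, 6, strand="-", fill=0.0)
genome_order = dense_vector(records, 0, 6, strand="-", roi_order=False, fill=0.0)
```

For a plus-strand region the two orderings coincide; only minus-strand regions are reversed when roi_order is True.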

get_chromosome_counts(self, unicode chrom)

Retrieve values across an entire chromosome more efficiently than using bigwig_reader[chromosome_roi]

Parameters
chrom : str

Chromosome name

Returns
numpy.ndarray

Numpy array of float values covering entire chromosome chrom. If chrom is not in the BigWig file, returns a numpy scalar of 0.
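Since a missing chromosome yields a scalar rather than an array, callers may prefer to check the chroms dictionary first. The sketch below assumes only the two documented members, chroms and get_chromosome_counts; `chromosome_counts_or_none` and `FakeReader` are hypothetical names used so the example runs without a BigWig file.

```python
def chromosome_counts_or_none(reader, chrom):
    """Return per-position values for `chrom`, or None if the file lacks it."""
    if chrom not in reader.chroms:      # chroms maps chromosome names to lengths
        return None
    return reader.get_chromosome_counts(chrom)


class FakeReader:
    """Hypothetical stub exposing only the two documented members used above."""

    chroms = {"chrI": 4}

    def get_chromosome_counts(self, chrom):
        return [0.0, 1.0, 2.0, 3.0]     # a real reader returns a numpy.ndarray


found = chromosome_counts_or_none(FakeReader(), "chrI")
missing = chromosome_counts_or_none(FakeReader(), "chrXX")
```

Returning None makes the missing case explicit, so downstream code does not mistake a scalar 0 for a chromosome of zero counts.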

sum(self)

Return sum of data in BigWig file, calculating if necessary

Returns
double

Sum of all values over all positions

chrom_sizes

DEPRECATED: Use .chroms instead of .chrom_sizes

chromids
chroms

Dictionary mapping chromosome names to lengths

filename

Name of BigWig or BigBed file

uncompress_buf_size

Size of buffer needed to uncompress blocks. If 0, the data is uncompressed

version

Version of BigWig or BigBed file format

plastid.readers.bigwig.BigWigIterator(BigWigReader reader, maxmem=0)

Iterate over records in the BigWig file, sorted lexically by chromosome and position.

Parameters
reader : BigWigReader

Reader to iterate over

maxmem : float

Maximum desired memory footprint for C objects, in megabytes. May be temporarily exceeded if large queries are requested. Does not include memory footprint of Python objects. (Default: 0, no limit)

Yields
tuple

(chrom name, start, end, value), where start and end are zero-indexed and half-open

Raises
MemoryError

If memory cannot be allocated
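Because coordinates are zero-indexed and half-open, each record covers end - start positions. As a rough sketch of consuming the yielded tuples, the hypothetical helper `total_signal` below accumulates value times span per chromosome; the input list is made-up data standing in for the iterator.

```python
def total_signal(records):
    """Sum value * span per chromosome over (chrom, start, end, value) records."""
    totals = {}
    for chrom, start, end, value in records:
        # Half-open coordinates: the record covers end - start positions.
        totals[chrom] = totals.get(chrom, 0.0) + value * (end - start)
    return totals


# Made-up records in the same tuple layout the iterator yields.
records = [("chrI", 0, 10, 2.0), ("chrI", 10, 12, 0.5), ("chrII", 5, 6, 3.0)]
per_chrom = total_signal(records)
```

Any iterable of such tuples can be passed in place of the list, which would include a reader being iterated over.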