plastid.readers.bigwig module

Reader for BigWig files, built atop Jim Kent’s utilities.

Summary

BigWig is a binary, indexed, high-performance format that associates quantitative values with genomic positions. Because BigWig files are indexed, they allow both sequential and random access to data, and require substantially less memory than unindexed wiggle or bedGraph files. Like wiggle and bedGraph formats, BigWig is an unstranded format, so data for plus and minus strands must be stored in separate files.

Module Contents

BigWigReader Reader providing random or sequential access to data stored in BigWig files.
BigWigIterator(BigWigReader reader[, maxmem]) Iterate over records in the BigWig file, sorted lexically by chromosome and position.

Examples

Fetch counts over a Transcript/SegmentChain or GenomicSegment:

>>> count_data = BigWigReader("some_file.bw")

# segment covering positions 50-2000 on chrI
>>> segment_counts = count_data[GenomicSegment("chrI",50,2000,"+")] 

# a transcript; normally this would come from a BED_Reader or similar
>>> my_transcript = Transcript(GenomicSegment("chrV",5000,5500,"+"),
...                            GenomicSegment("chrV",8000,9000,"+"),
...                            ID='some_transcript')
>>> transcript_counts = count_data[my_transcript]
>>> transcript_counts
array([ 36, 171,  65,  95, 109, 178, 211,  71,  25, 208,  20, 189, 205,
       182, 102, 159, 154, 148,  15,  65, 237, 104, 211, 162,  22,   4,
       254,  85,  53, 160,  58,  74, 199,  85, 205, 242, 162,  23, 246,
       ...
       (rest of output omitted) ])

Efficiently fetch a numpy.ndarray of counts covering a whole chromosome:

>>> chrI_counts = count_data.get_chromosome_counts("chrI")
>>> chrI_counts
[ numpy array of counts covering chromosome chrI ]

Iterate over a BigWig file (this is unusual). Data are returned as tuples of (chromosome name, start coordinate, end coordinate, value over those coordinates):

>>> for chrom, my_start, my_end, value in count_data:
...     pass # do something interesting with those values

See also

BigWigGenomeArray
A GenomeArray for BigWig files.
Kent2010
Description of BigBed and BigWig formats. Especially see supplemental data.
Source repository for Kent utilities
The header files are particularly useful.
class plastid.readers.bigwig.BigWigReader

Bases: plastid.readers.bbifile._BBI_Reader

Reader providing random or sequential access to data stored in BigWig files.

Parameters:
filename : str

Name of BigWig file

maxmem : float

Maximum desired memory footprint for C objects, in megabytes. May be temporarily exceeded if large queries are requested. Does not include memory footprint of Python objects. (Default: 0, no limit)

Examples

Fetch counts over a Transcript/SegmentChain or GenomicSegment:

>>> count_data = BigWigReader("some_file.bw")

# segment covering positions 50-2000 on chrI
>>> segment_counts = count_data[GenomicSegment("chrI",50,2000,"+")] 

# a transcript; normally this would come from a BED_Reader or similar
>>> my_transcript = Transcript(GenomicSegment("chrV",5000,5500,"+"),
...                            GenomicSegment("chrV",8000,9000,"+"),
...                            ID='some_transcript')
>>> transcript_counts = count_data[my_transcript]

Efficiently fetch a numpy.ndarray of counts covering a whole chromosome:

>>> chrI_counts = count_data.get_chromosome_counts("chrI")

Iterate over a BigWig file (this is unusual). Data are returned as tuples of (chromosome name, start coordinate, end coordinate, value over those coordinates):

>>> for chrom, my_start, my_end, value in count_data:
...     pass # do something interesting with those values
Attributes:
chrom_sizes

DEPRECATED: Use .chroms instead of .chrom_sizes

chromids
chroms

Dictionary mapping chromosome names to lengths

filename

Name of BigWig or BigBed file

uncompress_buf_size

Size of buffer needed to uncompress blocks.

version

Version of BigWig or BigBed file format

Methods

get(self, roi, bool roi_order=True, …) Retrieve array of counts from a region of interest.
get_chromosome_counts(self, str chrom) Retrieve values across an entire chromosome more efficiently than using bigwig_reader[chromosome_roi]
sum(self) Return sum of data in BigWig file, calculating if necessary
get(self, roi, bool roi_order=True, double fill=numpy.nan)

Retrieve array of counts from a region of interest.

Parameters:
roi : GenomicSegment or SegmentChain

Region of interest in genome

roi_order : bool, optional

If True (default), return the vector of values 5’ to 3’ relative to roi rather than in genome order.

fill : double, optional

Override fill value to put in positions with no data. (Default: Use value of self.fill)

Returns:
:class:`numpy.ndarray`

Vector of numbers, each corresponding to a position in roi, ordered 5’ to 3’ relative to roi when roi_order is True

See also

plastid.genomics.roitools.SegmentChain.get_counts
Fetch a spliced vector of data covering a SegmentChain
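The roi_order and fill parameters described above can be illustrated without a BigWig file. The following is a minimal pure-Python sketch, not plastid's implementation: the helper names apply_fill and values_in_roi_order are hypothetical, None stands in for "no data", and plain lists stand in for numpy arrays.

```python
def apply_fill(values, fill):
    """Replace positions with no data (None here) by the fill value."""
    return [fill if v is None else v for v in values]


def values_in_roi_order(genome_values, strand, roi_order=True):
    """Return values 5' to 3' relative to the ROI when roi_order is True.

    For a plus-strand ROI, genome order and ROI order coincide.
    For a minus-strand ROI, ROI order is the reverse of genome order.
    """
    values = list(genome_values)
    if roi_order and strand == "-":
        values.reverse()
    return values


# Hypothetical per-position values in genome order; None marks missing data
genome_order = [1.0, None, 3.0, 4.0]
filled = apply_fill(genome_order, 0.0)

plus_strand = values_in_roi_order(filled, "+")
minus_strand = values_in_roi_order(filled, "-")
minus_genome_order = values_in_roi_order(filled, "-", roi_order=False)
```

With roi_order=False the minus-strand vector stays in genome order, which is convenient when comparing against coordinates rather than against a transcript.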
get_chromosome_counts(self, str chrom)

Retrieve values across an entire chromosome more efficiently than using bigwig_reader[chromosome_roi]

Parameters:
chrom : str

Chromosome name

Returns:
:class:`numpy.ndarray`

Numpy array of float values covering entire chromosome chrom. If chrom is not in the BigWig file, returns a numpy scalar of 0.

sum(self)

Return sum of data in BigWig file, calculating if necessary

Returns:
double

Sum of all values over all positions
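One way to read "sum of all values over all positions" is that each record contributes its value once per covered position. The sketch below assumes records shaped like the (chrom, start, end, value) tuples yielded by iteration, with half-open coordinates; total_signal is a hypothetical helper, and this is not necessarily how the reader computes or caches its sum.

```python
def total_signal(records):
    """Sum values over every covered position of half-open (start, end) records."""
    return sum(value * (end - start) for _chrom, start, end, value in records)


# Hypothetical records: 10 positions at 2.0, 5 positions at 1.0, 1 position at 4.0
records = [
    ("chrI", 0, 10, 2.0),
    ("chrI", 10, 15, 1.0),
    ("chrII", 5, 6, 4.0),
]
total = total_signal(records)  # 20.0 + 5.0 + 4.0
```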

chrom_sizes

DEPRECATED: Use .chroms instead of .chrom_sizes

chromids
chroms

Dictionary mapping chromosome names to lengths

filename

Name of BigWig or BigBed file

uncompress_buf_size

Size of buffer needed to uncompress blocks. If 0, the data is uncompressed.

version

Version of BigWig or BigBed file format

plastid.readers.bigwig.BigWigIterator(BigWigReader reader, maxmem=0)

Iterate over records in the BigWig file, sorted lexically by chromosome and position.

Parameters:
reader : BigWigReader

Reader to iterate over

maxmem : float

Maximum desired memory footprint for C objects, in megabytes. May be temporarily exceeded if large queries are requested. Does not include memory footprint of Python objects. (Default: 0, no limit)

Yields:
tuple

(chrom name, start, end, value), where start and end are zero-indexed and half-open

Raises:
MemoryError

If memory cannot be allocated
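The zero-indexed, half-open convention of the yielded tuples means a record (chrom, start, end, value) covers positions start through end - 1. The following sketch expands such records into a per-position list for one chromosome; records_to_dense and the sample records are hypothetical, and a plain list is used in place of a numpy array to keep the example dependency-free.

```python
def records_to_dense(records, chrom, length, fill=float("nan")):
    """Expand half-open (chrom, start, end, value) records into per-position values."""
    dense = [fill] * length
    for rec_chrom, start, end, value in records:
        if rec_chrom != chrom:
            continue
        # range(start, end) excludes end, matching the half-open convention
        for pos in range(start, min(end, length)):
            dense[pos] = value
    return dense


# Hypothetical records, as they might be yielded while iterating over a reader
records = [
    ("chrI", 2, 4, 5.0),
    ("chrI", 6, 7, 1.5),
    ("chrII", 0, 3, 9.0),
]
dense = records_to_dense(records, "chrI", 8, fill=0.0)
```

Here positions 2 and 3 receive 5.0 and position 6 receives 1.5; position 4 is untouched because end is exclusive.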