plastid.readers.bigwig module
Reader for BigWig files, built atop Jim Kent’s utilities.
Summary
BigWig is a binary, indexed, high-performance format that associates quantitative values with genomic positions. Because BigWig files are indexed, they allow both sequential and random access to data, and require substantially less memory than unindexed wiggle or bedGraph files. Like wiggle and bedGraph formats, BigWig is an unstranded format, so data for plus and minus strands must be stored in separate files.
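Because each file holds a single strand, stranded lookups need to be routed to the matching file. The following is a minimal sketch of that routing, using plain dictionaries as stand-ins for two opened readers. The names `plus_track`, `minus_track`, and `fetch_stranded` are hypothetical and are not part of plastid.

```python
# Stand-ins for two readers, one per strand. With plastid these would be
# two BigWigReader objects opened on separate files (an assumption here).
plus_track = {("chrI", 10): 4.0, ("chrI", 11): 2.0}
minus_track = {("chrI", 10): 7.0}


def fetch_stranded(chrom, position, strand):
    """Look up one position in the track that matches `strand`."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    track = plus_track if strand == "+" else minus_track
    # Positions absent from the track are reported as 0.0 in this sketch
    return track.get((chrom, position), 0.0)


print(fetch_stranded("chrI", 10, "+"))  # 4.0
print(fetch_stranded("chrI", 10, "-"))  # 7.0
```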
Module Contents
BigWigReader
Reader providing random or sequential access to data stored in BigWig files.
BigWigIterator
Iterate over records in the BigWig file, sorted lexically by chromosome and position.
Examples
Fetch counts over a Transcript/SegmentChain or GenomicSegment:
>>> from plastid.readers.bigwig import BigWigReader
>>> from plastid.genomics.roitools import GenomicSegment, Transcript
>>> count_data = BigWigReader("some_file.bw")
>>> # segment covering positions 50-2000 on chrI
>>> segment_counts = count_data[GenomicSegment("chrI", 50, 2000, "+")]
>>> # a transcript; normally this would come from a BED_Reader or similar
>>> my_transcript = Transcript(GenomicSegment("chrV", 5000, 5500, "+"),
...                            GenomicSegment("chrV", 8000, 9000, "+"),
...                            ID='some_transcript')
>>> transcript_counts = count_data[my_transcript]
>>> transcript_counts
array([ 36, 171, 65, 95, 109, 178, 211, 71, 25, 208, 20, 189, 205,
182, 102, 159, 154, 148, 15, 65, 237, 104, 211, 162, 22, 4,
254, 85, 53, 160, 58, 74, 199, 85, 205, 242, 162, 23, 246,
...
(rest of output omitted) ])
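The array returned for a transcript is spliced: values for each exon are joined end to end. Assuming zero-indexed, half-open coordinates, the transcript above would therefore yield 500 + 1000 = 1500 values. The sketch below illustrates the idea in plain Python. The helper `spliced_values` and the sparse `per_position` dictionary are hypothetical stand-ins, not plastid's implementation.

```python
def spliced_values(exons, per_position):
    """Concatenate per-position values across exons given as (start, end) pairs.

    Coordinates are assumed zero-indexed and half-open; positions without
    data are reported as 0.0 in this sketch.
    """
    values = []
    for start, end in sorted(exons):
        values.extend(per_position.get(pos, 0.0) for pos in range(start, end))
    return values


# Hypothetical sparse data: genomic position -> value
per_position = {5000: 36.0, 5001: 171.0, 8000: 65.0}
exons = [(5000, 5500), (8000, 9000)]

vector = spliced_values(exons, per_position)
print(len(vector))  # 1500
print(vector[0], vector[1], vector[500])  # 36.0 171.0 65.0
```

Index 500 of the spliced vector corresponds to genomic position 8000, the first position of the second exon.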
Efficiently fetch a numpy.ndarray of counts covering a whole chromosome:
>>> chrI_counts = count_data.get_chromosome_counts("chrI")
>>> chrI_counts
[ numpy array of counts covering chromosome chrI ]
Iterate over a BigWig file (this is unusual). Data are returned as tuples of (chromosome name, start coordinate, end coordinate, value over those coordinates):
>>> for chrom, my_start, my_end, value in count_data:
...     pass # do something interesting with those values
See also
BigWigGenomeArray
A GenomeArray for BigWig files.
- Kent2010
Description of BigBed and BigWig formats. Especially see supplemental data.
- Source repository for Kent utilities
The header files are particularly useful.
- class plastid.readers.bigwig.BigWigReader(filename, maxmem=0)
Bases: plastid.readers.bbifile._BBI_Reader
Reader providing random or sequential access to data stored in BigWig files.
- Parameters
- filename : str
Name of BigWig file
- maxmem : float
Maximum desired memory footprint for C objects, in megabytes. May be temporarily exceeded if large queries are requested. Does not include memory footprint of Python objects. (Default: 0, no limit)
- Attributes
chrom_sizes
DEPRECATED: Use .chroms instead of .chrom_sizes
chromids
chroms
Dictionary mapping chromosome names to lengths
filename
Name of BigWig or BigBed file
uncompress_buf_size
Size of buffer needed to uncompress blocks.
version
Version of BigWig or BigBed file format
- Methods
get(self, roi, bool roi_order=True, ...)
Retrieve array of counts from a region of interest.
get_chromosome_counts(self, unicode chrom)
Retrieve values across an entire chromosome more efficiently than using bigwig_reader[chromosome_roi]
sum(self)
Return sum of data in BigWig file, calculating if necessary
- get(self, roi, bool roi_order=True, double fill=numpy.nan)
Retrieve array of counts from a region of interest.
- Parameters
- roi : GenomicSegment or SegmentChain
Region of interest in genome
- roi_order : bool, optional
If True (default), return vector of values 5’ to 3’ relative to roi rather than genome.
- fill : double, optional
Override fill value to put in positions with no data. (Default: Use value of self.fill)
- Returns
numpy.ndarray
Vector of numbers, each position corresponding to a position in roi, from 5’ to 3’ relative to roi
See also
plastid.genomics.roitools.SegmentChain.get_counts
Fetch a spliced vector of data covering a SegmentChain
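The effect of roi_order and fill described for get() can be illustrated with a small pure-Python sketch. It is not the library's implementation: the function `oriented_values` and the sparse `per_position` dictionary are hypothetical, and coordinates are assumed zero-indexed and half-open.

```python
import math


def oriented_values(start, end, strand, per_position, roi_order=True, fill=math.nan):
    """Return one value per position in [start, end), filling gaps with `fill`."""
    values = [per_position.get(pos, fill) for pos in range(start, end)]
    # On the minus strand the 5' end is the rightmost genomic position,
    # so 5'-to-3' order is the reverse of genomic order
    if roi_order and strand == "-":
        values.reverse()
    return values


per_position = {100: 1.0, 101: 2.0, 103: 4.0}  # position 102 has no data

print(oriented_values(100, 104, "+", per_position, fill=0.0))
# [1.0, 2.0, 0.0, 4.0]
print(oriented_values(100, 104, "-", per_position, fill=0.0))
# [4.0, 0.0, 2.0, 1.0]
print(oriented_values(100, 104, "-", per_position, roi_order=False, fill=0.0))
# [1.0, 2.0, 0.0, 4.0]
```

With roi_order=False, a minus-strand region is returned in genomic (left to right) order, which is convenient when aligning values to chromosome coordinates.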
- get_chromosome_counts(self, unicode chrom)
Retrieve values across an entire chromosome more efficiently than using bigwig_reader[chromosome_roi]
- Parameters
- chrom : str
Chromosome name
- Returns
numpy.ndarray
Numpy array of float values covering entire chromosome chrom. If chrom is not in BigWig file, returns a numpy scalar of 0.
- sum(self)
Return sum of data in BigWig file, calculating if necessary
- Returns
- double
Sum of all values over all positions
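One way to read "calculating if necessary" is that the total is computed on first request and then reused. The sketch below shows that pattern on a toy track. The class `IntervalTrack` is hypothetical, and whether the real reader caches the value or reads it from a file summary is an assumption not confirmed by this page.

```python
class IntervalTrack:
    """Toy track holding (chrom, start, end, value) records, half-open coordinates."""

    def __init__(self, records):
        self._records = list(records)
        self._sum = None  # not computed until first requested

    def sum(self):
        """Return the total over all positions, computing it only once."""
        if self._sum is None:
            # Each record contributes its value at every position it spans
            self._sum = sum(value * (end - start)
                            for _chrom, start, end, value in self._records)
        return self._sum


track = IntervalTrack([
    ("chrI", 0, 10, 2.0),
    ("chrI", 10, 15, 1.0),
    ("chrII", 3, 4, 8.0),
])
print(track.sum())  # 33.0
```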
- chrom_sizes
DEPRECATED: Use .chroms instead of .chrom_sizes
- chromids
- chroms
Dictionary mapping chromosome names to lengths
- filename
Name of BigWig or BigBed file
- uncompress_buf_size
Size of buffer needed to uncompress blocks. If 0, the data is uncompressed.
- version
Version of BigWig or BigBed file format
- plastid.readers.bigwig.BigWigIterator(BigWigReader reader, maxmem=0)
Iterate over records in the BigWig file, sorted lexically by chromosome and position.
- Parameters
- reader : BigWigReader
Reader to iterate over
- maxmem : float
Maximum desired memory footprint for C objects, in megabytes. May be temporarily exceeded if large queries are requested. Does not include memory footprint of Python objects. (Default: 0, no limit)
- Yields
- tuple
(chrom name, start, end, value), where start & end are zero-indexed and half-open
- Raises
- MemoryError
If memory cannot be allocated
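Since the yielded coordinates are zero-indexed and half-open, a record (chrom, start, end, value) covers the positions start through end - 1. A minimal sketch of expanding such records into per-position entries follows. The helper `expand_records` and the sample `records` are hypothetical and stand in for tuples produced by the iterator.

```python
def expand_records(records):
    """Yield (chrom, position, value) for every position covered by each record."""
    for chrom, start, end, value in records:
        # Half-open interval: `start` is included, `end` is not
        for position in range(start, end):
            yield chrom, position, value


records = [("chrI", 0, 3, 2.0), ("chrI", 5, 6, 7.0)]
expanded = list(expand_records(records))
print(expanded)
# [('chrI', 0, 2.0), ('chrI', 1, 2.0), ('chrI', 2, 2.0), ('chrI', 5, 7.0)]
```

Each record spans end - start positions, so the first record above expands to three entries and the second to one.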