plastid.readers.bigwig module

Reader for BigWig files, built atop Jim Kent’s utilities.

Summary

BigWig is a binary, indexed, high-performance format that associates quantitative values with genomic positions. Because BigWig files are indexed, they allow both sequential and random access to data, and require substantially less memory than unindexed wiggle or bedGraph files. Like wiggle and bedGraph formats, BigWig is an unstranded format, so data for plus and minus strands must be stored in separate files.
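An index lets a query jump directly to the records it needs instead of scanning the whole file. The sketch below shows only that idea. It is not how BigWig is laid out on disk, since BigWig uses an R-tree-style index over compressed blocks. Instead, it assumes each chromosome's data are already in memory as sorted, non-overlapping `(start, end, value)` records. The names `build_index` and `overlapping` are hypothetical.

```python
import bisect

def build_index(records):
    """Precompute per-chromosome start coordinates for binary search.

    `records` maps chromosome name -> list of (start, end, value) tuples,
    assumed sorted by start and non-overlapping (half-open coordinates).
    """
    return {chrom: ([s for s, _, _ in recs], recs) for chrom, recs in records.items()}


def overlapping(index, chrom, start, end):
    """Return the records on `chrom` that overlap the half-open interval [start, end)."""
    if chrom not in index:
        return []
    starts, recs = index[chrom]
    # Records are sorted and non-overlapping, so only the last record
    # starting at or before `start` can straddle it; begin the scan there.
    i = max(bisect.bisect_right(starts, start) - 1, 0)
    hits = []
    for s, e, v in recs[i:]:
        if s >= end:  # every later record starts even further right
            break
        if e > start:
            hits.append((s, e, v))
    return hits


index = build_index({"chrI": [(0, 10, 1.0), (10, 25, 3.0), (40, 50, 2.0)]})
print(overlapping(index, "chrI", 5, 12))  # [(0, 10, 1.0), (10, 25, 3.0)]
```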

Module Contents

`BigWigReader`(filename[, maxmem])	Reader providing random or sequential access to data stored in BigWig files.
`BigWigIterator`(BigWigIterator)	Iterate over records in the BigWig file, sorted lexically by chromosome and position.

Examples

Fetch counts over a Transcript/SegmentChain or GenomicSegment:

>>> from plastid.genomics.roitools import GenomicSegment, Transcript
>>> from plastid.readers.bigwig import BigWigReader
>>> count_data = BigWigReader("some_file.bw")

# segment covering positions 50-2000 (0-based, half-open) on chrI
>>> segment_counts = count_data[GenomicSegment("chrI", 50, 2000, "+")]

# a transcript; normally this would come from a BED_Reader or similar
>>> my_transcript = Transcript(GenomicSegment("chrV", 5000, 5500, "+"),
...                            GenomicSegment("chrV", 8000, 9000, "+"),
...                            ID='some_transcript')
>>> transcript_counts = count_data[my_transcript]
>>> transcript_counts
array([ 36, 171,  65,  95, 109, 178, 211,  71,  25, 208,  20, 189, 205,
       182, 102, 159, 154, 148,  15,  65, 237, 104, 211, 162,  22,   4,
       254,  85,  53, 160,  58,  74, 199,  85, 205, 242, 162,  23, 246,
       ...
       (rest of output omitted) ])
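`my_transcript` is built from two segments of 500 nt and 1000 nt, so `transcript_counts` should hold 1500 values. These are the per-position values of each segment, joined in transcript order with the intron skipped. The pure-Python sketch below shows that concatenation for a plus-strand feature. It is not plastid's implementation. `spliced_values` is a hypothetical name, and `value_at` is a hypothetical stand-in for a lookup into the BigWig file.

```python
def spliced_values(segments, value_at):
    """Join per-position values across a plus-strand, multi-segment feature.

    `segments` are (chrom, start, end) half-open intervals on one chromosome;
    `value_at(chrom, pos)` is a hypothetical per-position lookup. Segments are
    visited left to right, so intronic positions between them are skipped.
    """
    values = []
    for chrom, start, end in sorted(segments, key=lambda seg: seg[1]):
        values.extend(value_at(chrom, pos) for pos in range(start, end))
    return values


# Toy lookup: every position's "count" is its own coordinate.
segments = [("chrV", 5000, 5500), ("chrV", 8000, 9000)]
values = spliced_values(segments, lambda chrom, pos: float(pos))
print(len(values))               # 1500
print(values[499], values[500])  # 5499.0 8000.0
```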

Efficiently fetch a numpy.ndarray of counts covering a whole chromosome:

>>> chrI_counts = count_data.get_chromosome_counts("chrI")
>>> chrI_counts
[ numpy array of counts covering chromosome chrI ]
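A whole-chromosome fetch can be pictured as expanding the file's interval records into one value per position. Positions that no record covers keep a fill value. The sketch below makes these assumptions:

- records are 0-based, half-open `(start, end, value)` tuples;
- the fill is NaN, mirroring the default `fill` of `get`;
- output uses plain lists to stay dependency-free, whereas the reader returns a numpy.ndarray.

`dense_chromosome` is a hypothetical name.

```python
def dense_chromosome(records, length, fill=float("nan")):
    """Expand (start, end, value) records into a list with one value per position.

    Records are 0-based and half-open; positions not covered by any record
    keep `fill`. Any part of a record past `length` is ignored.
    """
    out = [fill] * length
    for start, end, value in records:
        stop = min(end, length)
        if start < stop:
            out[start:stop] = [value] * (stop - start)
    return out


vec = dense_chromosome([(1, 3, 2.0), (5, 6, 4.0)], length=7)
print(vec)  # [nan, 2.0, 2.0, nan, nan, 4.0, nan]
```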

Iterate over a BigWig file (rarely needed). Data are returned as tuples of (chromosome name, start coordinate, end coordinate, value over those coordinates):

>>> for chrom, my_start, my_end, value in count_data:
...     pass # do something interesting with those values
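Each iterated record is a `(chromosome, start, end, value)` tuple, so iteration is a natural way to export the data to a text format. This minimal sketch assumes the coordinates are 0-based and half-open, as elsewhere in plastid. bedGraph uses the same convention, so each tuple becomes one tab-separated bedGraph line unchanged. `write_bedgraph` is a hypothetical helper. With a real reader, `count_data` itself would be passed as `records`.

```python
import io

def write_bedgraph(records, handle):
    """Write (chrom, start, end, value) tuples to `handle` as bedGraph lines.

    Coordinates are written unchanged, assuming both sides use 0-based,
    half-open intervals. No track header line is emitted.
    """
    for chrom, start, end, value in records:
        handle.write(f"{chrom}\t{start}\t{end}\t{value}\n")


# Write to an in-memory buffer; a real export would pass an open file instead.
buf = io.StringIO()
write_bedgraph([("chrI", 0, 10, 1.5), ("chrI", 10, 12, 3.0)], buf)
print(buf.getvalue(), end="")
```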

See also

BigWigGenomeArray: A GenomeArray for BigWig files.
Kent2010: Description of BigBed and BigWig formats. Especially see supplemental data.
Source repository for Kent utilities: The header files are particularly useful.

class plastid.readers.bigwig.BigWigReader(filename, maxmem=0)

Bases: plastid.readers.bbifile._BBI_Reader

Reader providing random or sequential access to data stored in BigWig files.

Parameters

filename (str): Name of BigWig file
maxmem (float): Maximum desired memory footprint for C objects, in megabytes. May be temporarily exceeded if large queries are requested. Does not include memory footprint of Python objects. (Default: 0, no limit)

Examples

Fetch counts over a Transcript/SegmentChain or GenomicSegment:

>>> count_data = BigWigReader("some_file.bw")

# segment covering positions 50-2000 (0-based, half-open) on chrI
>>> segment_counts = count_data[GenomicSegment("chrI", 50, 2000, "+")]

# a transcript; normally this would come from a BED_Reader or similar
>>> my_transcript = Transcript(GenomicSegment("chrV", 5000, 5500, "+"),
...                            GenomicSegment("chrV", 8000, 9000, "+"),
...                            ID='some_transcript')
>>> transcript_counts = count_data[my_transcript]

Efficiently fetch a numpy.ndarray of counts covering a whole chromosome:

>>> chrI_counts = count_data.get_chromosome_counts("chrI")

Iterate over a BigWig file (rarely needed). Data are returned as tuples of (chromosome name, start coordinate, end coordinate, value over those coordinates):

>>> for chrom, my_start, my_end, value in count_data:
...     pass # do something interesting with those values

Attributes

chrom_sizes: DEPRECATED: Use .chroms instead of .chrom_sizes
chromids
chroms: Dictionary mapping chromosome names to lengths
filename: Name of BigWig or BigBed file
uncompress_buf_size: Size of buffer needed to uncompress blocks.
version: Version of BigWig or BigBed file format
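The `chroms` mapping is useful for checking a query before issuing it. The helper below is a hypothetical sketch. It assumes only that `chroms` maps chromosome names to integer lengths, as described above. Raising an error rather than clipping the region is an illustrative choice, not documented reader behavior.

```python
def check_region(chroms, chrom, start, end):
    """Raise if the half-open region [start, end) does not fit on `chrom`.

    `chroms` is a mapping of chromosome name -> length, shaped like the
    reader's `chroms` attribute.
    """
    if chrom not in chroms:
        raise KeyError(f"{chrom!r} is not present in this file")
    length = chroms[chrom]
    if not 0 <= start < end <= length:
        raise ValueError(f"[{start}, {end}) lies outside {chrom} (length {length})")


chroms = {"chrI": 1000, "chrV": 9500}  # toy lengths, not real genome sizes
check_region(chroms, "chrV", 8000, 9000)  # fits, so nothing is raised
```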

Methods

`get`(self, roi, bool roi_order=True, ...)	Retrieve array of counts from a region of interest.
`get_chromosome_counts`(self, unicode chrom)	Retrieve values across an entire chromosome more efficiently than using `bigwig_reader[chromosome_roi]`
`sum`(self)	Return sum of data in BigWig file, calculating if necessary
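The phrase "calculating if necessary" in the `sum` entry suggests the total is computed once and then reused. The class below sketches that caching pattern. It assumes the total is each record's value multiplied by the number of positions the record covers. The class name, the in-memory record source, and the `passes` counter are all hypothetical and are not taken from plastid.

```python
class CachedTotal:
    """Sum (chrom, start, end, value) records on first request, then reuse the result."""

    def __init__(self, records):
        self._records = records
        self._total = None
        self.passes = 0  # number of times the records were actually scanned

    def sum(self):
        if self._total is None:
            self.passes += 1
            # Weight each value by the number of positions it covers.
            self._total = sum(value * (end - start)
                              for _chrom, start, end, value in self._records)
        return self._total


totals = CachedTotal([("chrI", 0, 10, 1.0), ("chrII", 5, 7, 2.5)])
print(totals.sum(), totals.sum(), totals.passes)  # 15.0 15.0 1
```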

get(self, roi, bool roi_order=True, double fill=numpy.nan)

Retrieve array of counts from a region of interest.

Parameters

roi (GenomicSegment or SegmentChain): Region of interest in genome
roi_order (bool, optional): If True (default), return values ordered 5’ to 3’ relative to roi rather than in genome order.
fill (double, optional): Override fill value to put in positions with no data. (Default: Use value of self.fill)

Returns

numpy.ndarray: vector of numbers, each position corresponding to a position in roi, from 5’ to 3’ relative to roi
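The sketch below shows how `roi_order` and `fill` interact on a single segment. It is an illustration, not plastid's code. `fetch_segment` is a hypothetical name, and `data` is a hypothetical sparse mapping from position to value on one chromosome. The segment is given as plain `start`, `end`, and `strand` values rather than as a GenomicSegment.

```python
def fetch_segment(data, start, end, strand, roi_order=True, fill=float("nan")):
    """Return one value per position in [start, end), mimicking `get` options.

    Positions missing from `data` receive `fill`. With roi_order=True a
    minus-strand region is reversed so the vector runs 5' to 3' along the
    region; with roi_order=False values stay in ascending genome order.
    """
    values = [data.get(pos, fill) for pos in range(start, end)]
    if roi_order and strand == "-":
        values.reverse()
    return values


data = {2: 1.0, 3: 5.0}
print(fetch_segment(data, 1, 4, "+", fill=0.0))                   # [0.0, 1.0, 5.0]
print(fetch_segment(data, 1, 4, "-", fill=0.0))                   # [5.0, 1.0, 0.0]
print(fetch_segment(data, 1, 4, "-", roi_order=False, fill=0.0))  # [0.0, 1.0, 5.0]
```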
