plastid.readers.bigbed module
BigBedReader, a parser for BigBed files.
Summary
In contrast to BED, GTF2, and GFF3 files, BigBed files are binary, indexed, and randomly accessible. This means:
- BigBedReader can be used to iterate over records, like a reader, or to fetch records that cover a region of interest, in the manner of a GenomeHash.
- BigBed files use less memory, because the file's records don't all need to be loaded into memory before they can be parsed or accessed.
- Indexed fields of BigBed files can be searched for matching records.
Module Contents
| BigBedReader | Reader for BigBed files. |
| BigBedIterator | Iterate over records in the BigBed file, sorted lexically by chromosome and position. |
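Examples
Iterate over all features in a BigBed file:
>>> from plastid.readers.bigbed import BigBedReader
>>> from plastid.genomics.roitools import GenomicSegment, Transcript
>>> my_reader = BigBedReader("some_file.bb", return_type=Transcript)
>>> for feature in my_reader:
...     pass # do something with each Transcript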
BigBed files can be accessed as dictionaries. To find features overlapping a region of interest:
>>> roi = GenomicSegment("chrI", 0, 100000, "+")
>>> overlapping_features = my_reader[roi]
>>> list(overlapping_features)
[ list of SegmentChains/Transcripts ]
Find features that match keyword(s) in a certain field:
>>> # which fields are indexed and searchable?
>>> my_reader.indexed_fields
['name', 'gene_id']
>>> # find all entries whose 'gene_id' matches 'nanos'
>>> list(my_reader.search('gene_id', 'nanos'))
[ list of matching SegmentChains/Transcripts ]
See also
- Kent2010
Description of BigBed and BigWig formats. Especially see supplemental data.
- UCSC file format FAQ
Descriptions of BED, GTF2, GFF3 and other text-based formats.
- class plastid.readers.bigbed.BigBedReader(filename, return_type=SegmentChain, add_three_for_stop=False, maxmem=0)
Bases: plastid.readers.bbifile._BBI_Reader
Reader for BigBed files. This class is useful both for iterating over genomic features one by one (like a reader) and for random access to genomic features that overlap a region of interest (like a GenomeHash).
- Parameters
- filename : str
Path to BigBed file
- return_type : SegmentChain or subclass, optional
Type of feature to return from assembled subfeatures (Default: SegmentChain)
- add_three_for_stop : bool, optional
Some annotation files exclude the stop codon from CDS annotations. If set to True, three nucleotides will be added to the three-prime end of each CDS annotation, UNLESS the annotated transcript contains an explicit stop_codon feature. (Default: False)
- maxmem : float, optional
Maximum desired memory footprint for C objects, in megabytes. May be temporarily exceeded if large queries are requested. Does not include memory footprint of Python objects. (Default: 0, no limit)
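To make the effect of add_three_for_stop concrete, here is a minimal sketch of the coordinate arithmetic. It is not plastid's implementation: the helper `extend_cds_for_stop` is hypothetical, coordinates are assumed to be 0-based and half-open, and the CDS is treated as a single unspliced interval whose stop codon lies immediately downstream.

```python
def extend_cds_for_stop(cds_start, cds_end, strand):
    """Return CDS coordinates grown by 3 nt at the three-prime end.

    Hypothetical helper for illustration only. Assumes 0-based,
    half-open coordinates and an unspliced CDS.
    """
    if strand == "+":
        # three-prime end is the right-hand (higher) coordinate
        return cds_start, cds_end + 3
    if strand == "-":
        # three-prime end is the left-hand (lower) coordinate
        return cds_start - 3, cds_end
    raise ValueError("strand must be '+' or '-', got %r" % strand)

plus_strand = extend_cds_for_stop(100, 400, "+")
minus_strand = extend_cds_for_stop(100, 400, "-")
```

The strand check matters because the three-prime end of a minus-strand feature is its lower genomic coordinate. A spliced CDS whose last codon abuts an exon boundary would need the extra nucleotides placed in the next exon, which this sketch does not handle.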
Examples
Iterate over all features in a BigBed file:
>>> my_reader = BigBedReader("some_file.bb")
>>> for feature in my_reader:
...     pass # do something with each feature
BigBed files can be accessed as dictionaries. To find features overlapping a region of interest:
>>> roi = GenomicSegment("chrI", 0, 100000, "+")
>>> for feature in my_reader[roi]:
...     pass # do something with that feature
Find features overlapping a genomic region of interest roi, on either strand:
>>> for feature in my_reader.get(roi, stranded=False):
...     pass # do something with that feature
- Attributes
- extension_fields : OrderedDict
Dictionary of names and types of extra fields included in BigWig/BigBed file
- extension_types : OrderedDict
Dictionary mapping custom field names to objects that parse their types from strings
- filename : str
Name of BigWig or BigBed file
- num_records : int
Number of features in file
- num_chroms : int
Number of chromosomes in the BigBed file
- chroms : dict
Dictionary mapping chromosome names to lengths
- return_type : class implementing a from_bed() method, or str
Return type of reader
Methods
- get(self, roi, bool stranded=True, ...)
Iterate over features that share genomic positions with a region of interest
- search(self, field_name, *values)
Search indexed fields in the BigBed file for records matching any of the given values. See self.indexed_fields for names of indexed fields and self.extension_fields for descriptions of extension fields.
- get(self, roi, bool stranded=True, bool check_unique=True)
Iterate over features that share genomic positions with a region of interest
- Parameters
- roi : SegmentChain or GenomicSegment
Query feature representing region of interest
- stranded : bool, optional
If True, retrieve only features on the same strand as the query feature. Otherwise, retrieve features on both strands. (Default: True)
- check_unique : bool, optional
If True, ensure that all results in the generator are unique. (Default: True)
- Yields
- object
self.return_type for each record in the BigBed file that overlaps roi
- Raises
- TypeError
If roi is not a GenomicSegment or SegmentChain
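The filtering semantics of get can be illustrated without a BigBed file. The following is a minimal pure-Python sketch, not plastid's implementation: features are modeled as a hypothetical `Feature` tuple with 0-based, half-open coordinates, and `overlapping` is an illustrative helper that mimics the stranded and check_unique options.

```python
from collections import namedtuple

# Hypothetical stand-in for a genomic feature; not a plastid class.
Feature = namedtuple("Feature", ["chrom", "start", "end", "strand", "name"])

def overlapping(features, roi, stranded=True, check_unique=True):
    """Yield features sharing at least one position with `roi`."""
    seen = set()
    for feat in features:
        if feat.chrom != roi.chrom:
            continue
        if stranded and feat.strand != roi.strand:
            continue
        # half-open intervals overlap when each starts before the other ends
        if feat.start < roi.end and roi.start < feat.end:
            if check_unique:
                if feat in seen:
                    continue
                seen.add(feat)
            yield feat

features = [
    Feature("chrI", 100, 500, "+", "geneA"),
    Feature("chrI", 400, 900, "-", "geneB"),
    Feature("chrI", 100, 500, "+", "geneA"),   # duplicate record
    Feature("chrII", 100, 500, "+", "geneC"),
]
roi = Feature("chrI", 0, 450, "+", "roi")

same_strand = [f.name for f in overlapping(features, roi)]
both_strands = [f.name for f in overlapping(features, roi, stranded=False)]
```

With the defaults, only `geneA` is returned once. Passing `stranded=False` also returns `geneB` from the opposite strand, and `check_unique=False` would let the duplicate `geneA` record through.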
- search(self, field_name, *values)
Search indexed fields in the BigBed file for records matching any of the given values. See self.indexed_fields for names of indexed fields and self.extension_fields for descriptions of extension fields.
- Parameters
- field_name : str
Name of field to search
- *values : one or more str
Value(s) to match. If multiple are given, records matching any value will be returned.
- Yields
- object
self.return_type of matching record in the BigBed file
- Raises
- IndexError
If field field_name is not indexed
Examples
Find all entries matching a given gene ID:
>>> # open file
>>> bb = BigBedReader("some_file.bb")
>>> # which fields are searchable?
>>> bb.indexed_fields
['name', 'gene_id']
>>> # find all entries whose 'gene_id' matches 'nanos'
>>> list(bb.search('gene_id', 'nanos'))
[ list of matching SegmentChains ]
>>> # find all entries whose 'gene_id' matches 'nanos' or 'oskar'
>>> list(bb.search('gene_id', 'nanos', 'oskar'))
[ list of matching SegmentChains ]
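The matching rules for search (any of several values, IndexError for a field that is not indexed) can be sketched in plain Python. This is an illustration under stated assumptions, not plastid's code: records are modeled as hypothetical dictionaries, `search_records` is an invented helper, and the linear scan stands in for the on-disk index lookup a real BigBed file would use.

```python
def search_records(records, indexed_fields, field_name, *values):
    """Yield records whose `field_name` equals any of `values`.

    Hypothetical helper for illustration only.
    """
    if field_name not in indexed_fields:
        raise IndexError("Field '%s' is not indexed" % field_name)
    wanted = set(values)
    for record in records:
        if record.get(field_name) in wanted:
            yield record

records = [
    {"name": "tx1", "gene_id": "nanos"},
    {"name": "tx2", "gene_id": "oskar"},
    {"name": "tx3", "gene_id": "bicoid"},
]
indexed = ["name", "gene_id"]

one_gene = [r["name"] for r in search_records(records, indexed, "gene_id", "nanos")]
two_genes = [r["name"] for r in search_records(records, indexed, "gene_id", "nanos", "oskar")]
```

Because the helper is a generator, as search is documented to be, the IndexError in this sketch surfaces only when iteration begins, so callers typically wrap the call in `list()` or a `for` loop.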
- chrom_sizes
DEPRECATED: Use .chroms instead of .chrom_sizes
- chromids
- chroms
Dictionary mapping chromosome names to lengths
- filename
Name of BigWig or BigBed file
- indexed_fields
Names of indexed fields in BigBed file. These are searchable by self.search
- num_records
Number of features in file
- return_type
Return type of reader
- uncompress_buf_size
Size of buffer needed to uncompress blocks. If 0, the data in the file is not compressed
- version
Version of BigWig or BigBed file format
- plastid.readers.bigbed.BigBedIterator(BigBedReader reader, maxmem=0)
Iterate over records in the BigBed file, sorted lexically by chromosome and position.
- Parameters
- reader : BigBedReader
Reader to iterate over
- maxmem : float
Maximum desired memory footprint for C objects, in megabytes. May be temporarily exceeded if large queries are requested. Does not include memory footprint of Python objects. (Default: 0, no limit)
- Yields
- object
reader.return_type of BED record
- Raises
- MemoryError
If memory cannot be allocated
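Lexical ordering by chromosome can be surprising, because chromosome names are compared as strings rather than numbers. The sketch below illustrates the ordering with hypothetical `(chrom, start, end)` tuples. It assumes that "position" means the numeric start coordinate, and it does not read a BigBed file.

```python
# Hypothetical records as (chrom, start, end) tuples
records = [
    ("chr2", 500, 900),
    ("chr10", 100, 300),
    ("chr1", 700, 800),
    ("chr1", 50, 120),
]

# Chromosome names compare as strings; start positions compare as integers
ordered = sorted(records, key=lambda rec: (rec[0], rec[1]))
chrom_order = [rec[0] for rec in ordered]
```

Here `chr10` sorts before `chr2`, since the string `"chr10"` compares less than `"chr2"`. Code that expects natural (numeric) chromosome order would need to re-sort the results.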