plastid.readers.bigbed module

BigBedReader, a parser for BigBed files.

Summary

In contrast to BED, GTF2, and GFF3 files, BigBed files are binary, indexed, and randomly accessible. This means:

BigBedReader can be used to iterate over records, like a reader, or to fetch records that cover a region of interest, in the manner of a GenomeHash.

BigBed files use less memory, because their records don’t need to be loaded into memory to be parsed or accessed.

BigBed files that contain indexed fields can be searched for matching records.

Module Contents

`BigBedReader`(filename[, return_type, ...])	Reader for BigBed files.
`BigBedIterator`(reader[, maxmem])	Iterate over records in the BigBed file, sorted lexically by chromosome and position.

Examples

Iterate over all features in a BigBed file:

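>>> from plastid.readers.bigbed import BigBedReader
>>> from plastid.genomics.roitools import Transcript
>>> my_reader = BigBedReader("some_file.bb", return_type=Transcript)
>>> for feature in my_reader:
...     pass # do something with each Transcript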

BigBed files can be accessed as dictionaries. To find features overlapping a region of interest:

>>> from plastid.genomics.roitools import GenomicSegment
>>> roi = GenomicSegment("chrI", 0, 100000, "+")
>>> overlapping_features = my_reader[roi]
>>> list(overlapping_features)
[ list of SegmentChains/Transcripts ]

Find features that match keyword(s) in a certain field:

>>> # which fields are indexed and searchable?
>>> my_reader.indexed_fields
['name', 'gene_id']

>>> # find all entries whose 'gene_id' matches 'nanos'
>>> list(my_reader.search('gene_id', 'nanos'))
[ list of matching SegmentChains/Transcripts ]

See also

Kent2010: Description of BigBed and BigWig formats. Especially see supplemental data.
UCSC file format FAQ: Descriptions of BED, GTF2, GFF3 and other text-based formats.

class plastid.readers.bigbed.BigBedReader(filename, return_type=SegmentChain, add_three_for_stop=False, maxmem=0)

Bases: plastid.readers.bbifile._BBI_Reader

Reader for BigBed files. This class supports both one-by-one iteration over genomic features (like a reader) and random access to genomic features that overlap a region of interest (like a GenomeHash).

Parameters

filename (str): Path to BigBed file
return_type (SegmentChain or subclass, optional): Type of feature to return from assembled subfeatures (Default: SegmentChain)
add_three_for_stop (bool, optional): Some annotation files exclude the stop codon from CDS annotations. If set to True, three nucleotides will be added to the 3' end of each CDS annotation, UNLESS the annotated transcript contains an explicit stop_codon feature. (Default: False)
maxmem (float, optional): Maximum desired memory footprint for C objects, in megabytes. May be temporarily exceeded if large queries are requested. Does not include memory footprint of Python objects. (Default: 0, no limit)
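
As a rough illustration of what add_three_for_stop implies for coordinates, the sketch below extends a CDS interval by three nucleotides at its 3' end. The function extend_cds_for_stop is hypothetical and is not part of plastid. It assumes 0-indexed, half-open coordinates as used in BED files, and it ignores splicing (in a multi-exon transcript the three nucleotides could cross an exon boundary).

```python
def extend_cds_for_stop(start, end, strand):
    """Extend a half-open CDS interval by 3 nt at its 3' end.

    Hypothetical helper for illustration only; not plastid's implementation.
    """
    if strand == "+":
        # On the plus strand the 3' end is the right-hand coordinate.
        return start, end + 3
    if strand == "-":
        # On the minus strand the 3' end is the left-hand coordinate.
        return max(start - 3, 0), end
    raise ValueError("strand must be '+' or '-'")
```

For example, extend_cds_for_stop(100, 200, "+") gives (100, 203), while the same interval on the minus strand gives (97, 200).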

Examples

Iterate over all features in a BigBed file:

>>> my_reader = BigBedReader("some_file.bb")
>>> for feature in my_reader:
...     pass # do something with each feature

BigBed files can be accessed as dictionaries. To find features overlapping a region of interest:

>>> roi = GenomicSegment("chrI", 0, 100000, "+")
>>> for feature in my_reader[roi]:
...     pass # do something with that feature

Find features overlapping a genomic region of interest roi, on either strand:

>>> for feature in my_reader.get(roi, stranded=False):
...     pass # do something with that feature
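
Because both the reader itself and the results of get() are consumed by iteration, it can be handy to look at only the first few records of a large file. The helper below is a minimal sketch using only the standard library; peek is a hypothetical name, not a plastid function, and it works on any iterable.

```python
from itertools import islice


def peek(features, n=5):
    """Return up to the first `n` items of any iterable as a list.

    Hypothetical convenience helper; stops early instead of reading everything.
    """
    return list(islice(features, n))
```

Under these assumptions, peek(my_reader, 3) would return up to three features from the start of the file, and peek(my_reader.get(roi, stranded=False), 3) would do the same for a region query.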

Attributes

extension_fields (OrderedDict): Dictionary of names and types of extra fields included in BigWig/BigBed file
extension_types (OrderedDict): Dictionary mapping custom field names to objects that parse their types from strings
filename (str): Name of BigWig or BigBed file
num_records (int): Number of features in file
num_chroms (int): Number of chromosomes in the BigBed file
chroms (dict): Dictionary mapping chromosome names to lengths
return_type (class implementing a from_bed() method, or str): Return type of reader
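
The attributes above can be combined into a quick overview of a file. The sketch below reads only the documented attributes filename, num_records, num_chroms, and chroms; the function summarize_reader and the keys of the dictionary it returns are hypothetical names chosen for this example.

```python
def summarize_reader(reader):
    """Collect basic facts about an open reader into a plain dictionary.

    Hypothetical helper; `reader` is expected to expose the attributes
    `filename`, `num_records`, `num_chroms`, and `chroms` listed above.
    """
    chroms = reader.chroms  # maps chromosome name -> length
    return {
        "filename": reader.filename,
        "num_records": reader.num_records,
        "num_chroms": reader.num_chroms,
        "total_length": sum(chroms.values()),
        "longest_chrom": max(chroms, key=chroms.get) if chroms else None,
    }
```

Called as summarize_reader(my_reader), this returns a dictionary that can be printed or logged before running larger queries.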

Methods

`get`(self, roi, bool stranded=True, ...)	Iterate over features that share genomic positions with a region of interest
`search`(self, field_name, *values)	Search indexed fields in the BigBed file for records matching value. See self.indexed_fields for names of indexed fields and self.extension_fields for descriptions of extension fields.

get(self, roi, bool stranded=True, bool check_unique=True)

Iterate over features that share genomic positions with a region of interest

Parameters

roi (SegmentChain or GenomicSegment): Query feature representing region of interest
stranded (bool, optional): If True, retrieve only features on same strand as query feature. Otherwise, retrieve features on both strands. (Default: True)
check_unique (bool, optional): If True, ensure that all results in generator are unique. (Default: True)

Yields

object: self.return_type of each record in the BigBed file

Raises

TypeError: If roi is not a GenomicSegment or SegmentChain
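
To see the effect of the stranded parameter, one can count the results of the same query run both ways. The sketch below relies only on the get(roi, stranded=...) call documented above; count_overlaps is a hypothetical helper name.

```python
def count_overlaps(reader, roi):
    """Count features overlapping `roi` on the same strand and on either strand.

    Hypothetical helper; `reader` must provide get(roi, stranded=...).
    Returns a (same_strand, either_strand) tuple.
    """
    # Consume each generator without keeping the features in memory.
    same_strand = sum(1 for _ in reader.get(roi, stranded=True))
    either_strand = sum(1 for _ in reader.get(roi, stranded=False))
    return same_strand, either_strand
```

The second number should never be smaller than the first, since the unstranded query includes everything the stranded one finds.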

search(self, field_name, *values)

Search indexed fields in the BigBed file for records matching value. See self.indexed_fields for names of indexed fields and self.extension_fields for descriptions of extension fields.

Parameters

field_name (str): Name of field to search
*values (one or more str): Value(s) to match. If multiple are given, records matching any value will be returned.

Yields

object: self.return_type of matching record in the BigBed file

Raises

IndexError: If field field_name is not indexed

Examples

Find all entries matching a given gene ID:

>>> # open file
>>> bb = BigBedReader("some_file.bb")

>>> # which fields are searchable?
>>> bb.indexed_fields
['name', 'gene_id']

>>> # find all entries whose 'gene_id' matches 'nanos'
>>> list(bb.search('gene_id', 'nanos'))
[ list of matching SegmentChains ]

>>> # find all entries whose 'gene_id' matches 'nanos' or 'oskar'
>>> list(bb.search('gene_id', 'nanos', 'oskar'))
[ list of matching SegmentChains ]
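
Since search() raises IndexError for a field that is not indexed, a caller may prefer to check indexed_fields first. The following is a minimal sketch of that pattern; search_if_indexed is a hypothetical wrapper, and returning an empty list for an unindexed field is a design choice made for this example rather than plastid behavior.

```python
def search_if_indexed(reader, field_name, *values):
    """Return matching records as a list, or [] if the field is not indexed.

    Hypothetical wrapper around reader.search(); checks reader.indexed_fields
    up front instead of catching IndexError afterwards.
    """
    if field_name not in reader.indexed_fields:
        return []
    # search() yields records lazily, so materialize them into a list.
    return list(reader.search(field_name, *values))
```

With the file from the example above, search_if_indexed(bb, 'gene_id', 'nanos', 'oskar') would behave like the last query, while a field name missing from bb.indexed_fields would give an empty list.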

bed_fields: Number of standard BED format columns included in file

chrom_sizes: DEPRECATED: Use .chroms instead of .chrom_sizes

chromids

chroms: Dictionary mapping chromosome names to lengths

extension_fields: Dictionary of names and types of extra fields included in BigWig/BigBed file

filename: Name of BigWig or BigBed file

indexed_fields: Names of indexed fields in BigBed file. These are searchable by self.search

num_chroms: Number of chromosomes in the BigBed file

num_records: Number of features in file

return_type: Return type of reader

uncompress_buf_size: Size of buffer needed to uncompress blocks. If 0, the data is uncompressed

version: Version of BigWig or BigBed file format

plastid.readers.bigbed.BigBedIterator(BigBedReader reader, maxmem=0)

Iterate over records in the BigBed file, sorted lexically by chromosome and position.

Parameters

reader (BigBedReader): Reader to iterate over
maxmem (float): Maximum desired memory footprint for C objects, in megabytes. May be temporarily exceeded if large queries are requested. Does not include memory footprint of Python objects. (Default: 0, no limit)

Yields

object: reader.return_type of BED record

Raises

MemoryError: If memory cannot be allocated
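
"Sorted lexically" means chromosome names are compared as strings, not as numbers. The snippet below illustrates this with plain Python string sorting. The chromosome names are made up, and the snippet assumes the iterator's ordering matches ordinary string comparison; it does not call plastid.

```python
# Made-up chromosome names, in arbitrary order.
chrom_names = ["chr2", "chr10", "chr1", "chrX"]

# Lexical (string) ordering compares character by character,
# so "chr10" sorts before "chr2".
lexical_order = sorted(chrom_names)
print(lexical_order)  # ['chr1', 'chr10', 'chr2', 'chrX']
```

Code that expects numeric chromosome order (chr1, chr2, ..., chr10) would need to re-sort the features after reading them.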
