plastid.readers.gff module¶

Tools for reading, writing, analyzing, and manipulating GFF file subtypes (e.g. GTF2 and GFF3).

Summary
Module contents
Examples
See Also

Summary ¶

Because GTF2/GFF3 files are hierarchically structured – i.e. a complex feature can be assembled from several component features; each component feature having its own record on its own line – two interfaces for reading GTF2/GFF3 files are included:

Assembly of transcripts from exon, CDS, & UTR annotations
GTF2_TranscriptAssembler and GFF3_TranscriptAssembler collect individual exon and CDS features, and assemble these into Transcripts.

Features are read from GTF2/GFF3 files and grouped by transcript_id, Parent, or ID attributes, depending on file type. Assembled Transcripts are yielded only once all of their component features have been collected.

Low-level parsing of simple features
GTF2_Reader and GFF3_Reader read raw features (such as individual exons, stop codons, SNPs, etc.) from GTF2/GFF3 files. Each line is returned as a SegmentChain.
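The grouping step behind transcript assembly can be sketched in plain Python. This is only an illustration of the idea, not plastid's implementation: the tuple layout and the function name `group_exons` are assumptions made for this example, and real assemblers also track CDS features and attributes.

```python
from collections import defaultdict

# Hypothetical, simplified records: (transcript_id, feature_type, start, end)
records = [
    ("YAL030W_mRNA", "exon", 87262, 87387),
    ("YAL030W_mRNA", "exon", 87500, 87857),
    ("YAL030W_mRNA", "CDS", 87285, 87387),
    ("YBL092W_mRNA", "exon", 45643, 45644),
]

def group_exons(records):
    """Collect exon intervals per transcript, sorted by start coordinate."""
    grouped = defaultdict(list)
    for transcript_id, feature_type, start, end in records:
        if feature_type == "exon":
            grouped[transcript_id].append((start, end))
    return {tid: sorted(exons) for tid, exons in grouped.items()}

exons_by_transcript = group_exons(records)
```

Each key of `exons_by_transcript` corresponds to one assembled transcript; each value lists the exon intervals that would be joined into it.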

Module contents ¶

`GTF2_Reader`(*streams[, end_included, ...])	Read raw features in GTF2 files as `SegmentChains`.
`GTF2_TranscriptAssembler`(*streams[, ...])	Assemble `Transcripts` from raw features in GTF2 format.
`GFF3_Reader`(*streams[, end_included, ...])	Read raw features in GFF3 files as `SegmentChains`.
`GFF3_TranscriptAssembler`(*streams[, ...])	Assemble `Transcripts` from raw features in GFF3 format.

Examples ¶

GTF2_Reader and GFF3_Reader return raw, unmodified features from GTF2 or GFF3 files – e.g. exons, coding regions, stop codons – without assembling them into transcripts:

>>> from plastid.readers.gff import GTF2_Reader
>>> feature_reader = GTF2_Reader("some_file.gtf")
>>> for feature in feature_reader:
...     print(feature.get_name(),feature.attr["type"],str(feature))
('YAL030W_mRNA',  'exon',        'chrI:87262-87387(+)')
('YAL030W_mRNA',  'exon',        'chrI:87500-87857(+)')
('YAL030W_mRNA',  'CDS',         'chrI:87285-87387(+)')
('YAL030W_mRNA',  'CDS',         'chrI:87500-87749(+)')
('YAL030W_mRNA',  'start_codon', 'chrI:87285-87288(+)')
('YAL030W_mRNA',  'stop_codon',  'chrI:87749-87752(+)')
('YBL092W_mRNA',  'exon',        'chrII:45643-45644(+)')
('YBL092W_mRNA',  'exon',        'chrII:45977-46440(+)')
('YBL092W_mRNA',  'CDS',         'chrII:45977-46367(+)')
('YBL092W_mRNA',  'start_codon', 'chrII:45977-45980(+)')
[rest of output omitted]

In contrast, GTF2_TranscriptAssembler and GFF3_TranscriptAssembler reconstruct transcripts from their components, based upon their transcript_id, ID, or Parent attributes. Note how all features are of type mRNA, and how some contain multiple exons (coordinates separated by ‘^’):

>>> from plastid.readers.gff import GTF2_TranscriptAssembler
>>> transcript_reader = GTF2_TranscriptAssembler("some_file.gtf")
>>> for transcript in transcript_reader:
...     print(transcript.get_name(),transcript.attr["type"],str(transcript))
('YAL030W_mRNA',   'mRNA',  'chrI:87262-87387^87500-87857(+)')
('YBL092W_mRNA',   'mRNA',  'chrII:45643-45644^45977-46440(+)')
('YBL057C_mRNA',   'mRNA',  'chrII:112749-113427^113444-113450(-)')
('YBL040C_mRNA',   'mRNA',  'chrII:142033-142749^142846-142891(-)')
('YBL018C_mRNA',   'mRNA',  'chrII:185961-186352^186427-186504(-)')
('YBR012W-B',      'mRNA',  'chrII:259868-261173^261174-265140(+)')
('YBR044C_mRNA',   'mRNA',  'chrII:324292-324336^324340-326127(-)')
('YBR082C_mRNA',   'mRNA',  'chrII:406506-407027^407122-407379(-)')
('YBR126W-B_mRNA', 'mRNA',  'chrII:490824-491202(+)')
('YBR138C_mRNA',   'mRNA',  'chrII:513636-515391(-)')
[rest of output omitted]

See Also ¶

GFF3 specification: GFF3 specification by the Sequence Ontology consortium
GTF2.2 specification: Hosted by the Brent lab
UCSC file format FAQ: GFF & GTF descriptions at UCSC

class plastid.readers.gff.GFF3_Reader(*streams, end_included=True, return_stopfeatures=False, is_sorted=False, tabix=False)[source]¶

Bases: plastid.readers.gff.AbstractGFF_Reader

Read raw features in GFF3 files as SegmentChains.

Users who wish to reconstruct Transcripts from raw features should instead use GFF3_TranscriptAssembler, which performs this task automatically.

Assumes that the input stream uses 1-indexed coordinates, in compliance with the Sequence Ontology GFF3 specification.

GFF3 attributes (from column 9) for each record are stored in its attr dictionary. Names and values of attributes are unescaped. The values for the attributes Parent, Alias, Dbxref, dbxref, and Note, if present, are lists rather than strings, because the GFF3 spec enables these to have multiple values.
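The attribute handling described above can be approximated with the standard library. The following is a minimal sketch, not the reader's actual code: the function name `parse_gff3_attributes` and the constant `MULTI_VALUED` are hypothetical, and edge cases covered by the GFF3 specification (for example malformed tokens) are ignored.

```python
from urllib.parse import unquote

# Attributes that the text above says are stored as lists
MULTI_VALUED = {"Parent", "Alias", "Dbxref", "dbxref", "Note"}

def parse_gff3_attributes(column9):
    """Parse a GFF3 column-9 string into a dict, unescaping names and values."""
    attr = {}
    for token in column9.strip().split(";"):
        if not token:
            continue  # tolerate a trailing semicolon
        key, _, value = token.partition("=")
        key = unquote(key.strip())
        if key in MULTI_VALUED:
            # Split on commas before unescaping, so escaped commas (%2C) survive
            attr[key] = [unquote(v) for v in value.split(",")]
        else:
            attr[key] = unquote(value)
    return attr

example = parse_gff3_attributes(
    "ID=exon1;Parent=mRNA1,mRNA2;Note=contains%2C a comma"
)
```

Here `example["Parent"]` is a two-element list, while `example["ID"]` remains a plain string.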

Parameters

*streams (one or more str or file-like): One or more input streams or filenames pointing to GFF information
end_included (bool, optional): Whether the end coordinate is included in the feature (closed or ‘end-included’ intervals) or not (half-open intervals). All coordinates will be normalized to 0-indexed, half-open. (Default: True)
return_stopfeatures (bool, optional): If True, return a special SegmentChain called StopFeature signifying that all previously emitted GFF entries may be assembled into complete entities. These are emitted when the line “###” is encountered in a GFF3. (Default: False)
is_sorted (bool, optional): If True and return_stopfeatures is True, assume the GFF3 is sorted. The reader will return StopFeature when the chromosome name of a given feature differs from that of the previous feature. (Default: False)
tabix (bool, optional): If True, streams point to tabix-compressed files or are open tabix_file_iterator objects. (Default: False)
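The coordinate normalization implied by end_included can be written out as simple arithmetic. This sketch assumes that, when end_included is False, the input end coordinate is a 1-indexed half-open end; the function name `to_half_open` is made up for this example and is not part of plastid.

```python
def to_half_open(start, end, end_included=True):
    """Convert 1-indexed GFF coordinates to 0-indexed, half-open coordinates."""
    zero_based_start = start - 1
    # A closed end at 1-indexed position `end` equals a 0-indexed half-open
    # end at `end`; a 1-indexed half-open end must be shifted down by one.
    zero_based_end = end if end_included else end - 1
    return zero_based_start, zero_based_end

closed_interval = to_half_open(1, 230218)                # closed input
open_interval = to_half_open(1, 11, end_included=False)  # half-open input
```

Either way, the length of the feature is preserved: `end - start` in the output equals the number of nucleotides covered.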

Examples

Read raw features from a GFF3 file:

>>> from plastid.readers.gff import GFF3_Reader
>>> feature_reader = GFF3_Reader(open("./some_file.gff"))
>>> for feature in feature_reader:
...     print(feature.get_name(), feature.attr["type"], str(feature))
('chrI', 'chromosome', 'chrI:0-230218(.)')
('TEL01L-TR', 'telomeric_repeat', 'chrI:0-62(-)')
('TEL01L', 'telomere', 'chrI:0-801(-)')
('TEL01L-XR', 'X_element_combinatorial_repeat', 'chrI:62-336(-)')
('YAL069W', 'gene', 'chrI:334-649(+)')
('TEL01L-XC', 'X_element', 'chrI:336-801(-)')
('TEL01L-XC_nucleotide_match', 'nucleotide_match', 'chrI:752-763(-)')
('TEL01L-XC_binding_site', 'binding_site', 'chrI:531-544(-)')
('YAL068W-A', 'gene', 'chrI:537-792(+)')
('ARS102', 'ARS', 'chrI:649-1791(.)')
[rest of output omitted]

Attributes

metadata (dict): Dictionary of metadata found in file headers

Methods

`close`()	Close stream
`fileno`()	Returns underlying file descriptor if one exists.
`filter`(line)	Parses lines of the GFF stream into `SegmentChain`. When metadata is found, temporarily delegates processing to `_parse_metatokens()`, and then reads the next genomic feature.
`flush`(/)	Flush write buffers, if applicable.
`isatty`()	Return whether this is an 'interactive' stream.
`read`()	Similar to `file.read()`.
`readable`()	Return whether object was opened for reading.
`readline`()	Process a single line of data, assuming it is string-like. `next(self)` is more likely to behave as expected.
`readlines`()	Similar to `file.readlines()`.
`seek`	Change stream position.
`seekable`()	Return whether object supports random access.
`tell`(/)	Return current stream position.
`truncate`	Truncate file to size bytes.
`writable`()	Return whether object was opened for writing.
`writelines`(lines, /)	Write a list of lines to stream.

next

close()¶: Close stream

fileno()¶

Returns underlying file descriptor if one exists.

OSError is raised if the IO object does not use a file descriptor.

filter(line)¶

Parses lines of the GFF stream into SegmentChain. When metadata is found, temporarily delegates processing to _parse_metatokens(), and then reads the next genomic feature.

Parameters

line: Next line from GFF stream

Returns

SegmentChain: Next feature in file

flush(/)¶

Flush write buffers, if applicable.

This is not implemented for read-only and non-blocking streams.

isatty()¶

Return whether this is an ‘interactive’ stream.

Return False if it can’t be determined.

next()¶

read()¶

Similar to file.read(). Process all units of data, assuming it is string-like

Returns

str

readable()¶

Return whether object was opened for reading.

If False, read() will raise OSError.

readline()¶

Process a single line of data, assuming it is string-like. next(self) is more likely to behave as expected.

Returns

object: a unit of processed data

readlines()¶

Similar to file.readlines().

Returns

list: processed data

seek()¶

Change stream position.

Change the stream position to the given byte offset. The offset is interpreted relative to the position indicated by whence. Values for whence are:

0 – start of stream (the default); offset should be zero or positive
1 – current stream position; offset may be negative
2 – end of stream; offset is usually negative

Return the new absolute position.

seekable()¶

Return whether object supports random access.

If False, seek(), tell() and truncate() will raise OSError. This method may need to do a test seek().

tell(/)¶: Return current stream position.

truncate()¶

Truncate file to size bytes.

File pointer is left unchanged. Size defaults to the current IO position as reported by tell(). Returns the new size.

writable()¶

Return whether object was opened for writing.

If False, write() will raise OSError.

writelines(lines, /)¶

Write a list of lines to stream.

Line separators are not added, so it is usual for each of the lines provided to have a line separator at the end.

closed¶

class plastid.readers.gff.GFF3_TranscriptAssembler(*streams, is_sorted=False, return_type=SegmentChain, add_three_for_stop=False, printer=None, tabix=False)[source]¶

Bases: plastid.readers.gff.AbstractGFF_Assembler

Assemble Transcripts from raw features in GFF3 format.

Within a chromosome, transcripts are returned in lexical order. Features that do not constitute portions of transcripts (e.g. origins of replication) are ignored. For access to those, read raw features using GFF3_Reader.

Parameters

*streams (one or more str or file-like): One or more input streams or filenames pointing to GFF3 data
is_sorted (bool, optional): GFF3 is sorted by chromosome name, allowing some memory savings (Default: False)
return_type (SegmentChain or subclass, optional): Type of feature to return from assembled subfeatures (Default: SegmentChain)
add_three_for_stop (bool, optional): Some annotation files exclude the stop codon from CDS annotations. If set to True, three nucleotides will be added to the threeprime end of each CDS annotation. (Default: False)
transcript_types (list, optional): List of GFF3 feature types that should be considered as transcripts (Default: as specified in SO 2.5.3)
exon_types (list, optional): List of GFF3 feature types that should be considered as exons or contributing to transcript nucleotide positions during transcript assembly (Default: as specified in SO 2.5.3)
cds_types (list, optional): List of GFF3 feature types that should be considered as CDS or contributing to transcript coding regions during transcript assembly (Default: as specified in SO 2.5.3)
printer (file-like, optional): Logger implementing a write() method. (Default: NullWriter)
tabix (bool, optional): If True, streams point to tabix-compressed files or are open tabix_file_iterator objects. (Default: False)

Notes

GFF3 schemas vary

GFF3 files can have many different schemas of hierarchy. We deal with that here by allowing users to supply transcript_types and exon_types, to indicate which sorts of features should be included. By default, we use a subset of the schema set out in Sequence Ontology 2.5.3.

Briefly:

1. The GFF3 file is combed for transcripts of the types specified by transcript_types, exons of the types specified by exon_types, and CDS of the types specified by cds_types.

2. Exons and CDS are matched with their parent transcripts by matching the Parent attributes of CDS and exons to the ID of transcripts. Transcripts are then constructed from those intervals, and coding regions set accordingly.

3. If exons and/or CDS features point to a Parent that is not in transcript_types, they are grouped into a new transcript, whose ID is set to the value of their shared Parent. However, this value for Parent might refer to a gene rather than a transcript; unfortunately this cannot be known without other information. Attributes that are common to all CDS and exon features are bubbled up to the transcript.

4. If exons and/or CDS features have no Parent, but share a common ID, they are grouped by ID into a single transcript. Attributes common to all CDS and exon features are bubbled up to the transcript. The Parent attribute is left unset.

5. If a transcript feature is annotated but has no child CDS or exons, the transcript is assumed to be non-coding and is assembled from any transcript-type features that share its ID attribute.

Identity relationships between elements vary between GFF3 files

Different GFF3 files specify discontiguous features differently. For example, in Flybase, different exons of a transcript will have unique IDs, but will share the same ‘Parent’ attribute in column 9 of the GFF. In Wormbase, however, different exons of the same transcript will share the same ID. Here, we first check for the Flybase style (by Parent), then fall back to Wormbase style (by shared ID).
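The Parent-first, ID-fallback rule can be illustrated with plain dictionaries standing in for feature attributes. This is a simplified sketch under stated assumptions: the function name `group_components` and the sample data are hypothetical, the fallback is applied per feature, and the real assembler handles additional cases described in the notes above.

```python
from collections import defaultdict

def group_components(features):
    """Group exon/CDS attribute dicts by Parent, else by shared ID."""
    groups = defaultdict(list)
    for attr in features:
        parents = attr.get("Parent")
        if parents:
            # Flybase style: components point at their transcript via Parent
            for parent in parents:
                groups[parent].append(attr)
        elif "ID" in attr:
            # Wormbase style: components of one transcript share an ID
            groups[attr["ID"]].append(attr)
    return dict(groups)

flybase_style = [{"ID": "e1", "Parent": ["tx1"]}, {"ID": "e2", "Parent": ["tx1"]}]
wormbase_style = [{"ID": "tx2"}, {"ID": "tx2"}]
groups = group_components(flybase_style + wormbase_style)
```

Both styles end up keyed by a transcript-level identifier, so later assembly steps can treat them the same way.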

Transcript assembly

To save memory, transcripts are assembled lazily as follows:

If there exist assembled transcripts in self._transcript_cache, return the next transcript. Transcripts in the cache are stored lexically.
Otherwise, collect features from the GFF3 stream until either a ‘###’ line or EOF is encountered. Then, assemble transcripts and store them in self._transcript_cache. Delete unused features from memory. If the GFF3 is sorted, then a change in chromosome name will also trigger assembly of collected features.
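The lazy strategy above amounts to batching lines until a trigger is seen. The generator below is a rough sketch of that batching only, assuming tab-delimited lines whose first column is the chromosome name; `lazy_batches` is a hypothetical name, and actual transcript assembly and caching are left out.

```python
def lazy_batches(lines, is_sorted=False):
    """Yield lists of feature lines; each list is ready to be assembled."""
    batch = []
    last_chrom = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("###"):
            # '###' signals that all preceding features are complete
            if batch:
                yield batch
                batch = []
            last_chrom = None
            continue
        if not line or line.startswith("#"):
            continue  # skip blank lines and other comment/metadata lines
        chrom = line.split("\t", 1)[0]
        if is_sorted and batch and chrom != last_chrom:
            # In a sorted file, a new chromosome closes the previous batch
            yield batch
            batch = []
        batch.append(line)
        last_chrom = chrom
    if batch:  # end of input also triggers assembly
        yield batch

lines = [
    "chrI\t.\texon\t1\t10\t.\t+\t.\tParent=tx1",
    "chrI\t.\texon\t20\t30\t.\t+\t.\tParent=tx1",
    "###",
    "chrII\t.\texon\t5\t15\t.\t+\t.\tParent=tx2",
]
batches = list(lazy_batches(lines))
```

Because the function is a generator, only one batch of features is held in memory at a time, which is the point of the lazy design.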

Examples

Assemble transcripts from a GFF3 file:

>>> from plastid.readers.gff import GFF3_TranscriptAssembler
>>> transcript_reader = GFF3_TranscriptAssembler(open("some_file.gff"))
>>> for transcript in transcript_reader:
...     print(transcript.get_name(),transcript.attr["type"],str(transcript)) # do something

('YAL030W_mRNA',   'mRNA',  'chrI:87262-87387^87500-87857(+)')
('YBL092W_mRNA',   'mRNA',  'chrII:45643-45644^45977-46440(+)')
('YBL057C_mRNA',   'mRNA',  'chrII:112749-113427^113444-113450(-)')
('YBL040C_mRNA',   'mRNA',  'chrII:142033-142749^142846-142891(-)')
('YBL018C_mRNA',   'mRNA',  'chrII:185961-186352^186427-186504(-)')
('YBR012W-B',      'mRNA',  'chrII:259868-261173^261174-265140(+)')
('YBR044C_mRNA',   'mRNA',  'chrII:324292-324336^324340-326127(-)')
('YBR082C_mRNA',   'mRNA',  'chrII:406506-407027^407122-407379(-)')
('YBR126W-B_mRNA', 'mRNA',  'chrII:490824-491202(+)')
('YBR138C_mRNA',   'mRNA',  'chrII:513636-515391(-)')
[rest of output omitted]

Attributes

streams (file-like): Input stream, usually constructed from one or more open filehandles
metadata (dict): Various attributes gleaned from the stream, if any
counter (int): Cumulative line number counter over all streams
printer (file-like, optional): Logger implementing a write() method.
rejected (list): A list of transcript IDs from transcripts that failed to assemble properly

Methods

`close`()	Close stream
`fileno`()	Returns underlying file descriptor if one exists.
`filter`(data)	Return next assembled feature from self.stream
`flush`(/)	Flush write buffers, if applicable.
`isatty`()	Return whether this is an 'interactive' stream.
`read`()	Similar to `file.read()`.
`readable`()	Return whether object was opened for reading.
`readline`()	Process a single line of data, assuming it is string-like. `next(self)` is more likely to behave as expected.
`readlines`()	Similar to `file.readlines()`.
`seek`	Change stream position.
`seekable`()	Return whether object supports random access.
`tell`(/)	Return current stream position.
`truncate`	Truncate file to size bytes.
`writable`()	Return whether object was opened for writing.
`writelines`(lines, /)	Write a list of lines to stream.

next

close()¶: Close stream

fileno()¶

Returns underlying file descriptor if one exists.

OSError is raised if the IO object does not use a file descriptor.

filter(data)¶

Return next assembled feature from self.stream

Returns

SegmentChain or subclass: Next feature assembled from self.streams, type specified by self.return_type

flush(/)¶

Flush write buffers, if applicable.

This is not implemented for read-only and non-blocking streams.

isatty()¶

Return whether this is an ‘interactive’ stream.

Return False if it can’t be determined.

next()¶

read()¶

Similar to file.read(). Process all units of data, assuming it is string-like

Returns

str

readable()¶

Return whether object was opened for reading.

If False, read() will raise OSError.

readline()¶

Process a single line of data, assuming it is string-like. next(self) is more likely to behave as expected.

Returns

object: a unit of processed data

readlines()¶

Similar to file.readlines().

Returns

list: processed data

seek()¶

Change stream position.

Change the stream position to the given byte offset. The offset is interpreted relative to the position indicated by whence. Values for whence are:

0 – start of stream (the default); offset should be zero or positive
1 – current stream position; offset may be negative
2 – end of stream; offset is usually negative

Return the new absolute position.

seekable()¶

Return whether object supports random access.

If False, seek(), tell() and truncate() will raise OSError. This method may need to do a test seek().

tell(/)¶: Return current stream position.

truncate()¶

Truncate file to size bytes.

File pointer is left unchanged. Size defaults to the current IO position as reported by tell(). Returns the new size.

writable()¶

Return whether object was opened for writing.

If False, write() will raise OSError.

writelines(lines, /)¶

Write a list of lines to stream.

Line separators are not added, so it is usual for each of the lines provided to have a line separator at the end.

closed¶

class plastid.readers.gff.GTF2_Reader(*streams, end_included=True, return_stopfeatures=False, is_sorted=False, tabix=False)[source]¶

Bases: plastid.readers.gff.AbstractGFF_Reader

Read raw features in GTF2 files as SegmentChains. To assemble transcripts from raw features, use GTF2_TranscriptAssembler.

Assumes input to comply with the GTF2 specification. Each element must:

use 1-indexed, fully-closed coordinates

have defined gene_id and transcript_id attributes

All SegmentChain objects returned by the reader have 0-indexed, half-open coordinates in keeping with Python conventions.
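To make these requirements concrete, here is a minimal sketch of parsing one GTF2 line with the standard library. It is not the reader's implementation: `parse_gtf2_line`, the returned dict layout, and the sample line are assumptions for illustration, and only double-quoted attribute values are recognized.

```python
import re

# Matches attribute pairs such as: gene_id "YAL030W";
_GTF2_PAIR = re.compile(r'(\S+)\s+"([^"]*)"\s*;')

def parse_gtf2_line(line):
    """Parse one GTF2 line into a dict with 0-indexed, half-open coordinates."""
    fields = line.rstrip("\n").split("\t")
    attr = dict(_GTF2_PAIR.findall(fields[8]))
    for required in ("gene_id", "transcript_id"):
        if required not in attr:
            raise ValueError("GTF2 record lacks required attribute %r" % required)
    return {
        "chrom": fields[0],
        "type": fields[2],
        "start": int(fields[3]) - 1,  # 1-indexed closed start -> 0-indexed
        "end": int(fields[4]),        # closed end -> half-open end
        "strand": fields[6],
        "attr": attr,
    }

record = parse_gtf2_line(
    'chrI\tsgd\texon\t87263\t87387\t.\t+\t.\t'
    'gene_id "YAL030W"; transcript_id "YAL030W_mRNA";'
)
```

The resulting `record` spans positions 87262 to 87387 on the plus strand of chrI in 0-indexed, half-open terms.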

Parameters

*streams (one or more str or file-like): One or more input streams or filenames pointing to GFF information
end_included (bool, optional): Whether the end coordinate is included in the feature (closed or ‘end-included’ intervals) or not (half-open intervals). (Default: True)
return_stopfeatures (bool, optional): If True, return a special SegmentChain called StopFeature signifying that all previously emitted SegmentChains may be assembled into complete entities. These are emitted when the line “###” is encountered in a GTF2. (Default: False)
is_sorted (bool, optional): If True and return_stopfeatures is True, assume the GTF2 is sorted by chromosome. The reader will return StopFeature when the chromosome name of a given feature differs from that of the previous feature. (Default: False)
tabix (bool, optional): If True, streams point to tabix-compressed files or are open tabix_file_iterator objects. (Default: False)

Examples

Read raw features from a GTF2 file:

>>> from plastid.readers.gff import GTF2_Reader
>>> feature_reader = GTF2_Reader(open("some_file.gtf"))
>>> for feature in feature_reader:
...     print(feature.get_name(),feature.attr["type"],str(feature))
('YAL030W_mRNA',  'exon',        'chrI:87262-87387(+)')
('YAL030W_mRNA',  'exon',        'chrI:87500-87857(+)')
('YAL030W_mRNA',  'CDS',         'chrI:87285-87387(+)')
('YAL030W_mRNA',  'CDS',         'chrI:87500-87749(+)')
('YAL030W_mRNA',  'start_codon', 'chrI:87285-87288(+)')
('YAL030W_mRNA',  'stop_codon',  'chrI:87749-87752(+)')
('YBL092W_mRNA',  'exon',        'chrII:45643-45644(+)')
('YBL092W_mRNA',  'exon',        'chrII:45977-46440(+)')
('YBL092W_mRNA',  'CDS',         'chrII:45977-46367(+)')
('YBL092W_mRNA',  'start_codon', 'chrII:45977-45980(+)')
[rest of output omitted]

Attributes

metadata (dict): Dictionary of metadata found in file headers

Methods

`close`()	Close stream
`fileno`()	Returns underlying file descriptor if one exists.
`filter`(line)	Parses lines of the GFF stream into `SegmentChain`. When metadata is found, temporarily delegates processing to `_parse_metatokens()`, and then reads the next genomic feature.
`flush`(/)	Flush write buffers, if applicable.
`isatty`()	Return whether this is an 'interactive' stream.
`read`()	Similar to `file.read()`.
`readable`()	Return whether object was opened for reading.
`readline`()	Process a single line of data, assuming it is string-like. `next(self)` is more likely to behave as expected.
`readlines`()	Similar to `file.readlines()`.
`seek`	Change stream position.
`seekable`()	Return whether object supports random access.
`tell`(/)	Return current stream position.
`truncate`	Truncate file to size bytes.
`writable`()	Return whether object was opened for writing.
`writelines`(lines, /)	Write a list of lines to stream.

next

close()¶: Close stream

fileno()¶

Returns underlying file descriptor if one exists.

OSError is raised if the IO object does not use a file descriptor.

filter(line)¶

Parses lines of the GFF stream into SegmentChain. When metadata is found, temporarily delegates processing to _parse_metatokens(), and then reads the next genomic feature.

Parameters

line: Next line from GFF stream

Returns

SegmentChain: Next feature in file

flush(/)¶

Flush write buffers, if applicable.

This is not implemented for read-only and non-blocking streams.

isatty()¶

Return whether this is an ‘interactive’ stream.

Return False if it can’t be determined.

next()¶

read()¶

Similar to file.read(). Process all units of data, assuming it is string-like

Returns

str

readable()¶

Return whether object was opened for reading.

If False, read() will raise OSError.

readline()¶

Process a single line of data, assuming it is string-like. next(self) is more likely to behave as expected.

Returns

object: a unit of processed data

readlines()¶

Similar to file.readlines().

Returns

list: processed data

seek()¶

Change stream position.

Change the stream position to the given byte offset. The offset is interpreted relative to the position indicated by whence. Values for whence are:

0 – start of stream (the default); offset should be zero or positive
1 – current stream position; offset may be negative
2 – end of stream; offset is usually negative

Return the new absolute position.

seekable()¶

Return whether object supports random access.

If False, seek(), tell() and truncate() will raise OSError. This method may need to do a test seek().

tell(/)¶: Return current stream position.

truncate()¶

Truncate file to size bytes.

File pointer is left unchanged. Size defaults to the current IO position as reported by tell(). Returns the new size.

writable()¶

Return whether object was opened for writing.

If False, write() will raise OSError.

writelines(lines, /)¶

Write a list of lines to stream.

Line separators are not added, so it is usual for each of the lines provided to have a line separator at the end.

closed¶

class plastid.readers.gff.GTF2_TranscriptAssembler(*streams, is_sorted=False, return_type=SegmentChain, add_three_for_stop=False, printer=None, tabix=False)[source]¶

Bases: plastid.readers.gff.AbstractGFF_Assembler

Assemble Transcripts from raw features in GTF2 format.

Exons and CDS features are grouped by shared transcript_id. Attributes that have common values for all exons and CDS within a transcript are propagated up to the attr dict of the assembled Transcript. Other attributes from individual CDS or exon components are discarded.
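The propagation of shared attributes can be pictured as an intersection of dictionaries. The helper below is a small sketch of that idea; `common_attributes` and the sample attribute dicts are hypothetical and do not come from plastid itself.

```python
def common_attributes(components):
    """Return the key/value pairs shared by every component attribute dict."""
    if not components:
        return {}
    shared = dict(components[0])
    for attr in components[1:]:
        # Keep only pairs that this component also carries with the same value
        shared = {k: v for k, v in shared.items() if k in attr and attr[k] == v}
    return shared

exon = {"gene_id": "YAL030W", "transcript_id": "YAL030W_mRNA", "exon_number": "1"}
cds = {"gene_id": "YAL030W", "transcript_id": "YAL030W_mRNA", "frame": "0"}
shared = common_attributes([exon, cds])
```

In this example only gene_id and transcript_id survive; exon_number and frame are specific to one component and would be discarded.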

The assembler functions as an iterator. Within each chromosome, transcripts are returned in lexical order.

For access to raw features, instead use GTF2_Reader.

Parameters

*streams (one or more str or file-like): One or more input streams or filenames pointing to GTF2 data
is_sorted (bool, optional): GTF2 is sorted by chromosome name, allowing some memory savings (Default: False)
return_type (SegmentChain or subclass, optional): Type of feature to return from assembled subfeatures (Default: SegmentChain)
add_three_for_stop (bool, optional): Some annotation files exclude the stop codon from CDS annotations. If set to True, three nucleotides will be added to the threeprime end of each CDS annotation, unless the annotated transcript contains an explicit stop_codon feature. (Default: False)
printer (file-like, optional): Logger implementing a write() method. (Default: NullWriter)
tabix (bool, optional): If True, streams point to tabix-compressed files or are open tabix_file_iterator objects. (Default: False)

Examples

Assemble transcripts from a GTF2 file:

>>> from plastid.readers.gff import GTF2_TranscriptAssembler
>>> transcript_reader = GTF2_TranscriptAssembler(open("some_file.gtf"))
>>> for transcript in transcript_reader:
...     print(transcript.get_name(),transcript.attr["type"],str(transcript)) # do something

('YAL030W_mRNA',   'mRNA',  'chrI:87262-87387^87500-87857(+)')
('YBL092W_mRNA',   'mRNA',  'chrII:45643-45644^45977-46440(+)')
('YBL057C_mRNA',   'mRNA',  'chrII:112749-113427^113444-113450(-)')
('YBL040C_mRNA',   'mRNA',  'chrII:142033-142749^142846-142891(-)')
('YBL018C_mRNA',   'mRNA',  'chrII:185961-186352^186427-186504(-)')
('YBR012W-B',      'mRNA',  'chrII:259868-261173^261174-265140(+)')
('YBR044C_mRNA',   'mRNA',  'chrII:324292-324336^324340-326127(-)')
('YBR082C_mRNA',   'mRNA',  'chrII:406506-407027^407122-407379(-)')
('YBR126W-B_mRNA', 'mRNA',  'chrII:490824-491202(+)')
('YBR138C_mRNA',   'mRNA',  'chrII:513636-515391(-)')
[rest of output omitted]

Attributes

streams (file-like): Input streams, usually constructed from one or more open filehandles
metadata (dict): Various attributes gleaned from the streams, if any
counter (int): Cumulative line number counter over all streams
printer (file-like, optional): Logger implementing a write() method.
rejected (list): A list of transcript IDs from transcripts that failed to assemble properly

Methods

`close`()	Close stream
`fileno`()	Returns underlying file descriptor if one exists.
`filter`(data)	Return next assembled feature from self.stream
`flush`(/)	Flush write buffers, if applicable.
`isatty`()	Return whether this is an 'interactive' stream.
`read`()	Similar to `file.read()`.
`readable`()	Return whether object was opened for reading.
`readline`()	Process a single line of data, assuming it is string-like. `next(self)` is more likely to behave as expected.
`readlines`()	Similar to `file.readlines()`.
`seek`	Change stream position.
`seekable`()	Return whether object supports random access.
`tell`(/)	Return current stream position.
`truncate`	Truncate file to size bytes.
`writable`()	Return whether object was opened for writing.
`writelines`(lines, /)	Write a list of lines to stream.

next

close()¶: Close stream

fileno()¶

Returns underlying file descriptor if one exists.

OSError is raised if the IO object does not use a file descriptor.

filter(data)¶

Return next assembled feature from self.stream

Returns

SegmentChain or subclass: Next feature assembled from self.streams, type specified by self.return_type

flush(/)¶

Flush write buffers, if applicable.

This is not implemented for read-only and non-blocking streams.

isatty()¶

Return whether this is an ‘interactive’ stream.

Return False if it can’t be determined.

next()¶

read()¶

Similar to file.read(). Process all units of data, assuming it is string-like

Returns

str

readable()¶

Return whether object was opened for reading.

If False, read() will raise OSError.

readline()¶

Process a single line of data, assuming it is string-like. next(self) is more likely to behave as expected.

Returns

object: a unit of processed data

readlines()¶

Similar to file.readlines().

Returns

list: processed data

seek()¶

Change stream position.

Change the stream position to the given byte offset. The offset is interpreted relative to the position indicated by whence. Values for whence are:

0 – start of stream (the default); offset should be zero or positive
1 – current stream position; offset may be negative
2 – end of stream; offset is usually negative

Return the new absolute position.

seekable()¶

Return whether object supports random access.

If False, seek(), tell() and truncate() will raise OSError. This method may need to do a test seek().

tell(/)¶: Return current stream position.

truncate()¶

Truncate file to size bytes.

File pointer is left unchanged. Size defaults to the current IO position as reported by tell(). Returns the new size.

writable()¶

Return whether object was opened for writing.

If False, write() will raise OSError.

writelines(lines, /)¶

Write a list of lines to stream.

Line separators are not added, so it is usual for each of the lines provided to have a line separator at the end.

closed¶

dtmp = {'CDS_like': {}, 'exon_like': {}}¶

plastid.readers.gff.StopFeature = <SegmentChain segments=1 bounds=Stop:0-1(.) name=StopFeature>¶

Special SegmentChain emitted from GFF readers when:

the special line ### is encountered
the special line ###FASTA is encountered
a GFF file is marked as sorted, and the contig/chromosome changes
the source stream of features is changed

indicating that all previously returned features may be assembled into full objects.

Note

Because StopFeature is zero-length, it does not evaluate as equal to itself. Use x is StopFeature or x is not StopFeature rather than == or != when testing whether an object is the StopFeature.
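The identity test recommended in the note can be demonstrated with a stand-in object. The sketch below does not import plastid: `_Sentinel`, `STOP`, and `split_on_sentinel` are hypothetical names, and the stand-in merely mimics a marker that never compares equal to itself.

```python
class _Sentinel:
    """Stand-in for a marker object whose == comparison is unreliable."""

    def __eq__(self, other):
        return False  # never compares equal, even to itself

    def __hash__(self):
        return id(self)

STOP = _Sentinel()  # hypothetical stand-in for StopFeature

def split_on_sentinel(stream):
    """Split a stream of items into batches at each STOP marker."""
    batches, current = [], []
    for item in stream:
        if item is STOP:  # identity test; `item == STOP` would be False here
            batches.append(current)
            current = []
        else:
            current.append(item)
    if current:
        batches.append(current)
    return batches

batches = split_on_sentinel(["exon1", "exon2", STOP, "exon3"])
```

With an equality test in place of `is`, the marker would never be detected and all items would land in a single batch.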
