plastid.readers.psl module

This module defines two classes for reading PSL files (made by, for example, blat):

PSL_Reader
Read a PSL file line-by-line, converting each line into a SegmentChain or Transcript
BundledPSL_Reader
Read PSL files, returning lists of SegmentChains grouped by query sequence.
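Each PSL record stores its alignment as gapped blocks. blockSizes (column 19) and tStarts (column 21) give the length and the 0-based target start of each block. The sketch below shows how one line maps onto the half-open target intervals a SegmentChain is built from. It assumes the standard 21-column PSL layout. psl_line_to_segments is a hypothetical helper, not part of plastid; in plastid, this conversion is done by the return_type's from_psl method.

```python
def psl_line_to_segments(line):
    """Return (tName, strand, [(start, end), ...]) for one PSL data line.

    Segments are 0-based, half-open intervals on the target sequence.
    Hypothetical helper for illustration; not plastid's parser.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 21:
        raise ValueError("expected 21 tab-separated PSL columns, got %d" % len(fields))
    strand = fields[8]     # column 9
    t_name = fields[13]    # column 14
    # blockSizes (column 19) and tStarts (column 21) are comma-separated,
    # usually with a trailing comma, hence the `if x` filter
    sizes = [int(x) for x in fields[18].split(",") if x]
    t_starts = [int(x) for x in fields[20].split(",") if x]
    segments = [(s, s + n) for s, n in zip(t_starts, sizes)]
    return t_name, strand, segments


example = "\t".join([
    "100", "0", "0", "0", "0", "0", "1", "400", "+",  # alignment counts, strand
    "read1", "100", "0", "100",                        # qName, qSize, qStart, qEnd
    "chrI", "230218", "1000", "1500",                  # tName, tSize, tStart, tEnd
    "2", "40,60,", "0,40,", "1000,1440,",              # blockCount, blockSizes, qStarts, tStarts
])
print(psl_line_to_segments(example))
# ('chrI', '+', [(1000, 1040), (1440, 1500)])
```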
class plastid.readers.psl.BundledPSL_Reader(*streams, return_type=SegmentChain, add_three_for_stop=False, tabix=False, printer=None, **kwargs)

Bases: plastid.readers.psl.PSL_Reader

Read PSL files, returning lists of SegmentChains grouped by query sequence. Use this class when a query sequence has multiple hits in your PSL file and you want those hits returned together as a group.

Parameters:
*streams : file-like

One or more open filehandles of input data.

return_type : SegmentChain or subclass, optional

Type of feature to return from assembled subfeatures (Default: SegmentChain)

add_three_for_stop : bool, optional

Some annotation files exclude the stop codon from CDS annotations. If set to True, three nucleotides will be added to the 3′ end of each CDS annotation, UNLESS the annotated transcript contains an explicit stop_codon feature. (Default: False)

printer : file-like, optional

Logger implementing a write() method (Default: NullWriter)

tabix : bool, optional

If True, streams point to tabix-compressed files or are open tabix_file_iterator objects (Default: False)

**kwargs

Other keyword arguments used by specific parsers
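For add_three_for_stop, "three nucleotides added to the 3′ end" means a different coordinate change depending on strand. The sketch below assumes 0-based, half-open, plus-strand coordinates for an unspliced CDS. extend_cds_for_stop is a hypothetical helper. Plastid's own handling, including stop codons that straddle a splice junction, is not shown here.

```python
def extend_cds_for_stop(cds_start, cds_end, strand, has_stop_codon_feature=False):
    """Return (cds_start, cds_end), widened by 3 nt on the 3' side if needed.

    Coordinates are 0-based, half-open, on the plus strand of the genome.
    Hypothetical helper for illustration only.
    """
    if has_stop_codon_feature:
        return cds_start, cds_end          # stop codon already annotated
    if strand == "+":
        return cds_start, cds_end + 3      # 3' end is the right-hand boundary
    if strand == "-":
        return cds_start - 3, cds_end      # 3' end is the left-hand boundary
    raise ValueError("strand must be '+' or '-', got %r" % strand)


print(extend_cds_for_stop(100, 400, "+"))        # (100, 403)
print(extend_cds_for_stop(100, 400, "-"))        # (97, 400)
print(extend_cds_for_stop(100, 400, "+", True))  # (100, 400)
```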

Attributes:
closed

Methods

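close() Close stream
filter(line) Process lines of PSL input into SegmentChain objects, and group these by query sequence.
flush Flush write buffers, if applicable.
read() Similar to file.read().
readline() Process a single line of data, assuming it is string-like. Iterating with next(self) is more likely to behave as expected.
readlines() Similar to file.readlines().
seek Change stream position.
tell Return current stream position.
truncate Truncate file to size bytes.
fileno Return underlying file descriptor if one exists.
isatty Return whether this is an ‘interactive’ stream.
next Return the next value, or raise StopIteration.
readable Return whether object was opened for reading.
seekable Return whether object supports random access.
writable Return whether object was opened for writing.
writelines  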
close()

Close stream

fileno()

Returns underlying file descriptor if one exists.

An IOError is raised if the IO object does not use a file descriptor.

filter(line)

Process lines of PSL input into SegmentChain objects, and group these by query sequence.

Parameters:
line : str

line of PSL input

Returns:
list

list of SegmentChain objects sharing a query sequence

flush()

Flush write buffers, if applicable.

This is not implemented for read-only and non-blocking streams.

isatty()

Return whether this is an ‘interactive’ stream.

Return False if it can’t be determined.

next() → the next value, or raise StopIteration

read()

Similar to file.read(). Process all units of data, assuming it is string-like.

Returns:
str

readable()

Return whether object was opened for reading.

If False, read() will raise IOError.

readline()

Process a single line of data, assuming it is string-like. Iterating with next(self) is more likely to behave as expected.

Returns:
object

a unit of processed data

readlines()

Similar to file.readlines().

Returns:
list

processed data

seek()

Change stream position.

Change the stream position to the given byte offset. The offset is interpreted relative to the position indicated by whence. Values for whence are:

  • 0 – start of stream (the default); offset should be zero or positive
  • 1 – current stream position; offset may be negative
  • 2 – end of stream; offset is usually negative

Return the new absolute position.
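An in-memory byte stream is a quick way to see the three whence modes. This example uses io.BytesIO rather than a PSL reader. Whether a particular reader supports seeking at all should be checked with seekable(). Text streams are also stricter than byte streams. With whence=0, they accept only zero or an offset previously returned by tell(). With whence=1 or 2, the offset must be zero.

```python
import io


def demo_seek():
    """Return the positions reported by seek() and the bytes left afterwards."""
    buf = io.BytesIO(b"0123456789")
    positions = [
        buf.seek(3),                 # whence=0 (io.SEEK_SET): offset from the start
        buf.seek(2, io.SEEK_CUR),    # whence=1: relative to the current position
        buf.seek(-4, io.SEEK_END),   # whence=2: relative to the end (negative offset)
    ]
    return positions, buf.read()


print(demo_seek())  # ([3, 5, 6], b'6789')
```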

seekable()

Return whether object supports random access.

If False, seek(), tell() and truncate() will raise IOError. This method may need to do a test seek().

tell()

Return current stream position.

truncate()

Truncate file to size bytes.

File pointer is left unchanged. Size defaults to the current IO position as reported by tell(). Returns the new size.

writable()

Return whether object was opened for writing.

If False, write() will raise IOError.

writelines()
closed
class plastid.readers.psl.PSL_Reader(*streams, return_type=SegmentChain, add_three_for_stop=False, tabix=False, printer=None, **kwargs)

Bases: plastid.readers.common.AssembledFeatureReader

Read PSL files into SegmentChain or Transcript objects.

Parameters:
*streams : file-like

One or more open filehandles of input data.

return_type : SegmentChain or subclass, optional

Type of feature to return from assembled subfeatures (Default: SegmentChain)

add_three_for_stop : bool, optional

Some annotation files exclude the stop codon from CDS annotations. If set to True, three nucleotides will be added to the 3′ end of each CDS annotation, UNLESS the annotated transcript contains an explicit stop_codon feature. (Default: False)

printer : file-like, optional

Logger implementing a write() method (Default: NullWriter)

tabix : bool, optional

If True, streams point to tabix-compressed files or are open tabix_file_iterator objects (Default: False)

**kwargs

Other keyword arguments used by specific parsers

Attributes:
streams : file-like

One or more open streams (usually filehandles) of input data.

return_type : class

The type of object assembled by the reader. Typically a SegmentChain or a subclass thereof. Must implement a method called from_psl.

counter : int

Cumulative line number counter over all streams

rejected : list

A list of lines from the PSL file that did not assemble properly

metadata : dict

Various attributes gleaned from the stream, if any
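To make the streams, counter, rejected and printer bookkeeping concrete, here is a simplified, hypothetical reader (MiniPSLReader is not a plastid class). It chains several streams, counts every line it consumes, skips psLayout headers and blank lines, and records malformed data lines in rejected instead of raising. What plastid treats as "did not assemble properly" may differ. The checks here (21 columns, blockCount matching the block lists) are assumptions.

```python
import io
from itertools import chain


class MiniPSLReader:
    """Hypothetical stand-in showing streams / counter / rejected / printer."""

    def __init__(self, *streams, printer=None):
        self.streams = streams   # one or more open text streams
        self.counter = 0         # cumulative line number over all streams
        self.rejected = []       # data lines that could not be parsed
        self.printer = printer   # any object with a write() method, or None

    def __iter__(self):
        for line in chain.from_iterable(self.streams):
            self.counter += 1
            fields = line.rstrip("\n").split("\t")
            if not fields[0].isdigit():
                continue  # psLayout header or blank line: not a record
            try:
                if len(fields) != 21:
                    raise ValueError("expected 21 columns, got %d" % len(fields))
                sizes = [int(x) for x in fields[18].split(",") if x]
                starts = [int(x) for x in fields[20].split(",") if x]
                if not len(sizes) == len(starts) == int(fields[17]):
                    raise ValueError("blockCount does not match block lists")
            except ValueError as err:
                self.rejected.append(line)
                if self.printer is not None:
                    self.printer.write("line %d rejected: %s\n" % (self.counter, err))
                continue
            # yield (qName, tName, strand, half-open target segments)
            yield fields[9], fields[13], fields[8], [(s, s + n) for s, n in zip(starts, sizes)]


log = io.StringIO()
reader = MiniPSLReader(io.StringIO("psLayout version 3\n\n"),
                       io.StringIO("1\t2\t3\n"), printer=log)
print(list(reader), reader.counter, len(reader.rejected))  # [] 3 1
print(log.getvalue(), end="")  # line 3 rejected: expected 21 columns, got 3
```

Because counting happens while iterating, counter and rejected are only meaningful after the reader has been consumed.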

Methods

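close() Close stream
filter(data) Return next assembled feature from self.stream
flush Flush write buffers, if applicable.
read() Similar to file.read().
readline() Process a single line of data, assuming it is string-like. Iterating with next(self) is more likely to behave as expected.
readlines() Similar to file.readlines().
seek Change stream position.
tell Return current stream position.
truncate Truncate file to size bytes.
fileno Return underlying file descriptor if one exists.
isatty Return whether this is an ‘interactive’ stream.
next Return the next value, or raise StopIteration.
readable Return whether object was opened for reading.
seekable Return whether object supports random access.
writable Return whether object was opened for writing.
writelines  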
close()

Close stream

fileno()

Returns underlying file descriptor if one exists.

An IOError is raised if the IO object does not use a file descriptor.

filter(data)

Return next assembled feature from self.stream

Returns:
SegmentChain or subclass

Next feature assembled from self.streams, type specified by self.return_type

flush()

Flush write buffers, if applicable.

This is not implemented for read-only and non-blocking streams.

isatty()

Return whether this is an ‘interactive’ stream.

Return False if it can’t be determined.

next() → the next value, or raise StopIteration

read()

Similar to file.read(). Process all units of data, assuming it is string-like.

Returns:
str

readable()

Return whether object was opened for reading.

If False, read() will raise IOError.

readline()

Process a single line of data, assuming it is string-like. Iterating with next(self) is more likely to behave as expected.

Returns:
object

a unit of processed data

readlines()

Similar to file.readlines().

Returns:
list

processed data

seek()

Change stream position.

Change the stream position to the given byte offset. The offset is interpreted relative to the position indicated by whence. Values for whence are:

  • 0 – start of stream (the default); offset should be zero or positive
  • 1 – current stream position; offset may be negative
  • 2 – end of stream; offset is usually negative

Return the new absolute position.

seekable()

Return whether object supports random access.

If False, seek(), tell() and truncate() will raise IOError. This method may need to do a test seek().

tell()

Return current stream position.

truncate()

Truncate file to size bytes.

File pointer is left unchanged. Size defaults to the current IO position as reported by tell(). Returns the new size.

writable()

Return whether object was opened for writing.

If False, write() will raise IOError.

writelines()
closed