plastid.bin.phase_by_size module

Estimate sub-codon phasing in a ribosome profiling dataset, stratified by read length.

Because ribosomes step three nucleotides in each cycle of translation elongation, in many ribosome profiling datasets a triplet periodicity is observable in the distribution of ribosome-protected footprints.

In a good dataset, 70-90% of the reads on a codon fall within the first of the three codon positions. This makes it possible to deduce translation reading frames that are not known a priori. See [IGNW09] for more details.
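
The idea of phasing stratified by read length can be illustrated with a minimal sketch. This is not the script's implementation: the function name `phase_by_length` and the input format (pairs of read length and P-site offset from the first nucleotide of the start codon) are assumptions made for illustration only.

```python
from collections import defaultdict


def phase_by_length(reads):
    """Tabulate the fraction of reads in each sub-codon phase, per read length.

    `reads` is an iterable of (read_length, cds_offset) pairs, where
    cds_offset is the read's P-site position in nucleotides relative to the
    first nucleotide of the start codon (hypothetical input format).
    """
    counts = defaultdict(lambda: [0, 0, 0])
    for length, offset in reads:
        # offset % 3 is the codon position (0, 1, or 2) the read maps to
        counts[length][offset % 3] += 1

    table = {}
    for length, per_phase in sorted(counts.items()):
        total = sum(per_phase)
        table[length] = [n / total for n in per_phase]
    return table


# Toy data: 28-mers are mostly in phase 0, 29-mers are split evenly
reads = [(28, 0), (28, 3), (28, 6), (28, 7), (29, 1), (29, 4), (29, 9), (29, 12)]
table = phase_by_length(reads)
for length, fractions in table.items():
    print(length, fractions)
```

In this toy input, three of the four 28-nucleotide reads fall on the first codon position, so that read length would be the more informative one for inferring reading frame.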

Output files

OUTBASE_phasing.txt

Read phasing for each read length

OUTBASE_phasing.svg

Plot of phasing by read length

where OUTBASE is supplied by the user.

Note

To avoid double-counting of codons, users should either use an ROI file made by the generate subprogram of the metagene script, or supply an annotation file that includes only one transcript isoform per gene.

plastid.bin.phase_by_size.main(argv=sys.argv[1:])

Command-line program

Parameters
argv : list, optional

A list of command-line arguments. When main() is called directly, these are processed as if the script had been invoked from the command line with them.

Default: `sys.argv[1:]`, i.e. the arguments supplied when the script is invoked from the command line.
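
The general pattern of a `main()` that accepts an argument list can be sketched as follows. This is a generic illustration, not the code of this module: the program name, the `outbase` positional, and the `--min_length` option are hypothetical, and the sketch resolves the default at call time (`argv=None`) instead of in the signature.

```python
import argparse
import sys


def main(argv=None):
    """Parse `argv` as if it had been typed on the command line."""
    # Look up sys.argv at call time, so the default is never frozen
    # to whatever sys.argv held when the module was imported
    if argv is None:
        argv = sys.argv[1:]

    # Hypothetical arguments, for illustration only
    parser = argparse.ArgumentParser(prog="example_script")
    parser.add_argument("outbase")
    parser.add_argument("--min_length", type=int, default=25)
    return parser.parse_args(argv)


# Calling main() directly with a list behaves like a command-line invocation
args = main(["my_sample", "--min_length", "26"])
print(args.outbase, args.min_length)
```

Passing a list this way is convenient for tests and for driving a script from other Python code without spawning a subprocess.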

plastid.bin.phase_by_size.roi_row_to_cds(row)

Helper function to extract the coding portions of the maximal spanning windows flanking CDS starts that are created by the generate subprogram of the metagene script.

Parameters
row : (int, Series)

Row from a pandas.DataFrame of an ROI file made by the metagene generate subprogram

Returns
SegmentChain

Coding portion of maximal spanning window
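
As a rough illustration of what "coding portion of a window" means, the sketch below trims a spliced window so that it begins at the start codon. It does not use plastid and is not the helper's implementation: the function `coding_portion`, the representation of a window as plus-strand half-open `(start, end)` intervals, and the `cds_start` offset are all assumptions made for this example.

```python
def coding_portion(window_segments, cds_start):
    """Return the part of a window at or after `cds_start`.

    window_segments : list of (start, end) half-open intervals on the plus
        strand, in 5'-to-3' order (hypothetical representation)
    cds_start : offset of the first nucleotide of the start codon, counted
        in spliced window coordinates (hypothetical parameter)
    """
    kept = []
    consumed = 0  # window nucleotides already walked past
    for start, end in window_segments:
        seg_len = end - start
        if consumed + seg_len <= cds_start:
            # Segment lies entirely upstream of the start codon
            consumed += seg_len
            continue
        # Trim the upstream part of the segment containing the start codon;
        # later segments are kept whole
        skip = max(cds_start - consumed, 0)
        kept.append((start + skip, end))
        consumed += seg_len
    return kept


# Two-exon window, 30 nt + 60 nt, with the start codon 50 nt into the window
window = [(100, 130), (200, 260)]
print(coding_portion(window, 50))
```

Here the first exon and the first 20 nucleotides of the second exon are upstream of the start codon, so only `(220, 260)` remains.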