plastid.bin.phase_by_size module
Estimate sub-codon phasing in a ribosome profiling dataset, stratified by read length.
Because ribosomes advance three nucleotides in each cycle of translation elongation, many ribosome profiling datasets show a triplet periodicity in the distribution of ribosome-protected footprints.
In a good dataset, 70-90% of the reads on a codon fall within the first of the three codon positions. This allows the translation reading frame to be deduced when it is not known a priori. See [IGNW09] for more details.
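The per-length phasing described above can be sketched as follows. This is a minimal illustration, not plastid's implementation: it assumes each read has already been reduced to a single P-site position in transcript coordinates (for example via a fixed or length-specific offset) and that the CDS bounds are known. The function name `phasing_by_length` and the `(p_site, read_length)` input layout are hypothetical.

```python
from collections import defaultdict


def phasing_by_length(reads, cds_start, cds_end):
    """Fraction of P-sites in each codon position, stratified by read length.

    reads     : iterable of (p_site, read_length) pairs, p_site in transcript coordinates
    cds_start : first nucleotide of the CDS (inclusive)
    cds_end   : end of the CDS (exclusive)
    Returns {read_length: [frac_pos1, frac_pos2, frac_pos3]}.
    """
    # counts[length][k] = number of reads of this length whose P-site sits at codon position k
    counts = defaultdict(lambda: [0, 0, 0])
    for p_site, length in reads:
        if cds_start <= p_site < cds_end:  # only count reads inside the CDS
            counts[length][(p_site - cds_start) % 3] += 1

    # Normalize each length's counts; every stored length has at least one read
    result = {}
    for length, c in counts.items():
        total = sum(c)
        result[length] = [x / total for x in c]
    return result


reads = [(10, 28), (13, 28), (16, 28), (11, 28), (12, 30)]
print(phasing_by_length(reads, cds_start=10, cds_end=100))
```

In a well-phased dataset, most read lengths would show a large fraction in the first position, matching the 70-90% range quoted above.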
Output files
- OUTBASE_phasing.txt
Read phasing for each read length
- OUTBASE_phasing.svg
Plot of phasing by read length
where OUTBASE is supplied by the user.
Note
To avoid double-counting of codons, users should either use an ROI file made by the generate subprogram of the metagene script, or supply an annotation file that includes only one transcript isoform per gene.
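One way to prepare an annotation with only one isoform per gene is to keep a single representative transcript for each gene. The sketch below keeps the isoform with the longest CDS. Both the selection rule and the `(gene_id, transcript_id, cds_length)` tuple layout are assumptions for illustration; plastid does not prescribe either, and `one_isoform_per_gene` is a hypothetical helper.

```python
def one_isoform_per_gene(transcripts):
    """Pick one transcript per gene.

    transcripts : iterable of (gene_id, transcript_id, cds_length)
    Returns {gene_id: transcript_id}.
    """
    best = {}
    for gene_id, tx_id, cds_len in transcripts:
        # Prefer the longer CDS; break ties deterministically by transcript ID
        key = (cds_len, tx_id)
        if gene_id not in best or key > best[gene_id][0]:
            best[gene_id] = (key, tx_id)
    return {gene: tx for gene, (_, tx) in best.items()}


txs = [("g1", "t1", 300), ("g1", "t2", 450), ("g2", "t3", 120), ("g2", "t4", 120)]
print(one_isoform_per_gene(txs))
```

The retained transcript IDs could then be used to filter the annotation before running the script.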
- plastid.bin.phase_by_size.main(argv=sys.argv[1:])
Command-line program
- Parameters
- argv : list, optional
A list of command-line arguments, which will be processed as if the script were called from the command line if main() is called directly.
Default: `sys.argv[1:]` (the command-line arguments, if the script is invoked from the command line)
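The `argv` convention above, where `main()` parses an explicit list when called from Python and otherwise falls back to the process's own command-line arguments, can be sketched with `argparse`. The arguments defined here (`outbase`, `--min_length`) are hypothetical and are not the real options of `phase_by_size`. The sketch uses the common `argv=None` idiom, because a literal `sys.argv[1:]` default in a signature is evaluated once, when the function is defined, rather than on each call.

```python
import argparse
import sys


def main(argv=None):
    # Read the real command line only when no explicit list is given,
    # so the default is resolved at call time rather than import time
    if argv is None:
        argv = sys.argv[1:]

    parser = argparse.ArgumentParser(description="Hypothetical phasing CLI")
    parser.add_argument("outbase", help="prefix for output files")
    parser.add_argument("--min_length", type=int, default=25,
                        help="shortest read length to include (hypothetical)")
    args = parser.parse_args(argv)
    return args


if __name__ == "__main__":
    main()
```

Calling `main(["my_sample", "--min_length", "28"])` from Python behaves exactly as running the script with those arguments on the command line.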
- plastid.bin.phase_by_size.roi_row_to_cds(row)
Helper function to extract coding portions from maximal spanning windows flanking CDS starts that are created by the metagene generate subprogram.
- Parameters
- row : (int, Series)
Row from a pandas.DataFrame of an ROI file made by the metagene generate subprogram
- Returns
SegmentChain
Coding portion of maximal spanning window
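Conceptually, this helper trims each window so that it begins at the CDS start. The plastid-independent sketch below performs that trimming on plain interval lists. The interval representation, the `zero_point` argument (the offset of the CDS start within the window, counted 5' to 3'), and the name `window_coding_portion` are all assumptions for illustration. This is not the real `roi_row_to_cds`, which operates on `SegmentChain` objects.

```python
def window_coding_portion(segments, strand, zero_point):
    """Return the part of a (possibly spliced) window from zero_point to its 3' end.

    segments   : sorted list of half-open (start, end) genomic intervals
    strand     : "+" or "-"
    zero_point : offset of the CDS start within the window, counted 5' -> 3'
    Returns a sorted list of half-open intervals.
    """
    # Walk the segments in 5' -> 3' (transcript) order
    ordered = segments if strand == "+" else list(reversed(segments))
    remaining = zero_point
    out = []
    for start, end in ordered:
        length = end - start
        if remaining >= length:
            # This whole segment lies upstream of the CDS start; skip it
            remaining -= length
            continue
        # Trim from the 5' side: low coordinate on "+", high coordinate on "-"
        if strand == "+":
            out.append((start + remaining, end))
        else:
            out.append((start, end - remaining))
        remaining = 0
    return sorted(out)


print(window_coding_portion([(100, 110), (200, 220)], "+", 15))
```

Counting `zero_point` in transcript coordinates rather than genomic ones lets the same walk handle introns and both strands.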