plastid.bin.phase_by_size module
Estimate sub-codon phasing in a ribosome profiling dataset, stratified by read length.
Because ribosomes advance three nucleotides with each cycle of translation elongation, many ribosome profiling datasets show a triplet periodicity in the distribution of ribosome-protected footprints.
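Because ribosomes move one codon at a time, the phase of a read can be summarised as its P-site position modulo 3. The following is a minimal sketch, assuming P-site offsets (in nucleotides from the first nucleotide of the start codon) have already been computed for each read; the input format and the function name `frame_counts` are illustrative and not part of plastid.

```python
from collections import Counter


def frame_counts(psite_offsets):
    """Count reads in each codon position (0, 1, 2).

    psite_offsets: P-site positions in nucleotides relative to the first
    nucleotide of the start codon (hypothetical input format).
    """
    # Python's % always returns 0, 1 or 2 here, even for negative offsets
    counts = Counter(offset % 3 for offset in psite_offsets)
    return [counts[frame] for frame in range(3)]


# Reads from a strongly periodic library pile up in position 0
offsets = [0, 3, 6, 9, 12, 15, 1, 18, 21, 5]
print(frame_counts(offsets))  # → [8, 1, 1]
```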
In a good dataset, 70-90% of the reads mapping to a codon fall within the first of its three positions. This makes it possible to deduce the reading frame of a translated region when the frame is not known a priori. See [IGNW09] for more details.
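To deduce a reading frame, one can ask which of the three positions collects most of the reads in a region. The following is a minimal sketch with a hypothetical function name and an arbitrary threshold (`min_fraction` is an assumption, not a plastid parameter); positions are P-site coordinates measured from the 5' end of the region being examined.

```python
from collections import Counter


def infer_frame(read_positions, min_fraction=0.5):
    """Return the dominant frame (0, 1 or 2) of reads in a region, or None.

    read_positions: P-site coordinates of reads, in nucleotides from the
    5' end of the region (assumed input format).
    min_fraction: hypothetical minimum share of reads required to call a frame.
    """
    counts = Counter(pos % 3 for pos in read_positions)
    total = sum(counts.values())
    if total == 0:
        return None
    # Frame holding the most reads
    frame, best = max(counts.items(), key=lambda item: item[1])
    return frame if best / total >= min_fraction else None


print(infer_frame([2, 5, 8, 11, 14, 0, 4]))  # → 2
print(infer_frame([0, 1, 2]))                # → None
```

With data of the quality described above, the dominant frame stands out clearly, while an evenly spread distribution yields no call.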
Output files
- OUTBASE_phasing.txt
Read phasing for each read length
- OUTBASE_phasing.svg
Plot of phasing by read length
where OUTBASE is supplied by the user.
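The exact layout of OUTBASE_phasing.txt is not described here. The sketch below only illustrates what a per-read-length phasing table could contain. The input format, function names, and column names are all assumptions and do not describe the file that phase_by_size actually writes.

```python
from collections import Counter, defaultdict


def phasing_by_length(reads):
    """Fraction of reads in each codon position, per read length.

    reads: iterable of (read_length, psite_offset) pairs, where psite_offset
    is measured from the start codon (hypothetical input format).
    """
    counts = defaultdict(Counter)
    for length, offset in reads:
        counts[length][offset % 3] += 1
    table = {}
    for length in sorted(counts):
        total = sum(counts[length].values())
        table[length] = [counts[length][frame] / total for frame in range(3)]
    return table


def format_table(table):
    """Render the table as tab-delimited text (column names are illustrative)."""
    lines = ["read_length\tphase0\tphase1\tphase2"]
    for length, fractions in table.items():
        lines.append("\t".join([str(length)] + [f"{f:.2f}" for f in fractions]))
    return "\n".join(lines)


reads = [(28, 0), (28, 3), (28, 1), (28, 6), (29, 2), (29, 5)]
print(format_table(phasing_by_length(reads)))
```

Here the 28-nt reads are mostly in position 0 (0.75), while the 29-nt reads all fall in position 2. Stratifying by length exposes this kind of difference.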
Note
To avoid double-counting of codons, users should either use an ROI file made by the `generate` subprogram of the `metagene` script, or supply an annotation file that includes only one transcript isoform per gene.
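Double-counting arises because isoforms of one gene often share a CDS, so the same codons would otherwise be counted once per isoform. One way to build a one-isoform-per-gene annotation is to keep the transcript with the longest CDS. This heuristic is swapped in for illustration and does not describe how `metagene` `generate` handles isoforms. The dictionary keys below are hypothetical.

```python
def one_isoform_per_gene(transcripts):
    """Keep the transcript with the longest CDS for each gene.

    transcripts: iterable of dicts with hypothetical keys
    "gene_id", "transcript_id" and "cds_length".
    """
    best = {}
    for tx in transcripts:
        gene = tx["gene_id"]
        # Replace the stored isoform only if this one has a longer CDS
        if gene not in best or tx["cds_length"] > best[gene]["cds_length"]:
            best[gene] = tx
    return list(best.values())


transcripts = [
    {"gene_id": "g1", "transcript_id": "t1", "cds_length": 900},
    {"gene_id": "g1", "transcript_id": "t2", "cds_length": 1200},
    {"gene_id": "g2", "transcript_id": "t3", "cds_length": 450},
]
print([tx["transcript_id"] for tx in one_isoform_per_gene(transcripts)])  # → ['t2', 't3']
```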
- plastid.bin.phase_by_size.main(argv=sys.argv[1:])
Command-line program
- Parameters
- argv (list, optional)
A list of command-line arguments, which will be processed as if the script were called from the command line if `main()` is called directly.
Default: `sys.argv[1:]`, i.e. the command-line arguments, if the script is invoked from the command line.
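The behaviour described for `argv` is a common Python pattern: an explicit list is processed as if it came from the command line, and otherwise `sys.argv[1:]` is used. The following is a minimal sketch using `argparse`. The positional argument names are hypothetical and are not phase_by_size's real options.

```python
import argparse
import sys


def main(argv=None):
    """Parse arguments as if the script were run from the command line.

    The argument names below are hypothetical, for illustration only.
    """
    if argv is None:
        # Read sys.argv when main() is called, not when it is defined
        argv = sys.argv[1:]
    parser = argparse.ArgumentParser(description="Toy command-line program")
    parser.add_argument("infile")
    parser.add_argument("outbase")
    return parser.parse_args(argv)


# Calling main() directly with an explicit argument list
args = main(["my_rois.txt", "my_outbase"])
print(args.outbase)  # → my_outbase
```

Using `None` as the default is a deliberate choice. A default written literally as `argv=sys.argv[1:]` is evaluated only once, when the `def` statement runs, so later changes to `sys.argv` would not be seen.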
- plastid.bin.phase_by_size.roi_row_to_cds(row)
Helper function to extract coding portions from maximal spanning windows flanking CDS starts that are created by the `generate` subprogram of `metagene`.
- Parameters
- row (int, Series)
Row from a `pandas.DataFrame` of an ROI file made by the `metagene` `generate` subprogram
- Returns
SegmentChain
Coding portion of the maximal spanning window
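To make the idea concrete, the following sketch trims a window to the part at and downstream of the start codon. It is self-contained and does not reproduce plastid. The real function returns a `SegmentChain`, whereas the sketch uses a plain list of genomic coordinates. The record fields `"positions"` and `"zero_point"` are hypothetical and are not the actual ROI file columns. The `row` argument mimics an `(index, record)` pair. Indexing the record works the same way for a dict or a `pandas.Series`.

```python
def row_to_coding_positions(row):
    """Return the coding portion of a window flanking a CDS start.

    row: an (index, record) pair. The record is assumed to hold
    "positions", a list of genomic coordinates covering the window in
    5'->3' transcript order, and "zero_point", the index of the first
    nucleotide of the start codon within that list (hypothetical names).
    """
    _, record = row
    positions = record["positions"]
    zero_point = record["zero_point"]
    # Positions at or downstream of the start codon lie within the CDS,
    # assuming the window does not extend past the stop codon
    return positions[zero_point:]


# A minus-strand-style window: coordinates decrease 5'->3'
row = (0, {"positions": [110, 109, 108, 107, 106, 105], "zero_point": 2})
print(row_to_coding_positions(row))  # → [108, 107, 106, 105]
```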