plastid.bin.psite module
This script estimates P-site offsets, stratified by read length, in a ribosome profiling dataset. To do so, read alignments are mapped to their fiveprime ends, and a metagene profile surrounding the start codon is calculated separately for each read length.
The start codon peak for each read length is heuristically identified as the largest peak upstream of the start codon, or within a region defined by the user. The distance between that peak and the start codon itself is taken to be the P-site offset for that read length.
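The peak-finding heuristic described above can be sketched in a few lines. This is a minimal illustration, not the script's actual implementation: the function name `estimate_offset`, the `search_start` parameter, and the toy profiles are all assumptions made for the example, and plain Python lists stand in for the real metagene profiles.

```python
def estimate_offset(profile, start_codon_index, search_start=0):
    """Distance from the largest upstream peak to the start codon.

    profile           -- counts per position for ONE read length,
                         with reads mapped to their 5' ends (toy data here)
    start_codon_index -- index in `profile` of the start codon
    search_start      -- hypothetical lower bound of the search region
    """
    # Only positions strictly upstream of the start codon are searched.
    region = profile[search_start:start_codon_index]
    if not region:
        raise ValueError("empty search region")
    # Index of the largest value; ties resolve to the most upstream position.
    peak_index = search_start + max(range(len(region)), key=region.__getitem__)
    return start_codon_index - peak_index


# Made-up metagene profiles keyed by read length; start codon at index 15.
profiles = {
    28: [0, 1, 0, 9, 1] + [0] * 10 + [2, 1],
    29: [0, 1, 8, 1, 0] + [0] * 10 + [3, 1],
}
start = 15
offsets = {length: estimate_offset(p, start) for length, p in profiles.items()}
```

In this toy data the 28-nucleotide reads peak 12 positions upstream of the start codon and the 29-nucleotide reads 13 positions upstream, so each read length receives its own offset.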
Notes
- Generate an ROI file first
This script requires an ROI file of maximal spanning windows surrounding each gene’s start codon. This file can be generated by the generate subprogram of the metagene script.
- Check the data
Users should examine the graphical output to make sure the P-site estimates are reasonable: if clear start codon peaks are not present in the data, the algorithm described above cannot estimate offsets reliably.
- For RNase I only
This algorithm presumes that the RNase used to prepare the ribosome-protected footprints has no appreciable cutting bias, so that footprints may be clearly resolved to the edge of the ribosome.
Output files
- OUTBASE_p_offsets.txt
Tab-delimited text file with two columns. The first is read length, and the second is the offset from the fiveprime end of that read length to the ribosomal P-site. This table can be supplied as the argument for --offset when using --fiveprime_variable mapping in any of the other scripts in plastid.bin.
- OUTBASE_p_offsets.[svg | png | pdf | etc.]
Plot of metagene profiles for each read length, when reads are mapped to their 5’ ends and P-site offsets are applied.
- OUTBASE_metagene_profiles.txt
Metagene profiles, stratified by read length, before P-site offsets are applied.
- OUTBASE_K_rawcounts.txt
Saved if --keep is given on the command line. Raw count vectors for each metagene window specified in the input ROI file, without P-site mapping rules applied, for reads of length K.
- OUTBASE_K_normcounts.txt
Saved if --keep is given on the command line. Normalized count vectors for each metagene window specified in the input ROI file, without P-site mapping rules applied, for reads of length K.
where OUTBASE is supplied by the user.
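As an illustration of the two-column layout of OUTBASE_p_offsets.txt, the sketch below reads such a table into a dictionary. The helper name `read_offsets`, the header row, and the example values are assumptions made for this example. The real file may contain additional comment or header lines, which the parser simply skips.

```python
import io


def read_offsets(handle):
    """Parse a tab-delimited (read length, offset) table into a dict."""
    offsets = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        try:
            offsets[int(fields[0])] = int(fields[1])
        except ValueError:
            continue  # skip header rows or other non-numeric rows
    return offsets


# Hypothetical file contents; a real run would use open("OUTBASE_p_offsets.txt").
example = io.StringIO("length\tp_offset\n28\t12\n29\t13\n")
offsets = read_offsets(example)
```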
- plastid.bin.psite.do_count(roi_table, ga, norm_start, norm_end, min_counts, min_len, max_len, aggregate=False, printer=NullWriter())
Calculate a metagene profile for each read length in the dataset
- Parameters
- roi_table : pandas.DataFrame
Table specifying regions of interest, generated by plastid.bin.metagene.do_generate()
- ga : BAMGenomeArray
Count data
- norm_start : int
Coordinate in window specifying normalization region start
- norm_end : int
Coordinate in window specifying normalization region end
- min_counts : float
Minimum number of counts in window[norm_start:norm_end] required for inclusion in metagene profile
- min_len : int
Minimum read length to include
- max_len : int
Maximum read length to include
- aggregate : bool, optional
Estimate P-site from aggregate reads at each position, instead of median normalized read density. Potentially noisier, but helpful for lower-count data or read lengths with few counts. (Default: False)
- printer : file-like, optional
Filehandle to write logging info to (Default: NullWriter())
- Returns
- dict
Dictionary of numpy.ndarrays of raw counts at each position (column) for each window (row)
- dict
Dictionary of numpy.ndarrays of normalized counts at each position (column) for each window (row), normalized by the total number of counts in that row from norm_start to norm_end
- pandas.DataFrame
Metagene profile of median normalized counts at each position across all windows, and the number of windows included in the calculation of each median, stratified by read length
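The normalization and median steps described by these parameters and return values can be sketched for a single read length as follows. This is a simplified illustration under stated assumptions, not the code of do_count(): the function name `metagene_profile` is hypothetical, plain lists replace numpy arrays, and a single window count is returned rather than a per-position count.

```python
from statistics import median


def metagene_profile(raw_rows, norm_start, norm_end, min_counts):
    """Median of row-normalized counts at each position.

    raw_rows -- one list of raw counts per window (row), one entry per position
    Returns (profile, number_of_windows_included).
    """
    normalized = []
    for row in raw_rows:
        # Total counts inside the normalization region of this window.
        total = sum(row[norm_start:norm_end])
        if total < min_counts or total == 0:
            continue  # too few counts: window excluded from the profile
        normalized.append([count / total for count in row])
    if not normalized:
        return [], 0
    # zip(*rows) iterates over columns, i.e. positions across windows.
    profile = [median(column) for column in zip(*normalized)]
    return profile, len(normalized)


# Toy counts for three windows; the third has no counts in the
# normalization region and is dropped.
rows = [[2, 2, 4, 0], [1, 3, 0, 4], [0, 0, 1, 0]]
profile, n_windows = metagene_profile(rows, norm_start=0, norm_end=2, min_counts=4)
```

Taking the median across windows, rather than the sum, keeps a single highly expressed gene from dominating the profile, which is the trade-off the aggregate option relaxes for low-count data.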
- plastid.bin.psite.main(argv=sys.argv[1:])
Command-line program
- Parameters
- argv : list, optional
A list of command-line arguments, which will be processed as if the script were called from the command line if main() is called directly. Default: `sys.argv[1:]`, the command-line arguments, if the script is invoked from the command line.