plastid.bin.psite module

This script estimates P-site offsets, stratified by read length, in a ribosome profiling dataset. To do so, read alignments are mapped to their fiveprime ends, and a metagene profile surrounding the start codon is calculated separately for each read length.

The start codon peak for each read length is heuristically identified as the largest peak upstream of the start codon, or within a region defined by the user. The distance between that peak and the start codon itself is taken to be the P-site offset for that read length.
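The peak-picking heuristic described above can be sketched as follows. This is a minimal illustration and not the script's actual implementation: the function name `estimate_p_offset`, the `region` argument, and the toy profiles are all hypothetical. Positions are assumed to be given relative to the start codon, which sits at position 0.

```python
def estimate_p_offset(x, profile, region=None):
    """Return the distance from the largest peak to the start codon.

    x       : positions relative to the start codon (negative = upstream)
    profile : metagene values for one read length, same length as x
    region  : optional (lo, hi) bounds, inclusive, replacing the default
              search over all upstream positions (hypothetical argument)
    """
    if region is None:
        candidates = [i for i, pos in enumerate(x) if pos < 0]
    else:
        lo, hi = region
        candidates = [i for i, pos in enumerate(x) if lo <= pos <= hi]
    if not candidates:
        raise ValueError("search region contains no positions")
    # Index of the tallest point inside the search region
    peak = max(candidates, key=lambda i: profile[i])
    # The start codon is at 0, so the offset is the distance back to the peak
    return -x[peak]


# Toy profiles (made-up numbers): one upstream peak per read length, plus a
# taller downstream spike that the default upstream search must ignore.
x = list(range(-20, 10))

def toy_profile(peak_pos):
    return [9.0 if pos == 3 else 5.0 if pos == peak_pos else 1.0 for pos in x]

profiles = {28: toy_profile(-12), 30: toy_profile(-13)}
offsets = {length: estimate_p_offset(x, prof) for length, prof in profiles.items()}
```

Running the same function once per read length gives one offset per length, which mirrors the stratified estimate the script reports.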

Notes

Generate an ROI file first: This script requires an ROI file of maximal spanning windows surrounding each gene’s start codon. This file can be generated by the generate subprogram of the metagene script.
Check the data: Users should examine the graphical output to confirm that the P-site estimates are reasonable. If the data lack clear start codon peaks, the heuristic described above cannot reliably identify the offsets.
For RNase I only: This algorithm presumes that the RNase used to prepare the ribosome-protected footprints has no appreciable cutting bias, so that footprints may be clearly resolved to the edge of the ribosome.

Output files

OUTBASE_p_offsets.txt
Tab-delimited text file with two columns. The first is read length, and the second is the offset from the fiveprime end of reads of that length to the ribosomal P-site. This table can be supplied as the argument for --offset when using --fiveprime_variable mapping in any of the other scripts in plastid.bin.

OUTBASE_p_offsets.[svg | png | pdf | etc.]
Plot of metagene profiles for each read length, with reads mapped to their 5’ ends and P-site offsets applied.

OUTBASE_metagene_profiles.txt
Metagene profiles, stratified by read length, before P-site offsets are applied.

OUTBASE_K_rawcounts.txt
Saved if --keep is given on the command line. Raw count vectors for each metagene window specified in the input ROI file, without P-site mapping rules applied, for reads of length K.

OUTBASE_K_normcounts.txt
Saved if --keep is given on the command line. Normalized count vectors for each metagene window specified in the input ROI file, without P-site mapping rules applied, for reads of length K.

where OUTBASE is supplied by the user.
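Because OUTBASE_p_offsets.txt is a two-column tab-delimited table, it is straightforward to load for inspection. The sketch below is an assumption-laden illustration: the helper `read_offsets` is hypothetical, the sample contents are invented, and the handling of comment lines, header rows, and non-numeric keys is a guess about what such a file might contain, not a description of the real format.

```python
import csv
import io


def read_offsets(handle):
    """Parse a two-column, tab-delimited table into {read_length: offset}."""
    offsets = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Assumption: blank lines and '#' comment lines carry no data
        if not row or row[0].startswith("#") or len(row) < 2:
            continue
        try:
            offsets[int(row[0].strip())] = int(row[1].strip())
        except ValueError:
            # Assumption: header rows or non-numeric keys are skipped
            continue
    return offsets


# Invented sample contents, for illustration only
sample = io.StringIO("length\tp_offset\n28\t12\n29\t12\n30\t13\n")
offsets = read_offsets(sample)
```

With a real file, the `io.StringIO` object would be replaced by an open file handle, for example `open("OUTBASE_p_offsets.txt")` with the user's own OUTBASE.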

plastid.bin.psite.do_count(roi_table, ga, norm_start, norm_end, min_counts, min_len, max_len, aggregate=False, printer=NullWriter())

Calculate a metagene profile for each read length in the dataset

Parameters

roi_table (pandas.DataFrame): Table specifying regions of interest, generated by plastid.bin.metagene.do_generate()
ga (BAMGenomeArray): Count data
norm_start (int): Coordinate in window specifying normalization region start
norm_end (int): Coordinate in window specifying normalization region end
min_counts (float): Minimum number of counts in window[norm_start:norm_end] required for inclusion in metagene profile
min_len (int): Minimum read length to include
max_len (int): Maximum read length to include
aggregate (bool, optional): Estimate P-site from aggregate reads at each position, instead of median normalized read density. Potentially noisier, but helpful for lower-count data or read lengths with few counts. (Default: False)
printer (file-like, optional): Filehandle to write logging info to (Default: NullWriter())

Returns

dict: Dictionary of numpy.ndarrays of raw counts at each position (column) for each window (row)
dict: Dictionary of numpy.ndarrays of normalized counts at each position (column) for each window (row), normalized by the total number of counts in that row from norm_start to norm_end
pandas.DataFrame: Metagene profile of median normalized counts at each position across all windows, and the number of windows included in the calculation of each median, stratified by read length
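The normalization and median steps described in the parameters and return values can be illustrated for a single read length. This is a simplified pure-Python sketch, not the code of do_count(): the function `metagene_profile` is hypothetical, it assumes the inclusion test is `total >= min_counts`, and it reports one window count for the whole profile, not one per position.

```python
from statistics import median


def metagene_profile(raw_rows, norm_start, norm_end, min_counts):
    """Return (median profile, number of windows used) for one read length.

    raw_rows : list of equal-length count vectors, one per window
    """
    norm_rows = []
    for row in raw_rows:
        # Total counts inside the normalization region of this window
        total = sum(row[norm_start:norm_end])
        # Assumption: windows below min_counts (or with zero counts) are dropped
        if total < min_counts or total == 0:
            continue
        norm_rows.append([value / total for value in row])
    if not norm_rows:
        return [], 0
    # Column-wise median across the windows that passed the filter
    profile = [median(column) for column in zip(*norm_rows)]
    return profile, len(norm_rows)


# Made-up counts: the third window has too few counts in [1:3] and is excluded
raw = [
    [2, 4, 4, 2],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
]
profile, n_windows = metagene_profile(raw, norm_start=1, norm_end=3, min_counts=2)
```

Normalizing each window before taking the median keeps highly expressed genes from dominating the profile. The aggregate option trades that robustness for signal in low-count data.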

plastid.bin.psite.main(argv=sys.argv[1:])

Command-line program

Parameters

argv (list, optional): A list of command-line arguments, which will be processed as if the script were called from the command line if main() is called directly. Default: `sys.argv[1:]`, the command-line arguments if the script is invoked from the command line.