plastid.bin.psite module

This script estimates P-site offsets, stratified by read length, in a ribosome profiling dataset. To do so, read alignments are mapped to their fiveprime ends, and a metagene profile surrounding the start codon is calculated separately for each read length.
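The per-length binning described above can be sketched in Python. This is a minimal illustration, not plastid's implementation. The input format (an iterable of `(5′ position, read length)` pairs on the plus strand) and the function name `fiveprime_counts_by_length` are hypothetical.

```python
from collections import defaultdict

import numpy as np


def fiveprime_counts_by_length(reads, window_start, window_length, min_len, max_len):
    """Count read 5' ends at each window position, one vector per read length.

    reads : iterable of (fiveprime_position, read_length) tuples, in the same
        coordinate system as window_start (hypothetical input format; plus
        strand only, for simplicity)
    """
    counts = defaultdict(lambda: np.zeros(window_length))
    for fiveprime, length in reads:
        # skip read lengths outside the requested range
        if not (min_len <= length <= max_len):
            continue
        idx = fiveprime - window_start
        # only count 5' ends that fall inside the window
        if 0 <= idx < window_length:
            counts[length][idx] += 1
    return dict(counts)
```

Summing (or taking medians of) these vectors across many start-codon windows yields one metagene profile per read length.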

The start codon peak for each read length is heuristically identified as the largest peak upstream of the start codon or, if the user defines a search region, the largest peak within that region. The distance between that peak and the start codon itself is taken to be the P-site offset for that read length.
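A minimal sketch of the peak heuristic, assuming the profile is a 1-D array indexed by window position with the start codon at index `landmark`. Here "largest peak" is simplified to the single highest position. The function name and the half-open `search_start`/`search_end` region (in window indices) are hypothetical stand-ins for the user-defined region, not plastid's own parameters.

```python
import numpy as np


def estimate_p_offset(profile, landmark, search_start=None, search_end=None):
    """Return the distance from the highest position in the search region to the start codon.

    profile  : 1-D array of metagene values (read 5' ends) at each window position
    landmark : index of the start codon within the window
    search_start, search_end : hypothetical half-open search region in window
        indices; by default, every position upstream of the landmark
    """
    if search_start is None:
        search_start = 0
    if search_end is None:
        search_end = landmark
    region = np.asarray(profile, dtype=float)[search_start:search_end]
    # nanargmax ignores NaN positions (e.g. positions no window covered)
    peak = search_start + int(np.nanargmax(region))
    return landmark - peak
```

A read length whose 5′ ends pile up 12 nt upstream of the start codon would therefore receive an offset of 12.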

Notes

Generate an ROI file first

This script requires an ROI file of maximal spanning windows surrounding each gene’s start codon. This file can be generated by the generate subprogram of the metagene script.

Check the data

Users should examine the graphical output to confirm that the P-site estimates are reasonable. If the data lack clear start codon peaks, the algorithm described above may report unreliable offsets.

For RNase I only

This algorithm presumes that the RNase used to prepare the ribosome-protected footprints has no appreciable cutting bias, so that footprints may be clearly resolved to the edge of the ribosome.

Output files

OUTBASE_p_offsets.txt

Tab-delimited text file with two columns. The first is read length, and the second is the offset from the fiveprime end of reads of that length to the ribosomal P-site. This table can be supplied as the argument for --offset when using --fiveprime_variable mapping in any of the other scripts in plastid.bin.
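The sketch below shows how such a table could be read and applied to a single read. It is an illustration under assumptions, not plastid's mapping code. The parser skips blank lines, `#` comments, and any row whose columns are not both integers (such as a header), because the exact file layout is assumed rather than documented here. The helper names `read_offsets` and `p_site` are hypothetical.

```python
def read_offsets(lines):
    """Parse a two-column, tab-delimited table of read length -> P-site offset.

    Skipping blank, '#', and non-integer rows is an assumption about the
    file layout, not a documented guarantee.
    """
    offsets = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        try:
            offsets[int(fields[0])] = int(fields[1])
        except ValueError:
            continue  # header or other non-numeric row
    return offsets


def p_site(fiveprime, length, strand, offsets):
    """Return the P-site coordinate of a read, or None if its length has no offset.

    fiveprime is the genomic coordinate of the read's 5' end. On the minus
    strand the 5' end is the rightmost base, so the offset is subtracted.
    """
    if length not in offsets:
        return None
    off = offsets[length]
    return fiveprime + off if strand == "+" else fiveprime - off
```

For example, `with open("OUTBASE_p_offsets.txt") as fh: offsets = read_offsets(fh)` builds the lookup table, after which `p_site` maps each read's 5′ end to its P-site.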

OUTBASE_p_offsets.[svg | png | pdf | etc.]

Plot of metagene profiles for each read length, with reads mapped to their 5’ ends and P-site offsets applied.

OUTBASE_metagene_profiles.txt

Metagene profiles, stratified by read length, before P-site offsets are applied.

OUTBASE_K_rawcounts.txt

Saved if --keep is given on the command line. Raw count vectors for each metagene window specified in the input ROI file, for reads of length K, without P-site mapping rules applied.

OUTBASE_K_normcounts.txt

Saved if --keep is given on the command line. Normalized count vectors for each metagene window specified in the input ROI file, for reads of length K, without P-site mapping rules applied.

where OUTBASE is supplied by the user.

plastid.bin.psite.do_count(roi_table, ga, norm_start, norm_end, min_counts, min_len, max_len, aggregate=False, printer=NullWriter())

Calculate a metagene profile for each read length in the dataset.

Parameters
roi_table : pandas.DataFrame

Table specifying regions of interest, generated by plastid.bin.metagene.do_generate()

ga : BAMGenomeArray

Count data

norm_start : int

Coordinate in window specifying the start of the normalization region

norm_end : int

Coordinate in window specifying the end of the normalization region

min_counts : float

Minimum number of counts in window[norm_start:norm_end] required for inclusion in the metagene profile

min_len : int

Minimum read length to include

max_len : int

Maximum read length to include

aggregate : bool, optional

Estimate P-site offsets from aggregate reads at each position, instead of from median normalized read density. Potentially noisier, but helpful for low-count data or read lengths with few counts. (Default: False)

printer : file-like, optional

Filehandle to write logging info to (Default: NullWriter())

Returns
dict

Dictionary of numpy.ndarray objects holding raw counts at each position (column) for each window (row)

dict

Dictionary of numpy.ndarray objects holding normalized counts at each position (column) for each window (row), normalized by the total number of counts in that row from norm_start to norm_end

pandas.DataFrame

Metagene profile of median normalized counts at each position across all windows, and the number of windows included in the calculation of each median, stratified by read length
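The normalization and median steps described above can be sketched with NumPy for the count matrix of a single read length. This is not plastid's code. Marking positions that a window does not cover with NaN is an assumption, and the `aggregate` branch simply sums raw counts across windows, which is one plausible reading of "aggregate reads". The function name `metagene_from_counts` is hypothetical.

```python
import numpy as np


def metagene_from_counts(counts, norm_start, norm_end, min_counts, aggregate=False):
    """Build a metagene profile from a windows x positions count matrix.

    counts : 2-D array, one row per ROI window and one column per window
        position (NaN marks positions a window does not cover; an assumption)
    Returns (normalized rows, profile, number of windows contributing per position).
    """
    counts = np.asarray(counts, dtype=float)
    # total counts in each row's normalization region
    denom = np.nansum(counts[:, norm_start:norm_end], axis=1)
    # keep rows meeting the count threshold; also drop zero rows to avoid 0/0
    keep = (denom >= min_counts) & (denom > 0)
    kept = counts[keep]
    norm = kept / denom[keep][:, None]
    # number of windows with data at each position
    n_windows = np.sum(~np.isnan(norm), axis=0)
    if aggregate:
        profile = np.nansum(kept, axis=0)   # summed raw counts
    else:
        profile = np.nanmedian(norm, axis=0)  # median normalized density
    return norm, profile, n_windows
```

Calling this once per read length between min_len and max_len yields the per-length profiles from which offsets are estimated. The median damps the influence of a few very highly expressed genes, whereas summing keeps more signal when counts are sparse.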

plastid.bin.psite.main(argv=sys.argv[1:])

Command-line program

Parameters
argv : list, optional

A list of command-line arguments, which will be processed as if the script were called from the command line when main() is called directly.

Default: `sys.argv[1:]`, the command-line arguments, if the script is invoked from the command line.