plastid 0.4.7


plastid is a Python library for genomics and sequencing. It includes tools for exploratory data analysis (EDA) as well as a handful of scripts that implement common tasks.

plastid differs from other packages in its design goals:

  • Its intended audience includes both bench and computational biologists. We tried to make it easy to use and wrote many tutorials.

  • It is designed for analyses in which data at each position within a gene or transcript are of interest, such as analysis of ribosome profiling data. To this end, plastid:

    • uses read mapping functions to extract the biology of interest from read alignments (for example, the ribosomal P-site in ribosome profiling, or sites of nucleotide modification in DMS-seq) and turn these into quantitative data, usually numpy arrays of counts at each nucleotide position in a transcript.
    • encapsulates multi-segment features, such as spliced transcripts, as single objects. This facilitates many common tasks, such as converting coordinates between genome and feature-centric spaces.
  • It separates data from its representation on disk by providing consistent interfaces to many of the file formats found in the wild.

  • It is designed for expansion to new or unknown assays. Frequently, writing a new mapping rule is sufficient to enable all of plastid’s tools to interpret data coming from a new assay.
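
Two of the ideas in the list above, a mapping rule that reduces each read alignment to a single position and a multi-segment feature treated as one coordinate space, can be sketched in a few lines of plain Python. This is a minimal illustration and not plastid's API: the function names, the 12-nucleotide offset, the half-open exon intervals, and the plus-strand-only handling are all assumptions made for the example, and plain lists stand in for the numpy arrays that plastid usually produces.

```python
# Illustrative sketch only. None of these names come from plastid.
# Assumptions: plus strand, 0-based half-open exon intervals (start, end).


def fiveprime_offset_rule(read_start, read_length, offset=12):
    """Hypothetical mapping rule: map a read to the genomic position
    `offset` nucleotides downstream of its 5' end.

    Returns None when the read is too short to contain that position.
    """
    if offset >= read_length:
        return None
    return read_start + offset


def genome_to_transcript(exons, genomic_pos):
    """Convert a genomic position to a transcript position.

    Returns None if the position falls in an intron or outside the transcript.
    """
    upstream = 0  # transcript length contributed by exons already passed
    for start, end in exons:
        if start <= genomic_pos < end:
            return upstream + (genomic_pos - start)
        upstream += end - start
    return None


def transcript_to_genome(exons, tx_pos):
    """Convert a transcript position back to a genomic position."""
    if tx_pos < 0:
        raise IndexError("transcript position must be non-negative")
    for start, end in exons:
        length = end - start
        if tx_pos < length:
            return start + tx_pos
        tx_pos -= length
    raise IndexError("transcript position is past the end of the transcript")


def count_vector(exons, reads, rule):
    """Apply a mapping rule to each read and tally counts per transcript position."""
    counts = [0] * sum(end - start for start, end in exons)
    for read_start, read_length in reads:
        genomic_pos = rule(read_start, read_length)
        if genomic_pos is None:
            continue  # the rule could not place this read
        tx_pos = genome_to_transcript(exons, genomic_pos)
        if tx_pos is not None:
            counts[tx_pos] += 1
    return counts


# A made-up two-exon transcript and a handful of made-up reads
# given as (genomic start, length).
exons = [(100, 110), (200, 205)]
reads = [(95, 28), (95, 28), (98, 28), (190, 30), (300, 10)]
counts = count_vector(exons, reads, fiveprime_offset_rule)
```

In this sketch, supporting a different assay means passing a different `rule` to `count_vector`; the coordinate conversion and counting code stay the same.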

plastid was written by Joshua Dunn in Jonathan Weissman’s lab at UCSF. Versions of it have been used in several publications ([DFB+13], [FRJ+15]).

Where to go next