Getting started
Genomic analysis requires some setup. This page provides a quick overview of those pieces.
To get started, you need:
For those looking to try plastid out, or to explore sequencing concepts, we have included a Demo dataset, which contains sequence and annotation for the hCMV genome, along with ribosome profiling and RNA-seq datasets.
For those setting up their own data, please continue reading:
A genome sequence & annotation
The starting point for most genomics research is to obtain a genome sequence and matching genome annotation. Good sources for these include:
It is critical that the genome sequence and feature annotations use the same coordinates, so be sure to download corresponding versions from a single build (e.g. it is unhelpful to mix the mouse mm9 genome sequence with the mm10 annotation).
Often it is useful to do some pre-processing of files once they have been downloaded. Detailed discussion is provided in Setting up a genome for analysis.
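A quick consistency check can catch some mismatches between a sequence and an annotation before any analysis is run. The sketch below is not part of plastid; the function names `fasta_lengths` and `inconsistent_features` are hypothetical, and it assumes an uncompressed FASTA file and a tab-delimited GTF/GFF annotation (1-based, end-inclusive coordinates). It reports features that name an unknown sequence or extend past the end of one. Passing this check does not prove that both files come from the same build; it only flags the more obvious disagreements.

```python
import io


def fasta_lengths(handle):
    """Map each sequence name (first word of its FASTA header) to its length."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else None
            if name is not None:
                lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def inconsistent_features(fasta_handle, gff_handle):
    """Yield (line_number, reason) for GTF/GFF records that cannot fit the FASTA."""
    lengths = fasta_lengths(fasta_handle)
    for number, line in enumerate(gff_handle, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a feature record
        chrom, end = fields[0], int(fields[4])
        if chrom not in lengths:
            yield number, f"unknown sequence {chrom!r}"
        elif end > lengths[chrom]:
            yield number, f"end {end} exceeds length {lengths[chrom]} of {chrom!r}"


# Toy data standing in for real files (illustrative only).
demo_fasta = io.StringIO(">chr1 toy sequence\nACGTACGT\nACGT\n")
demo_gtf = io.StringIO(
    "chr1\tdemo\texon\t1\t12\t.\t+\t.\tgene_id \"a\";\n"
    "chr1\tdemo\texon\t5\t20\t.\t+\t.\tgene_id \"b\";\n"
    "chr2\tdemo\texon\t1\t4\t.\t+\t.\tgene_id \"c\";\n"
)
for number, reason in inconsistent_features(demo_fasta, demo_gtf):
    print(f"line {number}: {reason}")
```

With real data, open the two files with `open(...)` and pass the file handles in place of the `io.StringIO` objects.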
Aligned sequence data
The starting point for analysis with plastid is aligned sequence data, preferably in BAM format.
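When alignments arrive from a collaborator or an older pipeline, it is not always obvious whether a file is really BAM or, say, uncompressed SAM with the wrong extension. The sketch below uses only the Python standard library and relies on two properties of the format: BAM files are BGZF-compressed, which the `gzip` module can read, and the decompressed stream begins with the four magic bytes `BAM\1`. The function name `looks_like_bam` is hypothetical and is not part of plastid. It is a quick sanity check, not a validator.

```python
import gzip

BAM_MAGIC = b"BAM\x01"  # first four bytes of a decompressed BAM stream


def looks_like_bam(path):
    """Return True if `path` is gzip/BGZF-compressed and starts with the BAM magic.

    Returns False for files that are missing, unreadable, not gzip-compressed,
    or truncated before the magic bytes.
    """
    try:
        with gzip.open(path, "rb") as handle:
            return handle.read(4) == BAM_MAGIC
    except (OSError, EOFError):
        return False
```

For example, `looks_like_bam("alignments.bam")` (a hypothetical filename) returns `False` for a plain-text SAM file, which should then be converted by your alignment toolchain before use.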
A brief overview of the relevant steps in setting up alignments and exploring data may be found in A simple alignment and quantitation workflow.
That said, the choice of alignment parameters merits careful consideration and is beyond the scope of this tutorial. For a more detailed discussion, see the documentation for the read alignment program you use (e.g. Bowtie, Bowtie 2, TopHat, BWA, STAR).
Other background info
Most of the plastid documentation assumes familiarity with a handful of concepts and conventions. We encourage those new to sequencing analysis to check the Tutorials and Glossary of terms as needed.