Genomic analysis requires some setup. This page gives a quick overview of the pieces involved.
To get started, you need:
For those who want to try plastid out or explore sequencing concepts, we have included a Demo dataset containing the sequence and annotation for the hCMV genome, along with ribosome profiling and RNA-seq datasets.
For those setting up their own data, please continue reading:
The starting point for most genomics research is to obtain a genome sequence and matching genome annotation. Good sources for these include:
It is critical that the genome sequence and feature annotations use the same coordinates, so be sure to download corresponding versions from a single build (e.g. it is unhelpful to mix the mouse mm9 genome sequence with the mm10 annotation).
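As a rough illustration of this point, the following sketch compares sequence names and lengths from a FASTA file against the feature coordinates in a GTF/GFF-style annotation. It is not part of plastid; the function names (`fasta_lengths`, `annotation_extents`, `find_mismatches`) and the toy records are hypothetical, and the only assumptions are the standard FASTA header convention and the tab-delimited GTF/GFF layout (sequence name in column 1, 1-based end coordinate in column 5).

```python
def fasta_lengths(lines):
    """Return {sequence_name: length} from an iterable of FASTA lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The sequence name is the first word after '>'
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def annotation_extents(lines):
    """Return {sequence_name: largest end coordinate} from GTF/GFF lines."""
    extents = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a feature line
        seqname, end = fields[0], int(fields[4])
        extents[seqname] = max(end, extents.get(seqname, 0))
    return extents


def find_mismatches(lengths, extents):
    """List annotated sequences that are missing or too short in the FASTA."""
    problems = []
    for seqname, end in sorted(extents.items()):
        if seqname not in lengths:
            problems.append(f"{seqname}: in annotation but not in sequence")
        elif end > lengths[seqname]:
            problems.append(
                f"{seqname}: feature ends at {end}, "
                f"sequence length is {lengths[seqname]}"
            )
    return problems


# Toy records (hypothetical); real use would pass open file handles instead
fasta = [">chrA demo", "ACGTACGTAC", "ACGTA"]
gtf = [
    'chrA\tdemo\texon\t1\t12\t.\t+\t.\tgene_id "g1";',
    'chrB\tdemo\texon\t5\t20\t.\t+\t.\tgene_id "g2";',
]
lengths = fasta_lengths(fasta)
extents = annotation_extents(gtf)
problems = find_mismatches(lengths, extents)
for problem in problems:
    print(problem)
```

A check like this only catches gross mismatches, such as differing chromosome names or features that run past the end of a sequence. Passing it does not prove that two files come from the same build, so downloading both from a single source remains the safer practice.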
It is often useful to pre-process files once they have been downloaded. A detailed discussion is provided in Setting up a genome for analysis.
The starting point for analysis with plastid is aligned sequence data, preferably in BAM format.
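Before starting an analysis, it can help to confirm that an alignment file really is BAM rather than, say, uncompressed SAM. The sketch below is not a plastid function; `looks_like_bam` is a hypothetical helper that relies only on two facts from the SAM/BAM specification: BAM files are BGZF-compressed (readable as gzip), and the decompressed stream begins with the magic bytes `BAM\1`. The temporary files exist only to demonstrate the check.

```python
import gzip
import os
import tempfile


def looks_like_bam(path):
    """Return True if `path` is gzip-readable and starts with the BAM magic bytes."""
    try:
        with gzip.open(path, "rb") as fh:
            return fh.read(4) == b"BAM\x01"
    except (gzip.BadGzipFile, EOFError):
        # Not gzip/BGZF data at all (e.g. a plain-text SAM file)
        return False


# Demonstration with two small stand-in files (not real alignments)
tmpdir = tempfile.mkdtemp()

fake_bam = os.path.join(tmpdir, "demo.bam")
with gzip.open(fake_bam, "wb") as fh:
    fh.write(b"BAM\x01" + b"\x00" * 8)  # magic bytes plus padding

fake_sam = os.path.join(tmpdir, "demo.sam")
with open(fake_sam, "w") as fh:
    fh.write("@HD\tVN:1.6\tSO:coordinate\n")

bam_ok = looks_like_bam(fake_bam)
sam_ok = looks_like_bam(fake_sam)
print(bam_ok, sam_ok)
```

This only inspects the file header. It says nothing about whether the alignments are sorted or indexed, or whether they were made against the same genome build as the annotation.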
A brief overview of the relevant steps in setting up alignments and exploring data may be found in A simple alignment and quantitation workflow.
That said, the choice of alignment parameters merits careful consideration, and is a weighty topic beyond the scope of this tutorial. For a more detailed discussion, see the documentation for the read alignment program you use (e.g. Bowtie, Bowtie 2, Tophat, bwa, STAR).