Coordinate systems used in genomics
plastid's readers automatically convert coordinates from
any of the supported file formats into a 0-indexed and half-open
space (i.e. following typical Python convention), so users don’t need to worry
about off-by-one errors in their annotations.
Nonetheless, this tutorial describes the coordinate representations used in genomics.
Genomic coordinates are typically specified as a set of:
a chromosome name
a start position
an end position
a chromosome strand:
- ‘+’ for the forward strand
- ‘-’ for the reverse strand
- ‘.’ for both strands / unstranded features
This gives rise to several non-obvious considerations:
In the vast majority of annotation formats, the start coordinate refers to the lowest-numbered (i.e. leftmost, chromosome-wise) coordinate relative to the genome rather than the feature. So, for reverse-strand features, the start coordinate actually denotes the 3’ end of the feature, while the end coordinate denotes the 5’ end.
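The strand convention above can be sketched in a few lines of Python. The helper name `feature_ends` is hypothetical and is not part of plastid. The sketch assumes 0-indexed, half-open coordinates, and it assumes that unstranded (‘.’) features are read left to right like forward-strand ones:

```python
def feature_ends(start, end, strand):
    """Return (five_prime, three_prime) positions of a feature.

    Assumes 0-indexed, half-open coordinates with start < end.
    Hypothetical helper for illustration; not a plastid function.
    """
    if start >= end:
        raise ValueError("start must be less than end")
    # Leftmost and rightmost positions actually covered by the feature
    first, last = start, end - 1
    if strand == "-":
        # On the reverse strand the 5' end is the rightmost position
        return last, first
    # Assumption: '+' and '.' are both read left to right
    return first, last


print(feature_ends(100, 200, "+"))  # (100, 199)
print(feature_ends(100, 200, "-"))  # (199, 100)
```

Note that in a half-open system the 5’ end of a reverse-strand feature is `end - 1`, not `end`, because `end` itself lies outside the feature.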
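Positions can be counted from 0 (0-indexed) or from 1 (1-indexed). For example, consider an XbaI restriction site on chromosome ChrI:

                       XbaI
                      ______
ChrI:      ACCGATGCTAGCTCTAGACTACATCTACTCCGTCGTCTAGCATGATGCTAGCTGAC
           |         |^^^^^^   |         |         |
0-index:   0         10        20        30        40
1-index:   1         11        21        31        41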
In the context of genomics, both 0-indexed and 1-indexed
systems are used, depending upon file format.
plastid knows which file
formats use which representation, and automatically converts all coordinates
to a 0-indexed representation, following Python idioms.
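Similarly, coordinate systems can represent end coordinates in two ways: fully-closed or half-open.
In a fully-closed (end-inclusive) coordinate system, the end coordinate is the last position included in the feature. In a 0-indexed, fully-closed representation, the XbaI site starts at position 11 and ends at position 16:

                       XbaI
                      ______
ChrI:      ACCGATGCTAGCTCTAGACTACATCTACTCCGTCGTCTAGCATGATGCTAGCTGAC
           |          ^^^^^^   |         |         |
0-index:   0          |    |   20        30        40
                      |    |
Start & end:          11   16

And the length of the feature equals:
\[\ell = end - start + 1 = 16 - 11 + 1 = 6\]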
In contrast, in a half-open coordinate system, the end coordinate is defined as the first position NOT included in the feature. In a 0-indexed, half-open representation, the XbaI site starts at position 11, and ends at position 17. In this case, the length of the feature equals:
\[\ell = end - start = 17 - 11 = 6\]
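The two length formulas can be checked with a short Python sketch. It uses only the start and end values quoted above; the variable names are illustrative. Python's own `range()` and slices are 0-indexed and half-open, which is why the half-open form needs no `+ 1` correction:

```python
start = 11
end_half_open = 17              # first position NOT in the feature
end_closed = end_half_open - 1  # last position IN the feature

length_half_open = end_half_open - start
length_closed = end_closed - start + 1

# range() follows the same 0-indexed, half-open convention
covered = list(range(start, end_half_open))

print(length_half_open, length_closed)  # 6 6
print(covered)                          # [11, 12, 13, 14, 15, 16]
```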
             Half-open             Fully-closed
0-indexed    start: 11, end: 17    start: 11, end: 16
1-indexed    start: 12, end: 18    start: 12, end: 17
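The four representations differ only by fixed offsets, so converting between them is a matter of adding or subtracting 1 in the right places. The following is a minimal sketch under that observation. The function `convert` and its parameter names are hypothetical and do not reflect how plastid implements its readers:

```python
def convert(start, end, from_index, from_closed, to_index, to_closed):
    """Convert (start, end) between coordinate conventions.

    from_index / to_index: 0 or 1 (0-indexed or 1-indexed).
    from_closed / to_closed: True for fully-closed ends, False for half-open.
    Hypothetical helper for illustration only.
    """
    # Step 1: normalize to 0-indexed, half-open
    start0 = start - from_index
    end0 = end - from_index + (1 if from_closed else 0)
    # Step 2: re-express in the target convention
    new_start = start0 + to_index
    new_end = end0 + to_index - (1 if to_closed else 0)
    return new_start, new_end


# Reproduce the four representations of the XbaI site from (11, 17)
for index in (0, 1):
    for closed in (False, True):
        label = f"{index}-indexed, {'fully-closed' if closed else 'half-open'}"
        print(label, convert(11, 17, 0, False, index, closed))
```

Routing every conversion through one canonical form (here 0-indexed, half-open) keeps the number of cases small: each convention needs only a rule to and from the canonical form.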
In plastid's 0-indexed, half-open representation, the XbaI site is therefore described as:

chromosome/contig: 'ChrI'
start: 11
end: 17
strand: '.'
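To illustrate how two file formats with different conventions end up at the same coordinates, here is a hedged sketch that parses one BED line and one GFF3 line describing the same site. BED is 0-indexed and half-open, while GFF3 is 1-indexed and fully-closed. The `Segment` tuple, the two parser functions, and the example lines (including the `restriction_site` feature type) are made up for illustration. They are not plastid's readers or classes:

```python
from collections import namedtuple

# Hypothetical stand-in for a genomic interval; not plastid's own class
Segment = namedtuple("Segment", ["chrom", "start", "end", "strand"])


def from_bed(line):
    """BED is 0-indexed and half-open, so coordinates pass through unchanged."""
    chrom, start, end, _name, _score, strand = line.rstrip("\n").split("\t")[:6]
    return Segment(chrom, int(start), int(end), strand)


def from_gff3(line):
    """GFF3 is 1-indexed and fully-closed.

    Shifting to 0-indexed subtracts 1 from both coordinates; switching the
    end to half-open adds 1 back, so only the start changes.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, strand = fields[0], fields[6]
    start, end = int(fields[3]), int(fields[4])
    return Segment(chrom, start - 1, end, strand)


# Made-up example lines describing the same 6-nucleotide site
bed_line = "ChrI\t11\t17\tXbaI\t0\t."
gff_line = "ChrI\t.\trestriction_site\t12\t17\t.\t.\t.\tID=XbaI"

print(from_bed(bed_line))
print(from_bed(bed_line) == from_gff3(gff_line))  # True
```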