plastid.genomics.seqtools module
Utilities for mutating and searching nucleic acid sequences.
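Contents
TwoBitSeqRecordAdaptor | Adaptor class that makes a twobitreader.TwoBitFile behave like a dictionary of Bio.SeqRecord.SeqRecord objects.
mutate_seqs | Generate all sequences within mutations distance from a reference sequence
random_seq | Generate a random nucleotide sequence of length size and composition nucleotides
seq_to_regex | Convert a nucleotide sequence of IUPAC nucleotide characters to a regular expression.
IUPAC_TABLE | Dictionary mapping IUPAC nucleotide symbols to tuples of nucleotides they represent (e.g. R -> (A, G))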
- class plastid.genomics.seqtools.TwoBitSeqRecordAdaptor(fh)
Bases: object
Adaptor class that makes a twobitreader.TwoBitFile behave like a dictionary of Bio.SeqRecord.SeqRecord objects.
- plastid.genomics.seqtools.mutate_seqs(seqs, nucleotides='NACTG', mutations=1)
Generate all sequences within mutations substitutions of one or more reference sequences
- Parameters
- seqs : str or list of str
Single reference sequence (a string) or a list of reference sequences
- nucleotides : list of char, optional
Permitted nucleotide substitutions (Default: 'NACTG')
- mutations : int, optional
Maximum number of substitutions to make (Default: 1)
- Returns
- set
All sequences within mutations substitutions of the sequence(s) specified in seqs
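The behaviour described above can be illustrated with a short standard-library sketch. It is not plastid's implementation: the name mutate_seqs_sketch is hypothetical, and the sketch assumes that the reference sequences themselves (zero substitutions) belong in the result.

```python
def mutate_seqs_sketch(seqs, nucleotides="NACTG", mutations=1):
    """Hypothetical stand-in for mutate_seqs, for illustration only."""
    if isinstance(seqs, str):
        seqs = [seqs]

    found = set(seqs)      # assumption: references count as distance 0
    frontier = set(seqs)   # sequences produced by the previous round

    for _ in range(mutations):
        next_frontier = set()
        for seq in frontier:
            for i in range(len(seq)):
                for nuc in nucleotides:
                    # substitute a single position, keeping the length fixed
                    next_frontier.add(seq[:i] + nuc + seq[i + 1:])
        found |= next_frontier
        frontier = next_frontier

    return found
```

Each round substitutes one more position, so after mutations rounds the set holds every sequence reachable with at most that many substitutions. With the default nucleotides, 'N' is treated as just another permitted substitution.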
- plastid.genomics.seqtools.random_seq(size, nucleotides='ACTG')
Generate a random nucleotide sequence of length size and composition nucleotides
- Parameters
- size : int
Length of desired sequence
- nucleotides : str, optional
String of nucleotides to use in the sequence, in the desired base composition. Characters need not be unique: for example, 'AATCG' increases the 'A' bias. (Default: 'ACTG')
- Returns
- str
Randomized DNA sequence
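A minimal sketch of how repeated characters in nucleotides bias the composition, using only the standard library. The name random_seq_sketch and its rng parameter are hypothetical and are not part of plastid.

```python
import random


def random_seq_sketch(size, nucleotides="ACTG", rng=random):
    """Hypothetical stand-in for random_seq, for illustration only."""
    # Each position is drawn uniformly from the characters of `nucleotides`,
    # so a repeated character (e.g. the two A's in 'AATCG') is drawn more often.
    return "".join(rng.choice(nucleotides) for _ in range(size))


# Passing a seeded generator makes the draw reproducible
seeded = random.Random(42)
biased = random_seq_sketch(30, "AATCG", rng=seeded)
```

With 'AATCG', each position has a 2-in-5 chance of being 'A' and a 1-in-5 chance of being each other base.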
- plastid.genomics.seqtools.seq_to_regex(inp, flags=0)
Convert a nucleotide sequence of IUPAC nucleotide characters to a regular expression. Ambiguous IUPAC characters are converted to character classes (e.g. 'Y' to '[CTU]'), and T and U are considered equivalent.
- Parameters
- inp : str
Nucleotide sequence using IUPAC nucleotide codes
- flags : int, optional
Flags to pass to re.compile() (Default: 0 / no flags)
- Returns
- re.RegexObject
Regular expression pattern corresponding to the IUPAC sequence in inp
Examples
Convert a sequence to a regex:
>>> seq_to_regex("CARYYA").pattern
'CA[AG][CTU][CTU]A'
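The conversion can be sketched with the standard library alone. This is an illustration under stated assumptions, not plastid's code: seq_to_regex_sketch and IUPAC_SKETCH are hypothetical names, the table is copied from IUPAC_TABLE as listed on this page, and the sketch assumes upper-case input.

```python
import re

# Copied from IUPAC_TABLE as documented on this page
IUPAC_SKETCH = {
    'A': 'A', 'B': ('C', 'G', 'T', 'U'), 'C': 'C', 'D': ('A', 'G', 'T', 'U'),
    'G': 'G', 'H': ('A', 'C', 'T', 'U'), 'K': ('G', 'T', 'U'), 'M': ('A', 'C'),
    'N': ('A', 'C', 'T', 'G', 'U'), 'R': ('A', 'G'), 'S': ('G', 'C'),
    'T': ('T', 'U'), 'U': ('T', 'U'), 'V': ('A', 'C', 'G'),
    'W': ('A', 'T', 'U'), 'Y': ('C', 'T', 'U'),
}


def seq_to_regex_sketch(inp, flags=0):
    """Hypothetical stand-in for seq_to_regex, for illustration only."""
    parts = []
    for ch in inp:
        options = IUPAC_SKETCH[ch]
        if len(options) > 1:
            # ambiguous symbol: emit a character class such as [CTU]
            parts.append("[" + "".join(options) + "]")
        else:
            # unambiguous symbol: emit the nucleotide itself
            parts.append(ch)
    return re.compile("".join(parts), flags)


pattern = seq_to_regex_sketch("CARYYA")
```

Here pattern.pattern is 'CA[AG][CTU][CTU]A', matching the example above, and the compiled pattern matches both DNA and RNA spellings such as "CAGCTA" and "CAGCUA".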
- plastid.genomics.seqtools.IUPAC_TABLE = {'A': 'A', 'B': ('C', 'G', 'T', 'U'), 'C': 'C', 'D': ('A', 'G', 'T', 'U'), 'G': 'G', 'H': ('A', 'C', 'T', 'U'), 'K': ('G', 'T', 'U'), 'M': ('A', 'C'), 'N': ('A', 'C', 'T', 'G', 'U'), 'R': ('A', 'G'), 'S': ('G', 'C'), 'T': ('T', 'U'), 'U': ('T', 'U'), 'V': ('A', 'C', 'G'), 'W': ('A', 'T', 'U'), 'Y': ('C', 'T', 'U')}
Dictionary mapping IUPAC nucleotide symbols to tuples of the nucleotides they represent (e.g. R -> (A, G)). The unambiguous symbols A, C and G map to single-character strings rather than tuples.
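As one possible use of such a table, the sketch below expands an ambiguous sequence into every concrete sequence it can represent. The helper expand_ambiguous and the IUPAC_SUBSET dictionary are hypothetical. The subset copies a few entries from IUPAC_TABLE so that the example stays self-contained.

```python
from itertools import product

# A few entries copied from IUPAC_TABLE as documented above
IUPAC_SUBSET = {
    'A': 'A', 'C': 'C', 'G': 'G',
    'M': ('A', 'C'), 'R': ('A', 'G'),
}


def expand_ambiguous(seq, table=IUPAC_SUBSET):
    """Hypothetical helper: list all concrete sequences encoded by `seq`."""
    # A one-character string and a tuple of characters both iterate
    # as a series of single nucleotides, so product() handles either.
    choices = [table[symbol] for symbol in seq]
    return ["".join(combo) for combo in product(*choices)]


expanded = expand_ambiguous("ARM")  # ['AAA', 'AAC', 'AGA', 'AGC']
```

Because unambiguous symbols map to plain strings and ambiguous ones to tuples, code that iterates over the values works for both, but code that indexes or compares types should account for the difference.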