plastid.genomics.seqtools module

Utilities for mutating and searching nucleic acid sequences.

Contents

TwoBitSeqRecordAdaptor(fh) Adaptor class that makes a twobitreader.TwoBitFile behave like a dictionary of Bio.SeqRecord.SeqRecord objects.
mutate_seqs(seqs[, nucleotides, mutations]) Generate all sequences within mutations substitutions of a reference sequence.
seq_to_regex(inp[, flags]) Convert a nucleotide sequence of IUPAC nucleotide characters to a regular expression.
IUPAC_TABLE Dictionary mapping IUPAC nucleotide symbols to the nucleotides they represent (e.g. R -> (A, G)).
class plastid.genomics.seqtools.TwoBitSeqRecordAdaptor(fh)

Bases: object

Adaptor class that makes a twobitreader.TwoBitFile behave like a dictionary of Bio.SeqRecord.SeqRecord objects.
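The adaptor idea can be illustrated with a minimal, dependency-free sketch. This is not plastid's implementation: RecordAdaptorSketch and Record are hypothetical names, Record is only a stand-in for Bio.SeqRecord.SeqRecord, and a plain dict stands in for the twobitreader.TwoBitFile backend.

```python
from collections import namedtuple
from collections.abc import Mapping

# Hypothetical stand-in for Bio.SeqRecord.SeqRecord (assumption, keeps the sketch dependency-free)
Record = namedtuple("Record", ["id", "seq"])


class RecordAdaptorSketch(Mapping):
    """Wrap a mapping of name -> raw sequence and return record objects on lookup."""

    def __init__(self, backend):
        # backend: any dict-like object; a real 2bit reader would go here
        self._backend = backend

    def __getitem__(self, name):
        # Convert the backend's raw sequence into a record at access time
        return Record(id=name, seq=str(self._backend[name]))

    def __iter__(self):
        return iter(self._backend)

    def __len__(self):
        return len(self._backend)


genome = RecordAdaptorSketch({"chrI": "ACGTACGT", "chrII": "GGCC"})
record = genome["chrI"]
```

Deriving from Mapping supplies keys(), items(), and membership tests for free, so the wrapper can be passed to code that expects a read-only dictionary.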

plastid.genomics.seqtools.mutate_seqs(seqs, nucleotides='NACTG', mutations=1)

Generate all sequences within mutations substitutions of a reference sequence.

Parameters:
seqs : str or list of str

Single reference sequence (a string) or a group of strings

nucleotides : list of char, optional

Permitted nucleotide substitutions (Default: ‘NACTG’)

mutations : int, optional

Number of substitutions to make (Default: 1)

Returns:
set

all sequences within mutations substitutions from the sequence(s) specified in seqs
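To make the behavior described above concrete, here is a minimal sketch of one way to enumerate sequences within a given number of substitutions. It is an independent illustration, not plastid's code: the name mutate_seqs_sketch is hypothetical, and including the unmodified reference sequence in the result is an assumption.

```python
from itertools import combinations, product


def mutate_seqs_sketch(seqs, nucleotides="NACTG", mutations=1):
    """Return the set of sequences reachable with up to `mutations` substitutions."""
    if isinstance(seqs, str):
        seqs = [seqs]  # accept a single reference sequence

    results = set()
    for seq in seqs:
        results.add(seq)  # assumption: distance 0 counts as "within" the limit
        for n in range(1, mutations + 1):
            # choose which positions to substitute ...
            for positions in combinations(range(len(seq)), n):
                # ... and which nucleotide to place at each chosen position
                for subs in product(nucleotides, repeat=n):
                    chars = list(seq)
                    for pos, nt in zip(positions, subs):
                        chars[pos] = nt
                    results.add("".join(chars))
    return results


variants = mutate_seqs_sketch("AC", nucleotides="ACTG", mutations=1)
```

The result grows quickly with sequence length and with mutations, so a set is used to collapse duplicates produced by substituting a nucleotide with itself.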

plastid.genomics.seqtools.random_seq(size, nucleotides='ACTG')

Generate a random nucleotide sequence of length size and composition nucleotides

Parameters:
size : int

length of desired sequence

nucleotides : str, optional

string of nucleotides to draw from, in the desired base composition. Characters need not be unique; e.g. supply ‘AATCG’ to increase ‘A’ bias. (Default: ‘ACTG’)

Returns:
str : randomized DNA sequence
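
The composition parameter can be understood through a short sketch that draws each position uniformly from the supplied string, so repeated characters are proportionally more likely. This is an assumed approach rather than plastid's implementation; random_seq_sketch and its rng parameter are hypothetical and exist only to make the example reproducible.

```python
import random


def random_seq_sketch(size, nucleotides="ACTG", rng=None):
    """Build a random sequence by sampling `size` characters from `nucleotides`."""
    rng = rng or random  # hypothetical hook: pass random.Random(seed) for reproducibility
    return "".join(rng.choice(nucleotides) for _ in range(size))


# 'AATCG' contains two A's, so A is drawn with probability 2/5 at each position
biased = random_seq_sketch(20, nucleotides="AATCG", rng=random.Random(0))
```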
plastid.genomics.seqtools.revive(twobitreader, seqname)
plastid.genomics.seqtools.seq_to_regex(inp, flags=0)

Convert a nucleotide sequence of IUPAC nucleotide characters to a regular expression. Ambiguous IUPAC characters are converted to character groups (e.g. ‘Y’ to ‘[CTU]’), and T and U are considered equivalent.

Parameters:
inp : str

Nucleotide sequence using IUPAC nucleotide codes

flags : int, optional

Flags to pass to re.compile() (Default: 0 / no flags)

Returns:
re.RegexObject

Regular expression pattern corresponding to IUPAC sequence in inp

Examples

Convert a sequence to a regex:

>>> seq_to_regex("CARYYA").pattern
'CA[AG][CTU][CTU]A'
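
The conversion can be sketched with a lookup table and a character-class rule: symbols that map to one nucleotide stay literal, and symbols that map to several become a bracketed group. The sketch below is an illustration under those assumptions, not plastid's source; seq_to_regex_sketch and IUPAC are hypothetical names, and the table values are copied from the IUPAC_TABLE listing on this page.

```python
import re

# Table values copied from the IUPAC_TABLE listing on this page
IUPAC = {
    "A": "A", "C": "C", "G": "G",
    "T": ("T", "U"), "U": ("T", "U"),
    "R": ("A", "G"), "Y": ("C", "T", "U"),
    "S": ("G", "C"), "W": ("A", "T", "U"),
    "K": ("G", "T", "U"), "M": ("A", "C"),
    "B": ("C", "G", "T", "U"), "D": ("A", "G", "T", "U"),
    "H": ("A", "C", "T", "U"), "V": ("A", "C", "G"),
    "N": ("A", "C", "T", "G", "U"),
}


def seq_to_regex_sketch(inp, flags=0):
    """Compile an IUPAC sequence into a regex, one literal or group per symbol."""
    parts = []
    for symbol in inp:
        nts = IUPAC[symbol]  # raises KeyError for non-IUPAC characters
        if len(nts) == 1:
            parts.append(nts[0])                  # unambiguous: literal character
        else:
            parts.append("[" + "".join(nts) + "]")  # ambiguous: character class
    return re.compile("".join(parts), flags)


pattern = seq_to_regex_sketch("CARYYA")
hit = pattern.search("GGCAGCUAGG")  # matches the RNA substring 'CAGCUA'
```

Because T and U share the group ‘[TU]’, the same compiled pattern can be applied to DNA and RNA text.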
plastid.genomics.seqtools.IUPAC_TABLE = {'A': 'A', 'B': ('C', 'G', 'T', 'U'), 'C': 'C', 'D': ('A', 'G', 'T', 'U'), 'G': 'G', 'H': ('A', 'C', 'T', 'U'), 'K': ('G', 'T', 'U'), 'M': ('A', 'C'), 'N': ('A', 'C', 'T', 'G', 'U'), 'R': ('A', 'G'), 'S': ('G', 'C'), 'T': ('T', 'U'), 'U': ('T', 'U'), 'V': ('A', 'C', 'G'), 'W': ('A', 'T', 'U'), 'Y': ('C', 'T', 'U')}

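Dictionary mapping IUPAC nucleotide symbols to the nucleotides they represent. Ambiguous symbols, and T and U, map to tuples (e.g. R -> (A, G)); the unambiguous symbols A, C, and G map to themselves as single-character strings.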
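
A table of this shape can also be used to expand an ambiguous sequence into every concrete sequence it represents. The sketch below assumes only the mapping shown above; expand_ambiguous is a hypothetical helper, and IUPAC_EXCERPT is a hand-copied excerpt of the table rather than an import from plastid.

```python
from itertools import product

# Excerpt of the table above (copied by hand; not imported from plastid)
IUPAC_EXCERPT = {
    "A": "A", "C": "C", "G": "G",
    "T": ("T", "U"),
    "R": ("A", "G"),
    "Y": ("C", "T", "U"),
    "N": ("A", "C", "T", "G", "U"),
}


def expand_ambiguous(seq, table=IUPAC_EXCERPT):
    """Yield every concrete sequence encoded by an IUPAC string."""
    # Strings and tuples are both iterable, so mixed value types work with product()
    choices = [table[symbol] for symbol in seq]
    for combo in product(*choices):
        yield "".join(combo)


expanded = sorted(expand_ambiguous("CAR"))
```

The number of expansions is the product of the group sizes, so a sequence such as "NY" yields 5 x 3 = 15 strings under this table.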