plastid.genomics.seqtools module

Utilities for mutating and searching nucleic acid sequences.

Contents

TwoBitSeqRecordAdaptor(fh)

Adaptor class that makes a twobitreader.TwoBitFile behave like a dictionary of Bio.SeqRecord.SeqRecord objects.

mutate_seqs(seqs[, nucleotides, mutations])

Generate all sequences within mutations substitutions of a reference sequence.

seq_to_regex(inp[, flags])

Convert a nucleotide sequence of IUPAC nucleotide characters to a regular expression.

IUPAC_TABLE

Dictionary mapping IUPAC nucleotide symbols to tuples of nucleotides they represent (e.g.

class plastid.genomics.seqtools.TwoBitSeqRecordAdaptor(fh)[source]

Bases: object

Adaptor class that makes a twobitreader.TwoBitFile behave like a dictionary of Bio.SeqRecord.SeqRecord objects.

plastid.genomics.seqtools.mutate_seqs(seqs, nucleotides='NACTG', mutations=1)[source]

Generate all sequences within mutations substitutions of a reference sequence.

Parameters
seqs : str or list of str

Single reference sequence (a string) or a group of strings

nucleotides : list of char, optional

Permitted nucleotide substitutions (Default: ‘NACTG’)

mutations : int, optional

Number of substitutions to make (Default: 1)

Returns
set

all sequences within mutations substitutions from the sequence(s) specified in seqs
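
The following is a minimal sketch of one way such a set could be built; it is not plastid's actual implementation. It assumes that "within mutations substitutions" includes the unmodified reference and any sequence needing fewer substitutions. The name mutate_seqs_sketch is hypothetical.

```python
def mutate_seqs_sketch(seqs, nucleotides="NACTG", mutations=1):
    """Return every sequence reachable with at most `mutations` substitutions.

    Illustrative only: assumes the reference sequences themselves
    belong in the result.
    """
    # Accept a single string as well as a collection of strings
    if isinstance(seqs, str):
        seqs = [seqs]

    found = set(seqs)
    for _ in range(mutations):
        new = set()
        # Apply one more substitution to everything found so far
        for seq in found:
            for pos in range(len(seq)):
                for nuc in nucleotides:
                    new.add(seq[:pos] + nuc + seq[pos + 1:])
        found |= new
    return found
```

Under these assumptions, a sequence of length 2 with a four-letter alphabet and mutations=1 yields seven sequences: the reference plus three alternatives at each of the two positions.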

plastid.genomics.seqtools.random_seq(size, nucleotides='ACTG')[source]

Generate a random nucleotide sequence of length size and composition nucleotides

Parameters
size : int

length of desired sequence

nucleotides : str, optional

string of nucleotides to use in sequence, in desired base composition (i.e. need not be unique; can supply ‘AATCG’ to increase ‘A’ bias. Default: ‘ACTG’)

Returns
str

randomized DNA sequence
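
As a rough illustration of how repeating a character in nucleotides biases the output, the sketch below draws each position uniformly from the string. It is not plastid's implementation; the name random_seq_sketch and the rng parameter are hypothetical additions for this example.

```python
import random


def random_seq_sketch(size, nucleotides="ACTG", rng=random):
    """Draw `size` characters uniformly from `nucleotides`.

    Because each draw picks one character of the string at random,
    a repeated character (e.g. the two 'A's in 'AATCG') is
    proportionally more likely to appear.
    """
    return "".join(rng.choice(nucleotides) for _ in range(size))


# Seeded generator so that repeated runs give the same sequence
seq = random_seq_sketch(20, "AATCG", rng=random.Random(0))
```

Passing a seeded random.Random instance is a design choice for reproducibility in tests; omitting it falls back to the module-level generator.
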

plastid.genomics.seqtools.revive(twobitreader, seqname)[source]

plastid.genomics.seqtools.seq_to_regex(inp, flags=0)[source]

Convert a nucleotide sequence of IUPAC nucleotide characters to a regular expression. Ambiguous IUPAC characters are converted to character classes (e.g. ‘Y’ to ‘[CTU]’), and T and U are considered equivalent.

Parameters
inp : str

Nucleotide sequence using IUPAC nucleotide codes

flags : int, optional

Flags to pass to re.compile() (Default: 0 / no flags)

Returns
re.RegexObject

Regular expression pattern corresponding to IUPAC sequence in inp

Examples

Convert a sequence to a regex:

>>> seq_to_regex("CARYYA").pattern
'CA[AG][CTU][CTU]A'
plastid.genomics.seqtools.IUPAC_TABLE = {'A': 'A', 'B': ('C', 'G', 'T', 'U'), 'C': 'C', 'D': ('A', 'G', 'T', 'U'), 'G': 'G', 'H': ('A', 'C', 'T', 'U'), 'K': ('G', 'T', 'U'), 'M': ('A', 'C'), 'N': ('A', 'C', 'T', 'G', 'U'), 'R': ('A', 'G'), 'S': ('G', 'C'), 'T': ('T', 'U'), 'U': ('T', 'U'), 'V': ('A', 'C', 'G'), 'W': ('A', 'T', 'U'), 'Y': ('C', 'T', 'U')}

Dictionary mapping IUPAC nucleotide symbols to tuples of nucleotides they represent (e.g. R -> (A, G) )
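
A table like this can drive the conversion that seq_to_regex performs. The sketch below is one possible approach under stated assumptions, not the library's code: it uses a local copy of the table (IUPAC_TABLE_COPY, a hypothetical name), expects upper-case input, and the function name seq_to_regex_sketch is likewise hypothetical.

```python
import re

# Local copy of the IUPAC_TABLE value shown above
IUPAC_TABLE_COPY = {
    'A': 'A', 'B': ('C', 'G', 'T', 'U'), 'C': 'C',
    'D': ('A', 'G', 'T', 'U'), 'G': 'G', 'H': ('A', 'C', 'T', 'U'),
    'K': ('G', 'T', 'U'), 'M': ('A', 'C'),
    'N': ('A', 'C', 'T', 'G', 'U'), 'R': ('A', 'G'), 'S': ('G', 'C'),
    'T': ('T', 'U'), 'U': ('T', 'U'), 'V': ('A', 'C', 'G'),
    'W': ('A', 'T', 'U'), 'Y': ('C', 'T', 'U'),
}


def seq_to_regex_sketch(inp, flags=0):
    """Build a compiled pattern from an upper-case IUPAC sequence."""
    parts = []
    for symbol in inp:
        options = IUPAC_TABLE_COPY[symbol]
        if len(options) == 1:
            # Unambiguous symbol: emit the literal character
            parts.append(options[0])
        else:
            # Ambiguous symbol: emit a character class
            parts.append("[" + "".join(options) + "]")
    return re.compile("".join(parts), flags)


pattern = seq_to_regex_sketch("CARYYA")
```

Because 'T' and 'U' both map to ('T', 'U') in the table, a pattern built this way matches DNA and RNA spellings of the same sequence.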