plastid.readers.autosql module

This module contains parsers for data structures written in the autoSql object specification language, used by the UCSC genome browser, BigBed files and BigWig files.

Summary

Parsers are constructed by initializing an AutoSqlDeclaration with a block of autoSql text:

>>> declaration = '''table easy_table
"A table with a comment on the next line" 
    (
    uint number auto; "a number with a token"
    uint [3] points ; "r,g,b values"
    lstring  my_string ; "a long string"
    uint a_field_size ; "the size for the next field"
    float [a_field_size] float_array ; "an array of floats"
    set(a,b,c) alpha ; "the first three letters of the alphabet"
    )
'''
>>> record_parser = AutoSqlDeclaration(declaration)

The parser that is created can then be called to parse text records into dictionaries:

>>> record_parser("3\t1,2,3\tmy string with spaces\t5\t1.1,1.2,1.3,1.4,1.5\ta,b")
OrderedDict([("number",3),
             ("points",(1,2,3)),
             ("my_string","my string with spaces"),
             ("a_field_size",5),
             ("float_array",(1.1,1.2,1.3,1.4,1.5)),
             ("alpha",{'a','b'})])

Module contents

AutoSqlDeclaration

Parses autoSql declarations for table, simple, and object declaration types. Delegates parsing of individual fields to appropriate subclasses (e.g. AutoSqlField, SizedAutoSqlField, and ValuesAutoSqlField).

AutoSqlField, SizedAutoSqlField, ValuesAutoSqlField

Parse various sorts of fields within an autoSql declaration block

Notes

  1. These parsers seek only to provide Python bindings for autoSql declarations. They do NOT generate C or SQL code from autoSql, as those functions are already provided by Jim Kent’s utilities.

  2. set and enum field types are parsed as sets of strings.

  3. primary, index, and auto autoSql tags are accepted in line declarations, but are ignored because they are not relevant for parsing.

  4. The parsers assume that they will be parsing tab-delimited text blocks.

  5. Although declarations are routinely nested as fields within other declarations in C structs and in SQL databases, in the absence of a standard, it is unclear how these would be serialized within tab-delimited columns of BigBed files. Therefore, nested declarations are not supported.
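
Notes 2 and 4 can be illustrated without the library. The following is a minimal sketch, not plastid's implementation: `parse_record`, `FIELD_FORMATTERS`, `parse_tuple`, and `parse_set` are hypothetical names, and the field layout is a shortened version of the table from the Summary. It splits one tab-delimited record and converts each column with a per-field formatter, turning the set column into a set of strings.

```python
from collections import OrderedDict


def parse_tuple(formatter):
    """Return a callable that parses comma-separated text into a tuple."""
    return lambda text: tuple(formatter(x) for x in text.strip(",").split(","))


def parse_set(text):
    """Parse comma-separated text into a set of strings (see note 2)."""
    return set(x.strip() for x in text.split(","))


# Hypothetical field layout: field name -> formatter, in column order
FIELD_FORMATTERS = OrderedDict([
    ("number", int),
    ("points", parse_tuple(int)),
    ("my_string", str),
    ("alpha", parse_set),
])


def parse_record(line, delim="\t"):
    """Split a tab-delimited record and format each column (see note 4)."""
    columns = line.rstrip("\n").split(delim)
    return OrderedDict(
        (name, formatter(column))
        for (name, formatter), column in zip(FIELD_FORMATTERS.items(), columns)
    )


record = parse_record("3\t1,2,3\tmy string with spaces\ta,b")
print(record)
```

Because the columns are split on tabs only, the spaces inside `my string with spaces` survive intact.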

See Also

Updated autoSql Grammar specification

Explanation of autoSql grammar

The ENCODE project’s tests for autoSql parsers

Official autoSql unit tests

Kent & Brumbaugh, 2002

First publication of autoSql & autoXml

class plastid.readers.autosql.AutoSqlDeclaration(autosql, parent=None, delim='\n')[source]

Bases: plastid.readers.autosql.AbstractAutoSqlElement

Parser factory that converts delimited text blocks into OrderedDicts, following the field names and types described by an autoSql declaration element

Parameters
autosql : str

Block of autoSql text specifying format of element

parent : instance of subclass of AbstractAutoSqlObject or None, optional

Parent / enclosing element. Default: None

delim : str, optional

Field delimiter (default: tab)

Attributes
attr : dict

Dictionary of descriptive attributes (e.g. name, type, declare_type, etc.)

field_formatters : OrderedDict

Dictionary mapping field names to type names

field_comments : OrderedDict

Dictionary mapping field names to comments

field_types : dict

Dictionary matching type names (as strings) to formatters that parse them from plaintext

autosql : str

Block of autoSql text specifying format of element

match_pattern : re.RegexObject

Pattern that determines whether or not a block of autoSql matches this object

parent : instance of subclass of AbstractAutoSqlObject, or None

Parent / enclosing element. Default: None

delim : str, optional

Text delimiter for fields in blocks called by __call__() (Default: ” “)

Methods

AutoSqlDeclaration.__call__()

Parse autoSql-formatted blocks of text according to this declaration

add_type(name, formatter)

Add a type to the parser

Parameters
name : str

Name of data type

formatter : callable

Function/callable that, when applied to autoSql text, yields an object of the type specified by name
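
One way to picture what add_type does is as a registry that maps type names to callables. The sketch below is a stand-in under stated assumptions, not the class's actual code: `TypeRegistrySketch`, its built-in type table, and the `strand` type are all hypothetical.

```python
class TypeRegistrySketch:
    """Hypothetical registry mapping autoSql type names to formatters."""

    def __init__(self):
        # Assumed starter set of types; the real parser may define others
        self.field_types = {
            "int": int,
            "uint": int,
            "float": float,
            "string": str,
            "lstring": str,
        }

    def add_type(self, name, formatter):
        """Register a callable that converts text into an object of type `name`."""
        self.field_types[name] = formatter

    def parse(self, type_name, text):
        """Look up the formatter for `type_name` and apply it to `text`."""
        return self.field_types[type_name](text)


registry = TypeRegistrySketch()

# Register a made-up 'strand' type that maps '+' / '-' to 1 / -1
registry.add_type("strand", lambda text: {"+": 1, "-": -1}[text])

print(registry.parse("uint", "42"))
print(registry.parse("strand", "-"))
```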

static mask_comments(text)

Mask all comments in an autoSql block in order to facilitate parsing by regular expressions

Parameters
text : str

autoSql-formatted text

Returns
str

Text with comments replaced by “xxxxxx” of same length

list

List of (comment.start,comment.end), including quotes, for each comment in text
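
Masking matters because a comment may itself contain characters such as `;` or `)` that would confuse the field and declaration patterns. The following is a minimal sketch of the idea, assuming comments are delimited by double quotes and that the quotes are masked along with the comment body; `mask_comments_sketch` is a hypothetical name, not the library's function.

```python
import re


def mask_comments_sketch(text):
    """Replace each double-quoted comment with x's of the same length.

    Returns the masked text and a list of (start, end) spans, quotes included.
    """
    spans = []

    def _mask(match):
        # Record where the comment was, then emit a same-length placeholder
        spans.append((match.start(), match.end()))
        return "x" * (match.end() - match.start())

    masked = re.sub(r'"[^"]*"', _mask, text)
    return masked, spans


example = 'uint x; "a (tricky); comment"'
masked, spans = mask_comments_sketch(example)
print(masked)
print(spans)
```

Since the masked text has the same length as the input, the recorded spans can be used afterwards to recover the original comments from the unmasked string.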

classmethod matches(text)

Determine whether autoSql formatting text matches this autoSql element

Parameters
text : str

Block of autoSql-formatted declaration text

Returns
bool

True if an autoSql parser of this class’s type can be made from this specification, otherwise False

match_pattern = re.compile('^\\s*(?P<declare_type>object|simple|table)\\s+(?P<declare_name>\\w+)\\s+\\"(?P<comment>[^\\"]*)\\"\\s*\\(\\s*(?P<field_text>.*)\\)', re.DOTALL)
match_str = '^\\s*(?P<declare_type>object|simple|table)\\s+(?P<declare_name>\\w+)\\s+\\"(?P<comment>[^\\"]*)\\"\\s*\\(\\s*(?P<field_text>.*)\\)'
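
The declaration pattern can be tried directly with the standard `re` module. This sketch copies the expression shown in match_str (written as a raw string, with the same re.DOTALL flag) and applies it to a declaration abbreviated from the Summary; `DECLARATION_PATTERN` is just a local name chosen for the example.

```python
import re

# Same expression as match_str, written as a raw string
DECLARATION_PATTERN = re.compile(
    r'^\s*(?P<declare_type>object|simple|table)\s+(?P<declare_name>\w+)\s+'
    r'"(?P<comment>[^"]*)"\s*\(\s*(?P<field_text>.*)\)',
    re.DOTALL,
)

declaration = '''table easy_table
"A table with a comment on the next line"
    (
    uint number auto; "a number with a token"
    lstring my_string; "a long string"
    )
'''

match = DECLARATION_PATTERN.search(declaration)
attr = match.groupdict()

print(attr["declare_type"])
print(attr["declare_name"])
print(attr["comment"])
print(attr["field_text"])
```

The `field_text` group is greedy and runs to the last closing parenthesis, so it captures every field line of the block in one string.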
class plastid.readers.autosql.AutoSqlField(autosql, parent=None, delim='')[source]

Bases: plastid.readers.autosql.AbstractAutoSqlElement

Parser factory for autoSql fields of type fieldType fieldName ';' comment

Parameters
autosql : str

Block of autoSql text specifying format of element

parent : instance of subclass of AbstractAutoSqlObject or None, optional

Parent / enclosing element. Default: None

delim : str, optional

Field delimiter (default: tab)

Attributes
attr : dict

Dictionary of descriptive attributes (e.g. name, type, etc.)

formatter : callable

Callable/function that converts plain text into an object of the correct type

autosql : str

Block of autoSql text specifying format of element

match_pattern : re.RegexObject

Pattern that determines whether or not a block of autoSql matches this object

parent : instance of subclass of AbstractAutoSqlObject or None

Parent / enclosing element (Default: None)

delim : str, optional

Text delimiter for fields in blocks called by __call__() (Default: newline)

Methods

__call__(text[, rec])

Parse a value matching the field described by self.autosql from a block of delimited text

add_type(name, formatter)

Add a type to the parser

mask_comments(text)

Mask all comments in an autoSql block in order to facilitate parsing by regular expressions

matches(text)

Determine whether autoSql formatting text matches this autoSql element

add_type(name, formatter)

Add a type to the parser

Parameters
name : str

Name of data type

formatter : callable

Function/callable that, when applied to autoSql text, yields an object of the type specified by name

static mask_comments(text)

Mask all comments in an autoSql block in order to facilitate parsing by regular expressions

Parameters
text : str

autoSql-formatted text

Returns
str

Text with comments replaced by “xxxxxx” of same length

list

List of (comment.start,comment.end), including quotes, for each comment in text

classmethod matches(text)

Determine whether autoSql formatting text matches this autoSql element

Parameters
text : str

Block of autoSql-formatted declaration text

Returns
bool

True if an autoSql parser of this class’s type can be made from this specification, otherwise False

match_pattern = re.compile('^\\s*(?P<type>\\w+)\\s+(?P<name>\\w+)\\s*(?P<opt1>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*(?P<opt2>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*(?P<opt3>\\s+prima)
match_str = '^\\s*(?P<type>\\w+)\\s+(?P<name>\\w+)\\s*(?P<opt1>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*(?P<opt2>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*(?P<opt3>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*;\\s*\\"(?P<comment>[^\\"]*)\\"'
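
To see which pieces of a field line the pattern captures, the expression from match_str can be rebuilt with the standard `re` module. In this sketch `TAG` and `FIELD_PATTERN` are local names introduced for readability; `TAG` holds the alternation that match_str repeats for the opt1, opt2, and opt3 groups.

```python
import re

# The tag alternation that match_str repeats for opt1, opt2, and opt3
TAG = r'\s+primary|\s+auto|\s+index\s*(\[\s*\d+\s*\])?'

FIELD_PATTERN = re.compile(
    r'^\s*(?P<type>\w+)\s+(?P<name>\w+)\s*'
    r'(?P<opt1>' + TAG + r')?\s*'
    r'(?P<opt2>' + TAG + r')?\s*'
    r'(?P<opt3>' + TAG + r')?\s*'
    r';\s*"(?P<comment>[^"]*)"'
)

field_line = 'uint number auto; "a number with a token"'
match = FIELD_PATTERN.match(field_line)
attr = match.groupdict()

# The tag is captured with its leading whitespace, so strip it for display
print(attr["type"], attr["name"], attr["opt1"].strip())
print(attr["comment"])

# A sized field does not match this pattern, because '[' cannot start a name
sized_match = FIELD_PATTERN.match('uint [3] points ; "r,g,b values"')
print(sized_match)
```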
class plastid.readers.autosql.SizedAutoSqlField(autosql, size=1, parent=None, delim=',')[source]

Bases: plastid.readers.autosql.AutoSqlField

Parser factory for autoSql fields of type fieldType `[` fieldSize `]` fieldName ';' comment

Parameters
autosql : str

Block of autoSql text specifying format of element

parent : instance of subclass of AbstractAutoSqlObject or None, optional

Parent / enclosing element. Default: None

delim : str, optional

Field delimiter (default: tab)

Attributes
attr : dict

Dictionary of descriptive attributes (e.g. name, size, type, etc.)

formatter : callable

Callable/function that converts plain text into an object of the correct type

autosql : str

Block of autoSql text specifying format of element

match_pattern : re.RegexObject

Pattern that determines whether or not a block of autoSql matches this object

parent : instance of subclass of AbstractAutoSqlObject or None

Parent / enclosing element (Default: None)

delim : str, optional

Text delimiter for fields in blocks called by __call__() (Default: newline)

Methods

SizedAutoSqlField.__call__()

Parse autoSql-formatted blocks of text into the tuples of the object type specified by this field

add_type(name, formatter)

Add a type to the parser

Parameters
name : str

Name of data type

formatter : callable

Function/callable that, when applied to autoSql text, yields an object of the type specified by name

static mask_comments(text)

Mask all comments in an autoSql block in order to facilitate parsing by regular expressions

Parameters
text : str

autoSql-formatted text

Returns
str

Text with comments replaced by “xxxxxx” of same length

list

List of (comment.start,comment.end), including quotes, for each comment in text

classmethod matches(text)

Determine whether autoSql formatting text matches this autoSql element

Parameters
text : str

Block of autoSql-formatted declaration text

Returns
bool

True if an autoSql parser of this class’s type can be made from this specification, otherwise False

match_pattern = re.compile('^\\s*(?P<type>\\w+)\\s*\\[\\s*(?P<size>\\w+)\\s*\\]\\s*\\s+(?P<name>\\w+)\\s*(?P<opt1>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*(?P<opt2>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\)
match_str = '^\\s*(?P<type>\\w+)\\s*\\[\\s*(?P<size>\\w+)\\s*\\]\\s*\\s+(?P<name>\\w+)\\s*(?P<opt1>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*(?P<opt2>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*(?P<opt3>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*;\\s*\\"(?P<comment>[^\\"]*)\\"'
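
The size captured by this pattern can be either a literal number or the name of another field, as in the Summary example. The following sketch shows one plausible way to handle both cases. It is an illustration under assumptions, not the library's code: `parse_sized`, the small type table, the length check, and the use of a `rec` mapping to look up a named size are all hypothetical.

```python
import re

# The tag alternation that match_str repeats for opt1, opt2, and opt3
TAG = r'\s+primary|\s+auto|\s+index\s*(\[\s*\d+\s*\])?'

SIZED_PATTERN = re.compile(
    r'^\s*(?P<type>\w+)\s*\[\s*(?P<size>\w+)\s*\]\s*\s+(?P<name>\w+)\s*'
    r'(?P<opt1>' + TAG + r')?\s*(?P<opt2>' + TAG + r')?\s*(?P<opt3>' + TAG + r')?\s*'
    r';\s*"(?P<comment>[^"]*)"'
)

# Assumed formatters for a few scalar types
FORMATTERS = {"int": int, "uint": int, "float": float}


def parse_sized(field_line, text, rec=None, delim=","):
    """Parse `text` into a tuple whose length is given by the field's size."""
    attr = SIZED_PATTERN.match(field_line).groupdict()
    size = attr["size"]
    # A numeric size is used directly; otherwise it names an earlier field in `rec`
    expected = int(size) if size.isdigit() else int(rec[size])
    formatter = FORMATTERS[attr["type"]]
    values = tuple(formatter(x) for x in text.strip(delim).split(delim))
    if len(values) != expected:
        raise ValueError("expected %d values, found %d" % (expected, len(values)))
    return attr["name"], values


fixed = parse_sized('uint [3] points ; "r,g,b values"', "1,2,3")
variable = parse_sized(
    'float [a_field_size] float_array ; "an array of floats"',
    "1.1,1.2,1.3,1.4,1.5",
    rec={"a_field_size": 5},
)
print(fixed)
print(variable)
```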
class plastid.readers.autosql.ValuesAutoSqlField(autosql, parent=None, delim=',')[source]

Bases: plastid.readers.autosql.AbstractAutoSqlElement

Parser factory for autoSql fields of type fieldType `(` fieldValues `)` fieldName ';' comment where fieldType would typically be set or enum

Parameters
autosql : str

Block of autoSql text specifying format of element

parent : instance of subclass of AbstractAutoSqlObject or None, optional

Parent / enclosing element. Default: None

delim : str, optional

Field delimiter (default: tab)

Methods

__call__(text[, rec])

Parse a value matching the field described by self.autosql from a block of delimited text

add_type(name, formatter)

Add a type to the parser

mask_comments(text)

Mask all comments in an autoSql block in order to facilitate parsing by regular expressions

matches(text)

Determine whether autoSql formatting text matches this autoSql element

add_type(name, formatter)

Add a type to the parser

Parameters
name : str

Name of data type

formatter : callable

Function/callable that, when applied to autoSql text, yields an object of the type specified by name

static mask_comments(text)

Mask all comments in an autoSql block in order to facilitate parsing by regular expressions

Parameters
text : str

autoSql-formatted text

Returns
str

Text with comments replaced by “xxxxxx” of same length

list

List of (comment.start,comment.end), including quotes, for each comment in text

classmethod matches(text)

Determine whether autoSql formatting text matches this autoSql element

Parameters
text : str

Block of autoSql-formatted declaration text

Returns
bool

True if an autoSql parser of this class’s type can be made from this specification, otherwise False

match_pattern = re.compile('^\\s*(?P<type>\\w+)\\s*\\(\\s*(?P<value_names>[^()]+)\\s*\\)\\s*\\s+(?P<name>\\w+)\\s*(?P<opt1>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*(?P<opt2>\\s+primary|\\s+auto|\\s+index\\s*)
match_str = '^\\s*(?P<type>\\w+)\\s*\\(\\s*(?P<value_names>[^()]+)\\s*\\)\\s*\\s+(?P<name>\\w+)\\s*(?P<opt1>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*(?P<opt2>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*(?P<opt3>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*;\\s*\\"(?P<comment>[^\\"]*)\\"'
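
For set and enum fields, the pattern captures the permitted values in the value_names group. The sketch below applies the expression from match_str and parses a comma-delimited value into a set of strings. `parse_values` is a hypothetical helper, and the check against the declared values is an added assumption; the documentation above does not say whether the library validates values.

```python
import re

# The tag alternation that match_str repeats for opt1, opt2, and opt3
TAG = r'\s+primary|\s+auto|\s+index\s*(\[\s*\d+\s*\])?'

VALUES_PATTERN = re.compile(
    r'^\s*(?P<type>\w+)\s*\(\s*(?P<value_names>[^()]+)\s*\)\s*\s+(?P<name>\w+)\s*'
    r'(?P<opt1>' + TAG + r')?\s*(?P<opt2>' + TAG + r')?\s*(?P<opt3>' + TAG + r')?\s*'
    r';\s*"(?P<comment>[^"]*)"'
)


def parse_values(field_line, text, delim=","):
    """Parse `text` into a set of strings, checked against the declared values."""
    attr = VALUES_PATTERN.match(field_line).groupdict()
    allowed = {name.strip() for name in attr["value_names"].split(",")}
    values = {item.strip() for item in text.strip(delim).split(delim)}
    # Assumed behavior: reject anything the declaration did not list
    unexpected = values - allowed
    if unexpected:
        raise ValueError("values not in declaration: %s" % sorted(unexpected))
    return attr["name"], values


field_line = 'set(a,b,c) alpha ; "the first three letters of the alphabet"'
name, values = parse_values(field_line, "a,b")
print(name, sorted(values))
```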