plastid.readers.autosql module

This module contains parsers for data structures written in the autoSql object specification language, used by the UCSC genome browser, BigBed files and BigWig files.

Summary

Parsers are constructed by initializing an AutoSqlDeclaration with a block of autoSql text:

>>> declaration = '''table easy_table
"A table with a comment on the next line" 
    (
    uint number auto; "a number with a token"
    uint [3] points ; "r,g,b values"
    lstring  my_string ; "a long string"
    uint a_field_size ; "the size for the next field"
    float [a_field_size] float_array ; "an array of floats"
    set(a,b,c) alpha ; "the first three letters of the alphabet"
    )
'''
>>> record_parser = AutoSqlDeclaration(declaration)

The parser that is created can then be called to parse text records into dictionaries:

>>> record_parser("3\t1,2,3\tmy string with spaces\t5\t1.1,1.2,1.3,1.4,1.5\ta,b")
OrderedDict([("number",3),
             ("points",(1,2,3)),
             ("my_string","my string with spaces"),
             ("a_field_size",5),
             ("float_array",(1.1,1.2,1.3,1.4,1.5)),
             ("alpha",{'a','b'})])

Module contents

AutoSqlDeclaration
Parses autoSql declarations for table, simple, and object declaration types. Delegates parsing of individual fields to appropriate subclasses (e.g. AutoSqlField, SizedAutoSqlField, and ValuesAutoSqlField).
AutoSqlField, SizedAutoSqlField, ValuesAutoSqlField
Parse various sorts of fields within an autoSql declaration block
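
The delegation described above can be pictured as trying each field pattern in turn until one matches. The following is a minimal sketch of that idea only, not the module's own code: the three patterns are simplified stand-ins for the match_str attributes documented below, and classify_field and FIELD_PATTERNS are hypothetical names.

```python
import re

# Simplified, illustrative patterns (assumptions); the real patterns are the
# match_str attributes documented for each field class.
FIELD_PATTERNS = [
    ("sized", re.compile(r"^\s*\w+\s*\[\s*\w+\s*\]\s+\w+")),
    ("values", re.compile(r"^\s*\w+\s*\([^()]+\)\s+\w+")),
    ("plain", re.compile(r"^\s*\w+\s+\w+")),
]


def classify_field(line):
    """Return the label of the first pattern that matches `line`, else None."""
    for label, pattern in FIELD_PATTERNS:
        if pattern.match(line):
            return label
    return None


lines = [
    'uint number auto; "a number with a token"',
    'uint [3] points ; "r,g,b values"',
    'set(a,b,c) alpha ; "the first three letters of the alphabet"',
]
kinds = [classify_field(line) for line in lines]
print(kinds)
```

In this sketch the more specific patterns are listed before the plain one so that each line is claimed by the most specific shape it fits.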

Notes

  1. These parsers seek only to provide Python bindings for autoSql declarations. They do NOT generate C or SQL code from autoSql, as those functions are already provided by Jim Kent’s utilities.
  2. set and enum field types are parsed as sets of strings
  3. primary, index, and auto autoSql tags are accepted in line declarations, but are ignored because they are not relevant for parsing
  4. The parsers assume that they will be parsing tab-delimited text blocks
  5. Although declarations are routinely nested as fields within other declarations in C structs and in SQL databases, in the absence of a standard, it is unclear how these would be serialized within tab-delimited columns of BigBed files. Therefore, nested declarations are not supported.
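
Notes 2 and 4 can be illustrated with a small standalone sketch: a record is split on tabs, array columns become tuples, and set columns become sets of strings. This does not use plastid; FIELDS, as_tuple, as_set and parse_record are hypothetical names chosen for the illustration.

```python
from collections import OrderedDict


def as_tuple(convert):
    """Build a converter that turns 'a,b,c' into a tuple of converted items."""
    return lambda text: tuple(convert(x) for x in text.strip(",").split(","))


def as_set(text):
    """Turn 'a,b' into a set of strings, as described for set/enum fields."""
    return set(text.split(","))


# Hypothetical (name, converter) pairs standing in for parsed field definitions.
FIELDS = [
    ("number", int),
    ("points", as_tuple(int)),
    ("my_string", str),
    ("alpha", as_set),
]


def parse_record(line, fields=FIELDS, delim="\t"):
    """Split one tab-delimited line and convert each column in order."""
    columns = line.rstrip("\n").split(delim)
    return OrderedDict(
        (name, convert(column)) for (name, convert), column in zip(fields, columns)
    )


record = parse_record("3\t1,2,3\tmy string with spaces\ta,b")
print(record)
```

Because the split is on tabs only, a string column may contain spaces without being broken apart.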

See Also

Updated autoSql Grammar specification
Explanation of autoSql grammar
The ENCODE project’s tests for autoSql parsers
Official autoSql unit tests
Kent & Brumbaugh, 2002
First publication of autoSql & autoXml
class plastid.readers.autosql.AutoSqlDeclaration(autosql, parent=None, delim='\n')

Bases: plastid.readers.autosql.AbstractAutoSqlElement

Parser factory that converts delimited text blocks into OrderedDicts, following the field names and types described by an autoSql declaration element

Parameters:
autosql : str

Block of autoSql text specifying format of element

parent : instance of subclass of AbstractAutoSqlElement or None, optional

Parent / enclosing element. Default: None

delim : str, optional

Field delimiter (default: tab)

Attributes:
attr : dict

Dictionary of descriptive attributes (e.g. name, type, declare_type, etc.)

field_formatters : OrderedDict

Dictionary mapping field names to type names

field_comments : OrderedDict

Dictionary mapping field names to comments

field_types : dict

Dictionary matching type names (as strings) to formatters that parse them from plaintext

autosql : str

Block of autoSql text specifying format of element

match_pattern : re.RegexObject

Pattern that determines whether or not a block of autoSql matches this object

parent : instance of subclass of AbstractAutoSqlElement, or None

Parent / enclosing element. Default: None

delim : str, optional

Text delimiter for fields in blocks called by __call__() (Default: tab)

Methods

AutoSqlDeclaration.__call__ Parse autoSql-formatted blocks of text according to this declaration
add_type(name, formatter)

Add a type to the parser

Parameters:
name : str

Name of data type

formatter : callable

Function/callable that, when applied to autoSql text, yields an object of the type specified by name
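
As a rough picture of what registering a type involves, the sketch below keeps a dictionary from type names to formatters and looks a formatter up at conversion time. TypeRegistry, its convert method, and the "percent" type are hypothetical and only mirror the add_type(name, formatter) signature documented here; they are not plastid's implementation.

```python
class TypeRegistry:
    """Hypothetical stand-in for the name -> formatter table that add_type() extends."""

    def __init__(self):
        # A few starting entries, chosen only for the illustration.
        self.field_types = {"int": int, "float": float, "string": str}

    def add_type(self, name, formatter):
        """Register `formatter` as the converter for type `name`."""
        self.field_types[name] = formatter

    def convert(self, type_name, text):
        """Apply the registered formatter for `type_name` to `text`."""
        return self.field_types[type_name](text)


registry = TypeRegistry()
registry.add_type("percent", lambda text: float(text.rstrip("%")) / 100.0)
value = registry.convert("percent", "25%")
print(value)
```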

static mask_comments(text)

Mask all comments in an autoSql block in order to facilitate parsing by regular expressions

Parameters:
text : str

autoSql-formatted text

Returns:
str

Text with comments replaced by “xxxxxx” of same length

list

List of (comment.start,comment.end), including quotes, for each comment in text
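
One way to get the behavior described above is to replace every quoted comment with a run of "x" characters of equal length, so that punctuation inside comments cannot confuse later regular expressions while all character offsets stay valid. The sketch below is written from that description alone; mask_comments_sketch is a hypothetical name, and masking the quote characters themselves is an assumption.

```python
import re

# A comment is taken to be any double-quoted run without embedded quotes.
COMMENT = re.compile(r'"[^"]*"')


def mask_comments_sketch(text):
    """Return (masked_text, spans); spans include the surrounding quotes."""
    spans = [(m.start(), m.end()) for m in COMMENT.finditer(text)]
    masked = COMMENT.sub(lambda m: "x" * len(m.group(0)), text)
    return masked, spans


text = 'uint [3] points ; "r,g,b values"'
masked, spans = mask_comments_sketch(text)
print(masked)
print(spans)
```

Keeping the masked text the same length as the input means the returned spans can be used to slice the original comments back out of the unmasked text.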

classmethod matches(text)

Determine whether autoSql formatting text matches this autoSql element

Parameters:
text : str

Block of autoSql-formatted declaration text

Returns:
bool

True if an autoSql parser of this class’s type can be made from this specification, otherwise False

match_pattern = <_sre.SRE_Pattern object at 0x3eaa8d0>
match_str = '^\\s*(?P<declare_type>object|simple|table)\\s+(?P<declare_name>\\w+)\\s+\\"(?P<comment>[^\\"]*)\\"\\s*\\(\\s*(?P<field_text>.*)\\)'
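
To see what the named groups in match_str capture, the pattern can be compiled with Python's re module and applied to a small declaration. This is a sketch under one assumption: the re.S flag is used here so that field_text may span several lines; the flags the module itself uses are not stated in this page.

```python
import re

# match_str as documented above, written as a raw string.
MATCH_STR = (
    r'^\s*(?P<declare_type>object|simple|table)\s+(?P<declare_name>\w+)\s+'
    r'\"(?P<comment>[^\"]*)\"\s*\(\s*(?P<field_text>.*)\)'
)
pattern = re.compile(MATCH_STR, re.S)  # re.S is an assumption

declaration = '''table easy_table
"A table with a comment on the next line"
    (
    uint number auto; "a number with a token"
    lstring my_string ; "a long string"
    )
'''
match = pattern.match(declaration)
parts = match.groupdict()
print(parts["declare_type"], parts["declare_name"])
print(parts["field_text"].strip())
```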
class plastid.readers.autosql.AutoSqlField(autosql, parent=None, delim='')

Bases: plastid.readers.autosql.AbstractAutoSqlElement

Parser factory for autoSql fields of type fieldType fieldName ';' comment

Parameters:
autosql : str

Block of autoSql text specifying format of element

parent : instance of subclass of AbstractAutoSqlElement or None, optional

Parent / enclosing element. Default: None

delim : str, optional

Field delimiter (default: tab)

Attributes:
attr : dict

Dictionary of descriptive attributes (e.g. name, type, etc.)

formatter : callable

Callable/function that converts plain text into an object of the correct type

autosql : str

Block of autoSql text specifying format of element

match_pattern : re.RegexObject

Pattern that determines whether or not a block of autoSql matches this object

parent : instance of subclass of AbstractAutoSqlElement or None

Parent / enclosing element (Default: None)

delim : str, optional

Text delimiter for fields in blocks called by __call__() (Default: newline)

Methods

__call__(text[, rec]) Parse a value matching the field described by self.autosql from a block of delimited text
add_type(name, formatter) Add a type to the parser
mask_comments(text) Mask all comments in an autoSql block in order to facilitate parsing by regular expressions
matches(text) Determine whether autoSql formatting text matches this autoSql element
add_type(name, formatter)

Add a type to the parser

Parameters:
name : str

Name of data type

formatter : callable

Function/callable that, when applied to autoSql text, yields an object of the type specified by name

static mask_comments(text)

Mask all comments in an autoSql block in order to facilitate parsing by regular expressions

Parameters:
text : str

autoSql-formatted text

Returns:
str

Text with comments replaced by “xxxxxx” of same length

list

List of (comment.start,comment.end), including quotes, for each comment in text

classmethod matches(text)

Determine whether autoSql formatting text matches this autoSql element

Parameters:
text : str

Block of autoSql-formatted declaration text

Returns:
bool

True if an autoSql parser of this class’s type can be made from this specification, otherwise False

match_pattern = <_sre.SRE_Pattern object at 0x3eae3e0>
match_str = '^\\s*(?P<type>\\w+)\\s+(?P<name>\\w+)\\s*(?P<opt1>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*(?P<opt2>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*(?P<opt3>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*;\\s*\\"(?P<comment>[^\\"]*)\\"'
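
The field pattern above can be exercised directly with re to see how the type, name, optional tags, and comment are separated. In this sketch the three identical optional groups are built by a small helper (opt, a hypothetical name) rather than typed out three times; the resulting pattern is intended to be the same as match_str.

```python
import re


def opt(n):
    """Build the n-th optional tag group (primary, auto, or index[N])."""
    return r'(?P<opt%d>\s+primary|\s+auto|\s+index\s*(\[\s*\d+\s*\])?)?' % n


FIELD = re.compile(
    r'^\s*(?P<type>\w+)\s+(?P<name>\w+)\s*'
    + r'\s*'.join(opt(n) for n in (1, 2, 3))
    + r'\s*;\s*\"(?P<comment>[^\"]*)\"'
)

match = FIELD.match('uint number auto; "a number with a token"')
# Captured tags keep their leading whitespace, so strip it before use.
tags = [match.group(k).strip() for k in ("opt1", "opt2", "opt3") if match.group(k)]
print(match.group("type"), match.group("name"), tags)
```

The tags are captured but, as noted earlier, play no role in parsing record text.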
class plastid.readers.autosql.SizedAutoSqlField(autosql, size=1, parent=None, delim=', ')

Bases: plastid.readers.autosql.AutoSqlField

Parser factory for autoSql fields of type fieldType `[` fieldSize `]` fieldName ';' comment

Parameters:
autosql : str

Block of autoSql text specifying format of element

parent : instance of subclass of AbstractAutoSqlElement or None, optional

Parent / enclosing element. Default: None

delim : str, optional

Field delimiter (default: tab)

Attributes:
attr : dict

Dictionary of descriptive attributes (e.g. name, size, type, etc.)

formatter : callable

Callable/function that converts plain text into an object of the correct type

autosql : str

Block of autoSql text specifying format of element

match_pattern : re.RegexObject

Pattern that determines whether or not a block of autoSql matches this object

parent : instance of subclass of AbstractAutoSqlElement or None

Parent / enclosing element (Default: None)

delim : str, optional

Text delimiter for fields in blocks called by __call__() (Default: newline)

Methods

SizedAutoSqlField.__call__ Parse autoSql-formatted blocks of text into tuples of the object type specified by this field
add_type(name, formatter)

Add a type to the parser

Parameters:
name : str

Name of data type

formatter : callable

Function/callable that, when applied to autoSql text, yields an object of the type specified by name

static mask_comments(text)

Mask all comments in an autoSql block in order to facilitate parsing by regular expressions

Parameters:
text : str

autoSql-formatted text

Returns:
str

Text with comments replaced by “xxxxxx” of same length

list

List of (comment.start,comment.end), including quotes, for each comment in text

classmethod matches(text)

Determine whether autoSql formatting text matches this autoSql element

Parameters:
text : str

Block of autoSql-formatted declaration text

Returns:
bool

True if an autoSql parser of this class’s type can be made from this specification, otherwise False

match_pattern = <_sre.SRE_Pattern object at 0x3eaef10>
match_str = '^\\s*(?P<type>\\w+)\\s*\\[\\s*(?P<size>\\w+)\\s*\\]\\s*\\s+(?P<name>\\w+)\\s*(?P<opt1>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*(?P<opt2>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*(?P<opt3>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*;\\s*\\"(?P<comment>[^\\"]*)\\"'
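
Because the size group accepts \w+, a sized field can name either a literal count or an earlier field, as in the float [a_field_size] float_array example in the Summary. The sketch below uses a trimmed copy of the pattern (the optional tag groups are omitted for brevity) and a hypothetical resolve_size helper; how the module actually resolves a named size is an assumption here.

```python
import re

# Trimmed copy of match_str: the three optional tag groups are left out.
SIZED = re.compile(
    r'^\s*(?P<type>\w+)\s*\[\s*(?P<size>\w+)\s*\]\s*\s+(?P<name>\w+)'
    r'\s*;\s*\"(?P<comment>[^\"]*)\"'
)


def resolve_size(size_text, record):
    """Return an int size, looking non-numeric sizes up in already-parsed fields."""
    if size_text.isdigit():
        return int(size_text)
    return int(record[size_text])


fixed = SIZED.match('uint [3] points ; "r,g,b values"')
variable = SIZED.match('float [a_field_size] float_array ; "an array of floats"')

record_so_far = {"a_field_size": 5}  # hypothetical fields parsed earlier in the record
sizes = (
    resolve_size(fixed.group("size"), record_so_far),
    resolve_size(variable.group("size"), record_so_far),
)
print(sizes)
```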
class plastid.readers.autosql.ValuesAutoSqlField(autosql, parent=None, delim=', ')

Bases: plastid.readers.autosql.AbstractAutoSqlElement

Parser factory for autoSql fields of type fieldType `(` fieldValues `)` fieldName ';' comment where fieldType would typically be set or enum

Parameters:
autosql : str

Block of autoSql text specifying format of element

parent : instance of subclass of AbstractAutoSqlElement or None, optional

Parent / enclosing element. Default: None

delim : str, optional

Field delimiter (default: tab)

Methods

__call__(text[, rec]) Parse a value matching the field described by self.autosql from a block of delimited text
add_type(name, formatter) Add a type to the parser
mask_comments(text) Mask all comments in an autoSql block in order to facilitate parsing by regular expressions
matches(text) Determine whether autoSql formatting text matches this autoSql element
add_type(name, formatter)

Add a type to the parser

Parameters:
name : str

Name of data type

formatter : callable

Function/callable that, when applied to autoSql text, yields an object of the type specified by name

static mask_comments(text)

Mask all comments in an autoSql block in order to facilitate parsing by regular expressions

Parameters:
text : str

autoSql-formatted text

Returns:
str

Text with comments replaced by “xxxxxx” of same length

list

List of (comment.start,comment.end), including quotes, for each comment in text

classmethod matches(text)

Determine whether autoSql formatting text matches this autoSql element

Parameters:
text : str

Block of autoSql-formatted declaration text

Returns:
bool

True if an autoSql parser of this class’s type can be made from this specification, otherwise False

match_pattern = <_sre.SRE_Pattern object at 0x3eb4b50>
match_str = '^\\s*(?P<type>\\w+)\\s*\\(\\s*(?P<value_names>[^()]+)\\s*\\)\\s*\\s+(?P<name>\\w+)\\s*(?P<opt1>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*(?P<opt2>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*(?P<opt3>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*;\\s*\\"(?P<comment>[^\\"]*)\\"'
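
For set and enum fields, the value_names group holds the comma-separated list of permitted values. The sketch below extracts that list with a trimmed copy of the pattern (optional tag groups omitted) and parses a column into a set of strings. The check against the permitted values in parse_values is an added assumption for illustration; this page does not say whether the module validates values.

```python
import re

# Trimmed copy of match_str: the three optional tag groups are left out.
VALUES = re.compile(
    r'^\s*(?P<type>\w+)\s*\(\s*(?P<value_names>[^()]+)\s*\)\s*\s+(?P<name>\w+)'
    r'\s*;\s*\"(?P<comment>[^\"]*)\"'
)

match = VALUES.match('set(a,b,c) alpha ; "the first three letters of the alphabet"')
allowed = {v.strip() for v in match.group("value_names").split(",")}


def parse_values(text, allowed_values):
    """Parse 'a,b' into a set of strings, rejecting values not declared."""
    values = {v.strip() for v in text.split(",") if v.strip()}
    unknown = values - allowed_values
    if unknown:
        raise ValueError("unexpected values: %s" % sorted(unknown))
    return values


parsed = parse_values("a,b", allowed)
print(sorted(parsed))
```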