plastid.readers.autosql module
This module contains parsers for data structures written in the autoSql object specification language, used by the UCSC genome browser, BigBed files and BigWig files.
Summary
Parsers are constructed by initializing an AutoSqlDeclaration with a block of autoSql text:
>>> declaration = '''table easy_table
"A table with a comment on the next line"
(
uint number auto; "a number with a token"
uint [3] points ; "r,g,b values"
lstring my_string ; "a long string"
uint a_field_size ; "the size for the next field"
float [a_field_size] float_array ; "an array of floats"
set(a,b,c) alpha ; "the first three letters of the alphabet"
)
'''
>>> record_parser = AutoSqlDeclaration(declaration)
The parser that is created can then be called to parse text records into dictionaries:
>>> record_parser("3\t1,2,3\tmy string with spaces\t5\t1.1,1.2,1.3,1.4,1.5\ta,b")
OrderedDict([("number",3),
("points",(1,2,3)),
("my_string","my string with spaces"),
("a_field_size",5),
("float_array",(1.1,1.2,1.3,1.4,1.5)),
("alpha",{'a','b'})])
Module contents
AutoSqlDeclaration
Parses autoSql declarations for table, simple, and object declaration types. Delegates parsing of individual fields to the appropriate field classes (e.g. AutoSqlField, SizedAutoSqlField, and ValuesAutoSqlField).
AutoSqlField, SizedAutoSqlField, ValuesAutoSqlField
Parse various sorts of fields within an autoSql declaration block.
Notes
- These parsers seek only to provide Python bindings for autoSql declarations. They do NOT generate C or SQL code from autoSql, as those functions are already provided by Jim Kent’s utilities.
- set and enum field types are parsed as sets of strings.
- primary, index, and auto autoSql tags are accepted in line declarations, but are ignored because they are not relevant for parsing.
- The parsers assume that they will be parsing tab-delimited text blocks.
- Although declarations are routinely nested as fields within other declarations in C structs and in SQL databases, in the absence of a standard it is unclear how these would be serialized within tab-delimited columns of BigBed files. Therefore, nested declarations are not supported.
See Also
- Updated autoSql Grammar specification
Explanation of autoSql grammar
- The ENCODE project’s tests for autoSql parsers
Official autoSql unit tests
- Kent & Brumbaugh, 2002
First publication of autoSql & autoXml
- class plastid.readers.autosql.AutoSqlDeclaration(autosql, parent=None, delim='\n')
Bases: plastid.readers.autosql.AbstractAutoSqlElement
Parser factory that converts delimited text blocks into OrderedDicts, following the field names and types described by an autoSql declaration element
- Parameters
- autosql : str
Block of autoSql text specifying format of element
- parent : instance of subclass of AbstractAutoSqlObject or None, optional
Parent / enclosing element. Default: None
- delim : str, optional
Field delimiter (default: tab)
- Attributes
- attr : dict
Dictionary of descriptive attributes (e.g. name, type, declare_type, etc.)
- field_formatters : OrderedDict
Dictionary mapping field names to type names
- field_comments : OrderedDict
Dictionary mapping field names to comments
- field_types : dict
Dictionary matching type names (as strings) to formatters that parse them from plaintext
- autosql : str
Block of autoSql text specifying format of element
- match_pattern : re.RegexObject
Pattern that determines whether or not a block of autoSql matches this object
- parent : instance of subclass of AbstractAutoSqlObject, or None
Parent / enclosing element. Default: None
- delim : str, optional
Text delimiter for fields in blocks parsed by __call__() (Default: tab)
Methods
AutoSqlDeclaration.__call__()
Parse autoSql-formatted blocks of text according to this declaration
- add_type(name, formatter)
Add a type to the parser
- Parameters
- name : str
Name of data type
- formatter : callable
Function/callable that, when applied to autoSql text, yields an object of the type specified by name
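As a rough illustration of what a formatter is, the sketch below keeps a table mapping type names to callables and registers one custom type. `TypeRegistry`, its `parse` method, and the `range` type are hypothetical names invented for this example; this is not plastid's implementation.

```python
class TypeRegistry:
    """Hypothetical stand-in for a parser's table of type formatters."""

    def __init__(self):
        # A few autoSql base types mapped to Python callables
        self.field_types = {"int": int, "uint": int, "float": float, "string": str}

    def add_type(self, name, formatter):
        """Register `formatter` as the converter for type `name`."""
        self.field_types[name] = formatter

    def parse(self, type_name, text):
        """Convert `text` using the formatter registered for `type_name`."""
        return self.field_types[type_name](text)


registry = TypeRegistry()
# Register a made-up type whose text form is "start-end"
registry.add_type("range", lambda text: tuple(int(x) for x in text.split("-")))
parsed_range = registry.parse("range", "10-25")
parsed_uint = registry.parse("uint", "42")
```

Any callable that takes a string and returns an object can serve as a formatter in this scheme, which is why built-ins such as `int` and `float` work unchanged.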
- static mask_comments(text)
Mask all comments in an autoSql block in order to facilitate parsing by regular expressions
- Parameters
- text : str
autoSql-formatted text
- Returns
- str
Text with comments replaced by “xxxxxx” of same length
- list
List of (comment.start, comment.end), including quotes, for each comment in text
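The masking idea can be sketched with the standard `re` module. The function below is an independent illustration, not the library's code; it assumes comments are double-quoted, contain no embedded quotes, and that the quotes themselves are masked too.

```python
import re


def mask_comments_sketch(text):
    """Replace each double-quoted comment with 'x' characters of equal length.

    Returns the masked text and a list of (start, end) spans, quotes included.
    """
    spans = []

    def _mask(match):
        # Record where the comment sat, then emit a same-length placeholder
        spans.append((match.start(), match.end()))
        return "x" * (match.end() - match.start())

    masked = re.sub(r'"[^"]*"', _mask, text)
    return masked, spans


line = 'uint [3] points ; "r,g,b values; packed"'
masked, spans = mask_comments_sketch(line)
```

Because the masked text has the same length as the input, offsets found in it can be used directly on the original. Masking also hides characters such as `;` inside comments that would otherwise confuse field-level regular expressions.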
- classmethod matches(text)
Determine whether autoSql formatting text matches this autoSql element
- Parameters
- text : str
Block of autoSql-formatted declaration text
- Returns
- bool
True if an autoSql parser of this class’s type can be made from this specification, otherwise False
- match_pattern = re.compile('^\\s*(?P<declare_type>object|simple|table)\\s+(?P<declare_name>\\w+)\\s+\\"(?P<comment>[^\\"]*)\\"\\s*\\(\\s*(?P<field_text>.*)\\)', re.DOTALL)
- match_str = '^\\s*(?P<declare_type>object|simple|table)\\s+(?P<declare_name>\\w+)\\s+\\"(?P<comment>[^\\"]*)\\"\\s*\\(\\s*(?P<field_text>.*)\\)'
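To see what the declaration pattern captures, the sketch below compiles the `match_str` shown above with the standard `re` module and applies it to a shortened version of the `easy_table` declaration. Splitting `field_text` on newlines is a simplification made for this example, not necessarily how the library separates fields.

```python
import re

# Pattern copied from match_str above, compiled with re.DOTALL as in match_pattern
DECLARATION_PATTERN = re.compile(
    r'^\s*(?P<declare_type>object|simple|table)\s+(?P<declare_name>\w+)\s+'
    r'\"(?P<comment>[^\"]*)\"\s*\(\s*(?P<field_text>.*)\)',
    re.DOTALL,
)

declaration = '''table easy_table
"A table with a comment on the next line"
(
uint number auto; "a number with a token"
uint [3] points ; "r,g,b values"
)
'''

match = DECLARATION_PATTERN.search(declaration)
attrs = match.groupdict()

# Simplifying assumption: one field per line inside the parentheses
field_lines = attrs["field_text"].strip().split("\n")
```

`re.DOTALL` matters here: it lets `.*` in the `field_text` group span the line breaks between fields, and the greedy match runs up to the final closing parenthesis.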
- class plastid.readers.autosql.AutoSqlField(autosql, parent=None, delim='')
Bases: plastid.readers.autosql.AbstractAutoSqlElement
Parser factory for autoSql fields of type
fieldType fieldName ';' comment
- Parameters
- autosql : str
Block of autoSql text specifying format of element
- parent : instance of subclass of AbstractAutoSqlObject or None, optional
Parent / enclosing element. Default: None
- delim : str, optional
Field delimiter (default: tab)
- Attributes
- attr : dict
Dictionary of descriptive attributes (e.g. name, type, etc.)
- formatter : callable
Callable/function that converts plain text into an object of the correct type
- autosql : str
Block of autoSql text specifying format of element
- match_pattern : re.RegexObject
Pattern that determines whether or not a block of autoSql matches this object
- parent : instance of subclass of AbstractAutoSqlObject or None
Parent / enclosing element (Default: None)
- delim : str, optional
Text delimiter for fields in blocks parsed by __call__() (Default: newline)
Methods
__call__(text[, rec])
Parse a value matching the field described by self.autosql from a block of delimited text
add_type(name, formatter)
Add a type to the parser
mask_comments(text)
Mask all comments in an autoSql block in order to facilitate parsing by regular expressions
matches(text)
Determine whether autoSql formatting text matches this autoSql element
- add_type(name, formatter)
Add a type to the parser
- Parameters
- name : str
Name of data type
- formatter : callable
Function/callable that, when applied to autoSql text, yields an object of the type specified by name
- static mask_comments(text)
Mask all comments in an autoSql block in order to facilitate parsing by regular expressions
- Parameters
- text : str
autoSql-formatted text
- Returns
- str
Text with comments replaced by “xxxxxx” of same length
- list
List of (comment.start, comment.end), including quotes, for each comment in text
- classmethod matches(text)
Determine whether autoSql formatting text matches this autoSql element
- Parameters
- text : str
Block of autoSql-formatted declaration text
- Returns
- bool
True if an autoSql parser of this class’s type can be made from this specification, otherwise False
- match_pattern = re.compile('^\\s*(?P<type>\\w+)\\s+(?P<name>\\w+)\\s*(?P<opt1>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*(?P<opt2>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*(?P<opt3>\\s+prima)
- match_str = '^\\s*(?P<type>\\w+)\\s+(?P<name>\\w+)\\s*(?P<opt1>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*(?P<opt2>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*(?P<opt3>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*;\\s*\\"(?P<comment>[^\\"]*)\\"'
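The field pattern can be tried out directly with the standard `re` module. The sketch below rebuilds the `match_str` shown above from its repeated tag sub-pattern (the `_TAG` helper name is introduced only for this example) and applies it to lines from the `easy_table` declaration.

```python
import re

# Sub-pattern repeated three times in match_str (one per optional tag slot)
_TAG = r'\s+primary|\s+auto|\s+index\s*(\[\s*\d+\s*\])?'

FIELD_PATTERN = re.compile(
    r'^\s*(?P<type>\w+)\s+(?P<name>\w+)\s*'
    r'(?P<opt1>' + _TAG + r')?\s*'
    r'(?P<opt2>' + _TAG + r')?\s*'
    r'(?P<opt3>' + _TAG + r')?\s*'
    r';\s*\"(?P<comment>[^\"]*)\"'
)

tagged = FIELD_PATTERN.search('uint number auto; "a number with a token"')
plain = FIELD_PATTERN.search('lstring my_string ; "a long string"')
# A sized field has brackets after the type, so the plain-field pattern rejects it
sized = FIELD_PATTERN.search('uint [3] points ; "r,g,b values"')

# Collect whichever optional tags were present, dropping leading whitespace
tags = [tagged.group(k).strip() for k in ("opt1", "opt2", "opt3") if tagged.group(k)]
```

The tag groups capture their leading whitespace, so a consumer would normally strip them as done above. The tags are recognized by the pattern even though, per the notes, they play no role in parsing records.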
- class plastid.readers.autosql.SizedAutoSqlField(autosql, size=1, parent=None, delim=',')
Bases: plastid.readers.autosql.AutoSqlField
Parser factory for autoSql fields of type
fieldType `[` fieldSize `]` fieldName ';' comment
- Parameters
- autosql : str
Block of autoSql text specifying format of element
- parent : instance of subclass of AbstractAutoSqlObject or None, optional
Parent / enclosing element. Default: None
- delim : str, optional
Field delimiter (default: tab)
- Attributes
- attr : dict
Dictionary of descriptive attributes (e.g. name, size, type, etc.)
- formatter : callable
Callable/function that converts plain text into an object of the correct type
- autosql : str
Block of autoSql text specifying format of element
- match_pattern : re.RegexObject
Pattern that determines whether or not a block of autoSql matches this object
- parent : instance of subclass of AbstractAutoSqlObject or None
Parent / enclosing element (Default: None)
- delim : str, optional
Text delimiter for fields in blocks parsed by __call__() (Default: newline)
Methods
SizedAutoSqlField.__call__()
Parse autoSql-formatted blocks of text into tuples of the object type specified by this field
- add_type(name, formatter)
Add a type to the parser
- Parameters
- name : str
Name of data type
- formatter : callable
Function/callable that, when applied to autoSql text, yields an object of the type specified by name
- static mask_comments(text)
Mask all comments in an autoSql block in order to facilitate parsing by regular expressions
- Parameters
- text : str
autoSql-formatted text
- Returns
- str
Text with comments replaced by “xxxxxx” of same length
- list
List of (comment.start, comment.end), including quotes, for each comment in text
- classmethod matches(text)
Determine whether autoSql formatting text matches this autoSql element
- Parameters
- text : str
Block of autoSql-formatted declaration text
- Returns
- bool
True if an autoSql parser of this class’s type can be made from this specification, otherwise False
- match_pattern = re.compile('^\\s*(?P<type>\\w+)\\s*\\[\\s*(?P<size>\\w+)\\s*\\]\\s*\\s+(?P<name>\\w+)\\s*(?P<opt1>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*(?P<opt2>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\)
- match_str = '^\\s*(?P<type>\\w+)\\s*\\[\\s*(?P<size>\\w+)\\s*\\]\\s*\\s+(?P<name>\\w+)\\s*(?P<opt1>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*(?P<opt2>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*(?P<opt3>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*;\\s*\\"(?P<comment>[^\\"]*)\\"'
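A sized field's `size` group can hold either an integer literal or the name of an earlier field, as in `float [a_field_size] float_array`. The sketch below shows one way such a field could be turned into a tuple. `parse_sized_value` is a hypothetical helper written for this example, and the length check and trailing-delimiter handling are assumptions rather than documented behavior of the library.

```python
import re

# Sub-pattern repeated three times in match_str (one per optional tag slot)
_TAG = r'\s+primary|\s+auto|\s+index\s*(\[\s*\d+\s*\])?'

SIZED_PATTERN = re.compile(
    r'^\s*(?P<type>\w+)\s*\[\s*(?P<size>\w+)\s*\]\s*\s+(?P<name>\w+)\s*'
    r'(?P<opt1>' + _TAG + r')?\s*'
    r'(?P<opt2>' + _TAG + r')?\s*'
    r'(?P<opt3>' + _TAG + r')?\s*'
    r';\s*\"(?P<comment>[^\"]*)\"'
)


def parse_sized_value(field_line, text, record, formatter=float, delim=","):
    """Hypothetical helper: parse `text` into a tuple of `size` items.

    `size` is an integer literal or the name of an earlier field in `record`.
    """
    attrs = SIZED_PATTERN.search(field_line).groupdict()
    size = attrs["size"]
    n = int(size) if size.isdigit() else int(record[size])
    # Assumption: tolerate a trailing delimiter, then enforce the declared length
    items = text.strip(delim).split(delim)
    if len(items) != n:
        raise ValueError("expected %d items, found %d" % (n, len(items)))
    return attrs["name"], tuple(formatter(x) for x in items)


# Variable-size array: the length comes from a field parsed earlier in the record
name, values = parse_sized_value(
    'float [a_field_size] float_array ; "an array of floats"',
    "1.1,1.2,1.3,1.4,1.5",
    {"a_field_size": 5},
)
# Fixed-size array: the length is written in the declaration itself
rgb_name, rgb = parse_sized_value(
    'uint [3] points ; "r,g,b values"', "1,2,3", {}, formatter=int
)
```

Passing the partially built record into the helper is what allows a size to refer to a previous column, which is presumably the purpose of the optional `rec` argument listed for `__call__`.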
- class plastid.readers.autosql.ValuesAutoSqlField(autosql, parent=None, delim=',')
Bases: plastid.readers.autosql.AbstractAutoSqlElement
Parser factory for autoSql fields of type
fieldType `(` fieldValues `)` fieldName ';' comment
where fieldType would typically be set or enum
- Parameters
- autosql : str
Block of autoSql text specifying format of element
- parent : instance of subclass of AbstractAutoSqlObject or None, optional
Parent / enclosing element. Default: None
- delim : str, optional
Field delimiter (default: tab)
Methods
__call__(text[, rec])
Parse a value matching the field described by self.autosql from a block of delimited text
add_type(name, formatter)
Add a type to the parser
mask_comments(text)
Mask all comments in an autoSql block in order to facilitate parsing by regular expressions
matches(text)
Determine whether autoSql formatting text matches this autoSql element
- add_type(name, formatter)
Add a type to the parser
- Parameters
- name : str
Name of data type
- formatter : callable
Function/callable that, when applied to autoSql text, yields an object of the type specified by name
- static mask_comments(text)
Mask all comments in an autoSql block in order to facilitate parsing by regular expressions
- Parameters
- text : str
autoSql-formatted text
- Returns
- str
Text with comments replaced by “xxxxxx” of same length
- list
List of (comment.start, comment.end), including quotes, for each comment in text
- classmethod matches(text)
Determine whether autoSql formatting text matches this autoSql element
- Parameters
- text : str
Block of autoSql-formatted declaration text
- Returns
- bool
True if an autoSql parser of this class’s type can be made from this specification, otherwise False
- match_pattern = re.compile('^\\s*(?P<type>\\w+)\\s*\\(\\s*(?P<value_names>[^()]+)\\s*\\)\\s*\\s+(?P<name>\\w+)\\s*(?P<opt1>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*(?P<opt2>\\s+primary|\\s+auto|\\s+index\\s*)
- match_str = '^\\s*(?P<type>\\w+)\\s*\\(\\s*(?P<value_names>[^()]+)\\s*\\)\\s*\\s+(?P<name>\\w+)\\s*(?P<opt1>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*(?P<opt2>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*(?P<opt3>\\s+primary|\\s+auto|\\s+index\\s*(\\[\\s*\\d+\\s*\\])?)?\\s*;\\s*\\"(?P<comment>[^\\"]*)\\"'
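For set and enum fields, the `value_names` group lists the permitted values, and a record's column is parsed into a set of strings. The sketch below illustrates this with the standard `re` module. `parse_values_field` is a hypothetical helper, and rejecting values outside the declared list is an assumption made for the example, not something the documentation above states.

```python
import re

# Sub-pattern repeated three times in match_str (one per optional tag slot)
_TAG = r'\s+primary|\s+auto|\s+index\s*(\[\s*\d+\s*\])?'

VALUES_PATTERN = re.compile(
    r'^\s*(?P<type>\w+)\s*\(\s*(?P<value_names>[^()]+)\s*\)\s*\s+(?P<name>\w+)\s*'
    r'(?P<opt1>' + _TAG + r')?\s*'
    r'(?P<opt2>' + _TAG + r')?\s*'
    r'(?P<opt3>' + _TAG + r')?\s*'
    r';\s*\"(?P<comment>[^\"]*)\"'
)


def parse_values_field(field_line, text, delim=","):
    """Hypothetical helper: parse a set/enum column into a set of strings."""
    attrs = VALUES_PATTERN.search(field_line).groupdict()
    allowed = {v.strip() for v in attrs["value_names"].split(",")}
    found = {v.strip() for v in text.split(delim) if v.strip()}
    # Assumption: values not named in the declaration are treated as errors
    unknown = found - allowed
    if unknown:
        raise ValueError("values not in declaration: %s" % sorted(unknown))
    return attrs["name"], found


field_name, letters = parse_values_field(
    'set(a,b,c) alpha ; "the first three letters of the alphabet"', "a,b"
)
```

This mirrors the `alpha` entry in the summary example, where the column `a,b` becomes the set `{'a', 'b'}`.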