plastid.bin.get_count_vectors module

Fetch vectors of counts at each nucleotide position in one or more regions of interest (ROIs).

Output files

Vectors are saved as individual line-delimited files – one position per line – in a user-specified output folder. Each file is named for the ROI to which it corresponds. If a mask file – e.g. from crossmap – is provided, masked positions will have the value nan in the output.
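As an illustration of the output layout described above, the following sketch reads one vector file back into memory and reports how many positions are masked. It assumes NumPy is available. The function names are hypothetical and not part of plastid.

```python
import numpy as np


def load_count_vector(path):
    """Load one output file: one value per line, masked positions stored as nan."""
    # ndmin=1 keeps the result one-dimensional even for a single-position ROI
    return np.loadtxt(path, ndmin=1)


def summarize(vector):
    """Return basic statistics, ignoring masked (nan) positions in the total."""
    masked = np.isnan(vector)
    return {
        "length": int(vector.size),
        "masked": int(masked.sum()),
        "total_counts": float(np.nansum(vector)),
    }
```

Because masked positions are nan, NaN-aware reductions such as np.nansum or np.nanmean are the safer choice when summarizing a vector.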


Command-line arguments

Positional arguments

Argument Description
out_folder Folder in which to save output vectors

Optional arguments

Argument Description
-h, --help show this help message and exit
--out_prefix  OUT_PREFIX Prefix to prepend to output files (default: no prefix)
--format  FORMAT printf-style format string for output (default: ‘%.8f’)
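To show what a printf-style format string does to each value, here is a minimal sketch using Python's % operator. The helper format_vector is hypothetical; it only mimics the one-value-per-line layout and is not the script's own writer.

```python
def format_vector(values, fmt="%.8f"):
    """Render each value with a printf-style format, one value per line."""
    return "\n".join(fmt % value for value in values)
```

With the default format, eight decimal places are written per position. A format such as '%d' would instead truncate values to integers, which is only appropriate for unnormalized counts.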

Warning/error options

Argument Description
-q, --quiet Suppress all warning messages. Cannot use with ‘-v’.
-v, --verbose Increase verbosity. With ‘-v’, show every warning. With ‘-vv’, turn warnings into exceptions. Cannot use with ‘-q’. (Default: show each type of warning once)

Count & alignment file options

Open alignment or count files and optionally set mapping rules

Argument Description
--count_files  COUNT_FILES [COUNT_FILES ...] One or more count or alignment file(s) from a single sample or set of samples to be pooled.
--countfile_format  {BAM,bigwig,bowtie,wiggle} Format of file containing alignments or counts (Default: BAM)
--normalize If supplied, normalize counts to counts per million (usually not needed. Default: False)
--sum  SUM Sum used in normalization of counts and RPKM/RPNT calculations (Default: total mapped reads/counts in dataset)
--min_length  N Minimum read length required to be included (BAM & bowtie files only. Default: 25)
--max_length  N Maximum read length permitted to be included (BAM & bowtie files only. Default: 100)
--big_genome Use slower but memory-efficient implementation for big genomes or for memory-limited computers. For wiggle & bowtie files only.
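The normalization options can be pictured as a simple rescaling. The sketch below assumes NumPy and takes the normalizing sum as an explicit argument, standing in for either the value of --sum or the total mapped reads/counts in the dataset. The function name is hypothetical.

```python
import numpy as np


def counts_per_million(counts, total):
    """Rescale raw counts to counts per million of `total`."""
    if total <= 0:
        raise ValueError("total must be positive")
    counts = np.asarray(counts, dtype=float)
    # nan (masked) positions stay nan after rescaling
    return counts * 1e6 / total
```

Note that the sum is a property of the whole dataset, not of a single ROI, so it should not be recomputed from an individual vector.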

Alignment mapping functions (bam & bowtie files only)

For BAM or bowtie files, one of the mutually exclusive read mapping functions is required:

Argument Description
--fiveprime_variable Map read alignment to a variable offset from 5’ position of read, with offset determined by read length. Requires --offset below.
--fiveprime Map read alignment to 5’ position.
--threeprime Map read alignment to 3’ position.
--center Subtract N positions from each end of read, and add 1/(length-N) to each remaining position, where N is specified by --nibble
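A rough sketch of how 5’ and 3’ end mapping could be computed for a single ungapped alignment follows. It assumes 0-based, half-open coordinates and an integer offset counted inward from the chosen end. The function map_read and its parameters are hypothetical and do not reproduce plastid's internals. Center mapping is not shown.

```python
def map_read(start, end, strand, end_choice="fiveprime", offset=0):
    """Return the genomic position to which an ungapped read is assigned.

    start, end : 0-based, half-open alignment coordinates (assumed convention)
    strand     : "+" or "-"
    end_choice : "fiveprime" or "threeprime"
    offset     : nucleotides into the read, counted from the chosen end
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if end_choice not in ("fiveprime", "threeprime"):
        raise ValueError("end_choice must be 'fiveprime' or 'threeprime'")
    if not 0 <= offset < end - start:
        raise ValueError("offset must fall inside the read")
    # On the plus strand the 5' end is the leftmost base;
    # on the minus strand it is the rightmost base.
    from_left = (end_choice == "fiveprime") == (strand == "+")
    return start + offset if from_left else end - 1 - offset
```

The strand check matters because "5’ end" refers to the read, not to the chromosome: for a minus-strand read the 5’ end has the larger genomic coordinate.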

Filtering and alignment mapping options

The remaining arguments are optional and affect the behavior of specific mapping functions:

Argument Description
--offset  OFFSET For --fiveprime or --threeprime, provide an integer representing the offset into the read, starting from either the 5’ or 3’ end, at which data should be plotted. For --fiveprime_variable, provide the filename of a two-column tab-delimited text file, in which first column represents read length or the special keyword ‘default’, and the second column represents the offset from the five prime end of that read length at which the read should be mapped. (Default: 0)
--nibble  N For use with --center only. nt to remove from each end of read before mapping (Default: 0)
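The two-column offset file used with --fiveprime_variable can be read with a few lines of Python. This is a minimal sketch based only on the description above (read length or ‘default’ in the first column, offset in the second). The function names are hypothetical, and skipping blank lines and lines beginning with '#' is an assumption, not documented behavior.

```python
def parse_offset_table(lines):
    """Build a {read_length or 'default': offset} mapping from tab-delimited lines."""
    offsets = {}
    for line in lines:
        line = line.strip()
        # Assumption: blank lines and '#' comments carry no data
        if not line or line.startswith("#"):
            continue
        key, value = line.split("\t")[:2]
        offsets["default" if key == "default" else int(key)] = int(value)
    return offsets


def offset_for_length(offsets, read_length):
    """Look up the offset for a read length, falling back to 'default' if present."""
    if read_length in offsets:
        return offsets[read_length]
    if "default" in offsets:
        return offsets["default"]
    raise KeyError("no offset for read length %d and no default" % read_length)
```

A file containing the lines "28<TAB>12", "29<TAB>13" and "default<TAB>14" would then map 28-nt reads 12 nt from their 5’ ends, 29-nt reads 13 nt from their 5’ ends, and all other lengths 14 nt from their 5’ ends.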

Annotation file options (one or more annotation files required)

Open one or more genome annotation files

Argument Description
--annotation_files  infile.[BED | BigBed | GTF2 | GFF3] [infile.[BED | BigBed | GTF2 | GFF3] ...] Zero or more annotation files (max 1 file if BigBed)
--annotation_format  {BED,BigBed,GTF2,GFF3} Format of annotation_files (Default: GTF2). Note: GFF3 assembly assumes SO v.2.5.2 feature ontologies, which may or may not match your specific file.
--add_three If supplied, coding regions will be extended by 3 nucleotides at their 3’ ends (except for GTF2 files that explicitly include stop_codon features). Use if your annotation file excludes stop codons from CDS.
--tabix annotation_files are tabix-compressed and indexed (Default: False). Ignored for BigBed files.
--sorted annotation_files are sorted by chromosomal position (Default: False)
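The effect of --add_three can be pictured as a strand-aware extension of the coding region. The sketch below handles a single contiguous CDS in 0-based, half-open coordinates. Both the coordinate convention and the function name are assumptions made for illustration only.

```python
def extend_cds_by_stop_codon(cds_start, cds_end, strand):
    """Extend a contiguous CDS by 3 nt at its 3' end to take in the stop codon."""
    if strand == "+":
        # The 3' end is the right-hand boundary on the plus strand
        return cds_start, cds_end + 3
    if strand == "-":
        # The 3' end is the left-hand boundary on the minus strand
        return cds_start - 3, cds_end
    raise ValueError("strand must be '+' or '-'")
```

For a spliced CDS, the same extension would apply only to the segment that holds the 3’ end.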

Bed-specific options

Argument Description
--bed_extra_columns  BED_EXTRA_COLUMNS [BED_EXTRA_COLUMNS ...] Number of extra columns in BED file (e.g. in custom ENCODE formats) or list of names for those columns. (Default: 0).
--mask_bed_extra_columns  MASK_BED_EXTRA_COLUMNS [MASK_BED_EXTRA_COLUMNS ...] Number of extra columns in BED file (e.g. in custom ENCODE formats) or list of names for those columns. (Default: 0).

Bigbed-specific options

Argument Description
--maxmem  MAXMEM Maximum desired memory footprint in MB to devote to BigBed/BigWig files. May be exceeded by large queries. (Default: 0, No maximum)
--mask_maxmem  MASK_MAXMEM Maximum desired memory footprint in MB to devote to BigBed/BigWig files. May be exceeded by large queries. (Default: 0, No maximum)

Gff3-specific options

Argument Description
--gff_transcript_types  GFF_TRANSCRIPT_TYPES [GFF_TRANSCRIPT_TYPES ...] GFF3 feature types to include as transcripts, even if no exons are present (for GFF3 only; default: use SO v2.5.3 specification)
--gff_exon_types  GFF_EXON_TYPES [GFF_EXON_TYPES ...] GFF3 feature types to include as exons (for GFF3 only; default: use SO v2.5.3 specification)
--gff_cds_types  GFF_CDS_TYPES [GFF_CDS_TYPES ...] GFF3 feature types to include as CDS (for GFF3 only; default: use SO v2.5.3 specification)
--mask_gff_transcript_types  MASK_GFF_TRANSCRIPT_TYPES [MASK_GFF_TRANSCRIPT_TYPES ...] GFF3 feature types to include as transcripts, even if no exons are present (for GFF3 only; default: use SO v2.5.3 specification)
--mask_gff_exon_types  MASK_GFF_EXON_TYPES [MASK_GFF_EXON_TYPES ...] GFF3 feature types to include as exons (for GFF3 only; default: use SO v2.5.3 specification)
--mask_gff_cds_types  MASK_GFF_CDS_TYPES [MASK_GFF_CDS_TYPES ...] GFF3 feature types to include as CDS (for GFF3 only; default: use SO v2.5.3 specification)

Mask file options (optional)

Add mask file(s) that annotate regions that should be excluded from analyses (e.g. repetitive genomic regions).

Argument Description
--mask_annotation_files  infile.[BED | BigBed | GTF2 | GFF3 | PSL] [infile.[BED | BigBed | GTF2 | GFF3 | PSL] ...] Zero or more annotation files (max 1 file if BigBed)
--mask_annotation_format  {BED,BigBed,GTF2,GFF3,PSL} Format of mask_annotation_files (Default: GTF2). Note: GFF3 assembly assumes SO v.2.5.2 feature ontologies, which may or may not match your specific file.
--mask_add_three If supplied, coding regions will be extended by 3 nucleotides at their 3’ ends (except for GTF2 files that explicitly include stop_codon features). Use if your annotation file excludes stop codons from CDS.
--mask_tabix mask_annotation_files are tabix-compressed and indexed (Default: False). Ignored for BigBed files.
--mask_sorted mask_annotation_files are sorted by chromosomal position (Default: False)
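To make the effect of a mask concrete, the following sketch sets masked positions of a count vector to nan, matching the output convention described under "Output files". It assumes NumPy, a single contiguous ROI, and 0-based, half-open intervals. The function name is hypothetical.

```python
import numpy as np


def apply_mask(counts, roi_start, masked_intervals):
    """Return a copy of `counts` with positions inside masked intervals set to nan.

    counts           : per-nucleotide counts for a contiguous ROI
    roi_start        : genomic coordinate of the first position in `counts`
    masked_intervals : iterable of (start, end) genomic intervals, half-open
    """
    out = np.array(counts, dtype=float)
    for mask_start, mask_end in masked_intervals:
        # Clip each masked interval to the boundaries of the ROI
        lo = max(mask_start - roi_start, 0)
        hi = min(mask_end - roi_start, out.size)
        if lo < hi:
            out[lo:hi] = np.nan
    return out
```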

Script contents

plastid.bin.get_count_vectors.main(args=sys.argv[1:])

Command-line program

Parameters:
argv : list, optional

A list of command-line arguments, which will be processed as if the script were called from the command line if main() is called directly.

Default: sys.argv[1:], i.e. the command-line arguments when the script is invoked from the command line.
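Calling main() directly might look like the sketch below, which assembles an argument list from the options documented above and passes it positionally. The helpers build_argv and run, along with all file names in the usage note, are hypothetical. The import is deferred so that the argument list can be built and inspected without plastid installed.

```python
def build_argv(out_folder, count_file, annotation_file, offset=0):
    """Assemble arguments for 5' end mapping of a BAM file over BED annotations."""
    return [
        "--count_files", count_file,
        "--countfile_format", "BAM",
        "--fiveprime",
        "--offset", str(offset),
        "--annotation_files", annotation_file,
        "--annotation_format", "BED",
        # Positional argument placed last, after a single-valued option,
        # so it is not absorbed by an option that accepts several values
        out_folder,
    ]


def run(argv):
    """Invoke the script's entry point with an explicit argument list."""
    # Deferred import: requires plastid to be installed
    from plastid.bin.get_count_vectors import main
    main(argv)
```

For example, run(build_argv("vectors_out", "sample.bam", "rois.bed", offset=12)) would be processed as if the same arguments had been typed on the command line.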