plastid.bin.findjuncs module

This script identifies all the unique splice junctions in one or more transcript annotations, and exports these as a BED file with one splice junction per line. Optionally, this script can also export junctions as a Tophat .juncs file.

If a splice junction appears multiple times (e.g. used by more than one transcript), only the first occurrence of the junction will be reported. Scores, if present, are exported unaltered in BED output.
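The first-occurrence rule described above can be sketched in a few lines of Python. This is only an illustration, not the script's actual implementation: the tuple-based transcript representation, the 0-based half-open coordinates, and the function name unique_junctions are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical minimal data model (not plastid's own classes):
# an exon is (start, end) in 0-based half-open coordinates, and a
# transcript is (name, chromosome, strand, exons).
Exon = Tuple[int, int]
Transcript = Tuple[str, str, str, List[Exon]]
Junction = Tuple[str, str, int, int]  # chrom, strand, intron start, intron end


def unique_junctions(transcripts: List[Transcript]) -> Dict[Junction, str]:
    """Map each distinct junction to the first transcript that uses it."""
    seen: Dict[Junction, str] = {}
    for name, chrom, strand, exons in transcripts:
        ordered = sorted(exons)
        # Each pair of neighboring exons defines one junction
        for (_, left_end), (right_start, _) in zip(ordered, ordered[1:]):
            key = (chrom, strand, left_end, right_start)
            # setdefault keeps the first occurrence and ignores later repeats
            seen.setdefault(key, name)
    return seen


transcripts = [
    ("tx1", "chrI", "+", [(100, 200), (300, 400), (500, 600)]),
    ("tx2", "chrI", "+", [(100, 200), (300, 450)]),  # repeats tx1's first junction
]
juncs = unique_junctions(transcripts)
for (chrom, strand, start, end), source in juncs.items():
    print(chrom, start, end, strand, source)
```

Here the junction shared by tx1 and tx2 is reported once and attributed to tx1, the first transcript in which it was seen.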

Examples:

# identify splice junctions from a transcript annotation supplied in GTF2
# creates output file 'annotation.bed'
$ findjuncs annotation --annotation_format GTF2 \
            --annotation_files transcripts.gtf
 
# merge junctions from annotation.bed and newly_discovered.bed,
# export only unique junctions to 'merged_unique.bed'
$ findjuncs merged_unique --annotation_format BED \
            --annotation_files annotation.bed newly_discovered.bed

See also

plastid.bin.slidejuncs
Script that makes richer comparisons between discovered and annotated junctions, using genomic sequence and plastid.bin.crossmap results to classify junctions

Command-line arguments

Positional arguments

Argument Description
outbase Basename for output files

Optional arguments

Argument Description
-h, --help show this help message and exit
--export_tophat Export tophat .juncs file in addition to BED output
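For orientation, a .juncs record can be sketched as follows. This assumes the layout given in the TopHat manual (tab-delimited chromosome, left, right, strand, with zero-based coordinates, where left is the last base of the upstream exon and right is the first base of the downstream exon). The function name and the half-open intron coordinates it takes as input are assumptions for the example, not part of this script.

```python
def juncs_line(chrom: str, intron_start: int, intron_end: int, strand: str) -> str:
    """Format one junction as a TopHat-style .juncs record.

    intron_start and intron_end are assumed to be 0-based, half-open
    coordinates of the intron between the two spliced exons.
    """
    left = intron_start - 1  # last base of the upstream exon, 0-based
    right = intron_end       # first base of the downstream exon, 0-based
    return "\t".join([chrom, str(left), str(right), strand])


# Exons [100, 200) and [300, 400) on chrI are joined by the intron [200, 300)
line = juncs_line("chrI", 200, 300, "+")
print(line)
```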

Warning/error options

Argument Description
-q, --quiet Suppress all warning messages. Cannot use with ‘-v’.
-v, --verbose Increase verbosity. With ‘-v’, show every warning. With ‘-vv’, turn warnings into exceptions. Cannot use with ‘-q’. (Default: show each type of warning once)

Annotation file options (one or more annotation files required)

Open one or more genome annotation files

Argument Description
--annotation_files  infile.[BED | BigBed | GTF2 | GFF3 | PSL] [infile.[BED | BigBed | GTF2 | GFF3 | PSL] ...] One or more annotation files (max 1 file if BigBed)
--annotation_format  {BED,BigBed,GTF2,GFF3,PSL} Format of annotation_files (Default: GTF2). Note: GFF3 assembly assumes SO v.2.5.2 feature ontologies, which may or may not match your specific file.
--add_three If supplied, coding regions will be extended by 3 nucleotides at their 3’ ends (except for GTF2 files that explicitly include stop_codon features). Use if your annotation file excludes stop codons from CDS.
--tabix annotation_files are tabix-compressed and indexed (Default: False). Ignored for BigBed files.
--sorted annotation_files are sorted by chromosomal position (Default: False)

BED-specific options

Argument Description
--bed_extra_columns  BED_EXTRA_COLUMNS [BED_EXTRA_COLUMNS ...] Number of extra columns in BED file (e.g. in custom ENCODE formats) or list of names for those columns. (Default: 0).

BigBed-specific options

Argument Description
--maxmem  MAXMEM Maximum desired memory footprint in MB to devote to BigBed/BigWig files. May be exceeded by large queries. (Default: 0, No maximum)

GFF3-specific options

Argument Description
--gff_transcript_types  GFF_TRANSCRIPT_TYPES [GFF_TRANSCRIPT_TYPES ...] GFF3 feature types to include as transcripts, even if no exons are present (for GFF3 only; default: use SO v2.5.3 specification)
--gff_exon_types  GFF_EXON_TYPES [GFF_EXON_TYPES ...] GFF3 feature types to include as exons (for GFF3 only; default: use SO v2.5.3 specification)
--gff_cds_types  GFF_CDS_TYPES [GFF_CDS_TYPES ...] GFF3 feature types to include as CDS (for GFF3 only; default: use SO v2.5.3 specification)

Script contents

plastid.bin.findjuncs.main(argv=sys.argv[1:])

Command-line program

Parameters:
argv : list, optional

A list of command-line arguments, which will be processed as if the script were called from the command line if main() is called directly.

Default: sys.argv[1:] (actual command-line arguments)
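Because main() accepts an argument list, the script can in principle be driven from Python instead of a shell. The sketch below only assembles such a list from the options documented on this page; the helper build_findjuncs_argv is hypothetical and not part of plastid, and the call to main() is left commented out so the example runs without the package installed.

```python
from typing import List, Sequence


def build_findjuncs_argv(
    outbase: str,
    annotation_files: Sequence[str],
    annotation_format: str = "GTF2",
    export_tophat: bool = False,
) -> List[str]:
    """Assemble an argument list equivalent to a findjuncs command line."""
    argv = [
        outbase,
        "--annotation_format", annotation_format,
        "--annotation_files", *annotation_files,
    ]
    if export_tophat:
        argv.append("--export_tophat")
    return argv


argv = build_findjuncs_argv("annotation", ["transcripts.gtf"], export_tophat=True)
print(argv)

# With plastid installed, the list could then be passed to the script:
# from plastid.bin.findjuncs import main
# main(argv)
```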