TOOLS FOR DISCOVERY

Software

This page serves as an index for the applications written and distributed by the Yandell, Marth, and Quinlan labs. Each item may include links to documentation, code, and publications.

Software is listed with most recent releases first.

JIGV

Quinlan Lab

igv.js server and automatic configuration to view BAM/CRAM/VCF/BED. igv.js requires that the files are hosted on a server, like Apache or nginx, and it requires writing HTML and JavaScript. In a single binary, jigv provides a server and some default configuration, JavaScript, and HTML, enabling a simple entrypoint, for example:

jigv --open-browser --region chr1:34566-34999 *.bam /path/to/some.cram my.vcf.gz

SOMALIER

Quinlan Lab

Extract informative sites, evaluate relatedness, and perform quality-control on BAM, CRAM, BCF, VCF, and GVCF. somalier makes checking any number of samples for identity easy directly from the alignments.

SLIVAR

Quinlan Lab

Genetic variant expressions, annotation, and filtering. slivar is a set of command-line tools that enables rapid querying and filtering of VCF files. It facilitates operations on trios and groups and allows arbitrary expressions using simple JavaScript.

GGD

Quinlan Lab

Search and install genomic data packages. Build and check new ggd data packages. ggd provides easy access to processed genomic data. It removes the difficulties and complexities of finding and processing the data sets and annotations germane to your experiments and analyses. You can quickly and easily search for and install data packages using ggd. ggd also offers tools to easily create and contribute data packages to ggd.

D4-FORMAT

Quinlan Lab

The D4 Quantitative Data Format. We sought to improve on existing formats such as BigWig and compressed BED files by creating the Dense Depth Data Dump (D4) format and tool suite. The D4 format is adaptive in that it profiles a random sample of aligned sequence depth from the input BAM or CRAM file to determine an optimal encoding that minimizes file size while also enabling fast data access. We show that D4 uses less disk space for both RNA-Seq and whole-genome sequencing and offers 3- to 440-fold speed improvements over existing formats for random access, aggregation, and summarization, enabling scalable downstream analyses that would otherwise be intractable.
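The adaptive-encoding idea can be illustrated with a toy model. The sketch below is not the D4 implementation: it assumes a hypothetical layout in which every position is stored in a fixed k-bit field and any depth that does not fit goes into a separate outlier table at an assumed fixed cost per entry. The function names and the 64-bit outlier cost are illustrative assumptions.

```python
def estimated_bits(sample, k, outlier_cost_bits=64):
    """Estimated size of a toy layout: k bits per position, plus a fixed
    (assumed) cost for every depth that does not fit in k bits."""
    outliers = sum(1 for depth in sample if depth >= (1 << k))
    return k * len(sample) + outlier_cost_bits * outliers


def choose_bit_width(sample, max_k=16, outlier_cost_bits=64):
    """Pick the bit width that minimizes the estimated size on a sample of depths."""
    return min(
        range(max_k + 1),
        key=lambda k: estimated_bits(sample, k, outlier_cost_bits),
    )


# Mostly ~30x depth with two high-depth outliers.
sample = [30] * 98 + [1000, 2000]
best_k = choose_bit_width(sample)
```

In this toy model a narrow field wins when most depths are small, because paying the outlier cost for a few positions is cheaper than widening every position.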

SMOOVE-NF

Quinlan Lab

Nextflow implementation of the smoove workflow, integrating several other tools meant to facilitate variant calling and quality control of discovered variants.

SEQCOVER

Quinlan Lab

seqcover is a tool for viewing and evaluating depth-of-coverage with the following aims: show a global view where it's easy to see problematic samples and genes; offer an interactive gene-wise view to explore coverage characteristics of individual samples within each gene; not require a server (single HTML page); be responsive for up to 20 samples * 200 genes and be useful for a single sample; highlight outlier samples based on any number of (summarized) background samples.

SAMPLOT

Quinlan Lab

samplot is a command line tool for rapid, multi-sample structural variant visualization. samplot takes SV coordinates and bam files and produces high-quality images that highlight any alignment and depth signals that substantiate the SV.

ONCOGEMINI

OncoGEMINI is an adaptation of GEMINI intended for the improved identification of biologically and clinically relevant tumor variants from multi-sample and longitudinal tumor sequencing data. Using a GEMINI-compatible database (generated from an annotated VCF file), OncoGEMINI is able to filter tumor variants based on included genomic annotations and various allele frequency signatures.

MOSDEPTH

Quinlan Lab

Fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing. mosdepth can output: per-base depth about 2x as fast as samtools depth (about 25 minutes of CPU time for a 30X genome); mean per-window depth given a window size, as would be used for CNV calling; the mean per-region depth given a BED file of regions; the mean or median per-region cumulative coverage histogram given a window size; a distribution of the proportion of bases covered at or above a given threshold for each chromosome and genome-wide; quantized output that merges adjacent bases as long as they fall in the same coverage bins, e.g. (10-20); threshold output to indicate how many bases in each region are covered at the given thresholds; a summary of mean depths per chromosome and within specified regions per chromosome; and a d4 file (better than bigwig).
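The quantized output is easiest to picture as run-length merging over coverage bins. The following is a minimal sketch of that idea in Python, not mosdepth's code: the function name, the bin-edge list, and the "lo:hi" labels are assumptions made for illustration, and it assumes the first edge is 0 and depths are non-negative.

```python
from bisect import bisect_right
from itertools import groupby


def quantize(depths, edges):
    """Merge adjacent bases whose depth falls in the same bin.

    `edges` are sorted lower bin edges starting at 0, e.g. [0, 1, 10, 20].
    Returns half-open (start, end, label) runs.
    """
    runs = []
    pos = 0
    # Bin index = number of edges <= depth, minus one.
    for bin_idx, group in groupby(depths, key=lambda d: bisect_right(edges, d) - 1):
        length = len(list(group))
        lo = edges[bin_idx]
        hi = edges[bin_idx + 1] if bin_idx + 1 < len(edges) else "inf"
        runs.append((pos, pos + length, f"{lo}:{hi}"))
        pos += length
    return runs


per_base = [0, 0, 3, 5, 12, 15, 11, 30, 0]
runs = quantize(per_base, [0, 1, 10, 20])
```

Nine per-base values collapse to five runs here; on real data with long stretches of similar coverage the reduction is much larger.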

IDPLOT

Quinlan Lab

Designed to accelerate SARS-CoV-2 research, idplot allows one to quickly compare similar sequences (*.fasta) to a reference (.fasta) with options to inspect recombination and similarity within an interactive report.

FREEBAYES-NF

Quinlan Lab

A simplified version of freebayes-parallel written in Nextflow to handle job distribution on HPC resources. Intervals can be supplied by the user or created automatically to optimize compute utilization.

COVVIZ

Quinlan Lab

A many-sample coverage browser. The aim of covviz is to highlight regions of significant and sustained deviation of coverage depth from the majority of samples.

DUPHOLD

Quinlan Lab

Uphold your DUP and DEL calls. SV callers like lumpy look at split reads and pair distances to find structural variants. This tool is a fast way to add depth information to those calls. This can be used as additional information for filtering variants; for example, we will be skeptical of deletion calls that do not have lower-than-average coverage compared to regions with similar GC content.
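The GC-matched comparison can be sketched as a fold-change against background bins of similar GC content. This is an illustrative sketch only, not duphold's implementation; the function names, the background-bin representation, and the 0.05 GC tolerance are assumptions.

```python
from statistics import median


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def depth_fold_change(sv_depth, sv_gc, background, gc_tolerance=0.05):
    """Depth inside the SV divided by the median depth of background bins
    whose GC fraction is within `gc_tolerance` of the SV's GC fraction.

    `background` is a list of (gc_fraction, depth) pairs. Returns None when
    no comparable bins exist.
    """
    similar = [depth for gc, depth in background if abs(gc - sv_gc) <= gc_tolerance]
    if not similar:
        return None
    baseline = median(similar)
    return sv_depth / baseline if baseline > 0 else None


background = [(0.40, 30), (0.42, 32), (0.41, 28), (0.70, 10)]
fold_change = depth_fold_change(sv_depth=15, sv_gc=0.41, background=background)
```

A heterozygous deletion would be expected to give a fold-change near 0.5 under this model, while a value near 1.0 would cast doubt on the call.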

SMOOVE

Quinlan Lab

smoove simplifies and speeds calling and genotyping SVs for short reads. It also improves specificity by removing many spurious alignment signals that are indicative of low-level noise and often contribute to spurious calls.

STRLING

Quinlan Lab

STRling (pronounced like "sterling") is a method to detect large STR expansions from short-read sequencing data. It is capable of detecting novel STR expansions, that is, expansions where there is no STR in the reference genome at that position (or a different repeat unit from what is in the reference). It can also detect STR expansions that are annotated in the reference genome. STRling uses k-mer counting to recover mis-mapped STR reads. It then uses soft-clipped reads to precisely discover the position of the STR expansion in the reference genome.
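To show how k-mer counting can flag a read as mostly STR, here is a small Python sketch. It is a simplified illustration rather than STRling's algorithm: the function names are hypothetical, it considers a single k at a time, and it ignores reverse-complement strands.

```python
from collections import Counter


def canonical_rotation(kmer):
    """Treat rotations of a repeat unit (CAG, AGC, GCA) as the same unit."""
    return min(kmer[i:] + kmer[:i] for i in range(len(kmer)))


def repeat_content(read, k):
    """Return (most common repeat unit, fraction of k-mers matching it)."""
    total = len(read) - k + 1
    if total <= 0:
        return None, 0.0
    counts = Counter(canonical_rotation(read[i:i + k]) for i in range(total))
    unit, count = counts.most_common(1)[0]
    return unit, count / total


str_unit, str_fraction = repeat_content("CAGCAGCAGCAG", 3)
_, mixed_fraction = repeat_content("ACGTTGCAAGCT", 3)
```

A read whose dominant unit accounts for nearly all of its k-mers is a candidate STR read regardless of where, or whether, it mapped.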

INDEXCOV

Quinlan Lab

Crazy fast genome coverage estimates! The BAM and CRAM formats provide a supplementary linear index that facilitates rapid access to sequence alignments in arbitrary genomic regions. Comparing consecutive entries in a BAM or CRAM index allows one to infer the number of alignment records per genomic region for use as an effective proxy of sequence depth in each genomic region. Based on these properties, we have developed indexcov, an efficient estimator of whole-genome sequencing coverage to rapidly identify samples with aberrant coverage profiles, reveal large-scale chromosomal anomalies, recognize potential batch effects, and infer the sex of a sample.
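The index-based estimate can be sketched as differences between consecutive index entries, scaled so that a typical tile has a value of 1. The sketch below is a simplification under stated assumptions: it takes a plain list of non-decreasing byte offsets, one per fixed-size genomic tile, and does not parse a real index file or handle BAM virtual offsets. The function name is hypothetical.

```python
from statistics import median


def tile_coverage_proxy(offsets):
    """Scaled amount of data between consecutive index entries.

    `offsets` holds one file offset per genomic tile, in genomic order.
    The result has one value per tile, normalized so the median non-empty
    tile is 1.0.
    """
    sizes = [b - a for a, b in zip(offsets, offsets[1:])]
    nonzero = [s for s in sizes if s > 0]
    if not nonzero:
        return [0.0] * len(sizes)
    typical = median(nonzero)
    return [s / typical for s in sizes]


offsets = [0, 100, 200, 300, 350, 450]
proxy = tile_coverage_proxy(offsets)
```

The fourth tile holds half as much data as its neighbors, which is the kind of signal that would suggest a deletion or a single-copy chromosome when it persists across many tiles.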

GIGGLE

Quinlan Lab

Giggle is Google for genomic features and intervals. That is, a scalable, multi-file index for fast queries of genomic intervals.

VARPRISM (VARiant PRIoritization SuM)

Variant Prioritization

Yandell Lab

A software package that identifies genes with a statistical excess of damaging de novo mutations among individuals with a genetic disease. VARPRISM incorporates functional variant prediction information (the VAAST CASM score) to improve the statistical power of risk gene mapping and controls for local mutation rate heterogeneity. The beta version of VARPRISM is currently available for download.

VCFAnno

Variant Annotation

Quinlan Lab

Annotates a VCF with any number of sorted and tabixed input BED, BAM, and VCF files in parallel. It does this by finding overlaps as it streams over the data and applying user-defined operations on the overlapping annotations.
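The streaming-overlap approach relies on both inputs being sorted, so each can be read once. Below is a minimal single-chromosome sketch in Python, not vcfanno's implementation: the function name and tuple layouts are assumptions, intervals are half-open, and `op` stands in for a user-defined operation such as max or min.

```python
def annotate(queries, annotations, op=max):
    """Apply `op` to the values of all annotations overlapping each query.

    `queries` is a list of (start, end) sorted by start; `annotations` is a
    list of (start, end, value) sorted by start. Intervals are half-open.
    """
    results = []
    first = 0  # annotations before this index end before the current query
    for q_start, q_end in queries:
        # Permanently skip annotations that end before this query starts;
        # later queries start no earlier, so they cannot overlap them either.
        while first < len(annotations) and annotations[first][1] <= q_start:
            first += 1
        hits = []
        j = first
        while j < len(annotations) and annotations[j][0] < q_end:
            if annotations[j][1] > q_start:
                hits.append(annotations[j][2])
            j += 1
        results.append((q_start, q_end, op(hits) if hits else None))
    return results


variants = [(10, 20), (30, 40), (50, 60)]
scores = [(5, 12, 0.1), (15, 35, 0.7), (18, 19, 0.3), (55, 58, 0.2)]
annotated = annotate(variants, scores, op=max)
```

Passing a different `op` changes how multiple overlapping annotations are summarized without changing the sweep itself.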

Taxonomer

Metagenomics

Yandell Lab, Marth Lab, Eilbeck Lab

Taxonomer is an ultrafast web-tool for comprehensive metagenomics data analysis and interactive results visualization. Taxonomer is unique in providing integrated nucleotide and protein-based classification and simultaneous host mRNA transcript profiling.

RUFUS

Variant Calling

Marth Lab

A new approach to variant detection that does not rely on mapping or whole genome assembly methods.

WHAM (WHole-genome Alignment Metrics)

Variant Calling

Yandell Lab

A structural variant (SV) caller that integrates several sources of mapping information to identify SVs. WHAM classifies SVs using a flexible and extendable machine-learning algorithm (random forest).

Genotype Query Tools (GQT)

Data Management, Query Tools

Quinlan Lab

A command-line tool and a C API for storing and querying large-scale genotype data sets like those produced by 1000 Genomes, the UK100K, and forthcoming datasets involving millions of genomes.

SpeedSeq

Variant Annotation, Variant Calling

Marth Lab, Quinlan Lab

An open-source genome analysis platform that accomplishes alignment, variant detection and functional annotation of a 50× human genome in 13 h on a low-cost server and alleviates a bioinformatics bottleneck that typically demands weeks of computation with extensive hands-on expert involvement.

Peddy

Pedigree Analysis

Quinlan Lab

Compares familial relationships and sexes as reported in a PED file with those inferred from a VCF.
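As an illustration of the sex comparison, the sketch below infers sex from the fraction of heterozygous calls on chromosome X and lists samples where that disagrees with the PED file. It is a simplified stand-in, not peddy's method: the function names and the 0.1 heterozygosity cutoff are assumptions, and genotypes are given as alternate-allele counts (0, 1, 2) at X sites outside the pseudoautosomal regions. The PED sex codes (1 = male, 2 = female) follow the usual PED convention.

```python
PED_SEX = {"1": "male", "2": "female"}  # other codes are treated as unknown


def infer_sex(x_alt_counts, het_cutoff=0.1):
    """Call 'female' when the heterozygous fraction on chrX exceeds the cutoff."""
    called = [g for g in x_alt_counts if g in (0, 1, 2)]
    if not called:
        return "unknown"
    het_fraction = sum(1 for g in called if g == 1) / len(called)
    return "female" if het_fraction > het_cutoff else "male"


def sex_mismatches(ped_sex, x_genotypes):
    """Return (sample, reported, inferred) for samples whose sexes disagree."""
    mismatches = []
    for sample, code in ped_sex.items():
        reported = PED_SEX.get(str(code), "unknown")
        inferred = infer_sex(x_genotypes.get(sample, []))
        if "unknown" not in (reported, inferred) and reported != inferred:
            mismatches.append((sample, reported, inferred))
    return mismatches


ped = {"kid": 1, "mom": 2, "dad": 2}  # dad is mislabeled in this toy pedigree
genotypes = {
    "kid": [0, 2, 2, 0, 2, 0, 0, 2, 0, 2],
    "mom": [0, 1, 2, 1, 0, 1, 1, 0, 2, 1],
    "dad": [2, 0, 0, 2, 2, 0, 2, 0, 0, 0],
}
problems = sex_mismatches(ped, genotypes)
```

Mismatches found this way usually point to a sample swap or a data-entry error in the pedigree file rather than to a biological finding.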

MAKER-P

Genome Annotation

Yandell Lab

A pipeline designed to make the annotation of novel plant genomes tractable for small groups with limited bioinformatics experience and resources, and faster and more transparent for large groups with more experience and resources.

Iobio

Data Visualization

Marth Lab

iobio uses immediate visual feedback to make understanding complex genomic datasets more intuitive, and analysis more interactive.

Poretools

Quinlan Lab

A flexible toolkit for exploring datasets generated by nanopore sequencing devices from MinION for the purposes of quality control and downstream analysis.

Tangram

Variant Calling

Marth Lab

A C/C++ command-line toolbox for structural variation (SV) detection.

BEDTools

Data Management

Quinlan Lab

A Swiss Army knife of tools for a wide range of genomics analysis tasks. Intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely used genomic file formats such as BAM, BED, GFF, and VCF.
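For readers new to interval arithmetic, the merge operation can be sketched in a few lines of Python. This illustrates the concept only and is not BEDTools code: the function name is hypothetical, intervals are (chrom, start, end) tuples with half-open coordinates, and this sketch chooses to merge intervals that touch as well as those that overlap.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching intervals on the same chromosome."""
    merged = []
    for chrom, start, end in sorted(intervals):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start <= last[2]:
            # Extend the previous interval instead of starting a new one.
            last[2] = max(last[2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]


features = [
    ("chr1", 100, 200),
    ("chr1", 150, 300),
    ("chr1", 300, 400),
    ("chr2", 10, 20),
    ("chr1", 500, 600),
]
merged = merge_intervals(features)
```

Sorting first means a single pass is enough, which is why interval tools typically expect sorted input or sort it themselves.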

Lumpy

Variant Calling

Quinlan Lab

A probabilistic framework to integrate multiple structural variation signals such as discordant paired-end alignments and split-read alignments.

pVAAST (pedigree Variant Annotation, Analysis & Search Tool)

Pedigree Analysis

Yandell Lab, Jorde Lab

A disease-gene identification tool designed for high-throughput sequence data in pedigrees.

PHEVOR (Phenotype Driven Variant Ontological Re-ranking tool)

Phenotype Tools, Variant Prioritization

Yandell Lab, Eilbeck Lab

Integrates phenotype, gene function, and disease information with personal genomic data for improved power to identify disease-causing alleles.

MOSAIK

Marth Lab

A stable, sensitive, and open-source program for mapping second- and third-generation sequencing reads to a reference genome.

GEMINI

Data Management, Query Tools

Quinlan Lab

A powerful framework for exploring genetic variation in the context of the wealth of existing genome annotations that are available for the human genome.

GPAT ++ (Genotype Phenotype Association Toolkit)

Phenotype Tools

Yandell Lab

The application of population genomics to non-model organisms is greatly facilitated by the low cost of next generation sequencing (NGS).

ImagePlane

Data Visualization

Yandell Lab

Python-based software for the automated analysis of images of the animal S. mediterranea. This software allows quantification and categorization of the animal's morphology.

MAKER

Genome Annotation

Yandell Lab

A portable and easily configurable genome annotation pipeline.

VAAST 2 (Variant Annotation, Analysis & Search Tool)

Variant Prioritization

Yandell Lab, Jorde Lab

Probabilistic search tool for identifying damaged genes and their disease-causing variants in personal genome sequences.

BamTools

Variant Calling

Marth Lab, Quinlan Lab

A C++ API and command-line toolkit for reading, writing, and manipulating BAM (genome alignment) files.

RepeatRunner

Genome Annotation

Yandell Lab

A CGL-based program that integrates RepeatMasker with BLASTX to provide a comprehensive means of identifying repetitive elements.

CGL (Comparative Genomics Library, pronounced “Seagull”)

Yandell Lab

Provides an informatics infrastructure for a laboratory, department, or research institute engaged in the large-scale analysis of genomes and their annotations.

Freebayes

Marth Lab

A Bayesian genetic variant detector designed to find small polymorphisms.

Scissors

Marth Lab

A split-read aligner that maps orphaned read mates (i.e. where one end-mate is aligned with high mapping quality, but the other mate is unmapped), as well as re-maps severely clipped reads (reads mapped with many unaligned or “clipped-off” bases).
