Search Software

  • AbokiaBLAST is a parallel implementation of NCBI BLAST created by the inventors of the open-source mpiBLAST project. AbokiaBLAST inherits the super-scalable architecture from mpiBLAST but is re-factored and re-engineered to offer production quality.

  • ABySS is a de novo, parallel, paired-end genomic sequence assembler that is designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes.

  • The TAS software provides analysis capabilities specifically for Affymetrix GeneChip Tiling Arrays. 
  • ALLPATHS-LG is a whole-genome shotgun assembler that can generate high-quality genome assemblies using short reads (~100bp) such as those produced by the new generation of sequencers.

  • AMOS is a collection of tools and class interfaces for the assembly of DNA sequencing reads. The package includes a robust infrastructure, modular assembly pipelines, and tools for overlapping, consensus generation, contigging, and assembly manipulation.

  • Apollo is a genome browser developed by the Berkeley Drosophila Genome Project (www.bdgp.org) and Ensembl (www.ensembl.org).

  • Artemis is a genome viewer and annotation tool that allows visualization of sequence features and the results of analyses within the context of the sequence, and its six-frame translation. It can read complete EMBL and GENBANK database entries or sequence in FASTA or raw format.

  • Artemis DNA Comparison Tool (ACT) is a graphic viewer for comparing sequences and features of two entries. It can read complete EMBL and GENBANK entries or sequence in FASTA or raw format. Extra sequence features can be in EMBL, GENBANK, or GFF format.

  • AUGUSTUS is a gene prediction program for eukaryotes written by Mario Stanke and Oliver Keller. It can be used as an ab initio program, which means it bases its prediction purely on the sequence.

  • BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files (SAMtools is a superior SAM/BAM toolkit).

  • BCFtools is meant as a faster replacement for most of the Perl VCFtools commands.

  • BEAST is a cross-platform program for Bayesian MCMC analysis of molecular sequences. It is entirely orientated towards rooted, time-measured phylogenies inferred using strict or relaxed molecular clock models.

  • The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps and computing coverage. The utilities are largely based on four widely-used file formats: BED, GFF/GTF, VCF, and SAM/BAM.

  • BEERS is a simulation engine for generating RNA-Seq data. BEERS was designed to benchmark RNA-Seq alignment algorithms and also algorithms that aim to reconstruct different isoforms and alternate splicing from RNA-Seq data. 
     
  • BFAST facilitates the fast and accurate mapping of short reads to reference sequences, where mapping billions of short reads with variants is of utmost importance.

  • Bioconductor is an open source and open development software project to provide tools for the analysis and comprehension of genomic data.

  • BioJava is an open-source framework for building tools for biological data analysis, including manipulating sequences, file parsers, CORBA interoperability, DAS, access to ACeDB, dynamic programming, and simple statistical routines.

  • BioPerl is a toolkit of Perl modules useful for building bioinformatics programs in Perl.

  • Bio::DB::Sam reads SAM/BAM databases within BioPerl. http://search.cpan.org/~lds/Bio-SamTools/lib/Bio/DB/Sam.pm

  • BioRuby is a package of Open Source Ruby code, with classes for DNA and protein sequence analysis, alignment, database parsing, and other Bioinformatics tools. 

  • The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences.

  • The Basic Local Alignment Search Tool (BLAST) is the most widely used sequence similarity tool.

  • Blast2GO is an all-in-one tool for functional annotation of (novel) sequences and the analysis of annotation data.

  • BLAT, the BLAST-like alignment tool, is a DNA/Protein Sequence Analysis program. It is designed to quickly find sequences of 95% and greater similarity of length 40 bases or more.
     
  • BlueGnome BlueFuse for Microarrays provides leading solutions for detection of copy number variation in the human genome.

  • Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour.
  • BreakDancer-1.1, released under GPLv3, is a Perl/C++ package that provides genome-wide detection of structural variants from next-generation paired-end sequencing reads. It includes two complementary programs.

  • The contig assembly program (CAP) is an effective program for assembling DNA fragments. 

  • Illumina's Consensus Assessment of Sequence and Variation (CASAVA) software captures summary information for resequencing and counting studies and places the data in a compact structure for visualization within GenomeStudio Software or publicly available analysis tools.

  • CLC Genomics Workbench (CLCGWB) is a cross-platform desktop application and graphical user interface for visualizing and analyzing next-generation sequencing data.

  • CLC Main Workbench is a software environment for a large number of DNA, RNA, and protein sequence analyses, combined with gene expression analysis, data management, and graphical viewing and output options.

  • Clover is a program for identifying functional sites in DNA sequences. For a set of DNA sequences that share a common function, it will compare them to a library of sequence motifs (e.g.

  • ClustalW is a general purpose multiple alignment program for DNA or proteins. ClustalX is a graphical user interface for the ClustalW multiple sequence alignment program.

  • CMfinder is an RNA motif prediction tool. It is an expectation maximization algorithm using covariance models for motif description, carefully crafted heuristics for effective motif search, and a novel Bayesian framework for structure prediction combining folding energy and sequence covariation.

  • CREST (Clipping Reveals Structure) is a new algorithm for detecting genomic structural variations at base-pair resolution using next-generation sequencing data. Publication: http://www.ncbi.nlm.nih.gov/pubmed/21666668

  • Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts.

  • Cutadapt removes adapter sequences from high-throughput sequencing data. This is usually necessary when the read length of the sequencing machine is longer than the molecule that is sequenced, for example when sequencing microRNAs.

  • The DeCypher enterprise biocomputing system combines bioinformatics applications with high-throughput FPGA accelerator hardware to achieve an ideal blend of accuracy, performance, and value. It can be used for Tera-BLAST, Smith-Waterman, HMM analysis, GeneDetective, and Tera-Probe searches.

  • deFuse is a software package for gene fusion discovery using RNA-Seq data.

  • Lasergene Core Suite is a comprehensive DNA, RNA, and protein sequence analysis software suite comprised of ten applications which include functions ranging from sequence assembly and SNP detection, to automated virtual cloning and primer design, to creating publication-quality illustrations of y

  • DOTUR is a computer program that takes a distance matrix describing the genetic distance between DNA sequence data and assigns sequences to operational taxonomic units (OTUs) using either the furthest, average, or nearest neighbor algorithms for all possible distances that can be described using

  • ea-utils provides command-line tools for processing biological sequencing data using methods such as barcode demultiplexing and adapter trimming. It was primarily written to support an Illumina-based pipeline but should work with any FASTQ files.

  • EMBOSS (The European Molecular Biology Open Software Suite) is a new, free open source software analysis package specially developed for the needs of the molecular biology user community.
  • Exonerate is a generic tool for pairwise sequence comparison. 

    Home Page: http://www.ebi.ac.uk/~guy/exonerate/

  • FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines.

  • FastTree infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences. FastTree can handle alignments with up to a million sequences in a reasonable amount of time and memory.

  • The FASTX-Toolkit is a collection of command line tools for preprocessing short nucleotide reads in FASTA and FASTQ formats, usually produced by Next-Generation sequencing machines.

  • FIRE is a motif discovery component of IGET. The software is available at: https://tavazoielab.c2b2.columbia.edu/FIRE/

  • Galaxy is a web-based platform for data-intensive biomedical research. Anyone with an account at MSI can access the Galaxy server at galaxy.msi.umn.edu.

  • The GATK is a structured software library that makes it easy to write efficient analysis tools for next-generation sequencing data. It is also a suite of tools for working with human medical resequencing projects such as 1000 Genomes and The Cancer Genome Atlas.

  • Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotypes).

  • GeneHunter is a program for linkage analysis.

  • The GenomeTools genome analysis system is a free collection of bioinformatics tools (in the realm of genome informatics) combined into a single binary.  It is based on a C library named “libgenometools” which consists of several modules.

  • Genscan is a program for predicting the locations and exon-intron structures of genes in genomic sequences from a variety of organisms.   GENSCAN was developed by Chris Burge in the research group of Samuel Karlin, Department of Mathematics, Stanford University.

  • Glimmer is a system for finding genes in microbial DNA, especially the genomes of bacteria and archaea. Glimmer (Gene Locator and Interpolated Markov Modeler) uses interpolated Markov models (IMMs) to identify the coding regions and distinguish them from noncoding DNA.

  • GMAP is a standalone program for mapping and aligning cDNA sequences to a genome. The program maps and aligns a single sequence with minimal startup time and memory requirements, and provides fast batch processing of large sequence sets.

  • Goby is a next-gen gene sequence data management framework designed to facilitate the implementation of efficient data analysis pipelines.

  • HMMER is an implementation of profile hidden Markov model (HMM) methods for sensitive database searches using multiple sequence alignments as queries.  HMMER takes a multiple sequence alignment as input.
  • HTSeq is a Python package that provides infrastructure to process data from high-throughput sequencing assays.

  • The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations.

  • GenomeStudio is a software package from Illumina. It can be used for data analysis to support a wide range of genetic analysis assays. It also provides data visualization and result analysis for Illumina assay platforms.

  • Jody Hey's Isolation with Migration model software.

  • The Integrated Genome Browser (IGB, pronounced ig-bee) is an application intended for visualization and exploration of genomes and corresponding annotations from multiple data sources. It is an extension of the Affymetrix software suite.

  • JELLYFISH is a tool for fast, memory-efficient counting of k-mers in DNA. A k-mer is a substring of length k, and counting the occurrences of all such substrings is a central step in many analyses of DNA sequence.

  • JoinMap is advanced computer software for the calculation of genetic linkage maps in experimental populations. It provides high-quality tools that allow detailed study of the experimental data and the generation of publication-ready map charts.

  • LASTZ is a pairwise aligner for DNA sequences. Originally designed to handle sequences the size of human chromosomes and from different species, it is also useful for sequences produced by next-generation sequencing technologies such as Roche 454.
     
  • libsequence is a C++ library designed to aid writing applications for genomics and evolutionary genetics. A large amount of the library is dedicated to the analysis of "single nucleotide polymorphism", or SNP data.

  • lumpy is a general probabilistic framework for structural variant discovery.

  • MACS empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites.

  • MacVector is sequence analysis software for Macintosh computers.

  • A portable and easily configurable genome annotation pipeline.

  • MapSplice is an algorithm for mapping RNA-seq data to a reference genome for splice junction discovery.

  • Maq is software that builds mapping assemblies from short reads generated by next-generation sequencing machines. It is particularly designed for the Illumina-Solexa 1G Genetic Analyzer, and has preliminary functions to handle ABI SOLiD data.

  • MEGA is an integrated tool for conducting automatic and manual sequence alignment, inferring phylogenetic trees, mining web-based databases, estimating rates of molecular evolution, and testing evolutionary hypotheses.

  • MEGAN (MEta-Genome ANalyzer) is a program for analyzing metagenomic datasets. MEGAN can perform classification calculations, as well as allow for taxonomic visualization.

  • MEME/MAST is a software system for discovering motifs (highly conserved regions) in groups of related DNA or protein sequences (MEME) and for searching sequence databases using motifs (MAST).

  • Meta-MEME is a software toolkit for building and using motif-based hidden Markov models of DNA and proteins. The input to Meta-MEME is a set of similar protein sequences, as well as a set of motif models discovered by MEME.

  • MIRA is software for sequence assembly.

  • Molpopgen is software for molecular population genetics.

  • MOSAIK is a reference-guided assembler comprising two main modular programs:

  • Mothur aims to be a comprehensive software package that allows users to use a single piece of software to analyze community sequence data. It builds upon previous tools to provide a flexible and powerful software package for analyzing sequencing data.
     
  • mpiBLAST is a freely available, open-source, parallel implementation of NCBI BLAST.

  • MrBayes is a program for Bayesian inference of phylogeny using Markov Chain Monte Carlo methods. MrBayes has a console interface and uses a modified NEXUS format for data and batch files.
  • msff is a software filter program based on minor allele frequency.

  • msstats is software for sequence analysis.

  • MUMmer is a system for rapidly aligning entire genomes, whether in complete or draft form.
  • MUSCLE, MUltiple Sequence Comparison by Log-Expectation, is a program for creating multiple alignments of amino acid or nucleotide sequences.

  • Novoalign is a highly accurate program for mapping next-generation sequencing reads to a reference database. It is an aligner for single-ended and paired-end reads from the Illumina Genome Analyser.
  • Oases is a de novo transcriptome assembler designed to produce transcripts from short read sequencing technologies, such as Illumina, SOLiD, or 454 in the absence of any genomic assembly.

  • The Paired-end synchronization check package includes two scripts for determining whether the reads in paired-end FASTQ files are in the proper order (synchronized).
  • PAML is a program package for phylogenetic analyses of DNA or protein sequences using maximum likelihood.

    Possible uses of the programs are:

  • Partek Genomics Suite (Partek GS) is a software suite of statistics and interactive data visualization designed to extract biological signals from noisy data.

  • PAUP (Phylogenetic Analysis Using Parsimony) is a package for the inference of evolutionary trees. It can be used for analyzing molecular sequences, morphological data, and others. The program uses maximum likelihood, parsimony, and distance methods.

  • PerM is a software package which was designed to perform highly efficient genome scale alignments for hundreds of millions of short reads produced by the ABI SOLiD and Illumina sequencing platforms.

  • Phred - Base calling software with quality estimation; Phrap - Program for shotgun sequence assembly; Consed - Sequence assembly editor companion to Phrap; Swat and CrossMatch - Sequence alignment tools; Phrapview - A graphical tool that provides a "global" view of the phrap assembly, complementa

  • PHYLIP (the PHYLogeny Inference Package) is one of the most popular packages of programs for inferring phylogenies (evolutionary trees).

    These programs include:

  • "The PHYLIPNEW programs are EMBOSS conversions of the programs in Joe Felsenstein's PHYLIP package, version 3.69."

    http://saf.bio.caltech.edu/hhmi_manuals/embassy_apps/phylipnew/

  • Picard is a collection of Java-based command-line utilities that manipulate SAM files, and a Java API (SAM-JDK) for creating new programs that read and write SAM files. Both SAM text format and SAM binary (BAM) format are supported.

  • PLINK is an open-source toolset for the analysis of genotype/phenotype data in whole-genome association studies.

  • PolyPhred is a program that compares fluorescence-based sequences across traces obtained from different individuals to identify heterozygous sites for single nucleotide substitutions. PolyPhred is not a stand-alone application.

  • Primer3 is a program for designing primers for PCR reactions.

  • QIIME (canonically pronounced ‘Chime’) is a pipeline for performing microbial community analysis that integrates many third-party tools which have become standard in the field.

    For more information on QIIME-specific tutorials, please refer to:

  • Quake is a package to correct substitution sequencing errors in experiments with deep coverage (e.g. >15X), specifically intended for Illumina sequencing reads.

  • rhothetapost is software to estimate mutation and recombination rates from multilocus data. 

  • The riboPicker tool can be used to automatically identify and efficiently remove rRNA-like sequences from metatranscriptomic and metagenomic datasets.

    At MSI, we provide a stand-alone version of the tool, so users need some Linux background to use it.

  • A collection of utility scripts developed by the RISS group at MSI, mainly geared towards next-generation sequence data.

  • "RMBlast is a RepeatMasker compatible version of the standard NCBI BLAST+ suite. The primary difference between this distribution and the NCBI distribution is the addition of a new program "rmblastn" for use with RepeatMasker and RepeatModeler."

  • RNAshapes offers three powerful RNA analysis tools in one single software package:

  • The 454/Roche Genome Sequencer FLX System software contains tools for data processing and analysis associated with the GS FLX instrument. It contains:
    • GS De Novo Assembler Software
    • GS Reference Mapper Software
    • GS Amplicon Variant Analyzer Software
  • SAM (Sequence Alignment/Map) is a flexible generic format for storing nucleotide sequence alignments. SAMtools provides efficient utilities for manipulating alignments in the SAM and BAM formats.

  • Sequencher is DNA sequencing software whose functions include contig assembly, editing, restriction enzyme mapping, heterozygote detection, cDNA to genomic DNA large gap alignment, and ORF, motif, and SNP analysis.

  • Sequtils contains simple useful tools and libraries for Sequence Analysis and Processing. 

  • Software which uses a clustering approach for identification of enriched domains from histone modification ChIP-Seq data.

  • SOAPdenovo is a novel short-read assembly method that can build a de novo draft assembly for human-sized genomes. The program is specially designed to assemble Illumina GA short reads.

  • SPADE: Spanning Tree Progression of Density Normalized Events

    SPADE is a visualization and analysis tool for high-dimensional flow cytometry data. SPADE is implemented as an R package.

  • SSAHA2 (Sequence Search and Alignment by Hashing Algorithm) is a pairwise sequence alignment program designed for the efficient mapping of sequencing reads onto genomic reference sequences. SSAHA2 reads of most sequencing platforms (ABI-Sanger, Roche 454, Illumina-Solexa) and a range of output for

  • T-Coffee is a versatile tool for making multiple sequence alignments. Its aim is to combine heterogeneous sources of information. The current implementation has been especially designed for combining local and global alignments, respectively from ClustalW and Lalign.

  • TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons. 

  • Trans-ABySS is a software pipeline for analyzing ABySS-assembled contigs from shotgun transcriptome data. The pipeline accepts assemblies that were generated across a wide range of k values in order to address variable transcript expression levels.

  • trf

    "TRF (Tandem Repeats Finder) is a program to locate and display tandem repeats in DNA sequences."

    Homepage:

    http://www.mybiosoftware.com/sequence-analysis/4083

  • Trinity is a program for the efficient and robust de novo (no reference genome required) reconstruction of transcriptomes from RNA-seq data.

  • UNAFold is a comprehensive software package for nucleic acid folding and hybridization prediction. The name is derived from "Unified Nucleic Acid Folding". Folding of single-stranded RNA or DNA, or hybridization between two single strands, is accomplished in a variety of ways.

  • Vector NTI is a bioinformatics software package for DNA and protein sequence analysis.

  • Velvet is software for de novo short read assembly using de Bruijn graphs. It can be used to assemble Solexa and 454 sequencing data.

  • The Vienna RNA Package consists of a C code library and several stand-alone programs for the prediction and comparison of RNA secondary structures.

  • Windows QTL Cartographer is a tool for mapping quantitative traits. It allows importing and exporting data in various formats and also includes a powerful tool for summarizing and presenting results.
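
The JELLYFISH entry above defines a k-mer as a substring of length k and describes counting all such substrings as a central step in many DNA sequence analyses. As a rough illustration of that idea only, here is a minimal in-memory sketch in Python. It is not how JELLYFISH itself works and would not scale to real sequencing data; the function name count_kmers is hypothetical and chosen for this example.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping substring of length k in a DNA sequence."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    sequence = sequence.upper()
    # Slide a window of width k across the sequence. A sequence of length n
    # holds n - k + 1 overlapping k-mers, and none at all when n < k.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


if __name__ == "__main__":
    counts = count_kmers("ACGTACGT", 3)
    # Print each 3-mer with its count, most frequent first.
    for kmer, n in counts.most_common():
        print(kmer, n)
```

For the 8-base input used here there are six overlapping 3-mers in total; ACG and CGT each occur twice, while GTA and TAC occur once.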