Programs and databases used for storing, retrieving, analyzing, and manipulating biological information, including (but not limited to) DNA/RNA and protein sequences.
ACT (Artemis Comparison Tool) is a DNA sequence comparison viewer based on Artemis. It can read complete EMBL and GENBANK entries or sequence in FASTA or raw format. Extra sequence features can be in EMBL, GENBANK, or GFF format.
http://www.sanger.ac.uk/Software/ACT/
None
The user manual can be found at http://www.sanger.ac.uk/Software/ACT/v2/manual/
You must initialize your environment, including the default paths and environment variables which the package uses to access the programs and associated files. To do this, enter the following command:
    module load act
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized every time you log in.
e.g.:
    ...
    # initialize and load modules
    if ( -e /usr/local/share/modules/init/tcsh ) then
        unsetenv PATH MANPATH
        source /usr/local/share/modules/init/tcsh
        module load base
        module load act
    endif
    ...
Once you have set up your environment, you can use any of the programs directly from the command line, e.g.:
    act
Version: v2
Labs: Computational Genetics Laboratory
System(s): Unix workstations
Categories: Bioinformatics
Apollo is a genome browser developed by the Berkeley Drosophila Genome Project (www.bdgp.org) and Ensembl (www.ensembl.org).
http://www.ensembl.org/apollo/
none
Apollo User Guide can be found at http://www.ensembl.org/apollo/apolloguide.html
You must initialize your environment, including the default paths and environment variables which the package uses to access the programs and associated files. To do this, enter the following command:
    module load apollo
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized for Apollo every time you log in.
e.g.:
    ...
    # initialize and load modules
    if ( -e /usr/local/share/modules/init/tcsh ) then
        unsetenv PATH MANPATH
        source /usr/local/share/modules/init/tcsh
        module load base
        module load apollo
    endif
    ...
Once you have set up your environment, you can use the program directly from the command line, e.g.:
    apollo
Version: 0.2
Labs: Computational Genetics Laboratory
System(s): UNIX workstations
Categories: Bioinformatics
Arachne is a tool for assembling genome sequences from whole genome shotgun reads, mostly in forward-reverse pairs obtained by sequencing clone ends. It was developed at the Massachusetts Institute of Technology.
As input, Arachne expects the base calls and associated quality scores of each read (as produced by most base-calling software, such as PHRED), as well as ancillary information about each read (in a standard format described herein).
As output, Arachne produces a list of supercontigs ("scaffolds"), each of which consists of an ordered list of contigs, all forward-oriented, and the estimates for the gaps between them within the supercontig. Base calls and quality scores are provided for each contig, along with the approximate locations of the reads which were used to build it. We also produce a summary and brief analysis of the assembly.
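The shape of the output described above (supercontigs made of ordered contigs with gap estimates between them) can be pictured with a small Python sketch. This is only an illustration of the data layout, not Arachne's actual file format; the class and field names (Contig, Supercontig, gaps, estimated_length) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Contig:
    """One contig: its consensus base calls and a quality score per base."""
    name: str
    bases: str
    qualities: List[int]


@dataclass
class Supercontig:
    """An ordered list of forward-oriented contigs plus gap estimates.

    gaps[i] is the estimated gap (in bases) between contigs[i] and
    contigs[i + 1], so there is one gap fewer than there are contigs.
    """
    name: str
    contigs: List[Contig] = field(default_factory=list)
    gaps: List[int] = field(default_factory=list)

    def estimated_length(self) -> int:
        # Total span = all contig lengths plus all estimated gaps.
        return sum(len(c.bases) for c in self.contigs) + sum(self.gaps)


scaffold = Supercontig(
    name="super_0",
    contigs=[
        Contig("contig_0", "ACGT", [40, 40, 35, 30]),
        Contig("contig_1", "GGCATT", [30, 32, 38, 40, 40, 25]),
    ],
    gaps=[100],
)
```

Here scaffold.estimated_length() adds the two contig lengths (4 and 6) to the single gap estimate (100).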
http://www-genome.wi.mit.edu/wga/
None
Arachne is described in detail in
User's Manual
Release notes
faq
You must initialize your environment, including the default paths and environment variables which the package uses to access the programs and associated files. To do this, enter the following command:
    module load arachne
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized for Arachne every time you log in.
e.g.:
    ...
    # initialize and load modules
    if ( -e /usr/local/share/modules/init/tcsh ) then
        unsetenv PATH MANPATH
        source /usr/local/share/modules/init/tcsh
        module load base
        module load arachne
    endif
    ...
After you log in to dtccbl.msi.umn.edu and set up your environment:
1). Create a directory for your data:
    % mkdir ~/arachne_data
2). Set the environment variable to point to your data directory:
    % setenv ARACHNE_DATA_DIR /home/cbla/username/arachne_data
3). Copy the directories vector, e_coli_transposons, e_coli and dtds from /usr/local/Arachne_v2.0.1/data/ to your arachne_data directory.
4). Copy your data to a directory under the arachne_data directory, e.g. mouse_example.
To run the program:
    % cd /usr/local/Arachne_v2.0.1/bin
    % Assemble DATA=mouse_example RUN=run
Version: Arachne v2.0.1
Labs: Computational Biology Laboratory
System(s): Unix workstations
Categories: Bioinformatics
Artemis is a genome viewer and annotation tool that allows visualization of sequence features and the results of analyses within the context of the sequence, and its six-frame translation. It can read complete EMBL and GENBANK database entries or sequence in FASTA or raw format. Extra sequence features can be in EMBL, GENBANK or GFF format.
http://www.sanger.ac.uk/Software/Artemis/
None
The user manual can be found at http://www.sanger.ac.uk/Software/Artemis/stable/manual/
You must initialize your environment, including the default paths and environment variables which the package uses to access the programs and associated files. To do this, enter the following command:
    module load artemis
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized every time you log in.
e.g.:
    ...
    # initialize and load modules
    if ( -e /usr/local/share/modules/init/tcsh ) then
        unsetenv PATH MANPATH
        source /usr/local/share/modules/init/tcsh
        module load base
        module load artemis
    endif
    ...
Once you have set up your environment, you can use any of the programs directly from the command line, e.g.:
    art
Version: v5
Labs: Computational Genetics Laboratory
System(s): Unix workstations
Categories: Bioinformatics
BLAT is a fast sequence search command line tool. Developed by Jim Kent at the University of California at Santa Cruz, it is designed to quickly find sequences of 95% and greater similarity of length 40 bases or more. It may miss more divergent or shorter sequence alignments. It will find perfect sequence matches of 33 bases, and sometimes find them down to 22 bases. BLAT on proteins finds sequences of 80% and greater similarity of length 20 amino acids or more. In practice DNA BLAT works well on primates, and protein BLAT on land vertebrates.
BLAT is not BLAST. DNA BLAT works by keeping an index of the entire genome in memory. The index consists of all non-overlapping 11-mers except for those heavily involved in repeats. The index takes up a bit less than a gigabyte of RAM. The index is used to find areas of probable homology, which are then loaded into memory for a detailed alignment. Protein BLAT works in a similar manner, except with 4-mers rather than 11-mers. The protein index takes a little more than 2 gigabytes.
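The indexing idea described above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not BLAT's implementation: the function names build_index and find_seeds, the toy k of 4, and the in-memory dictionary are all hypothetical, and both the exclusion of repeat-heavy k-mers and the detailed alignment stage are left out.

```python
from collections import defaultdict


def build_index(genome, k):
    """Map each non-overlapping k-mer of the database to its start positions."""
    index = defaultdict(list)
    # Step through the database in strides of k, so indexed k-mers never overlap.
    for start in range(0, len(genome) - k + 1, k):
        index[genome[start:start + k]].append(start)
    return index


def find_seeds(index, query, k):
    """Return (query_pos, genome_pos) pairs where a query k-mer hits the index."""
    seeds = []
    # The query is scanned at every offset (overlapping k-mers), so a match can
    # be found whatever its phase relative to the non-overlapping index.
    for qpos in range(len(query) - k + 1):
        for gpos in index.get(query[qpos:qpos + k], []):
            seeds.append((qpos, gpos))
    return seeds


genome = "ACGTTGCAGGCTAAGT"
index = build_index(genome, 4)
seeds = find_seeds(index, "TTGCAGG", 4)
```

Each seed marks a region of probable homology; a real tool would then align the query against the database sequence around those positions. Indexing only non-overlapping k-mers keeps the index roughly k times smaller than indexing every position.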
http://genome.ucsc.edu/cgi-bin/hgBlat?command=start
None
blat command option
You must initialize your environment, including the default paths and environment variables which the package uses to access the programs and associated files. To do this, enter the following command:
    module load bioinformatics
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized for BLAT every time you log in.
e.g.:
    ...
    # initialize and load modules
    if ( -e /usr/local/share/modules/init/tcsh ) then
        unsetenv PATH MANPATH
        source /usr/local/share/modules/init/tcsh
        module load base
        module load bioinformatics
    endif
    ...
Once you have set up your environment, log in to bs1.msi.umn.edu; you can then use any of the BLAT programs directly from the command line, e.g.:
    blat database query [-ooc=11.ooc] output.psl
Version: v 19
Labs: Basic Sciences Computing Lab, Computational Genetics Laboratory
System(s): SUN workstations
Categories: Bioinformatics
BioJava is an open-source framework for building tools for biological data analysis; including manipulating sequences, file parsers, CORBA interoperability, DAS, access to ACeDB, dynamic programming, and simple statistical routines. Bioinformatics programs, from simple scripts to complete applications, can be built using BioJava.
http://www.biojava.org/
Tutorial
http://www.biojava.org/docs/api/index.html
You must initialize the class paths in order to access the library. To do this, enter the following command:
    module load biolibrary
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized for BioJava every time you log in.
e.g.:
    ...
    # initialize and load modules
    if ( -e /usr/local/share/modules/init/tcsh ) then
        unsetenv PATH MANPATH
        source /usr/local/share/modules/init/tcsh
        module load base
        module add biolibrary
    endif
    ...
Once you have set up your environment, you can compile and run your Java program from the command line, e.g.:
    javac yourjavaprogram.java
    java yourjavaprogram
Version: v1.3
Labs: Computational Genetics Laboratory
System(s): All UNIX workstations
Categories: Bioinformatics
Bioperl is a toolkit of perl modules useful in building bioinformatics programs in perl. The collection of modules in bioperl makes it easy to develop bioinformatics applications such as sequence analysis, as well as to create graphical interfaces (bioperl-gui), use persistent storage in an RDBMS (bioperl-db), and run and parse the results from hundreds of bioinformatics applications (bioperl-run).
http://www.bioperl.org/
Tutorial
http://www.bioperl.org/Core/Latest/modules.html
Bioperl has been configured into the same path as perl. In order to access bioperl modules, include the following in your perl scripts:
1. #!/usr/local/bin/perl
2. use Bio::modulename;    e.g. use Bio::DB::GenBank;
Version: v1.2.2
Labs: Computational Genetics Laboratory
System(s): SUN workstations
Categories: Bioinformatics
Bioconductor is an open source and open development software project to provide tools for the analysis and comprehension of genomic data. The broad goals of the project are to provide access to a wide range of powerful statistical and graphical methods for the analysis of genomic data, to facilitate the integration of biological metadata in the analysis of experimental data, and to allow the rapid development of extensible, scalable, and interoperable software.
http://www.bioconductor.org/
http://www.bioconductor.org/
Version: 1.4
Labs: Scientific Development and Visualization Lab, Computational Genetics Laboratory, Scientific Development and Visualization Lab-sdvlapp1
System(s): All Sun machines and PC (sdvlapp1)
Categories: Bioinformatics, Microarray Analysis
The Bioinformatics Toolbox extends MATLAB to provide an integrated software environment for genome and proteome analysis. People can use the basic bioinformatic functions provided with this toolbox to create more complex algorithms and applications in drug discovery, genetic engineering, and biological research.
http://www.mathworks.com/access/helpdesk/help/toolbox/bioinfo/ug/
http://www.mathworks.com/access/helpdesk/help/toolbox/bioinfo/ug/
Version: 1.1
Labs: Scientific Development and Visualization Lab, Computational Genetics Laboratory, Scientific Development and Visualization Lab-sdvlapp1
System(s): All Sun machines and PC (sdvlapp1)
Categories: Bioinformatics, Microarray Analysis
The contig assembly program (CAP) is an effective program for assembling DNA fragments. It was developed by Xiaoqiu Huang at Iowa State University.
The CAP3 program includes a number of improvements and new features. The program can clip 5' and 3' low-quality regions of reads. It uses base quality values in the computation of overlaps between reads, the construction of multiple sequence alignments of reads, and the generation of consensus sequences. The program also uses forward-reverse constraints to correct assembly errors and link contigs. Results of CAP3 on four BAC data sets are presented. The performance of CAP3 was compared with that of PHRAP on a number of BAC data sets. PHRAP often produces longer contigs than CAP3, whereas CAP3 often produces fewer errors in consensus sequences than PHRAP. It is easier to construct scaffolds with CAP3 than with PHRAP on low-pass data with forward-reverse constraints.
http://genome.cs.mtu.edu/cap/cap3.html
None
The documentation associated with the program (doc) is available.
You must initialize your environment, including the default paths and environment variables which the package uses to access the programs and associated files. To do this, enter the following command:
    module load bioinformatics
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized for CAP3 every time you log in.
e.g.:
    ...
    # initialize and load modules
    if ( -e /usr/local/share/modules/init/tcsh ) then
        unsetenv PATH MANPATH
        source /usr/local/share/modules/init/tcsh
        module load base
        module load bioinformatics
    endif
    ...
Once you have set up your environment, you can use any of these programs directly from the command line, e.g.:
    cap3 file_of_reads
Version: 3
Labs: Basic Sciences Computing Lab, Computational Genetics Laboratory
System(s): All SGI and SUN workstations
Categories: Bioinformatics
The Bruker Daltonics CLINPROT system is an integrated set of tools for biomarker discovery and clinical proteomics search.
CLINPROT supports a profiling workflow to detect biomarker patterns indicative of specific diseases in biological fluids. In a second workflow, individual biomarker candidates can be identified by using Bruker Daltonics TOF/TOF technology.
http://www.bdal.de
None
The user manual is available when you log in to mist.msi.umn.edu.
Log in to the Dell PC (mist.msi.umn.edu) at BSCL:
Start -> All Programs -> Bruker Daltonics -> Clinprotools
Version: 1.0
Labs: Computational Genetics Laboratory, Basic Science Computing Lab
System(s): Windows
Categories: Proteomics, Bioinformatics
Clustal W is a general purpose multiple alignment program for DNA or proteins.
Clustal X is a windows interface for the ClustalW multiple sequence alignment program. It provides an integrated environment for performing multiple sequence and profile alignments and analysing the results. The sequence alignment is displayed in a window on the screen. A versatile coloring scheme has been incorporated allowing you to highlight conserved features in the alignment. The pull-down menus at the top of the window allow you to select all the options required for traditional multiple sequence and profile alignment. You can cut-and-paste sequences to change the order of the alignment; you can select a subset of sequences to be aligned; you can select a sub-range of the alignment to be realigned and inserted back into the original alignment. Alignment quality analysis can be performed and low-scoring segments or exceptional residues can be highlighted.
http://bess.u-strasbg.fr/BioInfo/ClustalX/Top.html
None
There is extensive online documentation within the program.
On-line help can be found for clustalw.doc and Clustal X.
To use the UNIX version of the program, you must initialize your environment, including the default paths and environment variables which the package uses to access the programs and associated files. To do this, enter the following command:
    module add bioinformatics
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized every time you log in.
e.g.:
    ...
    # initialize and load modules
    if ( -e /usr/local/share/modules/init/tcsh ) then
        unsetenv PATH MANPATH
        source /usr/local/share/modules/init/tcsh
        module load base
        module add bioinformatics
    endif
    ...
Once you have set up your environment, you can use the program directly from the command line, e.g.:
    clustalw
or
    clustalx
Version: v1.83
Labs: IBM SP, Basic Sciences Computing Lab, Computational Genetics Laboratory, Scientific Development and Visualization Lab-sdvlapp1, Medicinal Chemistry/Supercomputing Institute Visualization-Workstation Laboratory
System(s): All UNIX workstations, and Windows if applicable
Categories: Bioinformatics
Clover is a program for identifying functional sites in DNA sequences. For a set of DNA sequences that share a common function, it will compare them to a library of sequence motifs (e.g. transcription factor binding patterns), and identify which if any of the motifs are statistically overrepresented in the sequence set.
http://zlab.bu.edu/clover/
http://nar.oupjournals.org/cgi/content/abstract/32/4/1372
You must initialize your environment, including the default paths and environment variables which the package uses to access the programs and associated files. To do this, enter the following command:
    module load bioinformatics
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized automatically every time you log in.
e.g.:
    ...
    # initialize and load modules
    if ( -e /usr/local/share/modules/init/tcsh ) then
        unsetenv PATH MANPATH
        source /usr/local/share/modules/init/tcsh
        module load base
        module load bioinformatics
    endif
    ...
Once you have set up your environment, you can use the programs directly from the command line, e.g.:
    clover -t 0.05 mymotifs myseqs.fa background1.fa background2.fa
Version: NA
Labs: Computational Genetics Laboratory
System(s): Sun
Categories: Bioinformatics
Cluster and TreeView are programs that provide a computational graphical environment for analyzing data from DNA microarray experiments, or other genomic datasets. The program Cluster organizes and analyzes the data in a number of different ways. TreeView allows the organized data to be visualized and browsed.
http://rana.lbl.gov/EisenSoftware.htm
None
Manual available
Version: v2.11 (Cluster) and v1.60 (TreeView)
Labs: Scientific Development and Visualization Lab-sdvlapp1
System(s): PC
Categories: Microarray Analysis, Data Analysis and Data Mining, Bioinformatics
Cytoscape is a bioinformatics software platform for visualizing molecular interaction networks and integrating these interactions with gene expression profiles and other state data.
http://cytoscape.org
Online tutorial can be found at http://cytoscape.org/intro.html
Manual can be found at http://cytoscape.org/
The software is installed on the PCs at CGL.
Version: 1.1.1
Labs: Computational Genetics Laboratory
System(s): Windows
Categories: Bioinformatics
DNAStar - Lasergene sequence analysis software is a comprehensive suite of tools for Sequence Assembly, Gene Discovery, Protein Structure Prediction, Sequence Alignment, Primer Design, Restriction Mapping.
http://www.dnastar.com
None
Production documents can be found at http://www.dnastar.com
Log in to sdvlapp1.msi.umn.edu and run the DNAStar program there:
    Start -> Programs -> DNASTAR -> ...
Version: 6.0
Labs: Scientific Development and Visualization Lab
System(s): Windows
Categories: Bioinformatics
DNA Protein Search (DPS) is a program for searching a DNA sequence against a protein sequence database. It was developed by Xiaoqiu Huang at Iowa State University.
The DPS program compares a DNA sequence to a protein database. DPS enhances the existing methods by addressing the problems of frameshifts and introns. DPS computes high-scoring chains of segment pairs, where segment pairs in a chain can be from different reading frames and there can be an intervening DNA sequence between adjacent segment pairs in a chain.
http://genome.cs.mtu.edu/sas.html
None
The documentation part of program dps is available
You must initialize your environment, including the default paths and environment variables which the package uses to access the programs and associated files. To do this, enter the following command:
    module load bioinformatics
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized for DPS every time you log in.
e.g.:
    ...
    # initialize and load modules
    if ( -e /usr/local/share/modules/init/tcsh ) then
        unsetenv PATH MANPATH
        source /usr/local/share/modules/init/tcsh
        module load base
        module load bioinformatics
    endif
    ...
Once you have set up your environment, log in to bs1.msi.umn.edu; you can then use the dps program directly from the command line, e.g.:
    dps DNA_seq Protein_Database BLOSUM62 [options] > result
Version: 3
Labs: Basic Sciences Computing Lab
System(s): SUN workstations
Categories: Bioinformatics
EMBOSS (The European Molecular Biology Open Software Suite) is a new, free open source software analysis package specially developed for the needs of the molecular biology user community. Within EMBOSS you will find around 100 programs (applications) for sequence alignment, database searching with sequence patterns, protein motif identification and domain analysis, nucleotide sequence pattern analysis, codon usage analysis for small genomes, and much more.
A list of applications that are included with the EMBOSS package can be found in http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Apps/
http://www.uk.embnet.org/Software/EMBOSS/
Introduction to Sequence Analysis using EMBOSS (http://www.uk.embnet.org/Software/EMBOSS/Doc/Tutorial/)
To see which databases are accessible to the EMBOSS package, type at the UNIX prompt:
    module load bioinformatics
    showdb
http://www.uk.embnet.org/Software/EMBOSS/Doc/
You can also display a program's help documentation by typing:
    tfm program_name
A web interface to EMBOSS can be accessed at http://cgls1.msi.umn.edu/software/emboss.html
Note: You need to be a CGL user and register your University of Minnesota X.500 username with the Institute before you can use it.
You must initialize your environment, including the default paths and environment variables which the package uses to access the programs and associated files. To do this, enter the following command:
    module load bioinformatics
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized for EMBOSS every time you log in.
e.g.:
    ...
    # initialize and load modules
    if ( -e /usr/local/share/modules/init/tcsh ) then
        unsetenv PATH MANPATH
        source /usr/local/share/modules/init/tcsh
        module load base
        module add bioinformatics
    endif
    ...
Once you have set up your environment, you can use any of the programs directly from the command line, e.g.:
    wossname search
will find EMBOSS programs containing "search" in their one-line documentation, or
    profit -infile myprofile -sequence mysequence -outfile myoutput
will scan a sequence or database with a matrix or profile.
Version: v2.7.1
Labs: Basic Sciences Computing Lab, Computational Genetics Laboratory
System(s): All UNIX workstations
Categories: Bioinformatics
The GeneData Expressionist suite is a computational system from GeneData Inc. for analyzing gene expression data from any one- or two-channel microarrays.
It consists of three closely integrated modules:
Refiner: Data quality diagnosis and correction tool, which can be set up to pre-process data either in a fully interactive or in an entirely automated way.
CoBi: Oracle®-based data management system, which includes project and user management capabilities as well as gene annotation content for all commercially available Affymetrix® chips.
Analyst: Tools used for statistical analysis. Among other functionalities it features the latest machine learning algorithms to perform experiment classification and allows for automation of repetitive analysis tasks.
http://cgls1.msi.umn.edu/
http://www.genedata.com/
The Getting Started Tour is available at http://cgls1.msi.umn.edu:12053/docu/getting_started_tour.pdf
Documentation Library is available at http://cgls1.msi.umn.edu:12080/docu/index.html
and http://cgls1.msi.umn.edu:12053/docu/index.html
Please register with Computational Genetics Laboratory to get your Expressionist username and password.
You then can logon to Expressionist at http://cgls1.msi.umn.edu/
Client requirement: Windows PC with a minimum of 256 MB RAM; web browser supporting Java Web Start.
Version: 5.0, 4.0, and 3.1
Labs: Computational Genetics Laboratory
System(s): Web Access
Categories: Microarray Analysis, Bioinformatics
FASTA is a suite of programs that compares a protein sequence to another protein sequence or to a protein database, or a DNA sequence to another DNA sequence or a DNA library. Please see W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448 for more information.
These programs include:
- fasta34
  FASTA searches a protein or DNA sequence data bank
- ssearch34 ssearch34_t
  SSEARCH searches a sequence database using the Smith-Waterman algorithm
- fastx34 fastx34_t fasty34 fasty34_t
  FASTXY compares a DNA sequence to a protein sequence data bank
- fastf34 fastf34_t
  FASTF compares mixed peptides to a protein databank
- tfastf34 tfastf34_t
  TFASTF compares mixed peptides to a translated DNA data bank
- fasts34 fasts34_t tfasts34 tfasts34_t
  FASTS or TFASTS compares linked peptides to a protein databank
- tfasta34 tfasta34_t
  TFASTA translates and searches a DNA sequence data bank
- prss34
  PRSS compares a query sequence to shuffled sequences using the Smith-Waterman algorithm
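For readers unfamiliar with the Smith-Waterman algorithm mentioned for SSEARCH and PRSS, the following Python sketch computes the best local alignment score of two sequences. It is a minimal teaching version, not the code used by the FASTA package: the function name smith_waterman_score and the scoring values (match 2, mismatch -1, linear gap penalty -2) are assumptions chosen for illustration, whereas real searches typically use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the diagonal with a match or mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0: a local alignment may start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


identical = smith_waterman_score("ACGT", "ACGT")
unrelated = smith_waterman_score("AAAA", "TTTT")
embedded = smith_waterman_score("GGACGTGG", "TTACGTTT")
```

Because scores are clamped at zero, the shared ACGT core in the last call scores the same as the identical pair, regardless of the unrelated flanking bases.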
http://fasta.bioch.virginia.edu/
none
None
You must initialize your environment, including the default paths and environment variables which the package uses to access the programs and associated files. To do this, enter the following command:
    module load bioinformatics
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized for FASTA every time you log in.
e.g.:
    ...
    # initialize and load modules
    if ( -e /usr/local/share/modules/init/tcsh ) then
        unsetenv PATH MANPATH
        source /usr/local/share/modules/init/tcsh
        module load base
        module load bioinformatics
    endif
    ...
Once you have set up your environment, you can use any of the FASTA programs directly from the command line, e.g.:
    fasta34
Version: v3.4t23d6
Labs: IBM Power4, Basic Sciences Computing Lab, Computational Genetics Laboratory, Medicinal Chemistry/Supercomputing Institute Visualization-Workstation Laboratory
System(s): All Unix workstations if applicable
Categories: Bioinformatics
FASTLINK is a significantly modified and improved version of the main programs of LINKAGE that runs much faster sequentially, can run in parallel, allows the user to recover gracefully from a computer crash, and provides abundant new documentation.
These programs include:
- ilink
- linkmap
- lodscore
- mlink
- unknown
http://www.ncbi.nlm.nih.gov/CBBresearch/Schaffer/fastlink.html
none
FASTLINK references can be found at http://www.ncbi.nlm.nih.gov/CBBresearch/Schaffer/fastref.html
You must initialize your environment, including the default paths and environment variables which the package uses to access the programs and associated files. To do this, enter the following command:
    module load fastlink
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized for FASTLINK every time you log in.
e.g.:
    ...
    # initialize and load modules
    if ( -e /usr/local/share/modules/init/tcsh ) then
        unsetenv PATH MANPATH
        source /usr/local/share/modules/init/tcsh
        module load base
        module load fastlink
    endif
    ...
Once you have set up your environment, you can use the FASTLINK programs directly from the command line, e.g.:
    linkmap ...
Version: 4.1P
Labs: Computational Genetics Laboratory
System(s): UNIX workstations
Categories: Bioinformatics
FinchTV is a program for viewing sequence traces.
http://www.geospiza.com/finchtv/index.htm
Version: v1.1
Labs: Scientific Development and Visualization Lab-sdvlapp1
System(s): Windows
Categories: Bioinformatics
FingerPrinted Contigs (FPC) is an interactive program for building contigs from fingerprinted clones.
http://www.genome.arizona.edu/fpc/
The tutorial can be downloaded from http://www.genome.arizona.edu/software/fpc/download/
The FPC User's Guide and User's Manual are available at http://www.genome.arizona.edu/software/fpc/download/
You must initialize your environment, including the default paths and environment variables which the package uses to access the programs and associated files. To do this, enter the following command:
    module load genomics
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized every time you log in.
e.g.:
    ...
    # initialize and load modules
    if ( -e /usr/local/share/modules/init/tcsh ) then
        unsetenv PATH MANPATH
        source /usr/local/share/modules/init/tcsh
        module load base
        module load genomics
    endif
    ...
Once you have set up your environment, you can use the program directly from the command line, e.g.:
    fpc &
Version: 7.0
Labs: Computational Genetics Laboratory, Basic Sciences Computing Lab
System(s): UNIX(SUN)
Categories: Bioinformatics
The GCG Wisconsin Package is an integrated package featuring a comprehensive collection of DNA-, RNA-, and protein-sequence-analysis tools. It integrates many software packages, such as the recent MEME/MAST suite of de novo pattern-discovery programs, the Staden sequence-analysis package, the HMMER Markov-model construction suite, and the PFAM protein-motif database.
Two user interfaces, Command Line and SeqLab, are currently provided.
This package provides access to over 130 programs. The major functions of GCG include:
- Comparison
- Database Searching and Retrieval
- DNA/RNA Secondary Structure
- Editing and Publication
- Evolution
- Fragment Assembly
- Gene Finding and Pattern Recognition
- Importing and Exporting
- Mapping
- Primer Selection
- Protein Analysis
- Translation
http://www.accelrys.com/products/gcg_wisconsin_package/index.html
The available databases, release numbers, and release dates for the GCG Wisconsin Package are displayed when you initiate your GCG working environment by typing the gcg command at the prompt. More information can be found in the GCG Database Tables.
The SeqLab Tutorial book from GCG is available at bscl.
These User Guides from GCG are available at bscl:
- Program Manual (volume 1)
- Program Manual (volume 2)
- User's Guide UNIX
- Command-Line Summary
- SeqLab Guide
- SeqLab Tutorial
- Training Course
More information can be accessed by typing genhelp or genmanual at the UNIX terminal.
Online documentation can be found for GCG Help, GCG Manual, and User's Guide (access is allowed only to University of Minnesota Network).
To use GCG, log in to cgls1.msi.umn.edu or bi7.msi.umn.edu using ssh. Then call the gcgstartup script to define key GCG environment variables and the "gcg" alias by typing:
    source /usr/local/gcg/gcgstartup
Or put these lines into the .cshrc file in your home directory:
    if ( -e /usr/local/gcg/gcgstartup ) then
        source /usr/local/gcg/gcgstartup
    endif
Then type:
    gcg
to initialize the GCG working environment. You will see brief information about the GCG Wisconsin Package and the available databases. You are then ready to use any program in the GCG package.
To use the GCG X-window interface SeqLab, type at the UNIX terminal:
    seqlab &
Version: 10.2 and 10.3
Labs: Basic Sciences Computing Lab, Computational Genetics Laboratory
System(s): SGI workstation (bi7) and SUN (cgls1)
Categories: Bioinformatics
GENECONV is a statistical test for detecting gene conversion. Given an alignment of DNA or protein sequences, GENECONV finds the most likely candidates for aligned gene conversion events between pairs of sequences in the alignment, as well as the most likely candidates for gene conversion events from outside of the alignment. Candidate events are ranked by multiple-comparison corrected P-values and listed in a spreadsheet-like output file.
http://www.math.wustl.edu/~sawyer/geneconv/
none
geneconv Manual
You must initialize your environment, including the default paths and environment variables which the package uses to access the programs and associated files. To do this, enter the following command:
    module load geneconv
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized for geneconv every time you log in.
e.g.:
    ...
    # initialize and load modules
    if ( -e /usr/local/share/modules/init/tcsh ) then
        unsetenv PATH MANPATH
        source /usr/local/share/modules/init/tcsh
        module load base
        module load geneconv
    endif
    ...
Once you have set up your environment, you can use any of the geneconv programs directly from the command line, e.g.:
    geneconv
Version: 1.81
Labs: Basic Sciences Computing Lab
System(s): SGI Unix workstations
Categories: Bioinformatics, Evolution
GENSCAN is a general-purpose gene identification program which analyzes genomic DNA sequences from a variety of organisms including human, other vertebrates, invertebrates and plants. For each sequence, the program determines the most likely "parse" (gene structure) under a probabilistic model of the gene structural and compositional properties of the genomic DNA for the given organism. This set of exons/genes is then printed to an output file (the text output) together with the corresponding predicted peptide sequences. A graphical (PostScript) output may also be created which displays the location and DNA strand of each predicted exon.
http://genes.mit.edu/
None
The README file can be found at /usr/local/genscan/README
You must initialize your environment including default paths and environment variables which this package uses to access the programs and associated files. To do this, enter the following command:
module add bioinformatics
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized correctly every time you log in.
e.g.:
...
# initialize and load modules
if( -e /usr/local/share/modules/init/tcsh ) then
    unsetenv PATH MANPATH
    source /usr/local/share/modules/init/tcsh
    module load base
    module add bioinformatics
endif
...
Once you have set up your environment, you can use the command name directly.
For example:
genscan /usr/local/genscan/current/HumanIso.smat your_sequence_file
Version: 1.0
Labs: Basic Sciences Computing Lab, Computational Genetics Laboratory
System(s): All SGI workstations, Sun Solaris
Categories: Bioinformatics
GeneChip Operating Software (GCOS) from Affymetrix is the software system for expression and DNA analysis. GCOS, GCOS Manager, and GCOS Administrator are a trio of applications that:
- capture and analyze the array image
- provide workflow tracking of experiment data (image, cell intensities, probe analysis data)
- manage experiment data
- automate basic expression analysis and publishing
http://www.affymetrix.com/support/technical/tutorial/gcos/index.affx
Tutorial can be found at Affymetrix web site at: http://www.affymetrix.com/support/technical/tutorial/gcos/index.affx
User guides are available at http://www.affymetrix.com/support/technical/manuals.affx
You can run GCOS as a regular PC program.
Version: 1.1.1.052
Labs: Computational Genetics Laboratory, Basic Sciences Computing Lab
System(s): Windows (cpc2 at CGL)
Categories: Microarray Analysis, Bioinformatics
GeneHunter is a program for linkage analysis.
http://www-genome.wi.mit.edu/ftp/distribution/software/genehunter/
http://linkage.rockefeller.edu/soft/gh/
none
GeneHunter documentation can be found at http://linkage.rockefeller.edu/soft/gh/
You must initialize your environment including default paths and environment variables which the package uses to access the programs and associated files. To do this, enter the following command:
module load genehunter
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized for GeneHunter every time you log in.
e.g.:
...
# initialize and load modules
if( -e /usr/local/share/modules/init/tcsh ) then
    unsetenv PATH MANPATH
    source /usr/local/share/modules/init/tcsh
    module load base
    module load genehunter
endif
...
Once you have set up your environment, you can use the GeneHunter programs directly from the command line, e.g.:
gh
Version: 2.1 r2
Labs: Computational Genetics Laboratory
System(s): UNIX workstations
Categories: Bioinformatics
GenePix(R) Pro is software from Axon Instruments, Inc. for analyzing DNA and protein microarrays, tissue arrays, and cell arrays. Images acquired from either an Axon scanner or third-party scanners can be used.
http://www.axon.com/GN_GenePixSoftware.html
The user manual and tutorial book from Axon Instruments Inc. are available at BSCL.
You can use GenePix after you log in to sdvlapp1.msi.umn.edu:
Start -> Programs -> Axon Laboratory -> GenePix Pro 6.0 -> GenePix Pro 6.0
Version: 6.0
Labs: Scientific Development and Visualization Lab-sdvlapp1
System(s): PC
Categories: Microarray Analysis, Bioinformatics
GeneSpring from Silicon Genetics is a visualization and analysis tool designed for use with gene expression data. It is capable of displaying and analyzing large data sets on a typical desktop computer. GeneSpring provides a flexible set of analysis tools. Data from a variety of sources can be imported with ease.
http://www.silicongenetics.com/cgi/SiG.cgi/Products/GeneSpring/index.smf
Online presentations are provided by Silicon Genetics.
You can download demo software directly from Silicon Genetics.
The GeneSpring User Manual can be viewed from the GeneSpring "Help" menu.
GeneSpring is stand-alone software. Currently, it needs to be installed on the user's local machine (PC, Mac, or Unix). However, the license key is installed on the cgls1 server. The license key monitors and controls the registered users. To access and use this software:
1. Register as a CGL user. See the instructions.
2. Contact MSI user support by phone (612 624 0802) or email (help@msi.umn.edu), or contact Dr. Wayne Xu or Dr. Zhengjin Tu directly. We will come to your lab to install and set up GeneSpring on your local computer.
Currently, GeneSpring is not available on our sdvlapp1 Windows server. We are still working on that. This web page will be updated as soon as GeneSpring works on the sdvlapp1 Windows server.
Version: 6.2
Labs: Computational Genetics Laboratory
System(s): ALL
Categories: Microarray Analysis, Bioinformatics
GeneTraffic Multi is web-based software for either one- or two-color microarray data management and data analysis. It can also be used for web publication of your microarray data.
http://cgl1.msi.umn.edu/
http://www.iobion.com/products/products.html
The online tutorials (IOBION VIDEOS) can be accessed through Iobion's web site
The User Manuals can be downloaded through Iobion's web site
The software is web accessible. However, you need to register with the Computational Genetics Laboratory to get your username and password.
URL for accessing GeneTraffic: http://cgl1.msi.umn.edu/
Client requirement: Internet Explorer 6 running on a Windows platform, plus Flash 6
Version: 3.1-4 Multi
Labs: Computational Genetics Laboratory
System(s): PC with Internet Explorer 5 and Internet Access
Categories: Microarray Analysis, Bioinformatics
Glimmer is a system for finding genes in microbial DNA, especially the genomes of bacteria and archaea. Glimmer (Gene Locator and Interpolated Markov Modeler) uses interpolated Markov models (IMMs) to identify the coding regions and distinguish them from noncoding DNA.
http://www.tigr.org/software/glimmer/
None
Reference articles can be found at http://www.tigr.org/~salzberg/glimmer2.pdf and http://www.tigr.org/software/glimmer/glimmer-nar.pdf
You must initialize your environment including default paths and environment variables which the package uses to access the programs and associated files. To do this, enter the following command:
module load bioinformatics
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized every time you log in.
e.g.:
...
# initialize and load modules
if( -e /usr/local/share/modules/init/tcsh ) then
    unsetenv PATH MANPATH
    source /usr/local/share/modules/init/tcsh
    module load base
    module load bioinformatics
endif
...
Please check the glimmer.readme file for how to run the Glimmer programs.
Version: v2.10
Labs: Computational Genetics Laboratory
System(s): Unix workstations
Categories: Bioinformatics
GlimmerM is a gene finder derived from Glimmer, but developed specifically for eukaryotes. It is based on a dynamic programming algorithm that considers all combinations of possible exons for inclusion in a gene model and chooses the best of these combinations. The decision about which gene model is best is based on a combination of the strength of the splice sites and the score of the exons generated by an interpolated Markov model (IMM). The system has been trained for Arabidopsis thaliana, Oryza sativa (rice), and Plasmodium falciparum (the malaria parasite), and should work well on closely related organisms.
http://www.tigr.org/software/glimmerm/
None
The user manual can be found at http://www.tigr.org/software/glimmerm/man.html
You must initialize your environment including default paths and environment variables which the package uses to access the programs and associated files. To do this, enter the following command:
module load bioinformatics
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized every time you log in.
e.g.:
...
# initialize and load modules
if( -e /usr/local/share/modules/init/tcsh ) then
    unsetenv PATH MANPATH
    source /usr/local/share/modules/init/tcsh
    module load base
    module load bioinformatics
endif
...
Once you have set up your environment, you can use any of the programs directly from the command line, e.g.:
glimmerm [options]
Version: v2.5.1
Labs: Computational Genetics Laboratory
System(s): Unix workstations
Categories: Bioinformatics
HMMER is an implementation of profile hidden Markov model (HMM) methods for sensitive database searches using multiple sequence alignments as queries. HMMER takes a multiple sequence alignment as input. It can then build a statistical model called a "hidden Markov model" which can be used as a query into a sequence database to find (and/or align) additional homologues of the sequence family.
There are currently nine programs supported in the HMMER 2 package:
- hmmalign: align sequences to an existing model
- hmmbuild: build a model from a multiple sequence alignment
- hmmcalibrate: take an HMM and empirically determine parameters that are used to make searches more sensitive, by calculating more accurate expectation value scores (E-values)
- hmmconvert_hmmer: convert a model file into different formats, including a compact HMMER 2 binary format and "best effort" emulation of GCG profiles
- hmmemit: emit sequences probabilistically from a profile HMM
- hmmfetch: get a single model from an HMM database
- hmmindex: index an HMM database
- hmmpfam: search an HMM database for matches to a query sequence
- hmmsearch: search a sequence database for matches to an HMM
HMMER also provides a number of utility programs which are not HMM programs, but may be useful. These programs are from the SQUID sequence utility library that HMMER uses:
- afetch: retrieve an alignment from an alignment database
- alistat: show some simple statistics about a sequence alignment file
- seqstat: show some simple statistics about a sequence file
- sfetch: retrieve a (sub-)sequence from a sequence file
- shuffle: randomize sequences in a sequence file
- sreformat: reformat a sequence file into a different format
http://hmmer.wustl.edu/
Pfam is a large collection of multiple sequence alignments and hidden Markov models covering many common protein domains. These files are installed in the /usr/local/db/pfam/current/ directory.
Pfam_ls: HMM library built from Pfam-A seeds, glocal alignments
Pfam_fs: HMM library built from Pfam-A seeds, local alignments
For more information about Pfam, please see http://pfam.wustl.edu
A tutorial may be found in the HMMER User's Guide. The tutorial directory can be copied from /usr/local/hmmer/current/tutorial/
A HMMER User's Guide is available. Man pages are available for the following:
hmmalign hmmbuild hmmcalibrate hmmconvert hmmemit hmmer hmmfetch hmmindex hmmpfam hmmsearch
You must initialize your environment including default paths and environment variables which the package uses to access the programs and associated files. To do this, enter the following command:
module load bioinformatics
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized every time you log in.
e.g.:
...
# initialize and load modules
if( -e /usr/local/share/modules/init/tcsh ) then
    unsetenv PATH MANPATH
    source /usr/local/share/modules/init/tcsh
    module load base
    module load bioinformatics
endif
...
Once you have set up your environment, you can use any of the programs directly from the command line, e.g.:
hmmbuild globin.hmm globins50.msf
will build a profile HMM from the alignment of globin sequences found in globins50.msf and place the result in globin.hmm.
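The programs listed above are usually chained: build a model, calibrate it, then search with it. The following is a minimal Python sketch of that sequence, assuming the HMMER 2 programs are on your PATH after `module load bioinformatics`. The function names and the sequence database name `mydb.fasta` are hypothetical, and by default the script only prints the commands instead of running them.

```python
import shlex
import subprocess


def hmmer2_pipeline(alignment, hmm_file, seq_db):
    """Return HMMER 2 command lines for build -> calibrate -> search."""
    return [
        ["hmmbuild", hmm_file, alignment],  # build a profile HMM from the alignment
        ["hmmcalibrate", hmm_file],         # calibrate E-value parameters of the model
        ["hmmsearch", hmm_file, seq_db],    # search the sequence database with the model
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(shlex.quote(arg) for arg in argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # mydb.fasta is a placeholder for your own sequence database.
    run_pipeline(hmmer2_pipeline("globins50.msf", "globin.hmm", "mydb.fasta"))
```

Keeping the command construction separate from execution makes it easy to inspect the commands before running them on a lab workstation.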
Version: v2.3.1
Labs: IBM SP, Basic Sciences Computing Lab, Computational Genetics Laboratory, Medicinal Chemistry/Supercomputing Institute Visualization-Workstation Laboratory
System(s): All UNIX workstations if applicable
Categories: Bioinformatics
Image is a package of analysis algorithms for processing gel images from restriction digest fingerprinting experiments. Image has been tightly integrated with a friendly user interface and provides a robust tool for large scale physical mapping. Image is able to process gels from a wide variety of scanning technologies and has been tested on various fingerprinting protocols, producing normalized band and gel images as output.
http://www.sanger.ac.uk/Software/Image/
Tutorial available at http://www.sanger.ac.uk/Software/Image/tutorial/index.shtml#3
Online Help available
You must initialize your environment including default paths and environment variables which the package uses to access the programs and associated files. To do this, enter the following command:
module load genomics
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized every time you log in.
e.g.:
...
# initialize and load modules
if( -e /usr/local/share/modules/init/tcsh ) then
    unsetenv PATH MANPATH
    source /usr/local/share/modules/init/tcsh
    module load base
    module load genomics
endif
...
Note: If you want to run Image version 3.9, you can do:
module load image3.9
Once you have set up your environment, you can use the programs directly from the command line, e.g.:
im3
Version: 3.10b
Labs: Computational Genetics Laboratory, Basic Sciences Computing Lab
System(s): UNIX(SUN)
Categories: Bioinformatics
InsightII is a molecular modeling package consisting of several programs, including Insight II, BioPolymer, Analysis, and Discover. InsightII is a comprehensive graphic molecular modeling program. Used in conjunction with the molecular mechanics/dynamics program Discover, InsightII can be used to build and manipulate virtually any class of molecule or molecular system. Molecular properties can be studied through InsightII's interface with other Biosym products such as DelPhi, DMol, and Discover.
The following modules are available to researchers:
| Package | Concurrent Users | Description (from the Molecular Simulations/Biosym web site) |
|---|---|---|
| Insight II | 10 | Insight creates, modifies, manipulates, displays, and analyzes molecular systems and related data and provides the core requirements for all Insight II software modules. |
| Affinity | 1 | Affinity is a suite of programs for automatically docking a ligand to a receptor, given an assembly consisting of a ligand molecule and a receptor molecule. |
| Analysis | 12 | Analysis revolves around mathematical and geometric modeling of molecular properties. It is a software program that allows users to abstract molecular properties dynamically from molecular structures and simulations. Analysis is a "dynamic molecular information system". Molecular properties are defined interactively, evaluated dynamically, and visualized interactively through dynamically linked spreadsheets, 2D and 3D graphs, and 3D molecular graphics representations. |
| Biopolymer | 8 | Biopolymer constructs models of peptides, proteins, carbohydrates, and nucleic acids for visualizing complex macromolecular structures and for use in further simulation work. |
| CHARMM | 2 | CHARMM is a simulation program available within InsightII. CHARMm uses empirical energy functions to describe the forces on atoms in molecules. These functions, plus the parameters for these functions, constitute the CHARMm force field. Well-validated energy and force calculations form the core of a broad range of calculation and simulation capabilities, including calculation of interaction and conformational energies, local minima, barriers to rotation, time-dependent dynamic behavior, free energy, and vibrational frequencies. |
| Consensus | 1 | Consensus builds a 3D model of a protein from its amino acid sequence and the known structures of related proteins using distance constraints derived from the reference protein structures. |
| Converter | 4 | Converter automatically generates 3D molecules from a database containing 2D representations of molecular structures. It reads 2D information from the MOL and SD files produced with Molecular Design, Ltd. (MDL) software and outputs 3D structures that are fully compatible with the InsightII and MDL software. |
| DeCipher | 1 | DeCipher's philosophy revolves around mathematical and geometric modeling of molecular properties. It is a software program that allows users to abstract molecular properties dynamically from molecular structures and simulations. DeCipher is a "dynamic molecular information system". Molecular properties are defined interactively, evaluated dynamically, and visualized interactively through dynamically linked spreadsheets, 2D and 3D graphs, and 3D molecular graphics representations. |
| DelPhi | 4 | DelPhi calculates electrostatic potentials and solvation energies of both large and small molecules, including nucleic acids. You can use DelPhi to rigorously examine the effects of charge distribution, ionic strength, and dielectric constant on the electrostatic potentials of macromolecules. |
| Discover and Discover3 | 10 | Discover is a simulation program available within Insight II. It incorporates a range of well-validated forcefields for dynamics simulations, minimization, and conformational searches, allowing you to predict the structure, energetics, and properties of organic, inorganic, organometallic, and biological systems. Discover also implements IPC (Inter Process Communications), which allows users to instruct Discover to turn processing control over to external programs and retrieve the results of those external processes, incorporating them into the continuing Discover computations. |
| Homology | 2 | Homology builds a 3D model of a protein from its amino acid sequence and the known structure of related proteins. Standard techniques of backbone building, loop modeling, structural overlay, and statistical analysis of the resulting models are available. |
| Ludi | 1 | The Ludi program runs in both receptor and active analogue mode. It allows users to design de novo candidate ligands for the active sites of proteins, suggest modifications of known ligands, manage libraries of candidate fragments, and score ligand-receptor complexes. |
| Modeler | 3 | Modeler is an automated homology modeling scheme designed to find the most probable three-dimensional structure of a protein, given its amino acid sequence and its alignment with related structures. It derives 3D protein models without the time-consuming separate stages of core region identification and loop region building or searching that are inherent to manual homology modeling schemes. |
| NMR Refine | 3 | NMR_Refine is dedicated to structure generation and refinement. This module contains the tools necessary for constructing a comprehensive NMR database; generating an approximate molecular structure using simulated annealing or distance geometry; further refining the structure with the iterative relaxation matrix approach (IRMA) and direct NOE methods; and evaluating the structures obtained at each step of the process, enabling their accuracy and precision to be assessed. NMR_Refine also provides restraint analysis. NMR_Refine uses the programs DGII, IRMA, Discover, and X-PLOR to perform these various tasks. |
| Search_Compare | 1 | The Search_Compare module contains the pulldowns Volume, Overlap, SC_Search, Vector_Map, Conformer, Distance_Map, Spreadsheet, Graph, and Background_Job. The commands in Search_Compare enable users to calculate and operate on molecular volumes, to superimpose two or more molecules, to search systematically for sterically allowed conformations of a molecule, and to quickly find and examine conformations of interest after they have been generated. |
To run InsightII, you must first set some environment variables. Just type
source /usr/local/accelrys/accelrys.csh
Now type
insightII
Version: 2000.1
Labs: Scientific Development and Visualization Lab, Medicinal Chemistry/Supercomputing Institute Visualization-Workstation Laboratory, Basic Sciences Computing Lab, Computational Genetics Laboratory
System(s): all SGI workstations
Categories: Molecular Modeling, X-ray Crystallography, Molecular Simulation
For more information, see http://www.msi.umn.edu/software/biosym/tutorial/index.html.
InterProScan combines several protein motif/domain search tools. It scans protein sequences against several signature databases at once, including Prosite, PRINTS, PFAM, ProDom, Smart, TIGRFAMs, and others. It also gives GO annotation.
The local InterProScan installation can analyze up to 500 protein sequences per FASTA file.
http://www.ebi.ac.uk/interpro/
Tutorial
http://www.ebi.ac.uk/interpro/documentation.html
module load proteinmotif
InterProScan.pl your.seq -ipr
cd /usr/local/interProScan/iprscan/tmp/yours_04-Feb-2004_10615
Replace "yours_04-Feb-2004_10615" with the name of your result directory.
sh -c "/usr/local/interProScan/iprscan/bin/SunOS/gmake htm -j10 -k 2> ERROR 1> OUT"
After the job is done, move your result folder to your current directory:
mv /usr/local/interProScan/iprscan/tmp/yours_04-Feb-2004_10615 .
View the merged file cnk_1/???.htm from Netscape on cgl, or transfer it to a local PC and view it in a web browser.
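Because the local installation accepts at most 500 protein sequences per FASTA file, a larger input has to be split before it is submitted. The following is a minimal Python sketch of such a split; the function names and the `chunk` file prefix are hypothetical, and the script is not part of InterProScan.

```python
def split_fasta(text, max_records=500):
    """Split FASTA-formatted text into chunks of at most max_records sequences."""
    records = []
    current = []
    for line in text.splitlines():
        if line.startswith(">"):
            # A header line starts a new record; store the previous one.
            if current:
                records.append("\n".join(current))
            current = [line]
        elif current and line.strip():
            # Sequence line belonging to the current record.
            current.append(line)
    if current:
        records.append("\n".join(current))
    return [
        "\n".join(records[i:i + max_records]) + "\n"
        for i in range(0, len(records), max_records)
    ]


def write_chunks(fasta_path, prefix="chunk", max_records=500):
    """Write each chunk to prefix_N.fasta and return the list of file names."""
    with open(fasta_path) as handle:
        chunks = split_fasta(handle.read(), max_records)
    paths = []
    for number, chunk in enumerate(chunks, start=1):
        path = f"{prefix}_{number}.fasta"
        with open(path, "w") as out:
            out.write(chunk)
        paths.append(path)
    return paths
```

Each resulting file can then be submitted separately, in the same way as `your.seq` in the steps above.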
Version: v7.1
Labs: Computational Genetics Laboratory
System(s): All UNIX workstations
Categories: Bioinformatics
JoinMap is advanced software for the calculation of genetic linkage maps in experimental populations. It provides high-quality tools that allow detailed study of the experimental data and the generation of publication-ready map charts. The intuitive MS-Windows® user interface of JoinMap invites better exploration of the data. For instance, you can perform several diagnostic tests, both before and after the actual map calculation, and you can remove potentially erroneous loci and individuals from the map calculations with a simple mouse click.
http://www.kyazma.nl/index1.php#
None
User Manual.
Log in to sdvlapp1.msi.umn.edu and run JoinMap there: Start -> Programs -> JoinMap -> JoinMap3.0.
Version: 3.0
Labs: Scientific Development and Visualization Lab-sdvlapp1
System(s): ALL
Categories: Bioinformatics
The core of the LINKAGE package is a series of programs for maximum likelihood estimation of recombination rates, calculation of lod score tables, and analysis of genetic risks. The analysis programs are divided into two groups. The first group can be used for general pedigrees with marker and disease loci. Programs in the second group are for three-generation families and codominant marker loci, and are primarily intended for the construction of genetic maps from data on reference families.
These programs are included:
- lcp
- lsp
- lrp
- ilink
- linkmap
- lodscore
- mlink
- unknown
http://linkage.rockefeller.edu/soft/linkage/
none
LINKAGE User Guide can be found at http://linkage.rockefeller.edu/soft/linkage/
You must initialize your environment including default paths and environment variables which the package uses to access the programs and associated files. To do this, enter the following command:
module load linkage
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized for LINKAGE every time you log in.
e.g.:
...
# initialize and load modules
if( -e /usr/local/share/modules/init/tcsh ) then
    unsetenv PATH MANPATH
    source /usr/local/share/modules/init/tcsh
    module load base
    module load linkage
endif
...
Once you have set up your environment, you can use the LINKAGE programs directly from the command line, e.g.:
lcp
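To illustrate what a lod score table contains, here is a minimal Python sketch of the textbook two-point lod score for phase-known, fully informative meioses: the log10 ratio of the likelihood at recombination fraction theta to the likelihood at theta = 0.5. This is an illustration only, not the method used by the LINKAGE programs, which compute likelihoods on general pedigrees. The function names and the grid of theta values are assumptions.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Two-point lod score for phase-known, fully informative meioses."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    nonrecombinants = meioses - recombinants
    # log10 likelihood at the tested recombination fraction
    log_l_theta = (recombinants * math.log10(theta)
                   + nonrecombinants * math.log10(1.0 - theta))
    # log10 likelihood under free recombination (theta = 0.5)
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


def lod_table(recombinants, meioses,
              thetas=(0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5)):
    """Return (theta, lod) pairs over a grid of recombination fractions."""
    return [(t, lod_score(recombinants, meioses, t)) for t in thetas]


if __name__ == "__main__":
    for theta, lod in lod_table(recombinants=2, meioses=10):
        print(f"theta={theta:.2f}  lod={lod:.3f}")
```

The theta with the highest lod score in the table is the grid point closest to the maximum likelihood estimate of the recombination fraction.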
Version: 5.2
Labs: Computational Genetics Laboratory
System(s): UNIX workstations
Categories: Bioinformatics
LUCY is a program for DNA sequence quality trimming and vector removal. It was designed to take the base-call quality assessment of each base into consideration in the cleaning process, to make sure the processed sequences have the best overall quality possible based on their individual base quality values. Lucy's task is to identify the largest subsequence that is of sufficiently high quality and also free of contaminating vector sequence.
http://www.tigr.org/software/
None
Lucy is fully described in: DNA sequence quality trimming and vector removal. H.-H. Chou and M.H. Holmes. Bioinformatics, 17:12, pp. 1093-1104, 2001.
A man page is available.
You must initialize your environment including default paths and environment variables which the package uses to access the programs and associated files. To do this, enter the following command:
module load bioinformatics
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized for LUCY every time you log in.
e.g.:
...
# initialize and load modules
if( -e /usr/local/share/modules/init/tcsh ) then
    unsetenv PATH MANPATH
    source /usr/local/share/modules/init/tcsh
    module load base
    module add bioinformatics
endif
...
Once you have set up your environment, you can use any of the programs directly from the command line, e.g.:
lucy -v PUC19 PUC19splice atie.seq atie.qul atie.2nd -debug lucy.info
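The idea of finding the largest high-quality subsequence can be illustrated with a deliberately simplified Python sketch: it returns the longest run of bases whose quality values all meet a threshold. This is not Lucy's actual algorithm, whose criteria are more elaborate and are described in the paper cited above, and it does no vector screening. The function name and the default threshold of 20 are assumptions.

```python
def longest_high_quality_region(qualities, min_quality=20):
    """Return (start, end) of the longest run with every quality >= min_quality.

    The end index is exclusive; (0, 0) is returned if no base qualifies.
    When two runs have the same length, the first one is kept.
    """
    best_start, best_end = 0, 0
    run_start = None
    for i, quality in enumerate(qualities):
        if quality >= min_quality:
            if run_start is None:
                run_start = i  # a new high-quality run begins here
            if i + 1 - run_start > best_end - best_start:
                best_start, best_end = run_start, i + 1
        else:
            run_start = None  # a low-quality base ends the current run
    return best_start, best_end


if __name__ == "__main__":
    quals = [5, 30, 30, 10, 25, 40, 35, 22, 8]
    start, end = longest_high_quality_region(quals)
    print(f"keep bases {start}..{end - 1}")
```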
Version: v1.18p
Labs: Computational Genetics Laboratory
System(s): UNIX workstations
Categories: Bioinformatics
Mascot is a search engine that uses mass spectrometry data to identify proteins from primary sequence databases. Mascot supports the following searches:
- Peptide Mass Fingerprint: The experimental data are a list of peptide mass values from an enzymatic digest of a protein.
- Sequence Query: One or more peptide mass values associated with information such as partial or ambiguous sequence strings, amino acid composition information, MS/MS fragment ion masses, etc. A super-set of a sequence tag query.
- MS/MS Ion Search: Identification based on raw MS/MS data from one or more peptides.
http://www.matrixscience.com
Help is provided by Matrix Science at http://www.matrixscience.com/help_index.html
Through http://cws.msi.umn.edu/mascot/
Note: You need to be a CGL user and be logged in to one of the Institute's computers to access this URL.
Through Mascot Daemon, installed on both CGL and BSCL PCs
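To make the peptide mass fingerprint input concrete, the following minimal Python sketch produces a theoretical list of peptide masses from a protein sequence. It assumes a simplified trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages, no modifications, and standard monoisotopic residue masses. It illustrates the kind of mass list such a search compares against and is not Mascot's implementation; all function names are hypothetical.

```python
# Monoisotopic residue masses in daltons for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056  # added once per peptide for the free termini


def tryptic_peptides(sequence):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, uncharged peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER_MASS


def peptide_mass_list(sequence):
    """Return (peptide, mass) pairs for a simplified tryptic digest."""
    return [(p, peptide_mass(p)) for p in tryptic_peptides(sequence)]


if __name__ == "__main__":
    for peptide, mass in peptide_mass_list("AKRPGK"):
        print(f"{peptide}\t{mass:.4f}")
```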
Version: NA
Labs: Computational Genetics Laboratory
System(s): All computers
Categories: Proteomics, Bioinformatics
MEGA3 (Molecular Evolutionary Genetics Analysis) is an integrated tool for automatic and manual sequence alignment, inferring phylogenetic trees, mining web-based databases, estimating rates of molecular evolution, and testing evolutionary hypotheses. It is freeware developed by Sudhir Kumar, Koichiro Tamura, and Masatoshi Nei.
http://www.megasoftware.net/mega3/index.html
Walk Through MEGA Tutorial
The online user manual is available from the Help menu of MEGA3.
You can use MEGA3 after you log in to CPC1, the Dell PC in CGL:
Start -> Programs -> mega3
Version: 3.0
Labs: Computational Genetics Laboratory
System(s): Windows(CPC1)
Categories: Evolution, Bioinformatics
The Molecular Operating Environment is the next generation of chemical computing software. MOE is an integrated Applications Environment and Methodology Development Platform. MOE integrates visualization, simulation and application development in one package.
Additional details and documentation regarding MOE can be found at www.chemcomp.com.
module load moe
moe
MUMmer is a system for rapidly aligning entire genomes, whether in complete or draft form.
http://www.tigr.org/software/mummer/
None
http://www.tigr.org/software/mummer/manual
You must initialize your environment including default paths and environment variables which the package uses to access the programs and associated files. To do this, enter the following command:
module load bioinformatics
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized every time you log in.
e.g.:
...
# initialize and load modules
if( -e /usr/local/share/modules/init/tcsh ) then
    unsetenv PATH MANPATH
    source /usr/local/share/modules/init/tcsh
    module load base
    module load bioinformatics
endif
...
Please check the "Running MUMmer" file for how to run the MUMmer programs.
Version: v3.0
Labs: Computational Genetics Laboratory
System(s): Unix workstations
Categories: Bioinformatics
MAPMAKER is a linkage analysis package designed to help construct primary linkage maps of markers segregating in experimental crosses. MAPMAKER performs full multipoint linkage analysis (simultaneous estimation of all recombination fractions from the primary data) for dominant, recessive, and codominant (e.g. RFLP-like) markers. MAPMAKER is an experimental-cross-only successor to the original MAPMAKER program. QTL is a companion program to MAPMAKER which allows one to map genes controlling polygenic quantitative traits in F2 intercrosses and BC1 backcrosses relative to a genetic linkage map. More information on MAPMAKER/QTL can be found in the technical report (included with MAPMAKER/QTL).
mapmgr.roswellpark.org/qtsoftware.html
None
None.
You must initialize your environment including default paths and environment variables which the package uses to access the programs and associated files. To do this, enter the following command:
module load bioinformatics
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized for Mapmaker/QTL every time you log in.
e.g.:
...
# initialize and load modules
if( -e /usr/local/share/modules/init/tcsh ) then
    unsetenv PATH MANPATH
    source /usr/local/share/modules/init/tcsh
    module load base
    module load bioinformatics
endif
...
If you run it from cgls1, log in to cgls1.msi.umn.edu, set up your environment, then call the mapmaker program or the qtl program directly from the command line, e.g.:
mapmaker
or
qtl
Version: Mapmaker version 3.0, QTL version 1.1
Labs: Computational Genetics Laboratory, Scientific Development and Visualization Lab-sdvlapp1
System(s): All Unix workstations, Windows
Categories: Bioinformatics
Meme and Mast are tools for motif discovery and search. Meme is used for discovering motifs in a group of DNA or protein sequences. Mast can search databases such as NCBI nr or Swissprot using the Meme motifs. The Meme/Mast system was developed by Timothy Bailey, Charles Elkan, and Bill Grundy at the UCSD Computer Science and Engineering department with input from Michael Gribskov at the San Diego Supercomputer Center.
http://meme.sdsc.edu/meme/website/intro.html
None
Meme and Mast are described in detail in Papers
You must initialize your environment including default paths and environment variables which the package uses to access the programs and associated files. To do this, enter the following command:
module load bioinformatics
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized for Meme/Mast every time you log in.
e.g.:
...
# initialize and load modules
if( -e /usr/local/share/modules/init/tcsh ) then
    unsetenv PATH MANPATH
    source /usr/local/share/modules/init/tcsh
    module load base
    module load bioinformatics
endif
...
Log in to cgls1.msi.umn.edu, set up your environment, then call the Meme or Mast program directly from the command line, e.g.:
meme
or
mast
Version: Meme/Mast version 3.0
Labs: Computational Genetics Laboratory
System(s): All Unix workstations if applicable
Categories: Bioinformatics
Merlin carries out single-point and multipoint analyses of pedigree data, including IBD and kinship calculations, nonparametric and variance component linkage analyses, error detection and information content mapping. For multipoint analyses in dense maps, Merlin allows the user to impose constraints on the number of recombinants between consecutive markers. Merlin estimates haplotypes by finding the most likely path of gene flow or by sampling paths of gene flow at all markers jointly. It can also list all possible nonrecombinant haplotypes within short regions. Finally, Merlin provides swap-file support for handling very large numbers of markers as well as gene-dropping simulations for estimating empirical significance levels.
These programs are available:
merlin merlin-regress minx pedmerge pedstats pedwipe
http://www.sph.umich.edu/csg/abecasis/Merlin/
http://www.sph.umich.edu/csg/abecasis/Merlin/tour/
Reference can be found at http://www.sph.umich.edu/csg/abecasis/Merlin/reference.html
You must initialize your environment, including the default paths and environment variables that the package uses to access its programs and associated files. To do this, enter the following command:
module load linkage
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized for Merlin every time you log in.
eg:
...
# initialize and load modules
if( -e /usr/local/share/modules/init/tcsh ) then
    unsetenv PATH MANPATH
    source /usr/local/share/modules/init/tcsh
    module load base
    module load linkage
endif
...
Once you have set up your environment, you can use the Merlin programs directly from the command line, eg:
merlin -d datafile -p pedfile
Version: 0.9.12b
Labs: Computational Genetics Laboratory
System(s): UNIX workstations
Categories: Bioinformatics
MrBayes is a program for Bayesian inference of phylogeny using Markov Chain Monte Carlo methods. MrBayes has a console interface and uses a modified NEXUS format for data and batch files. It handles a wide range of probabilistic models for the evolution of nucleotide and amino acid sequences, restriction sites, and standard binary data. The user can set the priors used for the parameters and search for trees under topological constraints. The behavior of the Markov chain can be controlled by setting proposal probabilities for different move types and by invoking heated chains (Metropolis Coupling) to improve performance for difficult problems. Various options are available for summarizing the posterior distribution of the model parameters, including topology and branch lengths, and drawing inferences about ancestral states and site rates.
http://morphbank.ebc.uu.se/mrbayes/
None
See the manual for details.
You must initialize your environment, including the default paths and environment variables that the package uses to access its programs and associated files.
On the IBM SP, to use MrBayes version 3.0 serial, enter the following command:
module load mrbayes
On the IBM SP, to use MrBayes version 3.0 parallel, enter the following command:
module load mrbayes-mpi
On cgls1, to use MrBayes version 3.01 serial, enter the following command:
module load mrbayes
On cgls1, to use MrBayes version 3.01 parallel, enter the following command:
module load mrbayes-mpi
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized for MrBayes every time you log in.
Once you have set up your environment, you can use any of the MrBayes programs directly from the command line, eg:
mb
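As an illustration of the NEXUS batch format and the heated chains mentioned in the description, the following shell sketch writes a small input file. The file name run.nex, the four toy sequences, and the run settings (10000 generations, 4 chains) are arbitrary choices for illustration, not recommended values. It also assumes that the installed mb accepts a file name as its argument; the `which` guard skips the run when mb is not on your PATH.

```shell
# Write a toy NEXUS file: a data block plus a mrbayes command block
cat > run.nex << 'EOF'
#NEXUS
begin data;
  dimensions ntax=4 nchar=12;
  format datatype=dna missing=? gap=-;
  matrix
  Alpha AACGTGGCCAAA
  Beta  AAGGTCGCCAAA
  Gamma CATTTCGTCACA
  Delta GGTATTTCGGCC
  ;
end;

begin mrbayes;
  set autoclose=yes;
  lset nst=6 rates=gamma;
  mcmc ngen=10000 samplefreq=100 nchains=4;
  sumt burnin=25;
end;
EOF

# Run MrBayes on the file only if the program is available
which mb > /dev/null && mb run.nex

# Sanity check: show the mcmc line that was written
grep 'mcmc' run.nex
```

Here nchains=4 asks for one cold chain and three heated chains (Metropolis Coupling), and sumt summarizes the sampled trees after discarding the first 25 samples as burn-in.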
Version: v2.01, v3.01 (serial and parallel)
Labs: IBM SP, Basic Sciences Computing Lab, Computational Genetics Laboratory
System(s): All Unix workstations if applicable
Categories: Evolution, Bioinformatics
Multidivtime is a program for studying rates of molecular evolution and for estimating divergence times.
http://statgen.ncsu.edu/thorne/
None
Manual can be found at ftp://statgen.ncsu.edu/pub/thorne/Rutschmannguide.pdf
You must initialize your environment, including the default paths and environment variables that the package uses to access its programs and associated files. To do this, enter the following command:
module load evolution
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized for multidivtime every time you log in.
eg:
...
# initialize and load modules
if( -e /usr/local/share/modules/init/tcsh ) then
    unsetenv PATH MANPATH
    source /usr/local/share/modules/init/tcsh
    module load base
    module load evolution
endif
...
Once you have set up your environment, you can use the multidivtime programs directly from the command line.
There are three executables:
- multidivtime
- paml2modelinf
- estbranches
Version: 09.25.03
Labs: Netfinity
System(s): Unix workstations
Categories: Evolution, Bioinformatics
MUSCLE is a program for creating multiple alignments of amino acid or nucleotide sequences.
http://www.drive5.com/muscle/
none
http://www.drive5.com/muscle/docs.htm
You must initialize your environment, including the default paths and environment variables that the package uses to access its programs and associated files. To do this, enter the following command:
module load bioinformatics
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized for muscle every time you log in.
eg:
...
# initialize and load modules
if( -e /usr/local/share/modules/init/tcsh ) then
    unsetenv PATH MANPATH
    source /usr/local/share/modules/init/tcsh
    module load base
    module load bioinformatics
endif
...
Once you have set up your environment, you can use the muscle program directly from the command line, eg:
muscle -in seqs.fa -out seqs.afa
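For a quick end-to-end check, the sketch below creates a small FASTA input and runs the command shown above. The three nucleotide sequences are made up for illustration; only the file names seqs.fa and seqs.afa come from the example above. The `which` guard skips the alignment step when muscle is not on your PATH.

```shell
# Toy input: three short nucleotide sequences in FASTA format (hypothetical data)
cat > seqs.fa << 'EOF'
>seq1
ACGTACGTGGCCAATT
>seq2
ACGTACGAGGCCATT
>seq3
ACGAACGTGGCAATT
EOF

# Align only if the program is available; the alignment goes to seqs.afa
which muscle > /dev/null && muscle -in seqs.fa -out seqs.afa

# Sanity check: count the sequences in the input file
grep -c '^>' seqs.fa
```

The output file seqs.afa is an aligned FASTA file, so the same grep count applied to it should match the input.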
Version: 3.51
Labs: Computational Genetics Laboratory
System(s): cl1 Linux workstations
Categories: Bioinformatics
BLAST (Basic Local Alignment Search Tool) is a set of similarity search programs designed to explore all of the available sequence databases regardless of whether the query is protein or DNA. The BLAST programs have been designed for speed, with a minimal sacrifice of sensitivity to distant sequence relationships. The scores assigned in a BLAST search have a well-defined statistical interpretation, making real matches easier to distinguish from random background hits. BLAST uses a heuristic algorithm which seeks local as opposed to global alignments and is therefore able to detect relationships among sequences which share only isolated regions of similarity. See Altschul SF, et al. J Mol Biol 1990 Oct 5;215(3):403-10.
There are several different distributions of BLAST. This one is from the National Center for Biotechnology Information (NCBI), where BLAST was developed. WU-BLAST 2.0 (which we also have installed) and NCBI BLAST are distinctly different software packages. In spite of a common lineage for some portions of their code, in many important ways the two packages do their work differently and, consequently, obtain different results and offer different features.
BLAST is available via a web browser on several sites on the internet. We have the standalone version installed, which allows users to create and manage their own databases if needed. This distribution also includes the IMPALA (Integrating Matrix Profiles And Local Alignments) package.
The NCBI-BLAST programs include:
- blastall
performs all five flavors of BLAST comparison:
  - blastp --- protein against protein
  - blastn --- nucleotide against nucleotide
  - blastx --- translated nucleotide against protein
  - tblastn -- protein against translated nucleotide
  - tblastx -- translated nucleotide against translated nucleotide
- blastpgp
takes a protein query and performs a PSI-BLAST search to create a position-specific matrix using a protein database
- blastclust
automatically and systematically clusters protein sequences based on pairwise matches found using the BLAST algorithm in the case of proteins, or the Mega BLAST algorithm for DNA
- megablast
Mega BLAST uses the greedy algorithm of Webb Miller et al. for nucleotide sequence alignment search and concatenates many queries to save time spent scanning the database
- rpsblast
RPS-BLAST (Reverse PSI-BLAST) searches a query sequence against a database of profiles
- bl2seq
performs a comparison between two sequences using either the blastn or blastp algorithm
- fastacmd
retrieves FASTA formatted sequences from a BLAST database
- formatdb
formats FASTA sequence databases for BLAST
- seedtop
The IMPALA package includes:
- copymat
secondary profile preprocessor
(converts ASCII matrices, produced by the primary preprocessor, into a database that can be read into memory quickly)
- makemat
primary profile preprocessor
(converts a collection of binary profiles, created by the -C option of PSI-BLAST, into portable ASCII form)
- impala
search program
(searches a database of score matrices, prepared by copymat, producing BLAST-like output)
http://www.ncbi.nlm.nih.gov/BLAST/
Tutorials may be found on the BLAST site at the NCBI:
README files can be found in the /usr/local/ncbi_blast directory.
More documentation can be found on the NCBI BLAST web site.
A web interface to NCBI BLAST can be accessed at http://cgls1.msi.umn.edu/software/wwwblast/
Note: You need to be a CGL user and register with the Institute using your University of Minnesota X.500 username before you can use it.
Please copy the .ncbirc file from the /usr/local/bioinfo directory to your home directory. You only need to do this once. At the UNIX prompt, type:
cp /usr/local/bioinfo/.ncbirc ~/.ncbirc
You must initialize your environment, including the default paths and environment variables that the package uses to access its programs and associated files. To do this, enter the following command:
module load bioinformatics
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized every time you log in.
eg:
...
# initialize and load modules
if( -e /usr/local/share/modules/init/tcsh ) then
    unsetenv PATH MANPATH
    source /usr/local/share/modules/init/tcsh
    module load base
    module load bioinformatics
endif
...
Once you have set up your environment, you can use any of the programs directly from the command line, eg:
blastall -p blastp -d swissprot -i my_query_sequence.fa -o myblast.out
This compares the protein sequence in my_query_sequence.fa against the swissprot protein database and writes the results to the file myblast.out.
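Because the standalone version lets you create your own databases, a minimal shell sketch of that workflow may help. The file names mydb.fa and query.fa and the toy protein sequences are made up for illustration. In formatdb, -i names the input FASTA file and -p T marks it as protein; the database is then referred to by the same file name in blastall. The `which` guards skip a step when the program is not on your PATH.

```shell
# Toy protein database (hypothetical sequences, for illustration only)
cat > mydb.fa << 'EOF'
>prot1 example protein one
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ
>prot2 example protein two
MSDNGPQNQRNAPRITFGGPSDSTGSNQNGERS
EOF

# Query sequence (a fragment of prot1, so a hit is expected)
cat > query.fa << 'EOF'
>query1
AYIAKQRQISFVKSHFSRQ
EOF

# Format the FASTA file as a protein BLAST database
which formatdb > /dev/null && formatdb -i mydb.fa -p T

# Search the query against the new database
which blastall > /dev/null && blastall -p blastp -d mydb.fa -i query.fa -o myblast.out

# Sanity check: count the sequences in the database file
grep -c '^>' mydb.fa
```

For a nucleotide database the corresponding choices would be -p F for formatdb and -p blastn for blastall.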
Version: v2.2.9
Labs: IBM Power4, Basic Sciences Computing Lab, Computational Genetics Laboratory, Medicinal Chemistry/Supercomputing Institute Visualization-Workstation Laboratory
System(s): All UNIX workstations
Categories: Bioinformatics
Pfaat (Protein Family Alignment Annotation Tool) is a Java-based protein sequence alignment application designed to facilitate the analysis, curation, and annotation of large protein sequence families. Key features of Pfaat include the ability to align collections of sequences, group sequences into specific families, analyze sequences based on a number of similarity criteria, and annotate sequences and specific residue positions with text descriptions.
http://pfaat.sourceforge.net/
none
http://pfaat.sourceforge.net/pfaat-documentation.htm
You must initialize your environment, including the default paths and environment variables that the package uses to access its programs and associated files. To do this, enter the following command:
module load bioinformatics
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized for PFAAT every time you log in.
eg:
...
# initialize and load modules
if( -e /usr/local/share/modules/init/tcsh ) then
    unsetenv PATH MANPATH
    source /usr/local/share/modules/init/tcsh
    module load base
    module load bioinformatics
endif
...
Once you have set up your environment, you can use the pfaat program directly from the command line, eg:
pfaat
Version: not specified
Labs: Computational Genetics Laboratory
System(s): UNIX workstations
Categories: Bioinformatics
PHYLIP (the PHYLogeny Inference Package) is one of the most popular packages of programs for inferring phylogenies (evolutionary trees). It was developed by Joe Felsenstein of the Department of Genome Sciences at the University of Washington.
These programs include:
- dnacomp
Estimates phylogenies from nucleic acid sequence data using the compatibility criterion.
- dnadist
Computes four different distances between species from nucleic acid sequences.
- dnainvar
For nucleic acid sequence data on four species, computes Lake's and Cavender's phylogenetic invariants.
- dnaml
Estimates phylogenies from nucleotide sequences by maximum likelihood.
- dnamlk
Same as DNAML but assumes a molecular clock.
- dnamove
Interactive construction of phylogenies from nucleic acid sequences.
- dnapars
Estimates phylogenies by the parsimony method using nucleic acid sequences.
- dnapenny
Finds all most parsimonious phylogenies for nucleic acid sequences by branch-and-bound search.
- seqboot
Reads in a data set, and produces multiple data sets from it by bootstrap resampling.
- consense
Computes consensus trees by the majority-rule consensus tree method.
- protdist
Computes a distance measure for protein sequences.
- protpars
Estimates phylogenies from protein sequences using the parsimony method.
- restml
Estimation of phylogenies by maximum likelihood using restriction sites data.
- fitch
Estimates phylogenies from distance matrix data under the "additive tree model".
- kitsch
Estimates phylogenies from distance matrix data under the "ultrametric" model.
- neighbor
An implementation by Mary Kuhner and John Yamato of Saitou and Nei's "Neighbor Joining Method," and of the UPGMA (Average Linkage clustering) method.
- contml
Estimates phylogenies from gene frequency data by maximum likelihood under a model.
- gendist
Computes one of three different genetic distance formulas from gene frequency data.
- contrast
Reads a tree from a tree file, and a data set with continuous characters data, and produces the independent contrasts for those characters, for use in any multivariate statistics package.
- mix
Estimates phylogenies by some parsimony methods for discrete character data with two states (0 and 1).
- move
Interactive construction of phylogenies from discrete character data with two states (0 and 1).
- penny
Finds all most parsimonious phylogenies for discrete-character data with two states.
- dollop
Estimates phylogenies by the Dollo or polymorphism parsimony criteria for discrete character data with two states (0 and 1).
- dolmove
Interactive construction of phylogenies from discrete character data with two states (0 and 1) using the Dollo or polymorphism parsimony criteria.
- dolpenny
Finds all most parsimonious phylogenies for discrete-character data with two states, for the Dollo or polymorphism parsimony criteria using the branch-and-bound method of exact search.
- clique
Finds the largest clique of mutually compatible characters, and the phylogeny which they recommend, for discrete character data with two states.
- factor
Takes discrete multistate data with character state trees and produces the corresponding data set with two states (0 and 1).
- drawgram
Plots rooted phylogenies, cladograms, and phenograms in a wide variety of user-controllable formats.
- drawtree
Similar to DRAWGRAM but plots unrooted phylogenies.
- retree
Reads in a tree (with branch lengths if necessary) and allows you to reroot the tree, to flip branches, to change species names and branch lengths, and then write the result out.
http://evolution.genetics.washington.edu/phylip.html
none
Documentation (Microsoft Word Doc files) can be found in the /usr/local/phylip/current directory.
You must initialize your environment, including the default paths and environment variables that the package uses to access its programs and associated files. To do this, enter the following command:
module load evolution
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized for PHYLIP every time you log in.
eg:
...
# initialize and load modules
if( -e /usr/local/share/modules/init/tcsh ) then
    unsetenv PATH MANPATH
    source /usr/local/share/modules/init/tcsh
    module load base
    module load evolution
endif
...
Once you have set up your environment, you can use any of the PHYLIP programs directly from the command line, eg:
dnamove
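The sequence programs are menu-driven and read their data from a file named infile in the current directory. As a rough sketch of a non-interactive run, the shell lines below write a small DNA data set in PHYLIP format and accept the default menu of dnapars by sending it a single Y. The four taxa and their sequences are made up for illustration, and answering the menu through a pipe is an assumption about how the installed version handles its prompts. The `which` guard skips the run when dnapars is not on your PATH.

```shell
# PHYLIP input: first line gives the number of species and sites;
# each following line has a name padded to 10 characters, then the sequence
cat > infile << 'EOF'
    4   12
Alpha     AACGTGGCCAAA
Beta      AAGGTCGCCAAA
Gamma     CATTTCGTCACA
Delta     GGTATTTCGGCC
EOF

# Accept the default settings (Y) and run the parsimony program if available
which dnapars > /dev/null && echo Y | dnapars

# Sanity check: show the header line of the input file
head -1 infile
```

With only one block of sequence per species, the interleaved and sequential layouts of the PHYLIP format are identical, so this file works with either setting. Results are written to a file named outfile.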
Version: v3.5c
Labs: Basic Sciences Computing Lab, Computational Genetics Laboratory, Medicinal Chemistry/Supercomputing Institute Visualization-Workstation Laboratory
System(s): All Unix workstations
Categories: Evolution, Bioinformatics
Plato takes sequential PHYLIP formatted DNA sequences followed by their maximum likelihood phylogeny. Using a likelihood approach with sliding window analysis and Monte Carlo simulation of the null distribution, anomalously evolving regions in the DNA sequences can be detected and their significance assessed. This may lead to the detection of, for example, recombination, gene conversion or convergence, or reveal variable selective pressures along the gene sequence.
http://evolve.zoo.ox.ac.uk/software.html?id=plato/
none
Manual available at http://evolve.zoo.ox.ac.uk/software/plato/manual.php
You must initialize your environment, including the default paths and environment variables that the package uses to access its programs and associated files. To do this, enter the following command:
module load evolution
If you access this package on a regular basis, you can add this line to your ~/.cshrc file so that your environment will be initialized every time you log in.
eg:
...
# initialize and load modules
if( -e /usr/local/share/modules/init/tcsh ) then
    unsetenv PATH MANPATH
    source /usr/local/share/modules/init/tcsh
    module load base
    module load evolution
endif
...
Once you have set up your environment, you can use the plato program directly from the command line, eg:
plato
Version: 2.11
Labs: Computational Genetics Laboratory
System(s): SUN
Categories: Bioinformatics
Description
PathwayAssist, commercial software from Stratagene, can build and examine biological association networks, including traditional pathways. Proteins, small molecules and cellular processes can be viewed together for a more complete systems biology view of a network. In addition, it allows you to parse scientific text, such as information from PubMed, or to build fact databases from your own research. It also supports importing microarray gene expression data and overlaying it on a built biological association network.
http://www.stratagene.com/products/displayProduct.aspx?pid=559
Training
download Manual
1. l