allpaths-lg

Genetics

Software Description

ALLPATHS-LG is a whole-genome shotgun assembler that can generate high-quality genome assemblies using short reads (~100 bp) such as those produced by the new generation of sequencers. The significant difference between ALLPATHS and traditional assemblers such as Arachne is that ALLPATHS assemblies are not necessarily linear, but instead are presented in the form of a graph. This graph representation retains ambiguities, such as those arising from polymorphism, uncorrected read errors, and unresolved repeats, thereby providing information that has been absent from previous genome assemblies.


Info

Module Name

allpathslg

Last Updated On

08/29/2023

Support Level

Secondary Support

Software Access Level

Open Access

Home Page

http://www.broadinstitute.org/software/allpaths-lg/blog/

Documentation

General Linux

To run this software interactively in a Linux environment, run the following commands:

module load allpathslg
PrepareAllPathsInputs.pl DATA_DIR=/path/to/data
RunAllPathsLG PRE=<pre> DATA_SUBDIR=<data> RUN=<run> REFERENCE_NAME=<ref>

Note:

The PrepareAllPathsInputs.pl script requires one parameter, the path to the directory containing the input data. <pre> is the root directory ALLPATHS-LG will use. <data> is the subdirectory containing the input data. <run> is the directory used for assembly pre-processing. <ref> is the organism or reference genome name.
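As a rough illustration of how these four values fit together, the sketch below builds the directory paths by hand. It assumes the directories nest as <pre>/<ref>/<data>/<run>, which matches the example job script on this page but should be confirmed against the user manual. The demo root directory and all values are hypothetical.

```shell
# Hypothetical values, chosen only for illustration
PRE="${TMPDIR:-/tmp}/allpaths_demo"
REFERENCE_NAME=test.genome
DATA_SUBDIR=data
RUN=run

# Assumed nesting: PRE/REFERENCE_NAME/DATA_SUBDIR/RUN
DATA_DIR="$PRE/$REFERENCE_NAME/$DATA_SUBDIR"
RUN_DIR="$DATA_DIR/$RUN"

# Only the data directory is created here; this sketch does not call ALLPATHS-LG
mkdir -p "$DATA_DIR"

echo "input data directory: $DATA_DIR"
echo "run directory:        $RUN_DIR"
```

Under this assumption, DATA_DIR is the path that would be passed to PrepareAllPathsInputs.pl, and the other four variables correspond to the RunAllPathsLG arguments of the same names.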

ALLPATHS-LG is composed of a number of modules, each of which performs a step in the assembly process. While each module can be run individually, ALLPATHS-LG provides a module that controls the entire assembly pipeline, called RunAllPathsLG. In addition, before ALLPATHS-LG can be used, data must be converted using the Perl script PrepareAllPathsInputs.pl.

The ALLPATHS-LG assembler has specific requirements for paired-end read libraries: the two reads of each pair must actually be intertwined, that is, they must overlap.

A more detailed discussion of each of these directories, as well as a list of other command-line arguments, is available in the user manual. Other ALLPATHS-LG utilities may be found in the directory

/soft/allpathslg/VER/bin

where VER is the version of ALLPATHS-LG you are using. An example PBS script for submitting ALLPATHS-LG jobs to the queue is shown below.

#PBS -l nodes=1:ppn=8,mem=1gb,walltime=4:00:00
#PBS -m abe
module load allpathslg

# Prepare input data
mkdir -p test.genome/data
PrepareAllPathsInputs.pl \
DATA_DIR=$PWD/test.genome/data

# Assemble data
RunAllPathsLG \
PRE=$PWD \
DATA_SUBDIR=data \
RUN=run \
REFERENCE_NAME=test.genome
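
One way to submit the script above, assuming it has been saved under the hypothetical file name allpaths.pbs and that the standard PBS qsub command is on your path, is sketched below. The guard is only there so the snippet does nothing harmful on a machine without PBS.

```shell
JOB_SCRIPT=allpaths.pbs   # hypothetical file name for the script above

if command -v qsub >/dev/null 2>&1 && [ -f "$JOB_SCRIPT" ]; then
    # qsub prints the ID of the newly queued job on success
    if qsub "$JOB_SCRIPT"; then
        STATUS=submitted
    else
        STATUS=failed
    fi
else
    echo "qsub or $JOB_SCRIPT not available; nothing submitted" >&2
    STATUS=skipped
fi
```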

Additional Information

User Manual

Example Data

Agate Modules

Default

52488

Other Modules

42557, 52488

Mangi Modules

Default

52488

Other Modules

42557, 52488

Mesabi Modules

Default

52488

Other Modules

42557, 52488