allpaths-lg
Software Description
ALLPATHS-LG is a whole-genome shotgun assembler that can generate high-quality genome assemblies using short reads (~100 bp) such as those produced by the new generation of sequencers. The significant difference between ALLPATHS and traditional assemblers such as Arachne is that ALLPATHS assemblies are not necessarily linear, but instead are presented in the form of a graph. This graph representation retains ambiguities, such as those arising from polymorphism, uncorrected read errors, and unresolved repeats, thereby providing information that has been absent from previous genome assemblies.
Info
Module Name
allpathslg
Last Updated On
08/29/2023
Support Level
Secondary Support
Software Access Level
Open Access
Home Page
Documentation
General Linux
To run this software interactively in a Linux environment, run the following commands:
module load allpathslg
PrepareAllPathsInputs.pl DATA_DIR=/path/to/data
RunAllPathsLG PRE=<pre> DATA_SUBDIR=<data> RUN=<run> REFERENCE_NAME=<ref>
Note:
The PrepareAllPathsInputs.pl script requires one parameter, the path to the directory containing the input data. <pre> is the root directory ALLPATHS-LG will use. <data> is the subdirectory containing the input data. <run> is the directory used for assembly pre-processing. <ref> is the organism or reference genome name.
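How these four values combine into paths can be sketched in shell. This is a minimal illustration, assuming the nested layout <pre>/<ref>/<data>/<run> described in the ALLPATHS-LG manual; the variable names and example values are hypothetical, and a temporary directory stands in for a real project root.

```shell
#!/bin/bash
# Illustrative values only; substitute your own.
PRE="$(mktemp -d)"            # root directory (<pre>); a temp dir for this sketch
REFERENCE_NAME="test.genome"  # organism or reference genome name (<ref>)
DATA_SUBDIR="data"            # subdirectory holding the input data (<data>)
RUN="run"                     # directory used for assembly pre-processing (<run>)

# Assumed layout: <pre>/<ref>/<data>/<run>
DATA_DIR="$PRE/$REFERENCE_NAME/$DATA_SUBDIR"
RUN_DIR="$DATA_DIR/$RUN"

# Create the data directory that PrepareAllPathsInputs.pl would be pointed at.
mkdir -p "$DATA_DIR"

echo "DATA_DIR=$DATA_DIR"
echo "RUN_DIR=$RUN_DIR"
```

With this layout, DATA_DIR is the value that would be passed to PrepareAllPathsInputs.pl, while PRE, REFERENCE_NAME, DATA_SUBDIR, and RUN are passed separately to RunAllPathsLG.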
ALLPATHS-LG is composed of a number of modules, each of which performs a step in the assembly process. While each module can be run individually, ALLPATHS-LG provides a module that controls the entire assembly pipeline, called RunAllPathsLG. In addition, before ALLPATHS-LG can be used, data must be converted using the Perl script PrepareAllPathsInputs.pl.
The ALLPATHS-LG assembler has specific requirements for paired-end read libraries: the reads of each pair must actually be intertwined.
A more detailed discussion of each of these directories, as well as a list of other command-line arguments, is available in the user manual. Other ALLPATHS-LG utilities may be found in the directory
/soft/allpathslg/VER/bin
where VER is the version of ALLPATHS-LG you are using. An example PBS script for submitting ALLPATHS-LG jobs to the queue is shown below.
#PBS -l nodes=1:ppn=8,mem=1gb,walltime=4:00:00
#PBS -m abe
module load allpathslg
# Prepare input data
mkdir -p test.genome/data
PrepareAllPathsInputs.pl \
DATA_DIR=$PWD/test.genome/data
# Assemble data
RunAllPathsLG \
PRE=$PWD \
DATA_SUBDIR=data \
RUN=run \
REFERENCE_NAME=test.genome
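The script above builds its paths from $PWD. Under PBS/Torque a job commonly starts in the user's home directory rather than the directory it was submitted from, so one option is to change to $PBS_O_WORKDIR before any path is computed. The following is a sketch under that assumption, not a required part of the script; the PRE and DATA_DIR variable names are illustrative.

```shell
#!/bin/bash
# PBS_O_WORKDIR is set by PBS/Torque to the directory qsub was run from.
# Outside a PBS job it is unset, so fall back to the current directory.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

PRE="$PWD"                        # root directory to pass as PRE=
DATA_DIR="$PRE/test.genome/data"  # same path the script above passes as DATA_DIR=

echo "PRE=$PRE"
echo "DATA_DIR=$DATA_DIR"
```

Placed directly after the #PBS directives, these lines make $PWD resolve to the submission directory for the rest of the job.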
Additional Information
Agate Modules
Default
52488
Other Modules
42557, 52488
Mangi Modules
Default
52488
Other Modules
42557, 52488
Mesabi Modules
Default
52488
Other Modules
42557, 52488