gatk

Genetics

Software Description

From the GATK website:

A genomic analysis toolkit focused on variant discovery. The GATK is the industry standard for identifying SNPs and indels in germline DNA and RNAseq data. Its scope is now expanding to include somatic short variant calling, and to tackle copy number (CNV) and structural variation (SV). In addition to the variant callers themselves, the GATK also includes many utilities to perform related tasks such as processing and quality control of high-throughput sequencing data, and bundles the popular Picard toolkit. These tools were primarily designed to process exomes and whole genomes generated with Illumina sequencing technology, but they can be adapted to handle a variety of other technologies and experimental designs. And although it was originally developed for human genetics, the GATK has since evolved to handle genome data from any organism, with any level of ploidy. GATK consists of many tools each with many parameters. To run GATK effectively, review the GATK best practices documentation:

GATK Homepage

Getting started with GATK (includes best practices workflows)


Info

Module Name

gatk

Last Updated On

02/15/2024

Support Level

Primary Support

Software Access Level

Open Access

Home Page

https://gatk.broadinstitute.org/hc/en-us

Documentation


General Linux

To run this software interactively in a Linux environment, run the commands:

module load gatk
gatk

This will list the basic options for running GATK. To see a list of all available GATK tools, use:

gatk --list

To see options for an individual tool, use:

gatk TOOL_NAME --help

where TOOL_NAME is the name of the GATK tool. For example, to see available options for HaplotypeCaller, run:

gatk HaplotypeCaller --help

To run HaplotypeCaller or any other GATK tool with a set amount of Java memory, use:

gatk --java-options "-Xmx4g" TOOL_NAME

The -Xmx{n} option specifies the amount of memory available to Java (e.g., 4g, 4096m or 4194304k). If Java exceeds the allocated limit, the interactive session will be terminated.
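The three example sizes all describe the same 4 GiB heap, because Java treats the k, m and g suffixes as powers of 1024. The sketch below makes that arithmetic explicit by converting a size to KiB. The function name xmx_to_kib is hypothetical, and only the k, m and g suffixes (plus plain bytes) are handled.

```shell
#!/bin/sh
# Convert a Java heap size such as 4g, 4096m or 4194304k to KiB.
# Assumes Java's convention that k, m and g are powers of 1024;
# other suffixes (e.g. t) are rejected by this sketch.
xmx_to_kib() {
  size="$1"
  num="${size%[kKmMgG]}"      # strip one trailing k/m/g suffix, if present
  unit="${size#"$num"}"       # the stripped suffix (empty for plain bytes)
  case "$num" in
    ''|*[!0-9]*) echo "unrecognized size: $size" >&2; return 1 ;;
  esac
  case "$unit" in
    k|K) echo "$num" ;;
    m|M) echo $((num * 1024)) ;;
    g|G) echo $((num * 1024 * 1024)) ;;
    *)   echo $((num / 1024)) ;;  # plain bytes, rounded down to KiB
  esac
}

# All three spellings print the same KiB value.
for s in 4g 4096m 4194304k; do
  printf '%s\n' "-Xmx$s = $(xmx_to_kib "$s") KiB"
done
```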

NOTE: GATK requires at least 4GB of memory for Java. Because Java's total memory use often exceeds the limit set by the "-Xmx{n}" flag, the default interactive environment tells Java about only a fraction of the available memory and reserves the rest as padding.
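One way to apply the same padding idea in a batch job is to size the heap from the job's memory allocation. This is a sketch under assumptions not stated above: the job runs under Slurm and requested memory with --mem, so SLURM_MEM_PER_NODE holds the allocation in MB; 80% is an illustrative heap fraction, not a GATK recommendation; and heap_mb, hc_command, reference.fasta, sample.bam and sample.vcf.gz are hypothetical names. The command is printed rather than executed so it can be checked first.

```shell
#!/bin/sh
# Size the Java heap as a fraction of the job's memory, leaving the
# remainder as padding for memory Java uses outside the heap.
# Assumption: Slurm sets SLURM_MEM_PER_NODE (in MB) when --mem is used.
heap_mb() {
  # $1 = total memory in MB, $2 = heap percentage (default 80, illustrative)
  echo $(( $1 * ${2:-80} / 100 ))
}

# Print (not run) a HaplotypeCaller command for a job with $1 MB of memory.
# The file names below are hypothetical placeholders.
hc_command() {
  heap="$(heap_mb "$1")"
  if [ "$heap" -lt 4096 ]; then   # GATK needs at least 4 GB for Java
    echo "only ${heap}m of heap from $1 MB; request more memory" >&2
    return 1
  fi
  echo "gatk --java-options \"-Xmx${heap}m\" HaplotypeCaller" \
       "-R reference.fasta -I sample.bam -O sample.vcf.gz"
}

hc_command "${SLURM_MEM_PER_NODE:-8192}"   # 8192 MB fallback outside Slurm
```

The -R, -I and -O arguments are HaplotypeCaller's reference, input and output options; to run the tool, call gatk with the printed arguments instead of echoing them.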

Agate Modules

Default

4.1.2

Other Modules

1.6, 2.7.2, 3.1.1, 3.2.0, 3.2.2, 3.3.0, 3.4.0, 3.5.0, 3.6.0, 3.7.0, 4.0.11, 4.0.6, 4.1.2, 4.4.0

Mangi Modules

Default

4.1.2

Other Modules

1.6, 2.7.2, 3.1.1, 3.2.0, 3.2.2, 3.3.0, 3.4.0, 3.5.0, 3.6.0, 3.7.0, 4.0.11, 4.0.6, 4.1.2, 4.4.0

Mesabi Modules

Default

4.1.2

Other Modules

1.6, 2.7.2, 3.1.1, 3.2.0, 3.2.2, 3.3.0, 3.4.0, 3.5.0, 3.6.0, 3.7.0, 4.0.11, 4.0.6, 4.1.2, 4.4.0