Edit me

Important note

The Genome Analysis Toolkit (GATK) distributed by the Broad Institute of Harvard and MIT (see http://www.broadinstitute.org/gatk/) is a commonly used framework and toolbox for many of the tasks described below. In its latest 4.0 release, the GATK is now completely open source. While we recommend GATK tools for many of the tasks, we try also to provide alternatives for those organizations that cannot or do not wish to use GATK (i.e for licensing reasons with older GATK versions). Please also note that while the key analysis steps remain the same, the GATK4 is intended to become a spark-based rewrite of GATK3. Many GATK4 tools come in spark-capable or non-spark-capable modes, and can run locally, on a Spark cluster or on Google Cloud Dataproc 1. There are some differences in invocation highlighted below, and where tools names have changed this is also indicated. A better introduction to the GATK3 is found here: https://software.broadinstitute.org/gatk/documentation/quickstart?v=3 and to the GATK4 is found here https://software.broadinstitute.org/gatk/documentation/quickstart?v=4 . Their computational performance is discussed here 2.

Bibliography

  1. broadinstitute/gatk: Official code repository for GATK versions 4 and up. at https://github.com/broadinstitute/gatk 

  2. Heldenbrand, J. R. et al. Performance benchmarking of GATK3.8 and GATK4. BioRxiv (2018). doi:10.1101/348565