Glossary of associated terms and jargon

The table below (partly borrowed from the GATK dictionary 1) provides definitions of the basic terms used.

Adapters
Short nucleotide sequences added to the ends of the DNA fragments that are to be sequenced (2, 3, 4). Functions:
  1. permit binding to the flow cell;
  2. allow for PCR enrichment of adapter-ligated DNA only;
  3. allow for indexing or “barcoding” of samples, so multiple DNA libraries can be mixed together in a single sequencing lane.
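The indexing function can be illustrated with a minimal demultiplexing sketch. It assumes each read has already been paired with its index (barcode) sequence and that barcodes must match exactly; the function name, the barcode sequences, and the sample names are hypothetical, and production demultiplexers typically also tolerate sequencing errors in the index.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group reads by sample using an exact match on the index sequence.

    reads: iterable of (index_sequence, read_sequence) pairs.
    barcode_to_sample: dict mapping index sequence -> sample name.
    Reads with an unknown index are collected under "undetermined".
    """
    by_sample = defaultdict(list)
    for index_seq, read_seq in reads:
        sample = barcode_to_sample.get(index_seq, "undetermined")
        by_sample[sample].append(read_seq)
    return dict(by_sample)


# Hypothetical barcodes and samples, for illustration only.
barcodes = {"ATCACG": "sample_A", "CGATGT": "sample_B"}
pooled_reads = [
    ("ATCACG", "GGCTTAAC"),
    ("CGATGT", "TTAGGCAT"),
    ("ATCACG", "ACGTACGT"),
    ("NNNNNN", "CCCCGGGG"),
]
demultiplexed = demultiplex(pooled_reads, barcodes)
```

Here `demultiplexed` maps each sample name to the reads that carried its barcode, which is what lets several libraries share one lane.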
Genetic variant
A genetic variant can be one of the following:
  1. single nucleotide variation (SNV);
  2. small (<10 nt) insertions/deletions (indels);
  3. copy number variation (outside the scope of this document).
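As a rough illustration of the first two categories, the sketch below classifies a reference/alternate allele pair by length. It assumes alleles are given as plain nucleotide strings (as in the REF and ALT columns of a VCF file) and applies the <10 nt threshold from the list above; the function name and the "other" label are hypothetical.

```python
def classify_variant(ref: str, alt: str, max_indel_len: int = 10) -> str:
    """Classify a REF/ALT allele pair as an SNV, a small indel, or other.

    The size of an indel is taken as the difference in allele lengths.
    Anything that is neither an SNV nor a small indel is labeled "other".
    """
    if len(ref) == 1 and len(alt) == 1:
        return "SNV" if ref != alt else "no variant"
    size = abs(len(ref) - len(alt))
    if 0 < size < max_indel_len:
        return "insertion" if len(alt) > len(ref) else "deletion"
    return "other"


print(classify_variant("A", "G"))    # single-base substitution
print(classify_variant("A", "ATG"))  # two bases inserted
print(classify_variant("ATG", "A"))  # two bases deleted
```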
Lane
The basic machine unit for sequencing. The lane reflects the basic independent run of an NGS machine. For Illumina machines, this is the physical sequencing lane.
Library
A unit of DNA preparation that at some point is physically pooled together. Multiple lanes can be run from aliquots of the same library. The DNA library is the natural unit that is being sequenced. For example, if the library has limited complexity, many sequences are duplicated, which results in a high duplication rate across lanes. See reference 5 for more details.
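The duplication rate mentioned above can be approximated as the fraction of reads that repeat an already-seen alignment. The sketch below is a simplification that assumes reads are summarized as (chromosome, position, strand) tuples and treats identical tuples as duplicates; the function name and the example coordinates are hypothetical, and dedicated tools such as Picard MarkDuplicates use more elaborate criteria.

```python
def duplication_rate(read_alignments):
    """Return the fraction of reads that duplicate an earlier alignment.

    read_alignments: list of hashable alignment keys, for example
    (chromosome, position, strand) tuples. This key is an assumption
    made for illustration only.
    """
    total = len(read_alignments)
    if total == 0:
        return 0.0
    unique = len(set(read_alignments))
    return (total - unique) / total


# Hypothetical alignments: two reads share the same start and strand.
alignments = [
    ("chr1", 100, "+"),
    ("chr1", 100, "+"),
    ("chr1", 250, "-"),
    ("chr2", 40, "+"),
]
rate = duplication_rate(alignments)
```

A low-complexity library yields few distinct fragments, so the same keys recur in every lane sequenced from it and the rate stays high regardless of how many lanes are run.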
NGS
Next generation sequencing.
WGS
Whole Genome Sequencing.
Sample
A single individual, such as human CEPH NA12878. Multiple libraries with different properties can be constructed from the original sample DNA source. Here we treat samples as independent individuals whose genome sequence we are attempting to determine. From this perspective, tumor/normal samples are different despite coming from the same individual.
SNV
Single nucleotide variant. An SNV can occur:
  • In a non-coding region
  • In a coding region:
    1. synonymous
    2. nonsynonymous
      1. missense
      2. nonsense
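The coding-region categories above can be sketched by translating the reference and altered codons with the standard genetic code. This assumes the SNV has already been placed in its in-frame codon on the coding strand; the function and variable names are hypothetical, and a change that removes a stop codon is not distinguished from missense in this simplified version.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snv(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-base codon change as synonymous, missense, or nonsense."""
    differences = sum(r != a for r, a in zip(ref_codon, alt_codon))
    if len(ref_codon) != 3 or len(alt_codon) != 3 or differences != 1:
        raise ValueError("codons must be 3 nt long and differ at exactly one base")
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


print(classify_coding_snv("GAA", "GAG"))  # Glu -> Glu
print(classify_coding_snv("GAG", "GTG"))  # Glu -> Val
print(classify_coding_snv("TGG", "TGA"))  # Trp -> stop
```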
Functional Equivalence specifications 6
Specifications intended to eliminate batch effects and promote data interoperability by standardizing pipeline implementations: the tools used, the versions of these tools, and the versions of the reference genomic files. Large genomic databases, such as gnomAD and TOPMed, are being processed by pipelines adhering to these specifications.
Functional equivalence 6
Two variant calling pipelines are functionally equivalent if they can be run independently on the same raw WGS data to produce aligned files (BAM or CRAM files) that yield genome variation maps (VCF files) that have >98% similarity when analyzed by the same variant caller(s).
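The definition above does not say how similarity is measured. As one possible illustration only, the sketch below compares two call sets with the Jaccard index (shared variants divided by all variants seen in either set). The choice of metric, the function name, and the example variants are assumptions, not the method of reference 6.

```python
def variant_set_similarity(calls_a, calls_b):
    """Jaccard similarity between two collections of variant calls.

    Each call is a hashable key such as (chromosome, position, ref, alt).
    Using the Jaccard index here is an assumption for illustration.
    """
    a, b = set(calls_a), set(calls_b)
    if not a and not b:
        return 1.0  # two empty call sets are trivially identical
    return len(a & b) / len(a | b)


# Hypothetical call sets from two pipelines run on the same raw data.
pipeline_1 = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
              ("chr2", 50, "G", "GA"), ("chr3", 10, "T", "C")]
pipeline_2 = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
              ("chr2", 50, "G", "GA"), ("chr3", 99, "A", "T")]
similarity = variant_set_similarity(pipeline_1, pipeline_2)
meets_threshold = similarity > 0.98
```

With these toy inputs three of five distinct variants are shared, so the pair falls well short of the 98% threshold.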

Bibliography

  1. GATK Dictionary. Available at https://software.broadinstitute.org/gatk/documentation/topic?name=dictionary

  2. Myllykangas, S., Buenrostro, J. & Ji, H. P. Overview of sequencing technology platforms. In Bioinformatics for High Throughput Sequencing 11–25 (Springer New York, 2012). doi:10.1007/978-1-4614-0782-9_2

  3. Schiemer, J. Illumina TruSeq DNA Adapters De-Mystified. Available at http://tucf-genomics.tufts.edu/documents/protocols/TUCF_Understanding_Illumina_TruSeq_Adapters.pdf

  4. Goodwin, S., McPherson, J. D. & McCombie, W. R. Coming of age: ten years of next-generation sequencing technologies. Nat. Rev. Genet. 17, 333–351 (2016). 

  5. Head, S. R. et al. Library construction for next-generation sequencing: overviews and challenges. BioTechniques 56, 61–4, 66, 68, passim (2014). 

  6. Regier, A. A. et al. Functional equivalence of genome sequencing analysis pipelines enables harmonized variant calling across human genetics projects. bioRxiv (2018). doi:10.1101/269316