Glossary of associated terms and jargon

FASTQ format & quality scores
FASTQ is the standard format for raw sequence data. The quality scores in a FASTQ file represent the probability that a given base was called incorrectly. These scores are stored as ASCII characters using one of several encodings (for example Phred+33 or Phred+64), so it is important to know which encoding a given FASTQ file uses.
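As a rough illustration of how a quality string maps to error probabilities, the sketch below decodes Phred-scaled scores, where Q = -10 * log10(P). The function name `phred_to_error_prob` and the example quality string are made up for illustration, and the default offset of 33 is an assumption about the file's encoding; older Illumina files may need an offset of 64.

```python
def phred_to_error_prob(quality_string, offset=33):
    """Convert a FASTQ quality string into per-base error probabilities.

    `offset` is the ASCII offset of the encoding (assumed here to be 33,
    i.e. Phred+33; pass 64 for Phred+64 files).
    """
    probs = []
    for ch in quality_string:
        q = ord(ch) - offset          # Phred quality score Q
        probs.append(10 ** (-q / 10)) # P = 10^(-Q/10)
    return probs


# Hypothetical three-base quality string: 'I' -> Q40, '5' -> Q20, '+' -> Q10
example_probs = phred_to_error_prob("I5+")
print(example_probs)  # approximately 0.0001, 0.01 and 0.1
```

Decoding with the wrong offset shifts every score by 31, which is why the encoding has to be known before quality filtering.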
Single-end vs paired-end, read vs fragment
DNA fragments can be sequenced from one end (single-end, SE) or from both ends (paired-end, PE). During data processing, PE data are often represented as fragments consisting of two reads, rather than as two separate reads.
During library prep, RNA can be processed into cDNA such that the strand information is maintained (a stranded, or strand-specific, library). This is important in regions where there are overlapping genes on the two DNA strands.
These acronyms name normalized expression units and stand for:
  • TPM - Transcripts Per Million
  • RPKM - Reads Per Kilobase per Million mapped reads (for SE data)
  • FPKM - Fragments Per Kilobase per Million mapped reads (for PE data)
  • CPM - Counts Per Million
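The units listed above can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: the function names, the toy counts, and the feature lengths are all made up, and real pipelines typically use effective lengths and handle zero totals. FPKM uses the same formula as RPKM, with fragments counted instead of reads.

```python
def cpm(counts):
    """Counts Per Million: scale each count by library size only."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]


def rpkm(counts, lengths_bp):
    """Reads (or Fragments) Per Kilobase per Million mapped reads."""
    total = sum(counts)
    return [c / (length / 1e3) / (total / 1e6)
            for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts Per Million: length-normalize first, then rescale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r / total_rate * 1e6 for r in rates]


# Hypothetical counts and lengths (bp) for three features
toy_counts = [100, 200, 700]
toy_lengths = [1000, 2000, 7000]

print(cpm(toy_counts))
print(rpkm(toy_counts, toy_lengths))
print(tpm(toy_counts, toy_lengths))
```

In this toy example the counts are proportional to feature length, so RPKM and TPM come out equal across the three features while CPM does not. TPM values always sum to one million within a sample, which is not true of RPKM or FPKM.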
TMM is a type of normalization; the acronym stands for “Trimmed Mean of M-values” [1].


  1. Robinson, Mark D., and Alicia Oshlack. "A scaling normalization method for differential expression analysis of RNA-seq data." Genome Biology 11.3 (2010): R25.