Glossary of associated terms and jargon
- FASTQ format & quality scores
- FASTQ is the standard format for raw sequence data. The quality scores in a FASTQ file represent the probability that a given base was called incorrectly. These scores can be encoded in several ways, so it is important to know which encoding a given FASTQ file uses.
- Single-end vs paired-end, read vs fragment
- DNA fragments can be sequenced from one end (single-end, SE) or from both ends (paired-end, PE). During data processing, PE data are often represented as fragments consisting of two reads each, rather than as two separate reads.
- Strandedness
- During library prep, RNA can be processed into cDNA such that the strand information is maintained. This is important in regions where genes overlap on the two DNA strands.
- TPM, RPKM, FPKM, CPM
- Acronyms that stand for:
- TPM: Transcripts Per Million
- RPKM: Reads Per Kilobase per Million mapped reads (for SE data)
- FPKM: Fragments Per Kilobase per Million mapped reads (for PE data)
- CPM: Counts Per Million
- TMM
- This is a normalization method; the acronym stands for “Trimmed Mean of M-values” (Robinson and Oshlack, 2010).
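
As an illustration of the quality scores described under the FASTQ entry above, the following minimal sketch converts an ASCII-encoded quality string into per-base error probabilities using the Phred relation Q = -10 log10(p). It assumes the Phred+33 encoding (ASCII offset 33); some older files use an offset of 64 instead, which is why the encoding needs to be known. The function name `phred_to_error_prob` and the example quality string are made up for illustration.

```python
def phred_to_error_prob(quality_string, offset=33):
    """Convert an ASCII-encoded FASTQ quality string to error probabilities.

    offset=33 assumes Phred+33 encoding; pass offset=64 for Phred+64 files.
    """
    probs = []
    for ch in quality_string:
        q = ord(ch) - offset          # Phred quality score for this base
        probs.append(10 ** (-q / 10)) # probability the base call is wrong
    return probs


# Hypothetical quality string for a 4-base read: Q40, Q20, Q10, Q0
example_probs = phred_to_error_prob("I5+!")
for ch, p in zip("I5+!", example_probs):
    print(ch, ord(ch) - 33, p)
```

A higher score means a lower error probability: a base with Q20 has a 1 in 100 chance of being wrong, while Q40 corresponds to 1 in 10,000.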
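
The expression units listed under the TPM, RPKM, FPKM, CPM entry above can be sketched in a few lines. This is a simplified illustration using the commonly cited formulas, not the implementation of any particular tool; the function names, counts, and gene lengths are invented for the example. FPKM uses the same formula as RPKM, with fragment counts in place of read counts.

```python
def cpm(counts):
    """Counts Per Million: scale counts by library size only."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]


def rpkm(counts, lengths_bp):
    """Reads Per Kilobase per Million mapped reads.

    Pass fragment counts instead of read counts to obtain FPKM.
    """
    total = sum(counts)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts Per Million: length-normalize first, then scale to 1e6."""
    rates = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r * 1e6 / total_rate for r in rates]


# Made-up data: three genes whose counts are proportional to their lengths
counts = [100, 200, 700]
lengths_bp = [1000, 2000, 7000]

print("CPM: ", cpm(counts))
print("RPKM:", rpkm(counts, lengths_bp))
print("TPM: ", tpm(counts, lengths_bp))
```

In this toy data set the CPM values differ because longer genes collect more reads, whereas RPKM and TPM are equal across the three genes once gene length is taken into account. TPM values always sum to one million within a sample, which is not true of RPKM or FPKM.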
Bibliography
- Robinson, Mark D., and Alicia Oshlack. “A scaling normalization method for differential expression analysis of RNA-seq data.” Genome Biology 11.3 (2010): R25.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.