
Step 1.1: Adapter trimming

Sequencing facilities usually deliver reads in FASTQ format [1], which stores the base sequence of each read together with a quality score for every base. Adapter sequences have usually been removed from the reads already, but sometimes fragments are left behind, covering anywhere from 20% to 90% of the adapter length. These remnants need to be removed from the reads. You can do this with your own script based on a sliding window algorithm, or with one of several existing tools: Trimmomatic [2], FASTX-Toolkit (fastx_clipper), Bioconductor (ShortRead package), Flexbar [3], or any of the tools listed in the BioScholar [4] and OMICtools [5] databases.
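
As a rough illustration of the sliding window idea, the Python sketch below slides the adapter along each read and cuts the read at the first position where the remainder of the read matches the start of the adapter. The function names, the example adapter sequence, and the thresholds (min_overlap, max_error_rate) are assumptions made for this example and are not taken from any of the tools above. It handles only substitutions, not insertions or deletions, so treat it as a starting point rather than a replacement for a dedicated trimmer.

```python
def find_adapter_start(read, adapter, min_overlap=5, max_error_rate=0.1):
    """Return the index where a full or partial 3' adapter begins, or None.

    Hypothetical helper: the read suffix starting at each candidate position
    is compared with the beginning of the adapter, allowing a fraction of
    mismatching bases (substitutions only, no indels).
    """
    for start in range(len(read) - min_overlap + 1):
        overlap = min(len(adapter), len(read) - start)
        window = read[start:start + overlap]
        mismatches = sum(1 for a, b in zip(window, adapter) if a != b)
        if mismatches <= int(max_error_rate * overlap):
            return start
    return None


def trim_record(seq, qual, adapter, **kwargs):
    """Trim sequence and quality string at the detected adapter start."""
    start = find_adapter_start(seq, adapter, **kwargs)
    if start is None:
        return seq, qual
    return seq[:start], qual[:start]


# Example adapter prefix, used here as a placeholder; substitute the adapter
# sequence of your own library preparation kit.
adapter = "AGATCGGAAGAGC"
seq = "ACGTTGCAAGGCTT" + "AGATCGGA"   # 14 genomic bases + 8 adapter bases
qual = "I" * len(seq)

trimmed_seq, trimmed_qual = trim_record(seq, qual, adapter)
print(trimmed_seq)  # ACGTTGCAAGGCTT
```

Lowering min_overlap removes shorter adapter remnants but also increases the chance of trimming genuine sequence that happens to resemble the start of the adapter, so the value is a trade-off to tune on your own data.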

The choice of tool depends on how much adapter sequence is left over in the data. This can be assessed manually by grepping for parts of known adapter sequences on the command line.
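
The same check can be sketched in Python as an alternative to grep: count how many reads contain the first k bases of a known adapter, for a few values of k. The function name, the prefix lengths, and the example adapter are assumptions for illustration. The sketch also assumes a FASTQ file with exactly four lines per record.

```python
from itertools import islice


def count_adapter_hits(fastq_lines, adapter, prefix_lengths=(8, 12, 20)):
    """Count reads that contain the first k bases of `adapter`, for each k.

    `fastq_lines` is any iterable of lines (an open file handle or a list).
    Prefix lengths longer than the adapter are skipped.
    """
    prefixes = {k: adapter[:k] for k in prefix_lengths if k <= len(adapter)}
    hits = dict.fromkeys(prefixes, 0)
    total = 0
    # In 4-line FASTQ records the sequence is the 2nd line of each record.
    for line in islice(fastq_lines, 1, None, 4):
        seq = line.strip()
        total += 1
        for k, prefix in prefixes.items():
            if prefix in seq:
                hits[k] += 1
    return total, hits


# Tiny in-memory example; for real data pass an open file handle instead,
# e.g. open("reads.fastq") (hypothetical file name).
example = [
    "@read1", "ACGTTGCAAGGCTTAGATCGGAAGAGC", "+", "I" * 27,
    "@read2", "ACGTACGTACGTACGTACGT", "+", "I" * 20,
]
total, hits = count_adapter_hits(example, "AGATCGGAAGAGC", prefix_lengths=(8, 12))
print(total, hits)  # 2 {8: 1, 12: 1}
```

A high fraction of hits for the longer prefixes suggests that substantial adapter remnants are present, while hits only for short prefixes may simply be chance matches.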

Bibliography

  1. Cock, P. J. A., Fields, C. J., Goto, N., Heuer, M. L. & Rice, P. M. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res. 38, 1767–1771 (2010). 

  2. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014). 

  3. Dodt, M., Roehr, J. T., Ahmed, R. & Dieterich, C. FLEXBAR-Flexible Barcode and Adapter Processing for Next-Generation Sequencing Platforms. Biology (Basel) 1, 895–905 (2012). 

  4. Tools to remove adapter sequences from next-generation sequencing data, Genomics Gateway. at http://bioscholar.com/genomics/tools-remove-adapter-sequences-next-generation-sequencing-data/ 

  5. Adapter trimming software tools, WGS analysis - OMICtools. at https://omictools.com/adapter-trimming-category