We have developed an integrated software package that combines all the steps required for 16S analysis. It takes raw 16S rDNA reads, quality-controls them, creates OTUs, classifies the OTUs and generates a phylogenetic tree of the OTU sequences. The output is a .biom file and a Newick .tre file that can be pulled into R for further analysis. The package is wrapped into a Nextflow pipeline, accompanied by a configuration file in which the read-processing parameters and classification database can be predefined. The pipeline uses FastQC and MultiQC for QC reporting, usearch for read QC, merging and OTU picking, and QIIME for classification and phylogenetic tree generation. The whole workflow is packaged in Singularity containers, which makes it portable to any system that has Singularity set up.
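Before loading the outputs into R, it can be useful to check that the tips of the Newick tree match the OTU identifiers you expect. The following is a minimal sketch in Python, not part of the pipeline itself. It assumes unquoted tip labels without Newick comments, and the tree string and OTU IDs shown are hypothetical placeholders for the contents of the .tre and .biom files.

```python
import re
from typing import List


def newick_tip_labels(newick: str) -> List[str]:
    """Return the tip (leaf) labels of a simple Newick tree string.

    Assumption: labels are unquoted and the tree contains no comments.
    A tip label directly follows '(' or ','; internal node labels and
    support values follow ')' and are therefore not captured.
    """
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)


# Hypothetical tree text, standing in for the contents of the .tre file.
tree_text = "((OTU_1:0.1,OTU_2:0.2):0.05,OTU_3:0.3);"
tips = newick_tip_labels(tree_text)

# Hypothetical OTU IDs, standing in for the observation IDs of the .biom table.
otu_ids = {"OTU_1", "OTU_2", "OTU_3"}

# OTUs present in the table but absent from the tree.
missing = otu_ids - set(tips)
print("tips:", tips)
print("missing from tree:", sorted(missing))
```

For real data, read the tree text from the .tre file and take the OTU IDs from the .biom table with whichever reader you prefer. Trees with quoted labels would need a proper Newick parser instead of this regular expression.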
Two workflow languages were investigated for running this pipeline: CWL and Nextflow.
To access the CWL workflow go here (runs on Docker containers or on a setup with the software installed locally)
To access the Nextflow workflow go here (runs on Singularity containers)
The Nextflow workflow is the most up-to-date version of the pipeline and is, for now, the recommended one to use.
Todos - please let us know if you want to help with any of this.
usearch to be replaced with vsearch. This will make containerisation and distribution much easier. usearch is currently licensed and vsearch is not. We have done comparisons locally and vsearch performs just as well.
To cite this pipeline, please use: Baichoo, S., Souilmi, Y., Panji, S., Botha, G., Meintjes, A., Hazelhurst, S., Bendou, H., Beste, E. de, et al. 2018. Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics. BMC Bioinformatics. 19(1):457. DOI: 10.1186/s12859-018-2446-1.