H3ABioNet Next Gen Training dataset
A practice dataset for variant calling is available here.
These are synthetic data generated with the NEAT simulator [1], which inserts “golden” variants into the reference genome before “sequencing” it to produce synthetic reads. This practice dataset was generated as whole-exome sequencing (WES) of chromosome 1 at 50X coverage, with a mutation rate of 0.0005 and a sequencing error rate of 0.005. Also posted are the reference that was used to generate the data and the golden VCF. Finally, all the SNP and indel files from the GATK bundle that were used when calling variants to check concordance with GATK are posted as well.
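Because the golden VCF lists exactly which variants were inserted, a call set can be scored against it. The following is a minimal Python sketch of such a check, not the procedure used to build this dataset. It assumes plain or gzipped VCF text and matches variants by exact (CHROM, POS, REF, ALT). The file names in the usage comment are hypothetical placeholders. Indels that are represented differently in the two files will not match under this comparison, so normalizing both files first, or using a dedicated comparison tool, would be more robust.

```python
import gzip
from typing import Dict, Iterable, Set, Tuple

# A variant is keyed by (CHROM, POS, REF, ALT); this exact-match key is an
# assumption of this sketch and does not normalize indel representation.
Variant = Tuple[str, int, str, str]


def parse_vcf_lines(lines: Iterable[str]) -> Set[Variant]:
    """Collect one key per ALT allele from VCF text lines, skipping headers."""
    variants: Set[Variant] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):
            variants.add((chrom, pos, ref, alt))
    return variants


def read_vcf(path: str) -> Set[Variant]:
    """Read a plain-text or gzip-compressed VCF file into a set of variant keys."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return parse_vcf_lines(handle)


def concordance(golden: Set[Variant], called: Set[Variant]) -> Dict[str, float]:
    """Count true/false positives and false negatives against the golden set."""
    tp = len(golden & called)
    fp = len(called - golden)
    fn = len(golden - called)
    return {
        "true_positives": tp,
        "false_positives": fp,
        "false_negatives": fn,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Usage (hypothetical file names, substitute the files you downloaded):
#   golden = read_vcf("golden.vcf")
#   called = read_vcf("my_calls.vcf.gz")
#   print(concordance(golden, called))
```

Recall here is the fraction of golden variants that were recovered, and precision is the fraction of called variants that appear in the golden set.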
Bibliography
1. Stephens, Z. D. et al. Simulating Next-Generation Sequencing Datasets from Empirical Mutation and Sequencing Models. PLoS ONE 11, e0167047 (2016).
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.