Introduction
Exploring the genetic histories of ancient human populations is a vital aspect of understanding our collective past. However, working with ancient DNA (aDNA) presents unique challenges due to the highly fragmented and contaminated nature of the DNA samples. In-solution hybridization enrichment has become a popular technique to effectively retrieve genetic data from these precious ancient samples.
The commercial “Twist Ancient DNA” reagent from Twist Bioscience is one such in-solution enrichment tool that targets approximately 1.2 million single nucleotide polymorphisms (SNPs) across the human genome. This allows researchers to cost-effectively generate genome-wide datasets from ancient samples, even those with low endogenous DNA content.
In this article, we’ll explore the performance of the Twist Ancient DNA reagent, comparing it to deep shotgun sequencing and evaluating the impacts of one versus two rounds of enrichment. We’ll also examine the effects of pooling multiple samples into a single enrichment reaction, assessing any potential biases or cross-contamination concerns. By the end, you’ll have a comprehensive understanding of how to optimize ancient human genome enrichment using in-solution techniques.
Overcoming Challenges in Ancient DNA Research
One of the major challenges faced when working with ancient DNA (aDNA) is the high proportion of exogenous DNA contamination present in the DNA extract. This contamination comes primarily from microbes that invade the organism post-mortem or live in the soil where the specimen was buried, and from DNA introduced during sample handling and laboratory processes.
To counteract this, in-solution enrichment of target genomic regions using pre-designed oligonucleotides as molecular “probes” or “baits” has become a popular method. Compared to shotgun sequencing, this technique increases the proportion of target DNA in a sequencing library, lowering the sequencing cost of producing adequate, comparable data across individual samples.
In 2012, Patterson and colleagues proposed a molecular bait design for application in human paleogenomic research that made use of a particular ascertainment technique to enable population genetics studies of global human populations over time (Patterson et al., 2012). This bait design, known as the ‘1240k reagent’, has been widely used, leading to the generation of thousands of individual genome-wide datasets.
However, since the original publication of the molecular bait sequences in 2015, the legacy 1240k reagent has only been available through a commercial arrangement to a small number of research groups. This left researchers with a choice: collaborate with these groups to access the 1240k reagent, or use the more expensive deep shotgun sequencing to obtain adequate data compatible with the 1240k SNP loci.
In 2021, two biotechnology firms, Daicel Arbor Biosciences and Twist Bioscience, produced commercial in-solution enrichment kits targeting the same 1240k SNPs plus, in each, an additional set of variants. This has made these kits available to every research group. However, recent studies have revealed a strong allelic technical bias in data generated with the Daicel Arbor Biosciences baits, while a comparatively mild allelic bias is also present in the legacy 1240k reagent.
Evaluating the Twist Ancient DNA Reagent
In this study, we aimed to benchmark the commercial “Twist Ancient DNA” reagent from Twist Bioscience, using 24 ancient human samples from four populations across three continents, with a range of endogenous DNA percentages (0.1–44%). We compared three strategies for cost-effectiveness and allelic bias: deep shotgun sequencing, one round of enrichment, and two rounds of enrichment with the “Twist Ancient DNA” reagent. We also compared enrichment efficacy and biases between single and pooled library enrichments.
Our experiments were performed at the Australian Centre for Ancient DNA (ACAD)’s ultra-clean laboratory facilities, following rigorous procedures to minimize contamination and ensure high standards of quality for the genetic data. All post-amplification experiments were completed in standard molecular biology laboratories at the University of Adelaide, with subsequent bioinformatics workflows executed on the University of Adelaide’s HPC.
Preparing Libraries for Enrichment
For each of the 24 ancient human libraries, we tested one and two rounds of enrichment with the Twist Bioscience “Twist Ancient DNA” reagent. We also prepared libraries in two pooling configurations: 3 reactions with 2 low endogenous DNA libraries each, and 3 reactions with 4 high endogenous DNA libraries each.
While each enrichment reaction contained 1000 ng of total DNA, the amount of DNA required per library was reduced due to pooling, allowing for a decrease in PCR cycles to avoid overamplification and maintain library complexity.
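The arithmetic behind the reduced per-library input can be sketched as follows. This is a minimal illustration that assumes the 1000 ng total is split evenly by mass across the pooled libraries; the source does not state the pooling ratio, and the function name is hypothetical.

```python
def dna_per_library_ng(total_ng: float, n_libraries: int) -> float:
    """Split a fixed total DNA input evenly across pooled libraries.

    Assumption: equal-mass pooling (not stated in the source).
    """
    if n_libraries < 1:
        raise ValueError("a reaction needs at least one library")
    return total_ng / n_libraries


# Single-library reaction versus the two pooling configurations described above.
for n in (1, 2, 4):
    print(f"{n} libraries per reaction: {dna_per_library_ng(1000, n)} ng each")
```

Under this assumption, a library in a four-library pool contributes a quarter of the DNA it would need on its own, which is why fewer PCR cycles are needed to reach the required input.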
Sequencing and Data Processing
All shotgun and enriched libraries were sequenced using a NovaSeq 6000 System at the Kinghorn Centre for Clinical Genomics. Raw data were processed with the aDNA analysis workflow package nf-core/eager, including mapping, quality filtering, and deduplication.
Pseudohaploid variant calling was performed using pileupCaller, and ancient DNA authenticity, endogenous DNA percentage, fragment size distribution, and post-mortem damage rates were determined using DamageProfiler.
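To illustrate what a pseudohaploid call is, here is a minimal sketch of the general idea: at each target SNP, one observed allele is drawn at random from the reads covering that site. This is not pileupCaller's code. The data structure, function name, and toy pileup are hypothetical, and real callers also apply base and mapping quality filters.

```python
import random


def pseudohaploid_call(alleles_at_site, rng):
    """Return one randomly chosen observed allele, or None if no read covers the site."""
    if not alleles_at_site:
        return None  # site is missing for this individual
    return rng.choice(alleles_at_site)


# Hypothetical pileup: SNP identifier -> alleles seen in the reads covering it.
pileup = {
    "snp1": ["A", "A", "G"],
    "snp2": [],
    "snp3": ["T"],
}

rng = random.Random(42)  # fixed seed so the sketch is reproducible
calls = {snp: pseudohaploid_call(alleles, rng) for snp, alleles in pileup.items()}
print(calls)
```

Sampling a single allele avoids calling diploid genotypes from the very low coverage typical of aDNA libraries.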
Evaluating Enrichment Performance
Comparing the efficacy of deep shotgun sequencing to one and two rounds of enrichment, we observe that TW2 (two rounds of enrichment) consistently captured more SNPs per sample. However, when normalizing the data per million sequenced paired reads, at least 3 out of 4 libraries with mappable endogenous DNA percentage > 38% produced fewer SNPs per million reads after a second round of enrichment.
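The normalization mentioned above is a simple ratio. The sketch below shows how a second round can capture more SNPs in total while yielding fewer SNPs per million reads. The function name and all numbers are made up for illustration and are not results from the study.

```python
def snps_per_million_reads(snps_covered: int, read_pairs: int) -> float:
    """Normalize SNP yield by sequencing effort (per million paired reads)."""
    if read_pairs <= 0:
        raise ValueError("read_pairs must be positive")
    return snps_covered / (read_pairs / 1_000_000)


# Hypothetical library: the second round covers more SNPs overall,
# but needs disproportionately more sequencing to get there.
one_round = snps_per_million_reads(400_000, 5_000_000)
two_rounds = snps_per_million_reads(450_000, 9_000_000)
print(one_round, two_rounds)
```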
Although two rounds of enrichment consistently yield higher sequenced, mappable, and filtered post-enrichment endogenous DNA percentages, the unique post-enrichment endogenous DNA percentage was higher only for libraries with mappable endogenous DNA ≤ 27%.
Assessing Cost-Effectiveness
We tested the cost-effectiveness of deep shotgun sequencing and Twist enrichment using one or two enrichment rounds. Our fitted logarithmic model predicts that two rounds of enrichment are more cost-effective per SNP than one round only for libraries with mappable endogenous DNA ≤ 27%.
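As a rough illustration of what fitting a logarithmic model involves, the sketch below fits y = a + b·ln(x) by ordinary least squares. The functional form, the variable meanings, and the synthetic data are all assumptions made for this example; the source does not give the model's exact specification.

```python
import math


def fit_log_model(xs, ys):
    """Least-squares fit of y = a + b * ln(x); returns (a, b).

    Assumption: this functional form is illustrative, not the study's model.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    log_x = [math.log(x) for x in xs]
    n = len(log_x)
    mean_lx = sum(log_x) / n
    mean_y = sum(ys) / n
    sxx = sum((lx - mean_lx) ** 2 for lx in log_x)
    sxy = sum((lx - mean_lx) * (y - mean_y) for lx, y in zip(log_x, ys))
    b = sxy / sxx
    a = mean_y - b * mean_lx
    return a, b


# Synthetic data generated from a = 1, b = 2, so the fit should recover both.
xs = [1, 10, 100, 1000]
ys = [1 + 2 * math.log(x) for x in xs]
a, b = fit_log_model(xs, ys)
print(a, b)
```

A logarithmic shape is a natural choice for saturating quantities such as SNP yield, where each additional unit of sequencing returns fewer new SNPs than the last.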
Evaluating Allelic Bias
Our assessment of allelic bias using f4 statistics yielded reassuring results, suggesting the absence of observable assay bias introduced by the Twist Bioscience “Twist Ancient DNA” reagent. This finding consolidates the reliability of this reagent in producing unbiased results for paleogenomic analyses.
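For readers unfamiliar with f4 statistics, the standard definition is the average over SNPs of (p_A − p_B)(p_C − p_D), where p is the allele frequency in each population. The sketch below computes that average from toy frequencies. The population layout and values are hypothetical, and real analyses use dedicated software that also estimates standard errors, for example by block jackknife.

```python
def f4(pa, pb, pc, pd):
    """f4(A, B; C, D): mean over SNPs of (pA - pB) * (pC - pD)."""
    if not (len(pa) == len(pb) == len(pc) == len(pd)):
        raise ValueError("all frequency lists must cover the same SNPs")
    if not pa:
        raise ValueError("need at least one SNP")
    products = [(a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd)]
    return sum(products) / len(products)


# Toy allele frequencies at three SNPs (made up for illustration).
pop_a = [0.10, 0.50, 0.90]
pop_b = [0.10, 0.50, 0.90]  # identical to A, so f4 is exactly zero
pop_c = [0.20, 0.40, 0.70]
pop_d = [0.30, 0.10, 0.60]
print(f4(pop_a, pop_b, pop_c, pop_d))
```

In an assay-bias test, data from the same individuals generated with different methods are placed in the statistic so that its expected value is zero; a significant deviation from zero would then point to a technical rather than a biological signal.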
Exploring the Effects of Library Pooling
Our exploration of the effects of pooling several libraries into a single enrichment reaction also yielded reassuring outcomes. Our results suggest that pooling up to four libraries does not have a substantial impact on SNP yield compared to single library reactions.
We also investigated the potential for cross-contamination through dual-index hopping between molecules originating from different libraries. Single-index hopping rates differed significantly between pooled and unpooled libraries. However, dual-index hopping was undetectable in contamination estimates and its estimated rate of occurrence was extremely low, which supports the reliability and cost-effectiveness of the pooling approach.
Implications and Future Directions
This study provides researchers in the field of human paleogenomics with a comprehensive understanding of the strengths and limitations of different sequencing and enrichment strategies using the Twist Ancient DNA reagent. Our results offer practical guidance for optimizing experimental protocols to generate reliable and cost-effective ancient human genome data.
The findings from this research support the reliability of the Twist Bioscience “Twist Ancient DNA” reagent: we observed no assay bias in the data it produced, which addresses concerns raised by the allelic biases reported for other enrichment reagents.
Furthermore, the validation of pooling multiple libraries into a single enrichment reaction without substantial impacts on data quality or cross-contamination underscores the cost-effectiveness of this approach. This can enable more researchers to access affordable paleogenomic data, expanding the diversity of ancient human populations represented in the growing body of published genome-wide datasets.
As the field of human paleogenomics continues to advance, studies like this one will be crucial in guiding researchers towards optimal experimental strategies and enrichment tools. By providing a comprehensive evaluation of the Twist Ancient DNA reagent, this article empowers the research community to make informed decisions about their ancient DNA projects and generate high-quality, unbiased data to further our understanding of human evolutionary history.