Abstract
Variant tracking is now a major goal of public health surveillance for SARS-CoV-2, but information from clinical samples is biased because only a fraction of samples can be sequenced. Sequencing wastewater that contains SARS-CoV-2 RNA can reveal most of the variants circulating in a sewershed, at the level of the entire population. Unlike clinical samples taken from individuals, wastewater samples contain viral RNA derived from SARS-CoV-2 infections across tens to thousands of individuals, depending on sewershed size. This difference necessarily complicates data interpretation: instead of seeking to identify a single variant, analysis of a wastewater sample must allow for the coexistence of multiple variants (Figure 1). Methods for wastewater-based variant tracking include quantitative PCR-based assays, probe-capture enrichment sequencing, tiled amplicon sequencing, and targeted amplicon sequencing, and the bioinformatic approaches and data interpretation differ depending on the chosen method. Here we summarize wet-lab and bioinformatic approaches, discuss the advantages and limitations of each method, and share results from our own efforts.
PCR-based variant assays (RT-qPCR or RT-ddPCR) can be highly quantitative and precise. Typically, each assay identifies a single mutation in the SARS-CoV-2 genome, and assays can be multiplexed to create variant panels (1,2). Because each PCR assay differs in locus, primers, and probe, it is not straightforward to compare the results of a variant assay with those of an assay at a different locus (e.g., the CDC N1 diagnostic assay). To determine the relative frequency of a given variant, it is therefore desirable to run assays specific to both the wild-type and the mutant allele at each locus of interest, which doubles the work involved (1). PCR-based assays have long development lead times (1-3 months) but a short time-to-results (~2-6 hours, depending on the protocol).
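As a minimal sketch of the wild-type/mutant comparison described above, the Python below estimates the mutant allele fraction at one locus from RT-ddPCR droplet counts. It assumes the standard Poisson correction for digital PCR (mean copies per droplet λ = -ln(1 - positive/total)) and equal droplet volumes for the two assays, so that droplet volume cancels in the ratio. The function names and droplet counts are hypothetical illustrations, not part of any specific assay protocol.

```python
import math


def copies_per_droplet(positive: int, total: int) -> float:
    """Poisson-corrected mean target copies per droplet: -ln(1 - positive/total)."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total (unsaturated)")
    return -math.log(1.0 - positive / total)


def mutant_allele_fraction(mut_pos: int, mut_total: int,
                           wt_pos: int, wt_total: int) -> float:
    """Fraction of copies at one locus that carry the mutant allele.

    Assumes the mutant- and wild-type-specific assays target the same locus
    and use droplets of equal volume, so concentrations reduce to lambda.
    """
    lam_mut = copies_per_droplet(mut_pos, mut_total)
    lam_wt = copies_per_droplet(wt_pos, wt_total)
    if lam_mut + lam_wt == 0:
        raise ValueError("no target detected by either assay")
    return lam_mut / (lam_mut + lam_wt)


if __name__ == "__main__":
    # Hypothetical counts: 300 and 900 positive droplets out of 15,000 each.
    print(round(mutant_allele_fraction(300, 15000, 900, 15000), 3))  # 0.246
```

With these counts, the naive ratio of positive droplets (300/1200 = 0.25) slightly overstates the mutant fraction, because raw positive counts undercount the more abundant wild-type target, whose droplets more often hold more than one copy.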
During the COVID-19 pandemic, commercial and academic laboratories have not been able to design and validate assays in real time as new variants emerged, which is when such assays would be most useful for public health. Nonetheless, these assays remain the most sensitive and quantitative measure of variants in the mixed population within wastewater.
The probe-capture enrichment method uses a suite of probes designed to be complementary to regions across the SARS-CoV-2 genome (e.g., the Illumina Respiratory Virus Panel or the Twist Bioscience SARS-CoV-2 Research Panel). Typically, the RNA is reverse-transcribed and an Illumina sequencing library is created from the cDNA. When this library is combined with the beads or chip carrying the probes, sequences matching the SARS-CoV-2 genome hybridize to the probes and non-specific cDNA is washed away. The enriched portion of the library is then sequenced on an Illumina instrument (MiSeq, NextSeq, HiSeq, or NovaSeq). Probe-capture enrichment avoids the amplification biases of tiled amplicon sequencing, does not rely on primers that could be compromised by future mutations, and can allow discovery of novel mutations across the genome, as in Crits-Christoph et al. (3). Additionally, probes for other targets, such as other respiratory viruses of interest, can be included in the panel, allowing multiple targeted analyses in the same sequencing run.
Tiled amplicon high-throughput sequencing methods include ARTIC (4,5), Swift (6), Midnight (7), and others (8). Each of these methods uses multiple primer sets that target specific, overlapping segments of the SARS-CoV-2 genome. To prevent the generation of super-amplicons spanning adjacent primer sets, the primers are often divided into two pools, and the PCR products are subsequently combined for Illumina or Nanopore library preparation.
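The two-pool design can be illustrated with a short sketch. It assumes a common convention (used by ARTIC-style schemes) of placing consecutive tiled amplicons in alternating pools, so that overlapping neighbors never share a tube, and it checks that no two amplicons within a pool overlap. The amplicon names and coordinates are hypothetical and do not correspond to any published primer scheme.

```python
from typing import Dict, List, Tuple

# (name, start, end) with 0-based, end-exclusive genome coordinates.
Amplicon = Tuple[str, int, int]


def assign_pools(amplicons: List[Amplicon]) -> Dict[int, List[Amplicon]]:
    """Place consecutive tiled amplicons in alternating pools (1, 2, 1, 2, ...)."""
    ordered = sorted(amplicons, key=lambda amp: amp[1])
    pools: Dict[int, List[Amplicon]] = {1: [], 2: []}
    for index, amp in enumerate(ordered):
        pools[1 if index % 2 == 0 else 2].append(amp)
    return pools


def overlapping_pairs(pool: List[Amplicon]) -> List[Tuple[str, str]]:
    """Report adjacent amplicons in one start-sorted pool whose regions overlap.

    Because the pool is sorted by start, an empty result means no two
    amplicons in the pool overlap, which is what the two-pool split is for.
    """
    return [(a[0], b[0]) for a, b in zip(pool, pool[1:]) if b[1] < a[2]]


if __name__ == "__main__":
    # Hypothetical ~400 nt amplicons overlapping by 50 nt (not a real scheme).
    scheme = [("amp1", 30, 430), ("amp2", 380, 780),
              ("amp3", 730, 1130), ("amp4", 1080, 1480)]
    pools = assign_pools(scheme)
    for number, members in pools.items():
        print(number, [m[0] for m in members], overlapping_pairs(members))
```

Simple alternation only works when each amplicon overlaps just its immediate neighbors; in a more tightly tiled design, the check above would flag same-pool overlaps that two pools cannot resolve.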
The choice of sequencing platform and library preparation must be compatible with the amplicon length and with the depth required to cover each amplicon. One well-noted issue is that some regions are amplified more efficiently than others, resulting in uneven sequencing depth across the genome. Another is amplicon drop-out, caused by mutations that prevent primers from binding. To overcome these issues, researchers have reported adjusting primer concentrations and adding supplementary primer sets.
Bioinformatic analysis of whole-genome sequence data from tiled amplicon or probe-capture enrichment involves: 1) adapter trimming, 2) primer trimming (tiled amplicons only), 3) read mapping to align the data to the SARS-CoV-2 reference genome, 4) variant calling to identify mutations and their frequencies in the sample, and 5) determination of which variants were present. Most variant-calling pipelines produce tables of unlinked mutations and their frequencies, so it is often impossible to determine which mutations coexisted on the same genomic RNA molecule in the sample before sequencing. A priori knowledge of variant lineages and their mutations (e.g., from GISAID) can be used to define signature mutations, sometimes called 'quasi-unique' because they are frequent in one lineage and less frequent in all others (9,10). If any of these signature mutations is identified in the sequence data, the corresponding lineage is interpreted as likely present. This interpretation can be misleading if signature mutations from different lineages co-occur within a novel lineage. The relative abundances of different variants can be estimated semi-quantitatively from the coverage depth at each mutation; however, because of PCR bias and uneven coverage, these estimates are not always reliable.
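Steps 4 and 5 and the signature-mutation logic above can be sketched as follows. The sketch assumes a variant-calling table that reports, for each signature mutation, the alternate-allele read depth and total depth (including sites where the mutation is absent, with alternate depth 0). A lineage is flagged if any of its signature mutations passes a frequency threshold, and the mean allele frequency of its well-covered signature sites serves as a semi-quantitative abundance. The signature sets are a small illustrative subset of well-known Alpha and Delta spike mutations, not a curated list; the mutation label format, thresholds, and depths are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Set, Tuple

# Illustrative subset of spike mutations per lineage; a real analysis would
# derive signature sets from curated lineage data rather than hard-coding them.
SIGNATURES: Dict[str, Set[str]] = {
    "Alpha": {"S:del69-70", "S:N501Y", "S:P681H"},
    "Delta": {"S:L452R", "S:T478K", "S:P681R"},
}


def allele_frequencies(calls: Dict[str, Tuple[int, int]],
                       min_depth: int = 10) -> Dict[str, float]:
    """Convert (alt_depth, total_depth) per mutation into allele frequencies,
    keeping only sites covered by at least `min_depth` reads."""
    return {mut: alt / depth for mut, (alt, depth) in calls.items()
            if depth >= min_depth}


def summarize_lineages(calls: Dict[str, Tuple[int, int]],
                       signatures: Dict[str, Set[str]] = SIGNATURES,
                       min_depth: int = 10,
                       min_af: float = 0.05) -> Dict[str, dict]:
    """Flag lineages with any signature hit and average their signature AFs."""
    afs = allele_frequencies(calls, min_depth)
    summary: Dict[str, dict] = {}
    for lineage, sig in signatures.items():
        covered: List[float] = [afs[m] for m in sig if m in afs]
        hits = sorted(m for m in sig if afs.get(m, 0.0) >= min_af)
        summary[lineage] = {
            "detected": bool(hits),  # any signature mutation observed
            "hits": hits,
            # Semi-quantitative only: subject to PCR bias and uneven coverage.
            "mean_af": mean(covered) if covered else None,
        }
    return summary


if __name__ == "__main__":
    # Hypothetical depths; S:del69-70 falls below min_depth and is ignored.
    calls = {"S:N501Y": (80, 100), "S:P681H": (70, 100), "S:del69-70": (5, 8),
             "S:L452R": (20, 100), "S:T478K": (15, 100), "S:P681R": (25, 100)}
    for lineage, result in summarize_lineages(calls).items():
        print(lineage, result)
```

Because each site is evaluated independently, this sketch inherits the caveat noted above: a novel lineage carrying signature mutations from several known lineages would be reported as a mixture of them.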
Targeted amplicon methods use primers designed to amplify one or more specific regions of interest. Nested primers may be used to ensure specificity (11,12). For SARS-CoV-2, some targeted amplicon sequencing methods have focused on the S gene, which harbors most of the mutations of interest that affect infectivity, immune evasion, and vaccine resistance. Wet-lab methods consist of PCR for a single amplicon, followed by library preparation and Illumina (or Nanopore) sequencing. Bioinformatic analysis exploits the fact that all mutations on a single amplicon are derived from the same RNA molecule in the sample, so mutations within the amplicon are linked. The data can be analyzed with methods similar to those used for 16S rRNA gene sequence data: adapter removal, sequence denoising (to remove sequencing errors), chimera removal, and matching to known lineages. Novel mutations can also be detected by this method. Its main disadvantage is the inability to detect mutations outside the targeted genomic region, and mutations in the primer-binding regions may also bias detection of novel variants.
Results and sequencing challenges
Starting in January 2021, we tested ARTIC v3, Swift, and the Illumina Respiratory Virus probe-capture enrichment panel on RNA extracted directly from wastewater. Of these, only ARTIC was successful, but the resulting genomes were often highly incomplete. Genome coverage improved somewhat when direct extraction was followed by rRNA depletion, and when viruses were concentrated by ultrafiltration or with Nanotrap beads before RNA extraction. Overall, we hypothesize that SARS-CoV-2 RNA integrity (13), RNA concentration, and RNA enrichment affect sequencing success. We searched for signature mutations of Variants of Concern in partial genomes from wastewater sampled in the San Francisco Bay Area (California) from April to June 2021 (Figure 2).
This analysis primarily identified the Alpha and Delta variants, which were predominant during this period. Notably, in our retrospective study, the Delta variant was detected as early as April 8, 2021, coinciding with early reports of this variant in the US. More recently, we have begun comparing RT-qPCR-based variant assays with targeted amplicon sequencing. Together, these methods will provide quantitative data and information on linked mutations to describe the trajectory of the Delta lineage in California. Sequencing of wastewater samples is planned to continue through early 2022, and any results indicating the introduction of new variants in California will be included in the presentation.
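Because every mutation on a targeted amplicon read comes from one RNA molecule, linked-mutation haplotypes can be counted directly from the reads. The sketch below is a deliberately simplified version of the 16S-style workflow described earlier: exact-sequence dereplication with a minimum-count filter stands in for true denoising (dedicated tools such as DADA2 model sequencing errors instead), chimera removal is omitted, and reads are assumed to be full-length and ungapped after adapter and primer removal. The reference fragment, lineage labels, and reads are hypothetical toy data.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple


def dereplicate(reads: List[str], min_count: int = 2) -> Dict[str, int]:
    """Count identical amplicon reads and drop rare sequences.

    A crude stand-in for denoising: real pipelines model sequencing errors.
    """
    counts = Counter(reads)
    return {seq: n for seq, n in counts.items() if n >= min_count}


def substitutions(seq: str, ref: str, offset: int = 0) -> FrozenSet[str]:
    """Substitutions such as 'A1G' relative to the reference amplicon.

    Assumes full-length, ungapped reads trimmed to the amplicon; `offset`
    shifts 1-based amplicon positions to genome coordinates.
    """
    if len(seq) != len(ref):
        raise ValueError("sketch assumes ungapped, full-length amplicon reads")
    return frozenset(f"{r}{offset + i + 1}{s}"
                     for i, (r, s) in enumerate(zip(ref, seq)) if r != s)


def haplotype_table(reads: List[str], ref: str,
                    known: Dict[str, FrozenSet[str]],
                    offset: int = 0,
                    min_count: int = 2) -> List[Tuple[str, List[str], float]]:
    """Relative abundance of each linked-mutation haplotype, labeled by lineage."""
    unique = dereplicate(reads, min_count)
    total = sum(unique.values())
    rows = []
    for seq, n in sorted(unique.items(), key=lambda kv: -kv[1]):
        muts = substitutions(seq, ref, offset)
        # Exact match of the whole mutation set; anything else is a novel haplotype.
        label = next((name for name, m in known.items() if m == muts), "unassigned")
        rows.append((label, sorted(muts), n / total))
    return rows


if __name__ == "__main__":
    ref = "ACGTACGTAC"  # hypothetical 10 nt reference amplicon
    known = {"reference-like": frozenset(),
             "toy-lineage-1": frozenset({"A1G", "C10T"})}
    # 6 reference reads, 3 reads carrying two linked substitutions,
    # and 1 read with a likely sequencing error (filtered by min_count).
    reads = ["ACGTACGTAC"] * 6 + ["GCGTACGTAT"] * 3 + ["ACGTACGTTC"]
    for row in haplotype_table(reads, ref, known):
        print(row)
```

Haplotypes whose mutation sets match no known lineage are reported as "unassigned", which is where novel combinations of mutations would surface.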
The following conference paper was presented at the Public Health and Water Conference & Wastewater Disease Surveillance Summit in Cincinnati, OH, March 21-24, 2022.
Author(s): R. Kantor; S. Islam; J. Bradley Silva; S. Harris-Lovett; K. Nelson
Source: Proceedings of the Water Environment Federation
Document type: Conference Paper
Print publication date: Mar 2022
DOI: 10.2175/193864718825158299
Content source: Public Health and Water Conference
Copyright: 2022