16S vs. Shotgun: Which to Choose?
16S sequencing or shotgun sequencing? Almost all microbiome researchers ask themselves this question when planning a new study, because the vast majority of microbiome publications use either 16S rRNA gene sequencing or shotgun metagenomic sequencing to generate raw data for subsequent microbial profiling or metagenomic analyses. Each method has its pros and cons, so which should you choose?
WHAT IS 16S rRNA GENE SEQUENCING?
16S rRNA gene sequencing, or simply 16S sequencing, uses PCR to target and amplify portions of the hypervariable regions (V1-V9) of the bacterial 16S rRNA gene [1]. Amplicons from separate samples are then given molecular barcodes, pooled together, and sequenced. After sequencing, the raw data are analyzed with a bioinformatics pipeline that includes trimming, error correction, and comparison to a 16S reference database. Once the reads are assigned to a phylogenetic rank, a taxonomy profile can be generated. ITS sequencing follows the same strategy but targets the internal transcribed spacer (ITS) region found in fungal genomes.
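The barcode, pool, and profile steps described above can be sketched in a few lines of Python. This is a toy illustration only: the barcodes, amplicon sequences, and the exact-match lookup are invented for the example, and real pipelines add quality trimming and error correction before any database comparison.

```python
from collections import Counter, defaultdict

# Hypothetical sample barcodes (illustrative only).
BARCODES = {"ACGT": "sample_A", "TGCA": "sample_B"}

# Toy reference "database": made-up amplicon sequence -> genus.
REFERENCE = {
    "GGATTC": "Bacteroides",
    "CCTAAG": "Lactobacillus",
}

def demultiplex(reads, barcodes):
    """Group pooled reads by their leading barcode and strip the barcode."""
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
    return dict(by_sample)

def taxonomy_profile(amplicons, reference):
    """Relative abundance per taxon; unmatched amplicons are 'Unassigned'."""
    counts = Counter(reference.get(seq, "Unassigned") for seq in amplicons)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}

# Pooled reads from two samples: barcode followed by amplicon.
reads = ["ACGTGGATTC", "ACGTGGATTC", "ACGTCCTAAG", "ACGTTTTTTT", "TGCACCTAAG"]
samples = demultiplex(reads, BARCODES)
profiles = {name: taxonomy_profile(amps, REFERENCE) for name, amps in samples.items()}
print(profiles)
```

In this toy data set, half of the reads in sample_A match the Bacteroides entry, a quarter match Lactobacillus, and a quarter remain unassigned.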
WHAT IS SHOTGUN METAGENOMIC SEQUENCING?
Unlike 16S sequencing, which targets only 16S rRNA genes, shotgun metagenomic sequencing sequences all genomic DNA in a sample. The library preparation workflow is similar to that of regular whole genome sequencing, including random fragmentation and adapter ligation. A typical workflow for taxonomy analysis of shotgun metagenomic data includes quality trimming and comparison to a reference database comprising whole genomes (e.g. Kraken [2] and Centrifuge [3]) or selected marker genes (e.g. MetaPhlAn [4] and mOTU [5]) to generate a taxonomy profile. Because shotgun metagenomic sequencing covers all genetic information in a sample, the data can be used for additional analyses, e.g. metagenomic assembly and binning, metabolic function profiling, and antibiotic resistance gene profiling.
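To make "comparison to a reference database comprising whole genomes" concrete, here is a minimal sketch that assigns each read to the reference genome sharing the most k-mers with it. It is loosely inspired by k-mer matching and is not the algorithm of any tool named above; the genome names, sequences, and k value are hypothetical.

```python
from collections import Counter

def kmers(seq, k=4):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def classify_read(read, genome_index, k=4):
    """Assign a read to the genome sharing the most k-mers, if any."""
    read_kmers = kmers(read, k)
    hits = {name: len(read_kmers & gk) for name, gk in genome_index.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else "unclassified"

# Toy reference genomes (made-up sequences, for illustration only).
genomes = {"genome_X": "AAAACCCCGGGG", "genome_Y": "TTTTGATCGATC"}
index = {name: kmers(seq) for name, seq in genomes.items()}

# Randomly fragmented "shotgun" reads; the last one has no close reference.
reads = ["AACCCC", "GATCGA", "ACACAC"]
profile = Counter(classify_read(r, index) for r in reads)
print(profile)
```

The third read shares no k-mer with either genome, so it stays unclassified. This is the same database dependence discussed later for organisms without a close reference genome.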
16S/ITS SEQUENCING VS. SHOTGUN METAGENOMIC SEQUENCING
If your study requires genomic analyses beyond taxonomy profiling, such as metabolic pathway analysis, you should consider shotgun metagenomic sequencing due to its greater genomic coverage and data output. If composition profiling is the main purpose of the study, both techniques have pros and cons to be considered (Table 1).
| | 16S/ITS Sequencing | Shotgun Metagenomic Sequencing | Shallow Shotgun Sequencing |
| | Low Risk | High Risk | High Risk |
| Host DNA Interference | | | |
| Minimum DNA Input | 10 copies of 16S | 1 ng | 1 ng |
| Recommended Sample Type | All | Human Microbiome | Human Feces |
Table 1: Overall, shotgun metagenomic sequencing has greater taxonomy resolution, functional profiling, and cross-domain coverage. In all other aspects, including price and sample origin compatibility, 16S/ITS sequencing has the advantage.
The taxonomy resolution of 16S/ITS sequencing depends on the variable regions targeted, the organism itself, and the sequence analysis algorithm. In recent years, error-correction methods such as DADA2 [6] have dramatically improved the accuracy and taxonomy resolution of this technique. With DADA2, species-level resolution for many organisms using regular 16S sequencing is now a reality. In theory, shotgun metagenomic sequencing can achieve strain-level resolution because it covers all genetic variation, although in practice the accuracy of strain-level resolution still faces technical challenges. Even so, shotgun metagenomic sequencing achieves higher resolution than 16S/ITS sequencing.
If metabolic function analysis is a goal, most researchers will quickly rule out 16S and ITS sequencing. There are tools that can infer metabolic function from taxonomy data, e.g. PICRUSt [7], but in general shotgun metagenomic sequencing is used when functional profiling is required because of its additional gene coverage.
MICROBIAL COVERAGE AND RECOMMENDED SAMPLE TYPE
Shotgun sequencing examines all metagenomic DNA, whereas 16S sequencing examines only 16S rRNA genes and also suffers from incomplete primer coverage. Consequently, the former has greater cross-domain coverage. Why, then, does Table 1 denote 16S/ITS sequencing as better in bacterial and fungal coverage? The answer lies in the species coverage of the available reference databases, because the taxonomy prediction of both approaches depends heavily on the reference database used. Currently, the coverage of 16S/ITS databases is much better than that of whole-genome databases. In addition, whole-genome coverage is uneven: the genomes of microbes associated with the human microbiome are much better studied than the genomes of microbes from other environments. This is why shotgun metagenomic sequencing is recommended for human-microbiome-related samples, such as feces and saliva, if taxonomy profiling is the main purpose.
Moreover, shotgun metagenomic sequencing depends more heavily on the reference database. If a bacterium has no closely related representative in the 16S reference database, you might still be able to identify it at a higher phylogenetic rank or as an unknown bacterium. With shotgun metagenomic sequencing, however, if a bacterium does not have a close relative (a genome from the same genus) in the reference genome database, you are likely to miss it completely. For example, the ZymoBIOMICS Spike-in Control I contains two microbes alien to the human microbiome (Imtechella halotolerans and Allobacillus halotolerans), whose genomes were previously not available. If you spike it into a fecal sample and analyze the sample with shotgun sequencing, most bioinformatics pipelines will miss both microbes unless you manually add their genomes to the reference database. With 16S sequencing, on the other hand, they will be identified because their 16S sequences are present in reference databases.
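The fallback to a higher phylogenetic rank can be sketched as a simple identity-threshold rule. The cut-off values, the toy 20-base sequences, and the function names below are assumptions chosen for illustration; they are not taken from any specific 16S pipeline.

```python
# Illustrative identity cut-offs per rank (assumed values, not from the text).
RANK_THRESHOLDS = [("species", 0.99), ("genus", 0.95), ("family", 0.90)]

def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def classify_16s(query, reference):
    """Report the deepest rank whose cut-off the best reference hit meets."""
    best_seq, best_lineage = max(reference, key=lambda e: identity(query, e[0]))
    score = identity(query, best_seq)
    for rank, cutoff in RANK_THRESHOLDS:
        if score >= cutoff:
            return rank, best_lineage[rank]
    return "unranked", "unknown bacterium"

# Toy reference entries: made-up sequences paired with real lineage names.
reference = [
    ("ACGTACGTACGTACGTACGT",
     {"species": "Escherichia coli", "genus": "Escherichia",
      "family": "Enterobacteriaceae"}),
    ("GGGGCCCCGGGGCCCCGGGG",
     {"species": "Bacillus subtilis", "genus": "Bacillus",
      "family": "Bacillaceae"}),
]

print(classify_16s("ACGTACGTACGTACGTACGT", reference))  # exact match
print(classify_16s("ACGTACGTACGTACGTACGA", reference))  # one mismatch
print(classify_16s("ACGTACGTACGTACGTACAA", reference))  # two mismatches
print(classify_16s("TTTTTTTTTTTTTTTTTTTT", reference))  # no close relative
```

As the query drifts further from its nearest reference, the call degrades gracefully from species to genus to family and finally to an unknown bacterium, rather than disappearing from the profile.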
Error-correction tools such as DADA2 improve not only the taxonomy resolution of 16S/ITS sequencing but also its accuracy. This is demonstrated by sequencing DNA from a mock microbial community (e.g. ZymoBIOMICS Microbial Community Standard): all 16S sequences are recovered with no error in the sequence, i.e. no false positives. With shotgun metagenomic sequencing, however, unless the reference database contains a perfect representative genome for a sequenced microbe, the bioinformatics analysis is likely to predict the presence of multiple "closely related" genomes. These closely related genomes can be from different species of the same genus or even from different genera. For example, assume there are three closely related microbes, A, B, and C, that share some sequences in common. Microbe A shares some sequences only with B and some other sequences only with C. If the reference database contains only the genomes of B and C, then when A is sequenced, the analysis will predict that both B and C are present. For instance, A and B could both be strains of Escherichia coli and C could be Salmonella enterica; the sequences shared only by A and C may stem from horizontal gene transfer, which is common between closely related microbes. Because of this, 16S/ITS sequencing is better with regard to false positives.
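The A, B, C scenario can be reproduced with a toy model in which each genome is a set of sequence segments and a genome is reported as present when at least one read matches it uniquely. The segment labels and the detection rule are hypothetical simplifications, not the behavior of a particular classifier.

```python
def detect(reads, database):
    """Report genomes supported by at least one uniquely matching read."""
    detected = set()
    for read in reads:
        matches = [name for name, segments in database.items() if read in segments]
        if len(matches) == 1:
            detected.add(matches[0])
    return detected

# Hypothetical genomes as sets of segment labels.
genome_a = {"core_1", "core_2", "shared_ab", "shared_ac", "private_a"}
genome_b = {"core_1", "core_2", "shared_ab", "private_b"}
genome_c = {"core_1", "core_2", "shared_ac", "private_c"}

# Only microbe A is actually in the sample.
reads_from_a = sorted(genome_a)

without_a = detect(reads_from_a, {"B": genome_b, "C": genome_c})
with_a = detect(reads_from_a, {"A": genome_a, "B": genome_b, "C": genome_c})
print(without_a, with_a)
```

When A is missing from the database, the reads it shares only with B and only with C become unique evidence for B and C, producing two false positives. Once A's genome is added, those reads are ambiguous and only A is reported.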
HOST DNA INTERFERENCE
Too much host DNA can cause non-specific amplification during library preparation for 16S and ITS sequencing, but the impact can be controlled by adjusting the number of PCR cycles and changing primers. Host DNA interference is a much more difficult problem for shotgun metagenomic sequencing, even though the cost of sequencing has decreased dramatically. Depending on the sample type, some samples can contain >99% human host DNA, which not only increases sequencing cost but also introduces uncertainty into the measurement. This is why many researchers look into host DNA depletion, e.g. with the HostZERO Microbial DNA Kit, before library preparation for shotgun sequencing. However, after host DNA depletion there may not be enough microbial genomic DNA left for shotgun sequencing, which typically requires a minimum input of 1 ng. Host DNA interference is why shallow shotgun sequencing is only recommended for human fecal samples.
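A quick back-of-the-envelope calculation shows why a 99% host fraction inflates sequencing cost. The sketch below assumes reads are drawn in proportion to DNA content, which is a simplification; the function names and the target of one million microbial reads are illustrative.

```python
def microbial_reads(total_reads, host_fraction):
    """Expected number of non-host reads for a given sequencing depth."""
    return total_reads * (1.0 - host_fraction)

def total_reads_needed(target_microbial_reads, host_fraction):
    """Total depth required to reach a target number of microbial reads."""
    if not 0.0 <= host_fraction < 1.0:
        raise ValueError("host_fraction must be in [0, 1)")
    return target_microbial_reads / (1.0 - host_fraction)

# Assumed target: one million microbial reads per sample.
for host in (0.0, 0.5, 0.9, 0.99):
    needed = total_reads_needed(1_000_000, host)
    print(f"host fraction {host:.2f}: about {needed:,.0f} total reads")
```

Under this assumption, a sample with 99% host DNA needs roughly 100 times the sequencing depth of a host-free sample to yield the same number of microbial reads.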
While shotgun metagenomic sequencing requires a minimum DNA input of 1 ng, 16S/ITS sequencing is much more sensitive, with minimum inputs in the femtogram range or even as low as 10 copies of the 16S rRNA gene.
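To put 1 ng and 10 copies of 16S on the same scale, the following sketch converts between DNA mass and genome copies. It assumes an average mass of about 650 g/mol per base pair and uses an E. coli-like genome (about 4.6 Mb with 7 copies of the 16S rRNA gene) as the example organism; other microbes differ in both genome size and 16S copy number.

```python
AVOGADRO = 6.022e23          # molecules per mole
GRAMS_PER_MOL_PER_BP = 650   # assumed average for double-stranded DNA

# Assumed example organism: an E. coli-like genome.
GENOME_SIZE_BP = 4.6e6
COPIES_16S_PER_GENOME = 7

def genome_mass_g(genome_size_bp):
    """Mass in grams of one double-stranded genome copy."""
    return genome_size_bp * GRAMS_PER_MOL_PER_BP / AVOGADRO

def genomes_in_mass(mass_g, genome_size_bp):
    """Number of genome copies contained in a given DNA mass."""
    return mass_g / genome_mass_g(genome_size_bp)

def mass_for_16s_copies(copies, genome_size_bp, copies_per_genome):
    """Genomic DNA mass in grams carrying the given number of 16S copies."""
    return copies / copies_per_genome * genome_mass_g(genome_size_bp)

one_genome_fg = genome_mass_g(GENOME_SIZE_BP) * 1e15
genomes_per_ng = genomes_in_mass(1e-9, GENOME_SIZE_BP)
ten_copies_fg = mass_for_16s_copies(10, GENOME_SIZE_BP, COPIES_16S_PER_GENOME) * 1e15
print(f"one genome: {one_genome_fg:.1f} fg")
print(f"genomes in 1 ng: {genomes_per_ng:,.0f}")
print(f"10 copies of 16S: {ten_copies_fg:.1f} fg")
```

Under these assumptions, one genome weighs about 5 fg, so 1 ng corresponds to roughly 200,000 genome copies, while 10 copies of 16S correspond to only a few femtograms of genomic DNA.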