Using multiple 16S variable regions for analysis

Thanks for the tag @lizgehret! Yeah, I've spent a lot of time thinking about this in the last year. Attempting to write a paper, TBH, but it's slower going than I expected.

So, @Rakaya, like Liz said, the issue in directly combining the two regions is that you're not going to have an ID overlap if you only use the ASVs. Remember from "ESVs should replace OTUs" that the ASV ID is tied to that exact nucleotide sequence. ASVs that differ by a single nucleotide are going to be identified as different features, and so ASVs from regions with no sequence overlap are always going to be different features too.
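To make the ID problem concrete, here is a minimal sketch. It assumes the feature IDs are MD5 hashes of the ASV sequence, which is a common default for hashed feature IDs but worth checking for your own table. The sequences are short made-up stand-ins, not real amplicons, and `asv_id` is a hypothetical helper.

```python
import hashlib


def asv_id(sequence: str) -> str:
    """Build a hashed feature ID from a sequence (assumption: MD5 of the sequence)."""
    return hashlib.md5(sequence.upper().encode("ascii")).hexdigest()


# Hypothetical toy sequences, far shorter than real amplicons.
v2_asv = "AGAGTTTGATCCTGGCTCAG"
v2_asv_one_snp = "AGAGTTTGATCCTGGCTCAA"  # differs from v2_asv at the last base only
v4_asv = "GTGCCAGCAGCCGCGGTAAT"          # a different region, so a different sequence

v2_ids = {asv_id(v2_asv), asv_id(v2_asv_one_snp)}
v4_ids = {asv_id(v4_asv)}

# One nucleotide of difference is enough to produce two distinct feature IDs.
print(len(v2_ids))

# No ID is shared between the two regions, so the tables cannot be joined on ASV ID.
print(v2_ids & v4_ids)
```

The same thing happens with raw sequences as IDs: the hash only makes the mismatch harder to see by eye.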

So, you need a way to provide a common ID for those features. There are 3 options that most people seem to use.

| | Closed Reference OTUs | ASVs with Phylogenetic Scaffolding | Genus-level taxa |
|---|---|---|---|
| Seminal Reference | Lozupone et al, 2012 | Janssen et al, 2018 | Several, see Wang et al |
| What it does | ASVs are clustered against a common reference database that is shared across multiple regions; sequences that don't match the database are discarded | ASVs get inserted into a reference backbone that spans multiple regions. ASVs that don't fit in the tree are discarded (although these are often low quality ASVs) | The taxonomic assignment for each ASV is used to collapse to genus level or higher |
| Can it be used for UniFrac distance? | Yes (tree from OTU database) | Yes (insertion tree) | No |
| Can features be compared across regions w/o phylogeny? | Yes (reference ID) | No (ASV IDs are region specific) | Yes (taxonomic names should be common across regions) |
| Strengths | Feature-level resolution possible for everything; has been used frequently; computationally efficient | Lets you keep ASVs in high quality placements; easy to combine with collapsed data | Anecdotally best at minimizing region-to-region differences |
| Limitations | Reads that don't match the database are discarded, so you need a good database; lower resolution than ASVs; can sometimes lead to big regional effects | Really only useful for phylogeny-based analyses, must be combined with something else | Loss of resolution may limit biologically meaningful conclusions |
| Key qiime2 plugins | q2-vsearch | q2-fragment-insertion | q2-taxa |
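As a rough illustration of why the genus-level option gives you a shared ID, here is a small plain-Python sketch. It is not how q2-taxa is implemented; `genus_label` and `collapse_to_genus` are hypothetical helpers, the tables and lineages are invented toy data, and the sketch assumes complete seven-rank lineage strings separated by semicolons.

```python
from collections import defaultdict


def genus_label(lineage: str, level: int = 6) -> str:
    """Keep the first `level` ranks of a semicolon-separated lineage (6 = genus here)."""
    ranks = [rank.strip() for rank in lineage.split(";")]
    return ";".join(ranks[:level])


def collapse_to_genus(table: dict, taxonomy: dict) -> dict:
    """Sum ASV counts per sample by genus label.

    table:    {sample_id: {asv_id: count}}
    taxonomy: {asv_id: lineage string}
    """
    collapsed = {}
    for sample, counts in table.items():
        genus_counts = defaultdict(int)
        for asv, count in counts.items():
            genus_counts[genus_label(taxonomy[asv])] += count
        collapsed[sample] = dict(genus_counts)
    return collapsed


# Invented toy data: ASV IDs never overlap between the two regions.
lacto = ("k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
         "f__Lactobacillaceae; g__Lactobacillus; ")
taxonomy = {
    "asv_v2_1": lacto + "s__",
    "asv_v2_2": lacto + "s__reuteri",
    "asv_v4_1": lacto + "s__",
}
v2_table = {"sampleA": {"asv_v2_1": 10, "asv_v2_2": 5}}
v4_table = {"sampleB": {"asv_v4_1": 7}}

v2_genus = collapse_to_genus(v2_table, taxonomy)
v4_genus = collapse_to_genus(v4_table, taxonomy)

# After collapsing, both regions share the same genus-level feature ID.
shared = set(v2_genus["sampleA"]) & set(v4_genus["sampleB"])
print(shared)
```

The trade-off from the table shows up here too: the two V2 ASVs are merged into one genus-level feature, so anything that distinguished them is gone.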

So, I think the answer to your specific question about combining regions is, as always, that it depends on what you want. I think for your beta diversity/core microbiome work, your best bet is to either work on collapsed data, or to move to OTU clustering. With either, you may need to consider whether there's a regional or study adjustment you need to make (database effects, reagent contamination, etc.) and how to model that.

Best,
Justine
