Using multiple 16S variable regions for analysis

I aim to compare the gut microbiota composition between 2 sample groups. The first group has the V1-V2 region of the 16S gene sequenced, while the second group has the V3-V4 region sequenced.
I am thinking of working on each group separately: denoise and classify against a region-specific classifier for each group.
However, I am not sure how to merge them afterward and get downstream results, such as beta diversity and core-microbiome composition, that compare the 2 groups together.
Thus, my question is: after classifying each group against its region-specific classifier, is it possible to combine the two taxonomy.qza files? If yes, what command is used? And can I then proceed with the rest of the analysis?

Or is it possible to combine the 2 groups from the beginning, then denoise and classify against a general classifier that spans V1 through V4, which I would generate specifically for V1-V4 as shown below:
qiime feature-classifier extract-reads \
  --i-sequences silva-138.1-ssu-nr99-seqs-derep-uniq.qza \
  --p-f-primer <V1 forward primer> \
  --p-r-primer <V4 reverse primer> \
  --p-n-jobs 2 \
  --p-read-orientation 'forward' \
  --o-reads silva-138.1-ssu-nr99-seqs-v1-v4.qza
Would any of those suggestions make sense or work :sweat_smile:?

Thank you!

Hi @Rakaya,

It doesn't look like there is any bp overlap between the V2 and V3 regions, so you wouldn't be able to merge these two regions via the typical denoising pipelines. You could compare the two resulting taxonomies to see which region provided better taxonomic resolution with the classifier you're using, but I'm not sure if that's what you're looking for in your analysis.

I know @jwdebelius has recommended OTU clustering for a similar analysis, but I can't seem to find the particular post where she discusses this. Maybe she can 'QIIME' in with some suggestions? :wink:


Thanks for the tag @lizgehret! Yeah, I've spent a lot of time thinking about this in the last year. Attempting to write a paper, TBH, but it's slower going than I expected.

So, @Rakaya, like Liz said, the issue in directly combining the two regions is that you're not going to have an ID overlap if you only use the ASVs. Remember from "ESVs should replace OTUs" that the ASV ID is that single nucleotide sequence. ASVs that differ by a single nucleotide are going to be identified as different, and so ASVs from regions with no overlap are just going to be different.
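To make the ID point concrete, here is a minimal sketch. It assumes the default hashed feature IDs from DADA2 in QIIME 2 (an MD5 digest of the ASV sequence) and a system with GNU `md5sum`; the two sequences and the `asv_id` helper are made up for illustration.

```shell
# Assumption: the feature ID is the MD5 hex digest of the ASV sequence
# (hashed feature IDs). asv_id is a hypothetical helper, not a QIIME 2 command.
asv_id() {
  # Hash the sequence in $1 without a trailing newline and keep only the digest.
  printf '%s' "$1" | md5sum | cut -d' ' -f1
}

# Two made-up sequences that differ only in the final base.
id_a=$(asv_id "TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGG")
id_b=$(asv_id "TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGA")

echo "$id_a"
echo "$id_b"
if [ "$id_a" != "$id_b" ]; then
  echo "one-base difference, unrelated feature IDs"
fi
```

Sequences from V1-V2 and V3-V4 never match exactly, so under this scheme no feature ID can be shared between the two tables.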

So, you need a way to provide a common ID for those features. There are 3 options that most people seem to use.

| | Closed Reference OTUs | ASVs with phylogenetic scaffolding | Genus-level taxa |
|---|---|---|---|
| Seminal reference | Lozupone et al, 2012 | Janssen et al, 2018 | Several, see Wang et al |
| What it does | ASVs are clustered against a common reference database that is shared across multiple regions; sequences that don't match the database are discarded | ASVs get inserted into a reference backbone that spans multiple regions. ASVs that don't fit in the tree are discarded (although these are often low-quality ASVs) | The taxonomic assignments for the ASVs are used to collapse to genus level or higher |
| Can it be used for UniFrac distance? | Yes (tree from OTU database) | Yes (insertion tree) | No |
| Can features be compared across regions without phylogeny? | Yes (reference ID) | No (ASV IDs are region specific) | Yes (taxonomic names should be common across regions) |
| Strengths | Feature-level resolution possible for everything; has been used frequently; computationally efficient | Lets you keep ASVs in high-quality placements; easy to combine with collapsed data | Anecdotally best at minimizing region-to-region differences |
| Limitations | Reads that don't match the database are discarded, so you need a good database; lower resolution than ASVs; can sometimes lead to big regional effects | Really only useful for phylogeny-based analyses; must be combined with something else | Loss of resolution may limit biologically meaningful conclusions |
| Key QIIME 2 plugins | q2-vsearch | q2-fragment-insertion | q2-taxa |
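As a rough sketch of how each of the three approaches maps onto a plugin call, something like the following could work. The wrapper function names, the `<prefix>-table.qza` style file names, the 0.97 identity threshold, and taxonomy level 6 as genus (which holds for a 7-rank taxonomy such as SILVA's) are all assumptions for illustration, so adjust them to your own artifacts.

```shell
# Hypothetical wrappers; every file name below is a placeholder pattern.

# Option 1: closed-reference OTU clustering with q2-vsearch.
# $1 = file prefix for one region, $2 = reference sequences artifact.
cluster_closed_ref() {
  qiime vsearch cluster-features-closed-reference \
    --i-table "$1-table.qza" \
    --i-sequences "$1-rep-seqs.qza" \
    --i-reference-sequences "$2" \
    --p-perc-identity 0.97 \
    --o-clustered-table "$1-cr-table.qza" \
    --o-clustered-sequences "$1-cr-seqs.qza" \
    --o-unmatched-sequences "$1-cr-unmatched.qza"
}

# Option 2: insert ASVs into a reference tree with q2-fragment-insertion,
# then drop features that could not be placed.
# $1 = file prefix (e.g. a merged table and rep-seqs), $2 = SEPP reference database.
insert_fragments() {
  qiime fragment-insertion sepp \
    --i-representative-sequences "$1-rep-seqs.qza" \
    --i-reference-database "$2" \
    --o-tree "$1-insertion-tree.qza" \
    --o-placements "$1-placements.qza"
  qiime fragment-insertion filter-features \
    --i-table "$1-table.qza" \
    --i-tree "$1-insertion-tree.qza" \
    --o-filtered-table "$1-inserted-table.qza" \
    --o-removed-table "$1-not-inserted-table.qza"
}

# Option 3: collapse to genus with q2-taxa.
# Assumption: level 6 corresponds to genus in the taxonomy being used.
collapse_to_genus() {
  qiime taxa collapse \
    --i-table "$1-table.qza" \
    --i-taxonomy "$1-taxonomy.qza" \
    --p-level 6 \
    --o-collapsed-table "$1-genus-table.qza"
}
```

With files named to match, a call might look like `cluster_closed_ref v1v2 silva-ref-seqs.qza` followed by `cluster_closed_ref v3v4 silva-ref-seqs.qza`, so that both regions end up with IDs from the same reference.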

So, I think the answer for your specific question in region combination is, as always, it depends on what you want. I think for your beta diversity/core microbiome work, your best bet is to either work on collapsed data, or to move to OTU clustering. With both, you may need to consider if there's a regional or study adjustment you need to make (database effects, reagent contamination, etc) and how to model that.
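For the collapsed-data route, a minimal sketch of combining the two groups and getting to beta diversity and core features might look like this. It assumes each region already has a genus-level table (for example from `qiime taxa collapse`) and a taxonomy artifact; the function names, file names, Bray-Curtis as the metric, and the rarefaction depth argument are illustrative assumptions, and any regional or study adjustment is not covered.

```shell
# Hypothetical wrappers; file names are placeholder patterns.

# Merge two per-region genus tables. The default overlap method errors out
# only if the same sample ID appears in both tables.
# $1, $2 = file prefixes for the two regions.
merge_genus_tables() {
  qiime feature-table merge \
    --i-tables "$1-genus-table.qza" \
    --i-tables "$2-genus-table.qza" \
    --o-merged-table merged-genus-table.qza
}

# Combine the two taxonomy.qza files into one artifact.
merge_taxonomies() {
  qiime feature-table merge-taxa \
    --i-data "$1-taxonomy.qza" \
    --i-data "$2-taxonomy.qza" \
    --o-merged-data merged-taxonomy.qza
}

# Rarefy to depth $1, then compute a non-phylogenetic distance and core features.
compare_groups() {
  qiime feature-table rarefy \
    --i-table merged-genus-table.qza \
    --p-sampling-depth "$1" \
    --o-rarefied-table merged-genus-rarefied.qza
  qiime diversity beta \
    --i-table merged-genus-rarefied.qza \
    --p-metric braycurtis \
    --o-distance-matrix merged-braycurtis.qza
  qiime feature-table core-features \
    --i-table merged-genus-rarefied.qza \
    --o-visualization merged-core-features.qzv
}
```

Running `merge_genus_tables v1v2 v3v4` and then `compare_groups 5000` (the depth is only an example) would give one distance matrix covering both groups. Merged taxonomy on ASV IDs is mostly useful for bookkeeping, since the ASV IDs themselves still don't overlap across regions.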



Thank you, I will try the suggested options. However, the link for the paper discussing genus-level taxa (Wang et al) is incomplete :sweat_smile:. Could you please fix it?

I've fixed the link!
