Meta-analysis of multiple 16S datasets with different regions: best practice?

Hello Charlotte,

Welcome to the forums!

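This remains a challenge, because the V3 ASVs will be completely distinct from the V4 ASVs: reads from different regions cover different stretches of the 16S gene, so an ASV from one region can never exactly match an ASV from another. There's no perfect way to merge all of these without significant tradeoffs, as you have already discovered!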
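To see why nothing lines up, here is a minimal Python sketch. It assumes the ASVs are labelled with an MD5 hash of their sequence (as far as I know, the default for DADA2 output in QIIME 2), and the two reads are made-up toy strings, not real V3 or V4 amplicons.

```python
import hashlib

# Hypothetical toy reads standing in for a V3 amplicon and a V4 amplicon
# from the same organism (real amplicons are much longer).
v3_read = "CCTACGGGAGGCAGCAGTGGGGAATATTG"
v4_read = "GTGCCAGCAGCCGCGGTAATACGTAGGTG"

def feature_id(seq: str) -> str:
    """Return the MD5 hex digest of a sequence, like a hashed ASV feature ID."""
    return hashlib.md5(seq.encode("ascii")).hexdigest()

v3_ids = {feature_id(v3_read)}
v4_ids = {feature_id(v4_read)}

# Different sequences give different IDs, so the two studies share no features.
shared = v3_ids & v4_ids
print(len(shared))  # 0
```

Identical sequences always hash to the same ID, which is why merging works across runs of the same region but finds no overlap across regions.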

I retrieved the raw sequencing data from all of these studies and ran the QIIME 2 pipeline on each study separately, then merged the tables and sequences after DADA2 using the qiime feature-table merge and qiime feature-table summarize commands.

Great! That sounds like the method used in the Merging Multiple Runs tutorial.
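If it helps to see what a merge like this does, here is a minimal sketch in plain Python. It uses a hypothetical nested-dict layout (feature ID to sample ID to count) rather than a real BIOM table, and the `merge_tables` helper is illustrative only, not QIIME 2's implementation; it assumes each sample appears in only one study and takes the union of features and samples.

```python
from typing import Dict

# A feature table as nested dicts: feature ID -> {sample ID -> count}.
Table = Dict[str, Dict[str, int]]

def merge_tables(*tables: Table) -> Table:
    """Union features and samples across tables; refuse duplicate sample IDs."""
    merged: Table = {}
    seen_samples = set()
    for table in tables:
        samples = {s for counts in table.values() for s in counts}
        overlap = samples & seen_samples
        if overlap:
            raise ValueError(f"sample IDs in more than one table: {sorted(overlap)}")
        seen_samples |= samples
        for feature, counts in table.items():
            # Samples are disjoint, so update() never overwrites a count.
            merged.setdefault(feature, {}).update(counts)
    return merged

# Hypothetical toy tables from a V3 study and a V4 study.
v3_table = {"asv_v3_a": {"S1": 10, "S2": 4}, "asv_v3_b": {"S1": 3}}
v4_table = {"asv_v4_a": {"S3": 7}, "asv_v4_b": {"S3": 2, "S4": 9}}

merged = merge_tables(v3_table, v4_table)
shared_features = v3_table.keys() & v4_table.keys()
print(len(merged), len(shared_features))  # 4 0
```

With V3 and V4 studies, the merged table is essentially block-shaped: every feature has counts only in the samples from its own region.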

Sure, you can use only the forward reads (R1) if the reverse-read (R2) quality is poor.

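Try a classifier built on the full-length SILVA database or on Greengenes2! A full-length reference covers every region in your data, so one classifier can serve all of the studies. Some regions will classify better than others, but that's okay.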


Zooming out a little, the problem is that the same underlying microbial community is being measured with different primers, each giving slightly different results.
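One way to picture this is a simple multiplicative bias model, which is an assumption for illustration: each primer set amplifies each taxon with its own efficiency, and the observed proportions are the true proportions reweighted by those efficiencies and renormalized. The taxa and efficiency numbers below are made up.

```python
# Hypothetical "true" community: relative abundances of three made-up taxa.
true_props = {"TaxonA": 0.5, "TaxonB": 0.3, "TaxonC": 0.2}

# Hypothetical amplification efficiencies per primer set (made-up numbers).
efficiencies = {
    "V3 primers": {"TaxonA": 1.0, "TaxonB": 0.5, "TaxonC": 1.0},
    "V4 primers": {"TaxonA": 1.0, "TaxonB": 1.0, "TaxonC": 0.25},
}

def observed_props(true, eff):
    """Reweight true proportions by per-taxon efficiency, then renormalize to sum to 1."""
    weighted = {taxon: p * eff[taxon] for taxon, p in true.items()}
    total = sum(weighted.values())
    return {taxon: w / total for taxon, w in weighted.items()}

for primers, eff in efficiencies.items():
    props = observed_props(true_props, eff)
    print(primers, {taxon: round(p, 3) for taxon, p in props.items()})
```

Both primer sets see the same community, yet the observed profiles disagree (TaxonC looks more abundant than it really is with one set and much rarer with the other), and merging tables does not remove that.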

Back in 2020, we discussed the problems with multi-region data sets.

Sure, we can merge in Python, R, or Excel. The math is easy; the biology is hard. 🦠
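For example, one common compromise is to give up ASV resolution and collapse each study's table to a shared taxonomic level (genus here) before merging, so features from different regions can line up by name. Below is a minimal Python sketch with hypothetical ASV IDs, counts, and genus labels; in QIIME 2, the taxa collapse action does a similar collapse on real artifacts. The tradeoff is that classification quality still differs between regions, so some taxa may be resolved unevenly across studies.

```python
from collections import defaultdict

def collapse(table, taxonomy):
    """Sum ASV counts sharing a taxonomy label; returns taxon -> {sample -> count}."""
    collapsed = defaultdict(lambda: defaultdict(int))
    for feature, counts in table.items():
        taxon = taxonomy.get(feature, "Unassigned")
        for sample, n in counts.items():
            collapsed[taxon][sample] += n
    return {taxon: dict(counts) for taxon, counts in collapsed.items()}

def merge(*tables):
    """Union taxa across tables; assumes each sample ID appears in only one table."""
    merged = defaultdict(dict)
    for table in tables:
        for taxon, counts in table.items():
            merged[taxon].update(counts)
    return dict(merged)

# Hypothetical toy data: ASV counts and genus labels from two studies.
v3_table = {"v3_asv1": {"S1": 10}, "v3_asv2": {"S1": 5}, "v3_asv3": {"S1": 2}}
v4_table = {"v4_asv1": {"S3": 8}, "v4_asv2": {"S3": 4}}
taxonomy = {
    "v3_asv1": "g__Bacteroides", "v3_asv2": "g__Bacteroides",
    "v3_asv3": "g__Prevotella",
    "v4_asv1": "g__Bacteroides", "v4_asv2": "g__Faecalibacterium",
}

merged = merge(collapse(v3_table, taxonomy), collapse(v4_table, taxonomy))
print(merged["g__Bacteroides"])  # {'S1': 15, 'S3': 8}
```

The ASV tables share no features, but after collapsing, g__Bacteroides has counts in samples from both studies.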
