Meta-analysis of multiple 16S datasets with different regions: best practice?

Hello,

I am conducting a meta-analysis of multiple 16S metabarcoding studies. These studies cover different regions (V3–V4 and V4 only) and use different primers.

I retrieved the raw sequencing data from all of these studies and conducted the QIIME pipeline separately until merging the tables and sequences after DADA2 using the qiime feature-table merge and qiime feature-table summarize commands. I have several questions about what I have done and the next steps:

  • Is it OK to merge datasets that comprise only forward reads (due to poor quality of the reverse reads) with paired-end datasets?

  • What should I do now for taxonomic classification? Should I extract the entire 16S gene sequences from the SILVA database (qiime feature-classifier extract-reads), train a classifier on them, and then assign taxonomy to the merged dataset? I saw that primer-specific classifiers are no longer recommended anyway: https://library.qiime2.org/data-resources#naive-bayes-classifiers

  • If it is not recommended to use QIIME for this, is it possible to merge the data in RStudio?

Thank you for your help.

Hey @charlottejk,

Welcome to the :qiime2: forum :waving_hand:

Just jumping in to kindly request that you don't double post any questions that you might have. Our forum moderators will be happy to jump in as they are able; thanks in advance for your understanding!

Hi,

My apologies. I wanted to modify the text of my post but a new one was created instead.

Hello Charlotte,

Welcome to the forums!

This remains a challenge, as the V3–V4 ASVs will be entirely distinct from the V4 ASVs. There's no perfect way to merge all of these without significant tradeoffs, as you have already discovered!

I retrieved the raw sequencing data from all of these studies and conducted the QIIME pipeline separately until merging the tables and sequences after DADA2 using the qiime feature-table merge and qiime feature-table summarize commands.

Great! That sounds like the method used in the Merging Multiple Runs tutorial.

Sure, you can use just R1 if the R2 quality is poor.

Try the full-length SILVA database or Greengenes2! Some regions will classify better than others, but that's okay.


Zooming out a little, the problem is that the single, underlying microbial community is being measured with various primers that all give slightly different results.

Back in 2020, we discussed the problems with multi-region data sets.

Sure, we can merge in Python or R or Excel. The math is easy, the biology is hard. :microbe:
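To illustrate the "easy math" part, here is a minimal pandas sketch. The tables, sample IDs, and ASV names are hypothetical toy data, not output from any real study, and the layout (samples as rows, ASVs as columns) is an assumption. It stacks two feature tables and fills missing features with zeros, which is roughly what a table merge does.

```python
import pandas as pd

# Hypothetical toy feature tables: rows are samples, columns are ASV IDs.
v34_table = pd.DataFrame(
    {"asv_v34_a": [10, 0], "asv_v34_b": [5, 7]},
    index=["study1_s1", "study1_s2"],
)
v4_table = pd.DataFrame(
    {"asv_v4_a": [3, 8], "asv_v4_b": [0, 4]},
    index=["study2_s1", "study2_s2"],
)

# Stack the samples; an ASV absent from a study gets a zero count there.
merged = pd.concat([v34_table, v4_table], axis=0).fillna(0).astype(int)

# ASV IDs observed in both studies.
shared = set(v34_table.columns) & set(v4_table.columns)

print(merged)
print("shared ASVs:", len(shared))
```

In this toy case the merge itself works fine, but no ASV is shared between the two studies, so any ASV-level comparison would separate samples by study rather than by biology. That is the "hard biology" part.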

Hi @colinbrislawn and @charlottejk,

I hope it's okay if I :qiime2: -in? My group did a scoping review related to this issue:

You may find Table 1 the most useful; it breaks down four techniques (closed-reference OTUs, de novo/open-reference OTUs, ASVs, and a technique called MimpLiPi) and the implications for analysis when combining regions. I'll note that the last is pretty new, and I don't know that it has been independently benchmarked like the other techniques.

Our paper didn't evaluate which is "best"; it just explored what was available. Based on the review, the options I think are best to consider would be either closed-reference OTU clustering, or ASVs that are then collapsed to a common taxonomic level. There are advantages and disadvantages to both of these approaches, depending on what you want and need from your data.
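As a rough illustration of the "collapse to a common taxonomic level" option, here is a small pandas sketch. The helper `collapse_to_genus`, the tables, and the genus assignments are all hypothetical; within QIIME 2 the equivalent step would be `qiime taxa collapse`. The sketch assumes samples as rows, ASVs as columns, and one genus label per ASV.

```python
import pandas as pd


def collapse_to_genus(table: pd.DataFrame, taxonomy: pd.Series) -> pd.DataFrame:
    """Sum the counts of all ASVs (columns) that share a genus label."""
    # Align labels to the table's ASVs; unlabeled ASVs go to "Unassigned".
    genus = taxonomy.reindex(table.columns).fillna("Unassigned")
    # Group ASV rows of the transposed table by genus, then transpose back.
    return table.T.groupby(genus).sum().T


# Hypothetical V3-V4 study: both ASVs were classified as the same genus.
v34 = pd.DataFrame({"asv1": [10, 2], "asv2": [5, 0]}, index=["a1", "a2"])
v34_tax = pd.Series({"asv1": "Bacteroides", "asv2": "Bacteroides"})

# Hypothetical V4 study: different ASV IDs, partly overlapping genera.
v4 = pd.DataFrame({"asv3": [4, 6], "asv4": [1, 9]}, index=["b1", "b2"])
v4_tax = pd.Series({"asv3": "Bacteroides", "asv4": "Prevotella"})

# Collapse each study separately, then merge on the shared genus labels.
genus_table = pd.concat(
    [collapse_to_genus(v34, v34_tax), collapse_to_genus(v4, v4_tax)]
).fillna(0).astype(int)

print(genus_table)
```

After collapsing, the two studies share genus-level features even though they share no ASVs. The tradeoff is the loss of sub-genus resolution, and a genus missing from one study may reflect how well that region classifies rather than true absence.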

Best,
Justine

Hi Colin, thank you so much for your very clear response.

Thank you very much Justine for this resource that helped a lot.
