I am conducting a meta-analysis of multiple 16S metabarcoding studies. These studies cover different regions (V3–V4 and V4 only) and use different primers.
I retrieved the raw sequencing data from all of these studies and ran the QIIME pipeline on each study separately, then merged the tables and sequences after DADA2 using the qiime feature-table merge and qiime feature-table summarize commands. I have several questions about what I have done and the next steps:
Is it OK to merge datasets that comprise only forward reads (due to poor quality of the reverse reads) with paired-end datasets?
What should I do now for taxonomic classification? Should I extract the entire 16S gene sequences from the SILVA database (qiime feature-classifier extract-reads), train a classifier on them, and then assign taxonomy to the merged dataset? I saw that primer-specific classifiers are not recommended anymore anyway: https://library.qiime2.org/data-resources#naive-bayes-classifiers
If it is not recommended to use qiime, is it possible to merge the data in RStudio?
Just jumping in to kindly request that you don't double post any questions that you might have. Our forum moderators will be happy to jump in as they are able; thanks in advance for your understanding!
This remains a challenge, as the V3 ASVs will be totally unique from the V4 ASVs. There's not a perfect way to merge all of these without significant tradeoffs, as you have already discovered!
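To make that tradeoff concrete, here is a minimal Python sketch using toy data. The sample names, ASV IDs, and counts are all hypothetical, and the merge function is a plain-Python stand-in rather than anything from QIIME 2. It only illustrates why merging on ASV ID leaves samples from different regions with no features in common.

```python
# Toy feature tables: {sample_id: {asv_id: count}}.
# All sample names, ASV IDs, and counts are hypothetical.
v34_table = {
    "studyA_s1": {"asv_v34_001": 120, "asv_v34_002": 30},
    "studyA_s2": {"asv_v34_001": 80, "asv_v34_003": 45},
}
v4_table = {
    "studyB_s1": {"asv_v4_001": 200, "asv_v4_002": 10},
}


def merge_tables(*tables):
    """Take the union of samples; assumes sample IDs are unique across studies."""
    merged = {}
    for table in tables:
        for sample, counts in table.items():
            if sample in merged:
                raise ValueError(f"duplicate sample ID: {sample}")
            merged[sample] = dict(counts)
    return merged


def shared_features(table, sample_a, sample_b):
    """Return the feature IDs observed in both samples."""
    return set(table[sample_a]) & set(table[sample_b])


merged = merge_tables(v34_table, v4_table)

# Samples from the same study (same region) can share ASVs.
print(shared_features(merged, "studyA_s1", "studyA_s2"))
# Samples from different regions share none, because the ASV IDs never match.
print(shared_features(merged, "studyA_s1", "studyB_s1"))
```

With this toy input the first call finds one shared ASV and the second finds none, so any ASV-level beta diversity between the two studies would be driven by region rather than biology.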
Sure, you can use just R1 if the R2 quality is poor.
Try the full-length Silva database or Greengenes2! Some regions will classify better than others, but that's okay.
Zooming out a little, the problem is that the single, underlying microbial community is being measured with various primers that all give slightly different results.
I hope it's okay if I jump in? My group did a scoping review related to this issue:
You may find Table 1 the most useful; it breaks down four techniques (closed ref OTUs, de novo/open ref OTUs, ASVs, and a technique called MimpLiPi) and the implications for analysis when combining regions. I'll note the last is pretty new and I don't know that it's been independently benchmarked like the other techniques.
Our paper didn't evaluate which is "best"; it just explored what was available. Based on the review, the options I think are best to consider would be either closed-reference OTU clustering, or ASVs that are then collapsed to a common taxonomic level. There are advantages and disadvantages to both of these approaches, depending on what you want and need from your data.
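As a rough illustration of the second option (ASVs collapsed to a common taxonomic level), here is a small Python sketch. Everything in it is an assumption for demonstration: the ASV IDs, counts, and SILVA-style lineage strings are made up, and the helper functions are hypothetical. In QIIME 2 itself the comparable step would be qiime taxa collapse run on the merged table; this is just the idea in plain Python.

```python
from collections import defaultdict

GENUS_LEVEL = 6  # 1-based rank depth: domain=1, phylum=2, ..., genus=6

# Hypothetical taxonomy assignments (ASV ID -> SILVA-style lineage).
taxonomy = {
    "asv_v34_001": "d__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Lactobacillaceae; g__Lactobacillus",
    "asv_v34_002": "d__Bacteria; p__Bacteroidota; c__Bacteroidia; o__Bacteroidales; f__Bacteroidaceae; g__Bacteroides",
    "asv_v4_001": "d__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Lactobacillaceae; g__Lactobacillus",
    # This ASV was only classified down to class level.
    "asv_v4_002": "d__Bacteria; p__Bacteroidota; c__Bacteroidia",
}

# Hypothetical merged ASV table: {sample_id: {asv_id: count}}.
merged = {
    "studyA_s1": {"asv_v34_001": 120, "asv_v34_002": 30},
    "studyB_s1": {"asv_v4_001": 200, "asv_v4_002": 10},
}


def truncate_lineage(lineage, level):
    """Cut a lineage to `level` ranks, padding unresolved ranks with '__'."""
    ranks = [rank.strip() for rank in lineage.split(";")]
    ranks = ranks[:level] + ["__"] * (level - len(ranks))
    return ";".join(ranks)


def collapse(table, taxonomy, level=GENUS_LEVEL):
    """Sum ASV counts that share the same lineage at the given level."""
    collapsed = {}
    for sample, counts in table.items():
        per_taxon = defaultdict(int)
        for asv, count in counts.items():
            per_taxon[truncate_lineage(taxonomy[asv], level)] += count
        collapsed[sample] = dict(per_taxon)
    return collapsed


genus_table = collapse(merged, taxonomy)

# After collapsing, samples from different regions can share features.
shared = set(genus_table["studyA_s1"]) & set(genus_table["studyB_s1"])
print(shared)
```

In this toy case the two samples share nothing at the ASV level but share the Lactobacillus genus after collapsing. The cost is visible too: the ASV that only classified to class level ends up in a padded, unresolved bin, which is one of the disadvantages to weigh.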