I am doing a meta-analysis using several 16S data sets from our own work and from other studies.
They were certainly generated with different methods, for example different DNA extraction methods and different sequenced regions of the 16S gene.
In the past, I would use the closed-reference OTU strategy of QIIME1 to diminish batch effects as much as possible. But now I have moved all my analysis pipelines to QIIME2. I notice that DADA2 in QIIME2 is really different from the closed-reference OTU strategy in QIIME1: DADA2 distinguishes features down to single-base differences.
Question 1: Is it suitable to use the merged feature table and merged rep-seqs produced by DADA2 for the meta-analysis? Or should we use another QIIME2 pipeline (Deblur with a closed-reference OTU strategy), which might be more similar to QIIME1?
In fact, I have already obtained results using DADA2, and they show evident batch effects (unweighted UniFrac).
Question 2: I guess this is because the different sequenced regions produce a lot of redundant ASVs in the DADA2 pipeline, so the unweighted distances are very large. However, all the features (ASVs) will be assigned taxonomy in the following analysis. In that step, ASVs that are really different (because of the different sequenced regions) would still be assigned to the same taxa. So, if we analyze them at a higher taxonomic level (genus...), would the batch effects be diminished?
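The idea in Question 2 can be sketched with a toy example. This is only an illustration under stated assumptions: the ASV IDs, counts, and genus labels are invented, and a simple Jaccard (presence/absence) distance stands in for unweighted UniFrac, which also uses a tree.

```python
# Toy feature tables: ASV -> read count for one sample from each study.
# All IDs, counts, and genus labels are invented for illustration.
study_a = {"asv_v4_1": 120, "asv_v4_2": 30, "asv_v4_3": 50}
study_b = {"asv_v13_1": 90, "asv_v13_2": 60, "asv_v13_3": 10}

# Hypothetical taxonomy assignments (ASV -> genus).
genus_of = {
    "asv_v4_1": "Bacteroides",
    "asv_v4_2": "Prevotella",
    "asv_v4_3": "Faecalibacterium",
    "asv_v13_1": "Bacteroides",
    "asv_v13_2": "Prevotella",
    "asv_v13_3": "Akkermansia",
}


def collapse(table, mapping):
    """Sum the counts of all features that share the same label."""
    collapsed = {}
    for feature, count in table.items():
        label = mapping[feature]
        collapsed[label] = collapsed.get(label, 0) + count
    return collapsed


def jaccard_distance(x, y):
    """Presence/absence distance: 1 - shared features / all features."""
    a, b = set(x), set(y)
    return 1 - len(a & b) / len(a | b)


# ASVs from different regions never match, so the distance is maximal.
asv_level = jaccard_distance(study_a, study_b)

# After collapsing to genus, the two samples share some features.
genus_level = jaccard_distance(
    collapse(study_a, genus_of), collapse(study_b, genus_of)
)

print(asv_level, genus_level)
```

In this toy case the distance drops after collapsing, but that does not show the batch effect disappears in real data: taxonomic assignment itself can differ between regions.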
Question 3: In summary, I really want to know the best pipeline for meta-analysis. Can any experienced expert help me? Many thanks!
To combine regions, in general, you need a good reference and a consistent approach. Because ASVs are sequence-specific, if you’re combining the same region, you need to use the same parameters, and if you’re combining different regions, you still need a scaffold. Typically, people scaffold their ASVs using an insertion tree. So, if you used an alignment tree, you may have a larger signal. (See the fragment insertion paper for an example.)
I would probably just do closed reference clustering of my ASVs here - you get a scaffold and the tree and taxonomy are already calculated. However, I will say that even with OTUs, you’ll probably still have a big signal from the hypervariable region.
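As a rough illustration of why closed-reference clustering gives a shared scaffold, here is a toy sketch. It is not how QIIME 2 does it: exact substring matching stands in for an identity-threshold search (the q2-vsearch plugin handles that in practice), and all IDs and sequences are made up.

```python
# Toy full-length reference sequences (hypothetical IDs and sequences).
reference = {
    "ref_001": "ACGTTGCAAGGCTTACCGGATTACGGCTA",
    "ref_002": "TTGACCGTAAGGCATCGGATCCTAGGCAT",
}

# ASVs from two studies that sequenced different parts of the same gene.
asvs = {
    "studyA_asv1": "ACGTTGCAAGG",  # start of ref_001
    "studyB_asv1": "GATTACGGCTA",  # end of ref_001
    "studyB_asv2": "GGCATCGGATC",  # middle of ref_002
    "studyA_asv2": "CCCCCCCCCCC",  # matches nothing in the reference
}


def closed_reference(asvs, reference):
    """Map each ASV to the first reference that contains it exactly.

    ASVs without a match are discarded, as in closed-reference picking.
    """
    hits, unmatched = {}, []
    for asv_id, seq in asvs.items():
        match = next(
            (ref_id for ref_id, ref_seq in reference.items() if seq in ref_seq),
            None,
        )
        if match is None:
            unmatched.append(asv_id)
        else:
            hits[asv_id] = match
    return hits, unmatched


hits, unmatched = closed_reference(asvs, reference)
print(hits)
print(unmatched)
```

Here the two ASVs from different regions of ref_001 end up under the same reference ID, so the studies share features. The unmatched ASV is dropped, which is the usual cost of a closed-reference approach.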
Yes, it means processing the data the same way all the way through: same trimming parameters in cutadapt and same dada2 parameters if you’re matching the same region.
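One way to check this before merging is to compare the settings used for each batch. The sketch below is only an illustration: the study names, the values, and the key names ("region", "trunc_len", "trim_left") are hypothetical placeholders, not read from any real QIIME 2 output.

```python
# Hypothetical per-batch processing settings; names and values are
# illustrative placeholders only.
batches = {
    "study_A": {"region": "V4", "trunc_len": 150, "trim_left": 0},
    "study_B": {"region": "V4", "trunc_len": 120, "trim_left": 0},
    "study_C": {"region": "V1-V3", "trunc_len": 250, "trim_left": 10},
}


def inconsistent_parameters(batches):
    """Report parameters that differ between batches of the same region."""
    by_region = {}
    for name, params in batches.items():
        by_region.setdefault(params["region"], []).append(name)

    problems = {}
    for region, names in by_region.items():
        # Collect every parameter name used by any batch of this region.
        keys = set().union(*(batches[n] for n in names)) - {"region"}
        for key in sorted(keys):
            values = {n: batches[n].get(key) for n in names}
            if len(set(values.values())) > 1:
                problems[(region, key)] = values
    return problems


problems = inconsistent_parameters(batches)
print(problems)
```

In this made-up case the two V4 batches were truncated at different lengths, so their ASVs would not be directly comparable until they are reprocessed with matching settings.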
The parameter "--p-min-fold-parent-over-abundance" in DADA2 is the same for all batches: 4.
However, the cut length parameter differs among the data sets.
Oh! I have seen a similar question you answered about why no features were filtered, which is as expected. But I am still confused about Q1. Many thanks!
Q3. I found that the downstream taxonomic annotation would not use any results of q2-fragment-insert… Or is there a difference between gg-13-8-99-515-806-nb-classifier.qza and sepp-refs-gg-13-8.qza?
Let me try to answer your questions systematically. I just had my first cup of coffee, so my brain is still kinda kicking in and someone else may need to help me out. I'm going to try to work my way backwards and see where it gets me.
Have you checked the underlying distances and/or Adonis? It's possible that it's still the largest effect in the data, but explains a smaller portion of the variation. As I said above, the hypervariable region is still a complicated one that hasn't been solved.
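To make "explains a smaller portion of the variation" concrete, here is a minimal sketch of the R² that Adonis (PERMANOVA) reports for a single grouping factor, computed directly from a distance matrix. The two matrices are invented toy values, and the permutation test that gives the p-value is left out.

```python
def permanova_r2(dist, groups):
    """R^2 of a one-factor PERMANOVA: 1 - SS_within / SS_total.

    dist is a symmetric matrix (list of lists); groups[i] is the label
    of sample i.
    """
    n = len(groups)
    ss_total = sum(
        dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)
    ) / n

    ss_within = 0.0
    for label in set(groups):
        idx = [i for i, g in enumerate(groups) if g == label]
        ss_within += sum(
            dist[i][j] ** 2 for a, i in enumerate(idx) for j in idx[a + 1:]
        ) / len(idx)

    return 1 - ss_within / ss_total


def toy_matrix(within, between, groups):
    """Build a toy distance matrix with fixed within/between distances."""
    n = len(groups)
    return [
        [
            0.0 if i == j else (within if groups[i] == groups[j] else between)
            for j in range(n)
        ]
        for i in range(n)
    ]


study = ["A", "A", "B", "B"]

# Invented values: a strong study effect versus a weaker one.
r2_strong = permanova_r2(toy_matrix(0.2, 0.8, study), study)
r2_weak = permanova_r2(toy_matrix(0.5, 0.6, study), study)

print(round(r2_strong, 3), round(r2_weak, 3))
```

In both toy matrices the study is the only structure present, yet it accounts for a much smaller share of the variation in the second one. That is the kind of change the ordination plot alone may not show.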
There is a difference between sepp-refs-gg-13-8.qza and gg-13-8-99-515-806-nb-classifier.qza. The most salient one here is that you're not working with the 515-806 region. So, you want the full-length classifier; I think it's available on the resources page.