I'm working with 16S data from several studies for a meta-analysis. My workflow was: import, cutadapt, denoise with dada2 (studies with the same primer pair got the same trim/trunc lengths, and I used --p-max-ee-f and --p-max-ee-r for quality filtering), phylogenetic tree generation with fasttree, then alpha and beta diversity analysis with the "core-metrics-phylogenetic" function.
I don't think anything went wrong in the process, but what I get from bray-curtis-emperor.qzv looks like this:
Whenever you see weird patterns like this, it is most definitely coming from a lack of overlap between studies. You can confirm this yourself by drawing a heatmap of all of your abundances (grouped by study). UniFrac may help smooth out these sorts of issues.
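To make the "lack of overlap" point concrete, here is a minimal sketch in plain Python, assuming toy samples stored as {feature_id: count} dicts. The sample names and counts are made up for illustration and the function is not QIIME 2 code. It shows that Bray-Curtis is pinned at its maximum whenever two samples share no feature IDs, which is what pushes the studies apart in the ordination.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two samples given as {feature_id: count} dicts."""
    features = set(u) | set(v)
    # Abundance shared by both samples, summed over every feature.
    shared = sum(min(u.get(f, 0), v.get(f, 0)) for f in features)
    total = sum(u.values()) + sum(v.values())
    return 1.0 - 2.0 * shared / total


# Hypothetical samples: two from study A, one from study B.
# Study B was denoised from a different region, so its ASV IDs never match study A's.
study_a_s1 = {"asv_a1": 40, "asv_a2": 60}
study_a_s2 = {"asv_a1": 70, "asv_a2": 30}
study_b_s1 = {"asv_b1": 50, "asv_b2": 50}

within = bray_curtis(study_a_s1, study_a_s2)   # same study, shared features
between = bray_curtis(study_a_s1, study_b_s1)  # different studies, no shared features

print(f"within study A:    {within:.2f}")
print(f"between A and B:   {between:.2f}")
```

With no shared features, every between-study distance is the same maximal value regardless of the actual biology, so the studies end up as separate blobs.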
That's what I thought too, but isn't it possible that I made a mistake or processed something incorrectly, and that is what produced this weird pattern? Or is a lack of overlap so likely that I don't need to consider a processing mistake?
I don't think it matters if this is caused by incompatible input data or a mistake in processing. Either way, you have got to get to a shared set of features if you want to compare these studies.
Check that all the projects have sequenced the same region, preferably with the same primers. Comparing V3 against V4 would make a plot like this. Then make sure the same region is still covered after all the trimming, truncating, and denoising.
There is a lot of detective work to be done here. Let us know what you find!
My data comprise the V3-V4, V1-V2, and V4-V5 regions, and there are 2 studies that use the same primer set but still do not overlap. Is this because their data simply fall outside the point where they would overlap?
And is this normal in a meta-analysis, and one of its known difficulties?
So there is no choice but to use OTU clustering rather than ASVs from dada2, then. But can I use dada2 outputs for closed-reference OTU clustering? I'm a bit confused, as there are several papers and discussions on this forum comparing OTU-based and ASV-based methods, so I thought they were quite different and incompatible with each other.
The OTU clustering workflow I had in mind is:
demux -> cutadapt -> merge -> quality control -> dereplicate -> OTU clustering.
But would this also be fine:
demux -> cutadapt -> dada2 -> OTU clustering?
In this use case, we are using closed-reference clustering to match your various amplicon regions against a common database. This means the output features will be 100% biased by / consistent with the features from the database, because they are just the same features from the database. Because this bias is so large, the order of the preprocessing steps matters little. Results should be similar with or without denoising with dada2.
You can still use DADA2 and it will make valid and accurate ASVs, but they will not overlap, as you have observed. Normalizing to a database with closed-ref clustering will guarantee that the features that match the database map to a shared set of IDs across studies.
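As a rough illustration of why this works, here is a small Python sketch of the relabeling step only. It assumes the ASV-to-reference matches are already known: the asv_to_ref mapping, the IDs, and the counts are all hypothetical, and in a real run the matching is done by sequence search against the reference database, not by a hand-written dict.

```python
from collections import defaultdict


def collapse_to_reference(table, asv_to_ref):
    """Sum ASV counts per reference ID; ASVs with no reference hit are dropped."""
    collapsed = defaultdict(int)
    for asv_id, count in table.items():
        ref_id = asv_to_ref.get(asv_id)
        if ref_id is not None:
            collapsed[ref_id] += count
    return dict(collapsed)


# Hypothetical samples from two studies that sequenced different regions.
v1v2_sample = {"asv_v1v2_x": 30, "asv_v1v2_y": 70}
v4v5_sample = {"asv_v4v5_p": 55, "asv_v4v5_q": 45, "asv_v4v5_r": 5}

# Assumed matches to full-length reference sequences (illustrative only).
# asv_v4v5_r is deliberately left without a hit.
asv_to_ref = {
    "asv_v1v2_x": "ref_001",
    "asv_v1v2_y": "ref_002",
    "asv_v4v5_p": "ref_001",
    "asv_v4v5_q": "ref_002",
}

shared_before = set(v1v2_sample) & set(v4v5_sample)

collapsed_v1v2 = collapse_to_reference(v1v2_sample, asv_to_ref)
collapsed_v4v5 = collapse_to_reference(v4v5_sample, asv_to_ref)
shared_after = set(collapsed_v1v2) & set(collapsed_v4v5)

print("shared feature IDs before:", sorted(shared_before))
print("shared feature IDs after: ", sorted(shared_after))
```

The ASV IDs from different regions never coincide, but once both tables are expressed in reference IDs they share a feature space. The cost is that anything without a match in the database is lost.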