One point of clarification: I did a search, and several questions have centered on merging the data, but my question is when it is best to split the data.
Going back: I have 7 different MiSeq runs, which together contain probably 10 different experiments. When is the best time to split the data? So far I have done the following:
- Merged after DADA denoise and created table/rep-seqs artifacts
- Created a tree for the merged data
- Ran phylogenetic core metrics/taxonomy
Should I instead:
- Merge after DADA denoise and create merged artifacts
- Filter the merged data so that only the samples from one experiment are kept
- Create a tree for each experiment separately
- Run phylogenetic core metrics/taxonomy on each experiment separately?
I think the latter process should be the correct one, but I am unsure. The de novo clustering that DADA performs is based upon each run; when the runs are merged, the sequence variants should cluster together(?).
Thus, filtering the samples by experiment should allow the diversity metrics (such as UniFrac and Bray-Curtis) to be computed appropriately, without influence from samples that are not part of the experiment of interest.
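To make sure I am describing the second workflow clearly, here is a toy Python sketch of what I mean by "merge, then filter by experiment". It is not QIIME 2 code: the sample IDs, sequences, counts, and the experiment mapping are all made up, and it assumes that an identical sequence variant found in two runs ends up as a single feature after merging, which is the part I am unsure about.

```python
# Toy illustration only; all names and values below are hypothetical.

# One feature table per MiSeq run: {sample_id: {sequence_variant: count}}
run1 = {"s1": {"ACGT": 10, "TTGA": 5}, "s2": {"ACGT": 3}}
run2 = {"s3": {"ACGT": 7, "TTGA": 2}, "s4": {"GGCC": 8, "ACGT": 1}}

# Hypothetical metadata: experiments span both runs
experiment_of = {"s1": "expA", "s3": "expA", "s2": "expB", "s4": "expB"}


def merge_tables(*tables):
    """Combine per-run tables into one table.

    Features are keyed by their sequence, so an identical sequence
    seen in two runs is treated as the same feature (my assumption).
    """
    merged = {}
    for table in tables:
        for sample, counts in table.items():
            if sample in merged:
                raise ValueError(f"duplicate sample ID: {sample}")
            merged[sample] = dict(counts)
    return merged


def filter_by_experiment(table, experiment_of, experiment):
    """Keep only the samples that belong to one experiment."""
    return {
        sample: counts
        for sample, counts in table.items()
        if experiment_of.get(sample) == experiment
    }


def observed_features(table):
    """Features with a nonzero count in at least one remaining sample."""
    return {
        feature
        for counts in table.values()
        for feature, count in counts.items()
        if count > 0
    }


merged = merge_tables(run1, run2)
exp_a = filter_by_experiment(merged, experiment_of, "expA")

print(sorted(merged))                     # samples in the merged table
print(sorted(observed_features(merged)))  # features across all runs
print(sorted(exp_a))                      # samples kept for expA
print(sorted(observed_features(exp_a)))   # features left for the expA tree
```

In this picture, the tree and the core metrics for an experiment would be built only from the samples and features that remain after the filtering step.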
This forum has been responsive and helpful. Thank you again. Ben