Merged multiple runs, several different experiments, when to best split them up?

ben · June 4, 2018, 6:33pm

Hi all,

One point of clarification. I did a search, and several earlier questions cover merging the data, but my question is when best to split the data.

Going back: I have 7 different MiSeq runs containing probably 10 different experiments. When is the best time to split the data?

I merged after DADA2 denoise and created table/rep-seqs artifacts.
Created a tree for the merged data
Ran phylogenetic core metrics/taxonomy

Should I instead:

Merge after DADA2 denoise and create merged artifacts
Filter out my samples according to the experiment
On each separate experiment create a tree
Run phylogenetic core metrics/taxonomy on each experiment separately?
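To make the filtering step concrete, here is a rough sketch in plain Python rather than QIIME 2 itself. It assumes a merged feature table stored as a nested dict and a sample-to-experiment mapping taken from the metadata; the function names and toy data are hypothetical. In QIIME 2 the equivalent step would presumably be `qiime feature-table filter-samples`, with the representative sequences subset to match before building the tree.

```python
def filter_by_experiment(table, sample_to_experiment, experiment):
    """Keep only the samples that belong to one experiment.

    table: {sample_id: {sequence: count}} (hypothetical layout)
    sample_to_experiment: {sample_id: experiment_name}
    """
    kept = {}
    for sample_id, counts in table.items():
        if sample_to_experiment.get(sample_id) == experiment:
            # Drop zero counts so absent features do not linger.
            kept[sample_id] = {seq: n for seq, n in counts.items() if n > 0}
    return kept


def observed_features(table):
    """Return the set of sequences seen in at least one sample."""
    return {seq for counts in table.values() for seq in counts}


# Hypothetical toy data: three samples from two experiments.
merged_table = {
    "s1": {"ACGT": 10, "TTGA": 3},
    "s2": {"ACGT": 4},
    "s3": {"GGCC": 8, "TTGA": 1},
}
sample_to_experiment = {"s1": "exp_A", "s2": "exp_A", "s3": "exp_B"}

exp_a = filter_by_experiment(merged_table, sample_to_experiment, "exp_A")
exp_a_features = observed_features(exp_a)
```

Only the features in `exp_a_features` would then go into the alignment and tree for that experiment, so sequences seen only in other experiments never enter it.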

I think the latter process should be the correct one, but I am unsure. The de novo inference that DADA2 performs is based on each run separately; when merged, the sequence variants should match up with each other(?).

Thus, filtering the samples by experiment will allow the diversity metrics (such as UniFrac and Bray-Curtis) to be calculated appropriately, without influence from other samples that we are not interested in.

This forum has been responsive and helpful. Thank you again. Ben

colinbrislawn · June 5, 2018, 12:19am

Great question, Ben.

There are many different ways of doing this. I like your second option for a few reasons.

dada2 learns error profiles for each run, so each MiSeq run should be run through dada2 separately
dada2 SVs should have enough resolution to elegantly match when merging
the MSA performed for treebuilding will not be influenced by SVs from other experiments
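As a rough illustration of why SVs from separately denoised runs line up on merging, here is a minimal sketch in plain Python, not QIIME 2's own code. It assumes each run's table is a nested dict keyed by the exact sequence; the names and toy data are hypothetical. In QIIME 2 this step would presumably be `qiime feature-table merge`.

```python
def merge_tables(*tables):
    """Merge per-run tables of the form {sample_id: {sequence: count}}.

    Features are keyed by the exact sequence, so a variant found in two
    runs lands on the same key without any re-clustering.
    """
    merged = {}
    for table in tables:
        for sample_id, counts in table.items():
            if sample_id in merged:
                raise ValueError(f"duplicate sample ID across runs: {sample_id}")
            merged[sample_id] = dict(counts)
    return merged


def feature_ids(table):
    """Return the set of sequences seen in at least one sample."""
    return {seq for counts in table.values() for seq in counts}


# Hypothetical toy data: one sample per run, one variant in common.
run1 = {"r1_s1": {"ACGT": 10, "TTGA": 3}}
run2 = {"r2_s1": {"ACGT": 7, "GGCC": 2}}

merged = merge_tables(run1, run2)
shared = feature_ids(run1) & feature_ids(run2)
```

This exact matching only works if every run was trimmed and truncated with the same settings, so that the same variant yields an identical sequence in each run.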

Note that you have another option: you can split samples by experiment as the very first step, then process each run using dada2, then merge as you described. This method means that samples from different experiments will never influence each other during the denoising or treebuilding process. This total separation was absolutely necessary for older de novo OTU picking methods, but modern SV methods are more robust to this.

Edit: If the denoising / SV methods were perfect, both of your methods would produce identical results, and different experiments could be merged and split at any time. However, keeping them separate mitigates the limitations of the methods.

Colin
