DADA2 multiple runs with different number of samples

Dear all,
I have several runs on Illumina's MiSeq, and I wonder whether it matters for DADA2 that some runs contain many samples and others only a few. The smallest number of samples in one run is 10.

Best wishes Solveig


Hello Solveig,

In the case of dada2, each sample is processed separately, so it makes no difference which MiSeq run the samples come from. In general, QIIME 2 will work with any number of samples as input data, but let us know if this causes any problems.

You probably already know this, but there might be minor batch effects between MiSeq runs. Include a category like MiSeqRunNumber in your metadata file so that you can check for batch effects.
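
As a minimal sketch (Python, with made-up sample IDs and run labels), the metadata file is a tab-separated table whose first column holds the sample IDs; MiSeqRunNumber is just one more column. The `sample-id` header is one of the ID headers QIIME 2 accepts; everything else here is a placeholder for your own data.

```python
import csv
import io

# Hypothetical sample-to-run assignments; replace with your own sample IDs.
samples = {
    "S01": "run1",
    "S02": "run1",
    "S03": "run2",
    "S04": "run3",
}

def metadata_tsv(sample_runs):
    """Return a sample-metadata TSV with a MiSeqRunNumber column."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample-id", "MiSeqRunNumber"])
    for sample_id, run in sorted(sample_runs.items()):
        writer.writerow([sample_id, run])
    return buf.getvalue()

print(metadata_tsv(samples), end="")
```

Using labels such as `run1` rather than bare numbers keeps the column categorical, since QIIME 2 infers a numeric type for columns that contain only numbers.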

I hope that helps,
Colin

Thanks for your answer. I am not very familiar with how the error model works, and that is the step I was wondering about: could it be affected negatively if the number of samples in one run is low? I have 4 runs: one with 10 samples and three with between 50 and 90 samples each. Not knowing much about the error model, I wondered whether it would be problematic to merge these four runs after the DADA2 step if the error learning in one run is based on as few as 10 samples while the other runs provide more samples to learn from. I might have gotten this all wrong, but I thought we needed to upload the runs separately, run them through DADA2 separately, and merge the feature tables after this step because of the error model?

Solveig


Hi Colin,
A short follow-up: the FMT tutorial says that "the DADA2 denoising process is only applicable to a single sequencing run at a time." Is that due to DADA2 itself, or is it related to how QIIME 2 operates?

Cheers,

Rune


@RuneGronseth and @stangedal,
You are both correct in your assertions that dada2 should be run on each sequencing run/lane separately (absolutely do not combine runs prior to running dada2! :grimacing: )

This is due to dada2 itself, not QIIME2. The error models assume that the data are from a single run, and differences in quality between runs upset this, potentially causing reads from a lower-quality run to be discarded.
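
In practice this means one `qiime dada2 denoise-paired` call per run, each producing its own table and representative sequences. A minimal Python sketch that only builds the command lines (you could pass each list to `subprocess.run(cmd, check=True)`); the run names, the `<run>-demux.qza` file naming, and the truncation lengths 240/200 are hypothetical placeholders.

```python
import shlex

# Hypothetical run names; each is assumed to have its own demultiplexed artifact.
runs = ["run1", "run2", "run3", "run4"]

def dada2_command(run, trunc_f, trunc_r):
    """Build a `qiime dada2 denoise-paired` call for ONE sequencing run."""
    return [
        "qiime", "dada2", "denoise-paired",
        "--i-demultiplexed-seqs", f"{run}-demux.qza",
        "--p-trunc-len-f", str(trunc_f),
        "--p-trunc-len-r", str(trunc_r),
        "--o-table", f"{run}-table.qza",
        "--o-representative-sequences", f"{run}-rep-seqs.qza",
        "--o-denoising-stats", f"{run}-stats.qza",
    ]

# One call per run, so each run's error model is learned only from that run.
for run in runs:
    print(shlex.join(dada2_command(run, 240, 200)))
```

Using the same trimming and truncation settings for every run keeps the resulting sequence variants comparable when the per-run tables are merged afterwards.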

No, you are not wrong. You are absolutely correct. :grin:

I believe that the number of samples does not matter at all — it is the number of reads that matters (but perhaps I am mistaken). I am not sure what guidelines there are on the minimum number of reads one should input to dada2, and perhaps @benjjneb can assist here.

I hope that helps!


That is correct.

The error model can be fit on as few as hundreds of reads, but it gets significantly better as read counts increase, at least into the tens of thousands, with more minor improvements beyond that.
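
Since what matters is the number of reads per run rather than the number of samples, one quick check before denoising is to total the reads in each run. A minimal sketch, assuming you already have per-sample read counts (for example from your demultiplexing summary); the counts and the 10,000-read cut-off are hypothetical, chosen only to mirror "tens of thousands" above, not an official limit.

```python
from collections import defaultdict

# Hypothetical per-sample read counts, keyed by (run, sample).
# run1 has one sample, run2 has three: sample count alone says little.
read_counts = {
    ("run1", "S01"): 100_000,
    ("run2", "S02"): 900,
    ("run2", "S03"): 700,
    ("run2", "S04"): 400,
}

# Illustrative cut-off only (hypothetical), not a DADA2 requirement.
MIN_READS_PER_RUN = 10_000

def reads_per_run(counts):
    """Sum read counts over all samples of each run."""
    totals = defaultdict(int)
    for (run, _sample), n in counts.items():
        totals[run] += n
    return dict(totals)

def low_read_runs(counts, threshold=MIN_READS_PER_RUN):
    """Return runs whose total reads fall below the threshold, sorted by name."""
    return sorted(run for run, total in reads_per_run(counts).items() if total < threshold)

print(reads_per_run(read_counts))   # {'run1': 100000, 'run2': 2000}
print(low_read_runs(read_counts))   # ['run2']
```

In q2-dada2, the number of reads used for error learning is set with the `--p-n-reads-learn` parameter.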


This was very helpful - thank you very much :tada:


Since QIIME 2 can so far only merge two artifacts at a time, I wrote a simple zsh script for merging the feature tables and representative sequences from multiple runs.


Thanks for sharing @gisle! The QIIME 2 2017.12 release is out now and it includes variadic inputs, so no more intermediate artifacts when merging more than two runs! :tada:
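
With variadic inputs, all per-run tables can go into a single `qiime feature-table merge` call, and all representative sequences into a single `qiime feature-table merge-seqs` call. A minimal Python sketch that only builds the two command lines; the run names and the `<run>-table.qza` / `<run>-rep-seqs.qza` file names are hypothetical, and this is not @gisle's script.

```python
import shlex

runs = ["run1", "run2", "run3", "run4"]  # hypothetical run names

def merge_commands(run_names):
    """Build one merge call per artifact type, passing every run at once."""
    tables = [f"{r}-table.qza" for r in run_names]
    seqs = [f"{r}-rep-seqs.qza" for r in run_names]
    return [
        ["qiime", "feature-table", "merge",
         "--i-tables", *tables,
         "--o-merged-table", "merged-table.qza"],
        ["qiime", "feature-table", "merge-seqs",
         "--i-data", *seqs,
         "--o-merged-data", "merged-rep-seqs.qza"],
    ]

for cmd in merge_commands(runs):
    print(shlex.join(cmd))
```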

