How to handle a small number of samples in one run with DADA2

I would like to ask how to handle samples sequenced in multiple runs when using DADA2. Some of my samples have been re-sequenced: one sample in one run and three samples in another, both separate from the runs that contain most of my samples.
According to the FMT study tutorial (Fecal microbiota transplant (FMT) study: an exercise — QIIME 2 2023.9.2 documentation), "the DADA2 denoising process is only applicable to a single sequencing run at a time, so we need to run this on a per sequencing run basis and then merge the results".
Could you tell me whether I should run DADA2 on each run separately, even though two of the runs contain only one and three samples?

I use QIIME 2 version 2023.9, Amplicon Distribution (installed through conda).

Thank you very much.

Hello!

The ideal solution would be to get access to the rest of the samples (or at least some of them) from the same runs as the re-sequenced samples. That is what I usually do: I run DADA2 on the whole run (or on a part of the run with enough samples), filter my samples into a separate table, filter the rep-seqs file based on that filtered table, and merge those files with the rest of the samples (running everything with the same parameters, of course).
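A rough sketch of that sequence of steps, in case it helps. It only builds and prints the QIIME 2 commands and does not run them. The file names, the helper name `rescue_commands`, and the truncation lengths (240/200) are placeholders I made up for illustration; use the same DADA2 parameters you used for your main runs.

```python
import shlex


def rescue_commands(run, keep_ids_tsv, main_table, main_seqs, trunc_f, trunc_r):
    """Build (but do not execute) QIIME 2 commands for one extra run.

    All file names are hypothetical placeholders; keep_ids_tsv is a
    metadata file listing only the samples to keep from this run.
    """
    table = f"{run}-table.qza"
    seqs = f"{run}-rep-seqs.qza"
    kept_table = f"{run}-kept-table.qza"
    kept_seqs = f"{run}-kept-rep-seqs.qza"
    return [
        # 1. Denoise the whole run so the error model sees enough reads.
        ["qiime", "dada2", "denoise-paired",
         "--i-demultiplexed-seqs", f"{run}-demux.qza",
         "--p-trunc-len-f", str(trunc_f),
         "--p-trunc-len-r", str(trunc_r),
         "--o-table", table,
         "--o-representative-sequences", seqs,
         "--o-denoising-stats", f"{run}-stats.qza"],
        # 2. Keep only my own samples from that run.
        ["qiime", "feature-table", "filter-samples",
         "--i-table", table,
         "--m-metadata-file", keep_ids_tsv,
         "--o-filtered-table", kept_table],
        # 3. Keep only the sequences still present in the filtered table.
        ["qiime", "feature-table", "filter-seqs",
         "--i-data", seqs,
         "--i-table", kept_table,
         "--o-filtered-data", kept_seqs],
        # 4. Merge the table and sequences with the main dataset.
        ["qiime", "feature-table", "merge",
         "--i-tables", main_table, "--i-tables", kept_table,
         "--o-merged-table", "merged-table.qza"],
        ["qiime", "feature-table", "merge-seqs",
         "--i-data", main_seqs, "--i-data", kept_seqs,
         "--o-merged-data", "merged-rep-seqs.qza"],
    ]


if __name__ == "__main__":
    # Placeholder inputs; 240/200 are not a recommendation.
    for cmd in rescue_commands("run2", "my-samples.tsv",
                               "main-table.qza", "main-rep-seqs.qza",
                               240, 200):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check that every run gets identical DADA2 parameters before anything is executed.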

Best,
Timur

Hello, @timanix
Thank you very much for your reply.

"I run DADA2 on the whole run (or on a part of the run with enough samples), filter my samples into a separate table, filter the rep-seqs file based on that filtered table, and merge those files with the rest of the samples (running everything with the same parameters, of course)."

Could you please tell me how many samples would be enough for the DADA2 analysis to work? If I have three sample sets with 90, 130 and 30 samples respectively, excluding the re-sequenced samples, what should I do?
Thank you.

Hello!
Do you mean that you have three different runs, with 90, 130 and 30 samples respectively?

I read that the DADA2 developers recommend having 1M reads for a proper error model. I would expect no issues with the first two runs. As for the third subset, I would denoise it separately if it has at least 500K reads (gut feeling!). If it has fewer, I would try denoising it both separately and combined with the 90-sample run, to see what happens.
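That rule of thumb can be written down as a small helper. This is only a sketch of the reasoning above: the 1M figure is the recommendation as I remember it, the 500K figure is my gut feeling, and the function name and the example totals are made up for illustration.

```python
RECOMMENDED_READS = 1_000_000  # recommendation as recalled above
GUT_FEELING_MIN = 500_000      # gut feeling, not an official threshold


def denoising_plan(total_reads):
    """Suggest how to denoise one run, given its total read count."""
    if total_reads >= RECOMMENDED_READS:
        return "denoise separately"
    if total_reads >= GUT_FEELING_MIN:
        return "denoise separately (below the recommendation)"
    return "try both: separately and combined with a larger run"


if __name__ == "__main__":
    # Hypothetical read totals, not taken from any real run.
    for total in (2_400_000, 650_000, 120_000):
        print(total, "->", denoising_plan(total))
```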

If you also have 2-3 samples that were sequenced outside of those three runs, and you don't have access to the rest of the samples from those runs, then I think I would merge them with other samples before DADA2. But it is beyond my expertise to judge whether it is better to merge 3 samples with a bigger dataset from another run or to denoise them separately.

There is also another alternative: you can join the reads with the vsearch plugin and then run Deblur. It uses a different algorithm, so there should be no biases caused by different runs, and all samples can be pooled together.

Best,

Hello, @timanix
Thank you for your response! Your answer to my question is very helpful.
I will try the suggestions you provided.
You said that 1M reads are required for DADA2 analysis. Does this mean that 50 samples with paired-end reads (10,000 reads each) would be sufficient? In that case, each pair of reads comes from the same sample, so the number of reads from the unique samples would be 5,000 reads. Or do I need 100 samples with paired-end reads (10,000 reads each)?
Thank you very much.

I would say 100 samples, since the forward and reverse reads of a pair come from the same sequence.
But I would still run DADA2 with 500K sequences if the alternative is denoising these samples together with samples from another run.
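The arithmetic, as a quick sketch. It assumes that "10,000 reads each" means 10,000 read pairs per sample and that a forward/reverse pair counts as one read; the function names are made up for illustration.

```python
def total_read_pairs(n_samples, pairs_per_sample):
    """Total read pairs in a run, counting each forward/reverse pair once."""
    return n_samples * pairs_per_sample


def samples_needed(target_pairs, pairs_per_sample):
    """Smallest number of samples that reaches the target (integer ceiling)."""
    return -(-target_pairs // pairs_per_sample)


if __name__ == "__main__":
    print(total_read_pairs(50, 10_000))       # 500000
    print(total_read_pairs(100, 10_000))      # 1000000
    print(samples_needed(1_000_000, 10_000))  # 100
```

So under these assumptions 50 samples give 500K sequences and 100 samples give 1M.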
