What is a good number of reads (especially after DADA2)?

Dear QIIME community,

I tried to look through the forum topics, especially the DADA2 ones, to find an answer to this question. I learned a lot, but found nothing that directly answers my question. Sorry if this has already been discussed earlier!

I’ve done DADA2 denoising with my Illumina 2x250 V3-V4 data.

Here are the parameters I used:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-paired-end.qza \
  --p-trim-left-f 13 \
  --p-trim-left-r 13 \
  --p-trunc-len-f 250 \
  --p-trunc-len-r 250 \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza \
  --o-denoising-stats stats-dada2.qza

And here is the quality plot I used to choose the DADA2 parameters:

From reading about the topic, I understood that DADA2 needs a minimum overlap of 20 bp for merging to succeed, and that the V3-V4 region (usually around 460 bp) might be too long for proper merging because of its natural length variation. This can therefore cause loss of data.
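As a rough check of the overlap argument above, the expected overlap can be estimated from the truncation lengths and the amplicon length. This is only a sketch: it assumes the reads start at the two ends of the amplicon, it uses the approximate 460 bp figure quoted above (real amplicons vary), and the variable names are purely illustrative. Trimming from the 5' end (trim-left) shortens the merged sequence but should not change the overlap, because the overlap sits at the 3' ends of the reads.

```shell
#!/bin/sh
# Rough estimate of the read overlap available for paired-end merging.
# Assumption: reads start at the ends of the amplicon, and amplicon_len is
# the approximate V3-V4 length quoted above (real amplicons vary in length).
trunc_len_f=250
trunc_len_r=250
amplicon_len=460
min_overlap=20    # the minimum overlap mentioned above

overlap=$(( trunc_len_f + trunc_len_r - amplicon_len ))
echo "expected overlap: ${overlap} bp"

if [ "$overlap" -ge "$min_overlap" ]; then
    echo "enough overlap for an amplicon of ${amplicon_len} bp"
else
    echo "too little overlap for an amplicon of ${amplicon_len} bp"
fi
```

With these numbers the estimate is 40 bp, so under the same assumptions any amplicon longer than about 480 bp would fall below the 20 bp minimum and fail to merge.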

I also read that if merging is the cause of losing the data, only the good-quality forward reads can be used instead of the paired-end reads. Depending on the dataset and the question of interest, of course, the forward reads alone will also give good resolution, right?

Back to the data I have:
Here are the stats of the data after DADA2:

sample-id  input  filtered  denoised  merged  non-chimeric  reads left (%)
AS1        15113     11337     11113   10667         10604              70
AS2        16581     13444     13185   12765         12154              73
AS3        11769      8650      8339    7932          7885              67
AS4        13599     10771     10553   10116          9858              72
AS5         9413      7313      7066    6619          6403              68
AS6        10875      8994      8887    8533          8183              75
AS7        16183     11285     10899   10267         10207              63
AS8         9518      7783      7550    7205          7114              75
AS9        12928      9511      9137    8661          8212              64
AS10        9999      8367      8151    7847          7698              77
AS11       16265     12356     12044   11467         11243              69
AS12       13900     11515     11276   10814         10352              74
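
For anyone who wants to recompute the last column, or to find the sample with the fewest reads left, here is a small sketch using awk. The rows are copied from the table above (without the percentage column); the variable names are just illustrative.

```shell
#!/bin/sh
# Recompute the percentage of reads retained per sample and find the lowest
# final read count. Columns, copied from the table above:
# sample-id, input, filtered, denoised, merged, non-chimeric.
stats='AS1 15113 11337 11113 10667 10604
AS2 16581 13444 13185 12765 12154
AS3 11769 8650 8339 7932 7885
AS4 13599 10771 10553 10116 9858
AS5 9413 7313 7066 6619 6403
AS6 10875 8994 8887 8533 8183
AS7 16183 11285 10899 10267 10207
AS8 9518 7783 7550 7205 7114
AS9 12928 9511 9137 8661 8212
AS10 9999 8367 8151 7847 7698
AS11 16265 12356 12044 11467 11243
AS12 13900 11515 11276 10814 10352'

summary=$(printf '%s\n' "$stats" | awk '
    {
        # Percentage of input reads that survive as non-chimeric reads.
        pct = 100 * $6 / $2
        printf "%s %d %.0f%%\n", $1, $6, pct
        # Track the sample with the fewest non-chimeric reads.
        if (NR == 1 || $6 < min) { min = $6; min_id = $1 }
    }
    END { printf "lowest %s %d\n", min_id, min }
')
echo "$summary"
```

The last line of the output names the sample with the fewest non-chimeric reads, which is the number that limits the sampling depth if all samples are to be kept.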

I’m wondering whether any of you have a “hunch” about what a good number of reads left after DADA2 is. Should it be above 10 000, or can it be less? Do I have too few reads right from the start, since some samples begin with fewer than 10 000 reads? Or do I have to go through the steps and look at the sampling depth to get that “hunch”? That is quite fast to do, of course, but I was wondering whether there is any threshold for the number of reads at the beginning of the analysis (is my starting number of reads enough?) and after DADA2 (is the number of reads left after DADA2 enough?).

Sorry for the long post and the rather naive question! I’m still trying to figure out the basics in this field!

Wishing you all the best!

Hi @veeraku,
Good question, and thanks for doing your research on the forum before posting. This is a complicated question with no real right answer. I wrote up my thoughts on the matter some time ago here, which may be of interest to you. The short version is that it depends on your samples and the questions you are trying to answer. For most typical samples we see around here (gut, intestine, skin, etc.) your sampling depth should be sufficient.
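
One way to check the sampling depth for a particular dataset is to look at the per-sample counts and at rarefaction curves. The sketch below assumes table.qza is the feature table from the denoise-paired command above; the output names and the choice of maximum depth (the lowest non-chimeric count in the stats table, sample AS5) are only illustrative.

```shell
#!/bin/sh
# Sketch: inspect per-sample read counts and alpha rarefaction curves.
# Assumptions: table.qza is the DADA2 feature table from the command above;
# the .qzv output names and max_depth are illustrative choices.
max_depth=6403   # lowest non-chimeric count in the stats table (sample AS5)

check_depth() {
    qiime feature-table summarize \
        --i-table table.qza \
        --o-visualization table.qzv &&
    qiime diversity alpha-rarefaction \
        --i-table table.qza \
        --p-max-depth "$max_depth" \
        --o-visualization alpha-rarefaction.qzv
}

# Only run when QIIME 2 and the input artifact are actually available.
if command -v qiime >/dev/null 2>&1 && [ -f table.qza ]; then
    check_depth
else
    echo "qiime or table.qza not found; nothing was run"
fi
```

If the rarefaction curves level off well below the chosen depth, that suggests the diversity in those samples has been captured at that depth.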

As for using only the forward reads: I would say yes, the resolution will be good enough. It will not be as good as with paired-end reads, of course, but whether the benefits of paired-end data matter depends on what you are asking of your data. The benefit of using only your forward reads is that you will most likely retain more reads.
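
A forward-reads-only run could look roughly like the sketch below. It reuses the input artifact and trimming values from the paired-end command above; as far as I know, denoise-single accepts a paired-end artifact and uses only the forward reads, but treat that as an assumption to verify for your QIIME 2 version. The "-single" output names are made up so that the paired-end results are not overwritten.

```shell
#!/bin/sh
# Sketch: denoise using only the forward reads.
# Assumptions: same input artifact and trimming values as the paired-end
# command above; the "-single" output names are hypothetical.
denoise_forward_only() {
    qiime dada2 denoise-single \
        --i-demultiplexed-seqs demux-paired-end.qza \
        --p-trim-left 13 \
        --p-trunc-len 250 \
        --o-representative-sequences rep-seqs-single.qza \
        --o-table table-single.qza \
        --o-denoising-stats stats-dada2-single.qza
}

# Only run when QIIME 2 and the input artifact are actually available.
if command -v qiime >/dev/null 2>&1 && [ -f demux-paired-end.qza ]; then
    denoise_forward_only
else
    echo "qiime or demux-paired-end.qza not found; nothing was run"
fi
```

Comparing the resulting denoising stats with the paired-end stats shows how many more reads are retained when the merging step is skipped.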


Thank you @Mehrbod_Estaki for your answer!

I was aware that this is a question with no specific answer and that it all depends on various aspects of the experimental setup. You clarified the subject a lot, and thank you for the link (so it had indeed been discussed earlier…). I will read it carefully!