What is a good number of reads (especially after DADA2)?

veeraku · August 8, 2019, 7:17am

Dear QIIME community,

I tried to look through the forum topics, especially the DADA2 topics, to find the answer to this question. I got a lot of new information, but nothing that directly answers the question I have. Sorry if this has already been discussed earlier!

I've done DADA2 denoising with my Illumina 2x250 V3-V4 data.

Here are the parameters I used:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-paired-end.qza \
  --p-trim-left-f 13 \
  --p-trim-left-r 13 \
  --p-trunc-len-f 250 \
  --p-trunc-len-r 250 \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza \
  --o-denoising-stats stats-dada2.qza

And here is the picture of the quality plot I used for DADA2 parameters:

When reading about the topic, I understood that DADA2 needs a minimum overlap of 20 bp for merging to succeed, and that the V3-V4 region (usually around 460 bp) might be too long for proper merging because of natural length variation. This can therefore cause loss of data.
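The overlap arithmetic in the paragraph above can be sketched roughly as follows. This is only an illustration: the function name `expected_overlap` is made up for this example, the 20 bp minimum is the figure quoted in the post, and the sketch assumes that the amplicon length includes the primers and that each read starts at one end of the amplicon. Under that assumption, trimming from the left (5') end does not change the overlap, which sits at the 3' ends of the reads.

```python
def expected_overlap(trunc_len_f, trunc_len_r, amplicon_len):
    """Bases shared by the forward and reverse reads after truncation.

    Assumes both reads start at opposite ends of the amplicon, so the
    overlap is whatever the two truncated reads cover beyond its length.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


MIN_OVERLAP = 20  # figure quoted in the post; treated here as an assumption

# Illustrative amplicon lengths around the ~460 bp quoted for V3-V4.
for amplicon_len in (440, 460, 490):
    overlap = expected_overlap(250, 250, amplicon_len)
    status = "ok" if overlap >= MIN_OVERLAP else "too short"
    print(f"amplicon {amplicon_len} bp: overlap {overlap} bp ({status})")
```

With both reads truncated at 250, a 460 bp amplicon leaves about 40 bp of overlap, so only amplicons noticeably longer than that would fall below the quoted minimum.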

I also read that if merging is the cause of losing data, the good-quality forward reads alone can be used instead of the paired-end reads. Depending on the dataset and the question of interest, of course, the forward reads alone will also give good resolution, right?

Back to the data I have:
Here are the stats of the data after DADA2:

sample-id	input	filtered	denoised	merged	non-chimeric	reads left (%)
AS1	15113	11337	11113	10667	10604	70
AS2	16581	13444	13185	12765	12154	73
AS3	11769	8650	8339	7932	7885	67
AS4	13599	10771	10553	10116	9858	72
AS5	9413	7313	7066	6619	6403	68
AS6	10875	8994	8887	8533	8183	75
AS7	16183	11285	10899	10267	10207	63
AS8	9518	7783	7550	7205	7114	75
AS9	12928	9511	9137	8661	8212	64
AS10	9999	8367	8151	7847	7698	77
AS11	16265	12356	12044	11467	11243	69
AS12	13900	11515	11276	10814	10352	74
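One way to see where the reads are lost is to express each step's loss as a percentage of the input reads. The sketch below does that for the rows in the table above. The names `STATS` and `step_losses` are made up for this example, and the step labels simply follow the column order of the table.

```python
# Rows copied from the stats table above:
# (sample, input, filtered, denoised, merged, non-chimeric)
STATS = [
    ("AS1", 15113, 11337, 11113, 10667, 10604),
    ("AS2", 16581, 13444, 13185, 12765, 12154),
    ("AS3", 11769, 8650, 8339, 7932, 7885),
    ("AS4", 13599, 10771, 10553, 10116, 9858),
    ("AS5", 9413, 7313, 7066, 6619, 6403),
    ("AS6", 10875, 8994, 8887, 8533, 8183),
    ("AS7", 16183, 11285, 10899, 10267, 10207),
    ("AS8", 9518, 7783, 7550, 7205, 7114),
    ("AS9", 12928, 9511, 9137, 8661, 8212),
    ("AS10", 9999, 8367, 8151, 7847, 7698),
    ("AS11", 16265, 12356, 12044, 11467, 11243),
    ("AS12", 13900, 11515, 11276, 10814, 10352),
]

STEPS = ("filtering", "denoising", "merging", "chimera removal")


def step_losses(row):
    """Return the percentage of input reads lost at each step."""
    _sample, *counts = row
    total = counts[0]
    return {
        step: 100 * (before - after) / total
        for step, before, after in zip(STEPS, counts, counts[1:])
    }


for row in STATS:
    losses = step_losses(row)
    worst = max(losses, key=losses.get)
    summary = ", ".join(f"{step} {pct:.1f}%" for step, pct in losses.items())
    print(f"{row[0]}: {summary} (largest loss: {worst})")
```

For these twelve samples the largest loss is at the filtering step in every case, while merging removes only a few percent of the input reads.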

I'm wondering whether any of you have a "hunch" about what a good number of reads left after DADA2 is. Should it be above 10 000, or can it be less? Do I have too few reads right at the beginning of the analysis, since some samples have fewer than 10 000 input reads? Or do I have to go through the next steps and look at the sampling depth to get that "hunch"? That is quite fast to do, of course, but I was wondering whether there is any threshold for the number of reads at the beginning of the analysis (is my starting number of reads enough?) and after DADA2 (is the number of reads left after DADA2 enough?).

Sorry for the long post and the rather naive question! I'm still trying to figure out the basics in this field!

Wishing you all the best!
Thanks,
Veera

Mehrbod_Estaki · August 8, 2019, 7:31am

Hi @veeraku,
Good question, and thanks for doing your research on the forum before posting. This is a complicated question with no real right answer. I wrote up my thoughts on the matter some time ago here, which may be of interest to you. The short version is that it depends on your samples and the questions you are trying to answer. For most typical samples we see around here (gut, intestine, skin, etc.), your sampling depth should be sufficient.
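To get a feel for the trade-off behind a sampling depth, one can count how many samples would survive at a few candidate depths. The sketch below uses the non-chimeric counts from the stats table in the original post. The name `samples_retained` and the candidate depths are made up for illustration and are not recommendations.

```python
# Non-chimeric read counts per sample, copied from the stats table.
NON_CHIMERIC = {
    "AS1": 10604, "AS2": 12154, "AS3": 7885, "AS4": 9858,
    "AS5": 6403, "AS6": 8183, "AS7": 10207, "AS8": 7114,
    "AS9": 8212, "AS10": 7698, "AS11": 11243, "AS12": 10352,
}


def samples_retained(counts, depth):
    """Return the samples that have at least `depth` reads."""
    return [sample for sample, n in counts.items() if n >= depth]


# Candidate depths are arbitrary values chosen for illustration only.
for depth in (6000, 8000, 10000):
    kept = samples_retained(NON_CHIMERIC, depth)
    print(f"depth {depth}: {len(kept)}/{len(NON_CHIMERIC)} samples kept")
```

A higher depth keeps more reads per sample but drops the shallower samples entirely; whether that trade is worth it depends on the samples and the question, as noted above.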

As for using only the forward reads, I would say yes, the resolution will be good enough. It will not be as good as with paired-end reads, of course, but the benefits of the latter only show up depending on what you are asking of your data. The benefit of using only your forward reads is that you will most likely retain more reads.

veeraku · August 8, 2019, 10:09am

Thank you @Mehrbod_Estaki for your answer!

I was aware that this question has no specific answer and that it all depends on various aspects of the experimental setup. You clarified the subject a lot, and thank you for the link (so it had indeed been discussed earlier...). I will read it carefully!

Cheers!
