Dear QIIME community,
I tried to look through the forum topics, especially the DADA2 topics, to find the answer to this question. I got a lot of new information, but nothing that directly answers the question I have. Sorry if this has already been discussed earlier!
I've done DADA2 denoising with my Illumina 2x250 V3-V4 data.
Here are the parameters I used:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-paired-end.qza \
  --p-trim-left-f 13 \
  --p-trim-left-r 13 \
  --p-trunc-len-f 250 \
  --p-trunc-len-r 250 \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza \
  --o-denoising-stats stats-dada2.qza
And here is the picture of the quality plot I used for DADA2 parameters:
When reading about the topic, I understood that DADA2 needs a minimum overlap of 20 bp to succeed with merging, and that the V3-V4 region (usually around 460 bp) might be too long for proper merging due to natural variation in amplicon length. This can therefore cause loss of data.
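To check my own understanding of the overlap arithmetic, here is a minimal sketch. The function name `expected_overlap` is my own (it is not a QIIME 2 or DADA2 function), and I am assuming that the ~460 bp amplicon length includes the leading bases removed by `--p-trim-left-f` and `--p-trim-left-r`, so that trimming the 5' ends does not change the overlap. The longer amplicon lengths in the loop are hypothetical values, only there to show how the margin shrinks.

```python
def expected_overlap(trunc_len_f: int, trunc_len_r: int, amplicon_len: int) -> int:
    """Bases shared by the forward and reverse read after truncation.

    Assumption: amplicon_len covers the same span the reads were sequenced
    from, i.e. it includes any 5' bases later removed by --p-trim-left-*.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


# Values from this post: 2x250 reads truncated at 250, ~460 bp V3-V4 amplicon.
print("overlap at 460 bp:", expected_overlap(250, 250, 460), "bp")

# Hypothetical longer variants of the amplicon.
for amplicon_len in (470, 480, 490):
    print(f"overlap at {amplicon_len} bp:", expected_overlap(250, 250, amplicon_len), "bp")
```

If this arithmetic is right, a 460 bp amplicon leaves about 40 bp of overlap with my settings, and an amplicon around 480 bp would sit right at the 20 bp figure I read about.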
I also read that if merging is the cause of losing data, only the good-quality forward reads can be used instead of the paired-end reads. Depending on the dataset and the question of interest, of course, the forward reads alone will also give good resolution, right?
Back to the data I have:
Here are the stats of the data after DADA2:
sample-id | input | filtered | denoised | merged | non-chimeric | reads left (%) |
---|---|---|---|---|---|---|
AS1 | 15113 | 11337 | 11113 | 10667 | 10604 | 70 |
AS2 | 16581 | 13444 | 13185 | 12765 | 12154 | 73 |
AS3 | 11769 | 8650 | 8339 | 7932 | 7885 | 67 |
AS4 | 13599 | 10771 | 10553 | 10116 | 9858 | 72 |
AS5 | 9413 | 7313 | 7066 | 6619 | 6403 | 68 |
AS6 | 10875 | 8994 | 8887 | 8533 | 8183 | 75 |
AS7 | 16183 | 11285 | 10899 | 10267 | 10207 | 63 |
AS8 | 9518 | 7783 | 7550 | 7205 | 7114 | 75 |
AS9 | 12928 | 9511 | 9137 | 8661 | 8212 | 64 |
AS10 | 9999 | 8367 | 8151 | 7847 | 7698 | 77 |
AS11 | 16265 | 12356 | 12044 | 11467 | 11243 | 69 |
AS12 | 13900 | 11515 | 11276 | 10814 | 10352 | 74 |
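To see at which step the reads are lost, the table can be broken down per step. This is only a sketch: the counts are copied from the table above, and the helper names `step_retention` and `overall_retention` are my own, not part of QIIME 2.

```python
# (input, filtered, denoised, merged, non-chimeric), copied from the table above
stats = {
    "AS1": (15113, 11337, 11113, 10667, 10604),
    "AS2": (16581, 13444, 13185, 12765, 12154),
    "AS3": (11769, 8650, 8339, 7932, 7885),
    "AS4": (13599, 10771, 10553, 10116, 9858),
    "AS5": (9413, 7313, 7066, 6619, 6403),
    "AS6": (10875, 8994, 8887, 8533, 8183),
    "AS7": (16183, 11285, 10899, 10267, 10207),
    "AS8": (9518, 7783, 7550, 7205, 7114),
    "AS9": (12928, 9511, 9137, 8661, 8212),
    "AS10": (9999, 8367, 8151, 7847, 7698),
    "AS11": (16265, 12356, 12044, 11467, 11243),
    "AS12": (13900, 11515, 11276, 10814, 10352),
}
steps = ("filtered", "denoised", "merged", "non-chimeric")


def step_retention(counts):
    """Percent of reads kept at each step, relative to the previous step."""
    return [round(100 * after / before, 1) for before, after in zip(counts, counts[1:])]


def overall_retention(counts):
    """Percent of input reads that survive to the non-chimeric column."""
    return round(100 * counts[-1] / counts[0])


for sample, counts in stats.items():
    per_step = dict(zip(steps, step_retention(counts)))
    print(sample, per_step, f"overall {overall_retention(counts)}%")
```

For AS1, for example, this gives roughly 75% kept at filtering and about 96% kept at merging, so in that sample most of the loss happens at the filtering step rather than at merging.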
I'm wondering whether any of you have a "hunch" about what a good number of reads to have left after DADA2 is. Is it above 10 000, or can it be less? Do I have too few reads right at the beginning of the analysis, since some of the samples start below 10 000 reads? Or do I have to go through the steps and look at the sampling depth to get the "hunch"? That is quite fast to do, of course, but I was wondering whether there is any threshold for the number of reads at the beginning of the analysis (is my starting number of reads enough?) and after DADA2 (is the number of reads left after DADA2 enough?).
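As a rough way to think about the sampling depth with my own numbers, here is a small sketch that counts how many samples would survive at a given depth. The non-chimeric counts are copied from the table above; the helper name `samples_retained` is my own, and the candidate depths are arbitrary illustration values, not recommendations.

```python
# Non-chimeric read counts per sample, copied from the stats table.
non_chimeric = {
    "AS1": 10604, "AS2": 12154, "AS3": 7885, "AS4": 9858,
    "AS5": 6403, "AS6": 8183, "AS7": 10207, "AS8": 7114,
    "AS9": 8212, "AS10": 7698, "AS11": 11243, "AS12": 10352,
}


def samples_retained(counts, depth):
    """Samples with at least `depth` reads, i.e. those kept when rarefying to `depth`."""
    return sorted(sample for sample, n in counts.items() if n >= depth)


# Arbitrary candidate depths, chosen only for illustration.
for depth in (6000, 8000, 10000):
    kept = samples_retained(non_chimeric, depth)
    print(f"depth {depth}: {len(kept)} of {len(non_chimeric)} samples kept")
```

With these counts, a depth of 10 000 would keep only 5 of my 12 samples, while anything up to 6403 (the smallest sample, AS5) would keep all of them.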
Sorry for the long post and the rather naive question! I'm still trying to figure out the basics in this field!
Wishing you all the best!
Thanks,
Veera