I tried running the same data using a different tool and it does not end up removing the same amount of data as DADA2. I used cutadapt to remove primers prior to running DADA2, using the command:-
qiime cutadapt trim-single --i-demultiplexed-sequences demux_seqs_original.qza --p-cores 8 --p-minimum-length 350 --o-trimmed-sequences demux_seqs.qza --p-adapter CCTACGGGAGGCAGCAG...ATTAGAWACCCBDGTAGTCC --p-discard-untrimmed
The DADA2 command I use is:-
qiime dada2 denoise-single
I have a few questions:-
Is it possible to find out specifically how denoising is removing so many sequence reads from the samples? I don't see quality as a big issue, as I am already removing low-quality data and primer sequences from the dataset.
Is it possible that all of the removed data is basically chimeras and they are just reported like this? I say this because within the denoising report, I don't see many chimeras being detected within the samples.
Yes! In newer versions of QIIME 2 and the q2-dada2 plugin, more detailed statistics are reported about this process. When you upgrade from 2021.11.0 to version 2022.11, as an example, the DADA2 output will include the percentage of reads removed due to joining and chimera filtering as separate columns, which should answer your question.
That is possible!
I'm guessing something is wrong with read joining, but you will have to update and rerun DADA2 to find out!
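If it helps, here is a rough sketch of how you could work out which step is dropping the reads from the denoising stats table, once it has been exported to a TSV file (for example with `qiime tools export`). It assumes the single-end column names `sample-id`, `input`, `filtered`, `denoised` and `non-chimeric`, plus a `#q2:types` row under the header; please check these against your own file. The function name and the numbers in the example are made up for illustration and are not taken from your report.

```python
import csv
import io

# Assumed column names for the count columns of the exported DADA2 stats
# table, in pipeline order. Adjust if your file uses different names.
STEPS = ["input", "filtered", "denoised", "non-chimeric"]


def reads_lost_per_step(tsv_text):
    """Return {sample_id: {step: reads lost going into that step}}."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    losses = {}
    for row in reader:
        sample = row["sample-id"]
        if sample.startswith("#"):
            # Skip the '#q2:types' directive row, which is not a sample.
            continue
        counts = [int(float(row[step])) for step in STEPS]
        losses[sample] = {
            STEPS[i + 1]: counts[i] - counts[i + 1]
            for i in range(len(STEPS) - 1)
        }
    return losses


# Made-up numbers purely for illustration.
example = (
    "sample-id\tinput\tfiltered\tdenoised\tnon-chimeric\n"
    "#q2:types\tnumeric\tnumeric\tnumeric\tnumeric\n"
    "sampleA\t10000\t4000\t3900\t3800\n"
)

for sample, lost in reads_lost_per_step(example).items():
    print(sample, lost)
```

A large number under `filtered` would point at the quality and length filtering step rather than at chimera removal, while a large number under `non-chimeric` would point at chimeras.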
Thank you for responding. I took your advice and updated QIIME 2 to version 2022.11. DADA2 ran successfully and generated a log file, but I am not seeing any of the extra information you mentioned. It shows the same columns as the previous version. Is there a different place I should look for the detailed statistics? Thank you for your help. denoising_stats_v2.qzv (1.2 MB) DADA2_denoising_v2.txt (1.7 KB)