Hello everyone,
I am working through the QIIME 2 microbiome sequence-processing workflow for the first time, and I've gotten stuck on a few questions about my DADA2 results.
I am working with 2x251 paired-end sequences of the V4 region, amplified with the 515F and 806R primers. The sequencing followed the EMP protocol, so the primers were removed before I received the multiplexed data. After demultiplexing, I tested a number of trimming and truncation parameters based on the many great discussions on this forum. To my novice eye, my demux graphs are generally pretty good: the reverse reads are better quality than the forward reads, and there is some variability in quality in the middle of the reads. demux.qzv (325.4 KB)
I saw in other discussions that I should focus on "crashes" in quality rather than smaller dips. With this in mind, I tried a DADA2 run with these parameters in qiime2-2021.4:
--p-trunc-len-f 220
--p-trunc-len-r 207
dada2-table-8.qzv (3.2 MB) denoising-stats-8.qzv (1.2 MB)
I also tried a run where I trimmed from the 5' end and truncated more from the 3' end.
--p-trim-left-f 14
--p-trunc-len-f 153
--p-trunc-len-r 222
dada2-table-10.qzv (3.1 MB) denoising-stats-10.qzv (1.2 MB)
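For reference, here is a rough sketch of how I estimated the read overlap for each run. It assumes a V4 amplicon of about 253 nt once the primers are removed (my assumption; the real length varies between taxa) and DADA2's default minimum merge overlap of 12 nt. The function names are just my own helpers, not part of QIIME 2 or DADA2.

```python
# Back-of-the-envelope overlap check for paired-end truncation settings.
# ASSUMPTION: amplicon length of 253 nt (typical for 515F/806R V4 without
# primers); real amplicons vary by a few nt between taxa.
AMPLICON_LEN = 253
# ASSUMPTION: DADA2's default minimum overlap for merging read pairs.
MIN_OVERLAP = 12


def expected_overlap(trunc_len_f, trunc_len_r, amplicon_len=AMPLICON_LEN):
    """Bases covered by both reads after 3' truncation.

    Trimming from the 5' end does not change this value, because the
    5' end of each read lies outside the overlapping region.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


def merged_length(trim_left_f=0, trim_left_r=0, amplicon_len=AMPLICON_LEN):
    """Length of the merged sequence after 5' trimming of either read."""
    return amplicon_len - trim_left_f - trim_left_r


# The two parameter sets from this post.
runs = {
    "first run": {"trim_left_f": 0, "trunc_len_f": 220, "trunc_len_r": 207},
    "second run": {"trim_left_f": 14, "trunc_len_f": 153, "trunc_len_r": 222},
}

for name, p in runs.items():
    overlap = expected_overlap(p["trunc_len_f"], p["trunc_len_r"])
    length = merged_length(trim_left_f=p["trim_left_f"])
    status = "ok" if overlap >= MIN_OVERLAP else "too short to merge"
    print(f"{name}: overlap ~{overlap} nt ({status}), merged length ~{length} nt")
```

Under those assumptions both runs leave far more than 12 nt of overlap, and the merged sequences from the second run come out 14 nt shorter than those from the first.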
In the second run I retain more sequences after filtering, merging, and chimera removal, but I end up with fewer features. I'm not sure why this happens or which of the DADA2 runs is closer to what is actually in the samples. Does anyone have a sense of which run I should move forward with, and why? Also, can you trim too much from your reads and thereby influence how features are identified?
Thanks!!