Hi all,
I am new to QIIME2 and I am working with paired-end reads that came directly from Illumina sequencing. So I have three archives that I imported with the command:
After I run denoising with DADA2, the sequence count is really low!
qiime dada2 denoise-paired --i-demultiplexed-seqs demux_novo.qza --p-trim-left-f 100 --p-trim-left-r 100 --p-trunc-len-f 260 --p-trunc-len-r 260 --o-representative-sequences rep-seqs.qza --o-table table.qza --o-denoising-stats denoising-stats.qza
I tried several combinations of trim and trunc values… but nothing gets better than the number I get with trim 100 and trunc 260. The following table.qzv illustrates the best I have got so far…
Hi @cintia_martins!
How are you choosing your trim and trunc length parameters? There are lots of good discussions of that here on the forum (incl. 1, 2), as well as in the Atacama tutorial, and in the DADA2 docs.
It's common to lose sequences if DADA2 does not have enough nucleotides to successfully join forward and reverse reads. This can be caused by short raw sequences, but is often caused by trim/trunc values that cut away too much of each read and leave too little overlap.
Your quality scores for both forward and reverse look strong on the trim-left side, and you might not want to remove that data. At the same time, you might be keeping too many low-quality bases by setting trunc to 260 for your reverse reads. It's not necessary to set forward and reverse to the same values. If more tinkering doesn't yield improved results, let us know!
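If it helps to reason about the joining step, here is a rough sketch for estimating how much overlap a given set of trim/trunc values leaves. It is only a back-of-the-envelope check, not what DADA2 does internally. The amplicon length of 400 nt is a hypothetical placeholder (the thread never states it), the 12 nt minimum overlap is my understanding of the DADA2 default, and `estimate_overlap` is just a helper name made up for this example. It also ignores natural length variation between taxa.

```shell
#!/bin/sh
# Rough estimate of forward/reverse overlap after trimming and truncation.
# Positions are measured along the amplicon, starting at its forward end.

# Usage: estimate_overlap AMPLICON_LEN TRIM_LEFT_F TRIM_LEFT_R TRUNC_LEN_F TRUNC_LEN_R
estimate_overlap() {
  amplicon_len=$1; trim_left_f=$2; trim_left_r=$3; trunc_len_f=$4; trunc_len_r=$5

  # Forward read covers [trim_left_f, trunc_len_f).
  fwd_start=$trim_left_f
  fwd_end=$trunc_len_f
  # Reverse read, mapped onto the same coordinates, covers
  # [amplicon_len - trunc_len_r, amplicon_len - trim_left_r).
  rev_start=$((amplicon_len - trunc_len_r))
  rev_end=$((amplicon_len - trim_left_r))

  # Overlap = min(ends) - max(starts); zero or negative means no overlap.
  if [ "$fwd_end" -lt "$rev_end" ]; then hi=$fwd_end; else hi=$rev_end; fi
  if [ "$fwd_start" -gt "$rev_start" ]; then lo=$fwd_start; else lo=$rev_start; fi
  echo $((hi - lo))
}

AMPLICON_LEN=400   # HYPOTHETICAL: replace with your expected amplicon length
MIN_OVERLAP=12     # ASSUMED DADA2 default minimum overlap for merging

# Compare two reverse truncation lengths with the other values held fixed.
for trunc_r in 260 200; do
  overlap=$(estimate_overlap "$AMPLICON_LEN" 100 100 260 "$trunc_r")
  if [ "$overlap" -ge "$MIN_OVERLAP" ]; then
    verdict="enough to merge"
  else
    verdict="too short to merge"
  fi
  echo "trunc-len-r=$trunc_r: estimated overlap $overlap nt ($verdict)"
done
```

Keep in mind that overlap is only half of the picture: reads shorter than the truncation length are discarded, and a longer truncation also keeps more low-quality bases, so the quality plots still need to guide the final choice.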
It worked! I changed trunc values for my reverse reads. I don’t know why I was fixing the same values for forward and reverse…
I am testing some different values and I am getting good sequence count with --p-trim-left-f 100 --p-trim-left-r 100 --p-trunc-len-f 260 --p-trunc-len-r 200
@cintia_martins, I'm glad that helped! Have you tried setting your trim-left values lower (possibly even to 0), and keeping the new truncate values that are working for you? I'm curious about whether you could squeeze more sequences out of this set, or at least preserve all of that high-quality sequence data...
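For reference, the run I have in mind would look roughly like the sketch below. It reuses your input file and the truncation values that worked for you, with trim-left set to 0. The `-trim0` output names are hypothetical, chosen only so the run does not overwrite your existing results. The script prints the command by default; set `DRY_RUN=0` inside an activated QIIME 2 environment to actually run it.

```shell
#!/bin/sh
# Dry-run sketch: prints the denoise-paired command instead of executing it.
# Set DRY_RUN=0 to execute (requires an activated QIIME 2 environment).
DRY_RUN="${DRY_RUN:-1}"

trim_left_f=0     # suggested in this thread: keep the high-quality read starts
trim_left_r=0
trunc_len_f=260   # truncation values reported as working earlier in the thread
trunc_len_r=200

# Output names with the "-trim0" suffix are HYPOTHETICAL, not from the thread.
set -- qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux_novo.qza \
  --p-trim-left-f "$trim_left_f" \
  --p-trim-left-r "$trim_left_r" \
  --p-trunc-len-f "$trunc_len_f" \
  --p-trunc-len-r "$trunc_len_r" \
  --o-representative-sequences rep-seqs-trim0.qza \
  --o-table table-trim0.qza \
  --o-denoising-stats denoising-stats-trim0.qza

# Keep a printable copy of the command for logging or inspection.
cmd_preview="$*"

if [ "$DRY_RUN" = "1" ]; then
  echo "$cmd_preview"
else
  "$@"
fi
```

Comparing the denoising stats from this run against your trim-left 100 run should show at which step (filtering, merging, or chimera removal) the reads are being lost.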
I talked with the QIIME2 team, and this is the first time we've seen better sequence counts with the high trim-left arguments you passed. Hopefully, it's all smooth sailing from here, but if you run into issues, you might use FastQC or similar to interrogate the quality of your data.
The unusual quality profile (very high quality which suddenly drops after 65-85 bp) could indicate the presence of non-biological data, PhiX sequences, or some other artifact. If you find any new insights, please let us know what you learn!