Hello there! I am denoising paired-end data for another dataset, following the "Atacama soil microbiome tutorial". My paired-end sequence counts seem pretty decent:
pe-demux.qzv (299.8 KB)
I ran DADA2 on my artifact with the following command:
qiime dada2 denoise-paired
But after DADA2, my feature counts are extremely low:
qiime feature-table summarize
pe-table.qzv (449.3 KB)
And when I check the denoising stats, I see that very few reads in my samples were merged (I'm not sure whether this is the cause):
qiime metadata tabulate
pe-stats-dada2.qzv (1.2 MB)
I have tried Deblur (with the same trim length) on the forward reads of my dataset and did not lose nearly as many features. Why is this loss occurring with DADA2? Thank you in advance!
Also, you may have left some adapter/primer sequences at the 5' end of your reads. You may want to run cutadapt to remove any leftover primer sequences. Alternatively, you could trim maybe the first 40 nt from the forward and reverse reads and then truncate at 180.
qiime dada2 denoise-paired
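To show how these suggested values map onto the `qiime dada2 denoise-paired` options, here is a minimal Python sketch that only assembles the argument list (it does not run anything). The option names are the standard q2-dada2 ones; the input and output file names (`pe-demux.qza`, `pe-table.qza`, and so on) are hypothetical placeholders, not the poster's actual files.

```python
import shlex
from typing import List


def dada2_denoise_paired_cmd(demux_qza: str, trim_left_f: int, trim_left_r: int,
                             trunc_len_f: int, trunc_len_r: int,
                             prefix: str = "pe") -> List[str]:
    """Build a `qiime dada2 denoise-paired` argument list (not executed here).

    File names derived from `prefix` are hypothetical placeholders.
    """
    return [
        "qiime", "dada2", "denoise-paired",
        "--i-demultiplexed-seqs", demux_qza,
        # Bases removed from the 5' end of each read (e.g. leftover primers).
        "--p-trim-left-f", str(trim_left_f),
        "--p-trim-left-r", str(trim_left_r),
        # Position at which each read is cut (drops low-quality 3' tails).
        "--p-trunc-len-f", str(trunc_len_f),
        "--p-trunc-len-r", str(trunc_len_r),
        "--o-table", f"{prefix}-table.qza",
        "--o-representative-sequences", f"{prefix}-rep-seqs.qza",
        "--o-denoising-stats", f"{prefix}-stats-dada2.qza",
    ]


if __name__ == "__main__":
    cmd = dada2_denoise_paired_cmd("pe-demux.qza", 40, 40, 180, 180)
    # Print a shell-ready version; use subprocess.run(cmd, check=True) to run it.
    print(shlex.join(cmd))
```

Keeping trim and truncation as separate arguments mirrors how DADA2 treats them: trim-left removes bases from the start of the read, while trunc-len sets where the read is cut, counted from the original read start.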
Your truncation lengths are too low: there needs to be enough overlap between the forward and reverse reads. You have good coverage throughout; which 16S V region are these? Ben
Hi @ben. Thanks for your answer.
I am using the 16S V4 region, with 251f-251r amplicons. I have a question: how do trim 40 and trunc 180 allow enough overlap between the forward and reverse reads?
I am new to this and have read a few discussions on the forum, but I do not understand the concept of read overlap or how it is calculated. I would be grateful for any clarification.
Hi, the typical length of the 16S rRNA V4 region is about 250 bp. We usually sequence the amplicon on Illumina platforms, such as the MiSeq, using paired-end sequencing. After sequencing, we merge the forward and reverse reads. Check this thread for a graphical illustration.
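To make "merging" concrete, here is a toy Python sketch: the reverse read is reverse-complemented so it runs in the same direction as the forward read, and the two are joined where the end of the forward read exactly matches the start of the flipped reverse read. This is only an illustration of the idea, not DADA2's algorithm (DADA2 merges denoised sequences and has its own overlap and mismatch settings); the `min_overlap=12` default here is an assumption for the sketch.

```python
from typing import Optional

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence (A/C/G/T/N only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def merge_pair(fwd: str, rev: str, min_overlap: int = 12) -> Optional[str]:
    """Toy merger: exact match between the 3' end of the forward read and the
    start of the reverse-complemented reverse read. Returns None if no overlap
    of at least `min_overlap` bases is found."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    rc = reverse_complement(rev)  # reverse read, now in forward orientation
    # Try the longest possible overlap first, down to min_overlap.
    for olen in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        if fwd[-olen:] == rc[:olen]:
            return fwd + rc[olen:]  # the overlapping bases are counted once
    return None  # reads do not overlap enough to merge
```

If the reads are truncated so short that their retained parts no longer reach each other (or overlap by fewer than `min_overlap` bases), `merge_pair` returns `None`, which is the toy equivalent of a read pair being dropped at the merging step.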
You should have an overlapping region of at least 20 bp when merging your forward and reverse reads. In practice, try to maximize the length of the overlapping region, as this reduces sequencing errors in your merged sequences. To calculate the overlap: (trunc-len-f - trim-left-f) + (trunc-len-r - trim-left-r) - expected amplicon length, where the expected amplicon length excludes the bases removed by trimming (e.g. primers). For the numbers that Ben suggested: (180 - 40) + (180 - 40) - 250 = 30 bp.
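The same arithmetic as a small Python helper, so different trim/truncation combinations can be compared before running DADA2. It assumes, as above, that `amplicon_len` excludes the trimmed bases; the 20 bp threshold is the rule of thumb from this thread, and the 240/150 settings are hypothetical examples for comparison.

```python
MIN_OVERLAP = 20  # rule of thumb suggested in this thread


def expected_overlap(trunc_len_f: int, trim_left_f: int,
                     trunc_len_r: int, trim_left_r: int,
                     amplicon_len: int) -> int:
    """Return the expected forward/reverse overlap in bp.

    Assumes `amplicon_len` excludes the bases removed by trim-left
    (e.g. primers), so each read contributes (trunc - trim) bases.
    A negative result means the reads cannot meet.
    """
    kept_f = trunc_len_f - trim_left_f  # forward bases left after trimming
    kept_r = trunc_len_r - trim_left_r  # reverse bases left after trimming
    return kept_f + kept_r - amplicon_len


if __name__ == "__main__":
    settings = {
        "trim 40, trunc 180 (Ben)": (180, 40, 180, 40),
        "trim 40, trunc 240 (hypothetical)": (240, 40, 240, 40),
        "trim 40, trunc 150 (hypothetical)": (150, 40, 150, 40),
    }
    for label, (tf, lf, tr, lr) in settings.items():
        ov = expected_overlap(tf, lf, tr, lr, amplicon_len=250)
        status = "OK" if ov >= MIN_OVERLAP else "too short"
        print(f"{label}: overlap = {ov} bp ({status})")
```

Running it gives 30 bp for Ben's settings, 150 bp for trunc 240, and -30 bp for trunc 150, i.e. the last combination would leave the reads unable to merge.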
Your forward and reverse reads are of high quality. For the truncation lengths, I suggest using 240 bp for both forward and reverse reads (--p-trunc-len-f 240, --p-trunc-len-r 240). The first 35 bp of your reads look peculiar to me: that is longer than the typical primers of universal primer sets targeting the V4 region. You can inspect your reads with FastQC to find out whether the adapter/primer sequences have been trimmed. After that, just use --p-trim-left-f and --p-trim-left-r to trim off your adapter/primer sequences.
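If you know which primers were used, a quick complement to FastQC is to count how many reads still start with the primer sequence. The minimal Python sketch below does this with IUPAC-aware matching. The thread does not say which primers were used, so the 515F/806R sequences below are assumptions; substitute your own.

```python
import gzip
from typing import Iterable, Iterator

# IUPAC nucleotide codes -> bases each code matches.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Assumed V4 primers (not confirmed in this thread).
PRIMER_515F = "GTGYCAGCMGCCGCGGTAA"
PRIMER_806R = "GGACTACNVGGGTWTCTAAT"


def starts_with_primer(read: str, primer: str) -> bool:
    """True if the read's 5' end matches the (possibly degenerate) primer."""
    if len(read) < len(primer):
        return False
    return all(base in IUPAC[code] for base, code in zip(read.upper(), primer))


def fraction_with_primer(reads: Iterable[str], primer: str) -> float:
    """Fraction of reads whose 5' end matches the primer."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(starts_with_primer(r, primer) for r in reads)
    return hits / len(reads)


def fastq_sequences(path: str, limit: int = 10000) -> Iterator[str]:
    """Yield up to `limit` sequence lines from a gzipped FASTQ file."""
    with gzip.open(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i // 4 >= limit:
                break
            if i % 4 == 1:  # 2nd line of each 4-line record is the sequence
                yield line.strip()
```

For example, `fraction_with_primer(fastq_sequences("sample_R1.fastq.gz"), PRIMER_515F)` (hypothetical file name; per-sample FASTQ files can be obtained with `qiime tools export` on the demultiplexed artifact). A value close to 1 suggests the primers are still present and should be removed with cutadapt or trim-left; a value close to 0 suggests they were already removed, or that different primers were used.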
Hope that helps.
Yes, this is the answer; sorry for the delay. Good luck, I would like to see your results when you're finished. I opted for a shorter trunc length than @yanxianl suggested because it removes the poor-quality tails of your forward and reverse reads while still offering just enough overlap. Either way, you should experiment to see what you get. Ben
I understand things a lot better now and have trimmed and truncated at 40 and 180, respectively:
The feature count is looking great. Thanks again for all the help, @ben and @yanxianl !