How to interpret demux-paired-end graph

dperez · January 21, 2020, 2:33pm

Hi QIIME2 community!

I am working with Illumina reads. My demux-paired-end graph looks like this:

I am working with amplicons from the V3-V4 region. The "Atacama soil microbiome" tutorial says that "we need the reads to be long enough to overlap when joining paired ends (...) no trimming is being applied to the ends of the sequences to avoid reducing the read length by too much". Following this advice, I did not trim the ends of the sequences, and ran the following command:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-paired-end.qza \
  --p-trim-left-f 16 \
  --p-trim-left-r 24 \
  --p-trunc-len-f 300 \
  --p-trunc-len-r 300 \
  --p-chimera-method consensus \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza &

After that, when I checked the denoising-stats file I saw:

As you can see, I lose a large number of reads. This concerns me because I could be losing reads that would help with bacterial identification.

To see whether it makes a significant difference, I also tried trimming according to quality, that is, cutting the reads where the quality drops, with the following command:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-paired-end.qza \
  --p-trim-left-f 16 \
  --p-trim-left-r 24 \
  --p-trunc-len-f 300 \
  --p-trunc-len-r 230 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza &

The denoising stats file is:

In this case, I lose fewer reads than when I did not trim the ends of the sequences.

In one of the commands I included the --p-chimera-method option, but I don't think this matters because the value I gave (consensus) is the default. Which approach should I follow to trim the sequences, and why?

Thank you very much for your help!

Dani

Mehrbod_Estaki · January 22, 2020, 12:16am

Hi @dperez,
Your question is a good one, and in fact a very common one on the forum. I would advise searching the forum for it and reading through some of the answers there for more details.
In short, when you don't truncate your reads at all, you lose many reads in the initial filtering step because there are too many poor-quality bases in the 3' tails of these reads. Once you start removing those low-quality tails (as in your second attempt), more reads pass the filtering step and get denoised. This is why your second run yielded much better results. You could probably improve this further by truncating a bit from your forward reads as well.
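To compare runs numerically rather than by eye, one option is to compute the percentage of input reads that survive to the end of the pipeline for each sample. The sketch below is only an illustration: `retained_pct` is a hypothetical helper, and it assumes a tab-separated stats table with a header row containing columns named `input` and `non-chimeric`, which is what I would expect from an exported denoising-stats artifact.

```shell
# Hypothetical helper (not part of QIIME 2): reads a tab-separated stats
# table on stdin and prints "<sample-id><TAB><percent retained>" per sample.
# ASSUMPTIONS: the first non-comment line is a header with columns named
# "input" and "non-chimeric"; lines starting with "#" (for example a
# "#q2:types" row) carry no data and are skipped.
retained_pct() {
  awk -F'\t' '
    /^#/ { next }                       # skip comment/type rows
    !header_seen {                      # first real line: map names to columns
      for (i = 1; i <= NF; i++) col[$i] = i
      header_seen = 1
      next
    }
    $(col["input"]) > 0 {               # avoid dividing by zero
      printf "%s\t%.1f\n", $1, 100 * $(col["non-chimeric"]) / $(col["input"])
    }
  '
}
```

Something like `qiime tools export --input-path denoising-stats.qza --output-path exported-stats` followed by `retained_pct < exported-stats/stats.tsv` should then work for each run; the exported file name `stats.tsv` is an assumption, so check the contents of the output directory first.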
In general, with paired-end data, to maximize the number of reads retained by DADA2, you want to truncate (and trim) the low-quality tails as much as you can while still leaving enough overlap for merging. In my experience, not truncating/trimming at all is never a good option with PE data.
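A rough way to check whether a pair of truncation lengths leaves enough overlap is simple arithmetic: expected overlap = forward truncation length + reverse truncation length - amplicon length. The sketch below assumes an amplicon length of 460 nt (primers included), which is only a placeholder for a V3-V4 amplicon; substitute the length for your own primer pair. `expected_overlap` is a hypothetical helper, not a QIIME 2 command. The 5' trim values (`--p-trim-left-f`/`--p-trim-left-r`) do not enter the calculation because the overlap sits at the 3' ends of the reads.

```shell
# Hypothetical helper: estimate how many bases the truncated forward and
# reverse reads share in the middle of the amplicon.
# Arguments: <trunc_len_f> <trunc_len_r> <amplicon_len>
# ASSUMPTION: amplicon_len is the full amplicon length as sequenced,
# primers included. 460 below is a placeholder, not a measured value.
expected_overlap() {
  echo $(( $1 + $2 - $3 ))
}

expected_overlap 300 300 460   # no truncation (first run in this thread)
expected_overlap 300 230 460   # reverse reads truncated at 230 (second run)
```

Under that assumed length, both runs leave far more than the 12 nt minimum overlap that DADA2's merging step uses by default, so there would still be room to truncate the forward reads. Amplicon length varies between taxa, so it is safer to keep a margin rather than aim for the bare minimum.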
Hope this clarifies it a bit.
