DADA2 reduces the mean length of paired-end sequences

Hi everyone, I'm a new QIIME 2 user and I'm having some problems with DADA2.
I have sequencing reads of the V3-V4 region (~460 bp). Those reads were demultiplexed with the following command:
qiime demux emp-paired \
  --m-barcodes-file sample-metadata.tsv \
  --m-barcodes-column BarcodeSequence \
  --p-no-rev-comp-mapping-barcodes \
  --i-seqs emp-paired-end-sequences.qza \
  --p-no-golay-error-correction \
  --o-per-sample-sequences demux.qza \
  --o-error-correction-details demux-details.qza

This produced the following result.

Then I applied DADA2:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left-f 2 \
  --p-trim-left-r 2 \
  --p-trunc-len-f 242 \
  --p-trunc-len-r 250 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza

This produced the following result.

The problem is with the overlapping/merging step: the mean length after that process should be close to 460 bp (the V3-V4 amplicon length), but it is lower.
I don't know what I'm doing wrong; please help me fix it.
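For reference, a quick sanity check of the expected overlap with these truncation settings (a sketch; the 460 bp amplicon length is the estimate from the post above):

```shell
# Effective read lengths after truncation and 5' trimming
fwd=$((242 - 2))        # --p-trunc-len-f 242, --p-trim-left-f 2
rev=$((250 - 2))        # --p-trunc-len-r 250, --p-trim-left-r 2
amplicon=460            # expected V3-V4 amplicon length (estimate)
overlap=$((fwd + rev - amplicon))
echo "expected overlap: ${overlap} bp"   # prints "expected overlap: 28 bp"
```

Since DADA2 requires at least 12 bp of overlap to merge read pairs, these settings should leave enough overlap in principle, so the short mean length is not simply a truncation arithmetic problem.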

Hi @AdrianMaynez,
Welcome to the forum!
My guess is that the quality of your reads is not great and the truncation parameters you have set are too lenient, causing most of your reads to be discarded during the initial filtering step. What remains has a mean length of 288 bp.
Could you upload the summary visualization of your demultiplexed reads please? (the one with the quality scores).
Could you also post the stats summary of your DADA2 run please?
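For reference, assuming the demultiplexed artifact is still named demux.qza as in your command above, that quality-score visualization can be generated with:

```shell
# Summarize the demultiplexed reads; the .qzv includes interactive quality-score plots
qiime demux summarize \
  --i-data demux.qza \
  --o-visualization demux.qzv
```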
I should also give you a heads up that with the V3-V4 region on a 2x250 run, there often will not be enough good-quality sequence to merge the reads, and users tend to use the forward reads alone. But that will all depend on the quality scores.
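If it does come to using the forward reads alone, a denoise-single run would look something like this (a sketch; the truncation length here is a placeholder, to be chosen from your own quality plot, and the output filenames are just suggestions):

```shell
# Hypothetical example: denoise-single on a paired-end artifact uses only the forward reads.
# --p-trunc-len 240 is a placeholder; pick a value based on your quality scores.
qiime dada2 denoise-single \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left 2 \
  --p-trunc-len 240 \
  --o-table table-single.qza \
  --o-representative-sequences rep-seqs-single.qza \
  --o-denoising-stats denoising-stats-single.qza
```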


Hi @Mehrbod_Estaki, thank you for your answer.
Of course, these are the summaries.


For now, I'm working with just one sample, because I have the same problem when I run all of them together. Once I fix it, I'll run the full dataset.

Hi @AdrianMaynez,
Thanks. It would be a lot more helpful to see all of your samples (or at least a few more) together, as that will give us a much better idea of the overall quality of your whole run; this one sample may or may not be representative. Also, DADA2 requires more reads than this for its error-model building to be reliable, so your results (even for this one sample) will be quite different when you run all your samples together.

I think you missed this one. This is formed after running dada2 and should be called something like stats-dada2.qza unless you’ve renamed it.

I’m interested in the visualization of this file, so run:

qiime metadata tabulate \
  --m-input-file stats-dada2.qza \
  --o-visualization stats-dada2.qzv

Thanks

Hi, @Mehrbod_Estaki
These are the quality scores of 30 samples and the stats summary of DADA2.


Thank you very much.

Hi @AdrianMaynez,
Actually, your DADA2 stats look a lot better than I expected, though retaining less than 50% of the input is on the lower side and certainly has room for improvement. A decent number of reads is getting denoised and merged; the biggest loss seems to come from the chimera-removal step, which might be a separate issue altogether.
I am, however, a little stumped as to why your mean length is 288 bp. Is this still the case when you look at all of your samples denoised together? Maybe it is just this particular sample, or an artifact of running DADA2 with only one sample.
Two other possibilities come to mind worth checking:

  1. Any chance that the primers you used were actually V4 primers and not V3-V4? A 288 bp length would make more sense for the V4 region.
  2. BLAST a few of those unexpectedly short sequences and see what they hit. Maybe there is a lot of host contamination that you could remove with some simple filtering.
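One convenient way to do that blasting (a sketch, assuming the rep-seqs.qza from your DADA2 run above): qiime feature-table tabulate-seqs renders each representative sequence with a link that submits it to NCBI BLAST.

```shell
# Tabulate representative sequences; the resulting .qzv links each sequence to NCBI BLAST
qiime feature-table tabulate-seqs \
  --i-data rep-seqs.qza \
  --o-visualization rep-seqs.qzv
```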

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.