Low sequence count after DADA2

cintia_martins · December 26, 2018, 4:01pm

Hi all,
I am new to QIIME 2 and am working with paired-end reads that came directly from Illumina sequencing. I have three files, which I imported with the command:

qiime tools import --type EMPPairedEndSequences --input-path paired-end-sequences --output-path paired-end-sequences.qza

And then I demultiplexed with:

qiime demux emp-paired --m-barcodes-file cintiamapping4.tsv --m-barcodes-column BarcodeSequence --i-seqs paired-end-sequences.qza --o-per-sample-sequences demux_novo.qza --p-rev-comp-barcodes

When I look at demux_novo.qzv, I see high sequence counts for my samples (illustrated below) and the following quality plot:

After I run denoising with DADA2, the sequence count is really low!
qiime dada2 denoise-paired --i-demultiplexed-seqs demux_novo.qza --p-trim-left-f 100 --p-trim-left-r 100 --p-trunc-len-f 260 --p-trunc-len-r 260 --o-representative-sequences rep-seqs.qza --o-table table.qza --o-denoising-stats denoising-stats.qza

I tried several combinations of trim and trunc values… but nothing gets better than the count I get with trim 100 and trunc 260. The following table.qzv shows the best I have got so far…

table.qzv (320.4 KB)

I also tried Deblur, but I get a low sequence count there too… actually, I get no sequences at all.

Should I use alternative methods instead? I had started following the tutorial "Clustering sequences into OTUs using q2-vsearch" (QIIME 2 2018.8.0 documentation) with my seqs.fna from QIIME 1, and filtering with "Identifying and filtering chimeric feature sequences with q2-vsearch" (QIIME 2 2018.8.0 documentation).

What should I do? Any help, tips, or suggestions would be much appreciated!

Happy Holidays to all!
And many thanks in advance!
Cintia

ChrisKeefe · December 26, 2018, 7:04pm

Hi @cintia_martins!
How are you choosing your trim and trunc length parameters? There are lots of good discussions of that here on the forum (incl. 1, 2), as well as in the Atacama tutorial, and in the DADA2 docs.

It's common to lose sequences if DADA2 does not have enough nucleotides to successfully join forward and reverse reads. This can be caused by short raw sequences, but is often caused by trim/trunc values that leave too little sequence for the reads to overlap.
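
To make the joining requirement concrete, here is a rough Python sketch of the overlap arithmetic. The amplicon length of 400 bp is purely hypothetical (the thread does not say which region was sequenced), and the minimum overlap of 12 nt is assumed from the default of DADA2's mergePairs; substitute the values for your own data.

```python
# Assumed values: neither number comes from this dataset.
AMPLICON_LEN = 400   # hypothetical amplicon length in bp
MIN_OVERLAP = 12     # assumed DADA2 mergePairs default


def merge_overlap(amplicon_len, trunc_len_f, trunc_len_r):
    """Bases shared by the forward and reverse reads after truncation.

    trim-left removes bases from the 5' start of each read, i.e. the outer
    ends of the amplicon, so it normally does not change the overlap.
    A negative result means the reads leave a gap and cannot be joined.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


def can_merge(amplicon_len, trunc_len_f, trunc_len_r, min_overlap=MIN_OVERLAP):
    """True if the truncated reads overlap by at least min_overlap bases."""
    return merge_overlap(amplicon_len, trunc_len_f, trunc_len_r) >= min_overlap


if __name__ == "__main__":
    for trunc_f, trunc_r in [(260, 260), (260, 200), (260, 180), (200, 180)]:
        overlap = merge_overlap(AMPLICON_LEN, trunc_f, trunc_r)
        status = "can merge" if can_merge(AMPLICON_LEN, trunc_f, trunc_r) else "too short"
        print(f"trunc-len-f={trunc_f} trunc-len-r={trunc_r}: overlap={overlap} ({status})")
```

The sketch only covers overlap length; reads can also be lost earlier, at the quality-filtering step.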

Your quality scores for both forward and reverse look strong on the trim-left side, and you might not want to remove that data. At the same time, you might be keeping too many low-quality bases by setting trunc to 260 for your reverse reads. It's not necessary to set forward and reverse to the same values. If more tinkering doesn't yield improved results, let us know!
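
As a rough illustration of why a later truncation point can cost reads: DADA2 filters on expected errors, the sum of the per-base error probabilities 10^(-Q/10). The quality scores below are invented for the example (not taken from this dataset), and the threshold of 2.0 is assumed to match the q2-dada2 default for its max expected errors setting.

```python
MAX_EE = 2.0  # assumed filtering threshold; check the value your run uses


def expected_errors(quals):
    """Sum of per-base error probabilities for a list of Phred scores."""
    return sum(10 ** (-q / 10) for q in quals)


# Hypothetical reverse read: 200 bases at Q35 followed by 60 bases at Q12.
read_quals = [35] * 200 + [12] * 60

if __name__ == "__main__":
    for trunc_len in (200, 260):
        ee = expected_errors(read_quals[:trunc_len])
        verdict = "kept" if ee <= MAX_EE else "discarded"
        print(f"trunc-len {trunc_len}: expected errors = {ee:.2f} -> {verdict}")
```

In this made-up case, the low-quality tail alone pushes the read over the threshold, so truncating before the quality drop keeps a read that would otherwise be discarded.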

Chris

cintia_martins · December 28, 2018, 10:31pm

Hi @ChrisKeefe!

It worked! I changed trunc values for my reverse reads. I don’t know why I was fixing the same values for forward and reverse…

I am testing some different values and I am getting good sequence counts with --p-trim-left-f 100 --p-trim-left-r 100 --p-trunc-len-f 260 --p-trunc-len-r 200

Thank you very much!

Cintia

ChrisKeefe · January 2, 2019, 11:55pm

@cintia_martins, I'm glad that helped! Have you tried setting your trim-left values lower (possibly even to 0), and keeping the new truncate values that are working for you? I'm curious whether you could squeeze more sequences out of this set, or at least preserve all of that high-quality sequence data...

cintia_martins · January 3, 2019, 9:11pm

Hi ChrisKeefe!

With lower settings I don't get much improvement. With --p-trim-left-f 25 --p-trim-left-r 25 --p-trunc-len-f 260 --p-trunc-len-r 180 I got the following:

And with --p-trim-left-f 100 --p-trim-left-r 100 --p-trunc-len-f 260 --p-trunc-len-r 180 I got the following:

ChrisKeefe · January 8, 2019, 12:20am

Thanks again for sharing the details, @cintia_martins!

I talked with the QIIME2 team, and this is the first time we've seen better sequence counts with the high trim-left arguments you passed. Hopefully, it's all smooth sailing from here, but if you run into issues, you might use FastQC or similar to interrogate the quality of your data.

The unusual quality profile (very high quality which suddenly drops after 65-85 bp) could indicate the presence of non-biological data, PhiX sequences, or some other artifact. If you find any new insights, please let us know what you learn!

Best of luck,
Chris
