Problem with dada2 denoise-paired command

Hi,

I'm starting to use QIIME to perform taxonomic classification.

I successfully imported my reads with the following command:

qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path data \
--output-path paired-end.qza \
--source-format CasavaOneEightSingleLanePerSampleDirFmt

The data directory contains two gzipped paired-end FASTQ files.

Then I tried to run the following command:

qiime dada2 denoise-paired \
--i-demultiplexed-seqs paired-end.qza \
--p-trunc-len-f 125 \
--p-trunc-len-r 125 \
--o-table table \
--output-dir output

But I get an error. The messages printed by QIIME are in the attached file:

denoise-paired.txt (3.2 KB)

Did I do something wrong?

Best,
Thibaut

Hi @thbtmntgn,
Did you see this note in the error message:

  No reads passed the filter. trunc_len_f (125) or trunc_len_r (125) may
  be longer than read lengths, or other arguments (such as max_ee or
  trunc_q) may be preventing reads from passing the filter.

That would be the best place to start with this (i.e., compare your read lengths against the truncation lengths that you provided).
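To do that comparison outside QIIME, here is a minimal Python sketch that tallies read lengths from a FASTQ stream. It assumes standard four-line FASTQ records; the helper names (`parse_fastq`, `read_lengths`, `reads_shorter_than`) are hypothetical, and the "shorter than the truncation length" rule reflects DADA2's documented behavior of discarding reads shorter than truncLen.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) tuples from four-line FASTQ records."""
    records = (line.rstrip("\n") for line in lines if line.strip())
    for header in records:
        seq = next(records)
        next(records)  # the '+' separator line
        qual = next(records)
        yield header, seq, qual


def read_lengths(lines: Iterable[str]) -> Counter:
    """Count how many reads have each length."""
    return Counter(len(seq) for _, seq, _ in parse_fastq(lines))


def reads_shorter_than(lengths: Counter, trunc_len: int) -> int:
    """Number of reads that would be dropped for being shorter than trunc_len."""
    return sum(n for length, n in lengths.items() if length < trunc_len)
```

For a gzipped file, something like `with gzip.open("reads1.fastq.gz", "rt") as fh: print(read_lengths(fh))` should work (file name assumed).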

It looks like the import step worked correctly.


Hi @gregcaporaso,

Yes, I saw it; I forgot to mention that my reads are 150 bp long.

I successfully ran the following command:

qiime demux summarize --i-data paired-end.qza --o-visualization demux.qzv

The demux.qzv visualization confirms that the reads are 150 bp long.

Please find the demux.qzv file below:
demux.qzv (259.8 KB)

Best,
Thibaut

Hi @thbtmntgn! From the quality scores plot you posted (demux.qzv), it looks like you have a single sample containing only 100 reads. That’s a really small number of sequences – is this a test/synthetic dataset that you put together?

My guess is that the default parameters used with DADA2 may be too stringent for your extremely small dataset, since the tool models error profiles found in current sequencing technologies (e.g. MiSeq runs). The error message suggests modifying --p-max-ee and/or --p-trunc-q; that may be something you could try.
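For reference, here is a simplified Python model of what --p-trunc-q and --p-max-ee act on: each read is cut at the first base whose quality is at or below trunc_q, dropped if it is then shorter than the truncation length, and kept only if its expected number of errors (the sum of 10^(-Q/10) over its bases) is at most max_ee. This is a sketch assuming Phred+33 quality encoding, not DADA2's actual code; the function names and the default values 2 and 2.0 are illustrative assumptions.

```python
from typing import List, Tuple


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - offset for c in qual]


def truncate_at_q(seq: str, qual: str, trunc_q: int) -> Tuple[str, str]:
    """Cut the read at the first base whose quality is <= trunc_q."""
    for i, q in enumerate(phred_scores(qual)):
        if q <= trunc_q:
            return seq[:i], qual[:i]
    return seq, qual


def expected_errors(qual: str) -> float:
    """Expected number of errors: sum of 10^(-Q/10) over all bases."""
    return sum(10 ** (-q / 10) for q in phred_scores(qual))


def passes_quality_filter(seq: str, qual: str, trunc_len: int,
                          trunc_q: int = 2, max_ee: float = 2.0) -> bool:
    """Simplified stand-in for the trunc_q / trunc_len / max_ee checks."""
    seq, qual = truncate_at_q(seq, qual, trunc_q)
    if len(seq) < trunc_len:  # too short after quality truncation
        return False
    return expected_errors(qual[:trunc_len]) <= max_ee
```

Note that both checks are applied per read, so in this model the size of the dataset does not change whether a given read passes.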

Hi @jairideout, that's right, I have a single sample containing 100 reads.
I wanted to test QIIME 2's functionality quickly with a small dataset.

I tried modifying --p-max-ee and/or --p-trunc-q, but I get the same error.
In the printed messages, the first error comes from DADA2:

Error: No reads passed the filter (were truncLenF/R longer than the read lengths?)

I also tried setting --p-trunc-len-f and --p-trunc-len-r to smaller values (50, for example), but I get the same error message again.

Is this a mistake on my part, or could it be a bug?

My goal is to perform taxonomic classification. After running denoise-paired, I have to choose between classify-consensus-blast, classify-consensus-vsearch and classify-sklearn, right?

Thank you for your answer and your time!

Best,
Thibaut

Hi Thibaut,
Are there Ns in your test dataset? DADA2 requires no Ns, so if every read has an N you will lose all your reads. If that’s not it, could you share this small test fastq file?
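A quick way to answer that question for each FASTQ file is a short Python sketch like the following, which counts the reads containing an N and the reads whose first base is an N. The function names are hypothetical, and four-line FASTQ records are assumed.

```python
from typing import Iterable, Iterator, Tuple


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd of every 4) from FASTQ records."""
    records = (line for line in lines if line.strip())
    for i, line in enumerate(records):
        if i % 4 == 1:
            yield line.strip().upper()


def n_summary(lines: Iterable[str]) -> Tuple[int, int, int]:
    """Return (total reads, reads containing N, reads starting with N)."""
    total = with_n = first_n = 0
    for seq in fastq_sequences(lines):
        total += 1
        with_n += "N" in seq           # True counts as 1
        first_n += seq.startswith("N")
    return total, with_n, first_n
```

For the gzipped files, open each one with `gzip.open(path, "rt")` and pass the handle to `n_summary`.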


Hi, thank you! You found my problem!

All the forward reads have an N at the first position, and 37/100 reverse reads contain at least one N.

Is there another way to create the input files needed by QIIME classification commands such as classify-consensus-blast, classify-consensus-vsearch and classify-sklearn?

Thank you again!
Thib

Hi Thibaut,
You should be able to solve the N issue on the forward reads by trimming off those bases. Could you try the --p-trim-left-f parameter, with a value of perhaps 3, to get rid of that first problematic base? Hopefully you will then get at least the 63/100 read pairs with no N in the reverse read through the filter.
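Assuming left-trimming is applied before the no-N check (which is what --p-trim-left-f is meant to allow), the number of surviving pairs can be estimated with a short Python sketch. It models only the no-N rule and ignores the quality filters; `pairs_without_n` is a hypothetical helper that takes the forward and reverse sequences in matching order.

```python
from typing import Iterable


def pairs_without_n(fwd: Iterable[str], rev: Iterable[str],
                    trim_left_f: int = 0, trim_left_r: int = 0) -> int:
    """Count read pairs with no N in either read after left-trimming."""
    kept = 0
    for f, r in zip(fwd, rev):
        if "N" not in f[trim_left_f:] and "N" not in r[trim_left_r:]:
            kept += 1
    return kept
```

With every forward read starting with N and 37/100 reverse reads containing an N, trimming the forward reads alone would leave at most 63 pairs under this model.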

Another option is to import BIOM tables or sequence data directly: https://docs.qiime2.org/2017.4/tutorials/importing/#feature-table-data


  Is there another way to create the input files needed by QIIME classification commands such as classify-consensus-blast, classify-consensus-vsearch and classify-sklearn?

If you're just interested in testing out taxonomy assignment with your sequences, you can import them into a FeatureData[Sequence] artifact and perform taxonomy assignment as detailed in this tutorial.

Thank you for your time and your answers, @benjjneb and @jairideout!

I will try that!

Best,
Thibaut


Hi again @benjjneb,

I tried the following command:

qiime dada2 denoise-paired --i-demultiplexed-seqs paired-end.qza --p-trim-left-f 10 --p-trim-left-r 10 --p-trunc-len-f 100 --p-trunc-len-r 100 --o-table table --output-dir output --verbose

Unfortunately, I get the same error message.

I also looked at your link, but my files are FASTQ, not BIOM or FASTA.

Here are my read files (I added a .txt extension so I could upload them here). They contain 100 reads each:
reads1.fastq.gz.txt (11.8 KB)
reads2.fastq.gz.txt (11.5 KB)

Thibaut

Hi Thibaut,
For now, if you just want to classify the sequences, I suggest converting your FASTQ files to FASTA and importing them that way.
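One way to do that conversion is a minimal Python sketch that rewrites each four-line FASTQ record as a two-line FASTA record, keeping only the read ID before the first whitespace (FASTA readers generally treat the rest of the header as a description). The function names are hypothetical; the resulting FASTA could then be imported as a FeatureData[Sequence] artifact.

```python
import gzip
from typing import Iterable, Iterator


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Convert four-line FASTQ records into two-line FASTA records."""
    records = (line.rstrip("\n") for line in lines if line.strip())
    header = ""
    for i, line in enumerate(records):
        if i % 4 == 0:
            # Keep only the read ID (text before the first whitespace).
            header = line[1:].split()[0]
        elif i % 4 == 1:
            yield f">{header}\n{line}\n"


def convert_file(fastq_gz_path: str, fasta_path: str) -> None:
    """Write a plain FASTA file from a gzipped FASTQ file."""
    with gzip.open(fastq_gz_path, "rt") as src, open(fasta_path, "w") as dst:
        dst.writelines(fastq_to_fasta(src))
```

Forward and reverse reads from the same pair usually share an ID in Casava-style output, so converting both files into a single FASTA would likely produce duplicate IDs; keeping them in separate files (or adding a suffix) avoids that.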

There is a small bug in version 1.2 of the DADA2 package that the plugin uses (minQ is enforced before trimming rather than after), which prevents --p-trim-left-f from working as expected. So you would need to trim off that bad starting base with another piece of software to get it to work with the current QIIME 2 plugin. Once the plugin upgrades to version 1.4 of DADA2, the --p-trim-left-f approach should work.
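As one example of such external trimming, a short Python sketch can strip the first n bases, and the matching quality characters, from every record of a gzipped FASTQ file. The function names are hypothetical, and four-line FASTQ records are assumed.

```python
import gzip
from typing import Iterable, Iterator


def trim_left(lines: Iterable[str], n: int) -> Iterator[str]:
    """Drop the first n bases, and their quality scores, from every record."""
    records = (line.rstrip("\n") for line in lines if line.strip())
    for i, line in enumerate(records):
        if i % 4 in (1, 3):  # sequence line and quality line
            line = line[n:]
        yield line + "\n"


def trim_file(src_path: str, dst_path: str, n: int = 1) -> None:
    """Write a gzipped copy of src_path with n bases trimmed from each read."""
    with gzip.open(src_path, "rt") as src, gzip.open(dst_path, "wt") as dst:
        dst.writelines(trim_left(src, n))
```

Presumably only the forward file needs trimming (n=1 removes the leading N). Writing the output into a new directory under the same Casava-style file names should let the original `qiime tools import` command be re-run unchanged on that directory.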


Hi @benjjneb,

Thank you again for your answers and your time!

I'll wait for the future update!

Thibaut

I've captured a bug report here; this should be fixed in 2017.5 at the latest.
