Failure running dada2 denoise-paired on demultiplexed paired-end data

dimitely · March 14, 2018, 2:54pm

Hello~
I am having trouble running dada2 denoise-paired on demultiplexed paired-end data.
(qiime2-2018.2) qiime2@qiime2core2018-2:~/Desktop$ qiime dada2 denoise-paired \
  --i-demultiplexed-seqs OceanHK.qza \
  --p-trunc-len-f 240 \
  --p-trunc-len-r 240 \
  --o-table table001-dada2.qza \
  --o-representative-sequences rep-seqs001-dada2.qza
Usage: qiime dada2 denoise-paired [OPTIONS]

Error: Got unexpected extra argument (OceanHK.qza)

I need the rep-seqs.qza data for taxonomic analysis (such as qiime taxa barplot). How can I solve this problem, or is there another way to generate the representative sequences (.qza)?

Thank you so much for your help!
Best,
Fangzhou CHEN

ebolyen · March 14, 2018, 9:57pm

Hi @dimitely!

I’m not certain, but judging by the way the forum formatted your pasted text, I think you’ve got some extra > characters and some en-dashes in your command. Does retyping (not pasting) the command into the terminal work?
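Before retyping, it can help to check a saved copy of the command for the two problems described above. The sketch below is a hypothetical bash helper (check_command_file is not part of QIIME 2) that flags any line containing non-ASCII bytes, such as an en-dash pasted in place of "--", and any line that begins with a ">" continuation prompt. It assumes the command has been saved to a plain-text file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of QIIME 2): scan a saved command file for
# characters that commonly sneak in when copying commands from a web page.
# Returns 0 if the file looks clean, 1 if anything suspicious was found.
check_command_file() {
  local file=$1 n=0 line status=0
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    # Delete every ASCII byte (octal 001-177); anything left is non-ASCII,
    # e.g. the UTF-8 bytes of an en-dash used instead of "--".
    if [ -n "$(printf '%s' "$line" | LC_ALL=C tr -d '\001-\177')" ]; then
      echo "line $n: non-ASCII character: $line"
      status=1
    fi
    # A leading ">" is usually the shell's continuation prompt copied along.
    case $line in
      '>'*) echo "line $n: starts with '>' (pasted prompt?): $line"; status=1 ;;
    esac
  done < "$file"
  return "$status"
}
```

For example, check_command_file cmd.sh prints one line per suspicious line of cmd.sh and returns a non-zero status if it found anything; a clean file produces no output.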

dimitely · March 15, 2018, 12:54pm

Dear Bolyen,

Thank you so much! I think something went wrong when I imported my data last time.
Now, after importing my data with the “Fastq manifest” format, the dada2 step has not reported any errors so far. But it is taking sooooo long.

This is the current status.
(qiime2-2018.2) qiime2@qiime2core2018-2:~/Desktop$ qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --o-table table \
  --o-representative-sequences rep-seqs \
  --p-trim-left-f 6 \
  --p-trim-left-r 6 \
  --p-trunc-len-f 300 \
  --p-trunc-len-r 290 \
  --verbose
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_paired.R /tmp/tmp8lrd034a/forward /tmp/tmp8lrd034a/reverse /tmp/tmp8lrd034a/output.tsv.biom /tmp/tmp8lrd034a/filt_f /tmp/tmp8lrd034a/filt_r 300 290 6 6 2.0 2 consensus 1.0 1 1000000

R version 3.4.1 (2017-06-30)
Loading required package: Rcpp
DADA2 R package version: 1.6.0

Filtering ....
Learning Error Rates
2a) Forward Reads
Initializing error rates to maximum possible estimate.
Sample 1 - 65709 reads in 34016 unique sequences.
Sample 2 - 41315 reads in 21297 unique sequences.
Sample 3 - 65467 reads in 34924 unique sequences.
Sample 4 - 49553 reads in 25387 unique sequences.
selfConsist step 2
selfConsist step 3
selfConsist step 4
selfConsist step 5

I am only using a very small portion of my data for this test run.
Demultiplexed sequence counts summary
Minimum: 78268
Median: 119213.0
Mean: 110910.0
Maximum: 126946
Total: 443640

I don't know how long it will take.

Best,
dimitely

ebolyen · March 15, 2018, 9:06pm

Awesome! From your output, I think everything is fine.

I don't think that many reads should take an excessively long time (for context, DADA2 will by default try to use 1 million reads to estimate its error model). However, depending on what kind of resources you have, it can take a while anyway. You can also set --p-n-threads to parallelize that step.

dimitely · March 26, 2018, 2:57pm

Dear Bolyen,

I have successfully completed a test run using 4 of my own samples. However, when I increased the number of samples to 40, the dada2 process only went through 16 samples. I am not sure whether there is a problem with my manifest file, which is shown below.
sample-id,absolute-filepath,direction
sample1,$PWD/pe-64/OceanHK-001_1.fastq.gz,forward
sample1,$PWD/pe-64/OceanHK-001_2.fastq.gz,reverse
sample2,$PWD/pe-64/OceanHK-002_1.fastq.gz,forward
sample2,$PWD/pe-64/OceanHK-002_2.fastq.gz,reverse
sample3,$PWD/pe-64/OceanHK-003_1.fastq.gz,forward
sample3,$PWD/pe-64/OceanHK-003_2.fastq.gz,reverse
sample4,$PWD/pe-64/OceanHK-004_1.fastq.gz,forward
sample4,$PWD/pe-64/OceanHK-004_2.fastq.gz,reverse
sample5,$PWD/pe-64/OceanHK-005_1.fastq.gz,forward
sample5,$PWD/pe-64/OceanHK-005_2.fastq.gz,reverse
sample6,$PWD/pe-64/OceanHK-006_1.fastq.gz,forward
sample6,$PWD/pe-64/OceanHK-006_2.fastq.gz,reverse
sample7,$PWD/pe-64/OceanHK-007_1.fastq.gz,forward
sample7,$PWD/pe-64/OceanHK-007_2.fastq.gz,reverse
sample8,$PWD/pe-64/OceanHK-008_1.fastq.gz,forward
sample8,$PWD/pe-64/OceanHK-008_2.fastq.gz,reverse
sample11,$PWD/pe-64/OceanHK-011_1.fastq.gz,forward
sample11,$PWD/pe-64/OceanHK-011_2.fastq.gz,reverse
sample12,$PWD/pe-64/OceanHK-012_1.fastq.gz,forward
sample12,$PWD/pe-64/OceanHK-012_2.fastq.gz,reverse
sample13,$PWD/pe-64/OceanHK-013_1.fastq.gz,forward
sample13,$PWD/pe-64/OceanHK-013_2.fastq.gz,reverse
sample14,$PWD/pe-64/OceanHK-014_1.fastq.gz,forward
sample14,$PWD/pe-64/OceanHK-014_2.fastq.gz,reverse
sample15,$PWD/pe-64/OceanHK-015_1.fastq.gz,forward
sample15,$PWD/pe-64/OceanHK-015_2.fastq.gz,reverse
sample16,$PWD/pe-64/OceanHK-016_1.fastq.gz,forward
sample16,$PWD/pe-64/OceanHK-016_2.fastq.gz,reverse
sample17,$PWD/pe-64/OceanHK-017_1.fastq.gz,forward
sample17,$PWD/pe-64/OceanHK-017_2.fastq.gz,reverse
sample18,$PWD/pe-64/OceanHK-018_1.fastq.gz,forward
sample18,$PWD/pe-64/OceanHK-018_2.fastq.gz,reverse
sample19,$PWD/pe-64/OceanHK-019_1.fastq.gz,forward
sample19,$PWD/pe-64/OceanHK-019_2.fastq.gz,reverse
sample20,$PWD/pe-64/OceanHK-020_1.fastq.gz,forward
sample20,$PWD/pe-64/OceanHK-020_2.fastq.gz,reverse
sample21,$PWD/pe-64/OceanHK-021_1.fastq.gz,forward
sample21,$PWD/pe-64/OceanHK-021_2.fastq.gz,reverse
sample22,$PWD/pe-64/OceanHK-022_1.fastq.gz,forward
sample22,$PWD/pe-64/OceanHK-022_2.fastq.gz,reverse
sample23,$PWD/pe-64/OceanHK-023_1.fastq.gz,forward
sample23,$PWD/pe-64/OceanHK-023_2.fastq.gz,reverse
sample24,$PWD/pe-64/OceanHK-024_1.fastq.gz,forward
sample24,$PWD/pe-64/OceanHK-024_2.fastq.gz,reverse
sample25,$PWD/pe-64/OceanHK-025_1.fastq.gz,forward
sample25,$PWD/pe-64/OceanHK-025_2.fastq.gz,reverse
sample26,$PWD/pe-64/OceanHK-026_1.fastq.gz,forward
sample26,$PWD/pe-64/OceanHK-026_2.fastq.gz,reverse
sample27,$PWD/pe-64/OceanHK-027_1.fastq.gz,forward
sample27,$PWD/pe-64/OceanHK-027_2.fastq.gz,reverse
sample28,$PWD/pe-64/OceanHK-028_1.fastq.gz,forward
sample28,$PWD/pe-64/OceanHK-028_2.fastq.gz,reverse
sample29,$PWD/pe-64/OceanHK-029_1.fastq.gz,forward
sample29,$PWD/pe-64/OceanHK-029_2.fastq.gz,reverse
sample30,$PWD/pe-64/OceanHK-030_1.fastq.gz,forward
sample30,$PWD/pe-64/OceanHK-030_2.fastq.gz,reverse
sample31,$PWD/pe-64/OceanHK-031_1.fastq.gz,forward
sample31,$PWD/pe-64/OceanHK-031_2.fastq.gz,reverse
sample32,$PWD/pe-64/OceanHK-032_1.fastq.gz,forward
sample32,$PWD/pe-64/OceanHK-032_2.fastq.gz,reverse
sample33,$PWD/pe-64/OceanHK-033_1.fastq.gz,forward
sample33,$PWD/pe-64/OceanHK-033_2.fastq.gz,reverse
sample34,$PWD/pe-64/OceanHK-034_1.fastq.gz,forward
sample34,$PWD/pe-64/OceanHK-034_2.fastq.gz,reverse
sample35,$PWD/pe-64/OceanHK-035_1.fastq.gz,forward
sample35,$PWD/pe-64/OceanHK-035_2.fastq.gz,reverse
sample36,$PWD/pe-64/OceanHK-036_1.fastq.gz,forward
sample36,$PWD/pe-64/OceanHK-036_2.fastq.gz,reverse
sample37,$PWD/pe-64/OceanHK-037_1.fastq.gz,forward
sample37,$PWD/pe-64/OceanHK-037_2.fastq.gz,reverse
sample38,$PWD/pe-64/OceanHK-038_1.fastq.gz,forward
sample38,$PWD/pe-64/OceanHK-038_2.fastq.gz,reverse
sample39,$PWD/pe-64/OceanHK-039_1.fastq.gz,forward
sample39,$PWD/pe-64/OceanHK-039_2.fastq.gz,reverse
sample40,$PWD/pe-64/OceanHK-040_1.fastq.gz,forward
sample40,$PWD/pe-64/OceanHK-040_2.fastq.gz,reverse
sample41,$PWD/pe-64/OceanHK-041_1.fastq.gz,forward
sample41,$PWD/pe-64/OceanHK-041_2.fastq.gz,reverse
sample42,$PWD/pe-64/OceanHK-042_1.fastq.gz,forward
sample42,$PWD/pe-64/OceanHK-042_2.fastq.gz,reverse
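With 40 samples, typing the manifest by hand is error-prone. As a minimal sketch, assuming the file naming shown above (OceanHK-NNN_1.fastq.gz and OceanHK-NNN_2.fastq.gz in pe-64) and sample IDs that are simply "sample" plus the unpadded sample number, a hypothetical bash function make_manifest could generate the same lines. $PWD is written literally, exactly as in the manifest above, so that the QIIME 2 importer expands it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a paired-end FASTQ manifest (CSV) in the layout
# used above. Usage: make_manifest <dir> <sample-number>...
make_manifest() {
  local dir=$1 n id
  shift
  echo "sample-id,absolute-filepath,direction"
  for n in "$@"; do
    id=$(printf '%03d' "$n")   # 8 -> 008, matching OceanHK-008_1.fastq.gz
    # '$PWD' stays literal on purpose (single quotes); it is not expanded here.
    printf 'sample%s,$PWD/%s/OceanHK-%s_1.fastq.gz,forward\n' "$n" "$dir" "$id"
    printf 'sample%s,$PWD/%s/OceanHK-%s_2.fastq.gz,reverse\n' "$n" "$dir" "$id"
  done
}
```

Running make_manifest pe-64 $(seq 1 8) $(seq 11 42) > manifest.csv produces the 81 lines above (a header plus 40 forward/reverse pairs). Pass unpadded numbers; the function adds the zero padding itself.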

And below is my QIIME 2 output:
(qiime2-2018.2) [biostack@LFE0616 primary_seq]$ qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --o-table table \
  --o-representative-sequences rep-seqs \
  --p-trim-left-f 6 \
  --p-trim-left-r 6 \
  --p-trunc-len-f 300 \
  --p-trunc-len-r 290 \
  --verbose
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_paired.R /tmp/tmpqrtunug6/forward /tmp/tmpqrtunug6/reverse /tmp/tmpqrtunug6/output.tsv.biom /tmp/tmpqrtunug6/filt_f /tmp/tmpqrtunug6/filt_r 300 290 6 6 2.0 2 consensus 1.0 1 1000000

R version 3.4.1 (2017-06-30)
Loading required package: Rcpp
DADA2 R package version: 1.6.0

Filtering .......................................
Learning Error Rates
2a) Forward Reads
Initializing error rates to maximum possible estimate.
Sample 1 - 65709 reads in 34016 unique sequences.
Sample 2 - 72125 reads in 35217 unique sequences.
Sample 3 - 89423 reads in 44061 unique sequences.
Sample 4 - 67189 reads in 32925 unique sequences.
Sample 5 - 42573 reads in 17047 unique sequences.
Sample 6 - 66611 reads in 28142 unique sequences.
Sample 7 - 56166 reads in 23536 unique sequences.
Sample 8 - 61704 reads in 21389 unique sequences.
Sample 9 - 46366 reads in 18225 unique sequences.
Sample 10 - 41315 reads in 21297 unique sequences.
Sample 11 - 62402 reads in 25605 unique sequences.
Sample 12 - 73144 reads in 36470 unique sequences.
Sample 13 - 54216 reads in 27211 unique sequences.
Sample 14 - 85143 reads in 40677 unique sequences.
Sample 15 - 78469 reads in 35961 unique sequences.
Sample 16 - 75634 reads in 37673 unique sequences.
selfConsist step 2
selfConsist step 3
selfConsist step 4
selfConsist step 5
selfConsist step 6
selfConsist step 7
selfConsist step 8
selfConsist step 9
selfConsist step 10
Self-consistency loop terminated before convergence.
2b) Reverse Reads
Initializing error rates to maximum possible estimate.
Sample 1 - 65709 reads in 37533 unique sequences.
Sample 2 - 72125 reads in 35740 unique sequences.
Sample 3 - 89423 reads in 46467 unique sequences.
Sample 4 - 67189 reads in 35271 unique sequences.
Sample 5 - 42573 reads in 20773 unique sequences.
Sample 6 - 66611 reads in 32522 unique sequences.
Sample 7 - 56166 reads in 26872 unique sequences.
Sample 8 - 61704 reads in 27689 unique sequences.
Sample 9 - 46366 reads in 20816 unique sequences.
Sample 10 - 41315 reads in 23179 unique sequences.
Sample 11 - 62402 reads in 29402 unique sequences.
Sample 12 - 73144 reads in 39768 unique sequences.
Sample 13 - 54216 reads in 29424 unique sequences.
Sample 14 - 85143 reads in 47451 unique sequences.
Sample 15 - 78469 reads in 41059 unique sequences.
Sample 16 - 75634 reads in 42805 unique sequences.
selfConsist step 2
selfConsist step 3
selfConsist step 4
selfConsist step 5
selfConsist step 6
selfConsist step 7
selfConsist step 8
selfConsist step 9
selfConsist step 10
Self-consistency loop terminated before convergence.
Denoise remaining samples ........

Thank you so much for your help!
Best,
Fangzhou

ebolyen · March 26, 2018, 8:58pm

Hi @dimitely,

The output can be a little deceptive: the step that stops after 16 samples is the error-training step. Stopping before it reaches the rest of your samples just means DADA2 found 1 million reads within the first 16 samples. All of your samples are still processed, but the shared error model doesn't usually improve by adding more reads, so by default training stops after one million.
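The arithmetic behind "stops after 16 samples" can be checked from the log itself. The forward-read counts printed for samples 1 to 15 add up to 962555 reads, still short of the 1000000 default, and sample 16 brings the total to 1038189. The sketch below is a hypothetical bash/awk helper (samples_for_limit is not a QIIME 2 or DADA2 command) that reads a --verbose log on standard input and reports where the running total first reaches a given limit. It assumes the "Sample N - X reads in Y unique sequences." line format shown above and only looks at the forward-read section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given a DADA2 --verbose log on stdin, report how many
# forward-read samples were needed before the cumulative read count reached
# the limit (default 1000000, the n-reads-learn default quoted in this thread).
samples_for_limit() {
  local limit=${1:-1000000}
  awk -v limit="$limit" '
    /^2b/ { exit }                         # stop at the reverse-read section
    /^Sample [0-9]+ - [0-9]+ reads/ {
      total += $4                          # 4th field is the read count
      n++
      if (total >= limit) { reached = 1; exit }
    }
    END {                                  # runs once, also after "exit"
      printf "%d samples, %d reads, limit %s\n", n, total,
             (reached ? "reached" : "not reached")
    }'
}
```

Piping the log above through samples_for_limit 1000000 should print "16 samples, 1038189 reads, limit reached". For the 4-sample test run earlier in this thread the same total is 222044 reads, below the limit, so all four samples were used for error learning.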

However.

Because we do see the "Self-consistency loop terminated before convergence." message, the model hasn't converged, which means more data might help (it also might not). So it may be worth increasing the number of reads used. You can raise that limit with this option:

  --p-n-reads-learn INTEGER       The number of reads to use when training the
                                  error model. Smaller numbers will result in
                                  a shorter run time but a less reliable error
                                  model.  [default: 1000000]

Hope that makes sense!

system · April 27, 2018, 2:58am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.