Error executing DADA2: Mismatched forward and reverse sequence files

Greetings to all,

I am trying to analyze a set of 30 paired-end samples with QIIME 2, and I get an error when I run DADA2. I searched the forum for this issue and have already applied the recommendations I found. Below are the steps I followed and the error I get:

conda activate qiime2-2020.8

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path manifest \
  --output-path paired-end-demux.qza \
  --input-format PairedEndFastqManifestPhred33V2

I successfully imported my data into a qza with my manifest file:
manifest.txt (4.3 KB)

qiime demux summarize \
  --i-data paired-end-demux.qza \
  --o-visualization demux.qzv
demux.qzv (312.7 KB)

I ran DADA2:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs paired-end-demux.qza \
  --p-trim-left-f 10 \
  --p-trim-left-r 10 \
  --p-trunc-len-f 300 \
  --p-trunc-len-r 300 \
  --o-representative-sequences rep-seqs-dada2.qza \
  --o-table pet-table.qza \
  --p-n-threads 1 \
  --o-denoising-stats denoising-stats.qza

I obtained:

Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_paired.R /tmp/tmpgoh5nhbn/forward /tmp/tmpgoh5nhbn/reverse /tmp/tmpgoh5nhbn/output.tsv.biom /tmp/tmpgoh5nhbn/track.tsv /tmp/tmpgoh5nhbn/filt_f /tmp/tmpgoh5nhbn/filt_r 300 300 10 10 2.0 2.0 2 independent consensus 1.0 1 1000000

R version 3.5.1 (2018-07-02)
Loading required package: Rcpp
DADA2: 1.10.0 / Rcpp: / RcppParallel: 5.0.0

  1. Filtering Error in (function (fn, fout, maxN = c(0, 0), truncQ = c(2, 2), truncLen = c(0, :
    Mismatched forward and reverse sequence files: 100000, 56391.
    Execution halted
    Traceback (most recent call last):
    File "/home/noe/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/q2_dada2/", line 264, in denoise_paired
    File "/home/noe/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/q2_dada2/", line 36, in run_commands, check=True)
    File "/home/noe/miniconda3/envs/qiime2-2020.8/lib/python3.6/", line 438, in run
    output=stdout, stderr=stderr)
    subprocess.CalledProcessError: Command '['run_dada_paired.R', '/tmp/tmpgoh5nhbn/forward', '/tmp/tmpgoh5nhbn/reverse', '/tmp/tmpgoh5nhbn/output.tsv.biom', '/tmp/tmpgoh5nhbn/track.tsv', '/tmp/tmpgoh5nhbn/filt_f', '/tmp/tmpgoh5nhbn/filt_r', '300', '300', '10', '10', '2.0', '2.0', '2', 'independent', 'consensus', '1.0', '1', '1000000']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/noe/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/q2cli/", line 329, in call
results = action(**arguments)
File "", line 2, in denoise_paired
File "/home/noe/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/qiime2/sdk/", line 245, in bound_callable
output_types, provenance)
File "/home/noe/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/qiime2/sdk/", line 390, in callable_executor
output_views = self._callable(**view_args)
File "/home/noe/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/q2_dada2/", line 279, in denoise_paired
" and stderr to learn more." % e.returncode)
Exception: An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

Seeing the mismatch error I checked the forum and took the following actions:

  1. Validated my .qza

qiime tools validate paired-end-demux.qza

paired-end-demux.qza appears to be valid at level=max.

  2. Checked all of my paths - they were fine.

  3. Checked that the forward and reverse fastq files of every sample have the same number of lines, with the command:

for f in *.fastq; do r=$(wc -l < "$f" | tr -d '[:space:]'); echo "$r $f"; done
4005172 concatenated_ET1_20C_48_Copro_1_1.fastq
4005172 concatenated_ET1_20C_48_Copro_1_2.fastq
4559244 concatenated_ET1_4C_120_copro_1_1.fastq
4559244 concatenated_ET1_4C_120_copro_1_2.fastq
4106648 concatenated_ET1_4C_120_Gut_1_1.fastq
4106648 concatenated_ET1_4C_120_Gut_1_2.fastq
4446828 concatenated_ET1_4C_48_Copro_1_1.fastq
4446828 concatenated_ET1_4C_48_Copro_1_2.fastq
3769360 concatenated_ET2_20C_48_Copro_1_1.fastq
3769360 concatenated_ET2_20C_48_Copro_1_2.fastq
4257512 concatenated_ET2_RT_120_Gut_1_1.fastq
4257512 concatenated_ET2_RT_120_Gut_1_2.fastq
3814544 concatenated_ET3_4C_48_Copro_1_1.fastq
3814544 concatenated_ET3_4C_48_Copro_1_2.fastq
4149904 concatenated_ET3_4C_48_Gut_1_1.fastq
4149904 concatenated_ET3_4C_48_Gut_1_2.fastq
3725520 concatenated_ET3_RT_48_Gut_1_1.fastq
3725520 concatenated_ET3_RT_48_Gut_1_2.fastq
293860 NG-25787_V3V4a_ET1_20C_48_Gut_lib417413_6977_1_1.fastq
293860 NG-25787_V3V4a_ET1_20C_48_Gut_lib417413_6977_1_2.fastq
252796 NG-25787_V3V4a_ET1_4C_48_Gut_lib417411_6977_1_1.fastq
252796 NG-25787_V3V4a_ET1_4C_48_Gut_lib417411_6977_1_2.fastq
291536 NG-25787_V3V4a_ET1_RT_120_copro_lib417416_6977_1_1.fastq
291536 NG-25787_V3V4a_ET1_RT_120_copro_lib417416_6977_1_2.fastq
261080 NG-25787_V3V4a_ET1_RT_120_Gut_lib417415_6977_1_1.fastq
261080 NG-25787_V3V4a_ET1_RT_120_Gut_lib417415_6977_1_2.fastq
225564 NG-25787_V3V4a_ET1_RT_48_copro_lib417410_6977_1_2.fastq
3842916 NG-25787_V3V4a_ET1_RT_48_copro_lib417410_6999_1_1.fastq
278916 NG-25787_V3V4a_ET1_RT_48_Gut_lib417409_6977_1_1.fastq
278916 NG-25787_V3V4a_ET1_RT_48_Gut_lib417409_6977_1_2.fastq
269384 NG-25787_V3V4a_ET2_20C_48_Gut_lib417423_6977_1_1.fastq
269384 NG-25787_V3V4a_ET2_20C_48_Gut_lib417423_6977_1_2.fastq
345760 NG-25787_V3V4a_ET2_4C_120_copro_lib417428_6977_1_1.fastq
345760 NG-25787_V3V4a_ET2_4C_120_copro_lib417428_6977_1_2.fastq
280564 NG-25787_V3V4a_ET2_4C_120_Gut_lib417427_6977_1_1.fastq
280564 NG-25787_V3V4a_ET2_4C_120_Gut_lib417427_6977_1_2.fastq
275408 NG-25787_V3V4a_ET2_4C_48_Copro_lib417422_6977_1_1.fastq
275408 NG-25787_V3V4a_ET2_4C_48_Copro_lib417422_6977_1_2.fastq
372664 NG-25787_V3V4a_ET2_4C_48_Gut_lib417421_6977_1_1.fastq
372664 NG-25787_V3V4a_ET2_4C_48_Gut_lib417421_6977_1_2.fastq
337712 NG-25787_V3V4a_ET2_RT_120_copro_lib417426_6977_1_1.fastq
337712 NG-25787_V3V4a_ET2_RT_120_copro_lib417426_6977_1_2.fastq
230600 NG-25787_V3V4a_ET2_RT_120_Gut_lib417425_6977_1_1.fastq
230600 NG-25787_V3V4a_ET2_RT_120_Gut_lib417425_6977_1_2.fastq
4026912 NG-25787_V3V4a_ET2_RT_120_Gut_lib417425_6999_1_1.fastq
4026912 NG-25787_V3V4a_ET2_RT_120_Gut_lib417425_6999_1_2.fastq
306068 NG-25787_V3V4a_ET2_RT_48_copro_lib417420_6977_1_1.fastq
306068 NG-25787_V3V4a_ET2_RT_48_copro_lib417420_6977_1_2.fastq
266856 NG-25787_V3V4a_ET2_RT_48_Gut_lib417419_6977_1_1.fastq
266856 NG-25787_V3V4a_ET2_RT_48_Gut_lib417419_6977_1_2.fastq
277256 NG-25787_V3V4a_ET3_20C_48_Copro_lib417434_6977_1_1.fastq
277256 NG-25787_V3V4a_ET3_20C_48_Copro_lib417434_6977_1_2.fastq
263972 NG-25787_V3V4a_ET3_20C_48_Gut_lib417433_6977_1_1.fastq
263972 NG-25787_V3V4a_ET3_20C_48_Gut_lib417433_6977_1_2.fastq
263700 NG-25787_V3V4a_ET3_4C_120_copro_lib417438_6977_1_1.fastq
263700 NG-25787_V3V4a_ET3_4C_120_copro_lib417438_6977_1_2.fastq
270972 NG-25787_V3V4a_ET3_4C_120_Gut_lib417437_6977_1_1.fastq
270972 NG-25787_V3V4a_ET3_4C_120_Gut_lib417437_6977_1_2.fastq
252164 NG-25787_V3V4a_ET3_RT_120_copro_lib417436_6977_1_1.fastq
252164 NG-25787_V3V4a_ET3_RT_120_copro_lib417436_6977_1_2.fastq
283676 NG-25787_V3V4a_ET3_RT_120_Gut_lib417435_6977_1_1.fastq
283676 NG-25787_V3V4a_ET3_RT_120_Gut_lib417435_6977_1_2.fastq
280092 NG-25787_V3V4a_ET3_RT_48_copro_lib417430_6977_1_1.fastq
280092 NG-25787_V3V4a_ET3_RT_48_copro_lib417430_6977_1_2.fastq

All the paired samples had the same counts

  4. I have already renamed the files in the manifest file to avoid underscores, as suggested in another post.

  5. I should also mention that my first step was to concatenate the forward files and the reverse files for some samples, because some of the raw files I received were split.
    So I used the command:

cat sample1_forward_file_1.fastq sample1_forward_file_2.fastq > concatenated_sample1_forward_file.fastq

cat sample1_reverse_file_1.fastq sample1_reverse_file_2.fastq > concatenated_sample1_reverse_file.fastq

I tested with only one concatenated sample and DADA2 ran fine, so I ruled out the concatenation as the cause of the problem.
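
A concatenation like the one above can be checked right after it runs. The following is only a minimal sketch: the function name `concat_sample` and the `<part>_1.fastq` / `<part>_2.fastq` naming are assumptions made for illustration, and it assumes every read occupies exactly four lines. It appends the parts of one sample in the same order for both directions, then compares the read counts of the two outputs.

```bash
#!/usr/bin/env bash
# Sketch: concatenate the split parts of ONE sample and verify the result.
# Hypothetical naming: each part is <part>_1.fastq (forward) and
# <part>_2.fastq (reverse). Assumes 4 lines per FASTQ record.
concat_sample() {
    local out="$1"; shift
    local part n1 n2
    : > "${out}_1.fastq"    # start from empty output files
    : > "${out}_2.fastq"
    for part in "$@"; do
        # Same part order for forward and reverse keeps mates aligned.
        cat "${part}_1.fastq" >> "${out}_1.fastq" || return 1
        cat "${part}_2.fastq" >> "${out}_2.fastq" || return 1
    done
    n1=$(( $(wc -l < "${out}_1.fastq") / 4 ))
    n2=$(( $(wc -l < "${out}_2.fastq") / 4 ))
    echo "${out}: $n1 forward reads, $n2 reverse reads"
    [ "$n1" -eq "$n2" ]     # non-zero exit status if the counts differ
}
```

With the hypothetical names above, `concat_sample concatenated_sample1 sample1_partA sample1_partB` would write `concatenated_sample1_1.fastq` and `concatenated_sample1_2.fastq` and return a non-zero status if a part is missing or the two counts differ.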

Any help would be appreciated, thank you

Hi @LuciaGG!

Thanks for the detailed information. Let's start by taking a peek at the demux.qzv that you attached:

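Sample ET1-RT-48-copro doesn't have matching read counts. The output of the bash one-liner you shared tells a similar story: for that sample, NG-25787_V3V4a_ET1_RT_48_copro_lib417410_6977_1_2.fastq has 225564 lines (56391 reads, the second number in the DADA2 error), while NG-25787_V3V4a_ET1_RT_48_copro_lib417410_6999_1_1.fastq has 3842916 lines, and neither file is listed with a mate that shares its run ID.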

I would suggest double-checking that you transferred the files completely. As well, your concatenation step might have an issue. If that doesn't reveal anything, I suggest contacting your sequencing center.
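
To make a mismatch like this easier to spot than by reading a long listing, the check can be automated. This is a rough sketch, not part of QIIME 2 or DADA2: the function name `check_pairs` is made up, and it assumes mates are named `<prefix>_1.fastq` and `<prefix>_2.fastq` and that every read occupies exactly four lines. It prints only the pairs that are incomplete or whose read counts differ.

```bash
#!/usr/bin/env bash
# Sketch: report forward/reverse FASTQ pairs that are missing a mate
# or whose read counts differ. Assumes <prefix>_1.fastq / <prefix>_2.fastq
# naming and 4 lines per record.
check_pairs() {
    local dir="${1:-.}" fwd rev n_fwd n_rev status=0
    for fwd in "$dir"/*_1.fastq; do
        [ -e "$fwd" ] || continue          # glob matched nothing
        rev="${fwd%_1.fastq}_2.fastq"
        if [ ! -e "$rev" ]; then
            echo "MISSING  $rev"
            status=1
            continue
        fi
        n_fwd=$(( $(wc -l < "$fwd") / 4 ))
        n_rev=$(( $(wc -l < "$rev") / 4 ))
        if [ "$n_fwd" -ne "$n_rev" ]; then
            echo "MISMATCH $fwd ($n_fwd reads) vs $rev ($n_rev reads)"
            status=1
        fi
    done
    # Second pass: reverse files that have no forward mate at all.
    for rev in "$dir"/*_2.fastq; do
        [ -e "$rev" ] || continue
        fwd="${rev%_2.fastq}_1.fastq"
        if [ ! -e "$fwd" ]; then
            echo "MISSING  $fwd"
            status=1
        fi
    done
    return "$status"
}
```

Running `check_pairs .` in the directory with the fastq files prints nothing and returns 0 when every pair is complete and consistent. Any `MISSING` or `MISMATCH` line points at a sample to re-transfer or re-concatenate before importing.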


@thermokarst thank you! I ran DADA2 successfully; my mistake was not concatenating the sample ET1-RT-48-copro properly.

