DADA2 Error (Return Code 1) - Mismatched Sequences

Hello,

I get the following error when I try to run DADA2.

(qiime2-2017.11) Ginas-MacBook-Pro-2:BigBend ginacerbie$ qiime dada2 denoise-paired \
  --i-demultiplexed-seqs paired-end-demux3.qza \
  --p-trim-left-f 13 \
  --p-trim-left-r 13 \
  --p-trunc-len-f 150 \
  --p-trunc-len-r 150 \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza \
  --verbose
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_paired.R /var/folders/1z/j61p4k6d079f7rnvw63s0mnm0000gn/T/tmpr1fjj6dz/forward /var/folders/1z/j61p4k6d079f7rnvw63s0mnm0000gn/T/tmpr1fjj6dz/reverse /var/folders/1z/j61p4k6d079f7rnvw63s0mnm0000gn/T/tmpr1fjj6dz/output.tsv.biom /var/folders/1z/j61p4k6d079f7rnvw63s0mnm0000gn/T/tmpr1fjj6dz/filt_f /var/folders/1z/j61p4k6d079f7rnvw63s0mnm0000gn/T/tmpr1fjj6dz/filt_r 150 150 13 13 2.0 2 consensus 1.0 1 1000000

R version 3.3.2 (2016-10-31)
Loading required package: Rcpp
There were 50 or more warnings (use warnings() to see the first 50)
DADA2 R package version: 1.4.0

  1. Filtering ....................................................Error in fastqPairedFilter(c(unfiltsF[[i]], unfiltsR[[i]]), c(filteredFastqF, :
    Mismatched forward and reverse sequence files: 78418, 78417.
    Execution halted
    Traceback (most recent call last):
    File "/Users/ginacerbie/miniconda3/envs/qiime2-2017.11/lib/python3.5/site-packages/q2_dada2/_denoise.py", line 179, in denoise_paired
    run_commands([cmd])
    File "/Users/ginacerbie/miniconda3/envs/qiime2-2017.11/lib/python3.5/site-packages/q2_dada2/_denoise.py", line 35, in run_commands
    subprocess.run(cmd, check=True)
    File "/Users/ginacerbie/miniconda3/envs/qiime2-2017.11/lib/python3.5/subprocess.py", line 398, in run
    output=stdout, stderr=stderr)
    subprocess.CalledProcessError: Command '['run_dada_paired.R', '/var/folders/1z/j61p4k6d079f7rnvw63s0mnm0000gn/T/tmpr1fjj6dz/forward', '/var/folders/1z/j61p4k6d079f7rnvw63s0mnm0000gn/T/tmpr1fjj6dz/reverse', '/var/folders/1z/j61p4k6d079f7rnvw63s0mnm0000gn/T/tmpr1fjj6dz/output.tsv.biom', '/var/folders/1z/j61p4k6d079f7rnvw63s0mnm0000gn/T/tmpr1fjj6dz/filt_f', '/var/folders/1z/j61p4k6d079f7rnvw63s0mnm0000gn/T/tmpr1fjj6dz/filt_r', '150', '150', '13', '13', '2.0', '2', 'consensus', '1.0', '1', '1000000']' returned non-zero exit status 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Users/ginacerbie/miniconda3/envs/qiime2-2017.11/lib/python3.5/site-packages/q2cli/commands.py", line 218, in __call__
results = action(**arguments)
File "", line 2, in denoise_paired
File "/Users/ginacerbie/miniconda3/envs/qiime2-2017.11/lib/python3.5/site-packages/qiime2/sdk/action.py", line 220, in bound_callable
output_types, provenance)
File "/Users/ginacerbie/miniconda3/envs/qiime2-2017.11/lib/python3.5/site-packages/qiime2/sdk/action.py", line 355, in callable_executor
output_views = self._callable(**view_args)
File "/Users/ginacerbie/miniconda3/envs/qiime2-2017.11/lib/python3.5/site-packages/q2_dada2/_denoise.py", line 194, in denoise_paired
" and stderr to learn more." % e.returncode)
Exception: An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

Plugin error from dada2:

An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

See above for debug info.


I thought the following "Mismatched forward and reverse sequence files: 78418, 78417" was odd. In the other forum posts regarding this topic, I noticed that those numbers were more than one off. Any ideas on what might be causing this mismatch?

My manifest file is attached below but I was not able to find any naming errors.
AmphipodManifest.csv (9.8 KB)

In case it helps, I used the following to import the data:
(qiime2-2017.11) Ginas-MacBook-Pro-2:BigBend ginacerbie$ qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path AmphipodManifest.csv \
  --output-path paired-end-demux3.qza \
  --source-format PairedEndFastqManifestPhred33

Thank you so much!
Gina

This error means that one of your reverse-read files has one fewer read than the matching forward-read file. That isn't particularly odd: it can happen pretty easily, for example when a line is accidentally removed, usually at the very beginning or very end of a file.

Can you run the following, and provide the output from this command?

$ qiime tools validate paired-end-demux3.qza

This will perform thorough validation of the artifact, and may take a bit of time to run. Thanks! :t_rex:

Please find the below screenshot of the above command and the output.

[Screenshot: output of the `qiime tools validate` command]

Thank you - I appreciate your time!
Gina

Bummer, I was hoping something would turn up there. How about you run the following and send back the output?

cd /Users/ginacerbie/BigBend/Read_1/
for f in *.fastq; do r1=$(wc -l < "$f" | tr -d '[:space:]'); r2=$(wc -l < "../Read_2/$f" | tr -d '[:space:]'); echo "$r1 $r2 $f"; done

This will give us the line count for each FASTQ file in your dataset, so we can start comparing R1 against R2. I expect to see at least one sample whose two files don't have identical line counts (the number in the second column is different from the number in the first column).

The above assumes you have perfectly paired files in the two directories. If you don’t, you can run the following, but it will require a bit of manual cleanup and parsing (in Excel for example) to make sense of it.

echo "Read_1 files"
cd /Users/ginacerbie/BigBend/Read_1/
for f in *.fastq; do wc -l $f; done
echo "Read_2 files"
cd /Users/ginacerbie/BigBend/Read_2/
for f in *.fastq; do wc -l $f; done
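
If the manual cleanup gets tedious, the comparison could be scripted along these lines. This is only a rough sketch that has not been run on these data: `report_count_mismatches` is a made-up helper name, and it assumes uncompressed FASTQ files with exactly four lines per record and the same filename for both members of a pair.

```shell
# Hypothetical helper (not part of QIIME 2 or DADA2): list samples whose
# forward and reverse FASTQ files hold different numbers of records.
# Assumes four lines per record and identical filenames in both directories.
count_records() {
    # awk counts a final line even when the trailing newline is missing
    awk 'END { printf "%d\n", NR / 4 }' "$1"
}

report_count_mismatches() {
    fwd_dir=$1
    rev_dir=$2
    for fwd in "$fwd_dir"/*.fastq; do
        [ -e "$fwd" ] || continue            # the glob matched nothing
        name=$(basename "$fwd")
        rev="$rev_dir/$name"
        if [ ! -f "$rev" ]; then
            echo "$name: no matching file in $rev_dir"
            continue
        fi
        n_fwd=$(count_records "$fwd")
        n_rev=$(count_records "$rev")
        if [ "$n_fwd" -ne "$n_rev" ]; then
            echo "$name: forward=$n_fwd reverse=$n_rev"
        fi
    done
}
```

Calling `report_count_mismatches /Users/ginacerbie/BigBend/Read_1 /Users/ginacerbie/BigBend/Read_2` would then print one line per sample whose counts differ or whose reverse file is missing, and nothing for samples that match.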

Thanks! :t_rex:


My files must not have been perfectly paired in the two directories because when I ran the first command I received the following for all my samples:

[Screenshot: output of the first command]

However, it looks like the second command worked! Please see the following:
Read1Read2Compare.csv (4.4 KB)

Rows 54, 55, 69, 71, 80 and 85 contain samples that do not have identical read 1 and read 2 counts. Do you think I will have to drop those samples?

Thanks!
Gina

Hi @GinaC!

Dropping those samples is one option, but let's see if we can get these cleaned up instead! Looking at the attachment, each of those samples is missing only one read in the reverse direction, with the exception of H_DY_4, which is missing two records.

I just whipped up a quick one-liner that should help us identify which record is missing in your reverse files. One option once you have done that is to open up the forward read in a text editor and remove the offending record. We can help you with that if you want.

cd /Users/ginacerbie/BigBend/Read_1/
for f in *.fastq; do echo $f; diff <(awk 'NR==1||(NR-1)%4==0' $f) <(awk 'NR==1||(NR-1)%4==0' ../Read_2/$f); done

Can you run that and then copy-and-paste the results back here? You will see some errors about those unmatched paired files, like what you reported above, but that shouldn’t be a problem (famous last words).

Sample output should look something like this :crossed_fingers: :

L1S105_9_L001_R1_001.fastq
11339a11340
> @HWI-EAS440_0386:6:111:5426:19682#0/1
L1S140_6_L001_R1_001.fastq

That is a little hard to parse, but it is a diff between the read IDs pulled from the forward and reverse files. It should help us detect minor differences between the two files, if everything works according to plan!

Thanks! :t_rex:

Hmm… I think there might be something wrong with my output since it looks a bit different from what you posted. I copied just a small part of the output below, but if that is right and you want me to post the rest, please let me know! I noticed there were a few lines with bc_diffs=1. Is that what I should be looking for?

Ginas-MacBook-Pro-2:~ ginacerbie$ cd /Users/ginacerbie/BigBend/Read_1/
Ginas-MacBook-Pro-2:Read_1 ginacerbie$ for f in *.fastq; do echo $f; diff <(awk 'NR==1||(NR-1)%4==0' $f) <(awk 'NR==1||(NR-1)%4==0' ../Read_2/$f); done
G_BLBC_1.fastq
1,14958c1,14958
< @G_BLBC_1_318 M01942:4:000000000-ATE0P:1:1101:14107:2261 1:N:0:0 orig_bc=TACGACGACCAC new_bc=TACGATGACCAC bc_diffs=1
< @G_BLBC_1_362 M01942:4:000000000-ATE0P:1:1101:14111:2278 1:N:0:0 orig_bc=TACGATGACCAC new_bc=TACGATGACCAC bc_diffs=0
< @G_BLBC_1_2649 M01942:4:000000000-ATE0P:1:1101:10031:3084 1:N:0:0 orig_bc=TACGATGACCAC new_bc=TACGATGACCAC bc_diffs=0
< @G_BLBC_1_2729 M01942:4:000000000-ATE0P:1:1101:10028:3109 1:N:0:0 orig_bc=TACGATGACCAC new_bc=TACGATGACCAC bc_diffs=0
< @G_BLBC_1_5488 M01942:4:000000000-ATE0P:1:1101:8632:3856 1:N:0:0 orig_bc=TACGATGACCAC new_bc=TACGATGACCAC bc_diffs=0
< @G_BLBC_1_5575 M01942:4:000000000-ATE0P:1:1101:8631:3877 1:N:0:0 orig_bc=TACGATGACCAC new_bc=TACGATGACCAC bc_diffs=0
< @G_BLBC_1_5894 M01942:4:000000000-ATE0P:1:1101:12630:3951 1:N:0:0 orig_bc=TACGATGACCAC new_bc=TACGATGACCAC bc_diffs=0
< @G_BLBC_1_5976 M01942:4:000000000-ATE0P:1:1101:12632:3970 1:N:0:0 orig_bc=TACGATGACCAC new_bc=TACGATGACCAC bc_diffs=0
< @G_BLBC_1_6342 M01942:4:000000000-ATE0P:1:1101:17925:4049 1:N:0:0 orig_bc=TACGATGACCAC new_bc=TACGATGACCAC bc_diffs=0
< @G_BLBC_1_6387 M01942:4:000000000-ATE0P:1:1101:17944:4059 1:N:0:0 orig_bc=TACGATGACCAC new_bc=TACGATGACCAC bc_diffs=0
< @G_BLBC_1_6554 M01942:4:000000000-ATE0P:1:1101:13358:4097 1:N:0:0 orig_bc=TACGATGACCAC new_bc=TACGATGACCAC bc_diffs=0

Thanks!
Gina

Ah bummer, the read id format is a bit different compared to what I was using when I tested earlier. Will come up with a plan B - stay tuned! :tv:


Hey @GinaC, let’s try that again, but with this modified diff command:

cd /Users/ginacerbie/BigBend/Read_1/
for f in *.fastq; do echo $f; diff <(awk 'NR==1||(NR-1)%4==0{print $1}' $f) <(awk 'NR==1||(NR-1)%4==0{print $1}' ../Read_2/$f); done
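
Since diff prints every differing ID, the output can get long. A shorter report that stops at the first divergence could look roughly like the sketch below. `first_id_mismatch` is a hypothetical helper name, and the sketch assumes four lines per FASTQ record and that the read ID is the first whitespace-delimited field of each header line (the same assumption `{print $1}` makes above).

```shell
# Hypothetical helper: report the first record whose read ID differs between
# a forward ($1) and a reverse ($2) FASTQ file. Prints nothing when every ID
# matches and both files hold the same number of records.
first_id_mismatch() {
    awk -v rev="$2" '
        FNR % 4 == 1 {
            record = (FNR + 3) / 4
            # Read the header line of the corresponding reverse record.
            if ((getline rev_header < rev) <= 0) {
                printf "record %d: reverse file ended early\n", record
                found = 1
                exit
            }
            # Skip the sequence, separator, and quality lines of that record.
            for (i = 0; i < 3; i++) getline skipped < rev
            split(rev_header, rev_fields, " ")
            if ($1 != rev_fields[1]) {
                printf "record %d: %s vs %s\n", record, $1, rev_fields[1]
                found = 1
                exit
            }
        }
        END {
            # Forward file is exhausted; any remaining reverse header is extra.
            if (!found && (getline rev_header < rev) > 0)
                printf "record %d: forward file ended early\n", record + 1
        }
    ' "$1"
}
```

Inside the Read_1 directory this could be run per file as `first_id_mismatch "$f" "../Read_2/$f"`. The record number times four, minus three, gives the line number of that header in the file.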

:crossed_fingers: :crossed_fingers: :crossed_fingers: :crossed_fingers: :crossed_fingers:


Hello! Thanks for the new command. Please see a portion of the output below. I have a feeling this is not what we were hoping for…

(qiime2-2017.11) Ginas-MacBook-Pro-2:BigBend ginacerbie$ cd /Users/ginacerbie/BigBend/Read_1/
(qiime2-2017.11) Ginas-MacBook-Pro-2:Read_1 ginacerbie$ for f in *.fastq; do echo $f; diff <(awk 'NR==1||(NR-1)%4==0{print $1}' $f) <(awk 'NR==1||(NR-1)%4==0{print $1}' ../Read_2/$f); done
G_BLBC_1.fastq
8185,8231c8185,8232
< @G_BLBC_1_2843354
< @G_BLBC_1_2843469
< @G_BLBC_1_2843711
< @G_BLBC_1_2843794
< @G_BLBC_1_2843805
< @G_BLBC_1_2844540
< @G_BLBC_1_2845155
< @G_BLBC_1_2845266
< @G_BLBC_1_2845861
< @G_BLBC_1_2845873

Any other ideas? Thanks!

Thanks for sharing your data with me in a DM, @GinaC! Upon closer inspection, it doesn't look like the pairs are simply off by one read. There are actually some issues with the read identifiers, too.

For example, here is the first part of the diff for file pair G_SS_1.fastq:

42783,42815c42783,42815
< @G_SS_1_2840219
< @G_SS_1_2840222
< @G_SS_1_2840289
< @G_SS_1_2840332
< @G_SS_1_2840337
< @G_SS_1_2840432
< @G_SS_1_2840539
< @G_SS_1_2840870
< @G_SS_1_2840874
< @G_SS_1_2840968
< @G_SS_1_2841062
< @G_SS_1_2841074
< @G_SS_1_2841468
< @G_SS_1_2841497
< @G_SS_1_2841618
< @G_SS_1_2841626
< @G_SS_1_2841722
< @G_SS_1_2841741
< @G_SS_1_2841764
< @G_SS_1_2841806
< @G_SS_1_2841809
< @G_SS_1_2841845
< @G_SS_1_2842149
< @G_SS_1_2842223
< @G_SS_1_2842313
< @G_SS_1_2842329
< @G_SS_1_2842348
< @G_SS_1_2842356
< @G_SS_1_2842387
< @G_SS_1_2842407
< @G_SS_1_2842431
< @G_SS_1_2842447
< @G_SS_1_2842459
---
> @G_SS_1_2840218
> @G_SS_1_2840221
> @G_SS_1_2840288
> @G_SS_1_2840331
> @G_SS_1_2840336
> @G_SS_1_2840431
> @G_SS_1_2840538
> @G_SS_1_2840869
> @G_SS_1_2840873
> @G_SS_1_2840967
> @G_SS_1_2841061
> @G_SS_1_2841073
> @G_SS_1_2841467
> @G_SS_1_2841496
> @G_SS_1_2841617
> @G_SS_1_2841625
> @G_SS_1_2841721
> @G_SS_1_2841740
> @G_SS_1_2841763
> @G_SS_1_2841805
> @G_SS_1_2841808
> @G_SS_1_2841844
> @G_SS_1_2842144
> @G_SS_1_2842218
> @G_SS_1_2842308
> @G_SS_1_2842324
> @G_SS_1_2842343
> @G_SS_1_2842351
> @G_SS_1_2842382
> @G_SS_1_2842402
> @G_SS_1_2842426
> @G_SS_1_2842442
> @G_SS_1_2842462
...

Basically, there are big chunks where the read identifiers all get offset by one value in the incrementing counter. Check out this screenshot:

[Screenshot: forward and reverse read files side by side]

The forward read is on the left side, the reverse read is on the right side. Up to line 171129 (the highlighted line), the read identifiers are matched (the same in the fwd and rev reads), but starting on line 171129 they are all off by one: @G_SS_1_2840219 vs @G_SS_1_2840218, for example.
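
To see where the offset starts and whether it changes further down a file, something like the following sketch could help. It is an illustration rather than a tested tool: `id_offset_changes` is a hypothetical helper name, and it assumes four lines per record and read IDs that end in an underscore followed by an integer counter, as in @G_SS_1_2840219. Records are compared by position, so any extra records at the end of the longer file are ignored.

```shell
# Hypothetical helper: print each record at which the difference between the
# forward ($1) and reverse ($2) read-ID counters changes. The first record is
# always printed so the starting offset is visible.
id_offset_changes() {
    awk -v rev="$2" '
        # Return the integer that follows the last underscore of a read ID.
        function counter(id,    parts, n) {
            n = split(id, parts, "_")
            return parts[n] + 0
        }
        FNR % 4 == 1 {
            # Stop when the reverse file runs out of records.
            if ((getline rev_header < rev) <= 0) exit
            # Skip the sequence, separator, and quality lines of that record.
            for (i = 0; i < 3; i++) getline skipped < rev
            split(rev_header, rev_fields, " ")
            offset = counter($1) - counter(rev_fields[1])
            if (FNR == 1 || offset != previous)
                printf "record %d: offset %d (%s vs %s)\n", (FNR + 3) / 4, offset, $1, rev_fields[1]
            previous = offset
        }
    ' "$1"
}
```

A file pair whose identifiers agree throughout would produce a single line with offset 0. Each additional line marks a record where the forward and reverse counters drift apart by a different amount.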

I don't know what to do for these data. Maybe you could try to demultiplex them again from the source data? I would be curious to know if anyone else has any thoughts about how these read identifiers got to be this way in the first place. Maybe check in with your sequencing center to see if they have any thoughts. Please keep us posted!


Thank you so much for the update! I will try to demultiplex them again and see if that helps. If not, I’ll see what the sequencing center thinks. I’ll let you know what we conclude.

