DADA2: Mismatched forward and reverse sequence files (different read counts for every sample)

Dear All,

I am a new user of QIIME 2. Before importing my FASTQ files, I trimmed barcodes and primers and demultiplexed my data into R1 and R2 files per sample (using the 'choosetag' function in Galaxy).
After the import, demux-summary-3.qzv was generated.

Then I started to denoise with dada2. Here is my code:

qiime dada2 denoise-paired \
  --p-trunc-len-f 223 \
  --p-trunc-len-r 223 \
  --i-demultiplexed-seqs half-paired-end-demux-3.qza \
  --o-representative-sequences rep-seqs-3.qza \
  --o-table table-3.qza \
  --o-denoising-stats stats-3.qza

However, it failed with the following error:

Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_paired.R /var/folders/4j/c8lp_8yd7474wrs0ft7cy98c0000gq/T/tmpgnx5uiop/forward /var/folders/4j/c8lp_8yd7474wrs0ft7cy98c0000gq/T/tmpgnx5uiop/reverse /var/folders/4j/c8lp_8yd7474wrs0ft7cy98c0000gq/T/tmpgnx5uiop/output.tsv.biom /var/folders/4j/c8lp_8yd7474wrs0ft7cy98c0000gq/T/tmpgnx5uiop/track.tsv /var/folders/4j/c8lp_8yd7474wrs0ft7cy98c0000gq/T/tmpgnx5uiop/filt_f /var/folders/4j/c8lp_8yd7474wrs0ft7cy98c0000gq/T/tmpgnx5uiop/filt_r 223 223 0 0 2.0 2 consensus 1.0 1 1000000

R version 3.4.1 (2017-06-30)
Loading required package: Rcpp
DADA2 R package version: 1.6.0

  1. Filtering Error in (function (fn, fout, maxN = c(0, 0), truncQ = c(2, 2), truncLen = c(0, :
    Mismatched forward and reverse sequence files: 76127, 73351.
    Execution halted
    Traceback (most recent call last):
    File "/Users/ziyanqin/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 231, in denoise_paired
    run_commands([cmd])
    File "/Users/ziyanqin/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 36, in run_commands
    subprocess.run(cmd, check=True)
    File "/Users/ziyanqin/miniconda3/envs/qiime2-2019.1/lib/python3.6/subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
    subprocess.CalledProcessError: Command '['run_dada_paired.R', '/var/folders/4j/c8lp_8yd7474wrs0ft7cy98c0000gq/T/tmpgnx5uiop/forward', '/var/folders/4j/c8lp_8yd7474wrs0ft7cy98c0000gq/T/tmpgnx5uiop/reverse', '/var/folders/4j/c8lp_8yd7474wrs0ft7cy98c0000gq/T/tmpgnx5uiop/output.tsv.biom', '/var/folders/4j/c8lp_8yd7474wrs0ft7cy98c0000gq/T/tmpgnx5uiop/track.tsv', '/var/folders/4j/c8lp_8yd7474wrs0ft7cy98c0000gq/T/tmpgnx5uiop/filt_f', '/var/folders/4j/c8lp_8yd7474wrs0ft7cy98c0000gq/T/tmpgnx5uiop/filt_r', '223', '223', '0', '0', '2.0', '2', 'consensus', '1.0', '1', '1000000']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Users/ziyanqin/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/q2cli/commands.py", line 274, in call
results = action(**arguments)
File "</Users/ziyanqin/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/decorator.py:decorator-gen-442>", line 2, in denoise_paired
File "/Users/ziyanqin/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
output_types, provenance)
File "/Users/ziyanqin/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/sdk/action.py", line 365, in callable_executor
output_views = self._callable(**view_args)
File "/Users/ziyanqin/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 246, in denoise_paired
" and stderr to learn more." % e.returncode)
Exception: An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

I noticed "Mismatched forward and reverse sequence files: 76127, 73351." I then searched the forum and found the following code for counting reads:

qiime tools export --output-path import-check --input-path half-paired-end-demux-3.qza
cd import-check
for f in *.fastq; do r=$(( $(wc -l < "$f" | tr -d '[:space:]') / 4 )); echo "$r $f"; done

The results were odd:

76127 normal15a_0_L001_R1_001.fastq
73351 normal15a_8_L001_R2_001.fastq
59156 normal15b_1_L001_R1_001.fastq
56461 normal15b_9_L001_R2_001.fastq
60392 normal15c_10_L001_R2_001.fastq
63544 normal15c_2_L001_R1_001.fastq
53972 normal15d_11_L001_R2_001.fastq
58295 normal15d_3_L001_R1_001.fastq
69953 normal16a_12_L001_R2_001.fastq
71056 normal16a_4_L001_R1_001.fastq
81704 normal16b_13_L001_R2_001.fastq
83045 normal16b_5_L001_R1_001.fastq
88786 normal16c_14_L001_R2_001.fastq
90377 normal16c_6_L001_R1_001.fastq
60770 normal16d_15_L001_R2_001.fastq
61871 normal16d_7_L001_R1_001.fastq

The R1 and R2 read counts differ for every sample!
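Lining up the forward and reverse counts per sample makes the mismatch easier to read than a per-file listing. The following is a minimal sketch, not something posted in the thread: the function names `count_reads` and `compare_pairs` are made up for illustration, and it assumes uncompressed four-line FASTQ records and the file naming shown in the listing above, with the sample ID taken to be everything before the first underscore.

```shell
# Hypothetical helpers; the names are illustrative and not part of QIIME 2.

# Number of records in an uncompressed FASTQ file (4 lines per record).
count_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# For every R1 file in a directory, print one tab-separated line:
#   <sample>  <R1 count>  <R2 count>  <ok|MISMATCH>
compare_pairs() {
    dir="$1"
    for fwd in "$dir"/*_R1_001.fastq; do
        [ -e "$fwd" ] || continue            # directory has no R1 files
        sample=$(basename "$fwd")
        sample=${sample%%_*}                 # text before the first "_"
        # The numeric field after the sample ID differs between R1 and R2,
        # so locate the reverse file with a glob and take the first match.
        rev=""
        for candidate in "$dir/$sample"_*_R2_001.fastq; do
            [ -e "$candidate" ] && rev="$candidate"
            break
        done
        if [ -z "$rev" ]; then
            printf '%s\t%s\t%s\t%s\n' "$sample" "$(count_reads "$fwd")" "NA" "NO_R2_FILE"
            continue
        fi
        n_fwd=$(count_reads "$fwd")
        n_rev=$(count_reads "$rev")
        if [ "$n_fwd" -eq "$n_rev" ]; then status=ok; else status=MISMATCH; fi
        printf '%s\t%s\t%s\t%s\n' "$sample" "$n_fwd" "$n_rev" "$status"
    done
}
```

Calling `compare_pairs import-check` on the exported directory would print one line per sample; any line ending in `MISMATCH` marks a pair that DADA2 would be expected to reject with the error shown earlier.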

Could anyone help me with that?

Hi @ziyan,

Thank you for the background information — trimming with choosetag is clearly the issue here (perhaps the function does not control for paired reads?).

Note that QIIME 2 can perform that same trimming for you, using the plugin q2-cutadapt. I recommend importing the raw data and going that route to avoid this issue.

Good luck!

Hi Nicholas!

Thank you so much for the quick answer!

I looked at the documentation for the q2-cutadapt plugin and was disappointed to find that it is not suitable for my data: the barcodes had already been removed when I received the data from my collaborator. The data still contains primers and, importantly, the FASTQ files are NOT demultiplexed! :disappointed_relieved:

I searched the forum and found one topic related to my situation.

According to the answer there, there is no way to import multiplexed FASTQ files without barcodes, so I demultiplexed the data using the 'choosetag' function to generate demultiplexed FASTQ files.

Given this situation, do you have any advice? :persevere:
Thanks a lot!!! :relaxed:

Hi @ziyan,
You are right: if your sequences are multiplexed but no longer have barcodes, QIIME 2 cannot demultiplex these reads… unless the barcodes are still in the sequences, in which case q2-cutadapt can demultiplex them.

I recommend finding a different program to demultiplex, checking whether choosetag has a parameter to either handle paired-end read data or retain reads that do not match, or finding another way to drop sequences in the forward or reverse reads that no longer have their paired read (because choosetag dropped it).
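For the last option, dropping reads whose mate is gone, here is a minimal sketch of one way it could be done with plain awk. It is an illustration under stated assumptions, not the method eventually used in this thread: the function names `filter_by_ids` and `resync_pair` are hypothetical, the files are assumed to be uncompressed four-line FASTQ, the read ID is assumed to be the first whitespace-delimited token of the header (with an optional trailing `/1` or `/2` stripped), and the surviving reads are assumed to appear in the same relative order in both files.

```shell
# Hypothetical helpers; the names are illustrative and not part of QIIME 2.

# Print the records of the second FASTQ file whose read ID also occurs
# in the first FASTQ file.
# usage: filter_by_ids ids_from.fastq to_filter.fastq > filtered.fastq
filter_by_ids() {
    awk '
        function read_id(header,   id) {
            id = header
            sub(/[ \t].*$/, "", id)   # drop the description after the ID
            sub(/\/[12]$/, "", id)    # drop a trailing /1 or /2 mate suffix
            return id
        }
        # First file: remember the ID on each header line (lines 1, 5, 9, ...).
        NR == FNR { if (FNR % 4 == 1) seen[read_id($0)] = 1; next }
        # Second file: decide once per record, then print all four lines of it.
        FNR % 4 == 1 { keep = (read_id($0) in seen) }
        keep
    ' "$1" "$2"
}

# Write copies of a forward/reverse pair that contain only reads present in both.
# usage: resync_pair in_R1.fastq in_R2.fastq out_R1.fastq out_R2.fastq
resync_pair() {
    filter_by_ids "$2" "$1" > "$3"   # forward reads that still have a reverse mate
    filter_by_ids "$1" "$2" > "$4"   # reverse reads that still have a forward mate
}
```

The header line is identified by its position in the record rather than by a leading `@`, because a quality line can also start with `@`. After resyncing, both output files should report the same read count, which is the condition the DADA2 filtering step checks before it proceeds.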

Good luck!

Hi Nicholas!

Thanks for your reply!!!

I was able to find a way in the forum, and so far it has worked fine. I will keep you posted if anything comes up. Again, thanks!

And thank you Danny!
