Help processing mockrobiota mock-2 standards using Qiime2

jeffin604 · February 29, 2020, 12:09am

Hi,

I'm using qiime2-2019.10 to import, demultiplex and denoise the mock-2 dataset from mockrobiota:
https://github.com/caporaso-lab/mockrobiota/tree/master/data/mock-2

There's a note at the bottom of the page explaining how to demultiplex the reads in qiime1 (the barcodes file contains golay barcodes):

Note: These barcode reads contain golay barcodes, and the mapping barcodes need to be reverse-complemented to match the reads. Run in qiime-1 using the following command: split_libraries_fastq.py -i mock-forward-read.fastq.gz -o split_libraries -m sample-metadata.tsv -b mock-index-read.fastq.gz --rev_comp_mapping_barcodes

I've been trying to do the same work in Qiime2 instead, but I keep getting errors at the denoising step with Dada2. I suspect the problem is caused by something upstream of denoising. I'd appreciate any feedback you have.

Here are the steps I've taken:

qiime tools import \
--type EMPPairedEndSequences \
--input-path mock-2/fastq/run-1/ \
--output-path mock-2/qiime2/FEB-27-2020/mock-2-multiplexed-paired-end-sequences.qza

qiime demux emp-paired \
--m-barcodes-file mock-2/metadata/sample-metadata.tsv \
--m-barcodes-column BarcodeSequence \
--i-seqs mock-2/qiime2/FEB-27-2020/mock-2-multiplexed-paired-end-sequences.qza \
--o-per-sample-sequences mock-2/qiime2/FEB-27-2020/mock-2-demux.qza \
--o-error-correction-details mock-2/qiime2/FEB-27-2020/error-correction-details \
--p-rev-comp-mapping-barcodes
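For anyone unsure what reverse-complementing the mapping barcodes means in practice, here is a minimal shell sketch of the operation on a single barcode. The barcode below is a made-up 12-nt sequence for illustration only (it is not taken from sample-metadata.tsv), and `revcomp` is a hypothetical helper name, not part of QIIME.

```shell
# Reverse-complement a DNA barcode: reverse the string, then swap A<->T and C<->G.
# `revcomp` is a hypothetical helper for illustration, not a QIIME command.
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

# Made-up 12-nt barcode (assumption: not from the mock-2 metadata file).
barcode="AACGTTGGCCTA"
revcomp_barcode="$(revcomp "$barcode")"

echo "$barcode -> $revcomp_barcode"
```

If a barcode from the metadata file matches the index reads only after this transformation, that would suggest the reverse-complement flag is needed.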

These first steps appear to work so I proceeded to denoise (reads are 150 bp) using:

qiime dada2 denoise-paired \
--i-demultiplexed-seqs mock-2/qiime2/FEB-27-2020/run-1.qza \
--p-trim-left-f 6 \
--p-trim-left-r 6 \
--p-trunc-len-f 140 \
--p-trunc-len-r 120 \
--p-n-threads 0 \
--o-table mock-2/qiime2/FEB-27-2020/denoised-feature-table-run-1.qza \
--o-representative-sequences mock-2/qiime2/FEB-27-2020/denoised-feature-seqs-run-1.qza \
--o-denoising-stats mock-2/qiime2/FEB-27-2020/denoising-stats-run-1.qza

but I get this message from Dada2:

An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

The output in Dada2 error file contains:

R version 3.5.1 (2018-07-02)
Loading required package: Rcpp
DADA2: 1.10.0 / Rcpp: 1.0.2 / RcppParallel: 4.4.4

Filtering .

Learning Error Rates
402 total bases in 3 reads from 1 samples will be used for learning the error rates.
342 total bases in 3 reads from 1 samples will be used for learning the error rates.

Denoise remaining samples .

Remove chimeras (method = consensus)
Error in isBimeraDenovoTable(unqs[[i]], ..., verbose = verbose) :
Input must be a valid sequence table.
Calls: removeBimeraDenovo -> isBimeraDenovoTable
Execution halted
Traceback (most recent call last):
File "/home/glwinsor/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 257, in denoise_paired
run_commands([cmd])
File "/home/glwinsor/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 36, in run_commands
subprocess.run(cmd, check=True)
File "/home/glwinsor/miniconda3/envs/qiime2-2019.10/lib/python3.6/subprocess.py", line 418, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['run_dada_paired.R', '/tmp/tmpr1w4n50h/forward', '/tmp/tmpr1w4n50h/reverse', '/tmp/tmpr1w4n50h/output.tsv.biom', '/tmp/tmpr1w4n50h/track.tsv', '/tmp/tmpr1w4n50h/filt_f', '/tmp/tmpr1w4n50h/filt_r', '140', '120', '6', '6', '2.0', '2.0', '2', 'consensus', '1.0', '0', '1000000']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/glwinsor/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/q2cli/commands.py", line 328, in call
results = action(**arguments)
File "</home/glwinsor/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/decorator.py:decorator-gen-459>", line 2, in denoise_paired
File "/home/glwinsor/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/sdk/action.py", line 240, in bound_callable
output_types, provenance)
File "/home/glwinsor/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/qiime2/sdk/action.py", line 383, in callable_executor
output_views = self._callable(**view_args)
File "/home/glwinsor/miniconda3/envs/qiime2-2019.10/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 272, in denoise_paired
" and stderr to learn more." % e.returncode)
Exception: An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more

I'd appreciate any suggestions on how to get past the dada2 step in Qiime2. Thanks!

Nicholas_Bokulich · February 29, 2020, 12:25am

Welcome to the forum, @jeffin604!

Yep, this resource is from the pre-QIIME 2 days and still needs to be updated to list QIIME 2 importing, demux (and maybe also denoising) steps. (contributions are very welcome if you get this sorted out!)

I believe the issue is actually with the denoising parameters, or with the use of paired-end reads. Let me explain.

This dada2 error occurs when there are no reads left after denoising... most often that happens because all reads fail to merge.

These are V4 sequences, which should be around 290-300 nt long, so truncating at 140 + 120 nt (260 nt combined) is just not long enough for the reads to overlap. dada2 requires a minimum 12 nt overlap to merge reads, so these reads (which I believe are 150 nt PE, correct?) may just not be long enough to merge, period.
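A rough sketch of that arithmetic in shell, in case it helps when picking truncation lengths. The amplicon length here is an assumption taken from the low end of the 290-300 nt range above, and the variable names are just for illustration.

```shell
# Rough overlap estimate for paired-end merging.
# Assumption: amplicon_len uses the low end of the 290-300 nt estimate above.
amplicon_len=290
trunc_len_f=140   # --p-trunc-len-f from the original command
trunc_len_r=120   # --p-trunc-len-r from the original command
min_overlap=12    # minimum overlap dada2 needs to merge, as stated above

# Truncation removes bases from the 3' ends, which is where the forward and
# reverse reads meet, so overlap = combined truncated length - amplicon length.
overlap=$(( trunc_len_f + trunc_len_r - amplicon_len ))

if [ "$overlap" -ge "$min_overlap" ]; then
    verdict="enough overlap to merge"
else
    verdict="not enough overlap to merge"
fi

echo "estimated overlap: ${overlap} nt (${verdict})"
```

Under the same assumption, even untruncated 150 nt reads would only give 150 + 150 - 290 = 10 nt, which is still below the 12 nt minimum.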

As far as I recall, whenever I've used that mock community (or many of the older mock communities in mockrobiota), I have only used single-end reads. You should probably do the same with this mock community, or use one of the later/longer datasets if you want to use both the forward and reverse reads in your analysis. (Check out the inventory to see read length, etc. The lower the community ID, the older the dataset, more or less, so mock1-6 are really old.)
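If you go the single-end route, a command along these lines may be a starting point. This is only a sketch: the input path assumes the demux output named earlier in the thread, the trim and truncation values are simply carried over from the forward-read settings in the original command, and the output file names are made up. It is written as a dry run that prints the command instead of executing it. `denoise-single` accepts a paired-end demux artifact and uses only the forward reads.

```shell
# Dry run: build the command as a string and print it rather than executing it.
# Assumptions: paths and output names are illustrative; trim/trunc values are
# copied from the forward-read settings in the original denoise-paired command.
demux="mock-2/qiime2/FEB-27-2020/mock-2-demux.qza"
out_dir="mock-2/qiime2/FEB-27-2020"

cmd="qiime dada2 denoise-single \
--i-demultiplexed-seqs $demux \
--p-trim-left 6 \
--p-trunc-len 140 \
--p-n-threads 0 \
--o-table $out_dir/single-end-table.qza \
--o-representative-sequences $out_dir/single-end-rep-seqs.qza \
--o-denoising-stats $out_dir/single-end-denoising-stats.qza"

echo "$cmd"
```

Once the printed command looks right, it can be pasted into the terminal to run it for real.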

Let me know if you have any other questions and thanks for bringing this up!

jeffin604 · February 29, 2020, 12:29am

Thanks @Nicholas_Bokulich, those are some good tips. It crossed my mind they were a little short but I'm still a little green with respect to microbiome analysis. I'll try another dataset and once I get it working, I'd be happy to contribute.
