Error in "Write output" step of Qiime2

Dear Friends,

I got this error from QIIME 2 after the DADA2 denoise step had been running for two weeks. Could you please let me know how to resolve this issue? I would really appreciate it.

  1. Remove chimeras (method = consensus)
  2. Write output
    Traceback (most recent call last):
    File "/root/miniconda2/envs/qiime2-2019.1/lib/python3.6/site-packages/q2cli/commands.py", line 274, in __call__
    results = action(**arguments)
    File "</root/miniconda2/envs/qiime2-2019.1/lib/python3.6/site-packages/decorator.py:decorator-gen-442>", line 2, in denoise_paired
    File "/root/miniconda2/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
    output_types, provenance)
    File "/root/miniconda2/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/sdk/action.py", line 389, in callable_executor
    prov = provenance.fork(name)
    File "/root/miniconda2/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/core/archive/provenance.py", line 423, in fork
    forked = super().fork()
    File "/root/miniconda2/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/core/archive/provenance.py", line 326, in fork
    distutils.dir_util.copy_tree(str(self.path), str(forked.path))
    File "/root/miniconda2/envs/qiime2-2019.1/lib/python3.6/distutils/dir_util.py", line 124, in copy_tree
    "cannot copy tree '%s': not a directory" % src)
    distutils.errors.DistutilsFileError: cannot copy tree '/tmp/qiime2-provenance-03mgxmp4': not a directory

Plugin error from dada2:

cannot copy tree '/tmp/qiime2-provenance-03mgxmp4': not a directory

See above for debug info.
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_paired.R /tmp/tmp7pxry1hv/forward /tmp/tmp7pxry1hv/reverse /tmp/tmp7pxry1hv/output.tsv.biom /tmp/tmp7pxry1hv/track.tsv /tmp/tmp7pxry1hv/filt_f /tmp/tmp7pxry1hv/filt_r 130 130 0 0 2.0 2 consensus 1.0 2 1000000

Thanks!

Hi @danielsebas

Sounds like QIIME 2 couldn't find the temporary files from this command (read on)...

That is probably the answer why! I suspect your host OS "cleaned up" some of the temporary files too soon. We don't see that often, but it does come up every now and then (macOS is usually the culprit, though).

What was the command you ran? Please copy and paste it. One workaround is to re-run after setting the $TMPDIR env var to another (non-OS-owned) location. As well, I wonder if we can speed the command up by setting --p-n-threads...
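For example, the $TMPDIR workaround might look something like the sketch below (the directory path is illustrative; pick any writable location with plenty of free space):

```shell
# Illustrative sketch: point QIIME 2's temporary files at a directory
# you control, so the OS tmp-cleaner can't delete them mid-run.
mkdir -p "$HOME/qiime-tmp"
export TMPDIR="$HOME/qiime-tmp"
echo "TMPDIR is now: $TMPDIR"
# ...then re-run your `qiime dada2 denoise-paired` command in this same shell.
```

Note that `export` only affects the current shell session, so the long-running command must be launched from the same terminal (or the setting must be persisted in a shell startup file).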

:qiime2:

Thanks @thermokarst. This is the command I used. For another dataset it ran well and finished in a day, whereas this dataset took two weeks and crashed with the error above: (very painful :expressionless: )

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs ww-DNA_1to11_paired-end-demux-trimmed-2.qza \
  --p-trunc-len-f 130 --p-trunc-len-r 130 --p-trim-left-f 0 --p-trim-left-r 0 \
  --p-chimera-method consensus --p-n-threads 4 \
  --o-representative-sequences ww-DNA_1to11_paired-end-demux-trimmed_dada2-rep-seqs.qza \
  --o-table ww-DNA_1to11_paired-end-demux-trimmed-dada2-rep-seqs-table.qza \
  --o-denoising-stats ww-DNA_1to11_paired-end-demux-trimmed_dada2-rep-seqs-stats-dada2.qza

Please let me know how you think I can avoid this error, and yes, I will increase the number of nodes to 10. Thanks!

Awesome, thanks!

No need to increase the nodes, since DADA2 isn't multi-node, but you can certainly increase the number of threads used on a single node, as long as the node has the CPUs to support it.

Can you tell us about the differences between these two datasets?

Try rerunning with the $TMPDIR env var set to a location that is in your control (for example, your home directory).

Thanks @thermokarst.

The difference between the datasets is that the cDNA (paired-end Illumina data from bacteria in a water sample) was sequenced only for the V4 region, whereas the DNA data was sequenced across all variable regions; hence, in my understanding, the cDNA ran fast and the DNA takes longer. Do you agree?

By changing the tmp dir do you mean:

mkdir qiime-tmp
export TMPDIR="$PWD/qiime-tmp/"

in the current terminal? If I want to make it persist, can I put it in the .bashrc file, just in case the terminal gets closed accidentally?

Thanks!

:man_shrugging: Not sure. I should've been more specific --- how many samples are in each set, and what is the overall number of reads? Just thinking about the things important to the computer here.

:+1: You betcha!

Yep, as long as bash is your shell. Otherwise, there is a similar config file for each shell out there. Have fun!
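As a hedged sketch of what that could look like (the file name here is a stand-in so the effect is easy to inspect; in real use you would append to ~/.bashrc for bash):

```shell
# Sketch (assumes bash): make the TMPDIR setting survive new terminals by
# appending the export line to your shell startup file. A demo file is
# used here instead of the real ~/.bashrc so it can be inspected safely.
RC_FILE="./demo_bashrc"   # substitute "$HOME/.bashrc" in real use
echo 'export TMPDIR="$HOME/qiime-tmp/"' >> "$RC_FILE"
grep TMPDIR "$RC_FILE"    # verify the line landed in the file
```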

These are the .qzv files from before the DADA2 step. There is a huge difference between the total number of reads in the cDNA and DNA datasets.

Let me know your comments. ww-cDNA_1to11_paired-end-demux-trimmed-2.qzv (294.8 KB)
ww-DNA_1to11_paired-end-demux-trimmed-2.qzv (296.9 KB)

Thanks!

Seems pretty reasonable, although your quality profiles look pretty strange: the lack of any real distribution of quality scores leads me to believe that some kind of quality-control step has already been applied to these sequences. DADA2 is designed to work with the original noisy reads; that is how it builds its error profile. If my hunch is right and these reads have been altered, I would suggest you get your hands on the source data (pre-QA/QC) and try from there.

Thanks @thermokarst. I did not do any quality control on the fastq files; I only trimmed the primers from the reads before demultiplexing. Should I not remove the primers? Thanks

I strongly encourage you to reach out to your sequencing center --- those error profiles look like the product of some cleanup effort. Not that that is necessarily a problem, but, if it were me, I would want to know!

All non-biological sequences need to be removed from reads prior to using DADA2 (this includes primers).
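For reference, primer trimming is usually done with the q2-cutadapt plugin before denoising. An illustrative sketch follows; the file names are placeholders, and the primer sequences shown are the common EMP V4 515F/806R pair, which may not match your protocol:

```shell
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux.qza \
  --p-front-f GTGYCAGCMGCCGCGGTAA \
  --p-front-r GGACTACNVGGGTWTCTAAT \
  --o-trimmed-sequences demux-trimmed.qza
```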


Thanks, I will consult with the data provider, but can you please be more specific about why you think the data has been through quality control? Your explanation will help me understand the intricacies of the analysis. Thanks!

Hi @danielsebas!

Yep, of course, I actually provided that information above:

So, the concern here is that pretty much all positions in your reads are showing similar, very narrow, ranges in quality scores. This is not what we usually see in the wild, particularly with Illumina data (can you clarify your sequencing platform, by the way?). Normally we see a much more "natural" spread of quality scores. A good example of this spread can be seen here. Hope that helps! :t_rex:

Thanks @thermokarst. I see the point now. The sequencing platform is Illumina. Your comments really helped me to clarify the concepts :slight_smile:


This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.