Issues with Quality Control and Pair End Joining

Bill_Yen · May 27, 2020, 4:07am

Hello all! I am new to QIIME 2, and after finally being able to import stuff into the virtual machine, I am having some problems doing quality control and paired-end joining with my test data. I am running qiime2-2020.2 via VirtualBox, and below is my code (with --verbose) and its associated error:

Code:
qiime dada2 denoise-paired --i-demultiplexed-seqs demux-paired-end.qza --p-trunc-len-f 230 --p-trunc-len-r 210 --p-trim-left-f 19 --p-trim-left-r 20 --p-n-threads 7 --o-representative-sequences dada2_rep_seq_16s.qza --o-table dada2_table_feature.qza --p-trunc-q 10 --p-max-ee-f 2.5 --o-denoising-stats dada2_denoise --verbose

Error:
Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_paired.R /tmp/tmp6w2ny275/forward /tmp/tmp6w2ny275/reverse /tmp/tmp6w2ny275/output.tsv.biom /tmp/tmp6w2ny275/track.tsv /tmp/tmp6w2ny275/filt_f /tmp/tmp6w2ny275/filt_r 230 210 19 20 2.5 2.0 10 consensus 1.0 7 1000000

R version 3.5.1 (2018-07-02)
Loading required package: Rcpp
DADA2: 1.10.0 / Rcpp: 1.0.3 / RcppParallel: 4.4.4

Filtering Error in names(answer) <- names1 :
'names' attribute [22] must be the same length as the vector [6]
Execution halted
Traceback (most recent call last):
File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 257, in denoise_paired
run_commands([cmd])
File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 36, in run_commands
subprocess.run(cmd, check=True)
File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/subprocess.py", line 418, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['run_dada_paired.R', '/tmp/tmp6w2ny275/forward', '/tmp/tmp6w2ny275/reverse', '/tmp/tmp6w2ny275/output.tsv.biom', '/tmp/tmp6w2ny275/track.tsv', '/tmp/tmp6w2ny275/filt_f', '/tmp/tmp6w2ny275/filt_r', '230', '210', '19', '20', '2.5', '2.0', '10', 'consensus', '1.0', '7', '1000000']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/q2cli/commands.py", line 328, in call
results = action(**arguments)
File "</home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/decorator.py:decorator-gen-455>", line 2, in denoise_paired
File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
output_types, provenance)
File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 390, in callable_executor
output_views = self._callable(**view_args)
File "/home/qiime2/miniconda/envs/qiime2-2020.2/lib/python3.6/site-packages/q2_dada2/_denoise.py", line 272, in denoise_paired
" and stderr to learn more." % e.returncode)
Exception: An error was encountered while running DADA2 in R (return code 1), please inspect stdout and stderr to learn more.

What does this error mean? It seems like there is a mismatch between the length of my attribute and the length of my vector, but I'm not sure how to fix it. Thank you for all of your support!

Mehrbod_Estaki · May 27, 2020, 9:57am

Hi @Bill_Yen,
We see this error pop up when the user is assigning more threads than their system can allocate.
In your case it's possible that you are still using the default VB settings, which assign only 1 thread to that environment, so when you call 7 threads it is not able to find or allocate them.
So you can try either a) changing the VB settings to assign more threads to that environment or b) reducing --p-n-threads from 7 to 1 and seeing if that solves the problem. I would go with option a) personally.
Note that sometimes, even with enough threads available, we see this error pop up when the system for some reason can't allocate those threads. In those cases simply assigning fewer threads (from 7 to, say, 4) also seems to solve the problem.
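
For what it's worth, here is a minimal sketch of how to check this from inside the VM before running DADA2. It only compares the requested thread count against the CPUs the VM can see; cpus_visible and pick_threads are hypothetical helper names made up for this example, not part of QIIME 2.

```shell
#!/usr/bin/env bash
# Number of CPUs the current environment (here: the VM) can actually see.
cpus_visible() {
    nproc 2>/dev/null || getconf _NPROCESSORS_ONLN
}

# Hypothetical helper: cap the requested thread count at the visible CPU count.
pick_threads() {
    requested="$1"
    available="$(cpus_visible)"
    if [ "$requested" -gt "$available" ]; then
        echo "$available"
    else
        echo "$requested"
    fi
}

threads="$(pick_threads 7)"
echo "CPUs visible: $(cpus_visible); using --p-n-threads $threads"
```

The resulting value can then be passed as --p-n-threads "$threads" in place of the hard-coded 7. If the first line of output reports 1 CPU, the VM is most likely still on the default VirtualBox setting.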

Bill_Yen · May 27, 2020, 3:47pm

I have upped the number of CPUs allocated to the system to 3 out of the maximum 4 allowed on my computer and tried to run the same code, but the same error occurred. I then lowered --p-n-threads to just 1, and now the terminal seems to be frozen. I tried --p-n-threads 4 as well and it also froze there. What should I do from here? Also, is there a way to run this code from a file instead of directly through the terminal, to avoid typing the same thing over again after the virtual machine powers off?

Mehrbod_Estaki · May 27, 2020, 8:23pm

Hi @Bill_Yen,
I probably should have mentioned this alongside my last suggestion (thanks @thermokarst). You also want to make sure you are assigning a reasonable amount of memory to the VB. As you increase the number of threads, the memory requirement also goes up. How much RAM is available to this environment? If you can allocate at least 6-8 GB of RAM to your VB, then dada2 should run fine with 3 threads (assuming your data is not a crazy large set).

Can you elaborate on what you mean by the terminal being frozen? While a process is running, the terminal window won't show you a prompt line, and you won't be able to run anything else until that job is completed or terminated (ctrl + c). This to me says that DADA2 is working. However, with 1 thread and the very low default VB memory allocation this might take a very long time. So you can either let it run to completion, which might take a while, or you can terminate the task, restart the VB with more RAM designated, and try again with 3 threads.
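
As a rough sketch of how to check the memory side, the snippet below reads the RAM visible inside the VM from /proc/meminfo (Linux only) and compares it to the 6 GB lower bound mentioned above. mem_total_mb and has_enough_ram are hypothetical helper names, and "QIIME2-VM" in the final comment is a placeholder for whatever your VM is called.

```shell
#!/usr/bin/env bash
# RAM visible inside the VM, in MB. MemTotal is usually a little below the
# configured base memory, so treat any threshold as approximate.
mem_total_mb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' /proc/meminfo
}

# Succeeds when at least the given number of MB is visible.
has_enough_ram() {
    [ "$(mem_total_mb)" -ge "$1" ]
}

if has_enough_ram 6144; then
    echo "RAM looks sufficient: $(mem_total_mb) MB"
else
    echo "Only $(mem_total_mb) MB visible; consider raising the VM's base memory"
fi

# On the HOST, with the VM powered off, the allocation can be changed from the
# command line ("QIIME2-VM" is a placeholder name):
#   VBoxManage modifyvm "QIIME2-VM" --memory 8192 --cpus 3
```

The same settings are available in the VirtualBox GUI under the VM's System settings, which is probably the easier route if you are not used to VBoxManage.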

Sure, you could write a script to run whatever you want. But since you're just starting out and will likely be making a lot of changes to your commands, an easier solution is probably to type the commands in a text editor (I think the Ubuntu VM comes with a plain text editor, maybe gedit) and copy and paste from that. You can also use the up/down arrow keys to go through the history of commands you previously ran in the terminal and pick the last one you executed, so you don't have to retype everything. Type history to see a full list of the commands you ran.
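
If you do want the script route, a minimal sketch could look like the following. It reuses the denoise-paired command from the first post unchanged apart from the thread count. The file name run_dada2.sh, the THREADS and DRY_RUN variables, and the run helper are all assumptions made up for this example. By default it only prints the command, so nothing runs until you set DRY_RUN=0.

```shell
#!/usr/bin/env bash
# run_dada2.sh (hypothetical file name): keep the command in a file so it can
# be re-run with `bash run_dada2.sh` instead of being retyped.
set -eu

THREADS="${THREADS:-3}"    # override with: THREADS=1 bash run_dada2.sh
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the command, 0 = execute it

# Print the command in dry-run mode, otherwise execute it as given.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

run qiime dada2 denoise-paired \
    --i-demultiplexed-seqs demux-paired-end.qza \
    --p-trunc-len-f 230 \
    --p-trunc-len-r 210 \
    --p-trim-left-f 19 \
    --p-trim-left-r 20 \
    --p-trunc-q 10 \
    --p-max-ee-f 2.5 \
    --p-n-threads "$THREADS" \
    --o-representative-sequences dada2_rep_seq_16s.qza \
    --o-table dada2_table_feature.qza \
    --o-denoising-stats dada2_denoise \
    --verbose
```

Run it from the directory that holds demux-paired-end.qza, for example with DRY_RUN=0 THREADS=3 bash run_dada2.sh. Because the file lives on the VM's disk, it is still there after the virtual machine powers off.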

Bill_Yen · May 27, 2020, 8:46pm

I am currently assigning 5053 MB to the VirtualBox base memory, and what I meant by the terminal being frozen is that it wouldn't show me a prompt line. I can still type stuff or terminate it, but nothing will happen since it is still running DADA2 as you said. I let it sit for a while but nothing happened, so I terminated the terminal. Below is a screen shot of what this looks like for reference:

*Update: I waited for 30 minutes, and the command eventually finished running. However, I have now run into an even more troubling problem: the virtual disk image used up all the space on my laptop when I tried to do taxonomy annotation (code included below). After I downloaded the silva classifier and ran the command with it, the VDI ballooned in size to the point where it’s now 48.3 GB, and the virtual machine had to be paused because my computer had no more storage (it has 213 GB total).
Code:
qiime feature-classifier classify-sklearn --i-classifier '/home/qiime2/Desktop/silva-132-99-515-806-nb-classifier.qza' --i-reads dada2_rep_seq_16s.qza --o-classification taxonomy.qza

Since this is now a slightly different issue, should I turn this into a new post? Thanks!

Mehrbod_Estaki · May 28, 2020, 7:16am

Hi @Bill_Yen,
Glad you got the problem sorted! 30 minutes is pretty quick actually for dada2 on low threads/memory. Glad to hear it.

Yes please, do open a new post for this! I should also note that this is a common problem with the silva database, and you can easily search the forum for your error message to see what the recommended solutions are.
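
In the meantime, a small diagnostic sketch like the one below can show where the space is going inside the VM. free_gb is a hypothetical helper name. Keep in mind that it reports the guest's virtual disk, not the host's, and that a dynamically allocated VDI grows as the guest writes data and does not shrink again on its own. The two classify-sklearn parameters in the closing comment do exist in q2-feature-classifier, but whether they help in this particular case is only an assumption.

```shell
#!/usr/bin/env bash
# Free space, in whole GB, on the filesystem that holds the given path.
free_gb() {
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

echo "Free on /tmp: $(free_gb /tmp) GB"
echo "Free on home: $(free_gb "${HOME:-/}") GB"

# Five largest items in the temp directory, sizes in KB
# (errors from unreadable directories are hidden).
du -sk "${TMPDIR:-/tmp}"/* 2>/dev/null | sort -n | tail -n 5

# Assumption: smaller batches and a single job may lower the classifier's
# peak resource use, e.g. by adding these options to classify-sklearn:
#   --p-reads-per-batch 1000 --p-n-jobs 1
```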

Good luck

system · June 28, 2020, 1:21pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.