Qiime dada2 denoise-single

idapantoja · July 13, 2018, 6:12pm

Hi!

I am wondering how long the qiime dada2 denoise-single step can take to finish.
I am running it for 85 samples; it has been around 2 hours and it is not done yet. I would appreciate your feedback.
Thanks!
idapantoja

colinbrislawn · July 14, 2018, 8:49pm

Hello idapantoja,

This is a great question! Unfortunately, I don't have a great answer.

Estimating runtime is hard because it depends on the size and complexity of your data set. The easiest way to estimate total run time is to run the program with a subset of your data (say, 2 samples), and then compare the file size of those two samples to the total file size of your full run.
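As a rough sketch of that extrapolation, assuming runtime scales about linearly with input size (only an approximation for dada2), something like the following could be used. The helper name and the numbers are hypothetical, not taken from a real run.

```shell
#!/usr/bin/env bash
# Rough linear extrapolation:
#   full_runtime ~ subset_runtime * (full_size / subset_size)
# estimate_runtime_minutes is a hypothetical helper, not part of QIIME 2.
estimate_runtime_minutes() {
  local subset_bytes=$1 full_bytes=$2 subset_minutes=$3
  # Integer arithmetic; multiply before dividing to limit rounding error.
  echo $(( subset_minutes * full_bytes / subset_bytes ))
}

# Made-up example: a 50 MB subset took 60 minutes and the full data set is 2000 MB.
estimate_runtime_minutes 50000000 2000000000 60   # prints 2400 (minutes)
```

On Linux, file sizes in bytes can be read with `stat -c %s somefile`. Treat the result as a ballpark figure rather than a prediction.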

For this specific plugin, you could also consider
increasing --p-n-threads so more threads run at once, and
decreasing --p-n-reads-learn to make it learn error rates on a smaller set of data.

Let me know if that helps,
Colin

idapantoja · July 16, 2018, 5:37pm

Hi Colin,

Thanks so much for your response. Now I am running into another issue. When I try to run the dada2 command, it tries to save its output to another path that is not recognized. I am using QIIME2 on a Linux server.

[ida.pantojafeliciano@natia0ndetlnx ~] source activate qiime2-2017.9
(qiime2-2017.9) [ida.pantojafeliciano@natia0ndetlnx ~] cd QIIME2_Analysis_Resistant_Starch_samples_IDA/
(qiime2-2017.9) [ida.pantojafeliciano@natia0ndetlnx QIIME2_Analysis_Resistant_Starch_samples_IDA] ls
DemultiplexSum_Vis.qzv mapping_file_resistant_starch.csv QIIME2 Analysis Notes-Resistant Starch-Ida.txt gg-13-8-99-515-806-nb-classifier.qza NatickFastqManifest_ResistantStarch.csv TrimRSFastqs.qza
(qiime2-2017.9) [ida.pantojafeliciano@natia0ndetlnx QIIME2_Analysis_Resistant_Starch_samples_IDA] qiime dada2 denoise-single --i-demultiplexed-seqs TrimRSFastqs.qza --p-trim-left 0 --p-trunc-len 124 --o-representative-sequences rep-seqs-dada2.qza --o-table ASV-table-dada2.qza
Traceback (most recent call last):
  File "/root/miniconda3/envs/qiime2-2017.9/bin/qiime", line 6, in <module>
    sys.exit(q2cli.main.qiime())
  File "/root/miniconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/root/miniconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/root/miniconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/root/miniconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/root/miniconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/root/miniconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/root/miniconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/q2cli/commands.py", line 185, in __call__
    arguments, missing_in, verbose, quiet = self.handle_in_params(kwargs)
  File "/root/miniconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/q2cli/commands.py", line 257, in handle_in_params
    kwargs, fallback=cmd_fallback
  File "/root/miniconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/q2cli/handlers.py", line 302, in get_value
    return qiime2.Artifact.load(path)
  File "/root/miniconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/qiime2/sdk/result.py", line 62, in load
    archiver = archive.Archiver.load(filepath)
  File "/root/miniconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/qiime2/core/archive/archiver.py", line 296, in load
    rec = archive.mount(path)
  File "/root/miniconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/qiime2/core/archive/archiver.py", line 198, in mount
    root = self.extract(filepath)
  File "/root/miniconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/qiime2/core/archive/archiver.py", line 209, in extract
    zf.extract(name, path=str(filepath))
  File "/root/miniconda3/envs/qiime2-2017.9/lib/python3.5/zipfile.py", line 1335, in extract
    return self._extract_member(member, path, pwd)
  File "/root/miniconda3/envs/qiime2-2017.9/lib/python3.5/zipfile.py", line 1399, in _extract_member
    shutil.copyfileobj(source, target)
  File "/root/miniconda3/envs/qiime2-2017.9/lib/python3.5/shutil.py", line 82, in copyfileobj
    fdst.write(buf)
OSError: [Errno 28] No space left on device

How can I find out the default path/directory, and how can I change it to my own directory? Is it a QIIME issue or a Linux issue?

Thanks again!
idapantoja

colinbrislawn · July 16, 2018, 5:45pm

Hello again Idapantoja,

Here's the new error:

OSError: [Errno 28] No space left on device

This could be an issue with the remaining space in your home directory, or qiime could be using a small tmp directory that filled up. Or maybe the Linux server itself is running out of room.

We can see what the Qiime devs suggest, but this might be a good time to ask the people who run the linux server if they have any guidelines about disk space, or if they know how to fix this problem.

Colin

ebolyen · July 16, 2018, 6:16pm

From this it looks like the TMPDIR being used is too small for the archive. I would get in contact with your sysadmin, like @colinbrislawn suggests. Once you learn a more appropriate location for large temporary files, set the TMPDIR variable with:
export TMPDIR=/some/path/your/sysadmin/gives/you
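To see which temporary directory is currently in play and how much room it has, a minimal sketch follows. It assumes the fallback location is /tmp when TMPDIR is unset (the usual first default on Linux); `tmp_dir_in_use` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Report the temporary directory that will be used.
# Assumption: /tmp is the fallback when TMPDIR is unset or empty.
tmp_dir_in_use() {
  echo "${TMPDIR:-/tmp}"
}

# Show free space on the filesystem holding that directory.
df -h "$(tmp_dir_in_use)"

# After the sysadmin provides a larger location (placeholder path below),
# set it and re-check in the same shell session before rerunning qiime:
# export TMPDIR=/some/path/your/sysadmin/gives/you
# df -h "$TMPDIR"
```

The export only lasts for the current shell session, so it needs to be repeated (or added to a startup file) for later sessions.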

idapantoja · July 31, 2018, 11:10pm

Hi Colin!

Thanks for the response. Apologies for not getting back to you sooner, but I am using a Linux server and I have been facing a lot of other issues, including a power outage. Today I finally ran 2 samples through the dada2 command. It took an hour! I have 85 samples, which would mean days of runtime! What do you suggest? Can I run smaller sets of samples and in some way combine the ASV tables without affecting the sequencing error model?

Thanks so much!
idapantoja

colinbrislawn · August 1, 2018, 5:53pm

Hello Idapantoja,

I'm glad your linux server is back up and running.

I'm also glad you got dada2 running on two samples. Now that we know it works, let's see if we can make it faster.

In the documentation for dada2 denoise-single, there are a few settings we may want to change.

The first is --p-n-threads, which will let the plugin use multiple cores of your machine. Are you already using this? How many cores does your linux server have and how many were you using for the hour long run?

The second is --p-n-reads-learn, which is the number of reads to subsample when training the dada2 error model. It's 1,000,000 (1 million) by default, but you could lower it to 0.1 million (100,000) and it should still do OK.

So if you combine the two settings, you could use 10x more threads and 10x fewer reads for training, and hopefully see up to a 100x speed-up in the best case!

Now that dada2 is working, play with these settings using the two samples, and see what increases the speed for you.
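One way to organize that experiment is sketched below. It only prints the commands (a dry run) so the settings can be reviewed first. `subset-2-samples.qza`, the output names, and the thread count of 8 are hypothetical placeholders; the trim and truncation values are copied from the command earlier in this thread; and the flags should be checked against `qiime dada2 denoise-single --help` for your QIIME 2 version.

```shell
#!/usr/bin/env bash
# Print (not run) a dada2 denoise-single command for a given thread count
# and number of reads used to learn the error model.
# dada2_cmd and all file names here are hypothetical placeholders.
dada2_cmd() {
  local threads=$1 reads_learn=$2
  echo "qiime dada2 denoise-single" \
    "--i-demultiplexed-seqs subset-2-samples.qza" \
    "--p-trim-left 0 --p-trunc-len 124" \
    "--p-n-threads ${threads} --p-n-reads-learn ${reads_learn}" \
    "--o-representative-sequences rep-seqs-${reads_learn}.qza" \
    "--o-table table-${reads_learn}.qza"
}

# Compare a few training-set sizes at the same thread count.
for reads in 1000000 100000 10000; do
  dada2_cmd 8 "$reads"
done
```

To time a real run, paste one of the printed commands after the shell's `time` keyword and compare the wall-clock ("real") values between settings.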

Keep up the great work! :qiime2:

Colin

idapantoja · August 1, 2018, 6:29pm

Hi Colin!
Thanks so much! This is super helpful and gives me hope to analyze my data! :slight_smile:

I am not using --p-n-threads yet. How do I know how many cores the Linux server has?

I will try --p-n-reads-learn with 0.1 million (100000) reads.

Thanks again!!
idapantoja

thermokarst · August 1, 2018, 6:32pm

Talk to your sysadmin, they should be able to help you out with this.
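If you have shell access, a quick check is also possible. This is a minimal sketch, assuming a Linux server with GNU coreutils (`nproc`) and util-linux (`lscpu`) installed; on a shared server the sysadmin may still limit how many cores you should use.

```shell
#!/usr/bin/env bash
# Number of processing units available to the current process.
# These are logical CPUs, so hyperthreads are counted.
logical_cpus=$(nproc)
echo "Logical CPUs available: ${logical_cpus}"

# lscpu breaks the total down into sockets, cores per socket, and threads
# per core, which separates physical cores from hyperthreads.
if command -v lscpu >/dev/null 2>&1; then
  lscpu | grep -E '^(CPU\(s\)|Thread\(s\) per core|Core\(s\) per socket|Socket\(s\)):' || true
fi
```

The value reported by `nproc` is a reasonable upper bound for --p-n-threads.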

idapantoja · August 1, 2018, 8:05pm

Hi!!
Great news!
They said the server has 4 cores and that the OS may be able to see 8 because of hyperthreading. I tried both samples with 100000 reads and it took 15 minutes. Much better!! Can I go down to 10000 reads and still be OK?

Thanks a lot!!
idapantoja

colinbrislawn · August 1, 2018, 8:14pm

Hi idapantoja!

Good!

Sure, try it! Also add in the --p-n-threads 8 flag and see if that speeds it up more.

For testing methods, you could try running all your samples using --p-n-reads-learn 1000 or 10000 just to see how long the run takes. Using more reads will improve error detection (and error correction!), so I think it's worth using the higher number like 1,000,000, even if that means running the plugin overnight.

Keep up the good work.
Let us know what you find!
Colin

idapantoja · August 3, 2018, 8:35pm

Hi Colin!

Just to let you know that I was able to run the command for the 85 samples using 100,000 reads. It took around 10 hours! So grateful for your help!! Thanks so much!
idapantoja
