The impact of demultiplexing on the speed of DADA2

msport469 · November 19, 2019, 1:41pm

As I understand it, I don’t need to demultiplex when I use a manifest to import individual FASTQ files into QIIME 2. However, would it speed things up downstream (DADA2) if I demultiplexed them after importing? It is just taking forever and I am trying to figure out why. Thank you.

thermokarst · November 19, 2019, 2:24pm

Hey @msport469!

When you import using individual (per-sample) FASTQ files, the data are already demultiplexed, so there is nothing more to do on that front. DADA2 can take some time. Did you specify more than one computational thread? Did you run with the --verbose flag? If so, what does the log say? If not, try to remember to run with it next time; that way you know what DADA2 is doing at any given point.
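Here is a minimal sketch of running one sample with all threads and keeping the verbose output in a per-sample log file. The function name `denoise_sample` and the output/log file names are hypothetical; the `qiime dada2 denoise-paired` options are the ones from the command posted further down this thread, with `--verbose` added.

```shell
#!/usr/bin/env bash
# Hypothetical helper: denoise one per-sample artifact with DADA2,
# using all available cores and saving the --verbose output to a log.
denoise_sample() {
  local input_qza="$1"   # per-sample artifact imported from FASTQ
  local sample_id="$2"   # used only to name the outputs (hypothetical scheme)

  # --p-n-threads 0: let DADA2 use all available cores.
  # --verbose: print DADA2's progress messages; stdout and stderr both go
  # into dada2-<sample>.log so you can see how far it got.
  qiime dada2 denoise-paired \
    --i-demultiplexed-seqs "$input_qza" \
    --p-trim-left-f 0 \
    --p-trim-left-r 0 \
    --p-trunc-len-f 0 \
    --p-trunc-len-r 0 \
    --p-n-threads 0 \
    --o-table "table-dada2-${sample_id}.qza" \
    --o-representative-sequences "rep-seqs-dada2-${sample_id}.qza" \
    --o-denoising-stats "stats-dada2-${sample_id}.qza" \
    --verbose > "dada2-${sample_id}.log" 2>&1
}
```

For example, `denoise_sample '/scratch/msportie/mhp/import-files-11.12.19/import-mhp-nov8-SRR6714086.qza' SRR6714086`, and then `tail -f dada2-SRR6714086.log` from another terminal to follow along while it runs.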

colinbrislawn · November 19, 2019, 3:16pm

When I find that part of a program is very slow, I use a tool to track which resources the computer is using: Task Manager on Windows, Activity Monitor on macOS, or top on Linux. This lets me see that something is happening, and check for things that would slow everything down too much, like running out of RAM and overflowing into swap.

With Task Manager / Activity Monitor open, I check on my CPU and RAM usage. As long as CPU usage is high (close to 100% is good) and memory usage is low (close to 100% is very, very bad!!), I know my computer is working as fast as possible and I just wait for my processing to finish.
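On Linux, `top` covers the CPU side, and `free -h` summarizes memory. Both read their numbers from `/proc/meminfo` and related files, so the RAM and swap check can also be scripted. A minimal sketch; `memory_report` is a hypothetical name, and the optional file argument exists only so it can be tried on a sample file. Note that it reports the whole machine, not just your job.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize RAM and swap use on Linux.
# Reads /proc/meminfo by default; pass another file only for testing.
memory_report() {
  local meminfo="${1:-/proc/meminfo}"
  awk '
    /^MemTotal:/     { total = $2 }
    /^MemAvailable:/ { avail = $2 }
    /^SwapTotal:/    { swap_total = $2 }
    /^SwapFree:/     { swap_free = $2 }
    END {
      if (total == 0) exit 1   # MemTotal missing: not a meminfo file
      # Share of RAM in use; close to 100% is the "very, very bad" zone.
      printf "ram_used_percent=%d\n", (total - avail) * 100 / total
      # Swap in use (kB); a number that keeps growing means RAM overflowed.
      printf "swap_used_kb=%d\n", swap_total - swap_free
    }' "$meminfo"
}
```

Run `memory_report` a few times while the job is going; if `ram_used_percent` sits near 100 and `swap_used_kb` keeps climbing, memory is the bottleneck rather than CPU.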

Let us know how this process goes.

Colin

msport469 · November 19, 2019, 3:21pm

Thanks,

That’s what I assumed. I was just wondering if the demultiplex step included something that shrank the files (while keeping all the information) to make things go faster. Obviously, if they are per-sample FASTQ files, they are inherently demultiplexed, haha.

I definitely should have specified --verbose, but alas I did not. I set the threads to zero, which should maximize the number of cores the program uses (24 in this case). Thanks for the help and advice.
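On a cluster it may be worth confirming how many cores the job can actually use. `--p-n-threads 0` means "all available cores", but whether that resolves to your allocation or to the whole node depends on how the job is confined. That is an assumption to check, not something established in this thread. A small sketch that prints both numbers: `core_check` is a hypothetical name, `nproc` (GNU coreutils) counts the CPUs the current process may run on, and SLURM sets `SLURM_CPUS_PER_TASK` only when the job requested `--cpus-per-task`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare usable CPUs with what SLURM allocated.
core_check() {
  # CPUs this process is allowed to run on (respects CPU affinity).
  echo "nproc=$(nproc)"
  # Only set by SLURM when --cpus-per-task was requested.
  echo "slurm_cpus_per_task=${SLURM_CPUS_PER_TASK:-unset}"
}
```

Running `core_check` inside the batch script shows whether the 24 cores are really visible. If the numbers differ, passing the count explicitly, e.g. `--p-n-threads "$SLURM_CPUS_PER_TASK"`, avoids relying on auto-detection.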

msport469 · November 19, 2019, 3:22pm

Thanks, I’ll try to figure out how to do that. I’m on Linux, but the actual computing is done on the supercomputer’s compute nodes, so it may be a bit trickier to get at that information while the program is running.
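If the cluster runs SLURM (the `/var/spool/slurm/...` paths in the log posted below suggest it does), its accounting tools can report a job's CPU and memory use from the login node. A minimal sketch: `job_usage` is a hypothetical name, while `sstat` and `sacct` with `--format` are standard SLURM commands. Which fields are available, and whether `sstat` is allowed at all, is site-dependent.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check CPU and memory use of a SLURM job.
job_usage() {
  local jobid="$1"
  # Live statistics for the batch step while the job is still running.
  sstat -j "${jobid}.batch" --format=JobID,AveCPU,MaxRSS
  # Accounting record; also works after the job has finished.
  sacct -j "$jobid" --format=JobID,State,Elapsed,AllocCPUS,TotalCPU,MaxRSS
}
```

Find the job ID with `squeue -u "$USER"`, then run e.g. `job_usage 10306364`. As a rough rule, if `TotalCPU` is close to `Elapsed` × `AllocCPUS`, the cores were kept busy; if it is much lower, the job was mostly waiting (on I/O, memory, or a single thread).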

msport469 · November 19, 2019, 3:25pm

Here is what the log is spitting out now:

There were some problems with the command:
(1/6) Missing option "--p-trunc-len-f".
(2/6) Missing option "--p-trunc-len-r".
(3/6) Missing option "--o-table". ("--output-dir" may also be used)
(4/6) Missing option "--o-representative-sequences". ("--output-dir" may also be used)
(5/6) Missing option "--o-denoising-stats". ("--output-dir" may also be used)
(6/6) Got unexpected extra argument ( )
/var/spool/slurm/job10306364/slurm_script: line 12: --p-trim-left-r: command not found
/var/spool/slurm/job10306364/slurm_script: line 13: --p-trunc-len-f: command not found
/var/spool/slurm/job10306364/slurm_script: line 14: --p-trunc-len-r: command not found
/var/spool/slurm/job10306364/slurm_script: line 15: --p-n-threads: command not found

This happens even though all of these options are specified in the script. Here is an example of the command:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs '/scratch/msportie/mhp/import-files-11.12.19/import-mhp-nov8-SRR6714086.qza' \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --p-n-threads 0 \
  --o-representative-sequences rep-seqs-dada2-11.13.19-SRR6714086.qza \
  --o-table table-dada2-nov13-SRR6714086.qza \
  --o-denoising-stats stats-dada2-SRR6714086.qza

And yes, I’ve double-checked the file paths; they’re correct.

colinbrislawn · November 19, 2019, 3:37pm

Thanks for posting that!

So there are now two types of errors: Missing option "--p-trunc-len-f" and "--p-trunc-len-f: command not found".

This is the kind of error I would expect if the \ at the end of each line were not recognized, so the Linux server treated each line as a separate command.

I’m not sure why the \ are not recognized… Maybe this is a question for the HPC people about how they pass multi-line commands.
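One common way a trailing \ stops working (an assumption here; the thread does not confirm it) is an invisible character after it: a space or tab, or the carriage return left behind by Windows-style line endings. In bash, "\ " escapes the space instead of the newline, so the command ends on that line and the next line runs on its own. That would also fit the "(6/6) Got unexpected extra argument ( )" line in the log. A minimal sketch to find and strip such characters; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# In bash, a line continuation only works when the backslash is the very
# last character on the line. A space, tab, or carriage return after it
# gets escaped instead of the newline, which ends the command early.

# Print "line_number:text" for every line where a backslash is followed by
# trailing whitespace (space, tab, or carriage return).
find_broken_continuations() {
  grep -nE '\\[[:space:]]+$' "$1"
}

# Write a copy of the script with all trailing whitespace removed from every
# line, including the carriage returns from CRLF line endings.
strip_trailing_whitespace() {
  sed -E 's/[[:space:]]+$//' "$1" > "$2"
}
```

For example, `find_broken_continuations dada2-job.sh` lists suspicious lines (the file name is hypothetical), and `strip_trailing_whitespace dada2-job.sh dada2-job-fixed.sh` writes a cleaned copy. `cat -A dada2-job.sh` is another quick check: it marks each line end with `$` and shows carriage returns as `^M`.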

Colin

thermokarst · November 19, 2019, 4:13pm

The shell matters here: it’s possible that a non-bash/zsh shell might not understand the escape character. An HPC environment is a prime place to run into a "weird" (non-bash/zsh) shell.
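A SLURM batch script is executed by the interpreter named on its first `#!` line, which is not necessarily the shell you type commands into, so that line is worth checking. A minimal sketch that reports it; `script_interpreter` is a hypothetical name. It also warns about a carriage return on that line, since a CRLF `#!` line can itself break the interpreter lookup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the interpreter requested on a script's
# first ("#!") line, which is what runs the script when executed directly.
script_interpreter() {
  local first_line=""
  IFS= read -r first_line < "$1"
  # Flag (then drop) a Windows carriage return at the end of the line.
  if [[ "$first_line" == *$'\r' ]]; then
    echo "warning: CRLF line ending on the #! line" >&2
    first_line="${first_line%$'\r'}"
  fi
  if [[ "$first_line" == '#!'* ]]; then
    printf '%s\n' "${first_line#'#!'}"
  else
    echo "no #! line; the interpreter depends on how the script is launched"
  fi
}
```

Inside the job itself, a line such as `echo "bash: ${BASH_VERSION:-not bash}"` near the top of the script confirms which shell actually ran it.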

msport469 · November 19, 2019, 4:14pm

It has run on this system before. I did try using a node I’m not 100% sure I’ve used before, though. I ran the exact same script (running DADA2 on one of the 190 .qza files I need to process) on a different node and it worked, so I’m not sure what’s up with that.
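When the same script works on one node and fails on another, it can help to have every job record where and under what environment it ran, so a failing run can be compared side by side with a working one. A minimal sketch; `print_job_environment` is a hypothetical name, and `SLURM_JOB_NODELIST` is the variable SLURM sets with the job's node list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: record where and how a job ran, for comparing a
# failing run against one that worked on a different node.
print_job_environment() {
  echo "host=$(uname -n)"
  echo "slurm_nodelist=${SLURM_JOB_NODELIST:-unset}"
  # Empty BASH_VERSION means the script is not running under bash.
  echo "bash_version=${BASH_VERSION:-not bash}"
  # Which qiime (if any) is on PATH on this node.
  echo "qiime=$(command -v qiime || echo 'not on PATH')"
}
```

Calling it as the first command in the batch script puts this information at the top of each job's output file.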

system · December 20, 2019, 10:14pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.