Hey guys, I’m just wondering if anyone else is experiencing a huge speed difference between 2018.11.0 and 2019.1. I’m running nearly the same code on the same data set, with the only big difference being the update from 2018.11.0 to 2019.1. I’m also working through VirtualBox, so there was the accompanying update from VirtualBox v5.2.12 to v5.2.26 to handle the QIIME update. I allocated 5.5 GB of base memory for both.
All the plugins corresponded to 2018.11 or to 2019.1 as well.
Ran in a few hours on 2018.11:

qiime deblur denoise-16S \
  --p-trim-length 292
Is going on 4 days on 2019.1 (still running):

qiime deblur denoise-16S \
  --p-trim-length 252
Data set info:
Sequencing of 16S rRNA from 48 samples produced 23,674,488 reads with a mean of 493,218.5 reads per sample (median 255,112.5; minimum 10,639; maximum 11,395,868). Following joining of paired reads with vsearch, we obtained 20,242,213 joined reads with a mean of 421,712.8 reads per sample (median 217,426.5; minimum 8,252; maximum 9,835,668).
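As a quick sanity check, the per-sample means reported above follow directly from the totals (a throwaway awk sketch; all numbers come from this post):

```shell
# Per-sample mean read counts from the totals above (48 samples).
awk 'BEGIN {
  printf "mean raw reads/sample:    %.1f\n", 23674488 / 48
  printf "mean joined reads/sample: %.1f\n", 20242213 / 48
}'
```

This reproduces the 493,218.5 and 421,712.8 means quoted above.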
Any thoughts on why the significant time difference?
Are you able to monitor htop and see what processes seem to be running? Behind the scenes, Deblur relies extensively on sortmerna, and Deblur itself is a Python program.
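If htop feels overwhelming, a plain ps snapshot can answer the same question. The process names below (deblur, sortmerna) are what I’d expect to see based on the above, but they may differ on your system:

```shell
# Top CPU consumers right now -- deblur's sortmerna workers should appear
# here if they are actually doing work rather than swapping.
ps -eo pid,pcpu,pmem,comm --sort=-pcpu | head -n 10

# Narrow it down to just the deblur/sortmerna processes, if any exist:
ps -eo pid,comm | grep -E 'deblur|sortmerna' || echo "no matching processes"
```

A process stuck near 0% CPU with high memory use would point toward swapping rather than computation.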
I’m not aware of any change to the codebase from 2018.11 to 2019.1 that would have any meaningful performance impact. It also isn’t exactly clear to me why reducing the trim length would have such a large bearing on runtime, so I am really curious what may be going on here.
Does Deblur (or at least the plugin) drop reads shorter than the trim length? That might explain the difference if the first run was operating on significantly less data (I’m assuming quality-filter was used beforehand to trim at some q-score threshold).
So it has finally finished running. No errors were thrown, and the Deblur workflow appears to have worked in near-identical fashion to the first run (taking into consideration the altered trim length). That being said, my largest sample by read count performed much better in terms of filtering and chimera identification through Deblur than in the first run, which was my initial motivation for rerunning (with 9.7 million raw reads I had a hard time believing it didn't find a single chimera).
Maybe the initial run was the erroneous one, and this new run produced the correct output and timeframe considering the data?
Just to answer the previous questions:

Dropped reads with lower trim length: the trim length appears to have made little or no change in the output. Mean, max, and min raw reads were identical in the deblur stats output.

Quality filtering was identical between the runs, using

qiime quality-filter q-score-joined \

(the same demux-joined-filtered.qza serving as the input for deblur)
I reran the workflow (and killed it early so I didn't have to wait again) so I could view htop while it was running. I'm not all that great at interpreting everything, but attached below is what I'm seeing while it runs. I've lowered the allocated memory to 4.0GB during this run, as multitasking with a browser on the host computer is mind-numbingly slow when 5.5GB is dedicated to the VirtualBox.
Yes, good call! Sorry for not catching that.
Unfortunately, the top / htop output would be more useful if it’s a long way into the processing.
One possible concern is that operating at 5.5GB may simply be too little memory, which could result in heavy swapping. Note that 20M reads at 250nt is approximately 4.8GB of memory. I don’t recall offhand whether this mode of execution attempts to read all the data into main memory, but that would be problematic here.
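That 4.8GB figure is just one byte per nucleotide across the joined reads, ignoring headers, quality scores, and per-object overhead (which would push the real number higher); the read count and trim length come from earlier in this thread:

```shell
# Lower-bound RAM estimate if all joined reads sit in memory at once:
# 20,242,213 joined reads x 252 nt trim length, one byte per base.
awk 'BEGIN { printf "%.1f GiB\n", 20242213 * 252 / 2^30 }'
```

This prints 4.8 GiB, consistent with the estimate above, and uncomfortably close to a 5.5GB VM allocation once the OS and Python overhead are added.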
I’ll give it a try with this lower memory allocation and see if that improves performance. It’s always a possibility that the host computer is the issue and it has nothing to do with the version update!
This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.