I see that the vsearch steps (using the MetONTIIME pipeline with ONT reads) take quite a long time, and I was wondering whether we could replace vsearch (and/or BLAST) with minimap2 in the pipeline's various sequence-comparison steps to speed up this very slow analysis (15 h for 4 × 10k ONT reads on an 88-CPU, 512 GB RAM server, because parallelisation is limited to a few steps).
If we could substitute minimap2 for vsearch, I would expect a real speedup, like the one already seen in Iso-Seq analysis of both ONT and PacBio reads, where a read-clustering step is also applied. However, I lack the deep knowledge of the code needed to implement this myself, mainly because the various .qza intermediate files are difficult to read.
As far as I can tell, the MetONTIIME pipeline uses vsearch for two steps: vsearch dereplicate-sequences and vsearch cluster-features-de-novo.
I think dereplicate-sequences is about as fast as it can get...
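For context on why dereplication is hard to beat: it only collapses exact duplicates, so it can be done in a single hashing pass with no pairwise comparisons at all. The Python sketch below illustrates that idea; it is not the QIIME 2 or vsearch implementation, and the function name `dereplicate` is hypothetical.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse exact duplicate sequences and record their abundance.

    Each read is hashed once, so identical strings land in the same bucket
    without any pairwise comparison: the cost grows linearly with the total
    amount of sequence. Upper-casing first is an assumption of this sketch.
    """
    counts = Counter(seq.upper() for seq in reads)
    # Most abundant first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    for seq, n in dereplicate(["ACGT", "acgt", "ACGA", "ACGT", "TTGA"]):
        print(seq, n)
```

Because no read is ever compared against another, there is little a faster aligner such as minimap2 could gain here.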
Minimap2 should be able to accelerate the search part of 'clustering through search', but I'm not sure if it supports clustering directly.
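To make "clustering through search" concrete: in greedy centroid clustering, which is broadly how vsearch clusters, each sequence is searched against the centroids found so far and either joins the best-matching cluster or founds a new one. Only the inner search would change if a faster tool were swapped in. This is a minimal sketch assuming the input is already sorted by decreasing abundance and using difflib's `ratio()` as a stand-in for alignment-based identity (both are assumptions; `greedy_cluster` and `identity` are hypothetical names).

```python
from difflib import SequenceMatcher


def identity(a, b):
    """Stand-in similarity score in [0, 1].

    Assumption: difflib's ratio() is used only for illustration; vsearch
    computes identity from a real pairwise alignment instead.
    """
    return SequenceMatcher(None, a, b).ratio()


def greedy_cluster(seqs, threshold=0.97):
    """Greedy centroid clustering ('clustering through search').

    `seqs` is assumed to be pre-sorted by decreasing abundance. Each sequence
    is searched against the current centroids; if the best hit reaches
    `threshold` it joins that cluster, otherwise it becomes a new centroid.
    """
    members = {}  # centroid -> member sequences (dicts keep insertion order)
    for seq in seqs:
        best, best_id = None, 0.0
        # The search step: this loop is what a faster mapper could replace.
        for centroid in members:
            score = identity(seq, centroid)
            if score > best_id:
                best, best_id = centroid, score
        if best is not None and best_id >= threshold:
            members[best].append(seq)
        else:
            members[seq] = [seq]
    return members


if __name__ == "__main__":
    clusters = greedy_cluster(["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC"], 0.8)
    for centroid, group in clusters.items():
        print(centroid, len(group))
```

The search here is a naive scan over all centroids, which is exactly why this step dominates the run time on large read sets and why indexing or a faster mapper could help.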
But you could easily replace classify-consensus-blast with classify-consensus-vsearch for a speed gain. (I bet you could try this modification yourself!)
Thanks for that great link, @colinbrislawn! Where can I find a primer on cloning/converting the existing vsearch modules to use MMseqs2 instead? Is this even possible? (I am a bash user, not a Python programmer.)
Once you have a QIIME 2 MMseqs2 plugin, you could replace that section of the MetONTIIME bash script with a call to the new plugin.
Building plugins is non-trivial, but editing that bash script to replace classify-consensus-blast with classify-consensus-vsearch should be pretty easy to try!
Thanks, Colin, I was already thinking along those lines. The problem is that I'm not a Python programmer, and from what I can see this is pretty Python-heavy. I'm afraid I will have to wait for someone else to produce the modules...
I agree this should be an easy switch, since it only involves altering the bash script, though perhaps we could recommend that @MaestSi swap these methods in MetONTIIME? In my opinion, classify-consensus-vsearch is the better of the two methods, since it can run in parallel and I have recently added several features that make it more useful than classify-consensus-blast, e.g., exact matching, using only the top hits for consensus assignment, etc.
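To illustrate what "top hits only for consensus assignment" changes, here is a minimal Python sketch of majority-consensus taxonomy assignment: walk down the ranks and stop at the first rank where no single lineage is supported by at least `min_consensus` of the hits. With `top_hits_only`, only hits tied for the best identity are kept first. This mirrors the general idea, not q2-feature-classifier's actual code; the function name, the `(taxonomy, percent_identity)` hit format and the `;`-separated taxonomy strings are assumptions.

```python
from collections import Counter


def consensus_taxonomy(hits, min_consensus=0.51, top_hits_only=False):
    """Majority consensus over the hits for one query.

    hits: list of (taxonomy_string, percent_identity) pairs, ranks separated
    by ';' (an assumption of this sketch).
    Returns (consensus_taxonomy, fraction_of_hits_supporting_it).
    """
    if not hits:
        return "Unassigned", 0.0
    if top_hits_only:
        # Keep only the hits tied for the best identity score.
        best = max(pid for _, pid in hits)
        hits = [h for h in hits if h[1] == best]
    lineages = [tax.split(";") for tax, _ in hits]
    n = len(lineages)
    consensus, support = [], 0.0
    for rank in range(max(len(lin) for lin in lineages)):
        # Count full prefixes that extend the consensus found so far, so
        # the same name under different parents is never merged.
        prefixes = Counter(
            tuple(lin[: rank + 1])
            for lin in lineages
            if len(lin) > rank and lin[:rank] == consensus
        )
        if not prefixes:
            break
        prefix, count = prefixes.most_common(1)[0]
        if count / n < min_consensus:
            break
        consensus, support = list(prefix), count / n
    if not consensus:
        return "Unassigned", 0.0
    return ";".join(consensus), support


if __name__ == "__main__":
    hits = [
        ("k__Bacteria;p__Firmicutes;g__Bacillus", 99.0),
        ("k__Bacteria;p__Firmicutes;g__Lactobacillus", 97.0),
        ("k__Bacteria;p__Proteobacteria;g__Escherichia", 97.0),
    ]
    print(consensus_taxonomy(hits))
    print(consensus_taxonomy(hits, top_hits_only=True))
```

With all three hits, the assignment stops at the phylum level because the genera disagree; keeping only the single best hit yields a genus-level assignment instead.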
Hi, of course this should be a very easy switch to implement. I was not completely convinced about it, since I am not very familiar with the differences between the two algorithms, but since you recommend it, I will implement it in the coming weeks and leave the choice to the user.
Hi @splaisan, did your vsearch run succeed? I have added the option to choose between BLAST and vsearch in the MetONTIIME.sh script, and set the vsearch parameters to be as similar to BLAST's as possible. You can of course modify them if you want. Let me know if you try it out and run any comparisons.
Simone
Hi Simone. I had to kill my run because it was running very slowly. Colin's probable explanation can be read here: using database indexing for vsearch - #2 by colinbrislawn
I will have a look at the code, but I fear my Python skills will not be good enough to add extra arguments to the wrapper. I will try, though.