Can minimap2 be used instead of VSEARCH to cluster long reads faster?

I see that the vsearch steps (using the MetONTIIME pipeline and ONT reads) are taking quite a long time. Could we replace vsearch (and/or BLAST) with minimap2 in the different sequence-comparison steps of the pipeline to speed up this currently very slow analysis? It takes 15 h for 4 x 10k ONT reads on an 88-CPU, 512 GB RAM server, because parallelisation is limited to a few steps.

If we could substitute minimap2 for vsearch, I would expect a real speedup, as already seen in Iso-Seq analysis for both ONT and PacBio reads, where a read-clustering step is also applied. However, I lack the deep knowledge of the code needed to implement this myself, mainly because the various QIIME 2 (.qza) intermediate files are difficult to read.

Thanks for any lead.

Hello Stephane,

Cross-link to the suggestion filed as a GitHub issue

As far as I can tell, the MetONTIIME pipeline uses vsearch for two steps:
vsearch dereplicate-sequences
vsearch cluster-features-de-novo

I think dereplicate-sequences is about as fast as it can get...

Minimap2 should be able to accelerate the search part of 'clustering through search', but I'm not sure if it supports clustering directly. :cry:

But you could easily replace classify-consensus-blast with classify-consensus-vsearch for a speed gain. (I bet you could try this modification yourself!)


MMseqs2 supports clustering. :rocket:
https://github.com/soedinglab/MMseqs2
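
For reference, here is a minimal sketch of what clustering with MMseqs2 could look like outside of QIIME 2, based on the `easy-cluster` workflow shown in the MMseqs2 README. The input name `reads.fasta`, the output prefix `clusterRes`, the `tmp` directory and the 0.9 identity threshold are placeholders picked for illustration, not values taken from MetONTIIME. It also assumes the reads have already been exported from the .qza artifact to FASTA.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical file names and threshold: adjust to your own data.
# Usage pattern: mmseqs easy-cluster <input> <output prefix> <tmp dir> [options]
cluster_cmd=(mmseqs easy-cluster reads.fasta clusterRes tmp --min-seq-id 0.9)

# Print the command first so it can be checked before anything runs.
echo "${cluster_cmd[*]}"

# Only run it when mmseqs is installed and the input file exists.
if command -v mmseqs >/dev/null 2>&1 && [ -f reads.fasta ]; then
  "${cluster_cmd[@]}"
fi
```

This only shows the stand-alone command. The clusters it produces would still have to be brought back into QIIME 2 to be used by the rest of the pipeline.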


Thanks for that great link @colinbrislawn! Where can I find a primer on cloning/converting the existing vsearch modules to use MMseqs2 instead? Is this even possible? (I am a bash user, not a Python programmer.)

Let's work backwards.

As far as I can tell, MetONTIIME is just a bash (.sh) script. You can view the full source here:
https://github.com/MaestSi/MetONTIIME/blob/master/MetONTIIME.sh

This script calls several QIIME 2 plugins in order. You can see these in the .sh file.

One of these plugin calls is qiime vsearch cluster-features-de-novo, which uses q2-vsearch to call the vsearch binary for clustering.
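
A quick way to see those calls in order is to grep the script for `qiime <plugin> <action>`. This is only a sketch: `list_qiime_calls` is a helper name made up here, and the demo runs on a small stand-in file rather than the real MetONTIIME.sh, so the three lines in it are illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

# list_qiime_calls FILE: print "line:qiime <plugin> <action>" for each call found.
# (hypothetical helper, not part of MetONTIIME or QIIME 2)
list_qiime_calls() {
  grep -noE 'qiime [a-z0-9-]+ [a-z0-9-]+' "$1" || true
}

# Stand-in script with illustrative content.
demo="$(mktemp)"
cat > "$demo" <<'EOF'
qiime tools import --type X
qiime vsearch dereplicate-sequences --i-sequences s.qza
qiime vsearch cluster-features-de-novo --p-perc-identity 1
EOF

list_qiime_calls "$demo"
```

Pointing the function at your local copy of MetONTIIME.sh should list every plugin call with its line number, which makes it easier to find the section you would need to replace.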


So now you would need to build a QIIME 2 plugin for MMseqs2.

There is full documentation on building QIIME 2 plugins, and of course you could base it on the vsearch plugin.

Once you have a QIIME 2 MMseqs2 plugin, you could replace that section in the MetONTIIME bash script with a call to the new plugin.


Building plugins is non-trivial. But editing that .sh script to replace
classify-consensus-blast with classify-consensus-vsearch
should be pretty easy to try!
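
As a rough sketch of that edit, the swap can be done with sed on a copy of the script, leaving the original untouched. `swap_classifier` is a helper name made up for this example, and the demo works on a two-line stand-in file, not on the real MetONTIIME.sh.

```shell
#!/usr/bin/env bash
set -euo pipefail

# swap_classifier SRC DST: write a copy of SRC to DST in which the BLAST
# classifier action is replaced by the vsearch one. (hypothetical helper)
swap_classifier() {
  local src="$1" dst="$2"
  sed 's/classify-consensus-blast/classify-consensus-vsearch/g' "$src" > "$dst"
}

# Stand-in for the pipeline script, with illustrative content only.
demo_dir="$(mktemp -d)"
printf '%s\n' \
  'qiime feature-classifier classify-consensus-blast \' \
  '  --i-query rep-seqs.qza' > "$demo_dir/pipeline.sh"

swap_classifier "$demo_dir/pipeline.sh" "$demo_dir/pipeline_vsearch.sh"
cat "$demo_dir/pipeline_vsearch.sh"
```

On the real script this would be something like `swap_classifier MetONTIIME.sh MetONTIIME_vsearch.sh`. The two actions do not necessarily accept exactly the same parameters, so check the arguments after the swap against `qiime feature-classifier classify-consensus-vsearch --help`, and consider setting `--p-threads` there to use more cores.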

Colin


Thanks Colin, I was already thinking along those lines. The problem is that I am not a Python programmer, and from what I can see this is pretty Python-heavy. I am afraid I will have to wait for someone else to produce the modules...

That's fair. I haven't built my own plugin either.

However, you could edit the bash script to use vsearch instead of blast! I am 100% sure you can do that, and it will make that step a lot faster.

Let me know if you want to try it and need any help!!


I agree this should be an easy switch since it only means altering the bash script, though perhaps we can recommend that @MaestSi swap these methods in MetONTIIME? classify-consensus-vsearch is really the better of the two methods in my opinion, since it can be run in parallel and because I have recently added several features that make it more useful than classify-consensus-blast, e.g., exact matching, choosing the top hit only for consensus assignment, etc.


Hi, of course this should be a very easy switch to implement. I was not completely convinced about it since I am not very familiar with the differences between the two algorithms, but since you are recommending it, I will implement it in the coming weeks, leaving the choice between the two to the user.


I already started a new run with vsearch this morning. It should finish sometime tomorrow... suspense :slight_smile:


Hi @splaisan, did your vsearch run succeed? I implemented the option to choose between BLAST and vsearch in the MetONTIIME.sh script, and set the vsearch parameters to be as similar to the BLAST ones as possible. If you want, you can modify them, of course. Let me know if you try it out and perform any comparisons.
Simone


Hi Simone. I had to kill my run as it was running very slowly. Colin's probable explanation can be read here: "using database indexing for vsearch" (#2 by colinbrislawn).
I will have a look at the code, but I fear that my Python skills will not be good enough to add extra arguments to the wrapper. I will try, though.
