Time running classify-consensus-blast: BLAST+

Hello, I'm using classify-consensus-blast (BLAST+) on one paired-end sample at a time. Sometimes everything is fine and it finishes quickly, but other times, even with the same sample, it runs for more than an hour and does not finish. I checked the system monitor during a slow run and noticed that CPU usage is very low: I gave my VM 4 CPUs, and most of the time each one is at only 2% to 5%, with a maximum of about 20%. Is there something I can do to make it faster?
Thank you very much for the help.

Hi @Angelica,

A few things:

This is the first problem: right now classify-consensus-blast cannot be parallelized.

Use classify-consensus-vsearch for an alternative alignment-based + LCA classifier in q2-feature-classifier that supports multithreading.
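As a rough sketch of what that call could look like, the snippet below only assembles the command line and prints it. The file names and the output directory are placeholders I made up, and the option names are written as I recall them from the plugin's command-line help, so please confirm them with `qiime feature-classifier classify-consensus-vsearch --help` for your QIIME 2 version.

```python
def build_vsearch_command(query, ref_reads, ref_taxonomy, output_dir, threads=4):
    """Build the argument list for a multithreaded classify-consensus-vsearch run.

    All paths are hypothetical placeholders; nothing is executed here.
    """
    return [
        "qiime", "feature-classifier", "classify-consensus-vsearch",
        "--i-query", query,                      # sequences to classify
        "--i-reference-reads", ref_reads,        # reference sequences
        "--i-reference-taxonomy", ref_taxonomy,  # reference taxonomy
        "--p-threads", str(threads),             # threads for the alignment stage
        "--output-dir", output_dir,              # directory for all outputs
    ]


cmd = build_vsearch_command(
    "rep-seqs.qza", "ref-seqs.qza", "ref-taxonomy.qza", "vsearch-out", threads=4
)
print(" ".join(cmd))
```

To actually launch it from Python you could pass `cmd` to `subprocess.run(cmd, check=True)` inside an environment where QIIME 2 is installed.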

You may be introducing redundancy by classifying one sample at a time: if the samples share many of the same features, you are effectively classifying the same sequences over and over again (and getting the same answer!). You should merge all of your FeatureData[Sequence] artifacts into a single FASTA to use as input to q2-feature-classifier. This will be much more efficient, unless each sample for some reason contains only unique features.
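To illustrate the redundancy point, here is a small toy example in plain Python. The sample names, feature IDs, and sequences are invented, and this is not how QIIME 2 stores or merges artifacts (there the merge would be done on the artifacts themselves, for example with `qiime feature-table merge-seqs`); it only shows how much repeated work per-sample classification can involve.

```python
def merge_unique_sequences(per_sample):
    """Merge per-sample {feature_id: sequence} dicts, keeping each sequence once.

    Returns (unique, total): unique maps each distinct sequence to the first
    feature ID seen for it, and total counts records across all samples.
    """
    unique = {}
    total = 0
    for records in per_sample.values():
        for feature_id, sequence in records.items():
            total += 1
            # Keep only the first feature ID observed for a given sequence.
            unique.setdefault(sequence.upper(), feature_id)
    return unique, total


# Invented toy data: three samples that share most of their features.
samples = {
    "sample_A": {"f1": "ACGTACGT", "f2": "TTGGCCAA"},
    "sample_B": {"f3": "ACGTACGT", "f4": "GGGGCCCC"},
    "sample_C": {"f5": "TTGGCCAA", "f6": "ACGTACGT"},
}

unique, total = merge_unique_sequences(samples)
print(f"{total} records across samples, {len(unique)} unique sequences to classify")
```

In this toy case, classifying per sample would align 6 sequences, while classifying the merged set aligns only 3.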

Note that the classify-consensus-* methods have two stages: the alignment stage (using blast+ or vsearch) and then q2-feature-classifier performs the LCA consensus classification based on those alignment results. Even if you run classify-consensus-vsearch with multiple threads, the LCA part will still be run as a single job. That step does not usually take too much time (compared to alignment) but it does mean that checking your CPU usage might not be too revealing if you check while the LCA step is running.
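For intuition about the second stage, below is a simplified sketch of an LCA-style consensus over the taxonomies of the top alignment hits. It is an illustration only, not the code in q2-feature-classifier; the function name, the `min_consensus` threshold of 0.51, and the example lineages are my own assumptions for the example.

```python
from collections import Counter


def consensus_taxonomy(hit_taxonomies, min_consensus=0.51):
    """Return the deepest lineage shared by at least min_consensus of the hits.

    hit_taxonomies is a list of semicolon-delimited lineage strings, one per
    alignment hit. Illustrative only; the threshold is an assumed value.
    """
    if not hit_taxonomies:
        return "Unassigned"
    lineages = [[rank.strip() for rank in t.split(";")] for t in hit_taxonomies]
    consensus = []
    depth = 0
    while True:
        # Count the next-rank extensions among hits that agree with the
        # consensus built so far.
        candidates = Counter(
            tuple(lin[: depth + 1])
            for lin in lineages
            if len(lin) > depth and lin[:depth] == consensus
        )
        if not candidates:
            break
        best, count = candidates.most_common(1)[0]
        # Stop descending once too few of all hits support the next rank.
        if count / len(lineages) < min_consensus:
            break
        consensus = list(best)
        depth += 1
    return "; ".join(consensus) if consensus else "Unassigned"


hits = [
    "k__Bacteria; p__Firmicutes; c__Bacilli",
    "k__Bacteria; p__Firmicutes; c__Clostridia",
    "k__Bacteria; p__Firmicutes; c__Bacilli",
    "k__Bacteria; p__Proteobacteria",
]
print(consensus_taxonomy(hits))  # k__Bacteria; p__Firmicutes
```

A loop like this runs once per query sequence in a single process, which is why extra threads do not speed this stage up.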

I hope that helps!

Thank you very much for the help! I will try both :relaxed:
