Problem in assigning taxonomy using VSEARCH consensus taxonomy classifier

Hi there,

I am trying to assign taxonomy to a data set generated with QIIME 1.9.1, following the above-mentioned protocol (the VSEARCH consensus taxonomy classifier). It has been running for almost 24 hours and has still not generated any output. I am using the VBox version of QIIME 2. Could anyone please suggest how much more time it might require, or whether there is a problem with the run?

Regards,
Aishiki

Hi @Aishiki,

That does seem like it is taking a while. How large is your dataset? (QIIME 1 has a tendency to create far more OTUs than QIIME 2, which has a better denoising process.)

Since you are running this in VBox, I would also double-check that you have enough memory and CPU allocated to the virtual machine. If your VBox instance is starved for compute power, then anything will run slowly.
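
If it helps, here is a minimal sketch for checking what the guest OS actually sees, assuming the VBox image is a Linux guest with Python 3 available. The function name `vm_resources` is just illustrative, and the `os.sysconf` keys used here are Linux-specific, so this will not work unchanged on other systems.

```python
import os


def vm_resources():
    """Return (cpu_count, total_memory_gib) as seen from inside the guest OS."""
    cpus = os.cpu_count()
    # SC_PAGE_SIZE and SC_PHYS_PAGES are Linux sysconf names (an assumption
    # about the guest OS); together they give the total physical memory.
    page_size = os.sysconf("SC_PAGE_SIZE")
    phys_pages = os.sysconf("SC_PHYS_PAGES")
    mem_gib = page_size * phys_pages / 1024 ** 3
    return cpus, mem_gib


if __name__ == "__main__":
    cpus, mem_gib = vm_resources()
    print(f"CPUs visible to the VM: {cpus}")
    print(f"Memory visible to the VM: {mem_gib:.1f} GiB")
```

If the numbers are much lower than what the host machine has, raising the allocation in the VirtualBox settings for the VM may be worth trying before re-running.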

Pinging @Nicholas_Bokulich, @BenKaehler: any thoughts on expected runtime?

Thanks @ebolyen. Unfortunately, @Aishiki’s results fit with our latest test results.

We are currently recommending that users avoid using classify-consensus-vsearch for more than tens of sequences.

Fortunately, classify-consensus-blast gives very similar performance to classify-consensus-vsearch in terms of accuracy, but in our tests runs 50 times faster. If run time is still an issue, classify-sklearn was 500 times faster in our tests. There is a tutorial for how to use classify-sklearn here.

We haven’t had a chance to look at why classify-consensus-vsearch is so slow. @Nicholas_Bokulich may have more to say on this in future.

Thanks to both @BenKaehler and @ebolyen for your suggestions. Fortunately, the run finally finished after about 30 hours. I have not yet been able to check the data, but I will keep you updated once I do. I will also try running classify-consensus-blast.
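
For a rough sense of scale, the arithmetic can be sketched as below. This assumes the 50x and 500x speedups quoted earlier in the thread carry over to this particular data set and machine, which has not been verified here; the function name `estimated_hours` is only illustrative.

```python
def estimated_hours(baseline_hours, speedup):
    """Scale a measured runtime by a reported speedup factor."""
    return baseline_hours / speedup


# ~30 hours is the classify-consensus-vsearch runtime reported in this thread.
VSEARCH_HOURS = 30.0

# Speedup factors as quoted in the thread; treating them as transferable
# to this data set is an assumption.
SPEEDUPS = {
    "classify-consensus-blast": 50,
    "classify-sklearn": 500,
}

for method, factor in SPEEDUPS.items():
    hours = estimated_hours(VSEARCH_HOURS, factor)
    print(f"{method}: roughly {hours * 60:.0f} minutes")
```

Under those assumptions, the same job would take on the order of minutes rather than hours with either alternative.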

For tracking purposes we have an open issue to investigate why classify-consensus-vsearch is slow. Thanks!

The performance issue with classify-consensus-vsearch was investigated, and it turns out that vsearch is generally much slower with full-length sequences (see this issue for more details), so the performance is what would be expected. Thanks!