classify-consensus-vsearch run time taking forever

lucky_endophyte · April 22, 2024, 10:37pm

Hi there,
I'm currently running taxonomy assignment with the classify-consensus-vsearch command on a large number of merged MiSeq sequences that are about 350 bp long. I'm using the current UNITE eukaryote database and have well over 200,000 OTUs. So far it has been running for about 65 hours and hasn't finished (I forgot to add --verbose at the end), and this isn't even my largest data set: I have another set with over 60 samples and expect its OTU count to be higher.

I am running them on my MacBook Pro (M2) with 8 GB of RAM.

Is there any way to speed this process up without using so much RAM that the job gets killed? I want to use --p-threads NTHREADS, but I'm not sure how it works or how many threads my laptop can handle.

Here is what I'm running:

qiime feature-classifier classify-consensus-vsearch \
  --i-reference-reads unite-ver9_97-eukseqs_2024.qza \
  --i-reference-taxonomy unite-ver9-97-euktax24.qza \
  --i-query filtered-rep-seqs-native.qza \
  --o-classification vsearch-taxeuk-native.qza \
  --o-search-results taxeuk-single-end-native.qza

Thanks so much!

cherman2 · April 22, 2024, 10:54pm

Hi @lucky_endophyte,
You can run the following to see how many cores you have, and then use that many threads for this command:

sysctl -n hw.ncpu

However, with only 8 GB of RAM, I am not sure you will be able to speed up the process without the job being killed.
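A minimal sketch of how the core count could be fed into the command from the question. The helper names pick_threads and run_vsearch_classifier are hypothetical, and leaving one core free is an assumption (not a QIIME 2 recommendation); given the memory caveat above, you may want to start with an even lower thread count on an 8 GB machine. It also adds --verbose so progress is printed.

```shell
# Hypothetical helper: choose a thread count, leaving one core free.
pick_threads() {
  local ncpu
  # macOS reports logical CPUs via sysctl; fall back to nproc on Linux, else 1.
  ncpu=$(sysctl -n hw.ncpu 2>/dev/null || nproc 2>/dev/null || echo 1)
  if [ "$ncpu" -gt 1 ]; then
    echo $((ncpu - 1))
  else
    echo 1
  fi
}

# Hypothetical wrapper around the command from the question,
# with --p-threads and --verbose added.
run_vsearch_classifier() {
  local threads
  threads=$(pick_threads)
  qiime feature-classifier classify-consensus-vsearch \
    --i-reference-reads unite-ver9_97-eukseqs_2024.qza \
    --i-reference-taxonomy unite-ver9-97-euktax24.qza \
    --i-query filtered-rep-seqs-native.qza \
    --p-threads "$threads" \
    --o-classification vsearch-taxeuk-native.qza \
    --o-search-results taxeuk-single-end-native.qza \
    --verbose
}
```

After pasting this into a terminal inside the QIIME 2 environment, run `run_vsearch_classifier`. If the job still gets killed, replace `"$threads"` with a small fixed number such as 2.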

timanix · April 23, 2024, 8:34am

Hello!
In addition to @cherman2's recommendations, another option would be to filter your rep-seqs.qza file.
Usually I filter my feature table to remove OTUs/ASVs that are found fewer than 10 times and in fewer than 2 samples. Then I filter the representative sequences file based on the filtered feature table. This reduces the number of sequences that need to be annotated, which speeds up the process.
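A minimal sketch of the feature-table step described above, using `qiime feature-table filter-features`. The file names table-native.qza and table-native-filtered.qza and the function name filter_rare_features are hypothetical. When both thresholds are set, a feature is kept only if it has a total frequency of at least 10 and appears in at least 2 samples.

```shell
# Hypothetical helper: drop features with a total frequency below 10
# or present in fewer than 2 samples.
filter_rare_features() {
  qiime feature-table filter-features \
    --i-table table-native.qza \
    --p-min-frequency 10 \
    --p-min-samples 2 \
    --o-filtered-table table-native-filtered.qza
}
```

Run `filter_rare_features`, then adjust the two thresholds to suit your data set.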

Best,

lucky_endophyte · April 23, 2024, 4:40pm

Thank you! I only removed singletons, but I'm definitely planning on removing more. How do I remove sequences that are found in fewer than two samples? And could you clarify what you mean by filtering the representative sequences file based on your feature table? Sorry for all the questions, I'm a complete newbie.

timanix · April 23, 2024, 5:34pm

Have you already checked the filtering tutorial? It has lots of examples of how to filter a feature table. Filtering representative sequences isn't covered in the tutorial, but if you take a look at the plugin documentation for sequence filtering, you will see that it can take a feature table as input and filter the sequences based on it.
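A minimal sketch of filtering representative sequences against a feature table with `qiime feature-table filter-seqs`, which keeps only sequences whose feature IDs are present in the table passed via `--i-table`. The input names rep-seqs-native.qza and table-native-filtered.qza and the function name filter_seqs_by_table are hypothetical; the output name matches the `--i-query` file from the original command.

```shell
# Hypothetical helper: keep only sequences whose IDs remain in the filtered table.
filter_seqs_by_table() {
  qiime feature-table filter-seqs \
    --i-data rep-seqs-native.qza \
    --i-table table-native-filtered.qza \
    --o-filtered-data filtered-rep-seqs-native.qza
}
```

Running `filter_seqs_by_table` before classify-consensus-vsearch means only the retained features are searched against UNITE.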

I don't think you're a newbie anymore; it looks like you're doing great!

Best,