diversity differences using pretrained classifiers

Melisa_Olivelli · June 16, 2020, 3:31pm

Hi Everyone!
I performed a taxonomic classification using a classifier I trained myself on gg-13-8 at 97% (I can’t train a 99% classifier because of memory problems). I obtained a really nice barplot. But then I used the pretrained gg-13-8-99-515-806-nb-classifier.qza (which matches my primers) and obtained a barplot with more species-level classifications but a lot less diversity. I wanted to use a pretrained SILVA classifier to compare, but a memory error message keeps appearing. Any suggestions on which data I should use? I thought a classifier I trained myself would be the best, but I get more diversity with the 97% than with the 99%. I am pretty new to this.
Thanks!

jwdebelius · June 16, 2020, 4:09pm

Hi @Melisa_Olivelli,

Okay, I might be a stickler, but I’m not sure what you mean by “more diversity” on the barchart. Barcharts don’t display diversity; they show you composition. (If we’re being sticklers, alpha/beta == diversity & composition == composition. Collapsed diversity is probably less informative than feature-level diversity, and you can’t do phylogenetic diversity on a collapsed table.)
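To make the distinction concrete, here is a minimal sketch in plain Python (not QIIME 2 code). The two samples and their counts are made up for illustration, and `richness` and `shannon` are hypothetical helper names. It shows that a sample can have more taxa on a barplot and still have lower alpha diversity:

```python
import math

def richness(counts):
    """Number of taxa with a non-zero count (what 'more taxa' on a barplot means)."""
    return sum(1 for c in counts.values() if c > 0)

def shannon(counts):
    """Shannon entropy (natural log) of the relative abundances."""
    total = sum(counts.values())
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h

# Made-up counts, for illustration only.
sample_a = {"Lactobacillus": 50, "Bacteroides": 50}
sample_b = {"Lactobacillus": 97, "Bacteroides": 1, "Prevotella": 1, "Blautia": 1}

for name, sample in [("a", sample_a), ("b", sample_b)]:
    print(name, richness(sample), round(shannon(sample), 3))
```

Sample b has twice as many taxa as sample a, but because one taxon dominates it, its Shannon diversity is lower. That is why counting taxa on a barplot and measuring diversity are not the same thing.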

I would look at the classification accuracy when you consider this: how well do your sequences classify? I also tend to stick to higher-specificity references (99 vs 97) because they should be more precise. (That said, “species” is a whole weird thing in microbiology.) So, I guess, my suggestion is that you use a 99% classifier. I think you may have an easier time publishing with SILVA, but it’s definitely bigger and more memory intensive.
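One rough way to compare how well sequences classify under two classifiers is to tally the deepest rank each feature was assigned to. The sketch below is plain Python, not part of QIIME 2. It assumes Greengenes-style taxonomy strings (`k__...; p__...; ...; s__`), and the function names and example strings are hypothetical:

```python
from collections import Counter

RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]

def deepest_rank(taxon):
    """Return the deepest rank that has a non-empty name in a Greengenes-style string."""
    if taxon.strip().lower() == "unassigned":
        return "unassigned"
    depth = 0
    for level in taxon.split(";"):
        level = level.strip()
        # Drop the rank prefix (e.g. "g__") if present; an empty name means unresolved.
        name = level.split("__", 1)[1] if "__" in level else level
        if not name:
            break
        depth += 1
    depth = min(depth, len(RANKS))
    return RANKS[depth - 1] if depth else "unassigned"

def rank_summary(taxa):
    """Count how many features resolve to each deepest rank."""
    return Counter(deepest_rank(t) for t in taxa)

# Made-up taxonomy strings, for illustration only.
example = [
    "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Lactobacillaceae; g__Lactobacillus; s__",
    "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Lactobacillaceae; g__Lactobacillus; s__reuteri",
    "Unassigned",
]
print(rank_summary(example))
```

Running the same tally on the taxonomy from each classifier gives a quick side-by-side view of how many features reach genus or species level. It says nothing about whether those assignments are correct.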

I do not recommend training your own classifier if you don’t have to. It’s memory intensive and I’m all about doing memory intensive things on other people’s computers. I’m actually pretty firmly in the camp that you really only need to train your own classifier if (a) you’re using a custom/non-standard database or (b) you have a region you’re specifically targeting and you’re going to re-use it. If you’re just starting out, or you’re doing a one-off microbiome project, or you don’t have the memory, a pretrained classifier you get from here is going to be your best bet.

Best,
Justine

Melisa_Olivelli · June 16, 2020, 5:25pm

Thanks! I meant a bigger number of taxa. But I will definitely stick to the 99% classifier. I would try the 99% pretrained SILVA, but the computer that would let me do this is at work, and we are still in lockdown these days…
Stay safe!
