gg-13-8-99-515-806-nb-classifier.qza is not up-to-date

Hello, I hope you are doing well.

I have been able to analyse my data, but I noticed that at the genus level I have "UNCLASSIFIED CLOSTRIDIACEAE", which is a family name.

I suspect that the classifier I used (gg-13-8-99-515-806-nb-classifier.qza) is not up-to-date... so if anyone can point me to the latest version of the classifier, I would greatly appreciate it.

Thanks again.

You are correct!

We have a new page https://resources.qiime2.org/ that has an updated version of Greengenes2 along with SILVA, GTDB, and other databases, all trained for use with the nb-classifier.

Let us know if you have more questions!
(I split this into a new thread, as that old one is pretty old)

Hello Colin,
Thanks for your reply. I went through the link you provided. It helped me understand more about databases... I do have a few questions to get set up.

  1. Are databases and classifiers different things?
  2. I am working with the V3-V4 region of the human gut microbiome; should I use a classifier specific to the gut microbiome?
  3. What are the implications of using either the SILVA or the Greengenes2 classifier?

Hi @kevnael,

Hope I can help to clarify a few items here. You are correct that databases are distinct from classifiers: there are many databases, and many different ways of performing taxonomic classification.

A comprehensive review of different strategies, which ultimately informed q2-feature-classifier, can be found here. Environment-specific weights for naive Bayes classifiers can improve classification as well. If your data are 515F-806R, you can additionally consider using the phylogenetic taxonomy in Greengenes2.
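
For example, a rough sketch of that phylogenetic route for V4 data (the action and parameter names here are from the q2-greengenes2 plugin as I recall them, and the file names are placeholders, so double-check against the plugin's documentation):

    # Keep only ASVs that exactly match the Greengenes2 reference (V4 data)
    qiime greengenes2 filter-features \
      --i-feature-table table.qza \
      --i-reference 2022.10.taxonomy.asv.nwk.qza \
      --o-filtered-feature-table table-gg2.qza

    # Read taxonomy for the retained features straight from the reference phylogeny
    qiime greengenes2 taxonomy-from-table \
      --i-reference-taxonomy 2022.10.taxonomy.asv.nwk.qza \
      --i-table table-gg2.qza \
      --o-classification taxonomy-gg2.qza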

Both SILVA and Greengenes2 are well-characterized and widely used resources. Historically, SILVA has been more informative in marine environments, although I'm not sure whether there is an up-to-date independent assessment. Greengenes (1 and 2) derive taxonomy from a de novo phylogeny, and in Greengenes2 specifically, the taxonomy is explicitly phylogenetic. It is important to appreciate, though, that no database is perfect, all databases suffer from error and bias, and taxonomy is a human construct that changes over time.

As to whether these choices matter? They may matter a lot or very little; it ultimately depends on the questions being pursued. In the absence of a specific reason to choose a particular database, or a particular algorithm for classification, a defensible approach is to follow the methodology of recent publications adjacent to your research focus (with the caveat that using the latest version of a resource is a good idea).

All the best,
Daniel

Thanks Daniel.

I think I may need help here. What happened is that I needed to train a SILVA classifier myself, because my version of scikit-learn (1.4.2) does not match the version (0.24.1) used to generate the pre-trained artifact. Following the instructions from the resources page, I tried training my classifier, but it requires a lot of computing capacity; I only have 16 GB of RAM.
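
For context, the training steps I followed look roughly like this (a sketch; the primers shown are the standard 515F/806R pair, so you would substitute your own V3-V4 primers, and the reference file names are just placeholders for the SILVA sequence and taxonomy artifacts):

    # Trim the full-length reference sequences to the amplified region
    qiime feature-classifier extract-reads \
      --i-sequences silva-138-99-seqs.qza \
      --p-f-primer GTGYCAGCMGCCGCGGTAA \
      --p-r-primer GGACTACNVGGGTWTCTAAT \
      --o-reads ref-seqs.qza

    # Train the naive Bayes classifier (this is the memory-hungry step)
    qiime feature-classifier fit-classifier-naive-bayes \
      --i-reference-reads ref-seqs.qza \
      --i-reference-taxonomy silva-138-99-tax.qza \
      --o-classifier silva-classifier.qza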

Please, is there a classifier that I can use directly in my situation, or a way to train my own classifier with the capacity I have?

Thanks again for considering all my requests.

Hi @kevnael, There are different versions of the pre-trained classifiers on the page that @colinbrislawn shared earlier (https://resources.qiime2.org/). The specific version of scikit-learn used for training is noted for each classifier, so you should be able to find one that will work for you.
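
One quick way to see which artifact matches your environment is to print the scikit-learn version installed alongside QIIME 2, for example:

    # Run inside the activated QIIME 2 environment
    python -c "import sklearn; print(sklearn.__version__)"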

Hello Greg,

Thank you for your feedback.
I have tried greengenes2/2022.10.backbone.v4.nb.sklearn-1.4.2.qza with the command below:

    qiime feature-classifier classify-sklearn \
      --i-classifier 2022.10.backbone.v4.nb.sklearn-1.4.2.qza \
      --i-reads asv-sequences-0.qza \
      --o-classification taxonomy.qza

After a while, I got a message saying "Killed".

I don't know what that means.

  1. Can my 16 GB of RAM not handle it?
  2. Did I use the pre-trained classifier in the wrong way?

Hi @kevnael, This is almost certainly a memory issue. If you can run this on a system with more RAM (I think 64 GB should do it), that should address the problem.
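
If you want to confirm it was the kernel's out-of-memory killer that stopped the job, you can check the kernel log on Linux (a sketch; the exact message and whether you need sudo vary by system):

    # Look for out-of-memory kills in the kernel log
    dmesg | grep -i "out of memory"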

Hello Greg

Totally right. I asked someone to train it on their computer, and it worked.
I shall test it in a moment.

Thanks for all the insights.
