Greengenes database


Can anyone please tell me the differences between these 2 classifiers?

Hi @annsantos!

You have actually pointed to only one classifier (the top arrow). The bottom arrow points to the reference database that was used to train the feature classifier above. You can learn more about the reference database here:

Which one should I use?

For what? You haven’t told us what it is that you’re doing.

I want to do a taxonomic classification. I have samples from saliva and I want to know the taxonomy at each level.

What type of sequences are these? 16S? The answer will impact your choice in reference database to use for taxonomic classification.

Yes, 16S, V3-V4 region.

Hi @annsantos,
Note the headings on the page in question: the first arrow points at the pre-trained naive Bayes classifiers that are ready to be used for classification with the command qiime feature-classifier classify-sklearn.
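
As a minimal sketch of that command, assuming you have already downloaded a pre-trained classifier and have your representative sequences as a QIIME 2 artifact (the function name and file names are placeholders, not files from this thread):

```shell
# Hypothetical wrapper around classify-sklearn; all paths are placeholders.
classify_with_pretrained() {
  local classifier="$1"   # pre-trained classifier artifact (.qza)
  local reads="$2"        # representative sequences, FeatureData[Sequence]
  local out="$3"          # taxonomy artifact to write
  qiime feature-classifier classify-sklearn \
    --i-classifier "$classifier" \
    --i-reads "$reads" \
    --o-classification "$out"
}
```

You would call it as `classify_with_pretrained classifier.qza rep-seqs.qza taxonomy.qza`.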

The second arrow is not pointing at classifiers of any sort, but rather the marker-gene reference databases that are used to train the classifiers above. These reference sequences can be imported then either used to train your own classifier with qiime feature-classifier fit-classifier-naive-bayes, or used directly for alignment/consensus taxonomy classification with classify-consensus-vsearch.
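
Roughly, the two routes look like this. This is only a sketch: the function names and the `ref-seqs.qza` / `ref-taxonomy.qza` file names are made up for illustration, and it assumes the taxonomy file is a headerless two-column TSV. Some newer releases may require extra outputs for classify-consensus-vsearch, so check `--help` for your version.

```shell
# Import reference sequences (FASTA) and taxonomy (ID<TAB>taxonomy, no header)
# as QIIME 2 artifacts. Input paths are supplied by the caller.
import_reference() {
  qiime tools import \
    --type 'FeatureData[Sequence]' \
    --input-path "$1" \
    --output-path ref-seqs.qza
  qiime tools import \
    --type 'FeatureData[Taxonomy]' \
    --input-format HeaderlessTSVTaxonomyFormat \
    --input-path "$2" \
    --output-path ref-taxonomy.qza
}

# Route A: train a naive Bayes classifier on the imported reference.
train_classifier() {
  qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads ref-seqs.qza \
    --i-reference-taxonomy ref-taxonomy.qza \
    --o-classifier classifier.qza
}

# Route B: no training; classify by VSEARCH alignment and consensus taxonomy.
classify_by_consensus() {
  qiime feature-classifier classify-consensus-vsearch \
    --i-query "$1" \
    --i-reference-reads ref-seqs.qza \
    --i-reference-taxonomy ref-taxonomy.qza \
    --o-classification taxonomy-vsearch.qza
}
```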

Since you have V3-V4 16S data, you can either use the full-length pre-trained classifiers on your data, or train your own classifier focused on the V3-V4 region (see the online tutorial for training your own classifier).
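
If you do go the region-specific route, the extra step is trimming the reference to your amplicon before training. A sketch, assuming the reference sequences were imported as `ref-seqs.qza` (a placeholder name) and that you pass in the primers actually used for your samples, which have not been stated in this thread:

```shell
# Trim reference sequences to the amplified region before training.
# Both primers come from the caller; nothing here is specific to one protocol.
extract_region() {
  local f_primer="$1"   # forward primer sequence
  local r_primer="$2"   # reverse primer sequence
  qiime feature-classifier extract-reads \
    --i-sequences ref-seqs.qza \
    --p-f-primer "$f_primer" \
    --p-r-primer "$r_primer" \
    --o-reads ref-seqs-v3v4.qza
}
```

For example, with the commonly cited 341F/805R pair (an assumption, so substitute your own): `extract_region CCTACGGGNGGCWGCAG GACTACHVGGGTATCTAATCC`. The trimmed reads would then be used in place of the full-length sequences when training.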

Good luck!

I don’t want to train mine. So I should use the first one that I pointed out?

Correct, use one of the pre-trained classifiers.

Might I note: training your own classifier could be highly beneficial. I have used the HOMD database in the past for saliva bacterial classification with q2-feature-classifier. It is an oral microbiome-specific database, and using it together with q2-feature-classifier should increase the likelihood of species-level classification. It is worth comparing against Greengenes or SILVA.

If you’re up for a challenge (that will increase species classification accuracy even more) you could also try training a classifier with oral microbiome-specific taxonomic weights, as described in this tutorial:

You could follow that tutorial with either the HOMD or greengenes or SILVA sequences. See the “more exotic weights” section and swap out this line:

redbiom search metadata "cheese where cheese_type=='stilton'" > sample_ids

for this:

redbiom search metadata "where host_taxid==9606 and sample_type in ('Oral', 'oral', 'Mouth', 'mouth', 'Saliva', 'saliva')" > sample_ids

Yes, you are right. I am going to try both of your options.
Starting with the first one, I would have to download the HOMD database, and then what? I find the q2-feature-classifier tutorial a little bit confusing.

Yes — you would follow this tutorial with the exception that you download and format the HOMD database, rather than download greengenes.

Note: the HOMD database may need to be formatted appropriately (see the greengenes files or the examples in that tutorial for the appropriate formats). It looks like there are QIIME-formatted taxonomy files here, but since these are released by HOMD I can't guarantee that they will work "as is".
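
One quick sanity check before importing, as a sketch: the greengenes taxonomy files are plain two-column, tab-separated text (sequence ID, then taxonomy string) with no header. The helper below is hypothetical (not part of QIIME 2 or HOMD) and only reports lines that break that layout; it does not verify that the IDs match your FASTA headers.

```shell
# Report lines that do not have exactly two non-empty, tab-separated fields
# (sequence ID, taxonomy string). Exits non-zero if any such line is found.
check_taxonomy_tsv() {
  awk -F '\t' '
    NF != 2 || $1 == "" || $2 == "" {
      printf "line %d: expected ID<TAB>taxonomy\n", NR
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}
```

Run it as `check_taxonomy_tsv your-taxonomy-file.txt`; no output means every line has the expected two columns.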

