Greengenes database

annsantos · October 11, 2019, 2:26pm

Hi,

Can anyone please tell me the differences between these two classifiers?

thermokarst · October 11, 2019, 2:31pm

You've actually only pointed to one classifier (the top arrow). The bottom arrow points to the reference database that was used to train that feature classifier. You can learn more about the reference database here: https://aem.asm.org/content/72/7/5069

annsantos · October 11, 2019, 2:36pm

Which one should I use?

thermokarst · October 11, 2019, 2:36pm

For what? You haven't told us what it is that you're doing.

annsantos · October 11, 2019, 2:38pm

I want to do taxonomic classification. I have saliva samples and I want to know the taxonomy at each taxonomic level.

thermokarst · October 11, 2019, 2:42pm

What type of sequences are these? 16S? The answer will impact your choice in reference database to use for taxonomic classification.

annsantos · October 11, 2019, 3:46pm

Yes, 16S, V3-V4 region.

Nicholas_Bokulich · October 11, 2019, 4:54pm

Hi @annsantos,
Note the headings on the page in question: the first arrow points at the pre-trained naive Bayes classifiers that are ready to be used for classification with the command qiime feature-classifier classify-sklearn.
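As a minimal sketch of that step, assuming your representative sequences are already in a `FeatureData[Sequence]` artifact: the file names are placeholders, and `classify_with_pretrained` is a hypothetical wrapper around the two QIIME 2 commands, not part of QIIME 2 itself.

```shell
# Hypothetical wrapper: classify rep-seqs with a pre-trained naive Bayes
# classifier, then tabulate the result for viewing in QIIME 2 View.
# $1: pre-trained classifier (.qza), $2: rep-seqs (.qza), $3: output prefix
classify_with_pretrained() {
  qiime feature-classifier classify-sklearn \
    --i-classifier "$1" \
    --i-reads "$2" \
    --o-classification "$3.qza" || return
  # Browsable table of feature ID, taxon and confidence
  qiime metadata tabulate \
    --m-input-file "$3.qza" \
    --o-visualization "$3.qzv"
}

# Example (placeholder file names):
# classify_with_pretrained classifier.qza rep-seqs.qza taxonomy
```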

The second arrow is not pointing at classifiers of any sort, but rather at the marker-gene reference databases that are used to train the classifiers above. These reference sequences can be imported and then either used to train your own classifier with qiime feature-classifier fit-classifier-naive-bayes, or used directly for alignment-based consensus taxonomy classification with classify-consensus-vsearch.
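A rough sketch of both routes, assuming the reference comes as a FASTA file plus a headerless tab-separated taxonomy file (the layout Greengenes uses). The helper names and file paths are hypothetical; only the `qiime` subcommands and options are real, and option sets can change between QIIME 2 releases, so check `--help` for your version.

```shell
# Import a reference database into <prefix>-seqs.qza and <prefix>-taxonomy.qza.
# $1: reference FASTA, $2: headerless "ID<TAB>lineage" taxonomy, $3: prefix
import_reference() {
  qiime tools import \
    --type 'FeatureData[Sequence]' \
    --input-path "$1" \
    --output-path "$3-seqs.qza" || return
  qiime tools import \
    --type 'FeatureData[Taxonomy]' \
    --input-format HeaderlessTSVTaxonomyFormat \
    --input-path "$2" \
    --output-path "$3-taxonomy.qza"
}

# Route 1: train your own naive Bayes classifier for classify-sklearn.
# $1: prefix used at import time
train_naive_bayes() {
  qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads "$1-seqs.qza" \
    --i-reference-taxonomy "$1-taxonomy.qza" \
    --o-classifier "$1-classifier.qza"
}

# Route 2: no training; alignment plus consensus taxonomy assignment.
# Newer QIIME 2 releases may also require --o-search-results here.
# $1: reference prefix, $2: rep-seqs (.qza), $3: output prefix
classify_vsearch() {
  qiime feature-classifier classify-consensus-vsearch \
    --i-query "$2" \
    --i-reference-reads "$1-seqs.qza" \
    --i-reference-taxonomy "$1-taxonomy.qza" \
    --o-classification "$3.qza"
}

# Example with Greengenes files (placeholder paths):
# import_reference 99_otus.fasta 99_otu_taxonomy.txt gg99
# train_naive_bayes gg99
```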

Since you have V3-V4 16S data, you can either use the full-length pre-trained classifiers on your data or train your own classifier focused on the V3-V4 region (see the online tutorial on training your own classifier).
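Focusing a classifier on V3-V4 usually means trimming the imported reference reads to the amplicon with `extract-reads` before training. This sketch assumes the reads were already imported as `<prefix>-seqs.qza`. The default primers are an assumption (the widely used 341F/805R V3-V4 pair), so substitute the primers from your own library prep. `extract_v34` is a hypothetical helper.

```shell
# Trim imported reference reads to the V3-V4 amplicon.
# $1: prefix of the imported reference (<prefix>-seqs.qza)
# $2, $3: optional forward/reverse primers (defaults are an ASSUMPTION)
extract_v34() {
  qiime feature-classifier extract-reads \
    --i-sequences "$1-seqs.qza" \
    --p-f-primer "${2:-CCTACGGGNGGCWGCAG}" \
    --p-r-primer "${3:-GACTACHVGGGTATCTAATCC}" \
    --o-reads "$1-v34-seqs.qza"
}

# Then train on the trimmed reads with the original taxonomy, e.g.:
# qiime feature-classifier fit-classifier-naive-bayes \
#   --i-reference-reads gg99-v34-seqs.qza \
#   --i-reference-taxonomy gg99-taxonomy.qza \
#   --o-classifier gg99-v34-classifier.qza
```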

Good luck!

annsantos · October 11, 2019, 5:35pm

I don't want to train my own. So I should use the first one that I pointed out?

Nicholas_Bokulich · October 11, 2019, 5:52pm

Correct, use one of the pre-trained classifiers.
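After classification, one way to see every taxonomic level is to build bar plots and per-level collapsed tables from the feature table. This is a sketch assuming you have `table.qza`, `taxonomy.qza` and a sample metadata file (placeholder names). `summarise_levels` is a hypothetical helper. The level numbering follows Greengenes (2 = phylum through 7 = species).

```shell
# Summarise a feature table at each taxonomic level.
# $1: feature table (.qza), $2: taxonomy (.qza), $3: sample metadata file
summarise_levels() {
  # Interactive stacked bar plots; the level can be switched in QIIME 2 View
  qiime taxa barplot \
    --i-table "$1" \
    --i-taxonomy "$2" \
    --m-metadata-file "$3" \
    --o-visualization taxa-bar-plots.qzv || return
  # One collapsed table per level (2 = phylum ... 7 = species)
  for level in 2 3 4 5 6 7; do
    qiime taxa collapse \
      --i-table "$1" \
      --i-taxonomy "$2" \
      --p-level "$level" \
      --o-collapsed-table "table-l$level.qza" || return
  done
}
```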

Might I note: training your own classifier could be highly beneficial. I have used the HOMD database in the past for classifying saliva bacteria with q2-feature-classifier; it is an oral-microbiome-specific database, and using it with q2-feature-classifier will increase the likelihood of species-level classification. It is worth comparing against Greengenes or SILVA.
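For a rough first comparison, you can count how many features each classifier assigned down to a named species. This sketch works on a `taxonomy.tsv` exported from QIIME 2. It assumes Greengenes-style `s__` (or SILVA 132-style `D_6__`) prefixes on the 7th rank, so adjust the pattern for HOMD's own format. `species_level_fraction` is a hypothetical helper. A higher count means more species-level calls, not necessarily more correct ones.

```shell
# Report "<features with a named species>/<total features>" for a taxonomy.tsv
# exported via: qiime tools export --input-path taxonomy.qza --output-path exported/
# ASSUMPTION: the 7th semicolon-separated rank is species, prefixed s__ or D_6__.
species_level_fraction() {
  awk -F'\t' 'NR > 1 {                  # skip the Feature ID/Taxon/Confidence header
      total++
      n = split($2, ranks, ";")
      if (n >= 7) {
        sp = ranks[7]
        gsub(/^[ \t]+|[ \t]+$/, "", sp)   # trim surrounding whitespace
        sub(/^(s__|D_6__)/, "", sp)       # drop the rank prefix
        if (sp != "") hits++              # an empty "s__" is not a species call
      }
    }
    END { printf "%d/%d\n", hits, total }' "$1"
}

# Example: same rep-seqs, two classifiers
# species_level_fraction gg-exported/taxonomy.tsv
# species_level_fraction homd-exported/taxonomy.tsv
```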

If you're up for a challenge (one that will increase species-level classification accuracy even more), you could also try training a classifier with oral-microbiome-specific taxonomic weights, as described in this tutorial:

You could follow that tutorial with the HOMD, Greengenes, or SILVA sequences. See the "more exotic weights" section and swap out this line:

redbiom search metadata "cheese where cheese_type=='stilton'" > sample_ids

for this:

redbiom search metadata "where host_taxid==9606 and sample_type in ('Oral', 'oral', 'Mouth', 'mouth', 'Saliva', 'saliva')" > sample_ids
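If you want to try more sample_type spellings, the query string is easy to get wrong by hand. This small sketch rebuilds the same query from a list of spellings. `oral_query` is a hypothetical helper and not part of redbiom. It is worth checking that the search matched any samples before continuing.

```shell
# Build the redbiom metadata query for human (taxid 9606) oral samples
# from a list of sample_type spellings.
oral_query() {
  local types t
  types=""
  for t in "$@"; do
    # Append "'value'", separated by ", " after the first entry
    types="${types}${types:+, }'$t'"
  done
  printf "where host_taxid==9606 and sample_type in (%s)\n" "$types"
}

# Usage:
# redbiom search metadata "$(oral_query Oral oral Mouth mouth Saliva saliva)" > sample_ids
# wc -l < sample_ids    # zero means the query matched nothing
```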

annsantos · October 11, 2019, 6:00pm

Yes, you are right. I am going to try both of your options.
Starting with the first one, I would have to download the HOMD database, and then what? I find the q2-feature-classifier tutorial a little confusing.

Nicholas_Bokulich · October 11, 2019, 6:19pm

Yes, you would follow this tutorial, except that you download and format the HOMD database rather than downloading Greengenes.

Note: the HOMD database may need to be formatted appropriately (use the Greengenes files or the examples in that tutorial as a guide to the expected formats). It looks like there are QIIME-formatted taxonomy files here, but since these are released by HOMD I can't guarantee that they will work "as is"...
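A quick structural check before importing can catch formatting problems early. This sketch assumes the Greengenes-style layout: a FASTA file plus a headerless tab-separated taxonomy file with the sequence ID in column 1 and the lineage in column 2. Because training matches sequence IDs to taxonomy IDs, it reports any FASTA ID without a taxonomy entry. `check_reference_pair` is a hypothetical helper, not a QIIME 2 command, and it does not judge whether the lineage strings themselves are sensible.

```shell
# Check a reference FASTA against a headerless taxonomy TSV.
# $1: reference FASTA, $2: taxonomy TSV (ID<TAB>lineage[<TAB>...])
# Prints problems and returns non-zero if any are found.
check_reference_pair() {
  awk -F'\t' '
    FNR == NR {                     # first file read: the taxonomy TSV
      if (NF < 2) { printf "taxonomy line %d: expected at least 2 tab-separated columns\n", FNR; bad = 1 }
      tax[$1] = 1
      next
    }
    /^>/ {                          # second file read: FASTA header lines
      id = substr($0, 2)
      sub(/[ \t].*$/, "", id)       # the ID is the text up to the first whitespace
      if (!(id in tax)) { printf "no taxonomy entry for %s\n", id; bad = 1 }
    }
    END { exit bad }' "$2" "$1"
}

# Example (placeholder file names):
# check_reference_pair homd_seqs.fasta homd_taxonomy.tsv
```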

system · November 12, 2019, 12:19am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.