Greengenes assigning wrong family to genus level

Hey guys,

I'm comparing the taxonomic classifications produced by kraken2, Greengenes (gg-2022-10-nb-classifier.qza) and SILVA (silva-138-99-nb-classifier.qza). I'm using QIIME 2 version 2023.5, and I ran all three on the same dataset.

My problem is this: at the family level, both kraken2 and SILVA identified f__Prevotellaceae, and only Greengenes did not; yet at the genus level all three identified Prevotella. According to the NCBI taxonomy, anything classified as Prevotella should be classified as Prevotellaceae at the family level, but Greengenes placed the genus Prevotella in the family Bacteroidaceae (f__Bacteroidaceae; g__Prevotella). How can that be? Is it wrong? I don't know if I'm doing something wrong.

I have seen this other topic, "greengenes taxonomy: bacteria belonging to same genus but in different family", but now I wonder whether this is a Greengenes thing.

Here are the commands I used (to compare the percentage of reads):

#greengenes

qiime feature-classifier classify-sklearn \
  --i-classifier gg-2022-10-nb-classifier.qza \
  --i-reads rep-seqs-dada2.qza \
  --o-classification taxonomy.qza

#merge

qiime feature-table group \
  --i-table table-dada2.qza \
  --m-metadata-file sample-metadata.tsv \
  --m-metadata-column Host_disease \
  --p-mode sum \
  --p-axis sample \
  --o-grouped-table grouped-table.qza

#putting in each taxonomic level:

#class

qiime taxa collapse \
  --i-table grouped-table.qza \
  --i-taxonomy taxonomy.qza \
  --p-level 3 \
  --o-collapsed-table collapsed-table-l3.qza

qiime feature-table relative-frequency \
  --i-table collapsed-table-l3.qza \
  --o-relative-frequency-table percentagetable-l3.qza

qiime metadata tabulate \
  --m-input-file percentagetable-l3.qza \
  --o-visualization percentagetable-l3.qzv

#order

qiime taxa collapse \
  --i-table grouped-table.qza \
  --i-taxonomy taxonomy.qza \
  --p-level 4 \
  --o-collapsed-table collapsed-table-l4.qza

qiime feature-table relative-frequency \
  --i-table collapsed-table-l4.qza \
  --o-relative-frequency-table percentagetable-l4.qza

qiime metadata tabulate \
  --m-input-file percentagetable-l4.qza \
  --o-visualization percentagetable-l4.qzv

#family

qiime taxa collapse \
  --i-table grouped-table.qza \
  --i-taxonomy taxonomy.qza \
  --p-level 5 \
  --o-collapsed-table collapsed-table-l5.qza

qiime feature-table relative-frequency \
  --i-table collapsed-table-l5.qza \
  --o-relative-frequency-table percentagetable-l5.qza

qiime metadata tabulate \
  --m-input-file percentagetable-l5.qza \
  --o-visualization percentagetable-l5.qzv

#genus

qiime taxa collapse \
  --i-table grouped-table.qza \
  --i-taxonomy taxonomy.qza \
  --p-level 6 \
  --o-collapsed-table collapsed-table-l6.qza

qiime feature-table relative-frequency \
  --i-table collapsed-table-l6.qza \
  --o-relative-frequency-table percentagetable-l6.qza

qiime metadata tabulate \
  --m-input-file percentagetable-l6.qza \
  --o-visualization percentagetable-l6.qzv
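
The per-level commands above differ only in the value of --p-level, so they could be wrapped in one function and looped over. This is a minimal sketch, assuming the same file names as above; the function name collapse_level and the QIIME variable (which allows a dry run with echo) are made up for illustration and are not part of QIIME 2.

```shell
# Hypothetical helper: collapse to one taxonomic level, convert to relative
# frequency, and tabulate. QIIME defaults to the real qiime executable;
# set QIIME=echo to print the commands instead of running them.
collapse_level() {
    level="$1"
    "${QIIME:-qiime}" taxa collapse \
        --i-table grouped-table.qza \
        --i-taxonomy taxonomy.qza \
        --p-level "$level" \
        --o-collapsed-table "collapsed-table-l${level}.qza" &&
    "${QIIME:-qiime}" feature-table relative-frequency \
        --i-table "collapsed-table-l${level}.qza" \
        --o-relative-frequency-table "percentagetable-l${level}.qza" &&
    "${QIIME:-qiime}" metadata tabulate \
        --m-input-file "percentagetable-l${level}.qza" \
        --o-visualization "percentagetable-l${level}.qzv"
}

# Usage (class, order, family, genus):
#   for level in 3 4 5 6; do collapse_level "$level" || break; done
```

Each step only runs if the previous one succeeded, so a failed collapse does not produce a stale percentage table.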

Also, when running SILVA, this error appeared about 4 times, but QIIME 2 still generated its usual taxonomy output:

Message from syslogd ... nnot exec /etc/apcupsd/apccontrol changeme: No such file or directory qiime2

The command I used:

qiime feature-classifier classify-sklearn \
  --i-classifier silva-138-99-nb-classifier.qza \
  --i-reads rep-seqs-dada2.qza \
  --o-classification taxonomy.qza

What worries me is that both kraken2 and Greengenes reached the genus level (g__Enterobacter for kraken2, g__Enterobacter_B_683926 for Greengenes), but SILVA only classified to the family level (f__Enterobacteriaceae;__). I wonder whether that error could somehow affect taxonomic classification at the genus level.
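
One way to put numbers on this difference would be to count how many features carry a genus label in each classifier's output. The sketch below assumes each taxonomy artifact has been exported with qiime tools export (which writes a tab-separated taxonomy.tsv with Feature ID, Taxon and Confidence columns) and that ranks are separated by ";". The function name count_assigned_at_rank is hypothetical.

```shell
# Hypothetical helper: count features whose Taxon string contains a
# non-empty label at the given rank prefix (e.g. g__ for genus).
# Labels that are just the bare prefix (g__) or "__" are not counted.
count_assigned_at_rank() {
    prefix="$1"   # e.g. g__
    file="$2"     # e.g. exported-silva/taxonomy.tsv
    awk -F '\t' -v prefix="$prefix" '
        NR == 1 { next }                       # skip the header line
        {
            n = split($2, ranks, ";")
            for (i = 1; i <= n; i++) {
                gsub(/^ +| +$/, "", ranks[i])  # trim spaces around each rank
                if (index(ranks[i], prefix) == 1 &&
                    length(ranks[i]) > length(prefix)) {
                    hits++
                    break
                }
            }
        }
        END { print hits + 0 }
    ' "$file"
}

# Usage:
#   count_assigned_at_rank g__ exported-silva/taxonomy.tsv
```

Running it with f__ and then g__ on the same file shows how many features stop at the family level.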

Thank you in advance!!

Hi @Liviacmg,

Welcome to the messy world of taxonomy! :slight_smile:

There is not necessarily a problem with the different classifiers per se. However, the various reference databases often follow different nomenclatural rules and implement different taxonomic schemas. This is compounded by the increasing pace at which microbial taxonomy is changing and being updated. It is quite difficult for the groups that curate these databases to keep up, especially with the decline in formally trained taxonomists these days. :slightly_frowning_face:

There are also taxonomic inconsistencies within various taxonomic groups of a single database. See this example from our RESCRIPt tutorial.

Again, it has more to do with how often the databases are updated and curated than with the capabilities of a classification algorithm. It is the old "garbage in, garbage out". Your classification is only as good as your reference database. :bar_chart:

This is why we provide tools like RESCRIPt: to help users better curate their reference databases, especially if they know more about the taxonomy of the groups they are interested in, e.g. to fix an incorrect reference taxonomy.

Sadly there is no simple fix to this...

In addition to the Kraken and SILVA databases I'd also suggest using Greengenes2.

Hi @Liviacmg,

Thanks for the summary! It's important to note that NCBI is not an authoritative taxonomy per their own disclaimer: "Disclaimer: The NCBI taxonomy database is not an authoritative source for nomenclature or classification - please consult the relevant scientific literature for the most reliable information." This can be found at the bottom of a taxonomy detail, such as this one.

Greengenes2 derives its primary taxonomy from GTDB, which is actively curated and guided by whole-genome phylogeny. GTDB has revised Prevotella to fall under Bacteroidaceae (see e.g., Prevotella ruminicola), which is why we see that association in the 2022.10 release.
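
To see which family each database pairs with a genus in your own results, one option is to list the family labels that co-occur with that genus in an exported taxonomy file. This is a rough sketch, assuming the taxonomy was exported with qiime tools export into a tab-separated taxonomy.tsv (Feature ID, Taxon, Confidence) with ranks separated by ";". The function name families_for_genus is made up for illustration.

```shell
# Hypothetical helper: for a given genus label, print every family label it
# appears with in a taxonomy TSV, followed by the number of features.
families_for_genus() {
    genus="$1"   # e.g. g__Prevotella
    file="$2"    # e.g. exported-gg2/taxonomy.tsv
    awk -F '\t' -v genus="$genus" '
        NR == 1 { next }                       # skip the header line
        {
            n = split($2, ranks, ";")
            family = "(no family)"
            found = 0
            for (i = 1; i <= n; i++) {
                gsub(/^ +| +$/, "", ranks[i])  # trim spaces around each rank
                if (ranks[i] ~ /^f__/) family = ranks[i]
                if (ranks[i] == genus) found = 1
            }
            if (found) count[family]++
        }
        END { for (f in count) print f "\t" count[f] }
    ' "$file" | sort
}

# Usage:
#   families_for_genus g__Prevotella exported-gg2/taxonomy.tsv
```

The genus label has to match exactly, so a database that uses suffixed genus names needs the suffixed form as the first argument.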

With respect to the classification question, the algorithms and data in the databases differ so it isn't unexpected to see a difference in results.

Best,
Daniel

Hi @SoilRotifer,

Thank you so much for your reply! I didn't know about RESCRIPt, and I already used greengenes2.

Best regards,
Lívia

Hi @wasade ,

Thank you so much for your reply!!! Now I understand why Greengenes2 classified the genus Prevotella under the family Bacteroidaceae. Do you know if SILVA and Kraken2 (default database) derive their taxonomy from the NCBI taxonomy?

Best regards,
Lívia

Hi @Liviacmg,

Great!!

I'm unsure of the curation processes for those resources.

Best,
Daniel

You'll have to read the respective database documentation. Here is the link for the SILVA taxonomy. I think this should help get you started on figuring out how the Kraken devs curate their databases.

Hi @wasade,

Ok, thank you once again!

Best regards,
Lívia

Hi @SoilRotifer,

Thank you so much!!

Best regards,
Lívia
