As shown in the picture, there are three entries with the same genus (g__Clostridium), but why do they belong to different families? I used q2-classify-sklearn to perform taxonomy annotation. Is this normal? Can I combine these three genera (g__Clostridium) in downstream analyses such as differential abundance analysis?
Hi @Zhanzhan, great question.
This is because Clostridium is polyphyletic and its taxonomy is a mess: many taxa historically assigned to Clostridium (e.g., using traditional techniques) do not place in that genus based on molecular phylogenies.
Nothing to do with the classifier, everything to do with the reference taxonomy that you are using — the classifier can only operate as accurately as the training data you feed it.
As for combining them, I would discourage it: these are distinct clades, even if the names are misleadingly similar. If you want to disambiguate them, you could label each one with both the family and the genus to which it is assigned.
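To illustrate the family-plus-genus labelling, here is a minimal Python sketch. It assumes Greengenes-style taxonomy strings with `f__` and `g__` rank prefixes; the function name `family_genus_label` and the example strings are hypothetical, made up for illustration and not taken from the original data.

```python
def family_genus_label(taxon: str) -> str:
    """Build an 'f__Family;g__Genus' label from a Greengenes-style taxonomy string."""
    ranks = {}
    for part in taxon.split(";"):
        part = part.strip()
        if "__" in part:
            prefix, name = part.split("__", 1)
            ranks[prefix] = name
    # Fall back to a placeholder when a rank is missing or empty (e.g. "g__").
    family = ranks.get("f") or "unassigned_family"
    genus = ranks.get("g") or "unassigned_genus"
    return f"f__{family};g__{genus}"


# Hypothetical example strings; the real ones come from your own taxonomy table.
taxa = [
    "k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Clostridiaceae; g__Clostridium",
    "k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Lachnospiraceae; g__Clostridium",
    "k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Ruminococcaceae; g__Clostridium",
]

labels = [family_genus_label(t) for t in taxa]
for label in labels:
    print(label)
```

With labels like these, the three g__Clostridium groups stay distinguishable in plots and tables without being merged.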
@Nicholas_Bokulich OK, I got it! Thanks!
What was the reference database?
@Jaroslaw_Grzadziel Hello! I used Greengenes database.
Hi, thank you for the reply.
I would recommend using one of the more up-to-date reference databases, such as RDP, SILVA, or GTDB.
Greengenes is outdated and should be forgotten.
The database doesn’t solve the core issue, though, which is that the genus Clostridium is polyphyletic. The GTDB update helps, but you’re fighting against established taxonomic names that existed before we got good at molecular phylogeny.
So, perhaps a second option is to embrace your ASVs and work with the molecular barcodes.
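As a rough illustration of why staying at the ASV level sidesteps the naming problem, here is a small Python sketch. The ASV IDs, taxonomy strings, and read counts are all hypothetical. It only contrasts summing counts by genus name alone with keeping one entry per ASV, and is not meant to reproduce any particular QIIME 2 command.

```python
from collections import Counter

# Hypothetical ASV IDs, taxonomy strings, and read counts for one sample.
asvs = {
    "asv_01": ("f__Clostridiaceae; g__Clostridium", 120),
    "asv_02": ("f__Lachnospiraceae; g__Clostridium", 45),
    "asv_03": ("f__Ruminococcaceae; g__Clostridium", 30),
}

# Collapsing on the genus name alone merges three distinct clades into one.
by_genus_name = Counter()
for taxon, count in asvs.values():
    genus = taxon.split(";")[-1].strip()
    by_genus_name[genus] += count

# Staying at the ASV level keeps each sequence variant as its own feature.
by_asv = {asv_id: count for asv_id, (_taxon, count) in asvs.items()}

print(dict(by_genus_name))
print(by_asv)
```

The taxonomy then becomes an annotation on each ASV rather than the unit of analysis.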
@Jaroslaw_Grzadziel Thank you for your advice. Can RDP be used in QIIME 2? SILVA demands a lot of RAM, and my device cannot cope when I use SILVA.
@jwdebelius I see. Thank you very much!
There are ways around this, see this post and the following post in the same topic: