Does having multiple IDs pointing to the same species in the database affect taxonomy assignment?

Xio_Lee · January 14, 2020, 7:00pm

Hi there,

I have a custom database. In this database, there are cases where multiple unique GenBank IDs share the same taxonomy. In general, I don't think this would affect taxonomy assignment, but I would still like to hear your opinion.

Thanks,

Xiaoping

jwdebelius · January 14, 2020, 7:27pm

Hi @Xio_Lee,

In most databases, it's standard to have multiple reference sequences assigned to the same taxonomic group. Sometimes this comes from limitations in modern database construction (we're often missing the resolution we'd like). Sometimes it's because multiple sequences genuinely map to the same "species", because taxonomy and phylogeny are hard and don't always line up. And bacterial sex is weird. And species definitions in bacteria are weird. (And if your database isn't bacterial, sorry. Phylogeny and taxonomy still don't always line up, even in macro-organisms. We've been doing morphology for at least a century, but molecular phylogeny is far newer.) But I digress.

If you're not comfortable with a shared taxonomic assignment, you could add a descriptive column at the end (probably level 7) that designates different features. HOMD, which contains a mixture of known isolates and OTUs, does this, and I think it's brilliant in a curated database.
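A minimal sketch of the "extra descriptive level" idea, assuming lineages are semicolon-delimited strings resolved to genus (six ranks), so the descriptor becomes level 7. The function name, the IDs, and the labels (`isolate_A`, `OTU_17`) are hypothetical; this is not how HOMD builds its taxonomy, just an illustration of the layout.

```python
def add_level7_descriptor(taxonomy, descriptors, sep="; "):
    """Append a per-ID descriptor as an extra final rank of each lineage.

    taxonomy:    dict of reference ID -> semicolon-delimited lineage
    descriptors: dict of reference ID -> free-text label (hypothetical, e.g. an isolate or OTU name)
    IDs without a descriptor keep their original lineage.
    """
    labelled = {}
    for ref_id, lineage in taxonomy.items():
        # Split on ";" and drop surrounding whitespace and empty ranks.
        ranks = [rank.strip() for rank in lineage.split(";") if rank.strip()]
        label = descriptors.get(ref_id)
        if label is not None:
            ranks.append(label)
        labelled[ref_id] = sep.join(ranks)
    return labelled


# Two hypothetical reference IDs that share an identical genus-level lineage.
genus = ("k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
         "f__Streptococcaceae; g__Streptococcus")
taxonomy = {"ref_001": genus, "ref_002": genus}
descriptors = {"ref_001": "isolate_A", "ref_002": "OTU_17"}

labelled = add_level7_descriptor(taxonomy, descriptors)
for ref_id, lineage in labelled.items():
    print(ref_id, lineage)
```

The two IDs stay distinguishable at level 7 while still agreeing at every higher rank, so anything that collapses to genus still groups them together.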

I suspect, though, that if you have too many closely related organisms, you may not be able to distinguish them accurately and may ultimately lose resolution in classification. (I haven't benchmarked this; it's just a hypothesis.)

So, tl;dr Multiple unique reference sequences mapping to the same taxonomy is normal and fine.

Best,
Justine

lca123 · January 14, 2020, 7:31pm

That depends. I'll talk about my experience:
In terms of DNA sequence, it won't, because I globally align the ASVs against the database with VSEARCH. So when a given read aligns to a given sequence in the database, that has nothing to do with the "sequence ID", only with the nucleotides.
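To illustrate that the hit depends on the nucleotides and not on the reference ID, here is a toy sketch. It uses ungapped percent identity on equal-length sequences as a stand-in for VSEARCH's real global alignment, and all sequences and IDs are made up. Renaming every reference ID leaves the matched sequences unchanged; the ID is only the key used afterwards to look up a taxonomy.

```python
def identity(a, b):
    """Fraction of matching positions: a toy, ungapped stand-in for a global alignment."""
    if len(a) != len(b):
        raise ValueError("toy identity only handles equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_hits(query, references, min_identity=0.97):
    """Return the IDs of all references tied for the best identity, if it passes min_identity."""
    scores = {ref_id: identity(query, seq) for ref_id, seq in references.items()}
    top = max(scores.values())
    if top < min_identity:
        return []
    return sorted(ref_id for ref_id, score in scores.items() if score == top)


# Hypothetical references: ref_A and ref_B carry identical sequences.
references = {"ref_A": "ACGTACGTAC", "ref_B": "ACGTACGTAC", "ref_C": "TTGTACGTAA"}
query = "ACGTACGTAC"
print(best_hits(query, references))  # ['ref_A', 'ref_B']

# Renaming the IDs does not change which sequences are hit.
renamed = {"new_" + ref_id: seq for ref_id, seq in references.items()}
print(best_hits(query, renamed))  # ['new_ref_A', 'new_ref_B']
```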
But, if I'm not wrong, the QIIME algorithm doesn't allow identical taxonomies in the "FeatureData[Taxonomy]" object, and this will break the alignment.

Cheers

lca123 · January 14, 2020, 7:36pm

And yes, as Justine said, many ASVs can match more than one sequence in the database. In this case, QIIME will try to decide which taxonomy is the "correct" one, and if you dig into it you may find many possible taxonomies for that ASV.
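When an ASV hits several references, one common way to reconcile their taxonomies is a majority consensus: keep the deepest lineage prefix shared by at least some fraction of the hits. The sketch below is a simplified illustration in the spirit of the `min-consensus` parameter of QIIME 2's `classify-consensus-vsearch`; it is not QIIME 2's implementation, and the function name and truncated lineages are hypothetical.

```python
from collections import Counter


def consensus_taxonomy(hit_lineages, min_consensus=0.51):
    """Deepest lineage prefix supported by at least `min_consensus` of all hits."""
    if not hit_lineages:
        return "Unassigned"
    split = [[rank.strip() for rank in lineage.split(";")] for lineage in hit_lineages]
    n_hits = len(split)
    consensus = []
    for depth in range(1, max(len(ranks) for ranks in split) + 1):
        # Count whole prefixes so a rank is only supported under the same parent ranks.
        prefixes = Counter(tuple(ranks[:depth]) for ranks in split if len(ranks) >= depth)
        prefix, count = prefixes.most_common(1)[0]
        if count / n_hits < min_consensus:
            break  # the hits disagree too much at this depth; stop here
        consensus = list(prefix)
    return "; ".join(consensus) if consensus else "Unassigned"


# Three hypothetical top hits for one ASV (lineages truncated for brevity).
hits = [
    "k__Bacteria; p__Firmicutes; g__Streptococcus; s__mitis",
    "k__Bacteria; p__Firmicutes; g__Streptococcus; s__mitis",
    "k__Bacteria; p__Firmicutes; g__Streptococcus; s__oralis",
]
print(consensus_taxonomy(hits))       # 2/3 agree on s__mitis, so species is kept
print(consensus_taxonomy(hits, 0.7))  # 2/3 < 0.7, so it stops at genus
```

Raising the threshold trades species-level calls for more conservative, shallower assignments, which is one way the "many possible taxonomies" for an ASV get resolved.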

Nicholas_Bokulich · January 14, 2020, 8:18pm

Hmm... QIIME 2 does not have any issue with multiple feature IDs mapping to the same taxonomy annotation in a “FeatureData[Taxonomy]” artifact. In fact, this is very common (both in the reference databases posted on the QIIME 2 data resources page and in typical study results). QIIME 2 does not allow redundant feature IDs, however.

lca123 · January 15, 2020, 6:55pm

Thanks, Nicholas. I'm not sure I explained it well, and now I'm confused: can we have repeated IDs in a FeatureData[Taxonomy], e.g., the same annotation, the same accession ID, etc.?

Nicholas_Bokulich · January 15, 2020, 7:00pm

Yes, the same annotation is fine, but all IDs must be unique. Annotations can be anything as long as each ID is unique. Thanks!
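A quick pre-import check for exactly this rule might look like the sketch below. It assumes a tab-separated file with a header row and two columns laid out like QIIME 2's taxonomy TSV (`Feature ID`, `Taxon`); the helper itself is hypothetical and not part of QIIME 2. Duplicated IDs are the problem to fix; shared annotations are reported only for information.

```python
import csv
import io
from collections import Counter


def check_taxonomy_tsv(handle):
    """Return (duplicated IDs, annotations shared by more than one ID) for a 2-column TSV."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row, e.g. "Feature ID<TAB>Taxon"
    rows = [row for row in reader if row]
    id_counts = Counter(row[0] for row in rows)
    taxon_counts = Counter(row[1] for row in rows)
    duplicated_ids = sorted(i for i, n in id_counts.items() if n > 1)
    shared_taxa = sorted(t for t, n in taxon_counts.items() if n > 1)
    return duplicated_ids, shared_taxa


# Hypothetical file: two unique IDs share one annotation, which is fine.
ok = io.StringIO(
    "Feature ID\tTaxon\n"
    "ref_001\tk__Bacteria; g__Streptococcus\n"
    "ref_002\tk__Bacteria; g__Streptococcus\n"
    "ref_003\tk__Bacteria; g__Bacillus\n"
)
print(check_taxonomy_tsv(ok))  # ([], ['k__Bacteria; g__Streptococcus'])
```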

system · February 16, 2020, 1:00am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.