Is unrestricted taxonomic assignment better for estimating numbers of genera?

Hi @Jan_Kollar,

I suggest reading through this post to get a better understanding of how the classifier works.

My off-the-cuff response would be no. The classifier is unsure of what these sequences are (at your confidence setting), so it cannot say which genera they belong to. Many of these unclassified sequences might belong to the same genus, or they may belong to different genera.

That is, your results would be similar to taking the top BLAST hit. That is not good, as the top hit (one genus) might be statistically indistinguishable from the 200th hit (another genus), so your genus-level taxonomic assignments would be arbitrary. This spurious taxonomy would lead to inadvertent lumping or splitting of your ASVs, and the problem is further compounded by the biases of the representatives in the database.

In this case, your estimate of the number of genera is suspect, especially given that the taxonomy of today is not going to be the taxonomy of tomorrow.

One approach would be to summarize the number of ASVs per family, class, order, etc., akin to the old genus / species ratios. To approximate some semblance of relatedness among the ASVs, you could even cluster them at 99, 98, or 97 % and use a ratio of OTUs per family, etc. :man_shrugging:
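If it helps, here is a rough sketch of the per-family summary suggested above. It is only an illustration, not a QIIME 2 workflow: it assumes you have already exported your taxonomy as semicolon-separated strings with rank prefixes such as `f__`, and that you have some ASV-to-OTU mapping from clustering at your chosen identity. The function names, the toy IDs, and the toy taxonomy strings are all made up for the example.

```python
from collections import defaultdict


def rank_label(taxonomy, prefix="f__"):
    """Return the label carrying the given rank prefix, or 'Unassigned'.

    Assumes semicolon-separated taxonomy strings with rank prefixes
    (e.g. 'd__Bacteria; p__Firmicutes; f__Lachnospiraceae').
    """
    for field in taxonomy.split(";"):
        field = field.strip()
        # An empty label such as 'f__' counts as unassigned at that rank.
        if field.startswith(prefix) and len(field) > len(prefix):
            return field
    return "Unassigned"


def otus_per_family(asv_taxonomy, asv_to_otu, prefix="f__"):
    """Map each family to (number of ASVs, number of distinct OTUs)."""
    asvs = defaultdict(set)
    otus = defaultdict(set)
    for asv, taxonomy in asv_taxonomy.items():
        family = rank_label(taxonomy, prefix)
        asvs[family].add(asv)
        otus[family].add(asv_to_otu[asv])
    return {family: (len(asvs[family]), len(otus[family])) for family in asvs}


# Hypothetical toy data: five ASVs, some classified only to family or phylum.
asv_taxonomy = {
    "asv1": "d__Bacteria; p__Firmicutes; f__Lachnospiraceae; g__Blautia",
    "asv2": "d__Bacteria; p__Firmicutes; f__Lachnospiraceae",
    "asv3": "d__Bacteria; p__Firmicutes; f__Lachnospiraceae",
    "asv4": "d__Bacteria; p__Bacteroidota; f__Bacteroidaceae; g__Bacteroides",
    "asv5": "d__Bacteria; p__Bacteroidota",
}

# Hypothetical clustering result (e.g. at 97 %): asv1 and asv2 fall in one OTU.
asv_to_otu = {
    "asv1": "otu1",
    "asv2": "otu1",
    "asv3": "otu2",
    "asv4": "otu3",
    "asv5": "otu4",
}

summary = otus_per_family(asv_taxonomy, asv_to_otu)
for family, (n_asvs, n_otus) in sorted(summary.items()):
    print(f"{family}: {n_asvs} ASVs, {n_otus} OTUs, {n_otus / n_asvs:.2f} OTUs per ASV")
```

The point of working at the family level is that ASVs the classifier could not place in a genus still get counted under a rank it was confident about, rather than being forced into an arbitrary genus. Passing a different `prefix` (for example `o__` or `c__`) gives the same summary at another rank.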
