I'm wondering how, if at all, people clean up taxonomic annotation from databases before publication? If not, why not? If you do, what do you clean and how?
As a more concrete example, if I decided to develop a Greengenes or Silva-like label for Batman characters because I am a nerd, how would you clean up these taxa?
I clean the taxonomy a little to remove some symbols like '' from your example, and I usually use labels like 'Wayne_uncultured_superhero' from your last example. But I keep the original taxonomy and add the extra labels to the ASV table based on the last available taxonomic unit.
Thanks @timanix! Part of the reason I ask is that I'm trying to write a script to clean automagically, so I want to figure out what the optimal cleaning is for many people.
Pop culture themed test code and examples are a critical part of my development cycle.
Does it make a difference if I mention that [] means contested? So, like, [Batfamily] is actually "contested Batfamily". (It took me years to learn this.)
That is why I always keep the original taxonomy. I need to clean my labels because when I feed labeled ASVs to the tree-construction plugin, it complains about numerous 'wrong' symbols. Now I get what you want to do. In the Silva database, there is a very annoying thing with some symbols: sometimes you can't find or replace them with a script. The solution was to run first
Unless a PI or coauthor asks to remove them, I usually keep them in. While they could be considered a database artefact, I don't mind having unambiguous rank labels in all my taxa strings.
Because I make most graphs with ggplot2 and Phyloseq, here is how I would remove the prefixes:
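For anyone not working in R, roughly the same prefix stripping can be sketched in Python with a regular expression. This is only a sketch: it assumes Greengenes-style (`k__`, `p__`, ...) or Silva-style (`D_0__`, `D_1__`, ...) rank prefixes and `;`-separated ranks. `strip_rank_prefixes` is a hypothetical helper name, not part of QIIME 2 or any library, and the Batman strings are invented.

```python
import re

# Matches a rank prefix at the start of a label, e.g. "k__", "p__", "D_0__".
# Assumption: a prefix is one letter, or "D_" plus digits, followed by "__".
RANK_PREFIX = re.compile(r"^(?:[A-Za-z]|D_\d+)__")


def strip_rank_prefixes(taxonomy, sep=";"):
    """Return a copy of the taxonomy string with rank prefixes removed."""
    levels = [level.strip() for level in taxonomy.split(sep)]
    cleaned = [RANK_PREFIX.sub("", level) for level in levels]
    return sep.join(cleaned)


# Invented Batman-themed strings, for illustration only.
print(strip_rank_prefixes("k__DCComics; p__Gotham; c__Batfamily"))
print(strip_rank_prefixes("D_0__DCComics;D_1__Gotham;D_2__Rogues"))
```

Because the function returns a new string, the original taxonomy column can be kept alongside the cleaned one.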
Thanks! I'm struggling with whether or not I should write a standardized database tidying script for a plugin I'm working on. And, like, whether I should add an "uncultured" or "unspecified" or "ambiguous" label to the inherited label. So, it sounds like people (you) are already doing it, but that it's not systematic. And, while it's not that much more work (and lets me procrastinate on something I don't want to write), if it's not going to be useful, it's probably not worth doing. I've got my own ever-growing collection of taxonomy cleaning notebooks stashed across three or four file systems, but given how surprisingly easy it has been to write a QIIME 2 plugin, it seemed worth considering here.
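To make the inherited-label idea concrete, here is a minimal sketch of what a systematic version might look like. It assumes `;`-separated ranks with Greengenes- or Silva-style prefixes, and that an unassigned rank is just a bare prefix such as `g__`. `fill_inherited_labels` and its `tag` parameter are hypothetical names, and the example string is invented.

```python
import re

# Assumption: a rank prefix is one letter, or "D_" plus digits, followed by "__".
RANK_PREFIX = re.compile(r"^(?:[A-Za-z]|D_\d+)__")


def fill_inherited_labels(taxonomy, tag="unspecified", sep=";"):
    """Fill empty ranks with '<last named rank>_<tag>', keeping rank prefixes."""
    filled = []
    last_named = None
    for level in taxonomy.split(sep):
        level = level.strip()
        match = RANK_PREFIX.match(level)
        prefix = match.group(0) if match else ""
        name = level[len(prefix):]
        if name:
            # A named rank: keep it and remember it for later empty ranks.
            last_named = name
            filled.append(level)
        elif last_named is not None:
            # An empty rank: inherit the last named rank plus the chosen tag.
            filled.append(f"{prefix}{last_named}_{tag}")
        else:
            # Nothing to inherit from yet; leave the rank untouched.
            filled.append(level)
    return sep.join(filled)


# Invented example; the original string is left unchanged.
original = "k__DCComics; p__Gotham; c__Batfamily; o__Wayne; f__; g__; s__"
print(fill_inherited_labels(original, tag="uncultured"))
```

The tag is a parameter so the "uncultured" vs. "unspecified" vs. "ambiguous" choice stays with the user. Note that this sketch only fills empty ranks; it does not treat names that are already "uncultured" or similar as missing.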