I've been using a GTDB tree and a Kraken 2 table recently. The genome IDs contain underscores. When I import my table into QIIME 2 as a FeatureTable[Frequency] (so biom under the hood), the underscores in the IDs are preserved. When I import the tree (Phylogeny[Rooted]) with tip IDs that contain underscores, the underscores are replaced by spaces, so the tip IDs no longer match the feature IDs in the table. This may be a quirk specific to scikit-bio or the Python API, but it's darn obnoxious. If it's more appropriate as a scikit-bio issue, I'm happy to take it there, but QIIME 2 is where the integration breaks.
Let me know if example code, etc. would help; I'm happy to share.
There is a mechanism to escape underscores so that they aren't replaced with spaces: in Newick, a label wrapped in single quotes keeps its underscores. Maybe that'll help get you moving in the right direction? Keep us posted!
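Here's a minimal sketch of that escaping, assuming the mechanism in question is Newick's single-quote rule (underscores inside a single-quoted label are kept as-is). The helper name `quote_newick_labels` and the genome IDs in the example are made up for illustration, and the regex assumes a plain Newick string without bracketed comments, so treat it as a starting point rather than a robust parser.

```python
import re

# Matches either an already-quoted label (left untouched) or an unquoted
# label that directly follows "(", ",", ")" or the start of the string.
# Branch lengths follow ":" and are therefore never matched.
_LABEL = re.compile(r"'(?:[^']|'')*'|(?:(?<=[(,)])|^)[^'(),:;\[\]\s]+")


def quote_newick_labels(newick: str) -> str:
    """Wrap unquoted labels that contain underscores in single quotes.

    Hypothetical helper: assumes the input has no [bracketed] comments.
    """
    def _quote(match):
        label = match.group(0)
        # Leave quoted labels and underscore-free labels alone.
        if label.startswith("'") or "_" not in label:
            return label
        return "'" + label + "'"

    return _LABEL.sub(_quote, newick)


# Placeholder IDs in a GTDB-like style, not real accessions.
tree = "((GB_GCA_000001.1:0.1,RS_GCF_000002.1:0.2)node_1:0.3,C:0.4);"
print(quote_newick_labels(tree))
```

Running this on the Newick file before importing should keep the tip IDs in line with the table. If you read the tree through scikit-bio directly, I believe its Newick reader also accepts `convert_underscores=False`, but check the docs for your version.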
PS
I agree! We should think about some tooling that might help with this - perhaps an import format that applies escaping rules prior to import?
I will pay more attention to escaping on import with scikit-bio/the Python API. It would be nice for q2-cli to preserve underscores on import; there's always the possibility that I'm an edge case, but I don't think so?
In the meantime, I just have to remember to rename all my feature IDs.