Hi @SunScript0,
Welcome to the :qiime2: forum!
This is going to be a long answer. I just poured myself a new cup of tea (and encourage you to join me in a beverage), because it's a big conversation.
In 16S analysis, we essentially operate on the assumption that a specific molecular fingerprint reflects a set of functions and interactions in an ecosystem, with phenotypic consequences for the overall organism. With 16S, we're making this assumption based on a phylogenetically identified fingerprint from a universal tree of life, and we're essentially saying "closer evolution, closer function/genome." With a taxonomic assignment, you're saying "same genus, similar function".
I think the other assumption is that by naming things we give them meaning. It's a rainy afternoon, I have my aforementioned tea, but I think this is maybe more of a philosophical question than it needs to be. I'd argue that community measurements in and of themselves, without any taxonomic naming, can be informative. Naming certainly can help to contextualize behavior (that whole "same genus, similar function" thing often works), but I can ask all sorts of questions about my data without ever annotating the sequences.
I also think we need to talk about some of the challenges with annotation, even in relatively well characterized environments. There are some "everyone knows, no one talks about" issues in microbiome taxonomy that affect how you interpret an assignment.
First, taxonomy in and of itself is incredibly messy, even in macroscopic organisms. A phenotype-based characterization is great... right up until you start discovering things like the fact that dinosaurs might have had feathers and that, therefore, chickens are more closely related to dinosaurs than reptiles are, despite what you might have been told in school. (Also, this seriously puts the Emu War into a slightly scarier context.)
Using phenotypic morphology as a guide can sometimes be super useful, but sometimes you miss key traits or end up with weird groupings because of convergent evolution.
And this is a problem before we get into a set of organisms that are really hard to grow in captivity and which don't do nice sexual or asexual reproduction. So, what are issues in bacteria, specifically:
These issues don't mean that I think you should never use taxonomy: there's still utility in the assumptions you can make with taxonomic assignments. It just means that if you're looking for taxonomy to save you from noise, you might be sorely disappointed, and it may not solve all the problems you intended it to.
Personally, I like to work at an ASV or OTU level, because there are a lot of interesting things that happen within genera and smaller clades. There are a lot of examples of niche competition between closely related species, or where specific species/strains are related to an outcome of interest. (I recently co-authored a paper showing a single nucleotide difference in an ASV drove community cases and related to a cancer diagnosis. It's not the only example, but it's one.) I think if you want to make the collapsed assumption, you can, but you need to know you're missing information.
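To make the "missing information" point concrete, here is a minimal sketch in plain Python. The counts, sample names, and feature/genus labels are all made up for illustration, and `collapse` is a hypothetical helper, not a QIIME 2 function. Two ASVs assigned to the same genus move in opposite directions between groups, and summing them by genus erases that signal.

```python
from collections import defaultdict

# Toy ASV counts per sample (hypothetical data, not from a real study).
asv_table = {
    "asv_1": {"healthy": 90, "disease": 10},
    "asv_2": {"healthy": 10, "disease": 90},
}

# Hypothetical genus-level labels: both ASVs receive the same genus.
taxonomy = {"asv_1": "g__Example", "asv_2": "g__Example"}


def collapse(table, labels):
    """Sum the counts of all features that share a taxonomic label."""
    collapsed = defaultdict(lambda: defaultdict(int))
    for feature, counts in table.items():
        for sample, count in counts.items():
            collapsed[labels[feature]][sample] += count
    return {label: dict(counts) for label, counts in collapsed.items()}


genus_table = collapse(asv_table, taxonomy)
print(genus_table)
```

At the ASV level the two groups look very different; at the genus level both samples have the same total, so the difference is no longer visible.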
You probably could come up with something, but because taxonomy ≠ phylogeny, it likely won't be as good. In an ideal world, you could, but not right now. So, you can either choose to work on an uncollapsed table or stick to non-phylogenetic metrics. Both are good choices.
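As a rough illustration of why a non-phylogenetic metric works on either kind of table, here is a small Bray-Curtis sketch in plain Python. It needs only counts per feature, so it makes no difference whether the features are ASVs or genus labels. The sample data and the `bray_curtis` helper are assumptions for illustration; in practice you would use your diversity tool of choice rather than this function.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two samples given as {feature: count}."""
    features = set(u) | set(v)
    diff = sum(abs(u.get(f, 0) - v.get(f, 0)) for f in features)
    total = sum(u.values()) + sum(v.values())
    # Two empty samples are treated as identical in this sketch.
    return diff / total if total else 0.0


# Hypothetical uncollapsed (ASV-level) samples.
sample_a = {"asv_1": 90, "asv_2": 10}
sample_b = {"asv_1": 10, "asv_2": 90}
uncollapsed = bray_curtis(sample_a, sample_b)

# The same samples after a hypothetical genus-level collapse.
genus_a = {"g__Example": 100}
genus_b = {"g__Example": 100}
collapsed = bray_curtis(genus_a, genus_b)

print(uncollapsed, collapsed)
```

The metric runs fine on both tables, since no tree is required. The answer differs, though, because the collapsed table no longer carries the within-genus difference.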
For de novo processes, they're really different steps: you can't map taxonomy until you have representative sequences. Closed reference OTUs are assigned taxonomy in the clustering step: they inherit the label of their cluster.
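The difference can be sketched in a few lines of plain Python. All IDs and labels below are hypothetical, and this is only a cartoon of the idea, not how any particular clustering tool is implemented. With closed-reference clustering the cluster ID is a reference sequence ID, so the label is a lookup; with de novo clustering the IDs are arbitrary, so there is nothing to look up until the representative sequences have been classified.

```python
# Hypothetical reference database: reference sequence ID -> taxonomy label.
reference_taxonomy = {
    "ref_001": "g__ExampleA",
    "ref_002": "g__ExampleB",
}

# Closed-reference clustering: each cluster is named after the reference
# sequence its reads matched (toy read IDs).
closed_ref_clusters = {
    "ref_001": ["read_1", "read_2"],
    "ref_002": ["read_3"],
}

# Taxonomy is inherited directly from the reference the cluster is named for.
closed_ref_labels = {otu: reference_taxonomy[otu] for otu in closed_ref_clusters}

# De novo clustering: cluster IDs are arbitrary, so the same lookup finds
# nothing. A separate classification step on representative sequences is needed.
de_novo_clusters = {"denovo_0": ["read_1", "read_2"], "denovo_1": ["read_3"]}
de_novo_labels = {otu: reference_taxonomy.get(otu) for otu in de_novo_clusters}

print(closed_ref_labels)
print(de_novo_labels)
```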
Tutorials are often written with pedagogical goals or a planned workflow in mind; the order presented in a tutorial may or may not be the order in which people actually work. (Personally, I tend to assign taxonomy once I have representative sequences, since I view it as a processing step, but there are as many views on this as there are analysts.)
I hope this helps.
Best,
Justine