By the time we start filtering the table (to see what’s unique to that dataset) there isn’t any ASV information left. Everything is a taxonomy string. So in principle, you shouldn’t see any features shared between the two set differences, unmerged - merged and merged - unmerged.
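To make the set-difference point concrete, here is a minimal sketch in plain Python. The taxonomy strings are made up for illustration and are not taken from either table; the only assumption is that each collapsed table can be reduced to a set of its feature IDs (the taxonomy strings).

```python
# Hypothetical taxonomy strings standing in for the collapsed feature IDs.
unmerged = {"k__Bacteria;p__A", "k__Bacteria;p__B", "k__Bacteria;p__C"}
merged = {"k__Bacteria;p__B", "k__Bacteria;p__C", "k__Bacteria;p__D"}

# Features found only in one table or only in the other.
only_unmerged = unmerged - merged
only_merged = merged - unmerged

# By construction, nothing can appear in both differences.
shared_between_differences = only_unmerged & only_merged

print(sorted(only_unmerged))       # ['k__Bacteria;p__A']
print(sorted(only_merged))         # ['k__Bacteria;p__D']
print(shared_between_differences)  # set()
```

If two strings that look the same turn up on both sides, the comparison is treating them as different strings, so something in them must differ character by character.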
Hmm, but if the tables were collapsed at L7, isn't it possible there would be features (for want of a better word) that have the same taxonomic assignment (especially if not named down to the species level) but are different enough to be classified as different species? Maybe I don't quite understand what is happening when they're collapsed. Should I actually have collapsed at a lower taxonomic level if I don't have species-level assignments for all the taxa?
Could you share the “unique” taxonomy strings that seem to be shared? I’m guessing they are subtly different in some way that the computer is seeing (such as a trailing ; or s__ or something) but which is otherwise not very meaningful.
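One way to check for that kind of invisible difference is sketched below. The helper `first_difference` is a hypothetical name, and the trailing space in the example is an assumption made purely for illustration; I don't know yet what, if anything, differs in your strings.

```python
def first_difference(a: str, b: str):
    """Return (index, char_in_a, char_in_b) at the first mismatch, or None if equal."""
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i, ca, cb
    if len(a) != len(b):
        # One string is a prefix of the other; report the first extra character.
        i = min(len(a), len(b))
        return i, a[i:i + 1], b[i:i + 1]
    return None


clean = "k__Bacteria;p__Actinobacteria;g__Corynebacterium;s__"
padded = clean + " "  # hypothetical: a trailing space that is invisible when printed

# repr() puts quotes around the string, so trailing whitespace becomes visible.
print(repr(clean))
print(repr(padded))
print(first_difference(clean, padded))
```

Running `repr()` on the two feature IDs as they come out of each table is usually enough to spot a stray space, tab, or extra separator.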
Sure!
k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Corynebacteriaceae;g__Corynebacterium;s__
vs.
k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Corynebacteriaceae;g__Corynebacterium;s__
and
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Veillonellaceae;__;__
vs.
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Veillonellaceae;g__;s__
Having just done that, I can see that the second pair (Veillonellaceae) is not actually identical: one has the 'g' and 's' rank prefixes and the other doesn't (would you mind explaining why this would happen?). The first pair still looks the same to me, though.
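For reference, a rank-by-rank comparison of the Veillonellaceae pair can be sketched like this. The function name `differing_ranks` is made up for the example; the two strings are the ones pasted above.

```python
from itertools import zip_longest

a = "k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Veillonellaceae;__;__"
b = "k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Veillonellaceae;g__;s__"


def differing_ranks(x: str, y: str):
    """Return (position, label_in_x, label_in_y) for each rank that differs."""
    pairs = zip_longest(x.split(";"), y.split(";"), fillvalue="")
    return [(i, p, q) for i, (p, q) in enumerate(pairs) if p != q]


print(differing_ranks(a, b))  # [(5, '__', 'g__'), (6, '__', 's__')]
```

The same call on the Corynebacterium pair returns an empty list if the two strings really are identical, or points at the position that differs if they are not.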
I'm pretty sure I used the pretrained Greengenes 13.8 99% feature classifier to assign taxonomies. Sorry, I'm not at the office at the moment, so I can't check for sure.
Thanks!