When to collapse (the data, not yourself)

Nicholas_Bokulich · April 8, 2019, 12:26pm

I would personally avoid clustering on either the front or the back end. If you do cluster, do it on the front end and accept that you risk losing important information... but let the model tell you that! It would certainly be interesting to test whether you get similar (or better?) predictive power with taxonomy-collapsed features compared to ASVs.
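To make the front-end option concrete, here is a minimal sketch of what collapsing amounts to: summing the counts of all ASVs that share a taxonomy label within each sample. The function name `collapse_by_taxonomy`, the dictionary layout, and the toy counts are all made up for illustration; this is not how any particular plugin implements it.

```python
from collections import defaultdict


def collapse_by_taxonomy(table, taxonomy):
    """Sum ASV counts that share a taxonomy label, per sample.

    table:    {sample_id: {asv_id: count}}  (hypothetical layout)
    taxonomy: {asv_id: taxon_label}
    """
    collapsed = {}
    for sample_id, counts in table.items():
        summed = defaultdict(int)
        for asv_id, count in counts.items():
            # ASVs with no annotation are pooled under "Unassigned"
            summed[taxonomy.get(asv_id, "Unassigned")] += count
        collapsed[sample_id] = dict(summed)
    return collapsed


# Toy data, invented for illustration only
asv_table = {
    "s1": {"asv1": 10, "asv2": 9, "asv3": 0},
    "s2": {"asv1": 0, "asv2": 1, "asv3": 25},
}
asv_taxonomy = {"asv1": "g__Culex", "asv2": "g__Culex", "asv3": "g__Other"}

genus_table = collapse_by_taxonomy(asv_table, asv_taxonomy)
print(genus_table)
```

Training the same model once on the ASV table and once on the collapsed table, with identical cross-validation splits, is one way to let the model tell you whether the collapse cost you anything.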

But that variation could be important. It is troublesome for interpretation if you suspect these are the exact same species and the exact same individual insects, and multi-copy (copy number) variation is making these slight variants covary almost perfectly (e.g., those Culex ASVs). From the standpoint of these models, though, that is not really troublesome at all. It only matters for deciding whether you really have 3 or 4 distinct Culex species/subpopulations or whether these are all one and the same.
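If you want a quick look at which same-taxon ASVs covary almost perfectly, something like the sketch below would do. It uses a plain Pearson correlation across samples; the 0.95 cutoff, the helper names, and the abundances are assumptions chosen for illustration, and a high correlation only flags a pair for a closer look. It does not prove the variants come from the same individuals.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def covarying_pairs(abundances, taxonomy, threshold=0.95):
    """Return (asv_a, asv_b, r) for same-taxon ASV pairs with r >= threshold.

    abundances: {asv_id: [count in sample 1, count in sample 2, ...]}
    taxonomy:   {asv_id: taxon_label}
    threshold:  arbitrary cutoff, an assumption for this sketch
    """
    flagged = []
    for a, b in combinations(sorted(abundances), 2):
        if taxonomy[a] != taxonomy[b]:
            continue  # only compare ASVs with the same annotation
        r = pearson(abundances[a], abundances[b])
        if r >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Toy data, invented for illustration only
abundances = {
    "asv1": [10, 20, 30, 40],
    "asv2": [5, 10, 15, 21],
    "asv3": [40, 5, 30, 2],
}
taxonomy = {"asv1": "g__Culex", "asv2": "g__Culex", "asv3": "g__Culex"}

pairs = covarying_pairs(abundances, taxonomy)
print(pairs)
```

In this toy table only asv1 and asv2 are flagged, since asv3 follows a different pattern across samples even though it carries the same label.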