So, I'd start by making a PCoA with your favorite metric on rarefied data to see whether there's separation by sample type. That will give you a feel for whether you should analyze the sample types separately or not. Given your criteria, something like Jaccard should answer your question.
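If you want to see what that involves outside of a pipeline, here's a minimal numpy-only sketch: rarefy a feature table, compute Jaccard distances on presence/absence, and run a classical PCoA. The function names (`rarefy`, `jaccard_distances`, `pcoa`) and the toy table are made up for illustration. In practice you'd use your pipeline's own tools (e.g. QIIME 2's diversity plugin or scikit-bio's `pcoa`).

```python
import numpy as np

def rarefy(counts, depth, seed=0):
    """Subsample each sample (row) to `depth` reads without replacement.

    Hypothetical helper: raises instead of dropping samples below `depth`.
    """
    rng = np.random.default_rng(seed)
    out = np.zeros_like(counts)
    for i, row in enumerate(counts):
        if row.sum() < depth:
            raise ValueError(f"sample {i} has fewer than {depth} reads")
        # One entry per read, labeled by its feature index; draw `depth` of them.
        reads = np.repeat(np.arange(row.size), row)
        picked = rng.choice(reads, size=depth, replace=False)
        out[i] = np.bincount(picked, minlength=row.size)
    return out

def jaccard_distances(counts):
    """Jaccard distance on presence/absence: 1 - |shared| / |union|."""
    pa = counts > 0
    n = pa.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            union = np.logical_or(pa[i], pa[j]).sum()
            shared = np.logical_and(pa[i], pa[j]).sum()
            d[i, j] = d[j, i] = 1.0 - shared / union if union else 0.0
    return d

def pcoa(d, n_axes=2):
    """Classical PCoA: double-centre -0.5 * d**2, then eigendecompose."""
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = centering @ (-0.5 * d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)  # returned in ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scale eigenvectors by sqrt(eigenvalue); negative eigenvalues are clipped to 0.
    coords = eigvecs[:, :n_axes] * np.sqrt(np.clip(eigvals[:n_axes], 0, None))
    explained = eigvals[:n_axes] / eigvals[eigvals > 0].sum()
    return coords, explained

# Toy table: rows = samples, columns = features (every sample has 100 reads).
counts = np.array([
    [50, 30, 0, 0, 20],
    [45, 35, 0, 0, 20],
    [0, 0, 60, 30, 10],
    [0, 5, 55, 30, 10],
])
sample_type = ["stool", "stool", "saliva", "saliva"]

rare = rarefy(counts, depth=100)
d = jaccard_distances(rare)
coords, explained = pcoa(d)
for label, (x, y) in zip(sample_type, coords):
    print(f"{label:>7}  PC1={x:+.3f}  PC2={y:+.3f}")
print("variance explained:", np.round(explained, 3))
```

If the samples cluster by type along the first axes, that's a hint that analyzing the sample types separately makes sense.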
If you decide to analyze by sample type, I'd work with your data from there and filter it for feature-based analysis. (I would still encourage running alpha and beta diversity on the unfiltered, rarefied data to make sure you actually have differences in the forest before you go staring at trees.)
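For the alpha side of that check, here's a minimal sketch that computes observed features and Shannon entropy per sample on an already-rarefied table, then summarizes by sample type. The helper names and toy data are hypothetical. Shannon uses log base 2 here as an assumption (match whatever base your pipeline uses), and a real comparison would use a proper test such as Kruskal-Wallis rather than eyeballing group means.

```python
import numpy as np

def observed_features(counts):
    """Number of features with a nonzero count in each sample (row)."""
    return (counts > 0).sum(axis=1)

def shannon(counts, base=2):
    """Shannon entropy per sample, treating 0 * log(0) as 0."""
    props = counts / counts.sum(axis=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(props > 0, props * np.log(props), 0.0)
    return -terms.sum(axis=1) / np.log(base)

# Toy table assumed to be rarefied already (every sample has 100 reads).
counts = np.array([
    [25, 25, 25, 25],
    [30, 30, 20, 20],
    [97, 1, 1, 1],
    [90, 10, 0, 0],
])
sample_type = np.array(["stool", "stool", "saliva", "saliva"])

obs = observed_features(counts)
sh = shannon(counts)
for group in np.unique(sample_type):
    mask = sample_type == group
    print(f"{group}: mean observed={obs[mask].mean():.1f}, mean Shannon={sh[mask].mean():.3f}")
```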
Second, as I wanted to mention earlier, I think your filtering criteria are very stringent, possibly too stringent. Human microbiome data tends to be sparse for a lot of reasons. The end result is that in a population of 100 adults, I might find my most prevalent OTU/ASV in only 80 of them, with prevalence dropping off roughly like a power law (I think) from there. So I'd recommend relaxing your filtering criteria, especially in human data. I tend to have a lot of success filtering to 5-10% prevalence (depending on sample size, community complexity, and model).
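As a rough illustration of what a prevalence cutoff does, here's a minimal sketch that keeps features present in at least a given fraction of samples. The function name and toy table are hypothetical; most pipelines have their own filtering step for this (in QIIME 2, `feature-table filter-features` with `--p-min-samples` takes a sample count rather than a fraction).

```python
import numpy as np

def prevalence_filter(counts, min_prevalence=0.05):
    """Keep features (columns) with a nonzero count in >= min_prevalence of samples."""
    prevalence = (counts > 0).mean(axis=0)
    keep = prevalence >= min_prevalence
    return counts[:, keep], keep

# Toy table: 20 samples x 3 features.
counts = np.zeros((20, 3), dtype=int)
counts[:, 0] = 10  # present in all 20 samples (100%)
counts[:2, 1] = 5  # present in 2 samples (10%)
counts[0, 2] = 3   # present in 1 sample (5%)

for threshold in (0.05, 0.10, 0.20):
    filtered, keep = prevalence_filter(counts, threshold)
    print(f"min prevalence {threshold:.0%}: kept features {np.flatnonzero(keep).tolist()}")
```

With only 20 samples, a 5% cutoff keeps anything seen even once, which is why the right threshold depends on sample size.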
My rule of thumb is to check that the filtering is okay by running a Procrustes analysis comparing a rarefied distance matrix of the filtered data to a rarefied distance matrix of the unfiltered data. The correlation for Bray-Curtis distance should be high (you want a Mantel test for this; I target a correlation of about 0.9 or higher), and you should see decent correlation for any other metrics of interest. I like UniFrac.
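Here's a minimal numpy-only sketch of the Mantel part of that check: Bray-Curtis distances on the unfiltered and filtered tables, then a permutation Mantel test (Pearson correlation of the upper triangles). The helper names, the simulated table, and the 10% prevalence cutoff are assumptions for illustration; the simulated table stands in for data you'd already have rarefied, and a fuller version would re-rarefy the filtered table. For real data, scikit-bio's `mantel` and SciPy's `scipy.spatial.procrustes` (for the Procrustes step on PCoA coordinates) are worth a look.

```python
import numpy as np

def bray_curtis(counts):
    """Pairwise Bray-Curtis dissimilarity: sum|x - y| / sum(x + y)."""
    n = counts.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            denom = (counts[i] + counts[j]).sum()
            num = np.abs(counts[i] - counts[j]).sum()
            d[i, j] = d[j, i] = num / denom if denom else 0.0
    return d

def mantel(d1, d2, permutations=999, seed=0):
    """Pearson Mantel test; one-sided p-value (permuted r >= observed r)."""
    n = d1.shape[0]
    iu = np.triu_indices(n, k=1)
    x = d1[iu]
    r_obs = np.corrcoef(x, d2[iu])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(permutations):
        p = rng.permutation(n)
        # Permute rows and columns of d2 together so it stays a valid distance matrix.
        r = np.corrcoef(x, d2[np.ix_(p, p)][iu])[0, 1]
        hits += r >= r_obs
    return r_obs, (hits + 1) / (permutations + 1)

# Simulated sparse-ish table: 30 samples x 200 features (stand-in for rarefied data).
rng = np.random.default_rng(1)
feature_means = rng.lognormal(mean=0.0, sigma=2.0, size=200)
counts = rng.poisson(feature_means * rng.gamma(1.0, 1.0, size=(30, 200)))

# Hypothetical 10% prevalence filter.
keep = (counts > 0).mean(axis=0) >= 0.10
filtered = counts[:, keep]

r, p = mantel(bray_curtis(counts), bray_curtis(filtered))
print(f"kept {keep.sum()} of {keep.size} features; Mantel r = {r:.3f}, p = {p:.3f}")
print("filtering looks OK" if r >= 0.9 else "filtering may be too aggressive")
```

The 0.9 threshold in the last line is just the rule of thumb above, not a formal criterion.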
So, as a quick summary: I would filter within each site (sample type), but relax my filtering criteria a lot.
Finally, apologies for any mistakes; I'm on my phone and autocorrect isn't my friend today.