The boxplots and p-values make sense to me... the mean # of observed OTUs is not the same across those groups. Even though there is a lot of overlap, Erie definitely looks lower, and you have a fairly large sample size to power this comparison.
That definitely adds noise, but it seems you still see significant differences.
You can test it both ways. You can see here that alpha diversity is significantly lower in Lake Erie than in the others, even when you do not account for fraction. Grouping by Lake/SizeFraction will probably make the differences more pronounced, but it will reduce your sample size per group, so you might actually lose significance.
With an N of 6, the multiple test correction should not make much of a difference. If you have borderline significance that becomes insignificant after correction, I'd say manually correct those p-values using only the pairwise comparisons that you planned to perform from the start (because I agree, comparing diversity in different fractions in different lakes is probably not something you would ever want to test). But in any case, don't let worry over multiple test correction stop you from running everything together for convenience... running everything together, then filtering and correcting later (if needed), is easier than splitting the data and testing each piece now.
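If it helps, here is a rough sketch of what I mean by "run everything, then filter and correct later". It is plain Python rather than whatever you are using for the tests themselves, and everything in it is an assumption for illustration: the p-values are made-up placeholders (not your results), the lake and fraction labels other than Erie are hypothetical, and I picked Benjamini-Hochberg only as one common correction. Swap in whichever method you prefer.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def correct_planned(raw_pvalues, planned):
    """Keep only the planned comparisons, then correct across just those."""
    kept = [name for name in raw_pvalues if name in planned]
    adjusted = bh_adjust([raw_pvalues[name] for name in kept])
    return dict(zip(kept, adjusted))


# Placeholder raw p-values from running every pairwise comparison together.
# These numbers and the non-Erie labels are invented for the example.
raw_pvalues = {
    "Erie vs LakeB": 0.01,
    "Erie vs LakeC": 0.02,
    "LakeB vs LakeC": 0.60,
    "Erie/FractionX vs LakeB/FractionY": 0.04,
    "Erie/FractionY vs LakeC/FractionX": 0.03,
    "LakeB/FractionX vs LakeC/FractionY": 0.50,
}

# Only the comparisons you would have planned from the start.
planned = {"Erie vs LakeB", "Erie vs LakeC", "LakeB vs LakeC"}

corrected = correct_planned(raw_pvalues, planned)
for name, p_adj in corrected.items():
    print(f"{name}: raw={raw_pvalues[name]:.3f} adjusted={p_adj:.3f}")
```

The point is that the correction is applied over the planned comparisons only, so the cross-lake, cross-fraction tests you never cared about do not inflate the penalty.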
Correct.
I hope that helps!