Hi Cameron,
Sorry, one other question. When I use the qiime plugin, I now understand that the --p-rank parameter defines the output rank of the matrix, which means the variance explained by the PCA axes is split across that number of axes (rank # = number of PCA vectors that sum to 100% of the variance). When I analyze my own dataset with the default rank setting of 3, the axes explain 47%, 27%, and 26% of the variance, respectively. When I rerun the analysis with --p-rank set to 5, the axes explain 35%, 19%, 18%, 14%, and 13% of the variance, respectively. The clustering is generally similar, but it is definitely easier to see in the --p-rank 3 plot, since all of the variance is forced into the first 3 axes, which we can visualize easily.
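To make the question concrete, here is a minimal sketch of the behavior I think I am seeing. It uses a plain NumPy SVD on made-up data, not the plugin's own matrix completion, and it assumes that the variance explained is computed from the squared singular values of the retained axes only. The function name `proportion_explained` and the toy matrix are hypothetical, purely for illustration.

```python
import numpy as np

def proportion_explained(matrix, rank):
    """Share of variance per axis, normalized over the retained axes only.

    Assumption: proportions come from squared singular values of the
    first `rank` axes, so they always sum to 1 regardless of `rank`.
    """
    centered = matrix - matrix.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    retained = singular_values[:rank] ** 2
    return retained / retained.sum()

rng = np.random.default_rng(0)
# Hypothetical toy data (30 samples x 12 features), not my real dataset.
toy = rng.normal(size=(30, 12))

rank3 = proportion_explained(toy, rank=3)
rank5 = proportion_explained(toy, rank=5)

# Both sets of proportions sum to 1, but each axis gets a smaller
# share when more axes are retained.
print(np.round(rank3, 2), rank3.sum())
print(np.round(rank5, 2), rank5.sum())
```

In this sketch the axes themselves are identical for both ranks and only the denominator changes. In the actual plugin the decomposition is presumably refit for each rank, so the axes could also shift, which may be why my rank 3 and rank 5 plots are similar but not identical.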
People often use the variance explained by the first three axes to judge how useful the ordination plot is, visually speaking. However, I feel like people would reach different interpretations if they thought the data could be explained completely by the first three axes, versus progressively smaller amounts depending on the rank parameter chosen. Why should rank 3 be sufficient in most studies? Should this value depend on the number of metadata conditions you expect, as in your paper? If you expected two outcomes but in reality there were four, would forcing the rank to 2 cause overfitting?
Best,
Jacob