Thanks for this great tool.
I was hoping you could help improve my understanding of this method and how to properly use it.
Correct me if I’m wrong, but the main improvement here over using the Aitchison distance alone is in the handling of zeros (matrix completion instead of a pseudocount). Do you have example cases showing the improvement over Aitchison with a pseudocount? I saw the nice comparisons to Jaccard and Bray-Curtis, but none to Aitchison alone.
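To be concrete about the baseline I have in mind, here is a minimal sketch of Aitchison distance with a pseudocount: add the pseudocount, take the centered log-ratio (CLR), then take the Euclidean distance. The function names, the default pseudocount of 1, and the toy counts are my own assumptions for illustration, not anything taken from the tool.

```python
import numpy as np

def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform after adding a pseudocount to every entry."""
    x = np.asarray(counts, dtype=float) + pseudocount
    log_x = np.log(x)
    # Subtract the mean log value (the log of the geometric mean) per sample.
    return log_x - log_x.mean(axis=-1, keepdims=True)

def aitchison_distance(u, v, pseudocount=1.0):
    """Euclidean distance between the CLR-transformed samples."""
    return float(np.linalg.norm(clr(u, pseudocount) - clr(v, pseudocount)))

# Toy counts (hypothetical): sample_b has the same proportions as sample_a
# at 10x the depth, sample_c has a different composition.
sample_a = np.array([10, 0, 5, 85])
sample_b = np.array([100, 0, 50, 850])
sample_c = np.array([0, 40, 30, 30])

d_ab = aitchison_distance(sample_a, sample_b)
d_ac = aitchison_distance(sample_a, sample_c)
print(f"a vs b (same proportions, 10x depth): {d_ab:.3f}")
print(f"a vs c (different composition):       {d_ac:.3f}")
```

Note that `d_ab` is not zero even though the two samples have identical proportions, because a fixed pseudocount weighs differently at different depths. That is part of why I am curious how the matrix completion approach compares.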
Second, I noticed the suggestion not to rarefy the data before running RPCA. What is the effect of one sample having 10x the sequencing depth of another? Does matrix completion work better when zeros are equally likely to be due to subsampling in every sample?
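To illustrate the concern, here is a rough sketch of how many zeros you would expect from subsampling alone at two depths. It assumes reads are drawn multinomially from fixed true proportions, so a feature with proportion p is unobserved at depth N with probability (1 - p)^N. The community (200 features with a long-tailed abundance profile) and the function name are made up for illustration.

```python
import numpy as np

def expected_sampling_zeros(proportions, depth):
    """Expected number of features with zero reads under multinomial sampling.

    Each feature's count is Binomial(depth, p), so P(count == 0) = (1 - p) ** depth.
    """
    p = np.asarray(proportions, dtype=float)
    return float(np.sum((1.0 - p) ** depth))

# Hypothetical community: 200 features, abundance falling off with rank.
ranks = np.arange(1, 201)
proportions = 1.0 / ranks**2
proportions /= proportions.sum()

zeros_shallow = expected_sampling_zeros(proportions, 1_000)
zeros_deep = expected_sampling_zeros(proportions, 10_000)
print(f"expected zeros at depth  1,000: {zeros_shallow:.1f}")
print(f"expected zeros at depth 10,000: {zeros_deep:.1f}")
```

Under these assumptions the shallower sample carries more zeros that come purely from sampling, which is the imbalance I am asking about.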
Third, the other suggestion is not to collapse the data to the genus level. Could you expand on why? Many features cannot be annotated reliably below the genus level, so for plots that rely on taxonomy, like the biplot, I often find it helpful to collapse to genus. Does collapsing skew the data in a way that interferes with RPCA?