In a recent study, we profiled the response of fish gut microbiota to a dietary intervention. Besides sampling the luminal contents and mucosal tissue from the proximal and distal intestine, we also collected feed and water samples. To avoid rarefying the ASV table, I used the DEICODE plugin to visualize beta-diversity. However, the resulting PCoA biplot placed specific sets of samples (mucosal tissue + water + feed) along a single straight line, which is very different from the PCoA results produced by the QIIME 2 core metrics.
That is indeed strange-looking, thank you for taking the time to report it. There are some interesting saturations in the Bray-Curtis plot as well. My first suggestion is to remove the feature filter by changing --p-min-feature-count 10 to --p-min-feature-count 0.
Next, let's check the version of DEICODE (the plugin that provides RPCA) you are using. You can check this by running the following:
pip freeze | grep deicode
If this is not 0.2.4, try updating through conda by running:
conda install -c conda-forge deicode==0.2.4
If neither of those attempts works, we can try a few more things.
The next thing to try is a frequency filter, which is available as a parameter in v0.2.4 of deicode. To use it, add the flag --p-min-feature-frequency 10 to the RPCA command.
Another thing we can try is increasing the rank by changing the n-components parameter. I would suggest trying something like --p-n-components 6. There is also a new command in v0.2.4 of deicode, auto-rpca, which can be used instead of rpca to estimate the rank automatically.
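To make the two rank options concrete, here is a minimal sketch of both commands. The file names (table.qza and the output names) are placeholders I made up, so swap in your own artifacts. The run helper is also just an illustration and not part of QIIME 2 or deicode: it prints each command by default so you can check it before executing anything.

```shell
# Hypothetical input artifact and rank; replace with your own values.
TABLE="table.qza"
RANK=6

# Illustrative helper (not part of QIIME 2): prints the command by default.
# Set DRY_RUN=0 in the environment to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# RPCA with an explicitly chosen, higher rank.
run qiime deicode rpca \
  --i-table "$TABLE" \
  --p-n-components "$RANK" \
  --o-biplot "rpca-rank${RANK}-biplot.qza" \
  --o-distance-matrix "rpca-rank${RANK}-distance.qza"

# RPCA with the rank estimated automatically (deicode v0.2.4).
run qiime deicode auto-rpca \
  --i-table "$TABLE" \
  --o-biplot "auto-rpca-biplot.qza" \
  --o-distance-matrix "auto-rpca-distance.qza"
```

Once the printed commands look right, rerun with DRY_RUN=0 and compare the two biplots side by side.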
If neither of those tests works, we may need to take a look at the feature table stats. Thank you!
After playing around with --p-min-feature-frequency and --p-n-components, I found that increasing the rank to 6 or using auto-rpca helps a bit, but setting the minimum feature frequency to 10 didn't help.
The plots with a higher rank look much better and are more similar to the other methods (i.e. Bray-Curtis and UniFrac). auto-rpca also appears to estimate a higher rank, which suggests that increasing the --p-n-components parameter makes sense. Given this, I would stick with the auto-rpca output, unless increasing --p-n-components in rpca a little further (e.g. to 10) helps the plot spread out more.
If you are curious, the rank in auto-rpca is estimated using part C of this paper.
Let me know if you have any more questions. Thanks!