Peculiar clustering of samples in DEICODE RPCA biplot


In a recent study, we profiled the response of fish gut microbiota to a dietary intervention. Besides sampling the luminal contents and mucosal tissue from the proximal and distal intestine, we also collected feed and water samples. To avoid rarefying the ASV table, I used the DEICODE plugin to visualize beta-diversity. However, the resulting RPCA biplot places specific sets of samples (mucosal tissue + water + feed) along a single straight line, which is very different from the PCoA results produced by the QIIME 2 core metrics.

RPCA results:
rpca-biplot.qzv (1.2 MB)

PCoA results based on Bray-Curtis/weighted UniFrac distance (rarefied to 11171 reads per sample):
bray_curtis_emperor.qzv (888.1 KB) weighted_unifrac_emperor.qzv (888.6 KB)

Commands I used to generate the RPCA results:

qiime deicode rpca \
  --i-table data/qiime2/table-no-chlo-mito-lowPre-contam-ctrl-with-phyla.qza \
  --p-min-feature-count 10 \
  --p-min-sample-count 500 \
  --output-dir data/qiime2/robust-Aitchison-PCA \
  --o-biplot data/qiime2/robust-Aitchison-PCA/rpca-ordination.qza \
  --o-distance-matrix data/qiime2/robust-Aitchison-PCA/Aitchison-distance.qza

qiime emperor biplot \
  --i-biplot data/qiime2/robust-Aitchison-PCA/rpca-ordination.qza \
  --m-sample-metadata-file data/metadata.tsv \
  --m-feature-metadata-file data/qiime2/taxonomy-silva132.qza \
  --o-visualization data/qiime2/robust-Aitchison-PCA/rpca-biplot.qzv \
  --p-number-of-features 8


Hi @yanxianl,
I am just pinging @cmartino to see if he can take a look. Thank you both!

Hi @yanxianl,

That is indeed strange-looking, thank you for taking the time to report it. There are some interesting saturations in the Bray-Curtis plot as well. My first suggestion is to remove the feature filter by changing --p-min-feature-count 10 to --p-min-feature-count 0.

Next, let’s check the version of RPCA you are using. You can check this by running the following:

pip freeze | grep deicode

If this is not 0.2.4, try updating through conda by running:

conda install -c conda-forge deicode==0.2.4

If neither of those attempts works, we can try a few more things.


Hi Cameron,

I’m using qiime2-2019.7, and DEICODE is at version 0.2.4. Following your advice, I removed the feature filtering, but it didn’t help.

rpca-biplot.qzv (1.2 MB)

Hi @yanxianl,

Thank you for trying that.

The next thing to try is a frequency filter, which can be added as a parameter in v0.2.4 of DEICODE. To do this, add the flag --p-min-feature-frequency 10 to the RPCA command.

Another thing we can try is increasing the rank by changing the n-components parameter; I would suggest something like --p-n-components 6. There is also a new command in v0.2.4 of DEICODE, auto-rpca, which estimates this rank automatically and can be used in place of rpca.
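As a rough sketch, the three variants could be run side by side like this. The input path and the two count filters are copied from the original command; the output file names and the small `run` dry-run helper are made up for illustration and are not part of QIIME 2 or DEICODE.

```shell
# Input table path copied from the original post; output names are hypothetical.
TABLE="data/qiime2/table-no-chlo-mito-lowPre-contam-ctrl-with-phyla.qza"
OUT="data/qiime2/robust-Aitchison-PCA"

# Hypothetical helper: print the command, and only execute it when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Variant 1: add the feature frequency filter.
run qiime deicode rpca \
  --i-table "$TABLE" \
  --p-min-feature-count 10 \
  --p-min-sample-count 500 \
  --p-min-feature-frequency 10 \
  --o-biplot "$OUT/rpca-ordination-minFreq10.qza" \
  --o-distance-matrix "$OUT/Aitchison-distance-minFreq10.qza"

# Variant 2: raise the rank to 6.
run qiime deicode rpca \
  --i-table "$TABLE" \
  --p-min-feature-count 10 \
  --p-min-sample-count 500 \
  --p-n-components 6 \
  --o-biplot "$OUT/rpca-ordination-ncomp6.qza" \
  --o-distance-matrix "$OUT/Aitchison-distance-ncomp6.qza"

# Variant 3: let auto-rpca estimate the rank.
run qiime deicode auto-rpca \
  --i-table "$TABLE" \
  --p-min-feature-count 10 \
  --p-min-sample-count 500 \
  --o-biplot "$OUT/rpca-ordination-autorpca.qza" \
  --o-distance-matrix "$OUT/Aitchison-distance-autorpca.qza"
```

By default this only prints the commands so they can be reviewed; setting DRY_RUN=0 in the environment executes them. Each resulting ordination can then be passed to qiime emperor biplot as before.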

If neither of those tests works, we may need to take a look at the feature table stats. Thank you!
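For reference, one way to get the feature table stats is the feature-table summarize action in QIIME 2. This is only a sketch: the table and metadata paths are taken from the commands in the first post, while the output name table-summary.qzv and the `run` dry-run helper are assumptions made for illustration.

```shell
# Paths copied from the original post; the output name is hypothetical.
TABLE="data/qiime2/table-no-chlo-mito-lowPre-contam-ctrl-with-phyla.qza"
METADATA="data/metadata.tsv"
SUMMARY="data/qiime2/robust-Aitchison-PCA/table-summary.qzv"

# Hypothetical helper: print the command, and only execute it when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Summarize per-sample and per-feature counts of the table fed to RPCA.
run qiime feature-table summarize \
  --i-table "$TABLE" \
  --m-sample-metadata-file "$METADATA" \
  --o-visualization "$SUMMARY"
```

The resulting visualization shows reads per sample and how many samples each feature appears in, which is useful for judging whether the --p-min-sample-count and --p-min-feature-count thresholds are sensible for the feed, water and mucosal samples.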

Hi @cmartino!

After playing around with --p-min-feature-frequency and --p-n-components, I found that increasing the rank to 6 or using auto-rpca helps a bit, but setting the minimum feature frequency to 10 didn’t.

rpca-biplot-autorpca.qzv (1.2 MB) rpca-biplot-minFreq10.qzv (1.2 MB) rpca-biplot-ncomp6.qzv (1.2 MB)

Do you think the plots with the increased rank look normal?

Hi @yanxianl,

The plots with a higher rank look much better and more similar to the other methods (i.e. Bray-Curtis and UniFrac). auto-rpca also seems to estimate a higher rank, which suggests that increasing the --p-n-components parameter makes sense. Given this, I would just stick with the auto-rpca output, unless slightly increasing --p-n-components in rpca (e.g. to 10) helps the plot spread out more.
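If you want to test the rank-10 option, a minimal sketch could look like the following. The input, metadata and taxonomy paths and the filter values come from your original commands; the output file names and the `run` dry-run helper are placeholders invented here.

```shell
# Paths copied from the original post; output names are hypothetical.
TABLE="data/qiime2/table-no-chlo-mito-lowPre-contam-ctrl-with-phyla.qza"
OUT="data/qiime2/robust-Aitchison-PCA"

# Hypothetical helper: print the command, and only execute it when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# RPCA with the rank raised to 10.
run qiime deicode rpca \
  --i-table "$TABLE" \
  --p-min-feature-count 10 \
  --p-min-sample-count 500 \
  --p-n-components 10 \
  --o-biplot "$OUT/rpca-ordination-ncomp10.qza" \
  --o-distance-matrix "$OUT/Aitchison-distance-ncomp10.qza"

# Render the biplot so it can be compared with the auto-rpca one.
run qiime emperor biplot \
  --i-biplot "$OUT/rpca-ordination-ncomp10.qza" \
  --m-sample-metadata-file data/metadata.tsv \
  --m-feature-metadata-file data/qiime2/taxonomy-silva132.qza \
  --p-number-of-features 8 \
  --o-visualization "$OUT/rpca-biplot-ncomp10.qzv"
```

As written this only prints the two commands; run it with DRY_RUN=0 to execute them, then open the new biplot next to the auto-rpca one to see whether the samples spread out any further.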

If you are curious, the rank in auto-rpca is estimated using part C of this paper.

Let me know if you have any more questions. Thanks!


Great. Thanks for your time and help! I’ll read that paper.