Peculiar clustering of samples in DEICODE RPCA biplot


In a recent study, we profiled the response of fish gut microbiota to a dietary intervention. Besides sampling the luminal contents and mucosal tissue from the proximal and distal intestine, we also collected feed and water samples. To avoid rarefying the ASV table, I used the DEICODE plugin to visualize beta-diversity. However, the resulting RPCA biplot placed specific sets of samples (mucosal tissue + water + feed) along a single straight line, which is very different from the PCoA results produced by the QIIME 2 core metrics.

RPCA results:
rpca-biplot.qzv (1.2 MB)

PCoA results based on Bray-Curtis/weighted UniFrac distances (rarefied to 11171 reads per sample):
bray_curtis_emperor.qzv (888.1 KB) weighted_unifrac_emperor.qzv (888.6 KB)

Commands I used to generate the RPCA results:

qiime deicode rpca \
  --i-table data/qiime2/table-no-chlo-mito-lowPre-contam-ctrl-with-phyla.qza \
  --p-min-feature-count 10 \
  --p-min-sample-count 500 \
  --output-dir data/qiime2/robust-Aitchison-PCA \
  --o-biplot data/qiime2/robust-Aitchison-PCA/rpca-ordination.qza \
  --o-distance-matrix data/qiime2/robust-Aitchison-PCA/Aitchison-distance.qza

qiime emperor biplot \
  --i-biplot data/qiime2/robust-Aitchison-PCA/rpca-ordination.qza \
  --m-sample-metadata-file data/metadata.tsv \
  --m-feature-metadata-file data/qiime2/taxonomy-silva132.qza \
  --o-visualization data/qiime2/robust-Aitchison-PCA/rpca-biplot.qzv \
  --p-number-of-features 8


Hi @yanxianl,
I am just pinging @cmartino to see if he can take a look. Thank you both!

Hi @yanxianl,

That is indeed strange-looking, thank you for taking the time to report it. There are some interesting saturations in the Bray-Curtis plot as well. My first suggestion is to remove the feature filter by changing --p-min-feature-count 10 to --p-min-feature-count 0.

Next, let’s check the version of RPCA you are using. You can check this by running the following:

pip freeze | grep deicode

If this is not 0.2.4, try updating through conda by running:

conda install -c conda-forge deicode==0.2.4
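
If you want to script that check, here is a minimal sketch. The helper name has_deicode_version is made up for illustration, and it assumes pip freeze reports the package as a plain deicode==X.Y.Z line, which may not hold for every kind of install.

```shell
# Sketch only: has_deicode_version is a hypothetical helper, not part of DEICODE.
# Succeeds (exit status 0) when the given `pip freeze` output contains a line
# that is exactly "deicode==<wanted>" (case-insensitive, fixed-string match).
has_deicode_version() {
  local freeze_output="$1" wanted="$2"
  printf '%s\n' "$freeze_output" | grep -qixF "deicode==${wanted}"
}

# Example use inside the QIIME 2 environment:
#   if has_deicode_version "$(pip freeze)" 0.2.4; then
#     echo "deicode 0.2.4 found"
#   else
#     echo "deicode 0.2.4 not found, consider updating"
#   fi
```

The match is on the whole line, so a version such as 0.2.40 is not mistaken for 0.2.4.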

If neither of those attempts works, we can try a few more things.


Hi Cameron,

I'm using qiime2-2019.7 and DEICODE is at version 0.2.4. Following your advice, I removed the feature filtering, but it didn't help.

rpca-biplot.qzv (1.2 MB)

Hi @yanxianl,

Thank you for trying that.

The next thing to try is a frequency filter, which is available as a parameter in v0.2.4 of deicode. To use it, add the flag --p-min-feature-frequency 10 to the RPCA command.

Another thing we can try is increasing the rank by changing the n-components parameter. I would suggest trying something like --p-n-components 6. v0.2.4 of deicode also adds a new command, auto-rpca, which estimates this rank automatically and can be used instead of rpca.
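
To make these suggestions concrete, here is a rough sketch that prints one command per test so each can be reviewed before running. The table path and the count filters are copied from the original post. The output directory, the file labels and the helper names (deicode_cmd, print_variants) are assumptions made up for illustration.

```shell
# Sketch only. TABLE comes from the original post; OUT, the output file
# names and the function names are hypothetical.
TABLE=data/qiime2/table-no-chlo-mito-lowPre-contam-ctrl-with-phyla.qza
OUT=data/qiime2/rpca-tests

# Print one DEICODE command line.
#   $1: action (rpca or auto-rpca)
#   $2: label used in the output file names
#   remaining arguments: the extra flags being tested in this run
deicode_cmd() {
  local action="$1" label="$2"
  shift 2
  echo qiime deicode "$action" \
    --i-table "$TABLE" \
    --p-min-feature-count 10 \
    --p-min-sample-count 500 \
    "$@" \
    --o-biplot "$OUT/ordination-$label.qza" \
    --o-distance-matrix "$OUT/distance-$label.qza"
}

# Print the three variants, preceded by creation of the output directory.
print_variants() {
  echo mkdir -p "$OUT"
  deicode_cmd rpca minFreq10 --p-min-feature-frequency 10
  deicode_cmd rpca ncomp6 --p-n-components 6
  deicode_cmd auto-rpca autorpca
}

# Review the commands, then pipe them to a shell to execute:
#   print_variants
#   print_variants | bash
```

Each run changes a single setting relative to the original command, so any difference in the resulting biplot can be attributed to that setting.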

If neither of those tests works, we may need to take a look at the feature table stats. Thank you!

Hi @cmartino!

After playing around with --p-min-feature-frequency and --p-n-components, I found that increasing the rank to 6 or using auto-rpca actually helps a bit, but setting the minimum feature frequency to 10 didn't help.

rpca-biplot-autorpca.qzv (1.2 MB) rpca-biplot-minFreq10.qzv (1.2 MB) rpca-biplot-ncomp6.qzv (1.2 MB)

Do you think the plots with an increased rank look normal?

Hi @yanxianl,

The plots with a higher rank look much better and are more similar to the other methods (i.e. Bray-Curtis and UniFrac). It also seems that auto-rpca is estimating the rank to be higher, which indicates that increasing the --p-n-components parameter makes sense. Given this, I would just stick to the auto-rpca output, unless slightly increasing --p-n-components in rpca (e.g. 10) helps the plot spread out more.
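
If you do want to compare a few ranks side by side, something like the sketch below prints the rpca and emperor commands for each rank. The input paths and filter values are the ones from the original post. The output directory, file names and function names (rank_cmds, print_rank_scan) are assumptions for illustration only.

```shell
# Sketch only. Input paths come from the original post; OUT, the output
# file names and the function names are hypothetical.
TABLE=data/qiime2/table-no-chlo-mito-lowPre-contam-ctrl-with-phyla.qza
METADATA=data/metadata.tsv
TAXONOMY=data/qiime2/taxonomy-silva132.qza
OUT=data/qiime2/rpca-rank-scan

# Print the rpca command and the matching emperor command for one rank.
rank_cmds() {
  local rank="$1"
  echo qiime deicode rpca \
    --i-table "$TABLE" \
    --p-min-feature-count 10 \
    --p-min-sample-count 500 \
    --p-n-components "$rank" \
    --o-biplot "$OUT/ordination-ncomp${rank}.qza" \
    --o-distance-matrix "$OUT/distance-ncomp${rank}.qza"
  echo qiime emperor biplot \
    --i-biplot "$OUT/ordination-ncomp${rank}.qza" \
    --m-sample-metadata-file "$METADATA" \
    --m-feature-metadata-file "$TAXONOMY" \
    --p-number-of-features 8 \
    --o-visualization "$OUT/biplot-ncomp${rank}.qzv"
}

# Print the commands for every rank given as an argument.
print_rank_scan() {
  local rank
  echo mkdir -p "$OUT"
  for rank in "$@"; do
    rank_cmds "$rank"
  done
}

# Review the commands, then pipe them to a shell to execute:
#   print_rank_scan 6 10
#   print_rank_scan 6 10 | bash
```

Each rank gets its own .qzv, so the plots can be opened next to the auto-rpca result and compared directly.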

If you are curious, the rank is estimated in auto-rpca using part C in this paper.

Let me know if you have any more questions. Thanks!


Great. Thanks for your time and help! I’ll read that paper.

