Sample size decrease when doing beta diversity analysis


I have a question about beta diversity analysis. My sample size seems to drop from 36 to 25 in the Emperor plots from my beta diversity analyses. I used a sampling depth of 6093, which should only have removed three of the samples, but the plots are missing more samples than that. Can someone help me figure out why this is happening?


Hi @Negin,
We would need more detail to troubleshoot this issue. Could you provide us with your QIIME 2 version, the exact commands you are running, and the feature table and metadata file you are using? You could PM this if you'd rather not share this info publicly.

Hi Mehrbod,

I was able to recover some of the samples. It seems that I was using the sampling depth I had chosen before filtering, so more of my samples were removed than I expected. This fixed almost everything, except that for one species, and only for the Jaccard index, I still lose one sample, even though the unweighted UniFrac plot made from the exact same data does not remove that sample. I was wondering whether dots can overlap in the plot?
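To illustrate the depth issue described above, here is a minimal sketch in plain Python, not QIIME 2 itself. It assumes that rarefying to a sampling depth drops every sample whose total feature count is below that depth. The sample IDs and counts are made up for illustration; only the depth of 6093 comes from this thread.

```python
def samples_retained(sample_totals, sampling_depth):
    """Return the IDs of samples whose total count meets the sampling depth."""
    return sorted(
        sample_id
        for sample_id, total in sample_totals.items()
        if total >= sampling_depth
    )


# Hypothetical per-sample totals before and after a filtering step.
totals_before_filtering = {"S1": 9000, "S2": 7000, "S3": 6500, "S4": 3000}
totals_after_filtering = {"S1": 7000, "S2": 5800, "S3": 6100, "S4": 2500}

depth = 6093  # depth chosen from the table *before* filtering

kept_before = samples_retained(totals_before_filtering, depth)
kept_after = samples_retained(totals_after_filtering, depth)

print("kept at depth", depth, "before filtering:", kept_before)
print("kept at depth", depth, "after filtering:", kept_after)
```

In this toy case the same depth keeps three samples on the unfiltered table but only two on the filtered one, because filtering lowered S2's total below the depth. Re-checking the per-sample totals of the filtered table before choosing a depth avoids this.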

Hi @Negin,
Glad you figured out the root of the problem!
Dots can theoretically overlap if everything in the samples is the same, though this rarely happens in real-life data. Personally, I've never seen complete overlap. You can try searching for that individual sample by browsing through the side menu in Emperor. Simply go to the scale menu, find the sample that appears to be missing, and increase its size until you find it on the plot. If that sample is missing completely from the drop-down menu, then we'll have to look elsewhere for our problem.
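Another way to check for overlap, outside of Emperor, is to look for pairs of samples whose distance is zero in the distance matrix, since such pairs would be expected to land on the same point in the ordination. The sketch below is plain Python on a made-up matrix; the function name, sample IDs, and values are hypothetical, and loading a real distance matrix is left out.

```python
def zero_distance_pairs(ids, matrix, tol=1e-12):
    """Return pairs of distinct sample IDs whose distance is (numerically) zero."""
    pairs = []
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            if matrix[i][j] <= tol:
                pairs.append((ids[i], ids[j]))
    return pairs


# Hypothetical symmetric distance matrix for three samples.
ids = ["A", "B", "C"]
matrix = [
    [0.0, 0.0, 0.6],
    [0.0, 0.0, 0.6],
    [0.6, 0.6, 0.0],
]

overlapping = zero_distance_pairs(ids, matrix)
print(overlapping)
```

Here samples A and B are reported as a pair, so they would be drawn on top of each other.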

Hi Mehrbod,

I tried that. I can't find it. One sample of one species seems to be missing from two of the plots: Jaccard and weighted UniFrac, but not from unweighted UniFrac or Bray-Curtis.

Hi @Negin,
Not sure I understand what you mean by one sample of one species, since the Emperor plots show samples alone without species info (unless this is a biplot?).
Would you mind sharing one of the plots with the missing data and one without the missing data? You can PM this if you'd rather not share the data.

Hi Mehrbod,

In the Emperor plot, we can choose a category. Here, I chose species, so the samples are color-coded by species and I can see what is missing. For example, in the two plots below, you can see that Jaccard has one fewer sample than UniFrac. The samples in question are located at the top left of the plot, and they are green if you color-code by species. Please let me know when you have seen the plots and I will remove them from here.

Thanks Negin,
Just looking through them now if you'd like to remove them. In the future, if you have sensitive data that you don't want public, you can send it via direct message on the forum by clicking on the top right avatar icon and clicking the envelope 'message' sign. :slight_smile:

Hi Negin,
Actually, it looks like you attached the Jaccard visualization but the unweighted UniFrac distance matrix. I won't be able to reproduce the Emperor plot without the metadata.tsv file you used. Do you mind PMing me either the metadata.tsv file or the unweighted UniFrac visualization artifact? Thanks

Hi @Negin,
Thanks for PMing your Jaccard plot. As it turns out, the powers at play had one last April Fool's joke to play on us.
Your initial guess that there was a complete overlap was correct.

The image on the left is your Jaccard plot color-coded by Species, and the big green sphere is Hibernia and Persephone in perfect overlap. I couldn't separate them with scaling, but I just changed one shape to a ring and the other to a diamond and voila, they're both there.
I've never actually seen complete overlap in real-life samples, but given that the Jaccard distance is based on presence/absence, this simply means that those two samples share exactly the same set of taxa/ASVs. This may have happened naturally, which is rare but I guess totally plausible, or perhaps something in your earlier filtering steps drew those two samples together. Either way, there's nothing wrong with your plots or any of the tools used. Just Loki having a laugh :black_joker:
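As a rough illustration of why this can show up with Jaccard and not with an abundance-based metric, here is a minimal sketch in plain Python. The two sample names are the ones from the plot, but the ASV names and counts are invented for the example, and the functions are textbook definitions rather than the code QIIME 2 runs internally.

```python
def jaccard_distance(a, b):
    """Jaccard distance on presence/absence: 1 - |shared| / |union|."""
    present_a = {feature for feature, count in a.items() if count > 0}
    present_b = {feature for feature, count in b.items() if count > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1.0 - len(present_a & present_b) / len(union)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity on raw counts."""
    features = set(a) | set(b)
    numerator = sum(abs(a.get(f, 0) - b.get(f, 0)) for f in features)
    denominator = sum(a.get(f, 0) + b.get(f, 0) for f in features)
    return numerator / denominator if denominator else 0.0


# Made-up counts: same ASVs present in both samples, different abundances.
hibernia = {"ASV1": 10, "ASV2": 5, "ASV3": 1}
persephone = {"ASV1": 2, "ASV2": 8, "ASV3": 6}

print("Jaccard:", jaccard_distance(hibernia, persephone))
print("Bray-Curtis:", bray_curtis(hibernia, persephone))
```

With these numbers the Jaccard distance is 0.0 while Bray-Curtis is 0.5, so the two samples would sit on the same point in a Jaccard ordination but be separated under Bray-Curtis.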


Hi Mehrbod,

Thank you for troubleshooting that. I did filter out reads that were suspected to be host DNA based on NCBI. That might be the cause of the complete overlap.


And I should mention that this was very smart of you. Choosing a ring and resizing the shapes was the best way to see both. Thanks for figuring that out.

