Jaccard and Bray-Curtis Emperor plots do not show all samples and do show "nan"

Hi all. I have a set of data that I have analysed with qiime2/2018.2.
All has run OK and I have checked that the metadata file is correctly formatted.
Unfortunately I can’t post some parts of the analysis due to commercial sensitivity, but I was hoping for some general feedback on the nature of the problem.

Basically, as mentioned, both the Jaccard and Bray-Curtis Emperor plots don’t show all the samples listed in the fastq manifest and metadata file, but do show a category called “nan”. We do see a clear separation of the things we think should be separated, so the results seem valid. I just don’t understand why I’m putting in 70 samples and only 64 of them are visible, or where this “nan” category is coming from. It shows up as a single category containing a single sample, but it is not in the metadata file, and I’ve checked the formatting of the metadata file extremely thoroughly.

I think I’m running up against the limits of my knowledge of what emperor plots are supposed to show. Can anyone help here?

Hi @danwiththeplan,

My first guess would be that the rarefaction step (a standard precursor step, assuming you’re running core-metrics) has dropped samples. This is normal and kind of expected.
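
To make the rarefaction point concrete, here is a minimal Python sketch of the idea: any sample whose total count falls below the chosen sampling depth is dropped. The sample IDs, counts, and the helper name `samples_dropped_by_rarefaction` are made up for illustration, and this is not how QIIME 2 implements it internally.

```python
def samples_dropped_by_rarefaction(feature_counts, sampling_depth):
    """Return the IDs of samples whose total count is below sampling_depth.

    feature_counts maps a sample ID to a list of per-feature counts.
    This only mimics the filtering rule, not the random subsampling itself.
    """
    return sorted(
        sample_id
        for sample_id, counts in feature_counts.items()
        if sum(counts) < sampling_depth
    )


# Hypothetical per-sample feature counts (not real data).
feature_counts = {
    "sample-A": [500, 300, 250],
    "sample-B": [40, 10, 5],
    "sample-C": [900, 150, 0],
}

dropped = samples_dropped_by_rarefaction(feature_counts, sampling_depth=1000)
print(dropped)
```

Comparing the per-sample totals from your feature table summary against the depth you passed to core-metrics should tell you which samples were removed.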

The “nan” category is probably an empty cell in your metadata. In QIIME 2 we use empty cells to represent missing values. When doing math on missing values, usually the most obvious representation is “NaN” (Not-a-Number), which will propagate through all calculations. So when QIIME 2 sees an empty cell, it will internally represent the “missingness” of that as a NaN (even for categorical data). This is a pretty common convention, but can definitely be strange at first if you aren’t used to it.
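
If it helps, a quick way to hunt for empty cells is to scan the metadata file programmatically rather than by eye. The sketch below uses only the Python standard library and assumes a tab-separated file whose first column holds the sample IDs. The function name `find_empty_cells` and the example rows are hypothetical, and this does not reproduce QIIME 2's own metadata parser.

```python
import csv
import io


def find_empty_cells(tsv_text):
    """Return (sample_id, column_name) pairs for empty or whitespace-only cells."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    empty = []
    for row in reader:
        # Pad short rows so trailing missing cells are reported too.
        row = row + [""] * (len(header) - len(row))
        for column, value in zip(header[1:], row[1:]):
            if value.strip() == "":
                empty.append((row[0], column))
    return empty


# Hypothetical metadata: s2 has an empty cell, s3 has a cell holding one space.
metadata = (
    "sample-id\ttreatment\tsite\n"
    "s1\tcontrol\tnorth\n"
    "s2\t\tsouth\n"
    "s3\ttreated\t \n"
)

print(find_empty_cells(metadata))
```

For a real file you would read its contents with `open(path, newline="").read()` and pass that in. Any pair it reports is a candidate for the “nan” category.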

Thanks very much for your reply. I understand what “NaN” means and the concept of missing values, and I have checked that there are no empty cells in the metadata file. I’m looking at the metadata file in Atom with invisible characters shown. Some metadata columns definitely do have missing values (by design), and these are designated as NA, so I would expect a “nan” category when an Emperor plot is set to look at those columns. But the column I’m interested in has four categories, no missing values, no hidden whitespace or weird characters, and no extra line feed at the end (UNIX line feeds). So I still don’t understand why I’m getting a “nan” category. It shows up in the Emperor plots as a dot, which implies that there’s data behind it, but I just can’t understand why.

As for the samples dropping out, this seems to be related to some abundance filtering I did, but even with no filtering one of the samples is still missing, so I’ll do some more work to try and diagnose this.
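
One simple way to pin down exactly which samples disappear is to compare the sample IDs going in against the IDs that actually appear in the plot. The sketch below is plain Python with made-up IDs. The helper name `missing_samples` is hypothetical, and how you export the two ID lists from your own files is left as an assumption.

```python
def missing_samples(expected_ids, plotted_ids):
    """Return the IDs that were expected but do not appear in the plot."""
    return sorted(set(expected_ids) - set(plotted_ids))


# Hypothetical ID lists: one from the manifest/metadata, one read off the plot.
expected_ids = ["s1", "s2", "s3", "s4"]
plotted_ids = ["s1", "s3"]

print(missing_samples(expected_ids, plotted_ids))
```

Running the same comparison after each filtering step should show at which step a given sample drops out.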

Thanks for your help!

Actually I’ve resolved the “missing samples” problem, thanks for your insights on that. It was caused by the --p-sampling-depth parameter in qiime diversity core-metrics, and it’s probably appropriate to filter out samples with low sampling depth, so all OK for that issue.