NaNs/non-symmetrical data in table resulting from qiime feature-table filter-samples

I used the metadata option of qiime feature-table filter-samples to remove a category of data from my table. However, when I move on to create a distance matrix using qiime diversity beta, I get an error telling me that either my table contains NaNs or my data is not symmetrical. Using qiime feature-table summarize, I have verified that the table isn’t empty and does contain the expected sequences. Below are my commands and the error message. Has anyone had issues like this? Did I mess up the SQLite WHERE clause somehow?

Filtering command:
qiime feature-table filter-samples \
  --i-table …/…/…/dada2_output/table.qza \
  --m-metadata-file …/…/metadata_LSU_qiime2_2017_bb_only_organizedfortidal.tsv \
  --p-where "Sheared='Unsheared'" \
  --o-filtered-table Intact_Table

Beta diversity command:
qiime diversity beta \
  --i-table Intact_Table.qza \
  --p-metric dice \
  --o-distance-matrix dice_distance_matrix_intact

Error message:
/opt/anaconda/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/sklearn/metrics/ DataConversionWarning: Data was converted to boolean for metric dice
  warnings.warn(msg, DataConversionWarning)
Traceback (most recent call last):
  File "/opt/anaconda/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/q2cli/", line 327, in __call__
    results = action(**arguments)
  File "</opt/anaconda/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/>", line 2, in beta
  File "/opt/anaconda/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/sdk/", line 240, in bound_callable
    output_types, provenance)
  File "/opt/anaconda/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/sdk/", line 383, in callable_executor
    output_views = self._callable(**view_args)
  File "/opt/anaconda/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/q2_diversity/_beta/", line 129, in beta
  File "/opt/anaconda/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/skbio/diversity/", line 381, in beta_diversity
    return DistanceMatrix(distances, ids)
  File "/opt/anaconda/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/skbio/stats/distance/", line 106, in __init__
    self._validate(data, ids)
  File "/opt/anaconda/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/skbio/stats/distance/", line 873, in _validate
    "Data must be symmetric and cannot contain NaNs.")
skbio.stats.distance._base.DistanceMatrixError: Data must be symmetric and cannot contain NaNs.

Plugin error from diversity:

Data must be symmetric and cannot contain NaNs.

See above for debug info.

Hi! Can you double-check your metadata file? Is there any column with empty cells for one or more samples? If a column exists, every cell in it should be filled.

Yes, all cells are filled, and I’ve validated the sheet with Keemei. Of course, there are samples on the metadata sheet that aren’t in the resulting filtered table, because I’m using categories on the sheet to filter the data. But that shouldn’t matter, because qiime diversity beta doesn’t take metadata, right?

Thanks for the reply.

Hi @prayle1,

Why do you assume this issue is related to filter-samples? Do you not see this error prior to filtering? Maybe you could post the filtered feature table summary here so we could take a look?

What happens if you use a different metric? This could help diagnose whether this is actually an issue with the data or with the metric.


I haven’t seen this error with my original, unfiltered table at all, and I’ve used this same diversity metric. In checking a few other metrics quickly, the filtered table does work with some but not all.
dice - error
jaccard - completes
yule - error
russellrao - completes
hamming - completes
Here’s my filtered table summary - Intact_Table.qzv (1.9 MB)
I’ll keep checking metrics, but I’m not sure why some of them work and some don’t.

Well, I’ve just rerun qiime diversity beta on my original table and it’s failed as well with the same error. I think I’ll regenerate my table from my reads and see if that fixes it.

Hey @prayle1!

If I had to guess, there is probably a division by zero happening inside the metric itself. Looking at your table summary, you have a few samples which have no counts (and one with only 1). If you filter those out first I suspect metrics will start behaving better as they aren’t comparing ratios of 0 anymore.
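
To make the 0/0 case concrete, here is a minimal sketch in plain Python. It is not the scikit-learn/SciPy code that qiime diversity beta actually calls: the function names and the four-feature samples are made up for illustration, and the formulas are the textbook presence/absence definitions of Dice and Russell-Rao.

```python
import math


def dice_dissimilarity(u, v):
    """Dice dissimilarity between two samples, treated as presence/absence."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    only_one = sum(1 for a, b in zip(u, v) if bool(a) != bool(b))
    denominator = 2 * both + only_one
    if denominator == 0:
        # Neither sample contains any feature: 0/0 is undefined.
        return math.nan
    return only_one / denominator


def russellrao_dissimilarity(u, v):
    """Russell-Rao dissimilarity: the denominator is the number of features."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    return (len(u) - both) / len(u)


# Hypothetical feature counts for four samples (not taken from the real table).
empty_a = [0, 0, 0, 0]
empty_b = [0, 0, 0, 0]
sample_c = [5, 0, 2, 0]
sample_d = [1, 3, 0, 0]

print(dice_dissimilarity(sample_c, sample_d))       # well defined
print(dice_dissimilarity(empty_a, empty_b))         # nan
print(russellrao_dissimilarity(empty_a, empty_b))   # still well defined
```

Russell-Rao and Hamming divide by the number of features, which is never zero, so it would be consistent for them to complete on the same table. I haven’t checked how the library treats the empty-sample case for Jaccard or Yule.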

This issue is usually handled by rarefying (which would drop those samples), which is why we don’t notice most of the time.
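
As a rough illustration of that side effect: rarefying to a given depth discards every sample with fewer counts than that depth, so empty and near-empty samples never reach the metric. The sketch below only mimics the "drop shallow samples" part, not the subsampling; the table, sample IDs, and function name are hypothetical.

```python
def drop_shallow_samples(table, min_count):
    """Keep only samples whose total feature count is at least min_count."""
    return {
        sample_id: counts
        for sample_id, counts in table.items()
        if sum(counts) >= min_count
    }


# Hypothetical table: sample ID -> per-feature counts.
table = {
    "S1": [5, 0, 2, 0],
    "S2": [0, 0, 0, 0],  # no counts at all
    "S3": [1, 0, 0, 0],  # a single count
    "S4": [1, 3, 0, 0],
}

kept = drop_shallow_samples(table, min_count=2)
print(sorted(kept))
```

On the command line, filter-samples has a --p-min-frequency option that should do the same kind of thresholding without rarefying; treat the exact threshold as something to pick from your table summary.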


Aha, that’s it. I built this table by filtering my original, completely unaltered table, when I should have started from the table I had already filtered down to the group of organisms I’m interested in (a step that also removed all the samples with no reads). I’ll definitely look into rarefaction for beta diversity as well. Thanks!