I used the metadata option of qiime feature-table filter-samples to remove a category of data from my table. However, when I move on to create a distance matrix using qiime diversity beta, I get an error telling me that I either have NaNs in my table or my data is not symmetrical. I have verified with qiime feature-table summarize that the table has the expected sequences in it and isn’t empty. Below are my commands and the error message. Anyone have any issues like this? Did I mess up the SQLite where clause somehow?
qiime feature-table filter-samples
Beta Diversity command:
qiime diversity beta
--p-metric dice --o-distance-matrix dice_distance_matrix_intact
error message: /opt/anaconda/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/sklearn/metrics/pairwise.py:1575: DataConversionWarning: Data was converted to boolean for metric dice
Traceback (most recent call last):
File "/opt/anaconda/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/q2cli/commands.py", line 327, in __call__
results = action(**arguments)
File "</opt/anaconda/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/decorator.py:decorator-gen-375>", line 2, in beta
File "/opt/anaconda/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/sdk/action.py", line 240, in bound_callable
File "/opt/anaconda/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/sdk/action.py", line 383, in callable_executor
output_views = self._callable(**view_args)
File "/opt/anaconda/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/q2_diversity/_beta/_method.py", line 129, in beta
File "/opt/anaconda/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/skbio/diversity/_driver.py", line 381, in beta_diversity
return DistanceMatrix(distances, ids)
File "/opt/anaconda/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/skbio/stats/distance/_base.py", line 106, in __init__
File "/opt/anaconda/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/skbio/stats/distance/_base.py", line 873, in _validate
"Data must be symmetric and cannot contain NaNs.")
skbio.stats.distance._base.DistanceMatrixError: Data must be symmetric and cannot contain NaNs.
Plugin error from diversity:
Data must be symmetric and cannot contain NaNs.
See above for debug info.
Hi! Can you double check your metadata file? Is there any column with empty cells for one or several samples? If a column exists, all of its cells should be filled.
Yes, all cells are filled, and I’ve validated the sheet with Keemei. Of course, there are samples on the metadata sheet that aren’t in the resulting filtered table, because I’m using categories on the sheet to filter the data. But that shouldn’t matter, because qiime diversity beta doesn’t call for metadata, right?
Thanks for the reply.
Why do you assume this issue is related to filter-samples? Do you not see this error prior to filtering? Maybe you could post the filtered feature table summary here so we could take a look?
What happens if you use a different metric? This could help diagnose whether this is actually an issue with the data or with the metric.
I haven't seen this error with my original, unfiltered table at all, and I've used this same diversity metric. In checking a few other metrics quickly, the filtered table does work with some but not all.
dice - error
jaccard - completes
yule - error
russellrao - completes
hamming - completes
Here's my filtered table summary - Intact_Table.qzv (1.9 MB)
I'll keep checking metrics, but I'm not sure why some of them work and some don't.
Well, I’ve just rerun qiime diversity beta on my original table and it’s failed as well with the same error. I think I’ll regenerate my table from my reads and see if that fixes it.
If I had to guess, there is probably a division by zero happening inside the metric itself. Looking at your table summary, you have a few samples which have no counts (and one with only 1). If you filter those out first I suspect metrics will start behaving better as they aren’t comparing ratios of 0 anymore.
This issue is usually handled by rarefying (which would drop those samples), which is why we don’t notice most of the time.
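To illustrate the division-by-zero guess, here is a minimal sketch that uses the textbook presence/absence definitions of four of the metrics tried above. It is not the scikit-learn or scikit-bio code path that QIIME 2 actually runs, and the sample vectors are hypothetical. The helper `_ratio` returns NaN for 0/0 on purpose, to mimic what floating-point array libraries do where plain Python would raise an exception.

```python
import math


def _contingency(u, v):
    """Count co-occurrence patterns for two presence/absence vectors."""
    pairs = [(bool(a), bool(b)) for a, b in zip(u, v)]
    ntt = sum(a and b for a, b in pairs)          # present in both
    ntf = sum(a and not b for a, b in pairs)      # present only in u
    nft = sum(not a and b for a, b in pairs)      # present only in v
    nff = sum(not a and not b for a, b in pairs)  # absent from both
    return ntt, ntf, nft, nff


def _ratio(num, den):
    # Assumption: mimic floating-point 0/0 -> NaN instead of raising.
    return num / den if den else math.nan


def dice(u, v):
    ntt, ntf, nft, _ = _contingency(u, v)
    return _ratio(ntf + nft, 2 * ntt + ntf + nft)


def yule(u, v):
    ntt, ntf, nft, nff = _contingency(u, v)
    return _ratio(2 * ntf * nft, ntt * nff + ntf * nft)


def russellrao(u, v):
    ntt, _, _, _ = _contingency(u, v)
    return (len(u) - ntt) / len(u)


def hamming(u, v):
    _, ntf, nft, _ = _contingency(u, v)
    return (ntf + nft) / len(u)


empty_a = [0, 0, 0, 0]    # hypothetical sample with no counts
empty_b = [0, 0, 0, 0]    # a second empty sample
singleton = [5, 0, 0, 0]  # hypothetical sample with one observed feature

for name, metric in [("dice", dice), ("yule", yule),
                     ("russellrao", russellrao), ("hamming", hamming)]:
    print(name, metric(empty_a, empty_b), metric(empty_a, singleton))
```

In this sketch, dice and yule both produce NaN when two empty samples are compared, while russellrao and hamming divide by the vector length and always stay finite. That is at least consistent with the pattern of errors and completions reported above.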
Aha, that’s it. I filtered this table out of my original, completely unaltered table, when I should have been filtering it from the table where I filtered down to the group of organisms I am interested in (which also removed all the samples that had no reads). I’ll definitely be looking into the rarefaction for beta diversity as well. Thanks!
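For anyone landing here later, the idea of dropping empty or near-empty samples before computing distances can be sketched in plain Python as below. The table layout (a dict of per-sample count dicts), the sample and feature names, and the function `drop_sparse_samples` are all hypothetical and only meant to show the logic, not how QIIME 2 stores or filters a feature table.

```python
def drop_sparse_samples(table, min_total=1):
    """Keep only samples whose total count is at least min_total.

    table maps sample ID -> {feature ID: count}.
    """
    return {
        sample: counts
        for sample, counts in table.items()
        if sum(counts.values()) >= min_total
    }


# Hypothetical table: S3 has no counts, S4 has a single count.
table = {
    "S1": {"featA": 17, "featB": 0, "featC": 223},
    "S2": {"featA": 40, "featB": 22, "featC": 12},
    "S3": {"featA": 0, "featB": 0, "featC": 0},
    "S4": {"featA": 0, "featB": 1, "featC": 0},
}

print(sorted(drop_sparse_samples(table)))               # drops only S3
print(sorted(drop_sparse_samples(table, min_total=2)))  # drops S3 and S4
```

Within QIIME 2 itself, I believe qiime feature-table filter-samples has a minimum-frequency parameter that does the same job; check its --help output for the exact name in your version.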
I have a count table in which all samples have plenty of counts:
# Constructed from biom file
#OTU ID 2_T3 93_T3 93_T7 97_T6 36_T6 88_T6
s__Hespellia_stercorisuis 17.0 40.0 42.0 17.0 58.0 26.0
s__Eubacterium_F_sp002431395 0.0 22.0 20.0 30.0 0.0 0.0
s__UBA9502_sp004554205 223.0 12.0 13.0 93.0 0.0 49.0
s__Firm-11_sp900540045 1702.0 95.0 33.0 141.0 0.0 0.0
s__QANG01_sp003150145 42.0 53.0 0.0 0.0 0.0 23.0
s__Parabacteroides_sp003480915 0.0 99.0 82.0 13.0 0.0 92.0
s__Bacteroides_sp007097645 118.0 704.0 208.0 152.0 1253.0 3981.0
s__UBA6398_sp003150315 11.0 22.0 11.0 17.0 20.0 0.0
s__Schaedlerella_sp000364245 16.0 0.0 14.0 0.0 0.0 0.0
s__Lawsonibacter_sp900066645 226.0 155.0 71.0 83.0 19.0 424.0
s__Blautia_sp001304935 25.0 31.0 18.0 0.0 34.0 14.0
[~1500 more lines with abundant taxa]
…and I am only getting the "Data must be symmetric and cannot contain NaNs." error when running core-metrics-phylogenetic, but not when running core-metrics. Maybe it is something wrong with my phylogeny, but without more info on the cause of the error (the qiime2 error log file doesn’t help), it’s really hard to know.
@nick-youngblut, please do not cross-post (see "feature_ids must be present as tip names"). This is a violation of our code of conduct (https://forum.qiime2.org/faq#cross-posting), and it puts extra burden on the volunteers helping out on this forum. Thanks for your understanding. I am locking this topic; we can carry on the discussion in the post I linked to above.