NaNs/non symmetrical data in Table resulting from qiime feature-table filter-samples

Hello,
I used the metadata option of qiime feature-table filter-samples to remove a category of data from my table. However, when I move on to create a distance matrix using qiime diversity beta, I get an error telling me that I either have NaNs in my table or my data is not symmetrical. I have verified with qiime feature-table summarize that the table has the expected sequences in it and isn’t empty. Below are my commands and the error message. Has anyone had issues like this? Did I mess up the SQLite where clause somehow?
Thanks,
Patrick

Filtering command:
qiime feature-table filter-samples \
  --i-table …/…/…/dada2_output/table.qza \
  --m-metadata-file …/…/metadata_LSU_qiime2_2017_bb_only_organizedfortidal.tsv \
  --p-where "Sheared='Unsheared'" \
  --o-filtered-table Intact_Table

Beta diversity command:
qiime diversity beta \
  --i-table Intact_Table.qza \
  --p-metric dice \
  --o-distance-matrix dice_distance_matrix_intact

Error message:
/opt/anaconda/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/sklearn/metrics/pairwise.py:1575: DataConversionWarning: Data was converted to boolean for metric dice
  warnings.warn(msg, DataConversionWarning)
Traceback (most recent call last):
  File "/opt/anaconda/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/q2cli/commands.py", line 327, in __call__
    results = action(**arguments)
  File "</opt/anaconda/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/decorator.py:decorator-gen-375>", line 2, in beta
  File "/opt/anaconda/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/sdk/action.py", line 240, in bound_callable
    output_types, provenance)
  File "/opt/anaconda/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/sdk/action.py", line 383, in callable_executor
    output_views = self._callable(**view_args)
  File "/opt/anaconda/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/q2_diversity/_beta/_method.py", line 129, in beta
    n_jobs=n_jobs
  File "/opt/anaconda/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/skbio/diversity/_driver.py", line 381, in beta_diversity
    return DistanceMatrix(distances, ids)
  File "/opt/anaconda/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/skbio/stats/distance/_base.py", line 106, in __init__
    self._validate(data, ids)
  File "/opt/anaconda/anaconda3/envs/qiime2-2019.7/lib/python3.6/site-packages/skbio/stats/distance/_base.py", line 873, in _validate
    "Data must be symmetric and cannot contain NaNs.")
skbio.stats.distance._base.DistanceMatrixError: Data must be symmetric and cannot contain NaNs.

Plugin error from diversity:

Data must be symmetric and cannot contain NaNs.

See above for debug info.

Hi! Can you double-check your metadata file? Is there any column with empty cells for one or more samples? Every cell in a column should be filled.

Yes, all cells are filled, and I’ve validated the sheet with Keemei. Of course, there are samples on the metadata sheet that aren’t in the resulting filtered table, because I’m using categories on the sheet to filter the data. But that shouldn’t matter, because qiime diversity beta doesn’t call for metadata, right?

Thanks for the reply.

Hi @prayle1,

Why do you assume this issue is related to filter-samples? Do you not see this error prior to filtering? Maybe you could post the filtered feature table summary here so we could take a look?

What happens if you use a different metric? This could help diagnose whether this is actually an issue with the data or with the metric.

Correct, qiime diversity beta doesn’t call for metadata.

I haven't seen this error with my original, unfiltered table at all, and I've used this same diversity metric. In checking a few other metrics quickly, the filtered table does work with some but not all.
dice - error
jaccard - completes
yule - error
russellrao - completes
hamming - completes
Here's my filtered table summary - Intact_Table.qzv (1.9 MB)
I'll keep checking metrics, but I'm not sure why some of them work and some don't.
Thanks

Well, I’ve just rerun qiime diversity beta on my original table and it’s failed as well with the same error. I think I’ll regenerate my table from my reads and see if that fixes it.

Hey @prayle1!

If I had to guess, there is probably a division by zero happening inside the metric itself. Looking at your table summary, you have a few samples which have no counts (and one with only 1). If you filter those out first I suspect metrics will start behaving better as they aren’t comparing ratios of 0 anymore.
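To make the division-by-zero idea concrete, here is a minimal pure-Python sketch of the Dice dissimilarity on presence/absence data. The helper name `dice_dissimilarity` and the example count vectors are hypothetical, and this is not the code that scikit-learn or scikit-bio actually run; it only shows that comparing two samples with no observed features leaves a 0/0 ratio.

```python
import math

def dice_dissimilarity(u, v):
    """Dice dissimilarity between two count vectors, treated as presence/absence.

    Hypothetical helper for illustration only.
    """
    a = [x > 0 for x in u]
    b = [x > 0 for x in v]
    # Features present in both samples.
    both = sum(1 for x, y in zip(a, b) if x and y)
    # Features present in exactly one of the two samples.
    only_one = sum(1 for x, y in zip(a, b) if x != y)
    denom = 2 * both + only_one
    if denom == 0:
        # 0/0: array libraries typically yield NaN here rather than raising.
        return float("nan")
    return only_one / denom

# Made-up count vectors (one entry per feature).
sample_a = [17, 0, 223, 0]
sample_b = [40, 22, 12, 0]
empty_1 = [0, 0, 0, 0]
empty_2 = [0, 0, 0, 0]

print(dice_dissimilarity(sample_a, sample_b))  # well defined
print(dice_dissimilarity(sample_a, empty_1))   # well defined (no overlap)
print(dice_dissimilarity(empty_1, empty_2))    # nan: nothing to compare
```

Under this sketch, the undefined value only appears when neither sample has any features, which is consistent with the suggestion to filter out the zero-count samples first.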

This issue is usually handled by rarefying (which would drop those samples), which is why we don’t notice most of the time.
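As a rough illustration of why rarefying hides the problem, the sketch below subsamples each sample to a fixed depth without replacement and drops any sample that has fewer counts than that depth. The function name `rarefy`, the sample and feature IDs, and the depth of 50 are all hypothetical; this is a simplified stand-in, not the QIIME 2 implementation.

```python
import random

def rarefy(table, depth, seed=0):
    """Subsample each sample to `depth` counts without replacement.

    `table` maps sample ID -> {feature ID: count}. Samples with fewer than
    `depth` total counts are dropped. Hypothetical helper, for illustration.
    """
    rng = random.Random(seed)
    rarefied = {}
    for sample_id, counts in table.items():
        total = sum(counts.values())
        if total < depth:
            continue  # too shallow: the sample is dropped
        # Expand counts into one entry per observation, then draw `depth` of them.
        pool = [feat for feat, n in counts.items() for _ in range(n)]
        drawn = rng.sample(pool, depth)
        rarefied[sample_id] = {
            feat: drawn.count(feat) for feat in counts if feat in drawn
        }
    return rarefied

# Made-up table: two well-sampled samples, one empty, one with a single count.
table = {
    "S1": {"f1": 17, "f2": 223},
    "S2": {"f1": 40, "f2": 12, "f3": 22},
    "S3": {},
    "S4": {"f3": 1},
}
result = rarefy(table, depth=50)
print(sorted(result))  # the empty and near-empty samples are gone
```

After this step every remaining sample has the same total count, so no pairwise comparison involves an all-zero sample.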

Aha, that’s it. I filtered this table out of my original, completely unaltered table, when I should have been filtering it from the table where I filtered down to the group of organisms I am interested in (which also removed all the samples that had no reads). I’ll definitely be looking into the rarefaction for beta diversity as well. Thanks!

I have a count table in which all samples have plenty of counts:

# Constructed from biom file
#OTU ID 2_T3    93_T3   93_T7   97_T6   36_T6   88_T6
s__Hespellia_stercorisuis       17.0    40.0    42.0    17.0    58.0    26.0
s__Eubacterium_F_sp002431395    0.0     22.0    20.0    30.0    0.0     0.0
s__UBA9502_sp004554205  223.0   12.0    13.0    93.0    0.0     49.0
s__Firm-11_sp900540045  1702.0  95.0    33.0    141.0   0.0     0.0
s__QANG01_sp003150145   42.0    53.0    0.0     0.0     0.0     23.0
s__Parabacteroides_sp003480915  0.0     99.0    82.0    13.0    0.0     92.0
s__Bacteroides_sp007097645      118.0   704.0   208.0   152.0   1253.0  3981.0
s__UBA6398_sp003150315  11.0    22.0    11.0    17.0    20.0    0.0
s__Schaedlerella_sp000364245    16.0    0.0     14.0    0.0     0.0     0.0
s__Lawsonibacter_sp900066645    226.0   155.0   71.0    83.0    19.0    424.0
s__Blautia_sp001304935  25.0    31.0    18.0    0.0     34.0    14.0
[~1500 more lines with abundant taxa] 

…and I only get the "Data must be symmetric and cannot contain NaNs." error when running core-metrics-phylogenetic, not when running core-metrics. Maybe something is wrong with my phylogeny, but without more info on the cause of the error (the qiime2 error log file doesn’t help), it’s really hard to know.

@nick-youngblut, please do not cross-post (see your other topic, "feature_ids must be present as tip names"). This is a violation of our code of conduct (https://forum.qiime2.org/faq#cross-posting), and it puts extra burden on the volunteers helping out on this forum. Thanks for your understanding. I am locking this topic; we can carry on the discussion in the post I linked to above.