Biplot visualization error

stamper · November 14, 2019, 9:41pm

I am trying to create a biplot and keep getting this error. I have run the same command in several versions of qiime2-2019 (.10, .4, and .1) with several different datasets with varying levels of analysis (SEPP and not, debloomed and not, greengenes, silva, mitochondria & chloroplast and not). I have been following this post as a guide for generating the necessary lead up files.

Your assistance is greatly appreciated.

(qiime2-2019.1) Chriss-MacBook-Pro:core-metrics-results stamper$ qiime emperor biplot --i-biplot bray_curtis_biplot.qza --m-sample-metadata-file Barth_metadata.txt --m-feature-metadata-file taxonomy.qza --o-visualization bray_curtis_biplot.qzv --verbose
Traceback (most recent call last):
File "/Users/stamper/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/q2cli/commands.py", line 274, in call
results = action(**arguments)
File "</Users/stamper/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/decorator.py:decorator-gen-357>", line 2, in biplot
File "/Users/stamper/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
output_types, provenance)
File "/Users/stamper/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/sdk/action.py", line 427, in callable_executor
ret_val = self._callable(output_dir=temp_dir, **view_args)
File "/Users/stamper/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/q2_emperor/_plot.py", line 87, in biplot
feats['importance'] = feats.apply(euclidean, axis=1, args=(origin,))
File "/Users/stamper/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/pandas/core/frame.py", line 6014, in apply
return op.get_result()
File "/Users/stamper/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/pandas/core/apply.py", line 142, in get_result
return self.apply_standard()
File "/Users/stamper/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/pandas/core/apply.py", line 248, in apply_standard
self.apply_series_generator()
File "/Users/stamper/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/pandas/core/apply.py", line 277, in apply_series_generator
results[i] = self.f(v)
File "/Users/stamper/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/pandas/core/apply.py", line 74, in f
return func(x, *args, **kwds)
File "/Users/stamper/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/scipy/spatial/distance.py", line 602, in euclidean
return minkowski(u, v, p=2, w=w)
File "/Users/stamper/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/scipy/spatial/distance.py", line 505, in minkowski
dist = norm(u_v, ord=p)
File "/Users/stamper/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/scipy/linalg/misc.py", line 137, in norm
a = np.asarray_chkfinite(a)
File "/Users/stamper/miniconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/numpy/lib/function_base.py", line 461, in asarray_chkfinite
"array must not contain infs or NaNs")
ValueError: ('array must not contain infs or NaNs', 'occurred at index 0df6c802966e8670279671824da4f10a')

Plugin error from emperor:

('array must not contain infs or NaNs', 'occurred at index 0df6c802966e8670279671824da4f10a')

See above for debug info.

fedarko · November 15, 2019, 2:38am

Hi @stamper! Looking through that error message, it seems like things are going wrong at this particular line in the q2-emperor code.

From what I can tell, the process of trying to compute the magnitude of each feature in the biplot (what's being done on that line of code) is failing because of non-finite (to quote the error message, "infs or NaNs") feature loading values. That is: your bray_curtis_biplot.qza artifact should just have a bunch of numbers under the hood, but it looks like one of those numbers isn't actually a number!
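To make the failure mode concrete, here is a minimal pure-Python sketch of the kind of check that produces this message. It is not the q2-emperor or SciPy code: `feature_magnitude` is a hypothetical helper that computes a feature's distance from the origin and rejects non-finite values, roughly what the traceback suggests `asarray_chkfinite` does. The loadings are made up.

```python
import math


def feature_magnitude(loadings):
    """Euclidean distance of one feature's loadings from the origin.

    Hypothetical stand-in for the magnitude step seen in the traceback:
    it refuses non-finite input instead of silently returning NaN.
    """
    if not all(math.isfinite(x) for x in loadings):
        raise ValueError("array must not contain infs or NaNs")
    return math.sqrt(sum(x * x for x in loadings))


# Made-up loadings for two features across three axes.
good = [3.0, 4.0, 0.0]
bad = [0.1, 0.2, float("nan")]

print(feature_magnitude(good))
try:
    feature_magnitude(bad)
except ValueError as err:
    print("failed:", err)
```

A single NaN in any axis is enough to make the whole feature fail, which would explain why the error names one feature ID even if every feature is affected.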

There are a few possible reasons you could be getting this error. Maybe there are literally Infinity or NaN ("not a number") values listed in the feature loadings in your PCoA OrdinationResults, maybe some of these numbers are large enough that they're being converted somewhere in the code to Infinity (since this is Python, I doubt it), or maybe some other crazy thing(s) are happening.

In any case, I think the bray_curtis_biplot.qza artifact (that you got from running qiime diversity pcoa-biplot, it sounds like?) is the source of your problems here. If you wouldn't mind sharing it on the forum, it should be possible to dig through it and see if there are any entries that stand out (e.g. if there are literally Infinity values). From there, I think we can trace this error back to where things went wrong.

stamper · November 15, 2019, 4:18pm

Ok thanks for the information. Here is the file. Thanks again for the help! bray_curtis_biplot.qza (567.0 KB)

fedarko · November 16, 2019, 7:27am

Hi @stamper! I split my answer up into two sections.

What I think the problem is

Looking through the OrdinationResults data under the hood in that QZA file, what sticks out to me is that, for each of the rows in this OrdinationResults describing a feature, the feature loading for the final eigenvalue (the 41st one) is nan. I'm pretty sure this is why you got that error!
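The kind of check described here can be sketched in a few lines of Python. This assumes the feature loadings have already been read into a plain dictionary of feature ID to list of per-axis values; `nonfinite_axes` and the toy numbers are hypothetical and only illustrate the idea, they are not read from the actual artifact.

```python
import math


def nonfinite_axes(loadings_by_feature):
    """Return the 1-based axis numbers that hold a non-finite loading
    (NaN or infinity) for at least one feature."""
    bad = set()
    for values in loadings_by_feature.values():
        for axis, value in enumerate(values, start=1):
            if not math.isfinite(value):
                bad.add(axis)
    return sorted(bad)


nan = float("nan")
# Toy data: two features, three axes, with the last axis broken.
toy = {
    "featA": [0.12, -0.03, nan],
    "featB": [-0.40, 0.25, nan],
}
print(nonfinite_axes(toy))
```

If the same axis number comes back for every feature, as with the final axis here, that points at the axis itself rather than at any individual feature.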

It looks like this same eigenvalue has a 0% "proportion explained" for this biplot -- I think this is the source of the problem. My guess is that this eigenvalue's "proportion explained" is getting rounded down to 0, and that this is somehow causing all of the feature loadings for this eigenvalue to be nan. I am not sure why this problem is happening to you, but it should be possible for us to get around this.
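As a rough illustration of spotting such an axis, the sketch below computes each eigenvalue's share of the total and flags the ones that are effectively zero. The function names, the tolerance `tol`, and the eigenvalues are all assumptions made for the example; it also assumes the eigenvalues do not sum to zero.

```python
def proportion_explained(eigvals):
    """Each eigenvalue divided by the sum of all eigenvalues."""
    total = sum(eigvals)
    return [v / total for v in eigvals]


def negligible_axes(eigvals, tol=1e-12):
    """1-based axis numbers whose proportion explained is within tol of 0."""
    props = proportion_explained(eigvals)
    return [i for i, p in enumerate(props, start=1) if abs(p) <= tol]


# Made-up eigenvalues where the last axis explains nothing.
toy_eigvals = [2.5, 1.0, 0.5, 0.0]
print(proportion_explained(toy_eigvals))
print(negligible_axes(toy_eigvals))
```

This only identifies suspicious axes; it does not explain how a zero eigenvalue ends up as nan loadings, which remains a guess in this thread.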

One possible solution

From looking at the provenance of the QZA file, it looks like you got the Bray-Curtis PCoA through running the qiime diversity core-metrics-phylogenetic pipeline. I dug under the hood a bit, and it looks like that pipeline runs qiime diversity pcoa without specifying a --p-number-of-dimensions, and that in turn causes qiime diversity pcoa to compute "all" eigenvalues rather than use an approximation to compute some of the eigenvalues. Phew!

Since it seems like the final eigenvalue is the problem, you might be able to get around this by running the rarefaction? --> beta diversity --> PCoA --> PCoA biplot commands manually (i.e. without using the core-metrics-phylogenetic pipeline). I think this would allow you to run qiime diversity pcoa and explicitly set --p-number-of-dimensions to a value smaller than 41, and I think this should solve this problem.

Let me know how this goes! Like I said, I don't know what the root cause of this problem is, but I think the solution above should at least work as a temporary measure to get around this.

stamper · November 18, 2019, 8:50pm

Thanks! That worked!

MMC_northS · December 11, 2019, 8:43am

Hello,

I had a similar error for biplot visualization with emperor in QIIME2-2019.10. I have QIIME2 on two computers, and on one of them I kept the old version, QIIME2-2019.7. I ran exactly the same commands on both, with the same files, but I only get this error with QIIME2-2019.10.

Plugin error from emperor: ('array must not contain infs or NaNs', 'occurred at index FJ944692.1.1306')
Debug info has been saved to /tmp/qiime2-q2cli-err-6vmoq9ip.log

However, with the old version QIIME2-2019.7 it worked and I can see my biplot.

The commands that I run were:
1- qiime diversity beta --i-table rarefied_feature_table_clor_mit_n10.qza --p-metric braycurtis --o-distance-matrix bd_braycurtis_clor_mit_n10.qza
2- qiime diversity pcoa --i-distance-matrix bd_braycurtis_clor_mit_n10.qza --o-pcoa bd_braycurtis_pcoa_clor_mit_n10.qza
3- qiime feature-table relative-frequency --i-table rarefied_feature_table_clor_mit_n10.qza --o-relative-frequency-table relative_rarefied_feature_table_clor_mit_n10.qza
4- qiime diversity pcoa-biplot --i-pcoa bd_braycurtis_pcoa_clor_mit_n10.qza --i-features relative_rarefied_feature_table_clor_mit_n10.qza --o-biplot bd_braycurtis_biplot_pcoa_clor_mit_n10.qza
5- qiime emperor biplot --i-biplot bd_braycurtis_biplot_pcoa_clor_mit_n10.qza --m-sample-metadata-file mapping_file_parameters.txt --p-number-of-features 5 --o-visualization bd_braycurtis_biplot_pcoa_clor_mit_n10.qzv

It is command "5", which uses emperor to create the biplot qzv file, that gave me the problem with the new version of QIIME2.

What could be the problem with this new version?

Thank you very much in advance for your help

fedarko · December 12, 2019, 2:37am

@MMC_northS It's hard to say for sure without seeing your data, but it does look like you're running into the same problem that we encountered above.

Could you try rerunning the qiime diversity pcoa step so that you explicitly set --p-number-of-dimensions 5 (doesn't have to be 5, any number around that should be ok)? After rerunning qiime diversity pcoa you'll need to rerun the qiime diversity pcoa-biplot and qiime emperor biplot commands with the new outputs. My guess is that this should make things work.
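For reference, the sequence of commands involved can be sketched as below. The snippet only builds and prints the command lines, it does not run QIIME 2. The subcommands and flags are the ones quoted in this thread; the helper `biplot_commands`, the `prefix` argument, and all file names are hypothetical placeholders.

```python
import shlex


def biplot_commands(table, metadata, n_dims, prefix="bc"):
    """Build, in order, the command lines for the manual biplot workflow.

    Nothing is executed; each command is returned as an argv list.
    """
    dm = f"{prefix}_distance.qza"
    pcoa = f"{prefix}_pcoa.qza"
    rel = f"{prefix}_relative_table.qza"
    biplot = f"{prefix}_biplot.qza"
    viz = f"{prefix}_biplot.qzv"
    return [
        ["qiime", "diversity", "beta", "--i-table", table,
         "--p-metric", "braycurtis", "--o-distance-matrix", dm],
        # The only change from the failing runs: an explicit dimension count.
        ["qiime", "diversity", "pcoa", "--i-distance-matrix", dm,
         "--p-number-of-dimensions", str(n_dims), "--o-pcoa", pcoa],
        ["qiime", "feature-table", "relative-frequency", "--i-table", table,
         "--o-relative-frequency-table", rel],
        ["qiime", "diversity", "pcoa-biplot", "--i-pcoa", pcoa,
         "--i-features", rel, "--o-biplot", biplot],
        ["qiime", "emperor", "biplot", "--i-biplot", biplot,
         "--m-sample-metadata-file", metadata, "--o-visualization", viz],
    ]


for argv in biplot_commands("rarefied_table.qza", "metadata.txt", n_dims=5):
    print(shlex.join(argv))
```

Because the PCoA output changes, everything downstream of `qiime diversity pcoa` has to be regenerated from the new file rather than reused.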

I don't know what in the newest version of Q2 could be causing this problem (I'm not really qualified to speak on that). It might be a NumPy / SciPy problem with precision -- I looked around the q2-diversity GitHub page, and found this issue which might be related. Alternatively, this comment suggests that one culprit might be empty samples -- if you filtered your dataset to e.g. remove a certain number of ASVs, maybe this resulted in a single sample becoming "empty", which could cause problems with various computations.
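If empty samples are a suspect, a quick check is to look for samples whose counts sum to zero after filtering. The sketch below assumes the feature table has been loaded into a plain nested dictionary of sample ID to feature counts; `empty_samples` and the toy table are hypothetical.

```python
def empty_samples(table):
    """Return the sample IDs whose feature counts sum to zero."""
    return [sid for sid, counts in table.items() if sum(counts.values()) == 0]


# Toy table: S2 lost all of its reads during filtering.
toy_table = {
    "S1": {"ASV1": 10, "ASV2": 0},
    "S2": {"ASV1": 0, "ASV2": 0},
    "S3": {"ASV1": 3, "ASV2": 7},
}
print(empty_samples(toy_table))
```

Any sample reported here would be worth removing before recomputing the distance matrix, to see whether the error goes away.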

In any case, keep us posted on how this goes!

MMC_northS · December 16, 2019, 2:49pm

@fedarko Thank you very much for your help. After following your recommendation and adding --p-number-of-dimensions when running qiime diversity pcoa, everything worked. The pcoa biplot ran without errors and I could open it in qiime view.

I do not know why this parameter is mandatory in this new version of qiime2 when it was not necessary in version 2019.7, but from now on I will use this setting.

Thanks again.

thermokarst · December 16, 2019, 3:29pm

The requirement of the parameter did not change in 2019.10. As @fedarko mentioned above, it would be helpful to get our hands on the data in order to reproduce this. We also need the full error log (run the command with the --verbose flag). Thanks!

timanix · January 13, 2020, 11:31am

Thank you for such a detailed explanation and solution!
I just encountered the same error with my new dataset (no problems with an older dataset). Your suggestion worked perfectly and saved me a lot of time.

Kelly_Weldon · January 17, 2020, 9:48pm

Hi @fedarko

What would be the easiest way to work out which --p-number-of-dimensions value is appropriate for a given dataset? The biplots are a big hit with the metabolomics community, and we are receiving the same error when trying to create biplots with version 2019.10!

fedarko · January 17, 2020, 10:32pm

Hi @Kelly_Weldon!

I'm honestly not sure what the easiest thing here is. In the problematic biplot QZA that @stamper uploaded earlier in this thread, the problem was with the 41st eigenvalue -- and there were also 41 samples in that biplot. So if people are running into the same error, I guess they could try one of two things:

- Using a --p-number-of-dimensions of (number of samples - 1) or something, and decreasing that until it works.
- Starting with something small that we know will probably work (e.g. --p-number-of-dimensions 5, assuming you have more than 5 samples).
  - This is probably the easier of the two options, but the first option seems more "careful" to me because it's using a more realistic number of eigenvalues. Honestly I don't know enough about the underlying algorithms/math to say if the loss of accuracy from e.g. --p-number-of-dimensions 5 will be worse than that from --p-number-of-dimensions 40 for a 41-sample dataset...
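The first option amounts to a simple downward search. Here is a minimal sketch of it, assuming you supply a function that reruns the PCoA, PCoA biplot, and emperor biplot steps for a given number of dimensions and reports whether they succeeded. Both `largest_working_dimensions` and the stand-in `pretend_run` are hypothetical, and the cutoff of 39 is arbitrary.

```python
def largest_working_dimensions(n_samples, works):
    """Try n_samples - 1, n_samples - 2, ... down to 1 and return the
    first value for which works(n_dims) is true, or None if none is."""
    for n_dims in range(n_samples - 1, 0, -1):
        if works(n_dims):
            return n_dims
    return None


def pretend_run(n_dims):
    """Stand-in for actually rerunning the commands; here we simply
    pretend that anything above 39 dimensions fails."""
    return n_dims <= 39


print(largest_working_dimensions(41, pretend_run))
```

Each real attempt means rerunning three commands, so for large datasets the second option (starting small) will usually be quicker.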

In any case, these options are just temporary workarounds due to that error -- they really shouldn't have to be used in practice. If you still have access to the data that's causing problems, I'd suggest getting it together (feature table QZA + PCoAResults QZA + sample metadata) into a ZIP file or something, and uploading it to this thread (or in a new GitHub issue for q2-diversity). Hopefully that should be enough for the Q2 devs to figure out what the problem is and fix it for everyone, so that we won't have to use this workaround any more.