NaNs in q2-perc-norm output

@cduvallet Thanks for the link.
I followed the tutorial using my dataset but ran into this error: “Input contains NaN, infinity or a value too large for dtype”.
I used the default --p-otu-thresh 0.3, and my data contains some sparse OTUs.

I exported my FeatureTable[PercentileNormalized] file after running perc-norm and noticed that some features were assigned “nan” across samples.

What could be the problem?

Hi @prince. It sounds like there is an issue with your input feature table, but without knowing more about your data I can’t really help you troubleshoot. First, check that the table contains all the samples and ASVs you expect and that the read count values look sensible. If everything checks out, I’d recommend running the plugin with an example FeatureTable from one of the tutorials to determine whether the issue is with the plugin or with your data.


Hi @cduvallet, I tried the plugin with the example feature table from https://github.com/Gibbons-Lab/ccmb_workshop and had the same issue.
However, using --p-otu-thresh 0.0 produced no NaNs with either my data or the example data.
With --p-otu-thresh 0.1 I still got “nan” values, so I suspect the problem lies with --p-otu-thresh. How bad is it to use --p-otu-thresh 0.0?

Sorry to hear you’re still having issues, @prince. I’m currently traveling, so apologies for the delay in responding. If you’d like faster responses, it helps to provide exactly what data you used and what commands you ran, so I can quickly try to reproduce the error on my machine. Please also include the versions of QIIME 2 and the plugin you are using.

I just ran the following command using the crc_metadata.tsv and crc_relative.qza files provided in @cdiener’s workshop repo and did not get any errors. I’m using the 2019.1 version of q2-perc-norm on the 2019.1 version of qiime2.

qiime perc-norm percentile-normalize \
	--i-table crc_relative.qza \
	--m-metadata-file crc_metadata.tsv \
	--m-metadata-column disease_state \
	--m-batch-file crc_metadata.tsv \
	--m-batch-column study \
	--o-perc-norm-table percentile_normalized.qza

Lowering the OTU presence threshold to 0 is not really recommended. We discussed this in a previous post: Q2-perc-norm - reduction in taxa

Hello @cduvallet, thanks for the response. My issue is the introduction of “nan” values after percentile normalization.
I actually used the same data in my previous trial.
In any case, I replicated your example. Attached are the percentile-normalized files (one with the default --p-otu-thresh and the other with --p-otu-thresh 0.0). After exporting to BIOM format, I still found “nan” values in the table produced with the default --p-otu-thresh.

I am using the 2019.1 version of q2-perc-norm with q2cli version 2019.1.0.

Below are the commands I ran

qiime perc-norm percentile-normalize \
	--i-table crc_relative.qza \
	--m-metadata-file crc_metadata.tsv \
	--m-metadata-column disease_state \
	--m-batch-file crc_metadata.tsv \
	--m-batch-column study \
	--o-perc-norm-table percentile_normalized

qiime perc-norm percentile-normalize \
	--i-table crc_relative.qza \
	--m-metadata-file crc_metadata.tsv \
	--m-metadata-column disease_state \
	--m-batch-file crc_metadata.tsv \
	--m-batch-column study \
	--o-perc-norm-table percentile_normalized_thresh_0.0 \
	--p-otu-thresh 0.0

percentile_normalized.qza (1.2 MB)
percentile_normalized_thresh_0.0.qza (1.9 MB)

Thank you

Oh, I see. OTUs that are not present in greater than a --p-otu-thresh fraction of samples get converted to NaNs. This is not a bug; it is the expected behavior of the plugin.
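To make that behavior concrete, here is a simplified Python sketch (not the plugin’s actual code, and the toy table and OTU names are made up) of how a presence threshold can turn a sparse OTU into an all-NaN feature:

```python
import math

# Toy relative-abundance table: each OTU maps to its value in 4 samples.
# "otu3" is non-zero in only 1 of 4 samples (presence = 0.25).
table = {
    "otu1": [0.2, 0.1, 0.3, 0.4],
    "otu2": [0.5, 0.6, 0.4, 0.3],
    "otu3": [0.0, 0.0, 0.1, 0.0],
}
otu_thresh = 0.3  # the default --p-otu-thresh

normalized = {}
for otu, values in table.items():
    presence = sum(v > 0 for v in values) / len(values)
    if presence < otu_thresh:
        # Too sparse: mask the OTU with NaNs instead of normalizing it
        normalized[otu] = [math.nan] * len(values)
    else:
        normalized[otu] = values  # percentile normalization would happen here

print([otu for otu in normalized if math.isnan(normalized[otu][0])])
# → ['otu3']
```

With the default threshold of 0.3, only otu3 (present in 25% of samples) is masked; setting otu_thresh to 0.0 would mask nothing, which matches what you observed.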

I recommend keeping the behavior as is and simply removing the NaN OTUs before your next analysis step. Note that even though sparse OTUs aren’t converted to NaN when you lower --p-otu-thresh to 0, they are essentially converted to noise. So if you use them in downstream steps, they will at best contribute noise and at worst introduce spurious signal to your analyses. Again, check out the previous post I linked for more discussion of this.
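For instance, once you have the exported table loaded as feature → per-sample values (a sketch with made-up feature IDs; with a real exported BIOM table you would typically load it via biom or pandas first), dropping the NaN features takes just a few lines:

```python
import math

# Hypothetical percentile-normalized table: feature ID -> per-sample values.
# Features masked by the presence threshold are NaN in every sample.
norm_table = {
    "otu1": [1.2, -0.4, 0.7],
    "otu2": [math.nan, math.nan, math.nan],  # below --p-otu-thresh
    "otu3": [0.1, 0.9, -1.3],
}

# Keep only features with no NaN values.
filtered = {
    otu: values
    for otu, values in norm_table.items()
    if not any(math.isnan(v) for v in values)
}

print(sorted(filtered))  # → ['otu1', 'otu3']
```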

Let me know if you have any other questions!