Maturity Index function in sample classifier

rboutin · March 16, 2018, 1:14pm

Hi there,
I am trying to use the sample classifier maturity index function but am getting a "Plugin error from sample-classifier: 287.0" response and the debug information file reads:

Traceback (most recent call last):
File "/home/rboutin/miniconda3/envs/qiime2-2018.2/lib/python3.5/site-packages/q2cli/commands.py", line 246, in call
results = action(**arguments)
File "", line 2, in maturity_index
File "/home/rboutin/miniconda3/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/sdk/action.py", line 228, in bound_callable
output_types, provenance)
File "/home/rboutin/miniconda3/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/sdk/action.py", line 424, in callable_executor
ret_val = self._callable(output_dir=temp_dir, **view_args)
File "/home/rboutin/miniconda3/envs/qiime2-2018.2/lib/python3.5/site-packages/q2_sample_classifier/classify.py", line 139, in maturity_index
metadata, predicted_column, column, group_by, control)
File "/home/rboutin/miniconda3/envs/qiime2-2018.2/lib/python3.5/site-packages/q2_sample_classifier/utilities.py", line 465, in _maz_score
_median, _std = medians[metadata.loc[i][column]]
KeyError: 287.0

My code is as follows:
qiime sample-classifier maturity-index \
  --i-table NoDups_table.qza \
  --m-metadata-file Metadata_March618_rmNA.txt \
  --p-column age_at_visit \
  --p-group-by Disease_status \
  --p-control Control \
  --o-visualization maturity.qzv

Age at visit is a number in days ranging from ~90-400.

I was also wondering whether the values in the --p-group-by column can be numbers, or whether only characters are accepted.

Any help would be much appreciated. Thanks!

Nicholas_Bokulich · March 16, 2018, 1:50pm

Hi @rboutin,
Thanks for reporting this.

See the notes/warnings here. maturity-index is a really cranky method that is much in need of improvement. In particular, the issue here is that your Age value sounds like it is continuously distributed across samples. You should "bin" that category, e.g., into 10-day bins (or whatever makes the most sense), and then re-analyze with the --p-stratify parameter. If each age is already replicated multiple times, then binning is not necessary; just run with stratify.

The issue here is that this method needs adequate replication across each target value, because samples are split into training/testing groups for predicting age-specific traits — and without adequate replication there may be no control group at age X.
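As a rough illustration of the binning and replication check described above, here is a minimal sketch in plain Python. The sample IDs, ages, and the 10-day bin width are made up for the example, and the `bin_age` helper is hypothetical; it is not part of QIIME 2.

```python
from collections import Counter

BIN_WIDTH = 10  # days; an assumption, pick whatever suits the study design


def bin_age(age_days, width=BIN_WIDTH):
    """Return the lower edge of the bin that contains age_days."""
    return int(age_days // width) * width


# Hypothetical metadata rows: (sample id, age in days, disease status)
samples = [
    ("s1", 91, "Control"),
    ("s2", 97, "Control"),
    ("s3", 99, "Case"),
    ("s4", 104, "Control"),
    ("s5", 287, "Case"),
]

# Replace each raw age with its bin label
binned = [(sid, bin_age(age), status) for sid, age, status in samples]

# Count control samples per bin, then list bins that have no control at all
control_counts = Counter(b for _, b, status in binned if status == "Control")
all_bins = sorted({b for _, b, _ in binned})
bins_without_control = [b for b in all_bins if control_counts[b] == 0]

print(bins_without_control)
```

Any bin that shows up in `bins_without_control` is an age at which there is no control group to compare against, which is the situation described above.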

(There are some changes I plan to make that should partially alleviate this issue; I hope to get around to them in one of the next 2 releases of QIIME 2.)

I believe numbers should be fine — let me know if you get an error when you do that. Again, just as long as there is adequate replication and a "control" group that you can specify in this column.

I hope that helps!

rboutin · March 21, 2018, 2:36pm

Hi Nicholas,
Thanks very much for your help with this! I tried binning my Age value so that each number is present in at least 2 samples, but I am now getting KeyError: 7.0. The debug information file reads:
Traceback (most recent call last):
File "/home/rboutin/miniconda3/envs/qiime2-2018.2/lib/python3.5/site-packages/q2cli/commands.py", line 246, in call
results = action(**arguments)
File "", line 2, in maturity_index
File "/home/rboutin/miniconda3/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/sdk/action.py", line 228, in bound_callable
output_types, provenance)
File "/home/rboutin/miniconda3/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/sdk/action.py", line 424, in callable_executor
ret_val = self._callable(output_dir=temp_dir, **view_args)
File "/home/rboutin/miniconda3/envs/qiime2-2018.2/lib/python3.5/site-packages/q2_sample_classifier/classify.py", line 139, in maturity_index
metadata, predicted_column, column, group_by, control)
File "/home/rboutin/miniconda3/envs/qiime2-2018.2/lib/python3.5/site-packages/q2_sample_classifier/utilities.py", line 465, in _maz_score
_median, _std = medians[metadata.loc[i][column]]
KeyError: 7.0

If I have a range of age bins from 1-5 now, is this too few? The error key is different now so I'm wondering if the problem could be somewhere else? I've already removed any NAs from the data set. Thanks in advance!

Nicholas_Bokulich · March 21, 2018, 2:53pm

2 per bin is too few. Try minimum 4-5 per bin.

How many samples do you have? This method is only going to work meaningfully if you have a large number of samples... like 50 at the absolute minimum but ideally several hundred. The number of bins is not going to matter as much as the number of samples per bin.

This is the same error but triggered by a different value (now that you have binned your ages).
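To make the failure mode concrete, here is a small sketch of how a lookup like the one in the traceback can raise KeyError. This is not the actual q2-sample-classifier code; only the idea of looking up per-age control statistics in a `medians` mapping comes from the traceback. The data, the use of the population standard deviation, and the `lookup` helper are assumptions for illustration.

```python
from statistics import median, pstdev

# Hypothetical rows: (age bin, group, predicted age)
rows = [
    (1, "Control", 1.2),
    (1, "Control", 0.9),
    (2, "Control", 2.1),
    (2, "Control", 1.8),
    (7, "Case", 5.5),  # no control sample exists at age 7
]

# Collect predictions for control samples only, keyed by age
control_by_age = {}
for age, group, predicted in rows:
    if group == "Control":
        control_by_age.setdefault(age, []).append(predicted)

# Per-age (median, std) of the control group; ages without controls are absent
medians = {age: (median(v), pstdev(v)) for age, v in control_by_age.items()}


def lookup(age):
    """Fetch control statistics for one age; raises KeyError if none exist."""
    return medians[age]


try:
    lookup(7)
    missing_key = None
except KeyError as err:
    missing_key = err.args[0]

print(missing_key)
```

The value reported in the KeyError is simply the first age that has no entry among the controls, which is why it changes from one run to the next once the ages are binned.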

You will need to increase the number of samples per bin, or drop bins that are underrepresented. E.g., the test dataset in the tutorial contains a few hundred samples with age measured by month of life (creating a natural "bin"); months with too few samples were filtered out.
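Dropping underrepresented bins could look something like the sketch below. The sample IDs and bins are invented, and the threshold of 4 is just the lower end of the "minimum 4-5 per bin" suggestion; this only shows the filtering logic and does not touch any QIIME 2 files.

```python
from collections import Counter

MIN_PER_BIN = 4  # assumption: lower end of the suggested 4-5 samples per bin

# Hypothetical (sample id, age bin) pairs: 5 in bin 1, 4 in bin 2, 2 in bin 3
samples = (
    [("a%d" % i, 1) for i in range(5)]
    + [("b%d" % i, 2) for i in range(4)]
    + [("c%d" % i, 3) for i in range(2)]
)

# Count samples per bin and keep only bins that meet the threshold
counts = Counter(age_bin for _, age_bin in samples)
kept_bins = {b for b, n in counts.items() if n >= MIN_PER_BIN}

# Samples that survive the filter; these IDs are the ones to retain
filtered = [(sid, b) for sid, b in samples if b in kept_bins]

print(sorted(kept_bins), len(filtered))
```

The retained sample IDs would then need to be applied to both the metadata file and the feature table before re-running the analysis.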

I hope that helps!

rboutin · March 22, 2018, 6:39pm

Hi Nicholas,
Thanks again for your help. I finally got it to work! With regard to the output, does the value in the "feature importance" column indicate a p-value?

Thanks in advance!

Nicholas_Bokulich · March 23, 2018, 4:45pm

No, feature importance is not a P-value. Feature importance indicates the relative importance of each feature for predictive accuracy. The higher the importance, the more important that feature is for making predictions.

Precisely how these are calculated depends on the estimator that you are using. See here for some more details (for random forest estimators).
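For a rough sense of what these numbers look like, here is a minimal sketch using a scikit-learn random forest on synthetic data. The thread does not say which estimator was used in this run, so the choice of `RandomForestRegressor`, the data, and the settings are all assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: 3 features, but only feature 0 actually drives the target
rng = np.random.RandomState(0)
X = rng.rand(200, 3)
y = 10 * X[:, 0] + rng.normal(scale=0.1, size=200)

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X, y)

# For scikit-learn random forests these are non-negative and sum to 1
importances = model.feature_importances_

print(importances)
```

The values are relative weights that add up to 1 across features, so they rank features against each other rather than test any hypothesis, which is why they should not be read as P-values.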
