Dear all,
I used regress-samples (with RandomForest as the estimator) in order to identify the features contributing the most to a given Behavioral Score (Time spent in social contact).
"
qiime sample-classifier regress-samples \
  --i-table $data-social-table.qza \
  --m-metadata-file metadata.tsv \
  --p-test-size 0.05 \
  --p-n-jobs 4 \
  --m-metadata-column Time_in_Social_Contact \
  --p-estimator RandomForestRegressor \
  --p-random-state 123 \
  --p-optimize-feature-selection \
  --output-dir $data-social-TimeSC
"
Then I took a peek at FeatureImportance, which was almost exactly what I was looking for. The only issue is that FeatureImportance does not specify whether a feature contributes towards "high" or "low" "time spent in social contact". In other words, I cannot tell whether the association between that feature and the score is positive or negative, the way the sign of a coefficient would tell me in a linear model.
I know that with categorical data I should be able to extract a vector of "feature contribution" with treeinterpreter from the RandomForest directly in Python.
I was hoping to do just that here.
So I exported the sample estimator:
"
qiime tools export \
  --input-path $data-social-TimeSC/sample-classifier.qza \
  --output-path $data-social-TimeSC/class
"
This produced a pipeline_sklearn.pkl which I should be able to import in Python.
However, every attempt at importing the pickled pipeline produced the following error:
(apologies for the screencap, I do not quite know how to format code on the forums).
As far as I can tell, this is not a scikit-learn version mismatch, since I have installed scikit-learn=0.23.1, the version QIIME 2 is using.
I could not find a solution on the QIIME 2 forum or on scikit-learn-related forums.
Does anyone know how to solve the import issue?
Or otherwise, how to access the "orientation" of feature importance?
The only solution I can think of is running the RandomForest regressor outside of QIIME 2, directly in R or Python, which is much less convenient.
This is quite weird and I feel as though I am missing an obvious piece of the problem. Accessing the "sign" of "RandomForest regressor coefficients" seems like a basic request and I am puzzled not to find any related topics to that on the forums. So apologies if the answer is obvious.
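To make concrete what I mean by "sign": a rough proxy would be the direction of the rank correlation between a feature's abundance and the score. Below is a minimal sketch in plain Python. The feature names, abundances, and scores are made up for illustration, and the helper functions (ranks, spearman, direction) are hypothetical names, not part of QIIME 2 or scikit-learn. Note that this only describes the marginal association in the data, not what the random forest itself learned, so it is not a substitute for treeinterpreter-style contributions.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant variable has no direction
    return cov / (var_x * var_y) ** 0.5


def direction(rho):
    """Translate the sign of a correlation into a plain-language label."""
    if rho > 0:
        return "more abundant when the score is high"
    if rho < 0:
        return "more abundant when the score is low"
    return "no monotonic trend"


# Made-up example data: one value per sample, in the same sample order.
time_in_social_contact = [12.0, 30.5, 8.2, 45.1, 22.3, 5.0]
top_features = {
    "feature_A": [3, 40, 0, 85, 20, 1],
    "feature_B": [90, 15, 120, 2, 30, 150],
}

for name, abundances in top_features.items():
    rho = spearman(abundances, time_in_social_contact)
    print(f"{name}: rho = {rho:+.2f} ({direction(rho)})")
```

With real data, the abundances would come from the feature table and the scores from the Time_in_Social_Contact metadata column, restricted to the features that FeatureImportance ranks highest.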
As a side note, I would like to thank the mods and the community for the constant technical support.
This is the first time that I have had a question for which I couldn't find an answer on the forums.