Sample classifier error: Shape of passed values is (73, 1), indices imply (73, 2)


I am trying to run the sample classifier plug-in, but I am getting the following error:

Shape of passed values is (73, 1), indices imply (73, 2)

For context: my feature-table has 365 samples but my metadata has 400 samples. I have fewer samples in the feature-table because I filtered out low-abundance and low-frequency features. Could this be the origin of the problem?

Moreover, I am trying to classify between two categories of a categorical variable, and there are at least 50 samples in each category.

Any idea what it could be?
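
One quick way to rule the sample-count mismatch in or out is to compare the two sets of sample IDs directly. The sketch below is only an illustration: the IDs are made up, and in practice they would be read from the feature table and the metadata file.

```python
# Hypothetical sample IDs; real ones would come from the feature table
# and the metadata file.
table_ids = {"S1", "S2", "S3"}
metadata_ids = {"S1", "S2", "S3", "S4", "S5"}

shared = table_ids & metadata_ids            # present in both
only_in_table = table_ids - metadata_ids     # table samples lacking metadata
only_in_metadata = metadata_ids - table_ids  # metadata rows with no table sample

print(f"shared: {len(shared)}")
print(f"in table but missing from metadata: {sorted(only_in_table)}")
print(f"in metadata but not in table: {sorted(only_in_metadata)}")
```

Samples that are in the table but have no metadata are usually the more worrying direction; extra metadata rows tend to be less of a concern. Incidentally, 73 is exactly 20% of 365, which would fit a hold-out test set drawn from the 365 table samples, assuming a 20% test split was used.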


Hi @Pablo_V,
Could you please post the full error message? See the log file that is generated, or re-run the command with --verbose. Thanks!

Hi @Nicholas_Bokulich

Thanks a lot for your fast response!

Attached you will find the log from running with --verbose. It is a bit long to post here.

Pablo classifier-error-log.txt (32.7 KB)

And what is the command that you used?

It looks like the error is due to the estimator type that you are using: instead of LinearSVC you should use something else. If you share the command you used, I can confirm the details and will open a ticket so that LinearSVC fails more gracefully in the future. Thanks!

So I am using this command:

qiime sample-classifier classify-samples --i-table table-noblanks.qza --m-metadata-file metadata_rockwool_noblanks.tsv --m-metadata-column Diagnostic --p-optimize-feature-selection --p-parameter-tuning --p-estimator LinearSVC --p-n-estimators 20 --p-random-state 123 --output-dir classifierLVC --verbose

I am using LinearSVC because it worked with a similar dataset before. The only difference is that this time the dataset has no mock community samples included.

Yeah, digging into the code for this, it looks like this should work (or at least LinearSVC runs fine with the test data), so I will investigate more. This may be an issue with the latest version of scikit-learn.
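
For anyone landing here from the same message: this is the kind of ValueError pandas raises when a two-dimensional array has fewer columns than the column labels it is given. The sketch below reproduces it with made-up data. It is not the plugin's code, and the idea that the single column comes from a binary estimator's scores is only a guess.

```python
import numpy as np
import pandas as pd

n_samples = 73
classes = ["case", "control"]  # hypothetical class labels

# A single column of scores for 73 samples (assumed shape, for illustration)
scores = np.zeros((n_samples, 1))

message = None
try:
    # Two column labels for a one-column array: pandas refuses to build this
    pd.DataFrame(scores, columns=classes)
except ValueError as err:
    message = str(err)
    print(message)
```

If something like this is what happens inside the plugin, the fix would be on the plugin side rather than in the input data.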

In the meantime I think the only workaround will be to use a different estimator.

Alright. I have checked the other estimators and they work well. However, the confusion matrix is not as good as the one I got with LinearSVC on my other, very similar dataset.

Thanks a lot for your help!

This may be a version issue, which I have yet to examine. Another workaround may be to use an older version of QIIME 2 (ideally the version used on your other dataset, which was probably from before ROC curves were added to the confusion-matrix action).