Sample classifier error: Shape of passed values is (73, 1), indices imply (73, 2)

Pablo_V · March 25, 2020, 2:50pm

Hi,

I am trying to run the sample classifier plug-in, but I am getting the following error:

Shape of passed values is (73, 1), indices imply (73, 2)

For context: my feature-table has 365 samples but my metadata has 400 samples. I have fewer samples in the feature-table because I filtered out low-abundance and low-frequency features. Could this be the origin of the problem?

Moreover, I am trying to classify between two categories of a categorical variable, and there are at least 50 samples in each category.
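One way to see how the two sets of sample IDs relate is to compare them directly. The snippet below is only a sketch in plain Python: the helper name `compare_sample_ids` and the sample IDs are hypothetical, and it does not reproduce what the plug-in does internally. It only shows which IDs are shared and which appear on one side only, which by itself does not say whether the mismatch causes the error.

```python
def compare_sample_ids(table_ids, metadata_ids):
    """Report which sample IDs are shared and which appear on one side only."""
    table_ids, metadata_ids = set(table_ids), set(metadata_ids)
    return {
        "shared": table_ids & metadata_ids,
        "table_only": table_ids - metadata_ids,
        "metadata_only": metadata_ids - table_ids,
    }


# Hypothetical IDs: the metadata lists more samples than the filtered table.
table_ids = ["S1", "S2", "S3"]
metadata_ids = ["S1", "S2", "S3", "S4", "S5"]

report = compare_sample_ids(table_ids, metadata_ids)
for name, ids in report.items():
    print(name, sorted(ids))
```

If `table_only` is empty, every sample in the table has a metadata row, and the extra metadata rows are simply samples that were filtered out of the table.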

Any idea what it could be?

Cheers,
Pablo

Nicholas_Bokulich · March 25, 2020, 3:06pm

Hi @Pablo_V,
Could you please post the full error message? See the log file that is generated, or re-run the command with --verbose. Thanks!

Pablo_V · March 25, 2020, 3:44pm

Hi @Nicholas_Bokulich

Thanks a lot for your fast response!

Attached is the log from running with --verbose. It is a bit long to post here.

Cheers,
Pablo

classifier-error-log.txt (32.7 KB)

Nicholas_Bokulich · March 25, 2020, 3:50pm

And what is the command that you used?

It looks like the error is due to the estimator type that you are using: instead of LinearSVC, you should use a different estimator. If you share the command you used so that I can confirm the details, I will open a ticket so that LinearSVC fails more gracefully in the future. Thanks!
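For readers unfamiliar with the message itself: it comes from pandas, which raises it when a 2-D array is wrapped in a DataFrame whose column labels do not match the number of columns in the array. The sketch below reproduces that message with made-up data. It assumes, without confirmation from this thread, that a single column of scores was paired with two class labels. That would be consistent with scikit-learn's LinearSVC, which has no `predict_proba` and whose `decision_function` returns one score per sample for a two-class problem. The class names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for 73 test samples and two class labels.
classes = ["case", "control"]

# One column of scores, where one column per class would be expected.
scores = np.zeros((73, 1))

try:
    pd.DataFrame(scores, columns=classes)
    message = None
except ValueError as err:
    # pandas reports the shape of the data and the shape implied by the labels.
    message = str(err)

print(message)
```

An estimator that returns one probability column per class would produce a (73, 2) array here, and the DataFrame would be built without error.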

Pablo_V · March 25, 2020, 4:03pm

So I am using this command:

qiime sample-classifier classify-samples --i-table table-noblanks.qza --m-metadata-file metadata_rockwool_noblanks.tsv --m-metadata-column Diagnostic --p-optimize-feature-selection --p-parameter-tuning --p-estimator LinearSVC --p-n-estimators 20 --p-random-state 123 --output-dir classifierLVC --verbose

I am using LinearSVC because it worked with a similar dataset before. The only difference is that this time the dataset has no mock community samples included.

Nicholas_Bokulich · March 25, 2020, 4:41pm

Yeah, digging into the code, it looks like this should work (or at least LinearSVC runs fine with the test data), so I will investigate more. This may be an issue with the latest version of scikit-learn.

In the meantime I think the only workaround will be to use a different estimator.

Pablo_V · March 25, 2020, 8:10pm

Alright. I have checked the other estimators and they work well. However, the confusion matrix is not as good as the one I got with LinearSVC on my other, very similar dataset.

Thanks a lot for your help!

Nicholas_Bokulich · March 25, 2020, 9:18pm

This may be a version issue, which I have yet to examine. Another workaround may be to use an older version of QIIME 2 (ideally the version used on your other dataset, which was probably from before ROC curves were added to the confusion-matrix action).