Error in feature-classifier fit-classifier-naive-bayes after using clawback empo_3:Animal distal gut

Hello QIIME users!

I need to improve my classifier and I'm going to test clawback, using this Tutorial: Using q2-clawback to assemble taxonomic weights

In my case, I am working with human fecal samples, so I chose empo_3: Animal distal gut.
I used this command:
qiime clawback assemble-weights-from-Qiita --i-classifier Ncbi16Sv3v4-vNov2024-2024.5-classifier.qza --i-reference-taxonomy ncbi-refseqs-taxonomy.qza --i-reference-sequences ncbi-refseqs-v3v4.qza --p-metadata-key empo_3 --p-metadata-value "Animal distal gut" --p-context Deblur_2021.09-Illumina-16S-V3-V4-150nt-ac8c0b --o-class-weight feces-weights.qza

And I got the following result: Saved FeatureTable[RelativeFrequency] to: animalDistalGut-weights.qza

However, when I retrain the classifier using this command:
qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads ncbi-refseqs-v3v4.qza --i-reference-taxonomy ncbi-refseqs-taxonomy.qza --i-class-weight animalDistalGut-weights.qza --o-classifier animalDistalGut-classifier.qza

I have the following error:
Plugin error from feature-classifier:

'ascii' codec can't decode byte 0xcc in position 132: ordinal not in range(128)

Debug info has been saved to /var/folders/kp/4qtw7ssx5r31x88hc272t1mw_rkzbh/T/qiime2-q2cli-err-6ewlfdv7.log

I verified that the problem is in the animalDistalGut-weights.qza file, because when I remove the --i-class-weight parameter the command runs normally.

Could anyone help me with this error?

Additional information:
Mac-M1
Qiime version: 2024.5
Installation commands:

CONDA_SUBDIR=osx-64 conda env create -n qiime2-amplicon-2024.5 --file https://data.qiime2.org/distro/amplicon/qiime2-amplicon-2024.5-py39-osx-conda.yml
conda activate qiime2-amplicon-2024.5
conda config --env --set subdir osx-64

Hi @raquel_riyuzo ,

This error indicates that there is a non-ASCII byte (0xcc, which is Ì in Latin-1) somewhere in the data that cannot be decoded as ASCII. Based on your troubleshooting, it sounds like it is coming in from Qiita, or is otherwise being introduced while the weights are assembled.

If that is the case, I suggest exporting the data from that file and inspecting it to see why this special character occurs at position 132, and whether others are present in the file.
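As a rough sketch of that inspection (not a tested fix for this case), the Python below scans a text file for bytes outside the ASCII range and reports the line number, the byte offset within the line, and the line itself. The function name non_ascii_lines and the file path in the usage note are made up for illustration. Note also that "position 132" in the error refers to the string being decoded, so it may not match an offset in any file.

```python
def non_ascii_lines(path):
    """Return (line_number, first_offset, text) for each line holding a byte >= 0x80."""
    hits = []
    with open(path, "rb") as fh:
        for lineno, raw in enumerate(fh, start=1):
            # Offsets (0-based, in bytes) of every non-ASCII byte on this line
            offsets = [i for i, b in enumerate(raw) if b >= 0x80]
            if offsets:
                # Decode leniently so the offending line can still be displayed
                text = raw.decode("utf-8", errors="replace").rstrip("\r\n")
                hits.append((lineno, offsets[0], text))
    return hits


if __name__ == "__main__":
    import sys

    for lineno, offset, text in non_ascii_lines(sys.argv[1]):
        print(f"line {lineno}, byte {offset}: {text}")
```

This only makes sense for text files, for example a taxonomy TSV written by qiime tools export (the path exported-taxonomy/taxonomy.tsv would be a hypothetical example). A .biom file in HDF5 format is binary and will produce hits everywhere.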

Otherwise see these related issues on the forum; it could be as simple as changing the character encoding:

https://forum.qiime2.org/search?q=%27ascii%27%20codec%20can%27t%20decode%20byte%20order%3Alatest_topic

Thanks!
I am investigating the extracted file. I used this command:

qiime tools extract --input-path animalDistalGut-all-weight.qza --output-path extracted-weights-all

and I checked the feature-table.biom file in two ways, the first with the command:

biom validate-table -i extracted-weights-all/data/feature-table.biom

Resulting in:
Unknown table type, however that is likely okay.
The input file is not a valid BIOM-formatted file.

and
biom convert -i extracted-weights-all/data/feature-table.biom -o feature-table.json --to-json
biom convert -i extracted-weights-all/data/feature-table.biom -o feature-table.tsv --to-tsv

Both result in: TypeError: extracted-weights-all/data/feature-table.biom does not appear to be a BIOM file!

I suspect that the animalDistalGut-all-weight.qza file I downloaded earlier is corrupt. What do you think? Is there another way to obtain this data? I'm training my classifier to improve performance with human fecal samples.

Hi @raquel_riyuzo ,

You could try re-downloading the file, though I am a little skeptical that it is corrupted: in that case I would have expected fit-classifier-naive-bayes to fail differently and earlier (i.e., when reading the file).

Besides, from your initial post it looked like you were using assemble-weights-from-Qiita, not downloading pre-generated weights.

This could also be a BIOM table format version issue, but I am not sure how to troubleshoot that, or whether it is possible to convert between BIOM table format versions.
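One low-level check that might help narrow this down, offered only as a sketch: BIOM 2.x tables are HDF5 files, while BIOM 1.0 tables are JSON, so the first bytes of the file hint at which one you have. The function name guess_biom_container is invented here, and the check assumes the HDF5 signature sits at the very start of the file, which is the usual case but not guaranteed.

```python
# The 8-byte signature that begins an HDF5 file (assumed here to be at offset 0)
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def guess_biom_container(path):
    """Guess whether a file looks like HDF5 (BIOM 2.x), JSON (BIOM 1.0), or neither."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head == HDF5_SIGNATURE:
        return "hdf5"
    # A JSON BIOM table is a JSON object, so it should open with "{"
    if head.lstrip()[:1] == b"{":
        return "json"
    return "unknown"


if __name__ == "__main__":
    import sys

    print(guess_biom_container(sys.argv[1]))
```

If this reports "unknown" for feature-table.biom, that would point more toward a damaged file than toward a format version mismatch, though it is not conclusive either way.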

If you are comfortable using Python, you could also load the file as a pandas DataFrame to inspect the issue.
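A minimal sketch of what that could look like, assuming it is run inside the activated QIIME 2 environment: load the artifact with the QIIME 2 Artifact API, view it as a DataFrame, and list any row or column labels that contain non-ASCII characters. The helper names non_ascii_labels and load_weights_labels are made up for this example, and I am assuming the taxonomy strings end up as the DataFrame's column labels.

```python
def non_ascii_labels(labels):
    """Return the labels (as strings) that contain at least one non-ASCII character."""
    return [str(label) for label in labels if not str(label).isascii()]


def load_weights_labels(qza_path):
    """Load a FeatureTable artifact and return (row_labels, column_labels).

    Imports are local so the helper above works without QIIME 2 installed.
    """
    import pandas as pd
    from qiime2 import Artifact

    df = Artifact.load(qza_path).view(pd.DataFrame)
    return list(df.index), list(df.columns)


if __name__ == "__main__":
    import sys

    rows, cols = load_weights_labels(sys.argv[1])
    for label in non_ascii_labels(rows) + non_ascii_labels(cols):
        # repr() makes combining or otherwise invisible characters visible
        print(repr(label))
```

If this prints any labels, those entries are the likely candidates for the 0xcc byte. If the artifact itself fails to load, that would be a separate clue that the file is the problem.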