Feature Classifier - Not enough values to unpack

bentsch · March 26, 2018, 2:57pm

Hi Everyone,

I am having a problem with training the feature-classifier for a specific database (phytoref). My issue is very similar to the one posted by another user [Plugin error from feature-classifier: not enough values to unpack] but unfortunately that issue never got resolved.

Here is my command:
qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads phytoref_16S_ref-seqs.qza --i-reference-taxonomy phytoref_taxonomy.qza --o-classifier phytoref_409nt_classifier.qza

And I get the following output message:
Plugin error from feature-classifier:

not enough values to unpack (expected 2, got 0)

Debug info has been saved to /var/tmp/pbs.566300.hpcnode1/qiime2-q2cli-err-09wthw34.log

Looking into the log file reveals the following:
more /var/tmp/pbs.566300.hpcnode1/qiime2-q2cli-err-09wthw34.log
/shared/c3/apps/miniconda2/envs/qiime2-2018.2/lib/python3.5/site-packages/q2_feature_classifier/classifier.py:101: UserWarning: The TaxonomicClassifier artifact that results from this method was trained using scikit-learn version 0.19.1. It cannot be used with other versions of scikit-learn. (While the classifier may complete successfully, the results will be unreliable.)
warnings.warn(warning, UserWarning)
Traceback (most recent call last):
File "/shared/c3/apps/miniconda2/envs/qiime2-2018.2/lib/python3.5/site-packages/q2cli/commands.py", line 246, in call
results = action(**arguments)
File "", line 2, in fit_classifier_naive_bayes
File "/shared/c3/apps/miniconda2/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/sdk/action.py", line 228, in bound_callable
output_types, provenance)
File "/shared/c3/apps/miniconda2/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/sdk/action.py", line 363, in callable_executor
output_views = self.callable(**view_args)
File "/shared/c3/apps/miniconda2/envs/qiime2-2018.2/lib/python3.5/site-packages/q2_feature_classifier/classifier.py", line 310, in generic_fitter
pipeline)
File "/shared/c3/apps/miniconda2/envs/qiime2-2018.2/lib/python3.5/site-packages/q2_feature_classifier/_skl.py", line 31, in fit_pipeline
y, X = list(zip(*data))
ValueError: not enough values to unpack (expected 2, got 0)
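For anyone landing here from the same traceback: the last frame fails because `data` is empty, so `zip(*data)` produces nothing to unpack into `y` and `X`. Below is a minimal sketch of that behaviour. It assumes (the traceback does not show this) that `data` is a list of (taxonomy, sequence) pairs; the function name `split_pairs` and the example pairs are made up for illustration.

```python
def split_pairs(data):
    # Same unpacking pattern as the line shown in the traceback.
    y, X = list(zip(*data))
    return y, X


# Hypothetical, illustrative pairs of (taxonomy string, sequence).
pairs = [
    ("Eukaryota;Archaeplastida", "ACGT"),
    ("Eukaryota;Stramenopiles", "TTGA"),
]
labels, seqs = split_pairs(pairs)
print(labels)
print(seqs)

# With no pairs at all, the unpacking raises the error from the log.
try:
    split_pairs([])
    empty_error = None
except ValueError as err:
    empty_error = str(err)
print(empty_error)
```

So the error message itself only says that no training pairs were available; it does not say why.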

I have attached the two artefact files used in the command. Any help on this issue would be very much appreciated.

Thanks a lot,

Benni

phytoref_taxonomy.qza (17.1 KB)
phytoref_16S_ref-seqs.qza (129.7 KB)

bentsch · March 27, 2018, 2:32pm

Hi again,

I figured out what the problem was. The imported taxonomy file is not correct and does not contain the taxonomy strings. I previously had issues importing the taxonomy using the following command:

qiime tools import --type 'FeatureData[Taxonomy]' --source-format HeaderlessTSVTaxonomyFormat --input-path /shared/c3/bio_db/Phytoref/qiime2.tax.txt --output-path phytoref_taxonomy.qza

I got an error saying that the file I was trying to import is not headerless, even though it clearly didn't have a header. Here are the first few lines of the qiime2.tax.txt file:

202#HE610155 Eukaryota;Archaeplastida;Chlorophyta;Ulvophyceae;Ulvophyceae_X;Ulvophyceae_XX;Ulvophyceae_XXX;Ulvophyceae_XXXX;Desmochloris;Desmochloris_halophila;
803#AF514849 Eukaryota;Stramenopiles;Ochrophyta;Bacillariophyta;Bacillariophyceae;Naviculales;Naviculales_X;Naviculaceae;Haslea;Haslea_crucigera;

My workaround for this problem was to introduce a header line. After that, the above command ran without error and produced an output file (which I now know is not correct).

A colleague of mine figured out that the error when importing the taxonomy file was related to the '#' character in the identifiers. We replaced each '#' with an underscore ('_') in both the taxonomy file and the sequence file, and then everything worked.
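The replacement described above can be sketched in Python roughly as follows. This is only an illustration, not the exact commands we ran: it assumes the taxonomy file is tab-separated with the identifier in the first column, and that the FASTA identifier is the first space-delimited token after '>'. The function names are made up.

```python
def sanitize_id(identifier, bad="#", replacement="_"):
    # Replace every '#' in an identifier with '_'.
    return identifier.replace(bad, replacement)


def sanitize_taxonomy_line(line):
    # Assumption: tab-separated, identifier in the first field.
    # Only the identifier is changed; the taxonomy string is kept as-is.
    feature_id, sep, rest = line.partition("\t")
    return sanitize_id(feature_id) + sep + rest


def sanitize_fasta_line(line):
    # Sequence lines pass through untouched; only header IDs are changed.
    if not line.startswith(">"):
        return line
    header_id, sep, rest = line[1:].partition(" ")
    return ">" + sanitize_id(header_id) + sep + rest


def rewrite_file(src_path, dst_path, line_fn):
    # Stream a file line by line through one of the functions above.
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(line_fn(line))


tax_line = "202#HE610155\tEukaryota;Archaeplastida;Chlorophyta;"
print(sanitize_taxonomy_line(tax_line))
print(sanitize_fasta_line(">202#HE610155"))
```

Both files have to be rewritten the same way, otherwise the identifiers in the sequences and the taxonomy no longer match.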

So it seems as if '#' characters should be avoided in the identifiers of taxonomy files. The '#' is part of all identifiers in the phytoref database (http://phytoref.sb-roscoff.fr/static/downloads/PhytoRef_with_taxonomy.fasta), which can be used to assess eukaryotic species based on their chloroplast 16S.

ebolyen · March 27, 2018, 5:14pm

Excellent debugging @bentsch!

It looks like we have a related issue for this when working with SILVA.

This also explains why the HeaderlessTSVTaxonomyFormat didn't work, as your first ID contains that "comment" character. The error maybe wasn't so helpful, but the computer was very confused at that point.
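To illustrate why a "comment" character inside an ID is so disruptive, here is a rough sketch. It assumes a parser that discards everything from the first '#' onward, wherever it appears in the line; that is a simplification for illustration, not the actual import code, and `strip_comment_anywhere` is a made-up name.

```python
def strip_comment_anywhere(line, comment_char="#"):
    # Assumed behaviour: drop everything from the first '#' onward,
    # no matter where in the line it occurs.
    return line.split(comment_char, 1)[0]


row = "202#HE610155\tEukaryota;Archaeplastida;Chlorophyta;"
kept = strip_comment_anywhere(row)
fields = kept.split("\t")
print(fields)  # ['202']
```

Under that assumption only the part of the ID before the '#' survives and the taxonomy column is gone, which would be consistent with an imported artifact that contains no taxonomy strings.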

I think our future goal is to make it so that if # is treated as a comment (which is probably a debatable point in this context), it would only work if it was the first character on a line. So this might be a comment:

feature-id<tab>some;taxa;string
# this is a comment
feature-id2<tab>some;other;string

while this would not be a comment:

feature#id<tab>some;taxa;string#1
feature#id2<tab>some;taxa;string#2
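As a sketch of the rule described here (a line is a comment only when '#' is its first character), something like the following would do. This is an illustration of the idea, not the planned implementation; `is_comment` and `parse_taxonomy` are hypothetical names, and a real tab stands in for the `<tab>` placeholder used in the examples.

```python
def is_comment(line, comment_char="#"):
    # A line counts as a comment only if '#' is its very first character.
    return line.startswith(comment_char)


def parse_taxonomy(lines):
    # Map feature ID -> taxonomy string, skipping blank and comment lines.
    records = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or is_comment(line):
            continue
        feature_id, taxon = line.split("\t", 1)
        records[feature_id] = taxon
    return records


with_comment = [
    "feature-id\tsome;taxa;string\n",
    "# this is a comment\n",
    "feature-id2\tsome;other;string\n",
]
with_hash_ids = [
    "feature#id\tsome;taxa;string#1\n",
    "feature#id2\tsome;taxa;string#2\n",
]
print(parse_taxonomy(with_comment))
print(parse_taxonomy(with_hash_ids))
```

With this rule the comment line is skipped in the first case, while in the second case both records are kept with their '#' characters intact.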

We'll try and follow up here when we fix this issue!
