Feature-classifier error 'not enough values to unpack'

Hi,

I am trying to import the HOMD database to assign taxonomy to my dataset in QIIME 2, but I am getting the error 'not enough values to unpack (expected 2, got 0)'.

I know there have been a lot of posts on this issue, but I have triple-checked my files and they don't seem to fall into any of the previous categories (i.e. there are no # characters, and the IDs match between the FeatureData[Taxonomy] and FeatureData[Sequence] files):

image

Below I have posted the entire error message, and here are the files I was using:
HOMD-ref-seqs.qza (27.0 KB)
HOMD-taxonomy.qza (16.1 KB)

Any tips on how to troubleshoot would be greatly appreciated!

Thanks,
Lisa

/home/lisa/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/q2_feature_classifier/classifier.py:101: UserWarning: The TaxonomicClassifier artifact that results from this method was trained using scikit-learn version 0.19.1. It cannot be used with other versions of scikit-learn. (While the classifier may complete successfully, the results will be unreliable.)
warnings.warn(warning, UserWarning)
Traceback (most recent call last):
  File "/home/lisa/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
    results = action(**arguments)
  File "", line 2, in fit_classifier_naive_bayes
  File "/home/lisa/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
    output_types, provenance)
  File "/home/lisa/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/qiime2/sdk/action.py", line 362, in callable_executor
    output_views = self._callable(**view_args)
  File "/home/lisa/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/q2_feature_classifier/classifier.py", line 316, in generic_fitter
    pipeline)
  File "/home/lisa/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/q2_feature_classifier/_skl.py", line 31, in fit_pipeline
    y, X = list(zip(*data))
ValueError: not enough values to unpack (expected 2, got 0)

In case it is helpful, here is the shell script I ran to generate these files (the commands that ran successfully are commented out with #, and I am getting the error on the last command):

#qiime tools import \
#  --type 'FeatureData[Sequence]' \
#  --input-path sample.fasta \
#  --output-path HOMD_otus.qza

#qiime tools import \
#  --type 'FeatureData[Taxonomy]' \
#  --input-format HeaderlessTSVTaxonomyFormat \
#  --input-path HOMD_16S_rRNA_RefSeq_V15.1.TAXONOMY.txt \
#  --output-path HOMD-taxonomy.qza

#qiime feature-classifier extract-reads \
#  --i-sequences HOMD_otus.qza \
#  --p-f-primer GTGCCAGCMGCCGCGGTAA \
#  --p-r-primer GGACTACHVGGGTWTCTAAT \
#  --p-trunc-len 150 \
#  --o-reads HOMD-ref-seqs.qza

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads HOMD-ref-seqs.qza \
  --i-reference-taxonomy HOMD-taxonomy.qza \
  --verbose \
  --o-classifier HOMD-classifier.qza

Hi @cmarotz,
It looks like the issue may be that your taxonomies are non-hierarchical, i.e., you have a single species label for each sequence rather than the taxonomic lineage that this classification method expects.
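
For context on the message itself: the failing line in the traceback, y, X = list(zip(*data)), raises exactly this error whenever data is empty. Here is a minimal plain-Python sketch of that behaviour. The function name and the example pairs are made up for illustration and are not taken from q2-feature-classifier; the sketch only shows the mechanism, not why data ended up empty in your run.

```python
# Illustrative only: `split_labels_and_features` and the pairs below are
# hypothetical and are not part of q2-feature-classifier.
def split_labels_and_features(data):
    """Apply the same unpacking pattern that appears in the traceback."""
    y, X = list(zip(*data))
    return y, X


# With at least one (label, features) pair, the unpacking succeeds.
labels, features = split_labels_and_features(
    [("taxon_a", "ACGT"), ("taxon_b", "TTGA")]
)

# With an empty list, zip(*data) yields nothing, so list(...) is [] and
# there are zero values to assign to the two names on the left-hand side.
message = None
try:
    split_labels_and_features([])
except ValueError as err:
    message = str(err)

print(labels, features)
print(message)
```

So whatever the underlying cause, the error means no label/sequence pairs reached the training step.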

The classifiers in q2-feature-classifier usually expect a hierarchical taxonomic lineage (semicolon-delimited), because they are designed to determine the correct lineage instead of the top hit in the reference database.
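
If you want a quick way to see which labels are hierarchical, something like the following sketch could help. It counts the semicolon-delimited ranks per feature ID in a headerless two-column TSV. The example rows, the rank prefixes (k__, p__, ...) and the function name lineage_depths are assumptions for illustration, not the contents of the real HOMD file.

```python
import csv
import io

# Made-up headerless TSV content (feature ID <tab> taxon); not the real HOMD file.
EXAMPLE_TSV = (
    "seq1\tStreptococcus mutans\n"
    "seq2\tk__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
    "f__Streptococcaceae; g__Streptococcus; s__mutans\n"
)


def lineage_depths(handle):
    """Return {feature_id: number of semicolon-delimited ranks}."""
    depths = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        feature_id, taxon = row[0], row[1]
        ranks = [rank.strip() for rank in taxon.split(";") if rank.strip()]
        depths[feature_id] = len(ranks)
    return depths


depths = lineage_depths(io.StringIO(EXAMPLE_TSV))
# A depth of 1 means a single label with no lineage information.
flat_ids = [fid for fid, depth in depths.items() if depth < 2]

print(depths)
print("IDs with a single, non-hierarchical label:", flat_ids)
```

To run it on a real file, you would pass an open file handle instead of the io.StringIO object.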

However, you can still get a top-hit assignment (just be aware that you may well retrieve some false positives with this approach): use classify-sklearn with the confidence parameter set to -1, or use classify-consensus-blast with the maxaccepts parameter set to 1. Either option will find the top match for each query sequence and should work with your non-hierarchical taxonomy data.

If you do still want to use classify-sklearn to classify the lineage, rather than the top-hit species, you will need to fill out your reference taxonomy to include at least some other taxonomic information (e.g., family, genus).
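
As a rough sketch of what "filling out" could look like, the snippet below adds a genus rank to flat species labels. It assumes each label is a binomial whose first word is the genus, which may not hold for every entry in your file. The helper add_genus_rank, the g__/s__ prefixes, and the example IDs are all hypothetical choices for illustration, not something the classifier requires.

```python
def add_genus_rank(label):
    """Turn 'Genus species' into 'g__Genus; s__species' (hypothetical helper).

    Assumes the first whitespace-separated word of the label is the genus.
    """
    parts = label.split()
    if len(parts) < 2:
        # No species epithet to split off; keep a genus-only lineage.
        return "g__{}".format(label.strip())
    genus, species = parts[0], " ".join(parts[1:])
    return "g__{}; s__{}".format(genus, species)


# Made-up feature IDs and labels, standing in for rows of the taxonomy file.
flat_taxonomy = {"seq1": "Streptococcus mutans", "seq2": "Veillonella parvula"}
hierarchical = {
    fid: add_genus_rank(label) for fid, label in flat_taxonomy.items()
}

# Print in the same two-column, tab-separated layout as a headerless taxonomy TSV.
for fid, lineage in hierarchical.items():
    print(fid, lineage, sep="\t")
```

Higher ranks such as family would need a real lookup source, since they cannot be derived from the species name alone.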

Got it. Thanks so much for your help!
