I am trying to import the HOMD database to assign taxonomy to my dataset in q2, but I am getting the error 'not enough values to unpack (expected 2, got 0)'.
I know there have been a lot of posts on this issue, but I have triple-checked my files and they don't seem to fall into any of the previous categories (i.e. there are no #s, and the IDs match between the FeatureData[Taxonomy] and FeatureData[Sequence] files).
Below I have posted the entire error message, and here are the files I was using:
HOMD-ref-seqs.qza (27.0 KB)
HOMD-taxonomy.qza (16.1 KB)
Any tips on how to troubleshoot would be greatly appreciated!
/home/lisa/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/q2_feature_classifier/classifier.py:101: UserWarning: The TaxonomicClassifier artifact that results from this method was trained using scikit-learn version 0.19.1. It cannot be used with other versions of scikit-learn. (While the classifier may complete successfully, the results will be unreliable.)
Traceback (most recent call last):
File "/home/lisa/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/q2cli/commands.py", line 274, in call
results = action(**arguments)
File "", line 2, in fit_classifier_naive_bayes
File "/home/lisa/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
File "/home/lisa/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/qiime2/sdk/action.py", line 362, in callable_executor
output_views = self._callable(**view_args)
File "/home/lisa/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/q2_feature_classifier/classifier.py", line 316, in generic_fitter
File "/home/lisa/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/q2_feature_classifier/_skl.py", line 31, in fit_pipeline
y, X = list(zip(*data))
ValueError: not enough values to unpack (expected 2, got 0)
In case it is helpful, here is the shell script I ran to generate these files (the commands that ran successfully are commented out with #, and I am getting the error on the last command):
#qiime tools import \
--type 'FeatureData[Sequence]' \
--input-path sample.fasta \
#qiime tools import \
--type 'FeatureData[Taxonomy]' \
--input-format HeaderlessTSVTaxonomyFormat \
--input-path HOMD_16S_rRNA_RefSeq_V15.1.TAXONOMY.txt \
#qiime feature-classifier extract-reads \
--i-sequences HOMD_otus.qza \
--p-f-primer GTGCCAGCMGCCGCGGTAA \
--p-r-primer GGACTACHVGGGTWTCTAAT \
--p-trunc-len 150 \
qiime feature-classifier fit-classifier-naive-bayes
It looks like the issue here may be because your taxonomies are non-hierarchical. I.e., you have a single species label for each sequence, rather than the taxonomic lineage, which this classification method expects.
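As a side note on the message itself: the line y, X = list(zip(*data)) from the traceback raises exactly this ValueError whenever data contains no items. Here is a minimal sketch in plain Python, independent of QIIME 2. The function name unpack_pairs and the example pairs are made up for illustration, and the sketch only shows how the message arises, not why the training data ended up empty in your run.

```python
def unpack_pairs(data):
    """Mirror the failing line from the traceback: split (label, features) pairs."""
    y, X = list(zip(*data))
    return y, X


# With at least one pair, the unpacking works as intended.
labels, features = unpack_pairs([("taxon_a", "ACGT"), ("taxon_b", "TTGA")])

# With no pairs, zip(*data) yields nothing, so there are 0 values to unpack.
try:
    unpack_pairs([])
    message = None
except ValueError as err:
    message = str(err)

print(message)
```

So the error text suggests that nothing usable reached the training step, which fits a mismatch between what the classifier expects and what the reference files contain.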
The classifiers in q2-feature-classifier usually expect a hierarchical taxonomic lineage (semicolon-delimited), because they are designed to determine the correct lineage rather than the top hit in the reference database.
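If you want a quick way to see which entries in a headerless, tab-separated taxonomy file are flat labels rather than lineages, something like the following could work. This is only a sketch: flat_labels is a hypothetical helper, the two example rows are made up (they are not taken from the HOMD file), and it assumes the file has the feature ID in the first column and the taxonomy string in the second.

```python
import csv
import io


def flat_labels(tsv_text, delimiter=";"):
    """Return feature IDs whose taxonomy string contains no rank delimiter."""
    flat = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        feature_id, taxon = row[0], row[1]
        if delimiter not in taxon:
            flat.append(feature_id)
    return flat


# Made-up example: seq1 has a species label only, seq2 has a full lineage.
example = (
    "seq1\tStreptococcus mutans\n"
    "seq2\tBacteria;Firmicutes;Bacilli;Lactobacillales;"
    "Streptococcaceae;Streptococcus;Streptococcus mutans\n"
)
flat = flat_labels(example)
print(flat)
```

To run it on a real file, you could pass the text read from the taxonomy file (for example via open(path).read()) instead of the example string.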
However, you can still get top-hit classification (just be aware that this approach may well return some false positives): use classify-sklearn with the confidence parameter set to -1, or use classify-consensus-blast with the maxaccepts parameter set to 1. Either option will find the top match for each query sequence and should work with your non-hierarchical taxonomy data.
If you do still want to use classify-sklearn to classify the lineage, rather than the top-hit species, you will need to fill out your reference taxonomy to include at least some other taxonomic information (e.g., family, genus).
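As a rough illustration of what "filling out" the taxonomy could look like, the sketch below joins per-rank values into a single semicolon-delimited lineage string. The names RANKS and build_lineage are hypothetical, and the rank values for the example organism are supplied by hand here; in practice you would need a reliable source for the higher ranks of each reference sequence.

```python
# Rank order used to assemble the lineage, from broadest to most specific.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def build_lineage(rank_values, ranks=RANKS):
    """Join the ranks that are present, in order, into one lineage string."""
    return ";".join(rank_values[r] for r in ranks if r in rank_values)


# Hand-written example entry (not taken from the HOMD files).
entry = {
    "kingdom": "Bacteria",
    "phylum": "Firmicutes",
    "class": "Bacilli",
    "order": "Lactobacillales",
    "family": "Streptococcaceae",
    "genus": "Streptococcus",
    "species": "Streptococcus mutans",
}
lineage = build_lineage(entry)
print(lineage)
```

Note that this version silently skips missing ranks, so lineages can end up with different depths; you may prefer to insert a placeholder for unknown ranks so every lineage has the same number of levels.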
Got it. Thanks so much for your help!