Here is an example record from my reference sequence file.
gb|AF004680_2001_Millner,_P.D._etunicatum_no value set
ATTATAAAATTTTTCATATATTAAATTTATTTTTAATATATAAAATTTATATAAAAATGTATTCAAAACCCACACTCTTT
ATAACCATATAAAAAAACAATTATTATATCTTGTATATAATATAAAAAAAACAACTTTCAACAACGGATCTCTTGGCTCT
CGCATCGATGAAGAACGCAGCGAATTGCGATAAGTAATGTGAATTGCAGAATTACGTGAATCATCGAATCTTTGAACGCA
TATTGCACTCTCTGGTAATCCGGGGAGTATGCCTGTTTGAGGGTCAGTAAATAATAATTATCATGATCTTTTTGATTGTG
GAATTGGGCCTTTATTTCATTAACGATTTATGGCCTCAAAATTATTTTACCGCTTGTTTAATATGAAATTCGACCGAATG
GAGCAATTAAACAAATTCTCTCGTTAGGCGGATTCTCATCAAGCAATTACGATTTTTTGGCCGTCAAAGCATTTTTTACG
AGTGCTTGGCTGGGATCGTAAGATTCATTAACAATGACCTCAAATCAGGCAAGAATACCCGCTGAACTTAAGCATATCAA
I would like to use this reference sequence file for analysis with QIIME 2.
However, I don't know how to convert it into a format that QIIME 2 can use.
If the conversion can be done within QIIME 2, please let me know how.
/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/site-packages/q2_feature_classifier/classifier.py:101: UserWarning: The TaxonomicClassifier artifact that results from this method was trained using scikit-learn version 0.19.1. It cannot be used with other versions of scikit-learn. (While the classifier may complete successfully, the results will be unreliable.)
warnings.warn(warning, UserWarning)
Traceback (most recent call last):
File "/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/site-packages/q2cli/commands.py", line 246, in __call__
results = action(**arguments)
File "", line 2, in fit_classifier_naive_bayes
File "/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/sdk/action.py", line 228, in bound_callable
output_types, provenance)
File "/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/sdk/action.py", line 363, in callable_executor
output_views = self._callable(**view_args)
File "/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/site-packages/q2_feature_classifier/classifier.py", line 310, in generic_fitter
pipeline)
File "/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/site-packages/q2_feature_classifier/_skl.py", line 31, in fit_pipeline
y, X = list(zip(*data))
ValueError: not enough values to unpack (expected 2, got 0)
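The sequence labels must match the taxonomy labels exactly. The "expected 2, got 0" error suggests that none of the sequence labels were found in the taxonomy file, so the classifier had nothing to train on.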
For instance, one sequence is labelled “HM162335” in the taxonomy file but “HM162335_2011_Kivlin,_S._N._sp._no value set” in the sequence file. If you change the label in the sequence file to “HM162335”, and shorten every other sequence label to its matching taxonomy label in the same way, it should work.
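If editing every header by hand is impractical, a short script can do the relabelling. The following is only a minimal sketch, not part of QIIME 2: the function names are made up for illustration, and it assumes every header has the form shown in this thread (an optional "gb|" prefix, then the accession, then an underscore and free text) and that the taxonomy file is tab-separated with the label in the first column. Accessions that themselves contain an underscore would need a different rule.

```python
def accession_from_header(header):
    """Return the bare accession from a FASTA header line.

    Assumption: headers look like '>gb|AF004680_2001_...' or
    '>HM162335_2011_...', so the accession is everything between an
    optional 'gb|' prefix and the first underscore.
    """
    label = header.lstrip(">").strip()
    if label.startswith("gb|"):
        label = label[len("gb|"):]
    return label.split("_", 1)[0]


def relabel_fasta(lines):
    """Yield FASTA lines with each header reduced to its accession."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield ">" + accession_from_header(line)
        elif line:
            # Sequence lines are passed through unchanged; blank lines are dropped.
            yield line


def missing_from_taxonomy(fasta_lines, taxonomy_lines):
    """Return the sorted sequence labels that have no row in the taxonomy file."""
    seq_ids = {line[1:].strip() for line in fasta_lines if line.startswith(">")}
    tax_ids = {
        line.split("\t", 1)[0].strip() for line in taxonomy_lines if line.strip()
    }
    return sorted(seq_ids - tax_ids)


# Example use; both file names are hypothetical:
#
#     with open("reference.fasta") as src, open("relabelled.fasta", "w") as dst:
#         for out_line in relabel_fasta(src):
#             dst.write(out_line + "\n")
```

Running `missing_from_taxonomy` on the relabelled sequences and the taxonomy file before training is a quick way to confirm that the labels now match; an empty list means every sequence has a taxonomy entry. The relabelled file then needs to be re-imported as `FeatureData[Sequence]` before the classifier is trained again.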
Please let me know if you run into any more problems.