How to convert a reference file into a form that can be used with QIIME 2

Hello,

I got a reference sequence file from MaarjAM.

One example record from the reference sequence file is as follows.

gb|AF004680_2001_Millner,_P.D._etunicatum_no value set
ATTATAAAATTTTTCATATATTAAATTTATTTTTAATATATAAAATTTATATAAAAATGTATTCAAAACCCACACTCTTT
ATAACCATATAAAAAAACAATTATTATATCTTGTATATAATATAAAAAAAACAACTTTCAACAACGGATCTCTTGGCTCT
CGCATCGATGAAGAACGCAGCGAATTGCGATAAGTAATGTGAATTGCAGAATTACGTGAATCATCGAATCTTTGAACGCA
TATTGCACTCTCTGGTAATCCGGGGAGTATGCCTGTTTGAGGGTCAGTAAATAATAATTATCATGATCTTTTTGATTGTG
GAATTGGGCCTTTATTTCATTAACGATTTATGGCCTCAAAATTATTTTACCGCTTGTTTAATATGAAATTCGACCGAATG
GAGCAATTAAACAAATTCTCTCGTTAGGCGGATTCTCATCAAGCAATTACGATTTTTTGGCCGTCAAAGCATTTTTTACG
AGTGCTTGGCTGGGATCGTAAGATTCATTAACAATGACCTCAAATCAGGCAAGAATACCCGCTGAACTTAAGCATATCAA

I would like to use this reference sequence file in an analysis with QIIME 2.

But I don't know how to convert this file into a form that QIIME 2 can use.

If the conversion can be done within QIIME 2, please let me know how to do it.

Thanks
Kohei

Hello Kohei,

You can import your data into QIIME 2 in FASTA format.
For your example, the record should be converted to the form shown below.

The description line of each sequence must begin with the character “>”, and
the line that follows holds the sequence data (A, C, G, T).

>gb|AF004680_2001_Millner,_P.D._etunicatum_no_value_set
ATTATAAAATTTTTCATATATTAAATTTATTTTTAATATATAAAATTTATATAAAAATGTATTCAAAACCCACACTCTTTATAACCATATAAAAAAACAATTATTATATCTTGTATATAATATAAAAAAAACAACTTTCAACAACGGATCTCTTGGCTCTCGCATCGATGAAGAACGCAGCGAATTGCGATAAGTAATGTGAATTGCAGAATTACGTGAATCATCGAATCTTTGAACGCATATTGCACTCTCTGGTAATCCGGGGAGTATGCCTGTTTGAGGGTCAGTAAATAATAATTATCATGATCTTTTTGATTGTGGAATTGGGCCTTTATTTCATTAACGATTTATGGCCTCAAAATTATTTTACCGCTTGTTTAATATGAAATTCGACCGAATGGAGCAATTAAACAAATTCTCTCGTTAGGCGGATTCTCATCAAGCAATTACGATTTTTTGGCCGTCAAAGCATTTTTTACGAGTGCTTGGCTGGGATCGTAAGATTCATTAACAATGACCTCAAATCAGGCAAGAATACCCGCTGAACTTAAGCATATCAA
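
If the file has many records, the same conversion could be scripted. The following is only a minimal sketch in Python, not an official QIIME 2 tool. The function name `to_fasta` is hypothetical, and it assumes that any line made up solely of IUPAC nucleotide letters is sequence data while every other non-empty line is a description line. It adds the leading “>”, replaces spaces in the description with underscores as in the record above, and joins wrapped sequence lines into one line.

```python
import re

# Assumption: a line consisting only of IUPAC nucleotide codes is sequence data;
# any other non-empty line is treated as a description line.
SEQ_LINE = re.compile(r"^[ACGTURYSWKMBDHVN]+$", re.IGNORECASE)


def to_fasta(lines):
    """Convert description/sequence lines into FASTA text (hypothetical helper)."""
    records = []
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines between records
        if SEQ_LINE.match(line):
            chunks.append(line.upper())  # wrapped sequence lines are joined later
        else:
            if header is not None:
                records.append((header, "".join(chunks)))
            # Drop an existing ">" if present and replace spaces with underscores.
            header = line.lstrip(">").replace(" ", "_")
            chunks = []
    if header is not None:
        records.append((header, "".join(chunks)))
    return "".join(f">{h}\n{s}\n" for h, s in records)


if __name__ == "__main__":
    example = [
        "gb|AF004680_2001_Millner,_P.D._etunicatum_no value set",
        "ATTATAAAATTTTTCATATATTAAATTTATTTTTAATATATAAAATTTATATAAAAATG",
        "TATTCAAAACCCACACTCTTT",
    ]
    print(to_fasta(example), end="")
```

A description line that happens to contain only nucleotide letters would be misread as sequence data, so the output is worth checking by eye before importing it.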

Best regards,
Ohmiyajohn

Thank you for your reply!

I succeeded in converting the sequence file.

I then ran 'qiime feature-classifier' using the following files.

ref-seqs.qza (8.9 KB)
ref-taxonomy.qza (7.3 KB)

But I got an error.

/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/site-packages/q2_feature_classifier/classifier.py:101: UserWarning: The TaxonomicClassifier artifact that results from this method was trained using scikit-learn version 0.19.1. It cannot be used with other versions of scikit-learn. (While the classifier may complete successfully, the results will be unreliable.)
  warnings.warn(warning, UserWarning)
Traceback (most recent call last):
  File "/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/site-packages/q2cli/commands.py", line 246, in call
    results = action(**arguments)
  File "", line 2, in fit_classifier_naive_bayes
  File "/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/sdk/action.py", line 228, in bound_callable
    output_types, provenance)
  File "/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/site-packages/qiime2/sdk/action.py", line 363, in callable_executor
    output_views = self._callable(**view_args)
  File "/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/site-packages/q2_feature_classifier/classifier.py", line 310, in generic_fitter
    pipeline)
  File "/home/qiime2/miniconda/envs/qiime2-2018.2/lib/python3.5/site-packages/q2_feature_classifier/_skl.py", line 31, in fit_pipeline
    y, X = list(zip(*data))
ValueError: not enough values to unpack (expected 2, got 0)

How can I solve it?

Thanks
Kohei

Hi Kohei,

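The sequence labels must match the taxonomy labels exactly. If none of them match, the classifier is left with no training data, which is most likely why the error reports “expected 2, got 0”.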

For instance, you have a sequence labelled with “HM162335” in the taxonomy file but as “HM162335_2011_Kivlin,_S._N._sp._no value set” in the sequence file. If you change the label in the sequence file to “HM162335”, and do the same for every other sequence (with the appropriate label), it should work.
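
Relabelling every sequence by hand is tedious, so here is a rough Python sketch of how it could be done. It is an illustration under assumptions rather than a tested recipe for MaarjAM files. The helper names `accession` and `relabel_fasta` are hypothetical, and the code assumes the accession is the text after an optional “gb|” prefix and before the first underscore, which holds for labels like the one above but would fail for accessions that contain an underscore themselves. It also reports any accession that has no entry in the taxonomy, since those would still cause a mismatch.

```python
def accession(header):
    """Extract the accession from a description line (hypothetical helper).

    Assumption: the accession is the part after an optional "gb|" prefix
    and before the first underscore, e.g. "HM162335".
    """
    label = header.lstrip(">").strip()
    label = label.split("|")[-1]      # drop a "gb|" style prefix if present
    return label.split("_", 1)[0]     # keep the text before the first underscore


def relabel_fasta(fasta_text, taxonomy_ids):
    """Shorten every FASTA label to its accession.

    Returns the relabelled FASTA text and a list of accessions that are
    not present in taxonomy_ids.
    """
    out, missing = [], []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            acc = accession(line)
            if acc not in taxonomy_ids:
                missing.append(acc)
            out.append(">" + acc)
        elif line.strip():
            out.append(line.strip())
    return "\n".join(out) + "\n", missing


if __name__ == "__main__":
    fasta = ">gb|HM162335_2011_Kivlin,_S._N._sp._no value set\nACGT\n"
    new_fasta, missing = relabel_fasta(fasta, {"HM162335"})
    print(new_fasta, end="")
    print("labels without taxonomy:", missing)
```

The set of taxonomy labels could be collected from the first column of the tab-separated taxonomy file that was used to create ref-taxonomy.qza; the relabelled FASTA file then has to be imported again before training the classifier.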

Please let me know if you run into any more problems.

Thanks,
Ben
