Hi, all,
I'm getting a strange error when using the naive Bayes classifier on the eukaryote UNITE database, and I can't figure out a workaround. There doesn't seem to be anything like it on the forums, either. I'm running the QIIME 2 container, version 2019.7 (I know, it's old...), on Docker and have had no trouble when using the classifier on the fungal-only database. Furthermore, when I compare the two databases (FASTA and taxonomy files) to one another, they look identical in format.
The import commands for the sequences and taxonomy files produce the correct .qza files. The issue is with this command:
qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads //c/Sequencing/Spike_R_2/UNITE_train_set_oom.qza --i-reference-taxonomy //c/Sequencing/Spike_R_2/UNITE_taxa_oom.qza --o-classifier //c/Sequencing/Spike_R_2/UNITE_classifier_BW_oom.qza --verbose
This results in:
/opt/conda/envs/qiime2-2019.7/lib/python3.6/site-packages/skbio/io/registry.py:548: FormatIdentificationWarning: <_io.BufferedReader name='/tmp/qiime2-archive-75mbqgx5/b9c2c328-b05f-47cd-b763-b13169c1657d/data/dna-sequences.fasta'> does not look like a fasta file
% (file, fmt), FormatIdentificationWarning)
Traceback (most recent call last):
File "/opt/conda/envs/qiime2-2019.7/lib/python3.6/site-packages/q2cli/commands.py", line 327, in __call__
results = action(**arguments)
File "</opt/conda/envs/qiime2-2019.7/lib/python3.6/site-packages/decorator.py:decorator-gen-349>", line 2, in fit_classifier_naive_bayes
File "/opt/conda/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/sdk/action.py", line 229, in bound_callable
spec.view_type, recorder)
File "/opt/conda/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/sdk/result.py", line 289, in _view
result = transformation(self._archiver.data_dir)
File "/opt/conda/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/core/transform.py", line 70, in transformation
new_view = transformer(view)
File "/opt/conda/envs/qiime2-2019.7/lib/python3.6/site-packages/qiime2/core/transform.py", line 213, in wrapped
return transformer(view.file.view(self._wrapped_view_type))
File "/opt/conda/envs/qiime2-2019.7/lib/python3.6/site-packages/q2_types/feature_data/_transformer.py", line 264, in _9
generator = _read_dna_fasta(str(ff))
File "/opt/conda/envs/qiime2-2019.7/lib/python3.6/site-packages/q2_types/feature_data/_transformer.py", line 240, in _read_dna_fasta
return skbio.read(path, format='fasta', constructor=skbio.DNA)
File "/opt/conda/envs/qiime2-2019.7/lib/python3.6/site-packages/skbio/io/registry.py", line 1161, in read
**kwargs)
File "/opt/conda/envs/qiime2-2019.7/lib/python3.6/site-packages/skbio/io/registry.py", line 506, in read
return (x for x in itertools.chain([next(gen)], gen))
File "/opt/conda/envs/qiime2-2019.7/lib/python3.6/site-packages/skbio/io/registry.py", line 531, in _read_gen
yield from reader(file, **kwargs)
File "/opt/conda/envs/qiime2-2019.7/lib/python3.6/site-packages/skbio/io/registry.py", line 1008, in wrapped_reader
yield from reader_function(fhs[-1], **kwargs)
File "/opt/conda/envs/qiime2-2019.7/lib/python3.6/site-packages/skbio/io/format/fasta.py", line 675, in _fasta_to_generator
FASTAFormatError):
File "/opt/conda/envs/qiime2-2019.7/lib/python3.6/site-packages/skbio/io/format/fasta.py", line 849, in _parse_fasta_raw
"\n%s" % seq_header)
skbio.io._exception.FASTAFormatError: Found non-header line when attempting to read the 1st record:
>SH1140862.08FU_HM100661_reps_singleton
Plugin error from feature-classifier:
Found non-header line when attempting to read the 1st record:
>SH1140862.08FU_HM100661_reps_singleton
The "does not look like a fasta file" warning is frustrating, because the file looks exactly like the other FASTA file that works. Additionally, my first record is:
>SH1140862.08FU_HM100661_reps_singleton
CTGAGCTGTCGACACGAGCTGTTGCTGGTCCTCAAACAAGGGGGCATGTGCACGCTCTGTTCACACATCTACTCACAGGTGCACCGTCTGTAGTTTTATGGTCTGGGGGACACACCGTCTTCCTCCCGTGGCTCTACGTCTTTACACACACATCGTAGTTAAGTTTTATGGAATGTGCATCGCTTTTAACGTAATACAATACAACTTTCAGCAACGGATCTCTTGGCTCTCGCATCGATGAAGAACGCAGCGAAATGCGATAAGTTATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACCTTGCGCCCCTTGGCTATTCCGAGGGGCATGCCTGTTTGAGTATCATGAACACCTCAACTCCTCATGTTTCCCGTGATGAGCTTGGACTTCTGGAGGTTTTGCTTACCTGCGGTCTCTCCTCTCAAACGCATCAGCTTGCCAGTGTTTGGTGGCATCACTGGTGAGATAACTATCTATGCTCGTGGCCGTCTGCCAGATAACCTTCAGCGATGGAGGTTTGCTTGAGCTCACAAAGGTCTTTCCACAGCCAAGACTGCTTTTTTAACTTTCGATCTCAAATCCCGTAGGACACCCGCTGAACCGTAGCTGACTAGCGCGCCTAA
This is similar to the first record of the FASTA file that works, down to the same delimiter:
>SH1140860.08FU_HF674537_reps_singleton
CATTACCGAATTGTCGACACGAGTTGTTGCTGGTCCCCAAACGGGGGCACGTGCACGCTCTGTTTGTACATCCATTCACACCTGTGCACCCCATGTAGTTCTGTGGTTTGGGGGACTCTGTCCTCTCGCCGTGGTTCTATATCTTTACACACGCTCTGTAATAAAGTCTCATGGAATGTATGCAGCGTTTAACGCAATACAATACAACTTTCAGCAACGGATCTCTTGGCTCTCGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACCTTGCGCCCTTTGGCTATTCCGAAGGGCATGCCTGTTTGAGTATCATGAACACCTCAACTCTCATGGTTCGCCGTGATGAGCTTGGACTTTGGGGGTCTTGCTGGCCTGCGGTCGGCTCCCTTCAAATGAATCAGCTTTCCAGTGTTTGGTGGCATCACGGGTGTGATAAATATCTACGCTTGTGGTTTCCGGAGGATCATTTCCGAATTGGTGGCACGAAGTGGTGGTTGGTCCCAAACGGGGGCAAGTGCCCGGTTTGGTTGTACCATCCATTACCCCTTGGCACCCCNAGGAGGTTTGGGGGTTGGGGGGATTCGTTCTTTTGCCGGGGTTTTATATTTTTACCCCCGGTTTGTAATAAAATTTCCAGGAAAGGAAGCAGGGTTTAAAGCCATTCCATTCCAATTTTAGCAAAGGATTTTTTGGGTTTTGGCTTGGAGAAGGAAGCAAGGAAAATGGGTAAGTAAAGGGAAATGCCGAAATCAAGGAATTCTTGGATTTTTGAACGCCCCCTGGGCCCCTTGGGTTTTTCGAAGGGCCAGCCTGTTTGAGGATTCAGAACCCCTTAAATTTCCAGGTTTGCCGGGGGGAGGCTGGGACTTGGGGGTTCTGGTGGCCTGCGGTTGGCTCCCTTCAAAAGAATTCACTTTCCCAGGTTTGGGGGCCTCCCGGGGGGGAAAAAAATTNACGGCTGGGGGTTTCCGCCAGGTAACCTTCAGTGATGGAGGTTCGCTGGGGCTCATAAATGTCTCTCCTCAGCGAAGACAG
Does anyone have thoughts on how I can fix this? I'm going to keep experimenting with the FASTA file, since I had to reformat it from the version downloaded from UNITE.
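Since the two files look the same on screen, any difference is presumably in bytes that don't display. Here is a minimal sketch of how I could check that in Python. It assumes the problem is something invisible before the first ">" (a byte-order mark, stray whitespace) or Windows line endings, which is only a guess and not something the traceback confirms. The helper name inspect_fasta_start and the file name in the usage comment are hypothetical, not part of QIIME 2 or scikit-bio.

```python
import codecs


def inspect_fasta_start(path, n=16):
    """Describe the raw bytes at the start of a FASTA file.

    Hypothetical helper for illustration; not part of QIIME 2 or scikit-bio.
    """
    # Read in binary mode so nothing is decoded or normalised on the way in.
    with open(path, "rb") as fh:
        first_line = fh.readline()
    head = first_line[:n]
    return {
        # The literal bytes, so hidden characters show up as escapes.
        "first_bytes": head,
        # A FASTA header must begin with '>' as the very first byte.
        "starts_with_gt": head.startswith(b">"),
        # Byte-order marks that some editors prepend when saving.
        "utf8_bom": head.startswith(codecs.BOM_UTF8),
        "utf16_bom": head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)),
        # A space, tab, or blank line before the header.
        "leading_whitespace": head[:1].isspace(),
        # Windows-style line endings on the header line.
        "crlf_line_ending": first_line.endswith(b"\r\n"),
    }


# Usage (assumed file name; run on both the failing and the working file):
# print(inspect_fasta_start("dna-sequences.fasta"))
```

Running this on both the failing and the working FASTA file and comparing the two dictionaries should show whether they really start with the same bytes.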
Cheers,