Plugin Error: feature-classifier classify-sklearn

I tried using feature-classifier classify-sklearn for my tree. I believe the issue is with my fasta file since the example file provided in the tutorial page works.

qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path ~/MS_set6/reference-hit.seqs.fa \
  --output-path ~/MS_set6/refseqs.qza

qiime feature-classifier classify-sklearn \
  --i-classifier ~/MS_set6/gg-13-8-99-515-806-nb-classifier.qza \
  --i-reads ~/MS_set6/refseqs.qza \
  --o-classification ~/MS_set6/Taxonomy/MStaxonomy.qza

The error reads as follows (I have reloaded the .fa file generated from Qiita and restarted my system several times, and I get the same error message each time):

Plugin error from feature-classifier:

Invalid character in sequence: b'g'. Valid characters: ['K', 'R',
'H', 'N', 'S', 'Y', 'V', 'G', 'B', 'A', '-', '.', 'C', 'D', 'T', 'M',
'W'] Note: Use lowercase if your sequence contains lowercase
characters not in the sequence's alphabet.

The log file reads…
/anaconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/sklearn/feature_extraction/hashing.py:94: DeprecationWarning: the option non_negative=True has been deprecated in 0.19 and will be removed in version 0.21.
" in version 0.21.", DeprecationWarning)
/anaconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/sklearn/feature_extraction/hashing.py:94: DeprecationWarning: the option non_negative=True has been deprecated in 0.19 and will be removed in version 0.21.
" in version 0.21.", DeprecationWarning)
Traceback (most recent call last):
  File "/anaconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/q2cli/commands.py", line 218, in __call__
    results = action(**arguments)
  File "", line 2, in classify_sklearn
  File "/anaconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/qiime2/sdk/action.py", line 201, in callable_wrapper
    output_types, provenance)
  File "/anaconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/qiime2/sdk/action.py", line 334, in callable_executor
    output_views = callable(**view_args)
  File "/anaconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/q2_feature_classifier/classifier.py", line 184, in classify_sklearn
    confidence=confidence)
  File "/anaconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/q2_feature_classifier/_skl.py", line 45, in predict
    for chunk in _chunks(reads, chunk_size)) for m in c)
  File "/anaconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 779, in __call__
    while self.dispatch_one_batch(iterator):
  File "/anaconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 620, in dispatch_one_batch
    tasks = BatchedCalls(itertools.islice(iterator, batch_size))
  File "/anaconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 127, in __init__
    self.items = list(iterator_slice)
  File "/anaconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/q2_feature_classifier/_skl.py", line 44, in <genexpr>
    (delayed(_predict_chunk)(pipeline, separator, confidence, chunk)
  File "/anaconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/q2_feature_classifier/_skl.py", line 97, in _chunks
    chunk = list(islice(reads, chunk_size))
  File "/anaconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/q2_types/feature_data/_transformer.py", line 228, in __iter__
    yield from self.generator
  File "/anaconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/skbio/io/registry.py", line 506, in <genexpr>
    return (x for x in itertools.chain([next(gen)], gen))
  File "/anaconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/skbio/io/registry.py", line 531, in _read_gen
    yield from reader(file, **kwargs)
  File "/anaconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/skbio/io/registry.py", line 1008, in wrapped_reader
    yield from reader_function(fhs[-1], **kwargs)
  File "/anaconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/skbio/io/format/fasta.py", line 677, in _fasta_to_generator
    **kwargs)
  File "/anaconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/skbio/sequence/_grammared_sequence.py", line 338, in __init__
    self._validate()
  File "/anaconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/skbio/sequence/_grammared_sequence.py", line 362, in _validate
    list(self.alphabet)))
ValueError: Invalid character in sequence: b'g'.
Valid characters: ['K', 'R', 'H', 'N', 'S', 'Y', 'V', 'G', 'B', 'A', '-', '.', 'C', 'D', 'T', 'M', 'W']
Note: Use lowercase if your sequence contains lowercase characters not in the sequence's alphabet.

I have looked at the file in TextEdit and do not see a lowercase g when scanning it by eye. The Find function is not helpful, since it also matches every capital G in the sequences.

TIA

Hi @callaband,
You are correct: this is almost certainly an issue with your fasta file. Before importing into QIIME 2, run the following command to convert lowercase to uppercase in the fasta file:

tr 'acgt' 'ACGT' < ~/MS_set6/reference-hit.seqs.fa > ~/MS_set6/reference-hit.seqs.uppercase.fa 

Then proceed with the import and classification commands that you wrote above, using the new uppercase file as the input path. Note that this command converts every lowercase a, c, g, and t in the file, including any in the sequence IDs, so please check first that the IDs do not contain those characters (e.g., that they are numerical). If they do, write back and we can do something a little more sophisticated.
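If the IDs do contain lowercase letters, one header-aware option is to uppercase only the sequence lines. The following is a minimal sketch, assuming a plain-text FASTA file in which every header line starts with ">"; the function name fasta_to_upper is made up for illustration and is not part of QIIME 2:

```shell
# Uppercase only the sequence lines of a FASTA file.
# Header lines (those starting with ">") are printed unchanged.
# fasta_to_upper is a hypothetical helper name, not a QIIME 2 command.
fasta_to_upper() {
    awk '/^>/ { print; next } { print toupper($0) }' "$1"
}

# Example usage with the paths from this thread:
# fasta_to_upper ~/MS_set6/reference-hit.seqs.fa > ~/MS_set6/reference-hit.seqs.uppercase.fa
```

Unlike the tr command, this also uppercases lowercase ambiguity codes such as n or r in the sequences, which the validator would otherwise reject as well.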

We have an open issue related to this error, so we will provide better support for this in the future, e.g., by detecting lowercase characters upon import.
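Until then, a case-sensitive search from the command line can confirm whether a file contains lowercase characters in its sequences. This is a rough sketch, assuming a plain-text FASTA file with headers starting with ">"; the function name find_lowercase_seq_lines is hypothetical:

```shell
# Print "line_number:content" for every sequence line that contains
# at least one lowercase letter. Header lines (starting with ">") are skipped.
# find_lowercase_seq_lines is a hypothetical helper name.
find_lowercase_seq_lines() {
    grep -n -v '^>' "$1" | grep '[[:lower:]]'
}

# Example usage with the path from this thread:
# find_lowercase_seq_lines ~/MS_set6/reference-hit.seqs.fa | head
```

No output means that no sequence line contains a lowercase letter.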

I hope that solves your issue! Please let us know if you continue to have the same problem after converting to uppercase.
