Import custom taxonomy

jose_gacia · December 16, 2020, 12:36pm

qiime2-2020.8

Hi everyone. Because my 18S analysis with SILVA gave poor taxonomic resolution, I ran a manual BLAST search of the most frequent OTUs in my samples against the NCBI database. From the results I built a custom taxonomy in Excel and saved it as a tab-delimited file, which looks like this:

1a399ae40274a80231de62c4ce7494e6        Unidentified
eead39ba0c5b281c44a4ab13cdce761a        Unidentified
96e9b265b30a6db7d50ab05636e07165        Unidentified
36aac0a0dbbef54db69f47a157ec4f07        Eukaryota;Fungi;Dikarya;Ascomycota;Pezizomycotina;Dothideomycetes;Dothideomycetidae;Cladosporiales;Cladosporiaceae
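One way to avoid spreadsheet encoding problems altogether is to write the file from a script. The Python sketch below writes feature-ID/taxonomy pairs as a two-column, tab-separated, headerless UTF-8 file, which is the layout HeaderlessTSVTaxonomyFormat expects as far as I know. The function name write_headerless_taxonomy is hypothetical (not part of QIIME 2), and the two records are copied from the example above.

```python
import csv

# Two records copied from the example above; a real script would build this
# list from the BLAST results instead of typing it by hand.
records = [
    ("1a399ae40274a80231de62c4ce7494e6", "Unidentified"),
    ("36aac0a0dbbef54db69f47a157ec4f07",
     "Eukaryota;Fungi;Dikarya;Ascomycota;Pezizomycotina;Dothideomycetes;"
     "Dothideomycetidae;Cladosporiales;Cladosporiaceae"),
]


def write_headerless_taxonomy(path, rows):
    """Write (feature ID, taxonomy) pairs as a headerless, tab-separated UTF-8 file.

    Hypothetical helper for illustration. str.strip() also removes non-breaking
    spaces at the ends of each field, since Python treats them as whitespace.
    """
    with open(path, "w", encoding="utf-8", newline="") as fh:
        # lineterminator="\n" gives Unix line endings on every platform.
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        for feature_id, taxon in rows:
            writer.writerow([feature_id.strip(), taxon.strip()])


if __name__ == "__main__":
    write_headerless_taxonomy("blast_taxonomy_18s.txt", records)
```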

I then tried to import it as follows:

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path blast_taxonomy_18s.txt \
  --output-path ref-taxonomy.qza

But I get this error:

Traceback (most recent call last):
  File "/home/superuser/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/q2cli/builtin/tools.py", line 158, in import_data
    view_type=input_format)
  File "/home/superuser/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/qiime2/sdk/result.py", line 241, in import_data
    validate_level='max')
  File "/home/superuser/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/qiime2/sdk/result.py", line 267, in _from_view
    result = transformation(view, validate_level)
  File "/home/superuser/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/qiime2/core/transform.py", line 68, in transformation
    self.validate(view, validate_level)
  File "/home/superuser/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/qiime2/core/transform.py", line 143, in validate
    view.validate(level)
  File "/home/superuser/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/qiime2/plugin/model/file_format.py", line 33, in validate
    if not self.sniff():
  File "/home/superuser/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/q2_types/feature_data/_format.py", line 48, in sniff
    line = fh.readline()
  File "/home/superuser/miniconda3/envs/qiime2-2020.8/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
  File "/home/superuser/miniconda3/envs/qiime2-2020.8/lib/python3.6/encodings/utf_8_sig.py", line 69, in _buffer_decode
    return codecs.utf_8_decode(input, errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 5353: invalid start byte

An unexpected error has occurred:

  'utf-8' codec can't decode byte 0xa0 in position 5353: invalid start byte

See above for debug info.

As I understand it, there seems to be a non-ASCII character in the file. Does anybody have a suggestion on how to save my taxonomy so that I can import it?

SoilRotifer · December 16, 2020, 4:51pm

Hi @jose_gacia,

What command did you run when trying to classify your reads with SILVA? When we see many poor taxonomy assignments like this, the cause is typically mixed read orientation. I'd recommend using vsearch as outlined here:

Alternatively, install the RESCRIPt plugin and run rescript orient-seqs on your reads. Then retry the BLAST and/or naive Bayes classifiers.
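To illustrate why orientation matters: a read sequenced from the opposite strand shares few or no k-mers with the reference until it is reverse-complemented. The toy Python sketch below keeps whichever orientation shares more k-mers with a set of reference k-mers (k=8 is an arbitrary choice). It only illustrates the idea; it is not the algorithm used by vsearch or rescript orient-seqs, and the function names are hypothetical.

```python
# IUPAC complement table, so degenerate bases are complemented correctly too.
COMPLEMENT = str.maketrans("ACGTRYKMBDHVN", "TGCAYRMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (uppercased first)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of all overlapping k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient(read, reference_kmers, k=8):
    """Return the read in the orientation sharing more k-mers with the reference.

    Ties with at least one shared k-mer keep the forward orientation; if neither
    orientation shares any k-mer, return None (the read cannot be oriented).
    """
    forward = read.upper()
    reverse = reverse_complement(read)
    fwd_hits = len(kmers(forward, k) & reference_kmers)
    rev_hits = len(kmers(reverse, k) & reference_kmers)
    if fwd_hits == 0 and rev_hits == 0:
        return None
    return forward if fwd_hits >= rev_hits else reverse
```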

-Mike

Nicholas_Bokulich · December 16, 2020, 5:46pm

Hi @jose_gacia,
To add to @SoilRotifer's advice regarding improving taxonomy classification:

Excel is most likely the culprit: it inserts special characters into the file that violate the format requirements. You might be able to use mac2unix or a similar tool to remove these special characters from the file automatically... but avoiding Excel altogether, if possible, is the best way to prevent these issues.
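If the file has already been exported from Excel, a short Python script can re-encode it. The sketch below assumes the export used the Windows-1252 (cp1252) code page, in which byte 0xA0 is a non-breaking space. It falls back to cp1252 only when the file is not valid UTF-8, replaces non-breaking spaces and byte-order marks, normalizes Windows and classic Mac line endings to Unix ones, and trims whitespace around each tab-separated field. The function name clean_excel_tsv and the replacement table are hypothetical choices for this example.

```python
from pathlib import Path

# Characters Excel commonly leaves behind; mapping them this way is an assumption
# about what a clean taxonomy file should contain.
REPLACEMENTS = {
    "\u00a0": " ",  # non-breaking space (byte 0xA0 in cp1252 / Latin-1)
    "\ufeff": "",   # byte-order mark
}


def clean_excel_tsv(src, dst, source_encoding="cp1252"):
    """Rewrite a tab-separated text file as clean UTF-8 with Unix line endings.

    source_encoding is an assumption: Excel's tab-delimited export on Windows
    usually uses the system code page, often cp1252. Decoding still raises
    UnicodeDecodeError for bytes that are undefined in that code page.
    """
    raw = Path(src).read_bytes()
    try:
        text = raw.decode("utf-8-sig")  # already valid UTF-8 (BOM dropped)
    except UnicodeDecodeError:
        text = raw.decode(source_encoding)
    for bad, good in REPLACEMENTS.items():
        text = text.replace(bad, good)
    # Windows (\r\n) and classic Mac (\r) line endings become \n.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    lines = ["\t".join(field.strip() for field in line.split("\t"))
             for line in text.split("\n")]
    # write_bytes avoids newline translation, so the output keeps \n everywhere.
    Path(dst).write_bytes("\n".join(lines).encode("utf-8"))
```

The cleaned file (for example, written by clean_excel_tsv("blast_taxonomy_18s.txt", "blast_taxonomy_18s.clean.txt")) can then be passed to --input-path instead of the original.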

Good luck!

jose_gacia · December 17, 2020, 11:30am

Hi @SoilRotifer

Thanks for your response; this actually did the job. Do you recommend extracting from SILVA the specific region covered by my amplicon primers?

Also, I wish I had gotten this kind of answer when I asked about ways to improve my taxonomic resolution (How to enrich my 18S extemophile metagenome - #7 by cherman2).

jose_gacia · December 17, 2020, 11:32am

Hi @Nicholas_Bokulich

Thanks for your response. I managed to find all the non-ASCII characters and remove them, and I have more or less been able to import my custom taxonomy.
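For anyone hitting the same error, here is a minimal Python sketch that lists every byte above 0x7F with its line number, column and absolute byte offset, so the offending characters can be located before importing. find_non_ascii is a hypothetical helper, not a QIIME 2 tool. It flags every non-ASCII byte, including bytes of valid UTF-8 characters, so review the hits rather than deleting them blindly. The "position" in the UnicodeDecodeError is counted within the chunk the decoder was processing, so it may not always equal the absolute offset reported here.

```python
def find_non_ascii(path):
    """Return (line, column, byte_offset, hex) for every byte above 0x7F.

    Lines and columns are 1-based and counted in bytes; byte_offset is the
    0-based position of the byte from the start of the file.
    """
    hits = []
    offset = 0  # absolute offset of the first byte of the current line
    with open(path, "rb") as fh:
        for line_no, line in enumerate(fh, start=1):
            # Iterating over a bytes object yields integers.
            for col, byte in enumerate(line, start=1):
                if byte > 0x7F:
                    hits.append((line_no, col, offset + col - 1, f"0x{byte:02x}"))
            offset += len(line)
    return hits


if __name__ == "__main__":
    import sys

    for line_no, col, pos, value in find_non_ascii(sys.argv[1]):
        print(f"line {line_no}, column {col}, byte offset {pos}: {value}")
```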

SoilRotifer · December 17, 2020, 2:48pm

If these are indeed amplicon sequences, then yes, you can try to improve your taxonomy assignments by extracting the amplicon region. But I'd suggest running against the full-length SILVA database first, to make sure you do obtain better taxonomy. If so, it may be worth your time to extract the amplicon region.
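As a rough illustration of what "extracting the amplicon region" means, the Python sketch below finds the forward primer and the reverse complement of the reverse primer in a reference sequence, expanding IUPAC degenerate bases into regular-expression character classes, and returns the region between them. It uses exact (degenerate-aware) matching only, with no mismatches allowed; within QIIME 2 the usual route for this is qiime feature-classifier extract-reads. The function names and the primer sequences in the usage example are hypothetical.

```python
import re

# IUPAC codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def primer_regex(primer):
    """Turn a (possibly degenerate) primer into a regex pattern."""
    return "".join(IUPAC[base] for base in primer.upper())


def extract_amplicon(seq, f_primer, r_primer):
    """Return the region between the primers (primers excluded), or None.

    r_primer is given 5'->3' as ordered, so its reverse complement is what
    appears on the reference strand downstream of the forward primer.
    """
    seq = seq.upper()
    r_rc = r_primer.upper().translate(COMPLEMENT)[::-1]
    # Lazy (.*?) picks the first reverse-primer site after the forward primer.
    pattern = primer_regex(f_primer) + "(.*?)" + primer_regex(r_rc)
    match = re.search(pattern, seq)
    return match.group(1) if match else None
```

For example, extract_amplicon("TTTTACGTGCAAACCCGGGGCATTTTTTT", "ACRTGC", "AAWTGC") returns "AAACCCGGG".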

-Mike

jose_gacia · December 17, 2020, 4:14pm

I got way better results with the full-length SILVA database. I'll try again by extracting my reads. Thanks again!!

system · January 17, 2021, 10:14pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.