Hello!
I have been trying for days to trim the rogue sequences out of the ITS database, but I always end up with an error. Apparently there is no QIIME plugin for filtering taxonomy tables (FeatureData[Taxonomy]) by taxon, the way qiime taxa filter-seqs filters sequences based on their taxonomic assignment.
That’s why I tried to trim the unidentified taxa out of the reference FASTA file and the reference taxonomy file using R. That worked quite well, and I was able to import the trimmed sequences as a QIIME artifact. When I try to import the taxonomy file, however, I get an error. I am fairly sure something went wrong during file conversion (import into or export from R), as the error mentions a UTF-8 decoding problem:
File "/home/kschaefe/.conda/envs/qiime2-2019.1/lib/python3.6/site-packages/q2cli/tools.py", line 146, in import_data
view_type=input_format)
File "/home/kschaefe/.conda/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/sdk/result.py", line 240, in import_data
return cls.from_view(type, view, view_type, provenance_capture)
File "/home/kschaefe/.conda/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/sdk/result.py", line 265, in _from_view
result = transformation(view)
File "/home/kschaefe/.conda/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/core/transform.py", line 70, in transformation
new_view = transformer(view)
File "/home/kschaefe/.conda/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/core/transform.py", line 220, in wrapped
file_view = transformer(view)
File "/home/kschaefe/.conda/envs/qiime2-2019.1/lib/python3.6/site-packages/q2_types/feature_data/_transformer.py", line 177, in _20
_taxonomy_formats_to_dataframe(str(ff), has_header=False))
File "/home/kschaefe/.conda/envs/qiime2-2019.1/lib/python3.6/site-packages/q2_types/feature_data/_transformer.py", line 51, in _taxonomy_formats_to_dataframe
header=None, dtype=object)
File "/home/kschaefe/.conda/envs/qiime2-2019.1/lib/python3.6/site-packages/pandas/io/parsers.py", line 678, in parser_f
return _read(filepath_or_buffer, kwds)
File "/home/kschaefe/.conda/envs/qiime2-2019.1/lib/python3.6/site-packages/pandas/io/parsers.py", line 446, in _read
data = parser.read(nrows)
File "/home/kschaefe/.conda/envs/qiime2-2019.1/lib/python3.6/site-packages/pandas/io/parsers.py", line 1036, in read
ret = self._engine.read(nrows)
File "/home/kschaefe/.conda/envs/qiime2-2019.1/lib/python3.6/site-packages/pandas/io/parsers.py", line 1848, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 876, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 891, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 968, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 1094, in pandas._libs.parsers.TextReader._convert_column_data
File "pandas/_libs/parsers.pyx", line 1119, in pandas._libs.parsers.TextReader._convert_tokens
File "pandas/_libs/parsers.pyx", line 1240, in pandas._libs.parsers.TextReader._convert_with_dtype
File "pandas/_libs/parsers.pyx", line 1256, in pandas._libs.parsers.TextReader._string_convert
File "pandas/_libs/parsers.pyx", line 1494, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf9 in position 109: invalid start byte
An unexpected error has occurred:
'utf-8' codec can't decode byte 0xf9 in position 109: invalid start byte
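In case it helps narrow this down, here is a small Python sketch for finding which lines of the taxonomy file contain bytes that are not valid UTF-8. The function names and file names are hypothetical. The re-encoding helper assumes the file was written as Latin-1, which I have not confirmed (in Latin-1, byte 0xf9 would be "ù").

```python
from pathlib import Path


def find_non_utf8_lines(path):
    """Return (line_number, byte_offset_in_line, raw_bytes) for every line
    that cannot be decoded as UTF-8."""
    bad = []
    with open(path, "rb") as handle:  # read raw bytes, no decoding yet
        for line_number, raw in enumerate(handle, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as err:
                bad.append((line_number, err.start, raw))
    return bad


def reencode_to_utf8(src, dst, source_encoding="latin-1"):
    """Rewrite src as UTF-8. ASSUMPTION: src is really in source_encoding;
    a wrong guess produces wrong characters rather than an error."""
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    # "taxonomy_trimmed.tsv" is a placeholder name, not my real file.
    for line_number, offset, raw in find_non_utf8_lines("taxonomy_trimmed.tsv"):
        print(line_number, offset, raw)
```

Printing the raw bytes of the flagged lines should show whether the problem is a single accented species name or something more widespread.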
Is there a more elegant way to trim those rogue taxa out of my taxonomy file? Did I miss some magic plugin that can do it? Sorry for bringing up another issue here...
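To make the question concrete, here is a minimal sketch of the kind of filtering I mean. It is written in Python rather than R and is not my actual script. It assumes a headerless, two-column, tab-separated taxonomy file, and it assumes that "rogue" means any row whose taxonomy string contains "unidentified". The function names are hypothetical. Both files are read and written explicitly as UTF-8.

```python
def filter_taxonomy(src, dst, exclude="unidentified"):
    """Copy rows of a headerless two-column TSV from src to dst, dropping
    rows whose taxonomy string contains `exclude` (case-insensitive).
    Returns the feature IDs that were kept."""
    kept_ids = []
    with open(src, encoding="utf-8") as fin, \
            open(dst, "w", encoding="utf-8", newline="\n") as fout:
        for line in fin:
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            # Raises ValueError if a line has no tab, which flags a malformed file.
            feature_id, taxon = line.split("\t", 1)
            if exclude.lower() in taxon.lower():
                continue
            fout.write(f"{feature_id}\t{taxon}\n")
            kept_ids.append(feature_id)
    return kept_ids


def filter_fasta(src, dst, keep_ids):
    """Copy only the FASTA records whose ID (first word of the header)
    is in keep_ids."""
    keep = set(keep_ids)
    writing = False
    with open(src, encoding="utf-8") as fin, \
            open(dst, "w", encoding="utf-8", newline="\n") as fout:
        for line in fin:
            if line.startswith(">"):
                header = line[1:].split()
                writing = bool(header) and header[0] in keep
            if writing:
                fout.write(line)
```

Filtering the taxonomy first and then passing the kept IDs to the FASTA filter keeps the two reference files in sync.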