Hi!
I'm new to QIIME 2 and couldn't find instructions for importing/creating a SeppReferenceDatabase, so I'm not sure whether my problem comes from going about it the wrong way.
Goal:
Use a customized subset of Silva 138 NR99 for sepp fragment insertion.
What I've done:
Using ARB, I exported the aligned fasta for the high-quality seqs I want to use, after trimming to just the relevant 16S region, and exported the corresponding tree.
Followed Siavash's steps to clean up the tree, resolve polytomies, and re-estimate branch lengths with RAxML.
I know that a 'SeppReferenceDatabase' must include the alignment, tree, and info about the tree obtained from RAxML, and by unzipping sepp-refs-gg-13-8.qza (and after a couple errors) I deduced that those 3 elements should be named 'aligned-dna-sequences.fasta', 'tree.nwk', and 'raxml-info.txt'. I also deduced that the info should be the binary info file produced by RAxML, not the human-readable version. So I made a directory with those files:
$ ls -lh Sepp_Ref_Data_Files/
total 532M
-rw-r--r-- 1 dethlefs microbio 519M Sep 27 16:15 aligned-dna-sequences.fasta
-rw-r--r-- 1 dethlefs microbio 49K Sep 27 16:22 raxml-info.txt
-rw-r--r-- 1 dethlefs microbio 13M Sep 27 16:14 tree.nwk
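For anyone wanting to double-check the same layout, here is a minimal sketch that reports which of the three file names are missing from a directory. The names are the ones I deduced from sepp-refs-gg-13-8.qza, so treat them as an assumption rather than documented requirements; the helper name `missing_reference_files` is made up for illustration.

```python
from pathlib import Path

# File names deduced from unzipping sepp-refs-gg-13-8.qza (an assumption,
# not taken from official documentation).
EXPECTED_FILES = ("aligned-dna-sequences.fasta", "tree.nwk", "raxml-info.txt")


def missing_reference_files(directory):
    """Return the expected file names that are absent from `directory`."""
    directory = Path(directory)
    return [name for name in EXPECTED_FILES if not (directory / name).is_file()]
```

Calling `missing_reference_files("Sepp_Ref_Data_Files/")` should return an empty list for the directory listed above.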
Command I tried:
$ qiime tools import --type SeppReferenceDatabase --input-path Sepp_Ref_Data_Files/ --output-path Sil138_BAhq_SeppRefDB.qza
Error I got:
$ qiime tools import --type SeppReferenceDatabase --input-path Sepp_Ref_Data_Files/ --output-path Sil138_BAhq_SeppRefDB.qza
Traceback (most recent call last):
  File "/home/dethlefs/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/q2cli/builtin/tools.py", line 158, in import_data
    view_type=input_format)
  File "/home/dethlefs/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/qiime2/sdk/result.py", line 241, in import_data
    validate_level='max')
  File "/home/dethlefs/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/qiime2/sdk/result.py", line 267, in _from_view
    result = transformation(view, validate_level)
  File "/home/dethlefs/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/qiime2/core/transform.py", line 68, in transformation
    self.validate(view, validate_level)
  File "/home/dethlefs/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/qiime2/core/transform.py", line 143, in validate
    view.validate(level)
  File "/home/dethlefs/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/qiime2/plugin/model/directory_format.py", line 171, in validate
    getattr(self, field)._validate_members(collected_paths, level)
  File "/home/dethlefs/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/qiime2/plugin/model/directory_format.py", line 101, in _validate_members
    self.format(path, mode='r').validate(level)
  File "/home/dethlefs/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/qiime2/plugin/model/file_format.py", line 25, in validate
    self.validate(level)
  File "/home/dethlefs/miniconda3/envs/qiime2-2020.8/lib/python3.6/site-packages/q2_fragment_insertion/_format.py", line 63, in validate
    info = self.path.read_text()
  File "/home/dethlefs/miniconda3/envs/qiime2-2020.8/lib/python3.6/pathlib.py", line 1197, in read_text
    return f.read()
  File "/home/dethlefs/miniconda3/envs/qiime2-2020.8/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8a in position 9608: invalid start byte
An unexpected error has occurred:
'utf-8' codec can't decode byte 0x8a in position 9608: invalid start byte
See above for debug info.
###---### END ERROR MSG ###---###
I'm running a new conda install of version 2020.8 on a CentOS workstation with 80 cores and 1.5TB memory. (Helpful for RAxML.)
I wonder if the error just implies a corrupted file. But before I do the rather lengthy re-running of the RAxML steps in Siavash's instructions, I thought I'd make sure that these steps would be expected to work.
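Since the traceback shows the validator calling `read_text()` on the info file, the failing decode can be reproduced outside QIIME 2. Below is a minimal sketch that reports where a file stops being valid UTF-8; the helper name `first_non_utf8_offset` is my own and not part of QIIME 2 or q2-fragment-insertion.

```python
from pathlib import Path


def first_non_utf8_offset(path):
    """Return the byte offset of the first invalid UTF-8 sequence in `path`,
    or None if the whole file decodes cleanly as UTF-8."""
    data = Path(path).read_bytes()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first undecodable byte.
        return err.start
    return None
```

A non-None result for `Sepp_Ref_Data_Files/raxml-info.txt` would mean the file contains bytes that are not valid UTF-8 text, which is what the `read_text()` call in the traceback trips over. That would fit a binary info file at least as well as a corrupted one.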
Thanks!!
Les