I have a quick question regarding training-feature-classifier. I had previously trained a classifier using qiime2-2019.4, but since the new update came out, I figured I might as well train a new classifier. I am doing all the work on the cluster, so it should work, but I am getting an error stating:
There was a problem importing ------
------sh_refs_qiime_ver8_dynamic_s_02.02.2019_dev.fasta is not a(n)
------Invalid characters on line 538 (does not match IUPAC characters for a
This is the code that I ran. I have used the same code before, but for some reason it is not working now.
I attempted to replicate your issue by running the bash script you posted, and received the following error on a completely different line from you or @veeraku!
When I pop the file open in vim and go to line 67868, I notice there is a leading space.
This makes me wonder if perhaps you have some trailing whitespace characters on line 548 that are being picked up as non-DNA characters, and that's why your import is failing on that line without there being any visibly incorrect characters?
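One way to check for invisible characters on a reported line is sketched below. It builds a small demo file so it can be run as-is; the file name and line number are placeholders, so substitute your own FASTA path and the line number from the error message.

```shell
# Hypothetical demo file; replace with your own FASTA path.
tmpdir=$(mktemp -d)
fasta="$tmpdir/demo.fasta"
printf '>seq1\nACGT\n>seq2\n ACGT\t\n' > "$fasta"

line_no=4   # the line number reported by the importer

# Print only that line and make whitespace visible:
# od -c shows a tab as \t and a newline as \n.
sed -n "${line_no}p" "$fasta" | od -c

# List the number of every line that starts or ends with whitespace.
bad_lines=$(grep -n -E '^[[:space:]]|[[:space:]]$' "$fasta" | cut -d: -f1)
echo "lines with leading/trailing whitespace: $bad_lines"
```

Note that `[[:space:]]` also matches a carriage return, so files with Windows line endings would be flagged on every line.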
Additionally, when I run the qiime import command without running that awk command on the files, I can successfully import the original file,
and the leading space is not present in the original file.
This leads me to believe there is something in that awk command that is somehow inserting unwanted characters into the file. May I ask why you run that command? Your comment says it is to change the format of the files, but I'm not sure I understand the manner in which it is meant to change the format as the files are already uppercase.
When I import the files as you suggested, I continue to get an error. I am working with the developer files, so I need to make sure that the files are all in uppercase format, so per a different post, I ran the awk command.
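The awk command from the other post is not shown in this thread, so the following is only a sketch of one way to uppercase sequence lines without touching anything else. It assumes a standard FASTA layout where header lines start with `>`; the file names and demo records are made up.

```shell
# Hypothetical demo input; replace with the real reference FASTA.
tmpdir=$(mktemp -d)
in_fasta="$tmpdir/in.fasta"
out_fasta="$tmpdir/upper.fasta"
printf '>SH001_refs k__Fungi\nacgtNNacgt\n>SH002_refs k__Fungi\nACgt\n' > "$in_fasta"

# Header lines (starting with ">") pass through unchanged;
# every other line is uppercased. No characters are added or removed.
awk '/^>/ { print; next } { print toupper($0) }' "$in_fasta" > "$out_fasta"

cat "$out_fasta"
```

Because `toupper` only changes letter case, this form should not introduce leading spaces, which makes it easier to tell whether stray whitespace came from the conversion step or from the source file.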
After rerunning the code, I did get the same line number error as you did. I went ahead and removed the leading spaces using sed "s/^[ \t]*//" -i and the file was able to import.
@veeraku (Hope this helps) This is the code I ran, in case anyone needs it (I was not sure how to pipe it as one line, but it works)
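For anyone following along without the original code block, a minimal sketch of the whitespace-stripping step described above might look like this. It is not necessarily the exact code that was run: the file names are placeholders, and `[[:space:]]` is used instead of `[ \t]` so that it does not depend on GNU sed.

```shell
# Hypothetical demo input with a leading space and tab on line 4.
tmpdir=$(mktemp -d)
in_fasta="$tmpdir/in.fasta"
clean_fasta="$tmpdir/clean.fasta"
printf '>seq1\nACGT\n>seq2\n \tACGT\n' > "$in_fasta"

# Strip leading whitespace from every line and write the result to a new file.
sed 's/^[[:space:]]*//' "$in_fasta" > "$clean_fasta"

# Confirm that no line starts with whitespace any more.
# grep -c exits non-zero when the count is 0, hence the "|| true".
remaining=$(grep -c '^[[:space:]]' "$clean_fasta" || true)
echo "lines still starting with whitespace: $remaining"

# The cleaned file can then be passed to the import command used in this thread:
#   qiime tools import --type 'FeatureData[Sequence]' \
#     --input-path "$clean_fasta" --output-path its-refer.qza
```

Writing to a new file rather than editing in place keeps the original download intact in case the cleanup needs to be redone.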
Thanks for your code @Fabs. But when I run sed '67868 s/ //g' -i its_correct_reference_sequences.fasta > its_correct_refer_sequences.fasta or sed '67868 s/^[ \t]*//g' -i its_correct_reference_sequences.fasta > its_correct_refer_sequences.fasta, I always get an empty file. Can you give me a suggestion?
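A likely explanation for the empty file, offered as a sketch rather than a confirmed diagnosis: with GNU sed, `-i` rewrites the input file in place and prints nothing to standard output, so a `>` redirect on the same command captures nothing. Using one or the other, not both, avoids this. The demo file and line number below are placeholders.

```shell
# Hypothetical demo input with a leading space on line 4.
tmpdir=$(mktemp -d)
in_fasta="$tmpdir/in.fasta"
out_fasta="$tmpdir/out.fasta"
printf '>seq1\nACGT\n>seq2\n ACGT\n' > "$in_fasta"

line_no=4   # the line to fix

# Option 1: no -i. sed prints the edited text and the redirect captures it.
sed "${line_no}s/^[[:space:]]*//" "$in_fasta" > "$out_fasta"

# Option 2 (GNU sed, shown as a comment): edit in place, with no redirect.
#   sed -i "${line_no}s/^[[:space:]]*//" "$in_fasta"

# The output file should have the same number of lines as the input.
wc -l < "$out_fasta"
```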
Thanks very much @Nicholas_Bokulich. It works. I found that the UNITE reference sequences file can be imported smoothly in qiime2-2018.11, but it runs into the "blank" problem in qiime2-2019.10. Is the new version stricter about the format of sequences?
Thanks for your reply @Nicholas_Bokulich. The UNITE reference sequences were downloaded from https://plutof.ut.ee/#/doi/10.15156/BIO/786349 and the file was then unzipped. I used the sh_refs_qiime_ver8_dynamics_s_02.02.2019_dev.fasta file. I used the following command to import the reference sequences in both QIIME 2 versions: qiime tools import --type 'FeatureData[Sequence]' --input-path sh_refs_qiime_ver8_dynamics_s_02.02.2019_dev.fasta --output-path its-refer.qza