IUPAC character problem while importing UNITE dataset

qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path unite/developer/sh_refs_qiime_ver8_97_s_02.02.2019_dev.fasta \
  --output-path unite-ver8-99-seqs-02.02.2019.qza

When I run this command, I get an error about IUPAC characters for a DNA sequence.

Then I used the command from the Fungal ITS analysis tutorial:

awk '/^>/ {print($0)}; /^[^>]/ {print(toupper($0))}' unite/developer/sh_refs_qiime_ver8_97_s_02.02.2019_dev.fasta | sed -e '/^>/!s/\(.*\)/\U\1/;s/[[:blank:]]*$//' > unite/developer/sh_refs_qiime_ver8_97_s_02.02.2019_dev_uppercase.fasta

I still get an error about IUPAC characters, but at a different location: the first error was about line 1388, the second is about line 119870.

It seems the command above fixes part of the problem but not all of it.
Any help with this?

What error specifically are you getting? This is probably a different error or a different non-IUPAC character.
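One way to see which characters are tripping the import is to scan the FASTA file directly. The sketch below is only an illustration: `find_non_iupac` is a made-up helper name, and it assumes the validator accepts only uppercase A, C, G, T plus the IUPAC ambiguity codes (R, Y, S, W, K, M, B, D, H, V, N), so lowercase letters, gaps, and trailing whitespace are all reported as offending.

```shell
# Hypothetical helper: list sequence lines that contain anything outside
# the assumed IUPAC DNA alphabet (ACGT plus RYSWKMBDHVN, uppercase only).
find_non_iupac() {
  awk '
    /^>/ { next }                          # skip FASTA header lines
    {
      bad = $0
      gsub(/[ACGTRYSWKMBDHVN]/, "", bad)   # drop every accepted character
      # whatever is left is invalid; brackets make whitespace visible
      if (bad != "") printf "line %d: [%s]\n", NR, bad
    }
  ' "$1"
}
```

Running `find_non_iupac your_file.fasta | head` prints the line numbers (which should line up with the ones in the error message) together with the leftover characters, so you can tell whether you are dealing with lowercase bases, stray whitespace, or something else entirely.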

I checked this recently on the latest release of UNITE and it should work. (reposting from the tutorial since it looks like the code you quoted is not displaying correctly)

awk '/^>/ {print($0)}; /^[^>]/ {print(toupper($0))}' developer/sh_refs_qiime_ver7_99_01.12.2017_dev.fasta | sed -e '/^>/!s/\(.*\)/\U\1/;s/[[:blank:]]*$//' > developer/sh_refs_qiime_ver7_99_01.12.2017_dev_uppercase.fasta

Sorry for the late reply on this topic; I was too sick to work.

Did something change in “import” with the update? I was able to run it with 2019.4, but I got the error with 2019.7.

Nonetheless, I completed the analysis successfully, but I'd like to let you know in case there is a problem due to the update.

In 2019.7 some of the validators were updated, but I am not sure that’s the issue: I just confirmed that this is working fine in 2019.7. Perhaps we have different versions of the database?
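To check whether two people really have the same reference file, it can help to compare a quick fingerprint of each copy. This is a rough sketch, not part of QIIME 2: `fasta_fingerprint` is a hypothetical helper that prints the number of FASTA records and the POSIX `cksum` checksum and byte count of the file.

```shell
# Hypothetical helper: summarize a FASTA file so two copies can be compared.
fasta_fingerprint() {
  # number of records = number of header lines starting with ">"
  printf 'records: %s\n' "$(grep -c '^>' "$1")"
  # cksum prints a CRC checksum followed by the size in bytes
  printf 'cksum:   %s\n' "$(cksum < "$1")"
}
```

If both of us run `fasta_fingerprint` on our `_dev.fasta` file and the output differs, we are not importing the same database release.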

In any case, glad you are better now and got your data imported finally!
