Full UNITE+INSD dataset for Fungi

I was wondering which database we should use for ITS sequence analysis: the UNITE dataset or the UNITE+INSD dataset?
Is the UNITE+INSD dataset stable?

Secondly, I got an error while training a classifier on the UNITE database:

qiime tools import --type 'FeatureData[Sequence]' --input-path sh_refs_qiime_ver8_dynamic_s_02.02.2019_dev.fasta  --output-path unite.qza
Imported sh_refs_qiime_ver8_dynamic_s_02.02.2019_dev.fasta as DNASequencesDirectoryFormat to unite.qza

qiime tools import --type 'FeatureData[Taxonomy]' --input-format HeaderlessTSVTaxonomyFormat --input-path sh_taxonomy_qiime_ver8_dynamic_s_02.02.2019_dev.txt --output-path unite-taxonomy.qza
Imported sh_taxonomy_qiime_ver8_dynamic_s_02.02.2019_dev.txt as HeaderlessTSVTaxonomyFormat to unite-taxonomy.qza

qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads unite.qza --i-reference-taxonomy unite-taxonomy.qza --o-classifier classifier.qza
Plugin error from feature-classifier:

  Invalid characters in sequence: ['a', 'c'].
  Valid characters: ['Y', 'N', 'S', 'R', 'A', 'T', 'H', 'G', '-', 'M', 'D', '.', 'W', 'V', 'K', 'C', 'B']
  Note: Use `lowercase` if your sequence contains lowercase characters not in the sequence's alphabet.

Debug info has been saved to /tempFolder/qiime2-q2cli-err-n5gln933.log

Hi @shashankgpt,

Which database to use is really a matter of personal taste and scientific decision-making.

As far as I know, the UNITE releases with UNITE+INSD are stable, but you should check the UNITE website for specifics.

Yep, this error is quite common and is known to occur with some releases of UNITE, where some reference sequences contain lower-case characters. This tutorial has an example of using UNITE, including a bash one-liner to convert lower-case characters to upper case:
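
In case it helps, here is a minimal sketch of that kind of fix. It is not necessarily the tutorial's exact command: it assumes the only problem is lower-case bases in the sequence lines, and the file names and the tiny example input are hypothetical stand-ins, not real UNITE data. Header lines are left untouched so the sequence IDs still match the taxonomy file.

```shell
# Hypothetical file names; substitute the FASTA file from your own UNITE release.
in_fasta="sh_refs_example.fasta"
out_fasta="sh_refs_example_uppercase.fasta"

# Tiny stand-in input so the sketch runs on its own (not real UNITE data).
printf '>SH_example_1\nACGTacgtNNac\n>SH_example_2\nttgACCa\n' > "$in_fasta"

# Print header lines (starting with '>') unchanged; uppercase every other line.
awk '/^>/ { print; next } { print toupper($0) }' "$in_fasta" > "$out_fasta"

# Count sequence lines that still contain lower-case letters (0 means clean).
# grep -c exits non-zero when it finds nothing, hence the "|| true".
remaining=$(grep -v '^>' "$out_fasta" | grep -c '[a-z]' || true)
echo "sequence lines with lower-case characters: $remaining"
```

The upper-cased file can then be imported with the same `qiime tools import --type 'FeatureData[Sequence]'` command shown in the question, pointing `--input-path` at the new file, before re-running `fit-classifier-naive-bayes`.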

Good luck!

