RIM-DB classifier

SoilRotifer · November 15, 2022, 5:39pm

Let's break down these errors:

The "not a DNAFASTAFormat file" error is telling you that the file does not conform to QIIME 2's DNAFASTAFormat. This essentially means that the sequence data you are importing must contain only valid DNA (not RNA) IUPAC nucleotide characters, and they must be uppercase. Check that there are no special characters or lower-case characters in your DNA sequences. Other threads on this forum have more info on changing to uppercase.

That is, you can use seqkit:

conda install -c bioconda seqkit
seqkit seq db.fasta --upper-case -w 0 > db-upper.fasta

or bioawk:

conda install -c bioconda bioawk
bioawk -c fastx '{print ">" $name;  print toupper($seq)}' db.fasta > db-upper.fasta
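
After converting, it may be worth confirming that nothing other than uppercase IUPAC DNA characters is left in the sequences. Here is a minimal sketch of such a check, assuming the allowed alphabet is ACGTRYSWKMBDHVN with no gap characters. The demo file contents and the variable names (workdir, fasta, bad_lines) are made up for illustration, so swap in your own db-upper.fasta:

```shell
# Demo input (made up for illustration); point "fasta" at your own db-upper.fasta instead.
workdir=$(mktemp -d)
fasta="$workdir/db-upper.fasta"
printf '>seq1\nACGTNRYK\n>seq2\nACGUacgt\n>seq3\nACG-T\n' > "$fasta"

# Skip header lines (starting with ">") and print "line number: sequence"
# for any sequence line containing a character outside ACGTRYSWKMBDHVN.
bad_lines=$(awk '!/^>/ && /[^ACGTRYSWKMBDHVN]/ {print NR ": " $0}' "$fasta")

# Empty output means every sequence line passed the check.
echo "$bad_lines"
```

On the demo file this flags the lines with the RNA base U, the lower-case bases, and the gap character. Windows line endings would also get flagged, since the trailing carriage return is not in the allowed set.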

The errors "ID on line 831 is a duplicate of another ID on line 829." and "Taxonomy format feature IDs must be unique." come down to the same rule: in QIIME 2, all IDs must be unique. I would also make sure that there are no underscores (_) in the ID names. Some tools and code wrapped by QIIME 2 will, by default, read any text before the first _ as the ID and discard everything after it, so two IDs that differ only after the underscore can collapse into the same ID. Thus, we recommend that all IDs avoid underscores, to prevent mis-reading of data labels / IDs.
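
To see which IDs are actually duplicated before you import, something like the following could help. This is only a sketch: it assumes the FASTA ID is the header text up to the first whitespace and that the taxonomy file is tab-separated with the ID in the first column. The demo files and variable names are hypothetical, so replace them with your own paths:

```shell
# Demo inputs (made up for illustration); replace with your own FASTA and taxonomy files.
workdir=$(mktemp -d)
fasta="$workdir/db.fasta"
taxonomy="$workdir/taxonomy.tsv"
printf '>seqA first copy\nACGT\n>seqB\nGGCC\n>seqA second copy\nTTAA\n' > "$fasta"
printf 'seqA\tk__Bacteria\nseqB\tk__Archaea\nseqB\tk__Bacteria\n' > "$taxonomy"

# FASTA: take each header up to the first whitespace as the ID (an assumption),
# then list the IDs that occur more than once.
dup_fasta_ids=$(grep '^>' "$fasta" | sed 's/^>//' | awk '{print $1}' | sort | uniq -d)

# Taxonomy: the ID is assumed to be the first tab-separated column.
dup_tax_ids=$(cut -f1 "$taxonomy" | sort | uniq -d)

echo "Duplicate FASTA IDs: $dup_fasta_ids"
echo "Duplicate taxonomy IDs: $dup_tax_ids"
```

If either list is non-empty, those are the IDs to rename or remove. Keep in mind that the FASTA and taxonomy IDs need to stay matched to each other after any renaming.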