Hi
I am struggling to train a classifier on RIM-DB (Rumen and Intestinal Methanogen-DB), because some issues occurred while importing the fastq file and the taxonomy file.
The "not a DNAFASTAFormat file" error is telling you that the file does not conform to QIIME 2's DNAFASTAFormat type. This essentially means that the sequence data you are importing must contain only valid DNA (not RNA) IUPAC nucleotide characters, and that they must be uppercase. Check to make sure there are no special characters and no lower-case characters in your DNA sequences. You can look into the following threads for more info on changing to uppercase:
For "ID on line 831 is a duplicate of another ID on line 829." and "Taxonomy format feature IDs must be unique.": in QIIME 2, all IDs must be unique. I would also make sure that there are no underscores (_) in the ID names. Some tools and code wrapped by QIIME 2 default to reading any text before the first _ as the ID and discard everything after it. Thus, we recommend that all IDs follow this schema, to avoid misreading data labels / IDs.
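To see which IDs are affected before editing anything, a small script can list them. This is only a rough sketch: it assumes the FASTA ID is the text after ">" up to the first whitespace, and that the taxonomy file is tab-separated with the ID in the first column and no header line. The function names are my own, not part of QIIME 2.

```python
from collections import defaultdict


def fasta_ids(lines):
    """Return the IDs from FASTA header lines (text after '>' up to the first whitespace)."""
    return [line[1:].split()[0]
            for line in lines
            if line.startswith(">") and line[1:].strip()]


def taxonomy_ids(lines):
    """Return the first tab-separated column of every non-blank line.

    Assumes there is no header line; skip it first if your file has one.
    """
    return [line.rstrip("\n").split("\t")[0] for line in lines if line.strip()]


def find_duplicate_ids(ids):
    """Map each repeated ID to the 1-based record positions where it occurs."""
    positions = defaultdict(list)
    for pos, seq_id in enumerate(ids, start=1):
        positions[seq_id].append(pos)
    return {seq_id: where for seq_id, where in positions.items() if len(where) > 1}


def ids_with_underscores(ids):
    """Return the IDs that contain an underscore, in their original order."""
    return [seq_id for seq_id in ids if "_" in seq_id]
```

You would pass in the lines of each file, for example `find_duplicate_ids(fasta_ids(open("seqs.fasta")))`, where `seqs.fasta` is a placeholder file name. Note that the positions reported are record positions, not the line numbers shown in the QIIME 2 error.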
It seems there is a U base in the first fasta sequence (and possibly in others). As stated by the error, only 'ACGTRYKMSWBDHVN' characters are allowed.
If you can easily open the fasta file in a text editor like gedit or Notepad++ (depending on your OS), finding and replacing all of the Us with Ts will probably solve it.
There is probably a better way to do this, because a plain find-and-replace will also change any capital Us in the fasta headers.
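One way to avoid touching the headers is to only edit the lines that do not start with ">". Below is a minimal sketch of that idea in Python. It also uppercases the sequences and reports any character outside the allowed set quoted in the error; the function and file names are hypothetical, so adjust them to your data.

```python
ALLOWED = set("ACGTRYKMSWBDHVN")  # characters quoted in the error message


def clean_fasta_lines(lines):
    """Yield lines with sequences uppercased and U replaced by T; headers are left as-is."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield line  # header: do not modify
        else:
            yield line.upper().replace("U", "T")


def invalid_chars(sequence):
    """Return the set of characters in a sequence line that are not allowed."""
    return set(sequence) - ALLOWED


def clean_fasta_file(src, dst):
    """Write a cleaned copy of src to dst and print any characters still not allowed."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in clean_fasta_lines(fin):
            if not line.startswith(">") and invalid_chars(line):
                print("unexpected characters:", sorted(invalid_chars(line)))
            fout.write(line + "\n")
```

For example, `clean_fasta_file("rim_db.fasta", "rim_db_clean.fasta")` (placeholder file names) writes a new file and leaves the original untouched, so you can compare the two before importing.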
Hi @Micro_Biologist, as I mentioned earlier in this thread, the IDs must be unique. If they are not, you'll have to modify them in both the sequence and taxonomy files. For example, you can append an incremented number to the end of each ID:
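Here is one possible way to do that in Python. Treat it as a sketch under some assumptions rather than a tested recipe for RIM-DB. It assumes the taxonomy file is tab-separated with the ID in the first column and no header line. It also assumes that repeated IDs appear in the same relative order in both files, so that the n-th copy of an ID gets the same suffix in each. The separator "-" is my choice (it avoids adding underscores), and the function names are made up for illustration.

```python
from collections import Counter


def _numbered(seen, seq_id, sep):
    """Return seq_id with a per-ID occurrence counter appended (a-1, a-2, ...)."""
    seen[seq_id] += 1
    return f"{seq_id}{sep}{seen[seq_id]}"


def rename_fasta(lines, sep="-"):
    """Append an incremented number to the ID in every FASTA header line."""
    seen = Counter()
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">") and line[1:].strip():
            parts = line[1:].split(None, 1)  # ID, then optional description
            new_id = _numbered(seen, parts[0], sep)
            rest = " " + parts[1] if len(parts) > 1 else ""
            out.append(">" + new_id + rest)
        else:
            out.append(line)  # sequence line: unchanged
    return out


def rename_taxonomy(lines, sep="-"):
    """Append the same kind of number to the first column of a taxonomy TSV."""
    seen = Counter()
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            out.append(line)  # keep blank lines as they are
            continue
        seq_id, tab, taxon = line.partition("\t")
        out.append(_numbered(seen, seq_id, sep) + tab + taxon)
    return out
```

Because every ID gets a suffix, including those that were already unique, the sequence and taxonomy files stay in sync as long as the ordering assumption holds. It is worth checking a few renamed records by eye before importing.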