Hi all,
I am sure this is a simple fix, but for the life of me I can't see the solution!
I am trying to build a database from rrnDB using the rrnDB-5.9_16S_rRNA.fasta file, but first I want to dereplicate the sequences with RESCRIPt. To do that I need to import the FASTA file into QIIME 2, but the import fails with an error saying the file is not in DNAFASTAFormat:
qiime tools import --input-path rrnDB-5.9_16S_rRNA.fasta --output-path rrnDB_16S_rRNA_input.qza --type 'FeatureData[Sequence]'
There was a problem importing rrnDB-5.9_16S_rRNA.fasta:
rrnDB-5.9_16S_rRNA.fasta is not a(n) DNAFASTAFormat file:
ID on line 21 is a duplicate of another ID on line 1.
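The message complains about a duplicate ID, not about the sequences themselves. My understanding (an assumption I have not checked against the QIIME 2 source) is that the ID is the header text after > up to the first whitespace. If so, every Methanobacterium formicicum entry in the rrnDB file would get the same ID, Methanobacterium. This minimal Python sketch lists which IDs collide under that assumption. find_duplicate_ids is my own hypothetical helper, not part of QIIME 2 or RESCRIPt:

```python
from collections import defaultdict


def find_duplicate_ids(lines):
    """Return {id: [line numbers]} for every header ID seen more than once.

    Assumption: the ID is the header text after '>' up to the first
    whitespace (how I understand QIIME 2 reads FASTA IDs).
    Line numbers are 1-based, matching the QIIME 2 error message.
    """
    seen = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        if line.startswith(">"):
            fields = line[1:].split(maxsplit=1)  # split off the ID token
            seq_id = fields[0] if fields else ""
            seen[seq_id].append(lineno)
    # Keep only IDs that occur on more than one header line
    return {seq_id: nums for seq_id, nums in seen.items() if len(nums) > 1}
```

For example, find_duplicate_ids(open("rrnDB-5.9_16S_rRNA.fasta")) should report a collision for line 1 and line 21 if the assumption holds.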
Here is a snippet from the start of the FASTA file, showing the first header:
>Methanobacterium formicicum|GCF_000762265.1|NZ_CP006933.1|Chromosome: CP006933.1|283389..284864 +
AGTCCGTTTGATCCTGGCGGAGGCCACTGCTATTGGGTTTCGATTAAGCCATGCAAGTCGAA
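One workaround I could try, assuming QIIME 2 takes the ID to be the header text up to the first whitespace, is to rewrite every header into a single whitespace-free token before importing. The sketch below uses make_unique_headers, a hypothetical helper of my own. It replaces runs of whitespace with underscores, so the whole rrnDB header (including the coordinates) becomes the ID. If an ID still repeats, it appends a _dupN suffix:

```python
import re


def make_unique_headers(lines):
    """Yield FASTA lines with each header turned into a unique,
    whitespace-free ID (hypothetical helper, not a QIIME 2 API)."""
    used = set()
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # Collapse whitespace so the full header is one ID token
            base = re.sub(r"\s+", "_", line[1:].strip())
            new_id, n = base, 1
            # Append _dup1, _dup2, ... until the ID has not been used yet
            while new_id in used:
                new_id = f"{base}_dup{n}"
                n += 1
            used.add(new_id)
            yield ">" + new_id
        else:
            yield line  # sequence lines pass through unchanged
```

The rewritten lines can be joined with newlines, written to a new file, and that file passed to qiime tools import. The resulting IDs are long, so a shorter scheme (for example, accession plus coordinates) may be nicer if the taxonomy file needs matching IDs.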
I have also tried importing a FASTA file from NCBI with the following header, and that import also fails with a FASTA format error:
>NR_177367.1 Natronocalculus amylovorans strain AArc-St2 16S ribosomal RNA, partial sequence
CCTGCCGGAGGTCATTGCTATTGGGATTCGATTTAGCCATGCTAGTTGTACGAGTTTATACTCGTAGCGGAAAGCTCAGT
Can anyone advise how the FASTA headers should be formatted, or what I should change in the import command?
-Michelle