Error importing rrnDB FASTA as FeatureData[Sequence]

Hi all,
I am sure this is a simple fix, but for the life of me I can't see the solution!
I am trying to build a database from rrnDB using the rrnDB-5.9 FASTA file, but first I want to dereplicate the sequences using RESCRIPt. To do that I need to import the FASTA file into QIIME 2, but I get an error saying the file is not in DNA FASTA format:

qiime tools import --input-path rrnDB-5.9_16S_rRNA.fasta --output-path rrnDB_16S_rRNA_input.qza --type 'FeatureData[Sequence]'

There was a problem importing rrnDB-5.9_16S_rRNA.fasta:

rrnDB-5.9_16S_rRNA.fasta is not a(n) DNAFASTAFormat file:

ID on line 21 is a duplicate of another ID on line 1.

Here is a snapshot of the FASTA file header:

Methanobacterium formicicum|GCF_000762265.1|NZ_CP006933.1|Chromosome: CP006933.1|283389..284864 +
AGTCCGTTTGATCCTGGCGGAGGCCACTGCTATTGGGTTTCGATTAAGCCATGCAAGTCGAA

I have also tried importing a FASTA file from NCBI with the following header, and I also get an error about the FASTA file format:

NR_177367.1 Natronocalculus amylovorans strain AArc-St2 16S ribosomal RNA, partial sequence
CCTGCCGGAGGTCATTGCTATTGGGATTCGATTTAGCCATGCTAGTTGTACGAGTTTATACTCGTAGCGGAAAGCTCAGT

Can anyone advise how the FASTA file header should be formatted? Or which import command change I should make?
-Michelle

Hello @michb,

"ID on line 21 is a duplicate of another ID on line 1." This is the important part: the importer found the same ID in the headers on lines 1 and 21 of the file you're trying to import. Take a look at the headers on these lines and see if they are in fact duplicated or if a formatting issue is making the parser think they are. In general the restrictions on the headers are pretty minimal: they must start with a ">", can't be empty, and can't be repeated. I think that's it. As for your second example, I can't say what happened unless you post the resulting error.

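If it helps with checking the headers, here is a minimal Python sketch for listing repeated IDs. It rests on an assumption I have not confirmed against the importer's source: that the ID is the text between ">" and the first whitespace. If that holds, a header with a space after the genus name, like the rrnDB one quoted above, would yield only the genus as its ID, and two records from the same genus would collide. The function name `find_duplicate_ids` and the example headers are made up for illustration.

```python
from collections import defaultdict


def find_duplicate_ids(lines):
    """Map each repeated FASTA ID to the 1-based line numbers where it appears."""
    seen = defaultdict(list)
    for line_number, line in enumerate(lines, start=1):
        if line.startswith(">"):
            fields = line[1:].split()
            # Assumption: the ID is the text between ">" and the first whitespace.
            record_id = fields[0] if fields else ""
            seen[record_id].append(line_number)
    # Keep only the IDs that occur on more than one header line.
    return {rid: nums for rid, nums in seen.items() if len(nums) > 1}


# Made-up headers for illustration only (not taken from rrnDB).
example = [
    ">Genus speciesA|acc1|1..100 +",
    "ACGT",
    ">Genus speciesB|acc2|5..90 -",
    "ACGT",
    ">Other species|acc3|1..50 +",
    "ACGT",
]
duplicates = find_duplicate_ids(example)
print(duplicates)
```

To run it on a real file, pass an open file handle instead of the list, for example `with open("rrnDB-5.9_16S_rRNA.fasta") as fh: print(find_duplicate_ids(fh))`. If the reported duplicates share only their first word, replacing the spaces in the headers (with underscores, say) before importing may be enough, though that is a guess about the cause rather than a confirmed fix.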

Thanks @colinvwood for your quick reply - I completely missed that it was all a single error message! I can see now that the ">" character did not paste into my initial question, although it is there in the FASTA file. I'll have to review the header formatting carefully.
