Troubles importing FASTA file

Hello!

I am having the same problem as @Pseudomonas84 importing my seqs.fna file into qiime2. I tried following your suggestion of including the source format parameter, so my command looked like this:

qiime tools import \
  --input-path reference-hit.seqs.fa \
  --output-path PMI3_spring_rep-seqs.qza \
  --source-format QIIME1DemuxFormat \
  --type SampleData[Sequences]

I also received the error: reference-hit.seqs.fa is not a(n) QIIME1DemuxFormat file

Here is the head of my file:
head reference-hit.seqs.fa

>TACAGAGGTCTCAAGCGTTGTTCGGATTCATTGGGCGTAAAGGGTGCGTAGGCGGCGCGGTAAGTCGGGTGTGAAATCTCGGAGCTTAACTCCGAAACTG
TACAGAGGTCTCAAGCGTTGTTCGGATTCATTGGGCGTAAAGGGTGCGTAGGCGGCGCGGTAAGTCGGGTGTGAAATCTCGGAGCTTAACTCCGAAACTG
>TACGGAGGATCCAAGCGTTATCCGGAATCATTGGGTTTAAAGGGTCCGTAGGCGGTTTAGTAAGTCAGTGGTGAAAGCCCATCGCTCAACGGTGGAACGG
TACGGAGGATCCAAGCGTTATCCGGAATCATTGGGTTTAAAGGGTCCGTAGGCGGTTTAGTAAGTCAGTGGTGAAAGCCCATCGCTCAACGGTGGAACGG
>TACAGAGGTCTCAAGCGTTGTTCGGATTCATTGGGCGTAAAGGGTGCGTAGGTGGTGATGCAAGTCTGGTGTGAAATCTCGGAGCTCAACTCCGAAATTG
TACAGAGGTCTCAAGCGTTGTTCGGATTCATTGGGCGTAAAGGGTGCGTAGGTGGTGATGCAAGTCTGGTGTGAAATCTCGGAGCTCAACTCCGAAATTG
>TACGAAGGGGGCTAGCGTTGCTCGGAATCACTGGGCGTAAAGCGCACGTAGGCGGATTGCTAAGTCAGGGGTGAAATCCTGGAGCTCAACTCCAGAACTG
TACGAAGGGGGCTAGCGTTGCTCGGAATCACTGGGCGTAAAGCGCACGTAGGCGGATTGCTAAGTCAGGGGTGAAATCCTGGAGCTCAACTCCAGAACTG
>TACGAAGGGGGCTAGCGTTGCTCGGAATCACTGGGCGTAAAGGGTGCGTAGGCGGGTCTTTAAGTCAGGGGTGAAATCCTGGAGCTCAACTCCAGAACTG
TACGAAGGGGGCTAGCGTTGCTCGGAATCACTGGGCGTAAAGGGTGCGTAGGCGGGTCTTTAAGTCAGGGGTGAAATCCTGGAGCTCAACTCCAGAACTG

My file does not appear to have any lowercase letters. However, I noticed that each ID line is identical to its sequence, and I'm wondering if that is the problem. Is there a way to fix that, or do you have another suggestion as to what the problem might be?

Thanks!
Aeriel :smile:

Hi @aeriel.belk,
Your sequences are in standard FASTA format, not QIIME1DemuxFormat. You can follow this tutorial example to import them as FeatureData[Sequence].

I hope that helps!

Thanks for your quick reply!

Unfortunately, when I run the command as described in the tutorial I am given the error: reference-hit.seqs.fa is not a(n) DNAFASTAFormat file

This is why I tried the other method originally. If my file isn't in either of these formats, I am unsure what format it is.

Hi @aeriel.belk, sorry to hear things aren’t going well :frowning: . I just created a file (seqs.fna) using the output from the head command you listed above, and imported using the following command (this is what @Nicholas_Bokulich linked to above):

$ qiime tools import \
  --input-path seqs.fna \
  --output-path sequences.qza \
  --type 'FeatureData[Sequence]'

The sequences imported without raising an error. This is noteworthy because the code that validates this format only reads the first 5 records to make its assessment. So even if your file has more records in it, the error you mentioned above should have been triggered by just those first 5 records, which means I should have seen it in my test. (This isn't entirely true: the sniffer could detect issues in other lines of the file.)
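Since the first few records may look fine while later ones do not, it can help to scan the whole file yourself. Below is a minimal sketch, assuming that lowercase letters on sequence lines are a likely culprit (that is an assumption, not something the error message states). The function name `find_lowercase` and the demo file are hypothetical and only there for illustration.

```shell
# find_lowercase FILE: print "LINE_NUMBER:CONTENT" for every non-header
# line of FILE that contains at least one lowercase letter.
find_lowercase() {
  # First grep drops FASTA header lines (those starting with ">") while
  # keeping the original line numbers; the second keeps only lines that
  # contain a lowercase letter.
  grep -n -v '^>' "$1" | grep '[[:lower:]]'
}

# Hypothetical demo file: the second record contains lowercase bases.
demo=$(mktemp)
printf '>seq1\nACGT\n>seq2\nacGT\n' > "$demo"
find_lowercase "$demo"
# prints: 4:acGT
rm -f "$demo"
```

On a real file you would call it as `find_lowercase reference-hit.seqs.fa | head`. No output means no lowercase letters were found on sequence lines.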

Some requests:

  • Can you please double check that your import command is importing the file you want?
  • Can you please copy-and-paste the exact import command you are running?
  • Can you please copy-and-paste the complete error you are seeing?

Thanks! :t_rex:

Thanks @thermokarst!

I tried again, I am certain that I am using the correct fasta file in this command:

qiime tools import \
  --input-path reference-hit.seqs.fa \
  --output-path sequences.qza \
  --type 'FeatureData[Sequence]'

There was a problem importing reference-hit.seqs.fa:

reference-hit.seqs.fa is not a(n) DNAFASTAFormat file

Thanks!

Hi @aeriel.belk,
If your file is small enough, could you please post it here so we can attempt to debug?

What version of QIIME 2 are you running?

It is attached below. I am currently running QIIME2-2017.12, but I had the same issue in 2017.11 and 2017.10

reference-hit.seqs.fa.zip (261.7 KB)

Hi @aeriel.belk,
It looks like the issue is that a number of sequences contain lower-case characters. Use the following to convert your file before importing:

tr 'acgt' 'ACGT' < reference-hit.seqs.fa > seqs.fna

That will do the trick!
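One caveat: `tr` rewrites every line, including the header lines. In this file the IDs are copies of the sequences, so that keeps the two identical, but in a file with ordinary IDs it would also change any a, c, g, or t in the ID. As a hedged alternative, here is a minimal sketch that uppercases only the sequence lines. The function name `uppercase_sequences` and the demo file are hypothetical, and note that `toupper` uppercases every letter (for example `n` becomes `N`), not just a, c, g, and t.

```shell
# uppercase_sequences FILE: write FILE to stdout with sequence lines
# uppercased and header lines (starting with ">") left unchanged.
uppercase_sequences() {
  awk '/^>/ { print; next } { print toupper($0) }' "$1"
}

# Hypothetical demo file with a lowercase ID and mixed-case bases.
demo=$(mktemp)
printf '>sample_a\nacgtTTn\n' > "$demo"
uppercase_sequences "$demo"
# prints:
# >sample_a
# ACGTTTN
rm -f "$demo"
```

On the real file this would be `uppercase_sequences reference-hit.seqs.fa > seqs.fna`, followed by the same import command shown earlier in the thread with `--input-path seqs.fna`.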

That worked, thanks!
