Problems Importing FASTA File

Hello,

I’ve been having the same problem, so I’ve been following this thread to try to solve it. After running the script you suggested, I tried to import my reference-hit.seqs.fa file again, and this time the error was: reference-hit.seqs.fa is not a(n) QIIME1DemuxFormat file
Earlier, the error was: not a(n) DNAFASTAFormat file

This is confusing to me, because I don’t know why it would require a single file to have 2 different formats. Is there a way to force the plugin to choose one format or the other?

Thanks!

Hi @aeriel.belk
I have the same problem as you… I am trying to figure it out… :sweat_smile:

Hi @aeriel.belk, we need a bit more information before we can provide assistance:

  • What version of QIIME 2 are you using?
  • What were the exact commands you ran? Copy-and-paste please!
  • What were the exact errors you observed? Copy-and-paste the results when run with --verbose.

It sounds like you ran at least two different commands, judging by the errors you mention above, so if possible please provide both of those. Thanks! :t_rex:

Hi,

I am using qiime2-2017.10. Here were my commands:

qiime tools import --input-path reference-hit.seqs.fa --output-path PMI3_Spring_rep-seqs.qza --type FeatureData[Sequence]
#Error: not a(n) DNAFASTAFormat file

wc -l reference-hit.seqs.fa
#output: 26714 reference-hit.seqs.fa
grep -v '^>' reference-hit.seqs.fa | wc -l
#output: 13357
grep '^>' reference-hit.seqs.fa | wc -l
#output: 13357
grep -v '^>' reference-hit.seqs.fa | sort | uniq | wc -l
#output: 9723
grep '^>' reference-hit.seqs.fa | sort | uniq | wc -l
#output: 9723

qiime tools import --input-path reference-hit.seqs.fa --output-path rep-seqs.qza --type SampleData[Sequence]
#Error: reference-hit.seqs.fa is not a(n) QIIME1DemuxFormat file

I’m realizing as I type this that I must have copied something wrong, because on my second try I used a different --type. So, I just tried to run the import again using FeatureData[Sequence] on the data we changed with the grep commands, and I received the “not a(n) DNAFASTAFormat file” error again.

My best guess is that the program isn’t reading it as a .fasta file because the ID/sequence name is the same as the sequence itself. But if I just changed the sample ID that would probably mess things up downstream, right?

Thanks again!

Great, thanks for the details @aeriel.belk!

First off, I recommend you see my post here, which recommends using q2-deblur instead of standalone deblur. Standalone deblur produces data that needs a bit of massaging before it can be loaded into QIIME 2, and that is where the advantage of q2-deblur comes in: it does that clean-up for you!

Yep, I am noticing that too!

The data is unchanged: grep is just a tool for searching within files, and it is non-destructive. @antgonza was asking @Jingsi_Tang to run some grep commands to get a sense of the structure of the data, to make sure that there wasn't anything too crazy happening in the fasta file.
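
If it helps, the same kind of structural check can be done in one read-only pass. This is only a rough sketch, not anything from QIIME 2 or deblur: fasta_summary is a hypothetical helper name, and it assumes each record is one header line starting with ">" followed by one sequence line, which matches the line counts posted above. It reports how many headers repeat and how many headers are identical to the sequence that follows them.

```shell
# Hypothetical helper (not part of QIIME 2 or deblur): summarize the
# structure of a FASTA file without modifying it.
# Assumes one header line (">id") followed by one sequence line per record.
fasta_summary() {
    awk '
        /^>/ {
            headers++
            if (seen[$0]++) dup++      # header text already seen earlier
            id = substr($0, 2)         # header without the leading ">"
            expect = 1                 # next line is the first sequence line
            next
        }
        {
            seqlines++
            if (expect && $0 == id) same++   # ID is the sequence itself
            expect = 0
        }
        END {
            printf "headers=%d\n", headers
            printf "sequence_lines=%d\n", seqlines
            printf "duplicate_headers=%d\n", dup
            printf "id_equals_sequence=%d\n", same
        }
    ' "$1"
}
```

You would call it as: fasta_summary reference-hit.seqs.fa. Like grep, awk only reads the file here, so the input is left exactly as it was.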

I can't say for sure, but ID-cleanup is one of the things happening in the q2-deblur plugin, so if possible, I would recommend re-running your deblur step in q2-deblur!

Thanks! :t_rex:
