Importing FASTA files and producing one output (.qza) file

Hello,
My goal is to import multiple FASTA files from a directory and produce a single output file. I followed the tutorial (Importing data — QIIME 2 2022.2.0 documentation) but came across a problem: it seems that I can't produce a single output for all the files.
The code is:

qiime tools import \
  --input-path /home/..../fasta \
  --output-path sequences.qza \
  --type 'FeatureData[Sequence]'

Note: the fasta folder is where the files are located.

The error I get is the following:

There was a problem importing /home/.../fasta:
Missing one or more files for DNASequencesDirectoryFormat: 'dna-sequences.fasta'

When I specified a single file in the command instead, the import worked just fine. Is there any way to import a directory of FASTA files into a single (.qza) output file?

Hi @t.gousdovas,

Thanks for reaching out! :nerd_face:

Before making any recommendations as to how you should move forward here, can you share with me what these multiple FASTA files represent? Are they per-sample features? If not, do they represent multiple studies, etc?

Each FASTA file represents a different sample. All the files belong to the same study. I hope that helps :grimacing:

I'm having a similar issue to Teo with importing multiple FASTA files! I am trying to import reference sequences from the mouse reference gut microbiome database. If I select one .fasta file from the folder, the import works fine, but it fails for the entire folder.

Hi @lizgehret,
Have you found what the problem may be? Do you have any suggestions?

Hi @t.gousdovas,

Thanks for your patience here! :nerd_face:

There isn't a direct way to import multiple FASTA files within QIIME 2 (that is only supported for FASTQ files) - but since all of these files are from the same study, what I would recommend is to concatenate them into a single file that you can then import into QIIME 2. If this isn't something you feel comfortable doing on your own, you could also contact your sequencing provider and ask whether they can do it for you.

I hope this helps!

Cheers :lizard:

I am able to concatenate all the FASTA files into one, but the file I get doesn't record which sequences belong to which sample. Is it possible to create a FASTA file that I could use to assign taxonomy (e.g. as in the Moving Pictures tutorial) and still keep the sample information?

Hi @t.gousdovas,

Apologies for not being more specific - you would need to add in the appropriate sample IDs in your concatenated file. So for each file, you'd include the sample ID as a new column which would then be included in the resultant file that includes all of your samples. Here is an old forum post that discusses a similar situation, which you may find useful.
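For anyone landing here later, one way to do this could look like the sketch below. It is only an illustration under some assumptions that are not from this thread: each file is named `<sample-id>.fasta`, the sample ID is written into each header as `<sample-id>_<counter>`, and the function name `concat_fasta_with_sample_ids` is made up for this example. Please check which header layout your downstream step actually expects before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: concatenate per-sample FASTA files into one file,
# prefixing every header with a sample ID taken from the file name.
# Assumption: one file per sample, named <sample-id>.fasta.
# Write the output OUTSIDE the input directory so it is not picked up by the glob.
concat_fasta_with_sample_ids() {
  local in_dir="$1" out_file="$2"
  local f sample
  : > "$out_file"                      # start from an empty output file
  for f in "$in_dir"/*.fasta; do
    [ -e "$f" ] || continue            # nothing matched the glob
    sample="$(basename "$f" .fasta)"   # e.g. S1.fasta -> S1
    # Header lines become ">sample_N original-header";
    # sequence lines are copied through unchanged.
    awk -v s="$sample" '
      /^>/ { n++; print ">" s "_" n " " substr($0, 2); next }
      { print }
    ' "$f" >> "$out_file"
  done
}

# Example call (paths are placeholders):
#   concat_fasta_with_sample_ids /path/to/fasta combined.fasta
```

The combined file could then be passed as a single `--input-path` to the import command from the first post, since importing a single file worked there. Whether the importer accepts the extra text after the new ID in each header is not something this sketch verifies; if it complains, dropping the ` substr($0, 2)` part would leave only `<sample-id>_<counter>` in the header.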

Alternatively, if your sequencing provider gave you associated quality scores for your data, they could be converted to FASTQ format, which would be much easier to deal with when importing (as QIIME 2 does support importing multiple FASTQ files into a single artifact). Here is another forum post that discusses this, which you may also find useful.

Hope this helps!

Cheers :lizard:

Thank you very much!! :grin:
