multiple fasta files data-import

Hi,
I have multiple sequence files in FASTA format (e.g. sample1.fna). Each file corresponds to one sample. I have seen the “Sequences without quality information (i.e. FASTA)” section on importing data without quality scores, but it only covers a single file.
How can I import multiple sequence files?

Best
Nisha

The page you mentioned (Importing data — QIIME 2 2021.2.0 documentation) links to Clustering sequences into OTUs using q2-vsearch — QIIME 2 2021.2.0 documentation. Under the heading “Dereplicating a SampleData[Sequences] artifact”, there’s an example that may be useful for you:

qiime tools import --input-path seqs.fna --output-path seqs.qza --type 'SampleData[Sequences]'

I believe that as long as all of your files are in the right directory (which you set with the --input-path parameter), all of them will be imported. If you try it and it doesn’t work, post the error here so someone can help troubleshoot.

Thanks for prompt response!
All my files are in the same directory, but in this command I can specify only one file (seqs.fna), and I have ~50 sample files.

Run qiime tools import --help. You’ll see that the --input-path parameter takes “Path to file or directory that should be imported.” So I was suggesting that you set this parameter as the name of the directory containing your files.

Oh… I did this and I am getting an error again:
"Missing one or more files for DNA sequences directory format: ‘dna-sequences.fasta.’"

Then I don’t know how to help; sorry. Maybe someone else on here will know more.

If I specify "--input-format DNAFASTAFormat", it accepts only a file as input, not a directory.

Hi @Nisha,

I think you'll find importing via a file manifest to be quite helpful here.

EDIT: Oops, I misread this. You have FASTA and not FASTQ.
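
For anyone landing here with per-sample FASTA files: as far as I know, a SampleData[Sequences] import expects a single seqs.fna in the QIIME 1 demultiplexed layout, where every header reads >SAMPLEID_SEQNUM and each sequence sits on one line. Below is a minimal sketch for merging per-sample files into that layout. It is not necessarily what was done in this thread. merge_fasta is a hypothetical helper (not a QIIME 2 command), and taking the sample ID from the file name and the .fna extension are assumptions.

```shell
#!/usr/bin/env bash
# Merge every *.fna file in a directory into one FASTA file.
# Headers are rewritten to >SAMPLEID_N (SAMPLEID = file name without .fna)
# and wrapped sequences are joined onto a single line.
# merge_fasta is a hypothetical helper name, not part of QIIME 2.
merge_fasta() (
    in_dir="$1"
    out_file="$2"
    : > "$out_file"                          # start from an empty output file
    for f in "$in_dir"/*.fna; do
        [ -e "$f" ] || continue              # the glob matched nothing
        [ "$f" -ef "$out_file" ] && continue # never read the output file itself
        sample="$(basename "$f" .fna)"       # sample1.fna -> sample1
        awk -v s="$sample" '
            # Header line: close the previous record, then emit >sample_N
            /^>/ { if (n) printf "\n"; n++; print ">" s "_" n; next }
            # Sequence line: append without a newline so wrapped lines are joined
            { printf "%s", $0 }
            END { if (n) printf "\n" }
        ' "$f" >> "$out_file"
    done
)
```

Calling merge_fasta path/to/fasta-dir seqs.fna should give a single file that can be passed to the qiime tools import command quoted earlier in the thread. The original sequence IDs are discarded here; if you need them, keep a mapping before merging.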

Hi again!
I have solved this problem.

I just have one more question: how do we filter our data if it is in FASTA format? Let’s say I want to remove chimeric sequences from my data.

Nisha

Hi @Nisha,

Would you mind posting your solution, so that others on the forum can benefit?

There are a variety of ways to do this, as outlined in the filtering documentation, specifically the part about filtering sequences.

For example, if you wanted to remove plastid (and other unwanted) features from your feature table, you could do:

qiime taxa filter-table \
    --i-table ./table.qza \
    --i-taxonomy ./taxonomy.qza \
    --p-mode 'contains'  \
    --p-include 'p__' \
    --p-exclude 'p__;,Eukaryota,Chloroplast,Mitochondria,Unassigned,Unclassified' \
    --o-filtered-table ./table-no-ecmu.qza

More details can be found here, and here.

If you want to check for and remove chimeras from your data, you can use vsearch. Check out this chimera removal tutorial. If you are also performing OTU clustering, you can look through the clustering sequences tutorial too.
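
As a rough sketch of what that workflow can look like, starting from an imported SampleData[Sequences] artifact: dereplicate, run de novo chimera detection, then keep only the non-chimeric features. The function name remove_chimeras and all file names are placeholders I made up, and the option names are as I remember them from the q2-vsearch tutorials, so please check them against qiime vsearch --help for your release.

```shell
# Sketch: dereplicate -> de novo chimera check -> filter.
# Wrapped in a function so nothing runs until it is called.
# remove_chimeras and every file name here are placeholders.
remove_chimeras() (
    set -e
    seqs="$1"   # imported SampleData[Sequences] artifact, e.g. seqs.qza
    out="$2"    # directory for results; assumed to exist already

    # Collapse identical reads into a feature table plus representative sequences
    qiime vsearch dereplicate-sequences \
        --i-sequences "$seqs" \
        --o-dereplicated-table "$out/table.qza" \
        --o-dereplicated-sequences "$out/rep-seqs.qza"

    # Flag chimeric features de novo (writes chimeras, nonchimeras and stats artifacts)
    qiime vsearch uchime-denovo \
        --i-table "$out/table.qza" \
        --i-sequences "$out/rep-seqs.qza" \
        --output-dir "$out/uchime-dn-out"

    # Keep only the features listed as non-chimeric, in the table and the sequences
    qiime feature-table filter-features \
        --i-table "$out/table.qza" \
        --m-metadata-file "$out/uchime-dn-out/nonchimeras.qza" \
        --o-filtered-table "$out/table-nonchimeric.qza"
    qiime feature-table filter-seqs \
        --i-data "$out/rep-seqs.qza" \
        --m-metadata-file "$out/uchime-dn-out/nonchimeras.qza" \
        --o-filtered-data "$out/rep-seqs-nonchimeric.qza"
)
```

You would call it as remove_chimeras seqs.qza results, with results being an existing directory. Whether to also drop "borderline" chimeras is a judgment call that the tutorial discusses.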

-Cheers!
