multiple fasta files data-import

Nisha · June 8, 2021, 12:18pm

Hi,
I have multiple sequence files in FASTA format (e.g. sample1.fna). Each file corresponds to one sample. I have seen the "Sequences without quality information (i.e. FASTA)" section on importing data without quality scores, but it only covers a single file.
How can I import multiple sequence files?

Best
Nisha

wburgess · June 8, 2021, 1:30pm

The page you mentioned (Importing data — QIIME 2 2021.2.0 documentation) links to an example. In Clustering sequences into OTUs using q2-vsearch — QIIME 2 2021.2.0 documentation, under the heading "Dereplicating a SampleData[Sequences] artifact", there's an import command that may be useful for you:

qiime tools import --input-path seqs.fna --output-path seqs.qza --type 'SampleData[Sequences]'

I believe that as long as all of your files are in the right directory (which you set with the --input-path parameter), all of them will be imported. If you try it and it doesn't work, post the error here and someone can help troubleshoot.

Nisha · June 8, 2021, 1:54pm

Thanks for the prompt response!
All my files are in the same (correct) directory, but this command only lets me specify one file (seqs.fna), and I have ~50 sample files.

wburgess · June 8, 2021, 2:14pm

Run qiime tools import --help. You'll see that the --input-path parameter takes "Path to file or directory that should be imported." So I was suggesting that you set this parameter as the name of the directory containing your files.

Nisha · June 8, 2021, 2:57pm

Oh... I tried this and got an error again:
"Missing one or more files for DNA sequences directory format: 'dna-sequences.fasta.'"

wburgess · June 8, 2021, 2:58pm

Then I don't know how to help; sorry. Maybe someone else on here will know more.

Nisha · June 8, 2021, 3:58pm

If I specify "--input-format DNAFASTAFormat", it only accepts a single file as input, not a directory.

SoilRotifer · June 9, 2021, 3:23pm

Hi @Nisha, given:

~~I think you'll find importing via a file manifest to be quite helpful here.~~

EDIT: Oops, I misread this. You have FASTA and not FASTQ.
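
For FASTA input, one thing that may work: as far as I know, the "Sequences without quality information (i.e. FASTA)" import expects a single FASTA file in which every header has the form `<sample-id>_<seq-id>` and every sequence sits on exactly one line. Here is a minimal sketch that merges per-sample files into that layout. The function name `merge_fasta` is my own, and I'm assuming each sample ID can be taken from the file name (sample1.fna becomes sample1); nobody in this thread has confirmed this approach.

```shell
#!/usr/bin/env bash
# merge_fasta OUT FILE...
# Hypothetical helper: concatenates per-sample FASTA files into OUT,
# renaming each header to >SAMPLE_N (N counts records within that file)
# and joining wrapped sequences onto a single line.
merge_fasta() {
    local out="$1"; shift
    : > "$out"                       # start with an empty output file
    local f sample
    for f in "$@"; do
        sample="$(basename "$f")"
        sample="${sample%.*}"        # assumption: sample1.fna -> sample1
        awk -v s="$sample" '
            /^>/ { if (n) printf "\n"; n++; printf ">%s_%d\n", s, n; next }
                 { gsub(/[ \t\r]/, ""); printf "%s", $0 }
            END  { if (n) printf "\n" }
        ' "$f" >> "$out"
    done
}
```

You could call it as `merge_fasta ../seqs.fna *.fna` (write the output outside the input directory so it is not picked up by the glob on a rerun), then pass the merged seqs.fna to the `qiime tools import` command quoted earlier in this thread.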

Nisha · June 16, 2021, 7:00am

Hi again!
I have solved this problem.

I just have one more question: how do we filter our data if it is in FASTA format? Let's say I want to remove chimeric sequences from my data.

Nisha

SoilRotifer · June 21, 2021, 1:35pm

Hi @Nisha,

Would you mind posting your solution, so that others on the forum can benefit?

As for filtering, there are a variety of ways to do this, as outlined in the filtering documentation, specifically the part about filtering sequences.

For example, if you wanted to remove plastid sequences from your feature table, you could do:

qiime taxa filter-table \
    --i-table ./table.qza \
    --i-taxonomy ./taxonomy.qza \
    --p-mode 'contains' \
    --p-include 'p__' \
    --p-exclude 'p__;,Eukaryota,Chloroplast,Mitochondria,Unassigned,Unclassified' \
    --o-filtered-table ./table-no-ecmu.qza

More details can be found here, and here.

If you want to check for and remove chimeras from your data, you can use vsearch. Check out this chimera removal tutorial. If you are also performing OTU clustering, you can look through the clustering sequences tutorial too.
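
Roughly, the de novo chimera workflow from that tutorial can be sketched as below. I'm assuming you start from an imported SampleData[Sequences] artifact, so the sketch dereplicates first to get the feature table and representative sequences that `uchime-denovo` needs. The wrapper name `remove_chimeras` and all output file names are placeholders of mine, so please check the flags against `--help` for your QIIME 2 version.

```shell
#!/usr/bin/env bash
# remove_chimeras SEQS_QZA UCHIME_OUT_DIR
# Hypothetical wrapper around the q2-vsearch steps; file names are placeholders.
remove_chimeras() {
    local seqs="$1"   # SampleData[Sequences] artifact, e.g. seqs.qza
    local out="$2"    # output directory for uchime results

    # Dereplicate to obtain a feature table and representative sequences.
    qiime vsearch dereplicate-sequences \
        --i-sequences "$seqs" \
        --o-dereplicated-table table.qza \
        --o-dereplicated-sequences rep-seqs.qza || return 1

    # Flag chimeras de novo; writes nonchimeras.qza (among others) to $out.
    qiime vsearch uchime-denovo \
        --i-table table.qza \
        --i-sequences rep-seqs.qza \
        --output-dir "$out" || return 1

    # Keep only the features listed as non-chimeric, in the table ...
    qiime feature-table filter-features \
        --i-table table.qza \
        --m-metadata-file "$out/nonchimeras.qza" \
        --o-filtered-table table-nonchimeric.qza || return 1

    # ... and in the representative sequences.
    qiime feature-table filter-seqs \
        --i-data rep-seqs.qza \
        --m-metadata-file "$out/nonchimeras.qza" \
        --o-filtered-data rep-seqs-nonchimeric.qza
}
```

Calling `remove_chimeras seqs.qza uchime-dn-out` would run the four steps in order and stop at the first one that fails.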

-Cheers!
