How to import FASTA files that have undergone vsearch-based QC

Dear Qiime,

My initial fastq.gz samples underwent host contamination removal and read merging within QIIME 2.
I found that chimera removal within the QIIME 2 framework took too long, so I exported my samples to perform chimera removal with vsearch directly.

Now I want to bring my FASTA files back into QIIME 2 to begin taxonomic analysis, but I am struggling with the syntax of the import command.

My samples (only three for now, as a pilot) are all in the same directory:

(qiime2-amplicon-2024.2) root@b1c1dff011df:/Home/Data_vsearch/test# pwd
/Home/Data_vsearch/test
(qiime2-amplicon-2024.2) root@b1c1dff011df:/Home/Data_vsearch/test# ls
A11d1JCon4A_0_L001_R1_001_nonchimera.fasta  B11d2JCon1B_2_L001_R1_001_nonchimera.fasta
B11d1CCon1B_1_L001_R1_001_nonchimera.fasta

I believe these FASTA files meet QIIME 2's DNAFASTAFormat requirements, since each record takes two lines (a header line and a sequence line) and each sequence is on a single line (see import demultiplexed fasta files into Qiime2 - #4 by colinvwood):

(qiime2-amplicon-2024.2) root@b1c1dff011df:/Home/Data_vsearch/test# head A11d1JCon4A_0_L001_R1_001_nonchimera.fasta
>A00707:180:HCLMHDSX7:2:1101:10303:26663;size=1
AATTAGAGTTAACAATAATCGGCAGCACCTCTGGTGTCAGGCCAACAGCCGCAGCTAAAGCAAAAATTAAGCTTTCTCCCCAATCGCCTTTAGTCAAGCCATTAATGACAAACAGTAGTGGGATGATAATTGC
>A00707:180:HCLMHDSX7:2:1101:10700:17018;size=1
ACTTATGGACGTCGGATCCTTCAAAGCAAGGT
>A00707:180:HCLMHDSX7:2:1101:10737:31422;size=1
ACTATTTATTACGCaaaaaagtgcaaatttttttcagaaatttaaaaatttagacacgaaaaaaGCCGATGCAAATGCATCGAC
>A00707:180:HCLMHDSX7:2:1101:10782:29684;size=1
ACATGAAAGAGATTACAAAAACAGTTATGATTGCTACTCATGATATGCAGCTGGTCTGCCAGTGGGCGGACAGGATCCTTGTCTTGTGCCAGGGAAAGATT
>A00707:180:HCLMHDSX7:2:1101:1081:16266;size=1
GAATATAGGGAGAGATTATCCTTTCCGCTTAAAAATGGGTAAATTGCAGGATTTTCGATCAAGGCCCCAACATTTTGTAGAGCCTTGTGATTATTGGCAGTAATGGGCTGATTGTTAAAAG
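Because `head` only shows the first few records, a whole-file check of the two conditions above may be worth running. A minimal sketch in AWK, assuming the strict two-line layout (odd lines are `>` headers, even lines are single-line sequences); it also flags lowercase bases like those in the third record above. The function name `check_fasta_layout` is hypothetical, and this is a quick sanity check, not QIIME 2's own validator.

```shell
# Hypothetical checker for the two-line FASTA layout:
# odd lines must be '>' headers, even lines single-line uppercase sequences.
# Prints each problem it finds; exit status is 0 if none, 1 otherwise.
check_fasta_layout() {
  awk '
    NR % 2 == 1 && !/^>/   { printf "line %d: expected a header\n", NR; bad = 1 }
    NR % 2 == 0 && /^>/    { printf "line %d: expected a sequence\n", NR; bad = 1 }
    NR % 2 == 0 && /[a-z]/ { printf "line %d: lowercase bases\n", NR; bad = 1 }
    END {
      if (NR % 2 == 1) { print "last record has no sequence line"; bad = 1 }
      exit bad
    }
  ' "$1"
}
```

For example, `check_fasta_layout A11d1JCon4A_0_L001_R1_001_nonchimera.fasta` prints the offending line numbers, if any, and returns a non-zero exit status when it finds a problem.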

However, the command below keeps failing with this error:

qiime tools import \
  --input-path /Home/Data_vsearch/test \
  --output-path /Home/Data_vsearch/sequences.qza \
  --type 'FeatureData[Sequence]' \
  --input-format DNASequencesDirectoryFormat
There was a problem importing /Home/Data_vsearch/test/:

  Missing one or more files for DNASequencesDirectoryFormat: 'dna-sequences.fasta'

Based on other posts (Import data problem), the fix for this kind of error is sometimes just the syntax, and I am hoping that is the case here. However, even after going through the tutorial I cannot find the solution. Can anyone help, please?
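The error message names the file it expects: `DNASequencesDirectoryFormat` is a directory format holding a single file called `dna-sequences.fasta`, so a folder containing three differently named FASTA files does not match it. A minimal sketch of one workaround, assuming one `FeatureData[Sequence]` artifact per sample is what you want, is to copy each file into its own folder under that name. The helper name `stage_for_import` and the `import_dirs/` layout are hypothetical.

```shell
# Hypothetical helper: place one FASTA file into the layout that
# DNASequencesDirectoryFormat expects (a folder containing dna-sequences.fasta).
stage_for_import() {
  local fasta="$1" outdir="$2"
  mkdir -p "$outdir" || return
  cp "$fasta" "$outdir/dna-sequences.fasta"
}
```

For example, `stage_for_import A11d1JCon4A_0_L001_R1_001_nonchimera.fasta import_dirs/A11d1JCon4A`, then rerun the import command above with `--input-path import_dirs/A11d1JCon4A`.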

Note: I converted the lowercase bases in the sequences above to uppercase with AWK, and I can import the FASTA files individually:

(qiime2-amplicon-2024.2) root@b1c1dff011df:/Home/Data_vsearch/test# qiime tools import --input-path A11d1JCon4A_0_L001_R1_001_nonchimera.fasta --output-path sequences1.qza --type 'FeatureData[Sequence]'
qiime tools import --input-path B11d1CCon1B_1_L001_R1_001_nonchimera.fasta --output-path sequences2.qza --type 'FeatureData[Sequence]'
qiime tools import --input-path B11d2JCon1B_2_L001_R1_001_nonchimera.fasta --output-path sequences3.qza --type 'FeatureData[Sequence]'
Imported A11d1JCon4A_0_L001_R1_001_nonchimera.fasta as DNASequencesDirectoryFormat to sequences1.qza
Imported B11d1CCon1B_1_L001_R1_001_nonchimera.fasta as DNASequencesDirectoryFormat to sequences2.qza
Imported B11d2JCon1B_2_L001_R1_001_nonchimera.fasta as DNASequencesDirectoryFormat to sequences3.qza
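The AWK command itself is not shown above, so here is a minimal sketch of one way to do it, assuming the two-line layout from the `head` output: header lines are printed unchanged (so the `;size=` annotations are kept) and every other line is uppercased. It is paired with a loop around the single-file import that worked above, so additional samples are picked up without retyping the command. The function names and the `_upper.fasta` / `<sample>.qza` naming are hypothetical.

```shell
# Hypothetical helper: uppercase sequence lines; '>' header lines are copied unchanged.
uppercase_fasta() {
  awk '/^>/ { print; next } { print toupper($0) }' "$1" > "$2"
}

# Hypothetical loop around the single-file import that worked above:
# uppercase each *_nonchimera.fasta into a new file, then import that file.
import_each_fasta() {
  local f sample
  for f in *_nonchimera.fasta; do
    [ -e "$f" ] || continue           # glob matched nothing: skip the literal pattern
    sample="${f%_nonchimera.fasta}"   # e.g. A11d1JCon4A_0_L001_R1_001
    uppercase_fasta "$f" "${sample}_upper.fasta" || return
    qiime tools import \
      --input-path "${sample}_upper.fasta" \
      --output-path "${sample}.qza" \
      --type 'FeatureData[Sequence]' || return
  done
}
```

Run `import_each_fasta` from `/Home/Data_vsearch/test`. Keep in mind that a `FeatureData[Sequence]` artifact stores sequences only, so which sample each one came from is recorded only in the output file name.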

Hi Krutik,

So, exporting data from QIIME 2 artifacts, processing it, and then importing it back into QIIME 2 is possible.

However, this round trip is extremely error-prone and breaks provenance.

Let's see if we can get this working.

QIIME 2 also supports vsearch chimera checking through the q2-vsearch plugin, so let's do this same step on-platform. I'm guessing you ran dereplication followed by uchime-denovo? All the q2-vsearch methods are listed here:
https://docs.qiime2.org/2024.5/plugins/available/vsearch/#methods
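A minimal sketch of what that on-platform route could look like, assuming the merged reads are still available as an artifact of a type that `dereplicate-sequences` accepts (the name `merged.qza` is hypothetical) and that default parameters are acceptable. The wrapper function name and the output file names are also hypothetical; please check the options against `qiime vsearch dereplicate-sequences --help` and `qiime vsearch uchime-denovo --help` for your release.

```shell
# Hypothetical wrapper; $1 is the merged-reads artifact (e.g. merged.qza).
chimera_check_on_platform() {
  local merged="$1"
  # 1. Dereplicate: a per-sample feature table plus the unique sequences
  qiime vsearch dereplicate-sequences \
    --i-sequences "$merged" \
    --o-dereplicated-table table.qza \
    --o-dereplicated-sequences rep-seqs.qza || return
  # 2. De novo chimera detection; writes chimeras.qza, nonchimeras.qza
  #    and stats.qza into uchime-dn-out/
  qiime vsearch uchime-denovo \
    --i-table table.qza \
    --i-sequences rep-seqs.qza \
    --output-dir uchime-dn-out || return
  # 3. Keep only the features listed in nonchimeras.qza in the table...
  qiime feature-table filter-features \
    --i-table table.qza \
    --m-metadata-file uchime-dn-out/nonchimeras.qza \
    --o-filtered-table table-nonchimeric.qza || return
  # 4. ...and in the representative sequences used for taxonomy
  qiime feature-table filter-seqs \
    --i-data rep-seqs.qza \
    --m-metadata-file uchime-dn-out/nonchimeras.qza \
    --o-filtered-data rep-seqs-nonchimeric.qza
}
```

Usage would be `chimera_check_on_platform merged.qza`. Because every step stays inside QIIME 2, the resulting `table-nonchimeric.qza` and `rep-seqs-nonchimeric.qza` keep their sample labels and full provenance.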

Let us know what you would like to try next!