From DADA2 to dereplication; error: not a(n) QIIME1DemuxFormat file

I processed my data with DADA2 in QIIME 2, and I would like to follow that with dereplication and clustering steps in vsearch. To dereplicate, it looks like the sequence data has to be in the ‘SampleData[Sequences]’ format.
I didn’t find a way to go from the DADA2 output format of FeatureData[Sequence] to SampleData[Sequences].
So instead, I exported the DADA2 FeatureData[Sequence] file and tried to import it as ‘SampleData[Sequences]’, which gave the error below. Is there a way to get the QIIME 2 DADA2 output into the qiime vsearch dereplicate-sequences action? Thanks!

!qiime tools export \
  --input-path rep_merge.qza \
  --output-path exported-DADA2_seqs
!qiime tools import \
  --input-path exported-DADA2_seqs/dna-sequences.fasta \
  --output-path exported-DADA2_seqs/seqs.qza \
  --type 'SampleData[Sequences]'  

There was a problem importing exported-DADA2_240_seqs/dna-sequences.fasta:
exported-DADA2_240_seqs/dna-sequences.fasta is not a(n) QIIME1DemuxFormat file
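The import fails because the file has the wrong layout for that type, not because it is damaged. As far as I understand it, ‘SampleData[Sequences]’ is imported from a QIIME 1-style demultiplexed FASTA, where every read header starts with <sample-id>_<read-number>. The exported DADA2 file instead has one record per feature, named by its feature ID (an MD5 hash of the sequence in q2-dada2 output). The sketch below is a rough check of which layout a header follows. The helper names, regexes, and example records are hypothetical, and this is not the validator q2-types actually runs.

```python
import re

# Hypothetical heuristics for illustration; NOT the validator q2-types runs.
# QIIME 1 demux headers look like ">sampleid_readnumber [optional comment]".
QIIME1_HEADER = re.compile(r"^>(?P<sample>\S+)_(?P<read>\d+)(\s.*)?$")
# q2-dada2 names features by a 32-character lowercase hex MD5 digest.
MD5_ID = re.compile(r"^>[0-9a-f]{32}$")


def header_kind(header: str) -> str:
    """Classify one FASTA header line as 'feature-id', 'qiime1-demux' or 'other'."""
    header = header.strip()
    if MD5_ID.match(header):
        return "feature-id"
    if QIIME1_HEADER.match(header):
        return "qiime1-demux"
    return "other"


def headers(fasta_text: str) -> list:
    """Return only the header lines of a FASTA string."""
    return [line for line in fasta_text.splitlines() if line.startswith(">")]


# Made-up example records in each layout.
exported = ">4b5eeb300368260019c1fbc7a3c718fc\nTACGGAGGGTGCAAGCGTT\n"
demux = ">sampleA_1 orig_bc=ACGT\nTACGGAGGGTGCAAGCGTT\n"
print([header_kind(h) for h in headers(exported)])  # ['feature-id']
print([header_kind(h) for h in headers(demux)])  # ['qiime1-demux']
```

Run against the real dna-sequences.fasta, the headers would presumably come back as 'feature-id', which is consistent with the import being rejected.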

$ head dna-sequences.fasta


!qiime info
System versions
Python version: 3.6.12
QIIME 2 release: 2020.11
QIIME 2 version: 2020.11.1
q2cli version: 2020.11.1

Installed plugins
alignment: 2020.11.1
composition: 2020.11.1
cutadapt: 2020.11.1
dada2: 2020.11.1
deblur: 2020.11.1
demux: 2020.11.1
diversity: 2020.11.1
diversity-lib: 2020.11.1
emperor: 2020.11.1
feature-classifier: 2020.11.1
feature-table: 2020.11.1
fragment-insertion: 2020.11.1
gneiss: 2020.11.1
longitudinal: 2020.11.1
metadata: 2020.11.1
phylogeny: 2020.11.1
quality-control: 2020.11.1
quality-filter: 2020.11.1
sample-classifier: 2020.11.1
taxa: 2020.11.1
types: 2020.11.1
vsearch: 2020.11.1

Application config directory

Hello @srosales712.
Welcome to the QIIME2 community :qiime2:
It's so complicated, and not working, because you are adding an extra step to the pipeline!
Your data is already dereplicated, so you don't need to dereplicate it again! :smile:
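To see why another dereplication pass changes nothing: DADA2 output is a feature table (counts of each ASV per sample) plus one representative sequence per ASV, and that pair is exactly what dereplication produces. The toy sketch below, with made-up sample names and sequences, expands such a table back into individual reads and then dereplicates them by exact sequence match. It recovers the table it started from. It illustrates the idea only and is not what q2-vsearch does internally.

```python
import hashlib
from collections import Counter


def feature_id(seq: str) -> str:
    """Name a feature by the MD5 hex digest of its sequence."""
    return hashlib.md5(seq.encode("utf-8")).hexdigest()


def expand(table: dict, rep_seqs: dict) -> dict:
    """Turn {sample: {feature_id: count}} into {sample: [read, read, ...]}."""
    return {
        sample: [rep_seqs[fid] for fid, n in counts.items() for _ in range(n)]
        for sample, counts in table.items()
    }


def dereplicate(reads_by_sample: dict):
    """Collapse identical reads into (table, rep_seqs) keyed by MD5 feature ID."""
    table, rep_seqs = {}, {}
    for sample, reads in reads_by_sample.items():
        counts = Counter(reads)  # exact-match counting per sample
        table[sample] = {feature_id(s): n for s, n in counts.items()}
        rep_seqs.update({feature_id(s): s for s in counts})
    return table, rep_seqs


# Hypothetical DADA2-like output: two ASVs across two samples.
asv1, asv2 = "TACGGAGGGTGCAAGC", "TACGTAGGGGGCAAGC"
rep_seqs = {feature_id(asv1): asv1, feature_id(asv2): asv2}
table = {
    "sampleA": {feature_id(asv1): 3, feature_id(asv2): 1},
    "sampleB": {feature_id(asv2): 2},
}

new_table, new_rep_seqs = dereplicate(expand(table, rep_seqs))
print(new_table == table and new_rep_seqs == rep_seqs)  # True
```

Because every ASV sequence is already unique, an exact-match pass can only hand back the same features; merging similar but non-identical sequences is the job of clustering.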

At this point you are ready to feed your data into vsearch for clustering!
Luckily, the vsearch plugin accepts both SampleData[Sequences] and FeatureData[Sequence] (which you should have) :qiime2:. I would suggest this section of our OTU clustering tutorial, which walks step by step through clustering your data. It also has a lot of good information in general!
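For intuition about what de novo clustering (qiime vsearch cluster-features-de-novo, with its --p-perc-identity threshold) adds on top of exact dereplication, here is a toy greedy centroid clustering loosely modeled on abundance-sorted clustering: sequences are visited from most to least abundant, and each joins the first centroid it matches at or above the identity threshold, otherwise it starts a new cluster. To keep it short, identity is a plain per-position match fraction that only works for equal-length sequences; vsearch aligns sequences, so treat this as a simplified stand-in, not its algorithm. The helper names, sequences, and abundances are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions; toy metric, equal-length sequences only."""
    if len(a) != len(b):
        return 0.0  # a real tool would align instead of giving up
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(abundances: dict, threshold: float = 0.97) -> dict:
    """Greedy centroid clustering; returns {centroid: [member sequences]}."""
    clusters = {}
    # Most abundant sequences are visited first, so they tend to become centroids.
    for seq in sorted(abundances, key=abundances.get, reverse=True):
        for centroid in clusters:
            if identity(seq, centroid) >= threshold:
                clusters[centroid].append(seq)  # join the first matching centroid
                break
        else:
            clusters[seq] = [seq]  # no centroid close enough: start a new cluster
    return clusters


# Hypothetical sequences: seq_b differs from seq_a at 1 of 20 positions (95% identity).
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACGTACGTACGTACGTACGA"
seq_c = "TTTTTTTTTTTTTTTTTTTT"
abund = {seq_a: 50, seq_b: 3, seq_c: 10}

print(len(greedy_cluster(abund, threshold=0.90)))  # 2: seq_b joins seq_a
print(len(greedy_cluster(abund, threshold=0.97)))  # 3: seq_b stays separate
```

The threshold is the only knob here: lowering it lets near-identical ASVs collapse into the most abundant one, which is the effect exact dereplication cannot have.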
Hope this helps!
Chloe :turtle:

Hi @cherman2,

Thanks for the response! I think I should have provided more information about my data. I’m conducting a meta-analysis of studies that used V3-V4 primers and studies that used V4 primers. After running DADA2 on each study, I merged these datasets, and I would like to dereplicate the merged sequences. The idea is that some of the V4 sequences would cluster with the V3-V4 sequences (assuming it works similarly to CD-HIT). I’ve also run OTU clustering on the DADA2 output, but I wanted to see whether adding a dereplication step would reduce the number of ASVs and the batch effects.
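One caveat worth checking with this plan: dereplication merges only sequences that are identical over their full length. A V4 ASV is shorter than a V3-V4 ASV and at best sits inside it, so an exact-match pass (which, as I understand it, is what dereplicate-sequences does by default) would keep both. The toy sketch below contrasts exact matching with a simple containment test, which is closer to the CD-HIT-like behavior described above. The sequences are made-up toy strings, not real amplicons, and neither function implements vsearch or CD-HIT.

```python
def exact_dereplicate(seqs: list) -> list:
    """Keep one copy of each identical full-length sequence, preserving order."""
    return list(dict.fromkeys(seqs))


def collapse_contained(seqs: list) -> list:
    """Also drop any sequence that occurs inside a different, longer sequence."""
    unique = exact_dereplicate(seqs)
    return [s for s in unique if not any(s != t and s in t for t in unique)]


# Hypothetical ASVs: the "V4" string is the tail end of the longer "V3-V4" string.
v4 = "GTGCCAGCAGCCGCGGTAA"
v3v4 = "CCTACGGGAGGCAGCAG" + v4
other = "GTGCCAGCCGCCGCGGTAA"  # one mismatch relative to v4
seqs = [v3v4, v4, v4, other]

print(len(exact_dereplicate(seqs)))  # 3: v4 survives next to v3v4
print(len(collapse_contained(seqs)))  # 2: v4 is absorbed, other is not
```

Even the containment test keeps `other`, which differs from the V4 fragment by a single base; absorbing that kind of variant needs alignment-based clustering rather than another dereplication step.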

Hey @srosales712
Thanks for the additional information! :qiime2:
I personally don't have a lot of experience merging data from two different primer sets, but I looked through posts on the forum and found some interesting discussions that might be a better route than dereplicating again.

This post seems like it is doing something very similar to what you are doing

This one is a little older but has some interesting discussion (although I don't think it's as relevant).

Both mention and link to fragment insertion. Here is another link.

Let me know if this is helpful or if I am way off the mark.
Chloe :turtle:
