Running DADA2 simultaneously on unmerged and already-merged files


I would like to do a meta-analysis using 16S amplicon sequencing data of enterobacteria in rare diseases deposited in SRA. I am trying to solve some of the problems in the meta-analysis by using q2-fragment and so on, but I still have some issues, so I would like to ask a few questions.

Of the 5 datasets that I was able to download, 3 contain reads that have not been merged yet, and the other 2 appear to have already been merged by other methods (one by PANDAseq, the other not explicitly stated). Is there any way to run DADA2 on all of these files at the same time?

I believe I searched thoroughly, but I apologize if this has already been discussed.

Hi @shibataryohei,
Welcome to the forum!
There are a couple of different considerations here, but the short answer is no: there is no way for these files to be denoised with DADA2 at the same time, and that is a good thing!
DADA2 builds an error model that is specific to each sequencing run, so you should denoise each run separately and only then merge the results for downstream analysis.
There are many posts on the forum about how to combine data from different runs, and I recommend reviewing them. There are some very important considerations regarding the choice of target region, trimming/truncation parameters, batch effects, etc., but those are all well beyond the scope of this post.
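To make the "denoise each run separately, then merge" workflow concrete, here is a minimal sketch in Python that only builds the command lines rather than running them. It assumes one directory per sequencing run containing a paired-end `demux.qza`. The directory names, output file names, and truncation lengths are hypothetical placeholders, not recommendations, and the truncation values should be chosen per run from the quality plots.

```python
from __future__ import annotations

from pathlib import Path


def denoise_command(run_dir: Path, trunc_len_f: int, trunc_len_r: int) -> list[str]:
    """Build a `qiime dada2 denoise-paired` call for ONE sequencing run."""
    return [
        "qiime", "dada2", "denoise-paired",
        "--i-demultiplexed-seqs", str(run_dir / "demux.qza"),  # assumed file name
        "--p-trunc-len-f", str(trunc_len_f),
        "--p-trunc-len-r", str(trunc_len_r),
        "--o-table", str(run_dir / "table.qza"),
        "--o-representative-sequences", str(run_dir / "rep-seqs.qza"),
        "--o-denoising-stats", str(run_dir / "stats.qza"),
    ]


def merge_commands(run_dirs: list[Path], out_dir: Path) -> list[list[str]]:
    """Build the calls that combine per-run tables and sequences afterwards."""
    merge_tables = ["qiime", "feature-table", "merge"]
    merge_seqs = ["qiime", "feature-table", "merge-seqs"]
    for run_dir in run_dirs:
        merge_tables += ["--i-tables", str(run_dir / "table.qza")]
        merge_seqs += ["--i-data", str(run_dir / "rep-seqs.qza")]
    merge_tables += ["--o-merged-table", str(out_dir / "merged-table.qza")]
    merge_seqs += ["--o-merged-data", str(out_dir / "merged-rep-seqs.qza")]
    return [merge_tables, merge_seqs]


if __name__ == "__main__":
    # Hypothetical runs with placeholder truncation lengths.
    runs = {Path("run1"): (240, 200), Path("run2"): (250, 180)}
    for run_dir, (fwd, rev) in runs.items():
        print(" ".join(denoise_command(run_dir, fwd, rev)))
    for cmd in merge_commands(list(runs), Path("merged")):
        print(" ".join(cmd))
```

Each printed line can be pasted into a shell or passed to `subprocess.run`. The point of the layout is that every run gets its own DADA2 call, and therefore its own error model, before anything is merged.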

Another consideration for your case is that some of the reads are pre-merged. In those cases, have they been processed in some other way? For example, have they gone through quality control, chimera removal, etc.? Do they still have their quality scores (i.e., are these FASTA or FASTQ files)? DADA2 is designed to work on FASTQ files, since it requires the quality scores to build its error model. The QIIME 2 implementation of DADA2 also does merging, filtering, and chimera removal on top of its denoising, all in one pipeline. In other words, you certainly shouldn't feed it pre-merged FASTQ files.
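If you are unsure whether a downloaded file still carries quality scores, a quick check of the first record is usually enough. The sketch below is a plain-Python heuristic, not part of QIIME 2 or DADA2, and the function name `sniff_format` is made up for illustration. It looks only at the first record, so it will not catch files that are malformed further down.

```python
import gzip
from pathlib import Path


def sniff_format(path):
    """Return 'fastq', 'fasta', or 'unknown' based on the first record."""
    path = Path(path)
    # Transparently handle gzip-compressed downloads such as *.fastq.gz.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.strip():
                continue  # skip leading blank lines
            if line.startswith(">"):
                return "fasta"  # no quality scores available
            if line.startswith("@"):
                # A FASTQ record is: header, sequence, '+' separator, qualities.
                next(handle, "")  # sequence line
                separator = next(handle, "")
                return "fastq" if separator.startswith("+") else "unknown"
            return "unknown"
    return "unknown"
```

For example, `sniff_format("sample_1.fastq.gz")` (a hypothetical file name) would report whether that file can be used with DADA2 at all.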

Your best bet is to try to find the raw FASTQ files for all datasets if you want to use DADA2. If you only have access to unprocessed FASTA files, you can still denoise them using the stand-alone Deblur tool, but not its QIIME 2 implementation, as that still expects FASTQ files.

Hi @Mehrbod_Estaki

I am very sorry for the late reply. I sincerely appreciate your detailed reply.

I understand what you have said, and it has helped me to understand more about DADA2. Since some of the studies have published only FASTA files, I will consider either excluding those datasets or using the stand-alone Deblur tool.

Again, thank you very much for your kindness!
