denoising sequences without dereplicating

Hello,

I was recently able to denoise my sequences using dada2’s denoise paired-end function, and I got the three expected output files. I am now trying to figure out whether there is a way to undo the dereplication process and get the denoised fastq sequences for each sample. The reason is that I want to export the sequences in fasta format for another analysis that I have used in the past. Forgive my ignorance, but is there an easy way to do this from the output files dada2 provides? I have checked the forums and haven’t found much.

If this is not possible, are there other ways of denoising data? I have noticed you can remove chimeric sequences using different commands, so is there a way to do the denoising step without the dereplication step and keep track of which sample each sequence comes from? Any help is appreciated!

Thank you,
John

Hey @John_Kincaid,

From my (very basic!) understanding of DADA2: you

  • input fastq files
  • build error profiles of your dataset
  • correct errors based on these profiles
  • dereplicate the sequences
  • output a fasta file
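In case it helps to pin down what "dereplicate" means in that list, here is a toy Python sketch. It is not DADA2's implementation, only an illustration of the idea: identical reads are collapsed into one unique sequence plus an abundance count. The function name and the reads are made up for the example.

```python
from collections import Counter

def dereplicate(reads):
    """Collapse identical reads into (sequence, abundance) pairs.

    Illustrative only: real tools also track quality scores and
    which sample each read came from.
    """
    counts = Counter(reads)
    # most_common() sorts by abundance, highest first
    return counts.most_common()

# Hypothetical reads from a single sample
reads = ["ACGT", "ACGT", "TTGA", "ACGT", "TTGA", "GGCC"]
uniques = dereplicate(reads)
for sequence, abundance in uniques:
    print(sequence, abundance)
```

Six reads become three unique sequences, and the counts are what lets you go back the other way later.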

When you’re asking to “undo” the dereplication process, it sounds like what you want is the data as it might be envisioned in a middle part of that workflow - where you have built the error profiles and applied the corrections on a per-sequence level in the fastq space, but haven’t yet dereplicated. Is that right?

I don’t know if QIIME’s installation supports anything of this sort, but you can definitely see summaries of how many sequences were corrected or filtered in the output written by the --o-denoising-stats parameter. That output makes me think this information is certainly tracked somewhere along the pipeline, but I suspect you’ll need to wade into a standalone installation of DADA2 to break it down into the component parts you need. Maybe something along the lines of the Track reads through the pipeline section of the DADA2 tutorial is what you’re after?

If possible, can you elaborate on what you’re trying to do with the other analysis? Do you need fastq-corrected data because you need read counts? Can’t you get that data from the frequency table that QIIME outputs with --o-table?
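If per-sample read counts are what you need, one way to "undo" the dereplication is to repeat each representative sequence as many times as the table says it occurs in each sample. Below is a rough Python sketch of that idea, not an official QIIME 2 or DADA2 feature. It assumes you have already exported the table and representative sequences (for example with qiime tools export) and loaded them into plain dictionaries; the function name, the header format, and the toy data are all made up for illustration. Quality scores are not recoverable this way, so the result is fasta rather than fastq.

```python
def expand_to_fasta(table, sequences):
    """Re-expand a dereplicated table into per-sample FASTA text.

    table:     {sample_id: {feature_id: count}}   (assumed layout)
    sequences: {feature_id: denoised_sequence}    (assumed layout)
    Returns    {sample_id: fasta_text}
    """
    fasta_by_sample = {}
    for sample_id, counts in table.items():
        lines = []
        read_number = 0
        for feature_id, count in counts.items():
            sequence = sequences[feature_id]
            # Write one record per read assigned to this feature
            for _ in range(int(count)):
                read_number += 1
                # Hypothetical header format; adapt to your downstream tool
                lines.append(f">{sample_id}_{read_number} feature={feature_id}")
                lines.append(sequence)
        fasta_by_sample[sample_id] = "\n".join(lines) + ("\n" if lines else "")
    return fasta_by_sample

# Hypothetical toy data standing in for the exported table and sequences
table = {
    "sampleA": {"asv1": 2, "asv2": 1},
    "sampleB": {"asv1": 0, "asv2": 3},
}
sequences = {"asv1": "ACGTACGT", "asv2": "TTGACCAA"}

fasta = expand_to_fasta(table, sequences)
print(fasta["sampleA"])
```

Every read assigned to a feature comes out as that feature's denoised sequence, so the expanded file has the same per-sample counts as the table but none of the original read-level variation.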

The other denoising program available through QIIME that I’ve used is deblur, though in my own testing I’ve found DADA2 to perform a bit better (:stop_sign: shameless plug alert!:warning: ).

Good luck
