Do I need to remove chimeras with qiime vsearch uchime-denovo after denoising with DADA2?

Tintin · June 9, 2020, 2:07pm

Hello everybody,

I have used DADA2 for the denoising step. As far as I know, DADA2 also removes chimeras. Is it still necessary to remove chimeras with qiime vsearch uchime-denovo after denoising with DADA2?

Thank you in advance,

Kind regards,

Tinh

ChrisKeefe · June 9, 2020, 10:52pm

@Tintin, I've moved this post to General Discussion, as it's "a general question about microbiome science, bioinformatics, or other" rather than "a question about specific results".

I'm not even close to being a bioinformatics expert, but I can tell you that the couple of analyses I have been involved with, and the tutorials that use DADA2, do not perform additional chimera removal afterwards. If you'd like to learn more about the isBimeraDenovo algorithm DADA2 uses, search for "Chimeras" in the DADA2 paper preprint.
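For intuition only, here is a toy Python sketch of what a "bimera" (a chimera with two parents) looks like. This is not DADA2's isBimeraDenovo implementation; it assumes equal-length, already-aligned sequences, and the function name and sequences are made up for illustration.

```python
def is_exact_bimera(seq, parents):
    """Toy check: is seq the left part of one parent joined to the right part of another?

    Assumes all sequences have the same length and are already aligned.
    """
    if seq in parents:
        return False  # identical to a parent, so not a chimera
    for left in parents:
        for right in parents:
            if left == right:
                continue
            # Try every internal breakpoint between the two parents
            for bp in range(1, len(seq)):
                if seq[:bp] == left[:bp] and seq[bp:] == right[bp:]:
                    return True
    return False


# Hypothetical sequences, for illustration only
parent_a = "ACGTACGTAC"
parent_b = "TTGCATGCAA"
chimera = parent_a[:4] + parent_b[4:]  # left of A joined to right of B
variant = "ACGTACGGAC"                 # parent_a with one substitution

print(is_exact_bimera(chimera, [parent_a, parent_b]))  # True
print(is_exact_bimera(variant, [parent_a, parent_b]))  # False
```

The single-substitution variant is not flagged, which matches the general idea: a chimera check looks for sequences explainable as a join of other sequences, not for sequences that merely resemble one parent.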

If you'd like more specific information on anything relevant to this topic, feel free to reply here - there are lots of smarter people than I around the forum who may be able to help out.

Best,
Chris

Tintin · June 10, 2020, 2:10pm

@ChrisKeefe Thank you so much for your answer.

Indeed, I have one more question to ask you. In my research, I would like to obtain OTUs (97% similarity) and then do taxonomy classification to obtain the community composition. My workflow is:

1. Demultiplex
2. Denoise with DADA2
3. Remove chimeras with qiime vsearch uchime-denovo
4. Cluster into OTUs using q2-vsearch
5. Classify taxonomy

Recently, I read the QIIME 2 workflow overview and realized that denoising with DADA2 generates ASVs. So I am wondering: if I want OTUs, is it necessary to denoise with DADA2 at all? If I do not denoise with DADA2, do I need to perform other steps to remove and/or correct noisy reads and chimeras? As you mentioned, if I use DADA2, I do not need a separate chimera removal step. However, if I do not run DADA2, I still have to remove chimeras with vsearch uchime-denovo, is that right?

Thank you in advance.

Tintin

ChrisKeefe · June 10, 2020, 7:22pm

Again, @Tintin, I may not be the best-qualified person to answer this question, so I'm going to lean on resources that might help. Please forgive me if any of this is unhelpful to you.

It is not necessary to denoise. See the OTU Clustering Tutorial for a how-to, and note that

"these files are analogous to those generated by qiime dada2 denoise-* and qiime deblur, except that no denoising, chimera removal, or other quality control has been applied in the dereplication process."

Taking this approach probably means manually applying your own QC, as you've suggested. By correcting rather than dropping or ignoring noisy reads, DADA2 provides the added benefit of reducing the number of false positives, often apparently yielding fewer ASVs than you would have gotten OTUs from clustering methods.

What you choose for your analysis should fit your study needs, but as a side note, here's an amazing, brief look at the history of taxonomic assignment and clustering, with additional good posts linked in it. Depending on your specific work, you may be able to simplify your pipeline and preserve more data without clustering to 97%. There are definitely cases where clustering/OTU picking avoids pitfalls inherent to ASV/denoising methods, but it's worth a read if you haven't considered this approach.

Best,
Chris

Tintin · June 11, 2020, 2:24pm

@ChrisKeefe

Thank you for sharing ideas. Indeed, it is helpful.
I have recently read the tutorial "Clustering sequences into OTUs using q2-vsearch". If I understand it correctly, it says that clustering is possible after denoising with DADA2 or Deblur. See the passage below:

Clustering of sequences or features into OTUs using vsearch is currently possible from demultiplexed, quality-controlled sequence data (i.e., a SampleData[Sequences] artifact), or from dereplicated, quality-controlled data in feature table and feature representative sequences (i.e., the FeatureTable[Frequency] and FeatureData[Sequence] artifacts, which could be generated using the qiime dada2 denoise-* or qiime deblur denoise-* commands).

I agree with you that the ASVs (99%) generated by DADA2 are fewer than the OTUs (97%) from clustering methods. Indeed, 97% is a common similarity threshold because studies showed that most strains had 97% 16S rRNA sequence similarity. However, there are many criticisms of using percent sequence similarity to define OTUs. I am wondering: if we use DADA2, can we generate ASVs with 97% similarity?

Best,

Tintin

ChrisKeefe · June 12, 2020, 4:05pm

@Tintin, I think I've done all I can here. If you're asking whether it's OK to denoise then cluster, I think the tutorial points pretty clearly toward "Yes." If you're asking a deeper bioinformatics question, unfortunately it's beyond my expertise to answer, or even understand.
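To picture what "denoise, then cluster" does to a feature table, here is a toy Python sketch. It is an illustration under loud assumptions, not how q2-vsearch is implemented: real clustering aligns sequences, whereas this compares equal-length sequences position by position. The function names, sequences, and counts are all hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("toy example expects equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_features(counts, threshold=0.97):
    """Greedy centroid clustering, visiting the most abundant features first.

    counts maps sequence -> frequency (a one-sample stand-in for a feature table).
    Returns a dict mapping centroid sequence -> summed frequency.
    """
    clustered = {}
    for seq, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        for centroid in clustered:
            if identity(seq, centroid) >= threshold:
                clustered[centroid] += n  # fold this feature into the centroid
                break
        else:
            clustered[seq] = n  # no close centroid found: start a new OTU
    return clustered


# Made-up denoised features (ASV-like), 100 nt each
base = "ACGT" * 25
near = "T" + base[1:]                 # one substitution relative to base
divergent = "GGGGGGGGGG" + base[10:]  # several substitutions relative to base

counts = {base: 100, near: 20, divergent: 5}
otus = cluster_features(counts, threshold=0.97)
print(len(counts), "features in,", len(otus), "OTUs out")
```

In this toy input, the one-substitution feature is absorbed into the more abundant centroid and its counts are summed, while the divergent feature stays separate. That is the information loss (and simplification) clustering brings on top of denoising.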

As a side note, I'm a little concerned that you may have some confusion about what an ASV is:

In my understanding, these are not 99% OTUs, but rather exact sequences, with differences resolved to the single-nucleotide level. There is no clustering: any reduction in counts is not a byproduct of clustering (with its associated loss of information), but a result of correcting "noise", that is, artificial variance in the sequences introduced during prep, sequencing, or elsewhere. Ben Callahan treats the concepts well and in greater detail in this preprint.
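A bit of arithmetic may help show why an ASV is not just an OTU at a higher percent threshold. The sketch below assumes a 250-nt amplicon purely for illustration; the variable names are made up.

```python
amplicon_length = 250  # assumed length, for illustration only

# Two ASVs can differ by a single nucleotide and still be kept apart
one_mismatch_identity = (amplicon_length - 1) / amplicon_length

# Mismatches a 97% or 99% identity threshold would tolerate (integer arithmetic)
max_mismatches_at_97 = amplicon_length * (100 - 97) // 100
max_mismatches_at_99 = amplicon_length * (100 - 99) // 100

print(f"1 mismatch in {amplicon_length} nt = {one_mismatch_identity:.1%} identity")
print(f"97% clustering tolerates up to {max_mismatches_at_97} mismatches")
print(f"99% clustering tolerates up to {max_mismatches_at_99} mismatches")
```

Under that assumption, two sequences that are 99.6% identical would fall into the same OTU at either 97% or 99%, yet they remain two distinct ASVs. So there is no percent threshold attached to an ASV.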

Good luck!
Chris

Tintin · June 13, 2020, 3:25pm

@ChrisKeefe I really appreciate your replies. I will go through the recommended paper.

Have a nice weekend,

Tintin
