Statistics from de novo chimera removal

afrinaad · April 7, 2020, 9:35am

Hi everyone,

I am currently trying to compare DADA2 with the OTU clustering methods. As we know, chimera removal is already built into DADA2, whereas for the OTU clustering method we have to do it 'manually'.

Therefore, I would like to get chimera statistics similar to what DADA2 provides, like below:

dada2-stats.qzv (1.2 MB)

I ran the steps below, but I couldn't find any statistics or report from the de novo chimera removal.

identify chimeras de novo > filter the chimeras out of the de novo clustering table and seqs

Is there any way to get one? Or did I miss a step?

Thanks and be safe and healthy!

thermokarst · April 7, 2020, 2:31pm

Hi @afrinaad - have you seen the "Identifying and filtering chimeric feature sequences with q2-vsearch"?

https://docs.qiime2.org/2020.2/tutorials/chimera/

afrinaad · April 8, 2020, 1:46am

Hi @thermokarst - yes, I followed that one as well, but it doesn't produce statistics like the ones from DADA2. The stats.qza file from uchime-denovo does not summarize the non-chimeric reads for each sample. I would like to find out the total number of non-chimeric reads across all samples.

I am thinking of using the value from table-nonchimeric-wo-borderline.qza as the number of non-chimeric reads.
However, it is reported as a feature count and not a read count, so I don't think that is correct, is it?

thermokarst · April 8, 2020, 10:03pm

QIIME 2 doesn't have a way to map which features came from which reads (an "OTUMap"), so answering this question isn't possible. Is it necessary, or can you get a good sense of what you're looking for based on the features?
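To make the "feature count" versus "frequency" distinction concrete, here is a minimal sketch in plain Python. It is not QIIME 2 code: the table, the feature IDs, and the counts are all hypothetical, and it assumes the table stores a per-sample frequency for every feature. It does not trace individual reads; it only shows the two numbers a filtered feature table can give you.

```python
# Toy feature table: feature ID -> {sample ID -> frequency}.
# All IDs and counts are hypothetical, for illustration only.
table = {
    "otu1": {"S1": 120, "S2": 80},
    "otu2": {"S1": 30, "S2": 0},
    "otu3": {"S1": 5, "S2": 15},
}

# Hypothetical set of feature IDs kept after chimera checking.
nonchimeric_ids = {"otu1", "otu3"}


def filter_features(table, keep_ids):
    """Keep only the features whose IDs are in keep_ids."""
    return {fid: counts for fid, counts in table.items() if fid in keep_ids}


def feature_count(table):
    """Number of features (rows), regardless of their abundance."""
    return len(table)


def sample_frequencies(table):
    """Total frequency per sample, summed over all features."""
    totals = {}
    for counts in table.values():
        for sample, n in counts.items():
            totals[sample] = totals.get(sample, 0) + n
    return totals


filtered = filter_features(table, nonchimeric_ids)

print("features before/after:", feature_count(table), feature_count(filtered))
print("frequency before:", sample_frequencies(table))
print("frequency after: ", sample_frequencies(filtered))
```

In this toy case, dropping one feature changes the feature count from 3 to 2, while the per-sample frequency only drops in the sample where that feature was observed. The feature count and the summed frequency are different quantities, so it is worth checking which one a summary is reporting.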

afrinaad · April 9, 2020, 9:26am

Hi @thermokarst, thank you so much for the reply.

My purpose is to compare the chimera removal step in the DADA2 pipeline and in the OTU clustering pipeline.

What I am trying to say is, the DADA2 pipeline already includes chimera removal, right? And at the end of this step, we can get stats-dada2.qza by running this:

qiime dada2 denoise-single \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left 0 \
  --p-trunc-len 120 \
  --o-representative-sequences rep-seqs-dada2.qza \
  --o-table table-dada2.qza \
  --o-denoising-stats stats-dada2.qza

From stats-dada2.qza we can obtain information such as the total number of non-chimeric reads (just like in the image in the first post).
Therefore, I would like to recreate this information using the OTU clustering pipeline.

For the OTU clustering method, we would have to do dereplicate/deblur > OTU filtering > chimera filtering > abundance filtering, right? However, neither the chimera filtering step nor the abundance filtering step reports any stats after filtering, so we can't get the same information that stats-dada2.qza provides.

I am wondering if it is possible to obtain such information from the OTU clustering method.
Hope this makes sense. Thanks again.

thermokarst · April 10, 2020, 2:39pm

Hi @afrinaad - as I said above, no, this isn't possible: there are currently no mechanisms in place for tracing the features back to their original reads. If you want to compare, you could compute the percentage of merged non-chimeric reads from the DADA2 results and compare it directly to the OTU clustering results.
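One way the percentage comparison could be sketched is shown below, in plain Python rather than QIIME 2 itself. Everything in it is an assumption for illustration: the tab-separated text stands in for an exported DADA2 stats table (the column names `sample-id`, `input`, and `non-chimeric` should be checked against your own file), and the `otu_before` / `otu_after` dictionaries stand in for per-sample total frequencies read off the OTU table before and after chimera filtering. All numbers are made up.

```python
import csv
import io

# Hypothetical excerpt of an exported DADA2 stats table (tab-separated).
# Column names are assumed; check them against your own stats file.
DADA2_STATS_TSV = (
    "sample-id\tinput\tfiltered\tdenoised\tnon-chimeric\n"
    "S1\t1000\t900\t880\t800\n"
    "S2\t2000\t1800\t1750\t1500\n"
)

# Hypothetical per-sample total frequencies of the OTU table,
# before and after chimera filtering.
otu_before = {"S1": 950, "S2": 1900}
otu_after = {"S1": 760, "S2": 1710}


def percent(part, whole):
    """Percentage of whole represented by part (0.0 if whole is zero)."""
    return 100.0 * part / whole if whole else 0.0


def dada2_retained(tsv_text):
    """Per-sample percentage of input reads reported as non-chimeric."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {
        r["sample-id"]: percent(int(r["non-chimeric"]), int(r["input"]))
        for r in rows
        if not r["sample-id"].startswith("#")  # skip directive rows, if any
    }


def otu_retained(before, after):
    """Per-sample percentage of table frequency kept after chimera filtering."""
    return {s: percent(after.get(s, 0), before[s]) for s in before}


dada2_pct = dada2_retained(DADA2_STATS_TSV)
otu_pct = otu_retained(otu_before, otu_after)

for sample in sorted(dada2_pct):
    print(f"{sample}: DADA2 {dada2_pct[sample]:.1f}%  OTU {otu_pct[sample]:.1f}%")
```

Keep in mind that the two percentages do not share a denominator: the DADA2 figure here is relative to the raw input reads, while the OTU figure is relative to whatever went into the chimera filtering step. That makes this a rough comparison, not a like-for-like one.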
