Denoise with DADA2

AHK · September 15, 2021, 3:10pm

Hi all,

I am running the code below on my paired-end sequence data (16S, V4).
The output (denoising stats) shows filtering, denoising, merging, and chimera-removal steps.

What are the parameters for merging and chimera checking? I do not see them in the command below. How am I getting merging and chimera results?
Can you please explain what the DADA2 denoise command does?

Thank you in advance

Code:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 250 \
  --p-trunc-len-r 250 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza

Keegan-Evans · September 16, 2021, 6:34pm

@AHK,

I think you would get a much more complete understanding by reading the DADA2 paper and then checking out the DOCS. Then hop back on here and we can answer any more specific questions you might have at that point.

AHK · September 16, 2021, 10:47pm

I will be more specific: how does this command (code shared above) merge the reads? Thank you.

Keegan-Evans · September 22, 2021, 12:18am

@AHK,

It sounds like you are asking about how DADA2 goes about choosing which reads it keeps and which it discards during the denoising process. Is this correct?

If so, it may take a bit more time to put a concise answer together. In the meantime, I would look back over the DADA2 paper and also watch this Denoising video from a recent workshop, which gives a good overview of the process. If neither of these answers your question, I will get back to you with a more detailed answer.

Keegan-Evans · September 22, 2021, 4:26pm

@AHK,

Here is a brief overview of the steps that DADA2 uses to produce its results:

1. A pairwise sequence comparison is performed on sequences that are part of the same k-mer cluster.
2. An error model is run that calculates how likely it is that slightly differing sequences are caused by error vs. actual differences in the sequence.
3. A statistical test (the abundance p-value) is applied to determine whether a sequence occurs too many times to be explained by sequencing errors from a more abundant sequence.
4. A divisive partitioning algorithm is then run: all similar sequences are placed into a partition, and each sequence in the partition is compared to the "center" of the partition. If it is too far from the center, a new partition is created; if it is similar enough, it is left where it is. This algorithm is described in more detail here.
5. Once the partitions are inferred, an error model parameterization step occurs: the mismatches between each sequence and the center of its partition are tallied, and these values are stored in a table that is used to estimate the parameters of the error model.
6. Finally, the algorithm alternates between sample inference and parameter estimation until a consistent result emerges.
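
To make the statistical test in the list above a little more concrete, here is a minimal Python sketch of an abundance p-value as I understand it from the DADA2 paper: the probability of seeing at least `abundance` reads of a sequence if every one of them were an error copy of the partition center, conditioned on the sequence having been seen at least once. The function names and the single `error_rate` argument are my own simplifications and not DADA2's actual code; the real model derives that rate from quality scores and the learned error rates.

```python
import math


def poisson_tail(mean, k):
    """Return P(X >= k) for X ~ Poisson(mean)."""
    cdf = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 1.0 - cdf)


def abundance_p_value(abundance, center_reads, error_rate):
    """Toy abundance p-value (hypothetical helper, not a DADA2 API).

    abundance    -- reads observed for the candidate sequence
    center_reads -- reads assigned to the partition center
    error_rate   -- assumed probability that one center read is
                    misread as the candidate sequence
    """
    mean = center_reads * error_rate  # expected number of error copies
    if mean <= 0:
        raise ValueError("center_reads * error_rate must be positive")
    # Condition on the candidate having been observed at least once.
    return poisson_tail(mean, abundance) / (1.0 - math.exp(-mean))


if __name__ == "__main__":
    # One expected error copy: a singleton is unremarkable, 50 reads are not.
    print(abundance_p_value(1, 1000, 1e-3))
    print(abundance_p_value(50, 1000, 1e-3))
```

A very small p-value means errors alone are an unlikely explanation for that many reads, so the sequence is split off into its own partition instead of being folded into the center.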

Chimeras are detected by aligning a less common sequence against more common ones and checking whether the less common sequence can be reconstructed by joining the left part of one more common sequence with the right part of another.
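
As a rough illustration of that idea, here is a small Python sketch of a two-parent ("bimera") check. It assumes the sequences are already aligned, have equal length, and must match exactly, which is simpler than what DADA2 really does; `is_bimera` and its helpers are names I made up for this example. In QIIME 2, as far as I know, this step is controlled by the `--p-chimera-method` parameter of `denoise-paired` (consensus by default), but please check `qiime dada2 denoise-paired --help` for your version.

```python
def shared_prefix_len(a, b):
    """Number of leading positions at which a and b agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def shared_suffix_len(a, b):
    """Number of trailing positions at which a and b agree."""
    return shared_prefix_len(a[::-1], b[::-1])


def is_bimera(candidate, parents):
    """Return True if candidate is the left part of one parent joined
    to the right part of a different parent (toy, exact-match version)."""
    n = len(candidate)
    left = [shared_prefix_len(candidate, p) for p in parents]
    right = [shared_suffix_len(candidate, p) for p in parents]
    for i, l in enumerate(left):
        for j, r in enumerate(right):
            # Two different parents, neither identical to the candidate,
            # whose matching prefix and suffix together cover it.
            if i != j and l < n and r < n and l + r >= n:
                return True
    return False


if __name__ == "__main__":
    parents = ["AAAAAAAA", "CCCCCCCC"]  # more abundant sequences
    print(is_bimera("AAAACCCC", parents))  # True: left of 1st + right of 2nd
    print(is_bimera("AAAAGGGG", parents))  # False: right half matches no parent
```

The real implementation also has to decide which sequences are abundant enough to count as parents, which this sketch leaves out.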

Hope this helps!

AHK · October 5, 2021, 4:08pm

Thank you for the clarification.
