dada2 workflow questions

moonlight · November 20, 2019, 9:45pm

Hello Nick,

Thanks for your help! Yes, I found a solution. After I demultiplexed my paired-end data, I used the "Atacama soil microbiome" tutorial to run the DADA2 workflow. I checked the tutorials and read the DADA2 help file. As I understand it, DADA2 will run the QC, remove the chimeras, join the paired-end reads and build a feature table (which is roughly equivalent to an OTU table in QIIME 1). I just have some questions about the DADA2 workflow results.

1> I got a feature table like your tutorial (https://view.qiime2.org/visualization/?type=html&src=https%3A%2F%2Fdocs.qiime2.org%2F2019.10%2Fdata%2Ftutorials%2Fatacama-soils%2Ftable.qzv).

If you check the link, you will find two items: the number of features is 414 and the total frequency is 16,443.

I'm not sure how to interpret these results. Does "feature" mean the same as an observed OTU in QIIME 1?
What about frequency? Is it the total number of reads that passed QC, chimera removal and joining, i.e. the final read count across all samples?

2> Is it possible to get this information for each sample? For example: 10,000 reads before QC, 8,000 reads after QC, 7,000 after joining, 6,000 after chimera checking.
Basically, I want to know how many reads pass each step of the DADA2 process for each sample, and how many reads I lost.

I checked the denoising-stats.qza file, like this one (https://view.qiime2.org/visualization/?type=html&src=https%3A%2F%2Fdocs.qiime2.org%2F2019.10%2Fdata%2Ftutorials%2Fatacama-soils%2Fdenoising-stats.qzv). However, some results don't make a lot of sense to me.

For example, for BAQ2420.2 you got 476 reads after denoising, but 244 after joining. How can that be? If you have 476 reads, the maximum number of joined pairs would be 476/2=238, right?

3> Does the column order reflect the actual DADA2 workflow order?

The workflow order is filtering --> denoising --> merging (joining) --> chimera checking

4> The last question is about the DADA2 chimera check. I have used vsearch for a lot of chimera checking (de novo) before; I didn't use vsearch for OTU clustering.

There isn't much information about the DADA2 chimera check parameters. Is it done automatically? Is it de novo, or does it use a database?

I know QIIME 2 supports a vsearch plugin too, but it normally uses vsearch for traditional OTU picking methods. Would it be possible to run vsearch only to check for chimeras? If so, which artifacts should I use: the ones from before the DADA2 workflow or from after it?

Generally, I trust vsearch for chimera checking. I want to use most of the steps of the DADA2 workflow, except for the chimera checking. Can I do this?

Thanks

Mehrbod_Estaki · November 21, 2019, 2:36am

Hi @moonlight,

Correct. Since there is no clustering happening with DADA2, these are not called OTUs; instead we refer to them as amplicon sequence variants (ASVs). In QIIME 2, they are also referred to as features, since QIIME 2 is agnostic to the type of technology and sequences (think of sequences from shotgun sequencing, for example, which are not amplicons).

The numbers you see in the linked table reflect the values after all those steps (QC, merging, chimera removal, etc.). 16,443 is the total number of reads across all the samples. Select the Interactive Sample Detail tab at the top to see the breakdown of these per sample.
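
To make the two summary numbers concrete, here is a minimal Python sketch of how they relate to a feature table. The sample IDs, feature IDs and counts are invented for illustration; they are not taken from the tutorial data.

```python
# Hypothetical feature table: sample ID -> {feature (ASV) ID -> read count}.
# All IDs and counts are invented for illustration.
table = {
    "sampleA": {"asv1": 120, "asv2": 30},
    "sampleB": {"asv1": 80, "asv3": 15},
    "sampleC": {"asv2": 5},
}


def summarize(table):
    """Return (number of features, total frequency, per-sample frequency)."""
    # A feature counts once, no matter how many samples it appears in.
    features = {fid for counts in table.values()
                for fid, n in counts.items() if n > 0}
    # Per-sample frequency is the sum of that sample's feature counts.
    per_sample = {sid: sum(counts.values()) for sid, counts in table.items()}
    # Total frequency is the sum over all samples.
    total = sum(per_sample.values())
    return len(features), total, per_sample


n_features, total_frequency, per_sample = summarize(table)
print(n_features, total_frequency, per_sample)
```

In this toy table the "number of features" is the count of distinct ASVs, and the "total frequency" is the sum of all read counts that survived the pipeline.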

As you have already discovered, this exact information is in the denoising stats visualization (denoising-stats.qzv).

The 476 reads represent 476 pairs of reads (forward + reverse), so the merged number simply reflects how many of those 476 pairs were merged successfully.
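
As a worked example of reading the stats table, the sketch below computes the fraction of read pairs kept at each step. The 476 and 244 are the denoised and merged counts quoted above for BAQ2420.2; the other counts reuse the made-up numbers from the question (10000, 8000, 7000, 6000) and are not real data.

```python
def retention(counts):
    """Given ordered (stage, read-pair count) tuples, return the fraction
    of the previous stage's read pairs that survive each step."""
    out = []
    for (_, prev_n), (stage, n) in zip(counts, counts[1:]):
        out.append((stage, n / prev_n))
    return out


# Counts are read PAIRS, so "merged" can be anything from 0 up to "denoised".
merged_fraction = 244 / 476  # BAQ2420.2: merged / denoised

# Hypothetical sample, using the example numbers from the question.
example = [("input", 10000), ("filtered", 8000),
           ("merged", 7000), ("non-chimeric", 6000)]
steps = retention(example)
overall_kept = example[-1][1] / example[0][1]

for stage, frac in steps:
    print(f"{stage}: {frac:.1%} of the previous step kept")
print(f"overall kept: {overall_kept:.1%}")
print(f"BAQ2420.2 pairs merged: {merged_fraction:.1%}")
```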

Correct!

DADA2 chimera detection is done automatically. You can choose which method to use, but by default it uses a consensus method. You can also use the pooled option, or opt to do no chimera removal and do this yourself elsewhere. Personally, I would recommend keeping the default method. The pooled chimera removal option might be more beneficial in a future version of q2-dada2, when the pooled option becomes available for denoising as well.
To see the options for these, see the output of qiime dada2 denoise-paired --help:

 --p-chimera-method TEXT Choices('pooled', 'none', 'consensus')
                         The method used to remove chimeras. "none": No
                         chimera removal is performed. "pooled": All reads are
                         pooled prior to chimera detection. "consensus":
                         Chimeras are detected in samples individually, and
                         sequences found chimeric in a sufficient fraction of
                         samples are removed.           [default: 'consensus']
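
The "consensus" idea can be sketched in a few lines of Python. This is a simplification for illustration, not q2-dada2's code: the per-sample flags are invented, and the 0.9 threshold is an assumption standing in for whatever "a sufficient fraction" means in the actual implementation.

```python
def consensus_chimeras(flags_by_sample, min_fraction=0.9):
    """flags_by_sample: sample ID -> {feature ID -> True if the feature was
    flagged as chimeric in that sample}. A feature is only judged in the
    samples where it appears. Returns the set of features flagged in at
    least `min_fraction` of those samples (an assumed, illustrative cutoff).
    """
    seen, flagged = {}, {}
    for flags in flags_by_sample.values():
        for fid, is_chimeric in flags.items():
            seen[fid] = seen.get(fid, 0) + 1
            flagged[fid] = flagged.get(fid, 0) + int(is_chimeric)
    return {fid for fid in seen if flagged[fid] / seen[fid] >= min_fraction}


# Invented flags: asv3 is flagged everywhere it occurs, asv2 only once.
flags = {
    "sampleA": {"asv1": False, "asv2": True, "asv3": True},
    "sampleB": {"asv1": False, "asv2": False, "asv3": True},
}
print(consensus_chimeras(flags))
```

With "pooled", by contrast, there would be a single decision per feature, made on all reads together rather than by a vote across samples.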

Another parameter that can help fine-tune chimera removal is --p-min-fold-parent-over-abundance:

 --p-min-fold-parent-over-abundance NUMBER
                         The minimum abundance of potential parents of a
                         sequence being tested as chimeric, expressed as a
                         fold-change versus the abundance of the sequence
                         being tested. Values should be greater than or equal
                         to 1 (i.e. parents should be more abundant than the
                         sequence being tested). This parameter has no effect
                         if chimera-method is "none".           [default: 1.0]

If you find too many real sequences are being discarded as chimeras you can try increasing this value.
From the dada2 website:

The core dada method corrects substitution and indel errors, but chimeras remain. Fortunately, the accuracy of sequence variants after denoising makes identifying chimeric ASVs simpler than when dealing with fuzzy OTUs. Chimeric sequences are identified if they can be exactly reconstructed by combining a left-segment and a right-segment from two more abundant “parent” sequences.
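
A rough Python sketch of that reconstruction test, including the parent abundance filter described above, might look like this. It is a toy version, not DADA2's implementation: it assumes all sequences have the same length, does no alignment, and treats a parent as eligible when its abundance is at least min_fold times that of the query (the exact comparison is my assumption). The function name and sequences are invented.

```python
def is_bimera(seq, abundance, candidates, min_fold=1.0):
    """Return True if `seq` can be rebuilt exactly as the left part of one
    parent plus the right part of a different parent.

    candidates: dict of sequence -> abundance for the other sequences.
    Simplifying assumptions: equal-length sequences, no alignment, and a
    parent qualifies when its abundance >= min_fold * abundance of `seq`.
    """
    parents = [p for p, n in candidates.items()
               if p != seq and len(p) == len(seq) and n >= min_fold * abundance]
    for left in parents:
        for right in parents:
            if left == right:
                continue
            # Try every breakpoint strictly inside the sequence.
            for i in range(1, len(seq)):
                if seq[:i] == left[:i] and seq[i:] == right[i:]:
                    return True
    return False


# Toy sequences and abundances, invented for illustration.
others = {"AAAAAAAA": 100, "CCCCCCCC": 40}
print(is_bimera("AAAACCCC", 10, others))              # both parents eligible
print(is_bimera("AAAACCCC", 10, others, min_fold=5))  # CCCCCCCC too rare now
```

Raising min_fold shrinks the pool of eligible parents, which is why a larger value flags fewer sequences as chimeric.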

For even further details, I would suggest reading the original DADA2 paper.

Yes, see the different methods available in the q2-vsearch plugin docs; uchime-denovo is probably what you are looking for. From its help file:

Inputs:
  --i-sequences ARTIFACT FeatureData[Sequence]
                          The feature sequences to be chimera-checked.
                                                                    [required]
  --i-table ARTIFACT FeatureTable[Frequency]
                          Feature table (used for computing total feature
                          abundances).                              [required]

So these would be your rep-seqs.qza and table.qza, respectively.

Yup, just tell DADA2 not to do chimera checking and do the vsearch checking separately instead. That being said, I'm not sure how much better off you will be that way. I've personally found the chimera removal of DADA2 to be quite good (a bit on the conservative side of things, but this can be adjusted as well).
Hope this helps.

moonlight · November 24, 2019, 12:22am

Hi guys,

Thanks for the help. The dada2 workflow runs smoothly so far.

I would like to check some concerns.

1> How well does this workflow support multiple cores/threads? For example, "qiime dada2 denoise-paired".

I ran a relatively small dataset, but it seems to take a long time. Does this command support parallel computation?

2> I followed the tutorial ("Atacama soil microbiome" tutorial, QIIME 2 2019.10.0 documentation) and ran the command like this:

qiime dada2 denoise-paired --i-demultiplexed-seqs demux.qza --p-trim-left-f 13 --p-trim-left-r 13 \
  --p-trunc-len-f 150 --p-trunc-len-r 150 --o-table table.qza --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza

I understand this cutting is based on the results here (https://view.qiime2.org/visualization/?type=html&src=https%3A%2F%2Fdocs.qiime2.org%2F2019.10%2Fdata%2Ftutorials%2Fatacama-soils%2Fdemux.qzv)

I am just confused about the parameter settings --p-trim-left-f and --p-trunc-len-f 150.

In the tutorial, both trim-left values (forward and reverse) are set to 13. I checked the quality plot: the first 12 nt have bad quality, so I am not sure why this is set to 13.

a> Does this mean that trimming (forward or reverse) excludes the 13th base?

b> Also, --p-trunc-len seems to work differently. The 151st base has low quality. If trunc-len is set to 150, does that mean the 150th base is included?

It seems a little odd to use two different conventions in one command's parameters. Normally, software engineers design with one convention: if one parameter includes the base, both parameters include the base.

I just want to know how to set this, so that I don't cut too many or too few bases.

3> Last question about the provenance. All your scripts have Provenance. It's really good. I don't know why I can't see my provenance in my browser.

Mehrbod_Estaki · November 28, 2019, 11:42am

Hi @moonlight,
Sorry about the delay on this, I was actually waiting for confirmation on something from the devs but unfortunately most of them are away at the moment so I'll take a crack at this and we can confirm my answers later when they return.

Yes! Some plugins in QIIME 2 support multi-core/thread usage. Use the --help flag for each command (or see its online documentation) to check whether it supports parallel execution. In DADA2 this is set with --p-n-threads:

--p-n-threads INTEGER  The number of threads to use for multithreaded
                         processing. If 0 is provided, all available cores
                         will be used.                            [default: 1]

Your second question is the one I'm waiting for confirmation on:

I THINK the answer is that trim removes the first N bases (up to and including the given position), while truncate cuts the read after the given position, so that position itself is kept. So:

If your read positions are like this:
1 2 3 4 5 6 7 8 9 10
And you set trim=3, trunc=8
Then you end up with:
- - - 4 5 6 7 8 - -
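
For reference, the DADA2 documentation for filterAndTrim describes truncLen as truncating reads after that many bases (shorter reads are discarded) and trimLeft as the number of bases removed from the start, so that filtered reads have length truncLen - trimLeft. A minimal Python sketch of that rule follows; the function name and the list-of-positions input are made up for illustration, and this is not q2-dada2's code.

```python
def trim_and_truncate(read, trim_left=0, trunc_len=0):
    """Mimic the documented rule: truncate the read after `trunc_len` bases
    (0 means no truncation; reads shorter than `trunc_len` are discarded,
    returned here as None), then drop the first `trim_left` bases."""
    if trunc_len:
        if len(read) < trunc_len:
            return None
        read = read[:trunc_len]
    return read[trim_left:]


# The toy example above: positions 1..10, trim=3, trunc=8.
positions = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
toy = trim_and_truncate(positions, trim_left=3, trunc_len=8)
print(toy)

# The tutorial settings applied to a hypothetical 151-base read.
kept = trim_and_truncate(list(range(1, 152)), trim_left=13, trunc_len=150)
print(kept[0], kept[-1], len(kept))
```

Under this rule, --p-trim-left-f 13 drops bases 1 to 13, and --p-trunc-len-f 150 keeps base 150 and drops everything after it, so both values are simply base counts measured from the start of the read.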

As to why the tutorial uses those specific values, I'm not sure the choice was as methodical as your analysis assumes. Of course there was a reasonable attempt at picking good parameters, but it may just be a rough choice. To me it looks like they picked the forward read parameters first and then used the same ones for the reverse reads because, well, that works OK too. You would want to take more care to optimize the values for your own reads. Again, this is my guess; the developers can confirm this when they return.

This is odd. When you upload an artifact to QIIME 2 View (view.qiime2.org), do you not see the Provenance tab in the top right-hand corner? Clicking that tab should show the provenance. If it's not showing, could you please give us some more detailed information about the exact problem, the browser (and version) you are using, and your OS? This may actually be better followed up in a separate thread, as it doesn't really have anything to do with the original topic of the DADA2 workflow.
Hope this helps!
