Denoising results queries

Hello all,

I presented my QIIME analysis results at my weekly meeting. After seeing the denoising (DADA2) output, my senior asked why sample 6 has such a low sequence count and whether it is appropriate to keep that sample for further analysis.

At the demultiplexing step, sample 6 had 85,523 sequences. After denoising, it had 315. I didn't trim or truncate the low-quality reads.

Demultiplexing step (sequence counts for all samples)

Sample name Sequence count
sample-12 318689
sample-10 316279
sample-9 309068
sample-4 282897
sample-8 268205
sample-7 216594
sample-11 210280
sample-1 173988
sample-3 133036
sample-2 107743
sample-5 90754
sample-6 85523

After DADA2 analysis

Retained 518,292 (100.00%) sequences in 12 (100.00%) samples at the specified sampling depth.

Sample ID Sequence Count
sample-9 114,019
sample-7 96,251
sample-11 82,305
sample-10 61,008
sample-3 57,312
sample-2 36,932
sample-5 32,659
sample-1 29,805
sample-12 2,747
sample-4 2,476
sample-8 2,463
sample-6 315

This is the command I ran

qiime dada2 denoise-paired --i-demultiplexed-seqs ITSpaired-end-demux.qza --p-trim-left-f 0 --p-trim-left-r 0 --p-trunc-len-f 0 --p-trunc-len-r 0 --o-representative-sequences 2ITSrep-seqs-dada2.qza --o-table 2ITStable-dada2.qza --o-denoising-stats 2ITSstats-dada2.qza

Could you please explain how the sequence count is determined? My understanding is that the sequence count is the number of base pairs in a sequence. For instance, sample 6 would have 315 base pairs after denoising.

If I am wrong, could anyone please explain what the sequence count means?

One more question: what is the minimum OTU sequence length for a sequence to be considered a valid OTU?

Thank you in advance. I look forward to your reply.

Hello @Asha1,

Thanks for posting on the forums! I saw this question a few days ago, but didn't jump in as I'm not an expert on dada2.

It looks like dada2 is removing many of your reads! Let's collect clues and figure out what's going on. :female_detective:

As shown in the Moving Pictures tutorial, you can run qiime metadata tabulate on the denoising stats artifact (2ITSstats-dada2.qza in your command) to see how many reads each sample keeps after each step of the dada2 denoising process.

The sequence count is a number of reads, not a number of base pairs. The dada2 denoising process proceeds in several steps, and the final count is the number of filtered, denoised, non-chimeric reads in that sample.

So one of these steps is removing a huge number of reads from one of the samples, and we need to figure out which one!
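To put numbers on this, here is a rough Python sketch (not part of QIIME 2) that takes the per-sample counts from the two tables posted earlier in this thread and computes the fraction of demultiplexed reads that survived DADA2. The names `demux`, `dada2`, `fraction_retained` and the 1% cutoff are assumptions made up for illustration only.

```python
# Read counts copied from the two tables posted in this thread.
demux = {
    "sample-1": 173988, "sample-2": 107743, "sample-3": 133036,
    "sample-4": 282897, "sample-5": 90754, "sample-6": 85523,
    "sample-7": 216594, "sample-8": 268205, "sample-9": 309068,
    "sample-10": 316279, "sample-11": 210280, "sample-12": 318689,
}
dada2 = {
    "sample-1": 29805, "sample-2": 36932, "sample-3": 57312,
    "sample-4": 2476, "sample-5": 32659, "sample-6": 315,
    "sample-7": 96251, "sample-8": 2463, "sample-9": 114019,
    "sample-10": 61008, "sample-11": 82305, "sample-12": 2747,
}


def fraction_retained(before, after):
    """Fraction of reads per sample that remain after denoising."""
    return {sample: after[sample] / before[sample] for sample in before}


retained = fraction_retained(demux, dada2)

# Sort from the worst-retained sample to the best-retained one.
ranked = sorted(retained.items(), key=lambda item: item[1])
for sample, frac in ranked:
    print(f"{sample}\t{demux[sample]}\t{dada2[sample]}\t{frac:.2%}")

# Arbitrary cutoff, chosen only to highlight the most affected samples.
LOW_CUTOFF = 0.01
low_samples = [sample for sample, frac in ranked if frac < LOW_CUTOFF]
print("Below cutoff:", low_samples)
```

By this measure sample 6 keeps about 0.37% of its reads, and samples 12, 4 and 8 also fall under 1%, so it may be worth checking those samples in the stats table as well.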

Let us know what you find!

Colin

P.S. You don't have to open new threads. We will reply to every question when we have time. :hourglass_flowing_sand:

Hello sir,

I am extremely sorry, sir, for posting the same query twice. My presentation is tomorrow and people expect an answer to that query from me, which is why I did that.

I didn't trim or truncate the sequence. This is the command I ran

qiime dada2 denoise-paired --i-demultiplexed-seqs ITSpaired-end-demux.qza --p-trim-left-f 0 --p-trim-left-r 0 --p-trunc-len-f 0 --p-trunc-len-r 0 --o-representative-sequences 2ITSrep-seqs-dada2.qza --o-table 2ITStable-dada2.qza --o-denoising-stats 2ITSstats-dada2.qza

Sample 6's sequence count dropped drastically at the merging step, sir. I read in the forum that we need a minimum overlap of 20 nucleotides for merging.

Don't my 36172 reads have a 20 nt overlap? Could you explain why this happened?

Are 315 reads enough for OTU picking and taxonomic assignment?

Could you please tell me the minimum OTU sequence length for species-level taxonomic assignment?

My final question: I didn't trim or truncate the sequences, so why did my sequence count drop at the filtering step?

Thank you in advance.

Good morning,

Thanks for posting that table. Now that we know reads are getting lost at the merging step, we can look for settings that will let more reads be joined.

I understand the time pressure of group meetings, so let's dive in! :swimming_man:

Yep, something is going poorly at the merging step. Having 20 bp for overlap should be enough, so let's figure out what could be going wrong.

I'm not sure. The number of reads, whether 36k or just 300, will not affect overlap, but the length of the amplicon and the length of the reads will. So if your amplicon is 250 bp long and your reads are 150 bp long, you would expect 50 bp of overlap, like this:

250 bp amplicon  |-------------------------|
150 bp read      |--------------->
150 bp read                <---------------|
50 bp overlap               ^^^^^
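The arithmetic behind this diagram can be sketched in a few lines of Python. This assumes each read starts at one end of the amplicon and ignores primers and read quality; `expected_overlap` is a hypothetical helper written for this post, not a QIIME 2 or DADA2 function.

```python
def expected_overlap(amplicon_len, fwd_len, rev_len):
    """Number of bases covered by both reads.

    Assumes the forward read starts at one end of the amplicon and the
    reverse read starts at the other end. Returns 0 if the reads do not
    reach each other.
    """
    return max(0, fwd_len + rev_len - amplicon_len)


# The example from the diagram: 250 bp amplicon, two 150 bp reads.
print(expected_overlap(250, 150, 150))

# A longer amplicon with the same reads leaves a gap instead of an overlap.
print(expected_overlap(400, 150, 150))
```

The same read lengths give less overlap as the amplicon gets longer, which is why the read count itself does not matter here.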

However, if your forward or reverse read is low quality, then the reads will overlap, but dada2 will be unable to pair them.

250 bp amplicon  |-------------------------|
150 bp read      |--------------->
150 bp read                <000------------| (30b are low quality)
50 bp overlap (with errors)    ^^ (so only 20 can join)

One solution to this problem is to trim the end of that reverse read so that the part that's left is high quality and is able to join.
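As a rough illustration of the trade-off, the Python sketch below shows how much overlap is left after cutting bases off the low-quality end of the reverse read. `overlap_after_truncation` is a hypothetical helper, and `MIN_OVERLAP = 20` is an assumption: it is simply the 20 nt figure quoted in this thread, not necessarily the value your DADA2 run uses.

```python
MIN_OVERLAP = 20  # assumption: the 20 nt figure quoted in this thread


def overlap_after_truncation(amplicon_len, fwd_len, rev_len, rev_cut):
    """Overlap left after removing `rev_cut` bases from the 3' end of the reverse read.

    Assumes each read starts at one end of the amplicon, so every base cut
    from the reverse read's 3' end is a base taken out of the overlap.
    """
    kept_rev = rev_len - rev_cut
    return max(0, fwd_len + kept_rev - amplicon_len)


# 250 bp amplicon with two 150 bp reads, as in the diagrams above.
for cut in (0, 30, 60):
    overlap = overlap_after_truncation(250, 150, 150, cut)
    status = "enough" if overlap >= MIN_OVERLAP else "too short"
    print(f"cut {cut:>2} bases from reverse read -> {overlap} bp overlap ({status})")
```

Cutting the 30 low-quality bases leaves exactly 20 bp of clean overlap, while cutting 60 would leave none, so truncation has to be balanced against overlap. Note that --p-trunc-len-r takes the length to keep, so the 30-base cut in this toy example would correspond to a value of 120.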

Maybe trimming will help! Try running qiime demux summarize and look at the Interactive Quality Plot tab. This will show you the quality of both forward and reverse reads and you can use this information to see where you need to trim.

While you can make OTUs or ASVs using these reads, having 36k reads would be much better! Let's see if we can find some settings that let you make use of your full read library.

Let me know what you discover in the quality plots! :mag: :bar_chart:

Colin

Thank you so much, sir, for your reply and for clarifying my doubts. I will check the quality plots and let you know the outcome soon.
