Parameters of DADA2

Hi, I have a problem with DADA2.
I have about 50,000 reads/sample before running DADA2, but only 5,000 reads/sample afterwards. Is it because of the sequence quality? (Q.1)

I get more reads if I run DADA2 with a higher max-ee value. How far can I increase the value while keeping the results reliable? (Q.2)

Do you have any other suggestions for getting more reads? (Q.3)

Thank you in advance.

Hi @kiara,

Just to be certain we are talking about the same thing, where do you get the number 5,000 reads/sample from? (I assume the second tab of the feature-table summary?) And yes, poor quality can often result in losing data. Would you be able to provide the qiime demux summarize visualization for this data?

How many more reads did you get with a different max-ee value? And what value did you set? It may be the case that your data is just very noisy. In that case, it isn't necessarily a bad thing that you have fewer reads; it just means the relative abundances of different ASVs are a little more uncertain. So whether that is a problem is going to depend on what questions you are trying to answer with the dataset.

Also, is your data paired-end? An easy way to lose lots of sequences is if your reads don't overlap enough.

Another possibility is that your real reads are being seen as chimeric (and then removed), which usually happens because your reads still have primers/adapters/non-biological sequence data attached. These need to be removed before processing (we don't have any tooling to help with this yet in QIIME 2, but cutadapt is nice).

Thanks!

Thank you for the reply.

Yes, I'm talking about the 2nd tab of feature-table summary.

I'm providing a small-scale test dataset.
demux.qzv (283.8 KB)

And here is the feature-table summary produced by this command:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trunc-len-f 280 \
  --p-trunc-len-r 280 \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza

table_default.qzv (319.4 KB)

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trunc-len-f 280 \
  --p-trunc-len-r 280 \
  --p-max-ee 10 \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza

table_maxee10.qzv (322.5 KB)

I imported demultiplexed FASTQ files from a MiSeq. Would it be better to remove adapters using cutadapt?

Thanks.

Hey @kiara!

Thanks for the data.

These quality scores seem pretty good. It looks like something upstream is clipping the length (but that doesn't happen to the vast majority of the reads), which is consistent with some basic quality control (probably the MiSeq/Casava?). To the best of my knowledge, that shouldn't be a problem for DADA2.

But I would probably set a trim-left for this data because there is a pretty noticeable dip in the beginning.

I'm sorry, I should have mentioned that you can usually use trim-left for this, since the length of your non-biological data at the start of each read is known ahead of time. It's the reverse primers that QIIME 2 has trouble with at the moment, in which case cutadapt is a great way to handle the issue (usually this matters for ITS).

Given that the difference in reads between max-ee=2 and max-ee=10 is small relative to the total number of reads, my guess is your data is getting caught up in chimera detection. So you should probably set a trim-left-f/r that covers your non-biological data.

Let me know if that works for you!

Thank you for your very kind reply.

I tried this command:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trunc-len-f 280 \
  --p-trunc-len-r 280 \
  --p-trim-left-f 15 \
  --p-trim-left-r 15 \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza

table_trim15.qzv (328.3 KB)

And another one:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trunc-len-f 280 \
  --p-trunc-len-r 280 \
  --p-trim-left-f 15 \
  --p-trim-left-r 15 \
  --p-max-ee 10 \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza

table_trim15_maxee10.qzv (330.1 KB)

I got enough reads with max-ee 10.
What max-ee value is acceptable for 16S metagenomics analysis?
I got fewer reads when I decreased max-ee to 6.

table_trim15_maxee6.qzv (328.6 KB)

Thanks.

Hey @kiara,

It looks like setting trim-left to just after the quality dip basically doubled the number of features!

The last thing that we can check is whether denoise-single results in a great many more features than its paired counterpart. You can run it with the same demux.qza; it will just look at the forward reads only. This lets us tell if the merge step is problematic (but it seems unlikely, since max-ee and trim-left seem to be controlling the number of features we see).

I don't think there is a hard-and-fast rule. 10 seems pretty high, but both 6 and 10 have similar feature distributions. Since you have these tables already (denoising is the step that takes the longest), you might try running some preliminary analysis on each to see if they say different things. @benjjneb, do you have any suggestions on maxEE?

maxEE of 10 is pretty high. There is not much value in pushing more high-error reads through; it’s usually better to trim off a bit more of the tails while keeping maxEE lower.
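
For context, a read's expected errors (EE) are the sum of its per-base error probabilities, EE = sum(10^(-Q/10)), and reads whose EE exceeds maxEE after truncation are discarded. Here is a minimal shell sketch of that calculation, assuming Phred+33 quality strings. The function name expected_errors is made up for illustration and is not part of DADA2 or QIIME 2.

```shell
# Hypothetical helper: expected errors of one read, given its quality string.
# Assumes Phred+33 encoding (the usual encoding for current Illumina FASTQ).
expected_errors() {
  printf '%s\n' "$1" | awk '
    # Build a character -> ASCII code lookup table for printable characters.
    BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
    {
      ee = 0
      for (j = 1; j <= length($0); j++) {
        q = ord[substr($0, j, 1)] - 33   # Phred quality score of base j
        ee += 10 ^ (-q / 10)             # error probability of base j
      }
      printf "%.3f\n", ee
    }'
}

expected_errors 'IIIIIIIIII'   # ten bases at Q40
expected_errors '+++++'        # five bases at Q10
```

By the same arithmetic, a 250-nt read at Q30 throughout has EE = 0.25, while a handful of Q10 bases in the tail adds 0.1 each. That is why truncating a low-quality tail tends to rescue more reads than raising maxEE.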

I’ll also add: If adding trim-left increased the number of reads getting through by a lot, that probably means that you have primers at the start of your reads (trimming off primers will reduce the number of reads lost to spurious chimera detection due to the ambiguous nucleotides in the primers).

I can’t stress enough: Make sure your primers are removed! Primers are not biological nucleotides, and they usually contain ambiguous nucleotide positions. You could kind of get away with leaving primers on when making fuzzy OTUs. You can’t when you are calling exact sequences!

Also a quick follow-up: If your primers are at the start of your reads, and if you know their lengths, you can use trim-left to remove them.

For example, if using the EMP primers, the forward 515F primer is 19 nts and the reverse 806R primer is 20 nts. So trim-left-f 19 and trim-left-r 20 will remove the primers for you!

In fact that is the primary use of trim-left. If primers aren’t on the reads, trim-left should generally be set to 0.
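
As a rough sketch, the trim-left values can be derived from the primer sequences themselves rather than counted by hand. The sequences below are the EMP 515F/806R primers as commonly published and are an assumption here; check them against your own protocol before relying on the numbers.

```shell
# Assumed EMP primer sequences (verify against your own sequencing protocol).
fwd_primer="GTGYCAGCMGCCGCGGTAA"    # 515F
rev_primer="GGACTACNVGGGTWTCTAAT"   # 806R

# The primer length is the number of leading bases to trim from each read.
trim_left_f=${#fwd_primer}
trim_left_r=${#rev_primer}

printf '%s\n' "--p-trim-left-f $trim_left_f --p-trim-left-r $trim_left_r"
```

This only applies when the reads begin exactly at the primer, with no spacer or barcode bases in front of it.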

Thank you @ebolyen and @benjjneb

I'm using 341F and 804R, so I set trim-left-f 17 and trim-left-r 21.
But the result wasn't very different from the previous one.
maxee_2_trim_17_21_table.qzv (328.1 KB)

Then I changed the value of trunc-len from 280 to 250.
As a result, I got more reads.

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trunc-len-f 250 \
  --p-trunc-len-r 250 \
  --p-trim-left-f 17 \
  --p-trim-left-r 21 \
  --p-max-ee 2 \
  --o-representative-sequences 250_rep-seqs.qza \
  --o-table 250_table.qza

250_table.qzv (339.0 KB)

My guess is that the sequence quality in the merged region was low when I used trunc-len-f/r 280. Is that correct?

Yes, the drop-off of quality at the ends of the reads causes many reads to be lost to the quality filters. When you truncated earlier, you removed the worst parts of that tail and kept more reads.

In rough numbers, the size of your amplicon (from primer start positions) is ~425 nts. You need 30 nts of overlap to be safe, so you need about 425 + 30 = 455 nts after truncation. That is, trunc-len-f + trunc-len-r > 455.

As long as that condition holds, you can reduce trunc-len to get more reads through the filter, and that is often the right choice when these low quality tails exist.
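
The condition above can be checked quickly before launching a long denoising run. This is a minimal sketch using the rough numbers from this thread (amplicon ~425 nts, 30 nts of overlap); the helper name can_merge is hypothetical.

```shell
# Hypothetical helper: succeeds when the truncated reads are long enough to merge.
# $1 = trunc-len-f, $2 = trunc-len-r, $3 = amplicon length, $4 = minimum overlap
can_merge() {
  [ $(( $1 + $2 )) -gt $(( $3 + $4 )) ]
}

# Try a few symmetric truncation lengths against the 425 + 30 = 455 threshold.
for trunc in 280 250 225; do
  if can_merge "$trunc" "$trunc" 425 30; then
    echo "trunc-len-f/r = $trunc: OK ($((2 * trunc)) > 455)"
  else
    echo "trunc-len-f/r = $trunc: too short ($((2 * trunc)) <= 455)"
  fi
done
```

With these numbers, 280 and 250 both pass, while 225 leaves too little sequence for the reads to overlap.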

Thank you for your answer.
I got enough reads thanks to your help.
