dada2 parameter question

demux370.qzv (297.4 KB)
demux.qzv (316.4 KB)
Hi,
I would like help choosing DADA2 parameters.
Illumina MiSeq
paired-end 300 bp
primers: 341F / 805R (V3-V4)

  1. samples: 370 (demux370.qzv)
     trim-left-f / trim-left-r: 10
     trunc-len-f / trunc-len-r: 260
     => 20 samples show no result in the OTU table (0 is displayed)

  2. samples: 1850 (including the 370 above) (alldemux.qzv)
     trim-left-f / trim-left-r: 10
     trunc-len-f / trunc-len-r: 280
     => results for all 1850 samples
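
For reference, the denoising command I ran looks like this (a sketch; the artifact file names are placeholders):

# run 1: 370 samples, trimming 10 bp and truncating both reads at 260
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left-f 10 --p-trim-left-r 10 \
  --p-trunc-len-f 260 --p-trunc-len-r 260 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza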

Overall quality is poor.
For those 20 samples, when trunc-len is set to 260, the result is not displayed; when it is set to 280, the result is displayed.
Is it meaningful to get a result at 280 with such poor quality?
Or could you suggest other parameters?

I'll wait for your answer.
Thank you.

Hi @shinseung,
Could you also share with us the dada2 stats-summary visualization for both of those runs, please?
In addition, could you clarify what you mean by:

> 20 samples show no result in the OTU table (0 is displayed)

Do you mean that 20 of your samples are not showing up in the output?

Also:

> when trunc-len is set to 260, the result is not displayed

Do you mean that there is an error, or that the output is blank?
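
(If it helps, the stats-summary visualization can be generated from the denoising-stats artifact with something along these lines; the file names are just placeholders:)

qiime metadata tabulate \
  --m-input-file denoising-stats.qza \
  --o-visualization denoising-stats.qzv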

Hi,

1. No, the samples are there, but the results are zero.
2. The output is blank.

Thank you.

denoising_stats.qzv (1.2 MB)

Thanks @shinseung,

This one is a bit odd for me too but here are some thoughts on the matter:
When looking at your stats-summary, we see that you are losing a significant fraction of your reads at the initial filtering step in both runs. There are a lot of reasons this can happen, but a few common ones are: ambiguous nts in your reads (look for Ns in your raw fastq files; reads containing them get discarded, and when present they especially tend to appear at the beginning of reads); non-biological sequences at your 5' ends (remove these prior to dada2 if you haven't already; see the sketch below); and maybe even that initial dip in quality in your forward reads. I would recommend trimming more than 10 bp from your 5' end, say up to 20 bp.
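
If primers are still on the reads, q2-cutadapt can remove them; a sketch (the primer sequences shown are the common 341F/805R pair, so please double-check them against your own protocol, and the file names are placeholders):

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux.qza \
  --p-front-f CCTACGGGNGGCWGCAG \
  --p-front-r GACTACHVGGGTATCTAATCC \
  --o-trimmed-sequences demux-trimmed.qza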

Your truncation parameters look OK to me, but you might be better off truncating a bit more from your reverse reads. With your primers you have about 140 bp of overlap, and leaving about 30 bp as a safety margin for merging, you can afford to truncate up to ~110 bp in total across both reads, say truncate at 230 on the reverse and 265 on the forward.
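
To sanity-check the remaining overlap for any pair of truncation lengths, a quick back-of-the-envelope check (a sketch; the ~465 bp amplicon length including primers is an assumption for 341F/805R):

# overlap left for merging = trunc_f + trunc_r - amplicon length
AMPLICON=465   # assumed approx. 341F-805R amplicon length incl. primers
TRUNC_F=265
TRUNC_R=230
echo $(( TRUNC_F + TRUNC_R - AMPLICON ))   # prints 30 (bp left for merging)
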
When you truncate at a greater length, which lets longer reads pass through, you actually risk having those reads discarded in the initial filter, since they are more likely to contain stretches of really poor q scores that get them filtered out. The odd thing is that you mention your 280 bp trunc run is performing better, which is the opposite of what I would expect.
Could you share the stats-summary from the second run as well, please, so we can look into that a bit more carefully?
Thanks!


Thank you.

  1. Could there be many Ns in the fastq files? Is there a parameter setting related to this? If a read contains an N, will it be discarded?

  2. If trunc-len is set to 260, are reads filtered out by q score before truncation? Isn't it possible to keep reads unconditionally up to 260 bp, even if the q score is low?

Please answer my questions.

Thank you.

Hi @shinseung,
The initial filtering step in DADA2 does a few things:

  1. Look for PhiX reads and discard them, if that hasn't already been done by your sequencing facility. Usually this shouldn't be more than 2-20% of all your reads, depending on your facility and run, so this alone doesn't explain your big loss at this step.
  2. Trim and truncate
  3. Remove reads with ambiguous basecalls. Any sequence that contains an ambiguous nt is discarded. This happens when the sequencer couldn't call a C, G, T, or A with any level of certainty, so the nt is left as an N. There is no parameter setting for this in q2-dada2; any read with even a single N will be discarded. (In the native version of dada2 you can change this with maxN, but not in the q2 version.)
    This might be one step to look further into: check one of your fastq files to count how many sequences contain an N. Something like this might work:
# sequence lines sit on every 4th line, starting at line 2
awk 'NR % 4 == 2' test.fastq | grep -i "N" | wc -l

(I'm sure there are more elegant ways of doing this ^)
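
For example, a single awk pass over just the sequence lines does the same count (a sketch):

awk 'NR % 4 == 2 && toupper($0) ~ /N/ {c++} END {print c+0}' test.fastq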

  4. Discard reads that don't meet the minimum parameters based on expected errors, q score, and length.

To be honest, I'm not sure exactly what order all these steps run in. If I had to guess, I would think remove PhiX, then trim/truncate, then filter, but :man_shrugging: However, I do know at least that q-score filtering occurs after trimming, since we always see an improvement in the initial filtering when we get rid of those low-quality tails; in fact, that is one main reason why we recommend trimming away as many low-quality nts as possible.

