Losing reads in filtering step

Minghong_Jiang · October 18, 2022, 10:09pm

Hi,
I am a newbie in microbiome analysis. Can someone help me choose the correct parameters for the DADA2 step? My dataset is 16S (V4 region), amplified with the 515F forward and 806R reverse primers. Before starting the QIIME 2 pipeline, I used cutadapt to remove the primer sequences and BBMap's "repair.sh" to fix disordered paired reads.
In the DADA2 step, this is the code I used. I have tried different parameters, and so far this is my best result. I would like to increase the retained reads to about 80% if possible, but I don't know how to get there without trimming the reads too much.

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs paired-end-demux.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 190 \
  --p-trunc-len-r 100 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza

paired-end-demux.qzv (296.5 KB)
denoising-stats.qzv (1.2 MB)

colinbrislawn · October 19, 2022, 3:02pm

Hello @Minghong_Jiang,

Welcome to the forums!

Thank you for posting your read quality scores and dada2 stats. Your read quality looks great!

To get more of your reads to pass the filter, you can trim to a shorter length as you have done, and you can also increase the --p-max-ee-f and --p-max-ee-r settings to allow reads with more expected errors. The default is 2, but increasing it to 3 could let more of your reads pass the filter and join, hopefully up to the 80% you mentioned.
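
For anyone wondering what "expected errors" means here: it is the sum of the per-base error probabilities implied by the Phred scores, where a score Q corresponds to a probability of 10^(-Q/10). Below is a rough shell sketch of that calculation, not DADA2's own code. The function names and the example scores are made up for illustration.

```shell
# Expected errors (EE) of one read: sum of per-base error probabilities,
# where Phred score Q means an error probability of 10^(-Q/10).
# Input: whitespace-separated Phred scores (hypothetical values, not from this thread).
expected_errors() {
  printf '%s\n' "$1" | awk '{
    ee = 0
    for (i = 1; i <= NF; i++) ee += 10 ^ (-$i / 10)
    printf "%.2f\n", ee
  }'
}

# Succeeds (exit status 0) when the read's EE is at or below the threshold,
# mirroring what a max-EE filter keeps.
passes_max_ee() {
  # $1: Phred scores, $2: max EE threshold
  ee=$(expected_errors "$1")
  awk -v ee="$ee" -v max="$2" 'BEGIN { exit !(ee <= max) }'
}

# Ten Q20 bases: 10 * 0.01 = 0.10 expected errors
expected_errors "20 20 20 20 20 20 20 20 20 20"
```

Low-quality tails add to this sum quickly, which is why both truncating reads and raising the max-EE threshold let more reads through.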

Given the quality of your reads, you could also increase your --p-trunc-len-* settings to keep more of each read, and therefore more overlap, for merging.
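
As a rough check on how much overlap a pair of truncation lengths leaves, you can subtract the amplicon length from the sum of the two truncated reads. This is only a sketch: the 253 nt amplicon length is an assumption (a commonly quoted size for primer-trimmed 515F/806R V4), not something measured from this dataset, and `merge_overlap` is a hypothetical helper name.

```shell
# Overlap available for merging = truncated forward + truncated reverse - amplicon length.
# The amplicon length is an ASSUMED value (~253 nt for primer-trimmed V4); check your own data.
merge_overlap() {
  # $1: --p-trunc-len-f, $2: --p-trunc-len-r, $3: amplicon length in nt
  echo $(( $1 + $2 - $3 ))
}

merge_overlap 190 100 253   # truncation lengths from the original post; prints 37
```

DADA2's merging step needs a minimum overlap (12 nt by default in `mergePairs`), so a result close to or below that would suggest the truncation is too aggressive.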

There are lots of DADA2 settings to try. Let us know what you find!

Minghong_Jiang · October 21, 2022, 4:57pm

Thank you so much! I actually went all the way back to the cutadapt step and loosened my quality parameter a little, and on top of that I also set
--p-max-ee-f 3
--p-max-ee-r 5
Now more reads pass the DADA2 step.

Ming
metadata.txt (8.0 KB)
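
For readers following along, the full command with these settings might look like the sketch below. It assumes the truncation lengths and file names from the first post were kept unchanged, which the thread does not confirm. The `denoise_paired` wrapper is a hypothetical convenience. Only the flags and file names come from the posts above.

```shell
# Hypothetical wrapper around the command from the first post,
# with the max-EE thresholds exposed as arguments.
denoise_paired() {
  # $1: max expected errors for forward reads, $2: for reverse reads
  qiime dada2 denoise-paired \
    --i-demultiplexed-seqs paired-end-demux.qza \
    --p-trim-left-f 0 \
    --p-trim-left-r 0 \
    --p-trunc-len-f 190 \
    --p-trunc-len-r 100 \
    --p-max-ee-f "$1" \
    --p-max-ee-r "$2" \
    --o-table table.qza \
    --o-representative-sequences rep-seqs.qza \
    --o-denoising-stats denoising-stats.qza
}

# denoise_paired 3 5   # the thresholds reported above
```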

colinbrislawn · October 21, 2022, 5:03pm

OK! I'm glad you got the result you wanted.

I did notice this setting in your last post (--p-max-ee-r 5):

That's a lot of errors to allow in a read.
5/250 == 2% expected difference between the observed read and the true sequence.

In combination with your reverse truncation length (--p-trunc-len-r 100):

5/100 == 5% of the bases in a read could be wrong, and it would pass this filter.

Is this what you want?
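
The arithmetic above can be reproduced with a small helper if you want to compare other settings. This is just a sketch, and `max_error_pct` is a made-up name. It reports the expected error rate of a read sitting exactly at the max-EE threshold.

```shell
# Expected per-base error rate (in percent) of a read whose expected
# errors equal the max-EE threshold.
max_error_pct() {
  # $1: max expected errors, $2: read length in nt
  awk -v ee="$1" -v len="$2" 'BEGIN { printf "%.1f\n", 100 * ee / len }'
}

max_error_pct 5 250   # prints 2.0
max_error_pct 5 100   # prints 5.0
max_error_pct 2 100   # prints 2.0 (the default threshold at the same length)
```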

system · November 21, 2022, 11:04pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.