What is an acceptable percentage of merged reads?


Could anyone tell me what an acceptable percentage of merged reads is after denoising paired-end 16S rRNA sequences (2x300 bp)?

I am analyzing 16S rRNA sequences in QIIME 2. They were sequenced on an Illumina MiSeq, and I am working with the FASTQ files. First, I removed the primers with cutadapt, and then I denoised with DADA2, trying different trim/trunc values to give 14-56 nt of overlap.
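As a rough check on overlap numbers like these, here is a minimal Python sketch of how the overlap follows from the truncation lengths. It assumes the primers have already been removed, that `amplicon_len` is the length of the primer-free amplicon (the 420 nt used below is a hypothetical value, not taken from this thread), and that both reads start at the amplicon ends. The function name is made up for illustration.

```python
def expected_overlap(amplicon_len: int, trunc_len_f: int, trunc_len_r: int) -> int:
    """Estimate the forward/reverse overlap (nt) after truncation.

    Assumes the forward read covers amplicon positions [0, trunc_len_f)
    and the reverse read covers [amplicon_len - trunc_len_r, amplicon_len).
    5' trim-left values are ignored because they normally remove bases far
    from the overlap region. A negative result means the reads cannot overlap.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


# Hypothetical primer-free amplicon length; check the expected length for your own primers.
AMPLICON_LEN = 420

for trunc_f, trunc_r in [(250, 200), (270, 210), (280, 190)]:
    print(trunc_f, trunc_r, expected_overlap(AMPLICON_LEN, trunc_f, trunc_r))
```

DADA2's merging step needs some minimum overlap (12 nt by default in the R function `mergePairs`), so a setting that leaves only about 14 nt has little margin, especially because 16S amplicon lengths vary between taxa.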

The DADA2 denoising stats (stat-dada2) then showed that the percentage of input merged ranged from 40 to 55%. Is this acceptable? Or, more generally, what is an acceptable percentage of input merged?

Many thanks.

Please see more details below:


Hello Tararag,

Welcome to the forums!

This is a great post! I always find parameter sweeps very helpful.

One goal is to keep the most data possible. The other goal is to keep only the highest quality data. So it's a tradeoff between quantity and quality, hopefully with a sweet spot in the middle.

The trim/trunc step:
I choose these settings by looking at the graph of quality scores in my imported data. Did you use your graph to choose these settings?
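One simple way to turn a quality plot into a truncation position is to find the first read position where the median quality drops below a chosen cutoff. The sketch below assumes you already have the Phred scores observed at each position (parsing the FASTQ files is not shown, and the toy profile is made up). `pick_trunc_len` is a hypothetical helper, not part of QIIME 2.

```python
from statistics import median


def pick_trunc_len(per_position_scores, min_median_q=20):
    """Return a truncation length based on per-position median quality.

    per_position_scores: list where element i holds the Phred scores
    observed at read position i (0-based) across sampled reads.
    Returns the number of leading positions kept before the median
    first drops below min_median_q.
    """
    for pos, scores in enumerate(per_position_scores):
        if median(scores) < min_median_q:
            return pos  # keep positions 0..pos-1, i.e. truncate to length pos
    return len(per_position_scores)  # median never dropped below the cutoff


# Toy profile (made up): 5 high-quality positions, 3 middling, 2 poor.
profile = [[38, 37, 36]] * 5 + [[30, 22, 18]] * 3 + [[15, 12, 20]] * 2

print(pick_trunc_len(profile, 20))
print(pick_trunc_len(profile, 25))
```

With this toy profile, a cutoff of 20 keeps 8 positions while a cutoff of 25 keeps only 5. A stricter cutoff gives shorter reads, which in turn means less overlap. A single dip at one position would also trigger this rule, so it is worth checking the plot by eye as well.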

The filtering step:
As you have noticed, trimming has a small effect on the percentage of reads that pass the filter. The dada2 settings that set the filter threshold are --p-max-ee-f and --p-max-ee-r. Did you use the default of 2?
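The max-ee filter works on expected errors. Each Phred score Q corresponds to an error probability of 10^(-Q/10), and DADA2 discards a read (after truncation) if the sum of these probabilities over its bases exceeds the threshold. A minimal sketch, with hypothetical function names:

```python
def expected_errors(quals):
    """Expected number of errors in a read: sum of 10 ** (-Q / 10) over its bases."""
    return sum(10 ** (-q / 10) for q in quals)


def passes_max_ee(quals, max_ee=2.0):
    """Keep the read only if its expected errors do not exceed max_ee."""
    return expected_errors(quals) <= max_ee


print(passes_max_ee([30] * 250))  # ~0.25 expected errors
print(passes_max_ee([20] * 250))  # ~2.5 expected errors
print(passes_max_ee([20] * 150))  # ~1.5 expected errors
```

This is why the truncation position matters for filtering. 250 bases at Q30 add up to about 0.25 expected errors, but 250 bases at Q20 add up to about 2.5 and would fail a threshold of 2.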

The merging step:
This is less of a tradeoff: you want all your reads to join, and as you can see, most of the reads that pass the filter are joined!
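To see where reads are lost, it can help to express each step of the denoising stats both as a percentage of input and as a percentage of the previous step. The counts below are made up for illustration, and `retention_report` is a hypothetical helper. The step names follow the count columns of the DADA2 stats table (input, filtered, denoised, merged, non-chimeric).

```python
def retention_report(counts):
    """counts: ordered dict of step name -> read count, starting with 'input'.

    Returns {step: (percent_of_input, percent_of_previous_step)}.
    """
    steps = list(counts.items())
    total = steps[0][1]
    report = {}
    prev = total
    for name, n in steps:
        of_input = round(100 * n / total, 1) if total else 0.0
        of_prev = round(100 * n / prev, 1) if prev else 0.0
        report[name] = (of_input, of_prev)
        prev = n
    return report


# Made-up counts for one sample (not from this thread).
sample = {"input": 10000, "filtered": 5200, "denoised": 5000,
          "merged": 4700, "non-chimeric": 4500}

for step, (of_input, of_prev) in retention_report(sample).items():
    print(f"{step:>13}: {of_input:5.1f}% of input, {of_prev:5.1f}% of previous step")
```

In this made-up sample only 47% of input reads merge, yet 94% of the denoised reads merge, so most of the loss would come from the filter step rather than from merging.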


Many thanks, Colin.

  1. Yes, I determined the trim/trunc values from the quality plot, using a Phred score of 20 as the cutoff. With a higher quality cutoff, the truncated reads are shorter and do not overlap, so I picked a quality score of 20.

  2. I did not set '--p-max-ee-f' or '--p-max-ee-r' in the denoise step, so I think they are at the default.
