Could anyone tell me what an acceptable percentage of merged reads is after denoising paired-end 16S rRNA sequences (2x300 bp)?
I am analyzing 16S rRNA sequences with QIIME 2. The samples were sequenced on an Illumina MiSeq, and I am working from the FASTQ files. First, I removed the primers with cutadapt, then I denoised with DADA2, trying different trim/trunc values that give an overlap of 14 to 56 bases.
The DADA2 stats (stat-dada2) then showed that the percentage of input merged ranged from 40% to 55%. Is this acceptable? If not, what is an acceptable percentage of input merged?
This is a great post! I always find parameter sweeps very helpful.
One goal is to keep the most data possible. The other goal is to keep only the highest quality data. So it's a tradeoff between quantity and quality, hopefully with a sweet spot in the middle.
The trim/trunc step:
I choose these settings by looking at the graph of quality scores in my imported data. Did you use your graph to choose these settings?
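To make the overlap arithmetic concrete, here is a rough sketch of how the truncation lengths translate into overlap. The amplicon length of 420 and the truncation pairs are hypothetical values chosen for illustration, not taken from this dataset, and the minimum of 12 is an assumption based on the default `minOverlap` of DADA2's `mergePairs`. It also ignores natural length variation between taxa.

```python
# Rough overlap estimate for paired-end reads after truncation.
# All numbers below are hypothetical; substitute your own values.

def expected_overlap(trunc_len_f: int, trunc_len_r: int, amplicon_len: int) -> int:
    """Bases shared by the forward and reverse reads once both are truncated."""
    return trunc_len_f + trunc_len_r - amplicon_len

MIN_OVERLAP = 12    # assumed minimum, based on the mergePairs default in DADA2
AMPLICON_LEN = 420  # hypothetical primer-free amplicon length

for trunc_f, trunc_r in [(280, 200), (260, 174), (250, 170)]:
    overlap = expected_overlap(trunc_f, trunc_r, AMPLICON_LEN)
    status = "ok" if overlap >= MIN_OVERLAP else "too short to merge"
    print(f"trunc-len-f={trunc_f} trunc-len-r={trunc_r} overlap={overlap} ({status})")
```

If the longest amplicons in the sample fall below the minimum, those reads cannot merge no matter how good their quality is, so it helps to leave some margin.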
The filtering step:
As you have noticed, trimming has a small effect on the percentage of reads that pass the filter. The dada2 settings that set the filter threshold are --p-max-ee-f and --p-max-ee-r. Did you use the default of 2?
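For anyone wondering what the max-ee threshold measures: the expected errors of a read are the sum of its per-base error probabilities, where a Phred score Q corresponds to an error probability of 10^(-Q/10). Below is a minimal sketch of that calculation. The function names and the flat-quality example reads are made up for illustration; this is not the DADA2 code itself.

```python
# Expected errors (EE) of a read from its Phred quality scores.
# Function names and example reads are hypothetical, for illustration only.

def expected_errors(phred_scores):
    """Sum of per-base error probabilities, p = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in phred_scores)

def passes_filter(phred_scores, max_ee=2.0):
    """A read is kept when its expected errors do not exceed max_ee."""
    return expected_errors(phred_scores) <= max_ee

# Two idealized 250-base reads with flat quality profiles.
read_q30 = [30] * 250  # error probability 0.001 per base
read_q20 = [20] * 250  # error probability 0.01 per base

for name, read in [("Q30", read_q30), ("Q20", read_q20)]:
    print(name, round(expected_errors(read), 2), passes_filter(read))
```

A read sitting at Q20 along its whole length gains about one expected error per 100 bases, so longer truncation lengths push more reads over a max-ee of 2.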
The merging step:
This is less of a tradeoff: you want all your reads to join, and as you can see, most of the reads that pass the filter are joined!
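One way to see where reads are lost is to compare the merged count against both the input count and the filtered count. Here is a small sketch with made-up counts (they are not from the stats in this thread); the function and key names are hypothetical.

```python
# Compare merging success against the input and against the filtered reads.
# The counts are hypothetical example values.

def merge_rates(n_input: int, n_filtered: int, n_merged: int) -> dict:
    """Percentage of merged reads relative to input and to filtered reads."""
    return {
        "pct_of_input_merged": 100 * n_merged / n_input,
        "pct_of_filtered_merged": 100 * n_merged / n_filtered,
    }

rates = merge_rates(n_input=100_000, n_filtered=60_000, n_merged=50_000)
for key, value in rates.items():
    print(f"{key}: {value:.1f}%")
```

In this made-up sample only half of the input merges, yet most of the reads that survive filtering do merge, which points at the filter rather than the merging step as the place where reads are lost.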
Yes, I determined the trim/trunc values from the quality plot at Phred = 20. At a higher quality score threshold, the truncated reads were shorter and did not overlap, so I chose a quality score of 20.
I did not set '--p-max-ee-f' and '--p-max-ee-r' in the denoise step, so I think the defaults were used.