Hi everyone,
I am currently analyzing 16S rRNA gene sequences (V3-V4 region) using QIIME2. I am facing a challenge during the denoising step with dada2 denoise-paired.
The problem is that my sequencing quality drops significantly at around 220 bp in the forward (Fw) reads and 190 bp in the reverse (Rv) reads. If I truncate there, the combined high-quality length is 410 bp (220 + 190), which leaves me approximately 70 bp short of the 20 bp overlap required for merging.
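The overlap arithmetic above can be written out as a small Python sketch. The amplicon length is an assumption: a 70 bp shortfall against a 20 bp minimum overlap implies that the region spanned by the trimmed reads is roughly 460 bp (410 + 70 - 20), so that value is used here. Substitute your own. `expected_overlap` and `min_combined_length` are hypothetical helper names, not QIIME 2 functions.

```python
def expected_overlap(trunc_len_f: int, trunc_len_r: int, amplicon_len: int) -> int:
    """Bases shared by the truncated forward and reverse reads.

    A negative value means the reads leave a gap and cannot be merged.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


def min_combined_length(amplicon_len: int, min_overlap: int) -> int:
    """Smallest trunc_len_f + trunc_len_r that still leaves `min_overlap` bases of overlap."""
    return amplicon_len + min_overlap


# Assumed amplicon length (~460 bp), inferred from the 70 bp shortfall in the question
AMPLICON_LEN = 460

print(expected_overlap(220, 190, AMPLICON_LEN))              # -50, i.e. a 50 bp gap
print(min_combined_length(AMPLICON_LEN, 20))                 # 480
print(min_combined_length(AMPLICON_LEN, 20) - (220 + 190))   # 70 bp still missing
```

The same two lines answer the practical question for any candidate truncation pair: it can only merge if the combined length reaches the amplicon length plus the minimum overlap.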
So, my questions are:
- If I prioritize sequence quality and truncate at 220/190 bp, the reads will not overlap. Is there any reliable evidence or benchmark on taxonomic resolution (specifically at the species level) when using non-overlapping paired-end reads (e.g., with N-padding or concatenation)?
- In this scenario, would you recommend proceeding with the Fw reads only (denoise-single) to maintain high quality, or is it better to accept lower-quality bases to force an overlap?
- Are there any alternative strategies within QIIME 2 for handling V3-V4 samples with poor distal quality without losing too much taxonomic depth?
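For context on the concatenation option mentioned in the first question, here is a minimal Python sketch of what joining a non-overlapping pair with N-padding means: the forward read, a run of Ns, then the reverse-complemented reverse read. This mirrors what the DADA2 R function `mergePairs` does with `justConcatenate=TRUE` (which uses a 10 N spacer); the helper names below are hypothetical, and the sketch only handles uppercase A/C/G/T/N.

```python
# Translation table for complementing uppercase DNA (N stays N)
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (ACGTN only)."""
    return seq.translate(COMPLEMENT)[::-1]


def join_with_n_padding(fwd: str, rev: str, n_pad: int = 10) -> str:
    """Join a non-overlapping read pair as fwd + 'N' * n_pad + revcomp(rev).

    `rev` is the reverse read as sequenced, so it is reverse-complemented
    to put both halves in the forward orientation.
    """
    if n_pad < 0:
        raise ValueError("n_pad must be non-negative")
    return fwd + "N" * n_pad + reverse_complement(rev)


print(join_with_n_padding("ACGTAC", "AAACCG", n_pad=4))  # ACGTACNNNNCGGTTT
```

The Ns only mark that the middle of the amplicon is unknown; they carry no sequence information, so any taxonomic signal in the gap is lost.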
Thanks in advance for the help!
Gorka
Hi @Gorka_Garcia, although it's a little more work, I would try both approaches: a paired-end read analysis that is more permissive of the low-quality bases (and so truncates later, if at all), and in parallel a single-end read analysis of the forward reads. Then compare whether you see differences in downstream results that affect the conclusions you draw from the data. You could always present one set of results as your main analysis and support it with the other as a supplementary analysis.
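One simple way to check whether the two analyses disagree is to compare each sample's taxonomic profile between them, for example after collapsing both feature tables to genus level (`qiime taxa collapse`) and exporting them. The Python sketch below computes the Bray-Curtis dissimilarity between two such profiles for one sample. The choice of Bray-Curtis, the helper names, and the example counts are all illustrative assumptions, not part of either pipeline.

```python
def relative_abundance(counts: dict[str, float]) -> dict[str, float]:
    """Convert raw counts for one sample into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no counts")
    return {taxon: c / total for taxon, c in counts.items()}


def bray_curtis(a: dict[str, float], b: dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: 0 = identical profiles, 1 = no shared taxa."""
    taxa = set(a) | set(b)
    num = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    den = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    return num / den if den else 0.0


# Hypothetical genus-level counts for the same sample from the two analyses
paired = relative_abundance({"taxon_A": 60, "taxon_B": 30, "taxon_C": 10})
single = relative_abundance({"taxon_A": 55, "taxon_B": 35, "taxon_C": 10})
print(round(bray_curtis(paired, single), 3))  # 0.05
```

Values close to 0 across samples suggest the choice of read layout matters little for the conclusions; large or systematic differences are worth a closer look.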
If you share the .qzv with the quality score plots, we could take a look at that too and advise on whether we'd do anything differently.
Hi @gregcaporaso. These are the quality score plots:
fastq_trimmed.qzv
I greatly appreciate the help!
Gorka
Hi @Gorka_Garcia, these data look pretty good to me. I wouldn't worry about a single-end read analysis; I'd just go with the paired-end read analysis and use higher truncation values than you were originally planning. It's true that the quality starts to drop, but it is still very high. A reasonable place to truncate is where the quality drops below Q30 and stays there for a few bases in a row.
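The "below Q30 for a few bases in a row" rule of thumb can be expressed as a small Python sketch. It assumes you already have per-position median quality scores for one read direction (for example, read off the interactive quality plot in the .qzv); how you extract them is left open. The threshold of 30 comes from the advice above, while the run length of 3 is an assumed reading of "a few bases"; `suggest_trunc_len` is a hypothetical helper.

```python
def suggest_trunc_len(median_q: list[float], threshold: float = 30.0, run: int = 3) -> int:
    """Return a truncation length that stops just before the first run of
    `run` consecutive positions with quality below `threshold`.

    Isolated dips shorter than `run` are ignored. If no such run exists,
    the full read length is returned.
    """
    if run < 1:
        raise ValueError("run must be >= 1")
    streak = 0
    for i, q in enumerate(median_q):
        if q < threshold:
            streak += 1
            if streak == run:
                # 0-based index where the low-quality run starts; keeping
                # positions [0, start) means a truncation length of `start`
                return i - run + 1
        else:
            streak = 0
    return len(median_q)


# Hypothetical profile: a single dip at index 5, then a sustained drop from index 7
q = [38.0] * 5 + [29.0] + [36.0] + [28.0, 27.0, 26.0] + [25.0] * 3
print(suggest_trunc_len(q))  # 7
```

Apply it separately to the forward and reverse profiles, then check that the resulting pair still leaves enough overlap for merging before running denoise-paired.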