Losing a lot of reads during merging

Greetings. I'm new to QIIME2, so there is still a lot I need to learn, and I appreciate any help I can get.

Currently, I'm running a 16S analysis in QIIME2, following the Moving Pictures tutorial, to learn the workflow and what each parameter does at each step.

The data I'm using is from this article (V3-V4, paired-end 16S, 2x250 bp): The pesticides carbofuran and picloram alter the diversity and abundance of soil microbial communities - PMC

The problem I'm running into is this:
During DADA2, a lot of reads are lost at the merging step. I have already read forum threads about this, but to be honest, I'm still lost.

This is the quality plot after importing:

These are all the DADA2 runs I tried (I mainly varied the truncation lengths, --p-trunc-len-f/--p-trunc-len-r, since they seem to be the main reason I lose so many reads):
Ver 1 (Forward: 217; Reverse: 196)

Ver 2 (Forward: 250; Reverse: 200)

Ver 3 (Forward: 250; Reverse: 250)

Ver 4 (Forward: 229; Reverse: 205)

Across all four versions, few reads survive the merging step; only Ver 3 retains around 20%.
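To see exactly where reads drop out, the per-sample counts in the DADA2 denoising stats can be compared step by step. Below is a minimal Python sketch, assuming the stats artifact has been exported with `qiime tools export` to a tab-separated file whose header includes `sample-id`, `denoised` and `merged` columns, possibly followed by a `#q2:types` row. Those column names are an assumption about the export layout, and `merge_retention` is a hypothetical helper, not part of QIIME 2.

```python
import csv
import io


def merge_retention(stats_tsv: str) -> dict[str, float]:
    """Fraction of denoised read pairs that went on to merge, per sample.

    ASSUMPTION: `stats_tsv` is the text of the exported DADA2 denoising
    stats, tab-separated, with 'sample-id', 'denoised' and 'merged'
    columns and possibly a '#q2:types' row right after the header.
    """
    retention = {}
    for row in csv.DictReader(io.StringIO(stats_tsv), delimiter="\t"):
        sample = row["sample-id"]
        if sample.startswith("#"):  # skip '#q2:types' and comment rows
            continue
        denoised = float(row["denoised"])
        merged = float(row["merged"])
        # Guard against samples where nothing survived denoising.
        retention[sample] = merged / denoised if denoised else 0.0
    return retention
```

Calling `merge_retention(open("stats.tsv").read())` on the exported file (the file name is assumed) gives one fraction per sample; values far below 1 would point at merging, rather than filtering or denoising, as the bottleneck.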

I have also read that this problem usually stems from too little overlap between the reads after truncation. But in Ver 3 I essentially didn't truncate at all, and I still retained few reads.
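A quick way to test the overlap explanation against all four runs is to apply overlap = forward length + reverse length - amplicon length to each pair of truncation lengths. The sketch below assumes the ~444 bp amplicon estimated later in this thread, no `--p-trim-left-f`/`--p-trim-left-r` trimming, and a 12 bp minimum overlap; `expected_overlap` is a hypothetical helper.

```python
# Expected read-pair overlap for the four DADA2 runs in this post.
# ASSUMPTIONS: a 444 bp amplicon (the estimate worked out later in this
# thread), no --p-trim-left-f/-r trimming, and a 12 bp minimum overlap.
AMPLICON_BP = 444
MIN_OVERLAP_BP = 12

runs = {
    "Ver 1": (217, 196),
    "Ver 2": (250, 200),
    "Ver 3": (250, 250),
    "Ver 4": (229, 205),
}


def expected_overlap(trunc_f: int, trunc_r: int, amplicon_bp: int) -> int:
    """Forward length + reverse length - amplicon length; negative means a gap."""
    return trunc_f + trunc_r - amplicon_bp


for name, (trunc_f, trunc_r) in runs.items():
    ov = expected_overlap(trunc_f, trunc_r, AMPLICON_BP)
    verdict = "can merge" if ov >= MIN_OVERLAP_BP else "cannot merge"
    print(f"{name}: {ov:+d} bp -> {verdict}")
```

Under these assumptions it prints -31, +6, +56 and -10 bp for Ver 1 to Ver 4, so only Ver 3 clears the 12 bp threshold, which would be consistent with Ver 3 being the only run that kept a noticeable share of reads. Because truncation only ever shortens reads, any run whose two truncation lengths sum to less than the amplicon length cannot merge at all.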

I have also read about the impact of these parameters in "Quality-filtering vastly improves diversity estimates from Illumina amplicon sequencing" (Nature Methods), in "DADA2: Fast and accurate sample inference from amplicon data with single-nucleotide resolution", and in the denoise-paired documentation ("denoise-paired: Denoise and dereplicate paired-end sequences", QIIME 2 2024.10.1 documentation).


Welcome to the forums! :qiime2:

This is a great first post! I really like your approach of trying multiple joining settings and seeing what works best.

Ver 3 (Forward: 250; Reverse: 250)

This result is fascinating, as it shows that merging works best with the longest reads you have.

You may have found this already, but I bet the length of the V3-V4 region is causing problems.

How many base pairs long do you expect this amplicon to be?
With 2x250 bp reads, how much overlap do you expect with no trimming?
(here is how to calculate the length of overlap during merging)


Hi @colinbrislawn, thank you for your reply!

I reread the article: the primers they used were S-D-Bact-0341-b-S-17 and S-D-Bact-0785-a-A-21.
I have tried to calculate the overlap using the formula you provided:

(forward read) + (reverse read) - (length of amplicon) = overlap

=> (250-5) bp + (250-5) bp - (785-341) bp = 46 bp

With 2x250 bp reads, I think the overlap will be around 50 bp with no trimming. From what I have read, the overlap should be at least about 20 bp; is that correct?
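One assumption worth double-checking before choosing truncation lengths is the amplicon length itself. If the numbers in the primer names are E. coli start positions and the suffixes are primer lengths (17 and 21 nt), which is my reading of the naming scheme rather than something stated in this thread, then 785 - 341 = 444 bp measures from primer start to primer start and leaves out the 21 nt of the reverse primer. The Python sketch below, with hypothetical names, shows how far the expected overlap moves between the two readings, using the 245 bp per read from the calculation above and the untruncated 250 bp.

```python
# How the expected overlap depends on the assumed amplicon length.
# ASSUMPTION: the primer names encode E. coli start positions (341, 785)
# and primer lengths (17 nt, 21 nt).
FWD_START = 341
REV_START, REV_LEN = 785, 21


def overlap(read_f: int, read_r: int, amplicon_bp: int) -> int:
    """Forward length + reverse length - amplicon length."""
    return read_f + read_r - amplicon_bp


# Primer start to primer start, as in the calculation above.
span_start_to_start = REV_START - FWD_START             # 444 bp
# Also counting the reverse primer's 21 nt (primers still in the reads).
span_with_rev_primer = REV_START + REV_LEN - FWD_START  # 465 bp

for reads in [(245, 245), (250, 250)]:
    for span in (span_start_to_start, span_with_rev_primer):
        print(f"reads {reads}, amplicon {span} bp: overlap {overlap(*reads, span)} bp")
```

If the 465 bp reading is closer to the truth, the untruncated overlap is nearer 35 bp than 56 bp, and a truncation plan built on 444 bp would leave about 21 bp less overlap than intended. Removing the primers first does not change this: 233 + 229 - 427 also gives 35 bp.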


Very cool!

Okay, let's see whether truncating the reads so that the overlap is only 20 bp, or even 12 bp (which I think is the DADA2 default now), helps.
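Working backwards from a target overlap, the two truncation lengths need to add up to amplicon length + target overlap. The Python sketch below keeps the forward read at full length and takes all of the shortening from the reverse read, on the assumption that the reverse reads are the lower-quality ones, as is common on Illumina (the quality plot is not reproduced here). `trunc_lengths_for_overlap` is hypothetical, 12 bp is assumed to match the q2-dada2 `--p-min-overlap` default, and the amplicon length is passed in explicitly because it is the least certain input.

```python
def trunc_lengths_for_overlap(amplicon_bp: int, target_overlap: int,
                              read_len: int = 250) -> tuple[int, int]:
    """Return (trunc_len_f, trunc_len_r) giving `target_overlap` bp of overlap.

    Keeps the forward read at full length and shortens only the reverse read
    (ASSUMPTION: the reverse read is the lower-quality one).
    """
    needed = amplicon_bp + target_overlap  # trunc_f + trunc_r must reach this
    trunc_f = read_len
    trunc_r = needed - trunc_f
    if trunc_r > read_len:
        raise ValueError("reads too short for this overlap even without truncation")
    if trunc_r <= 0:
        raise ValueError("forward read alone spans the amplicon")
    return trunc_f, trunc_r


# Assumed amplicon lengths: 444 bp from this thread, 465 bp if the
# reverse primer is counted; target overlaps of 20 bp and 12 bp.
for amplicon in (444, 465):
    for target in (20, 12):
        print(amplicon, target, trunc_lengths_for_overlap(amplicon, target))
```

With 444 bp this suggests roughly (250, 214) for a 20 bp overlap and (250, 206) for 12 bp; with 465 bp the reverse read would need to stay at 235 or 227. Aiming exactly at the target leaves no slack for amplicons that run a little longer than assumed, so erring toward a longer reverse truncation is probably the safer choice.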
