Merging, Quality control and overlapping

Hello to everyone.

I have some questions related to qiime2 and I would appreciate it if you can help me out please.

First of all, part of my question is similar to an old post on the QIIME forum (problem with merging step in qiime2 with dada2), but since those explanations weren’t clear to me, I will ask again.

I will try to give as much detail as I can. I am analyzing 16S data and I have 48 paired-end samples. The primers are 16S-341F and 16S-805R.

My questions are:

1- After quality control and denoising I didn’t get a good result. I think the problem is in “merging”. I also did quality control and denoising for a totally different 16S dataset, and again the problem was with the merging step, so I didn’t get enough non-chimeric data.

The command that I used and the screenshot are the following:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 2 \
  --p-trunc-len-f 216 \
  --p-trunc-len-r 218 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising_stats.qza

2- Can anyone explain the concept of merging and how it works, please? It is not that clear to me.

3- How can I tell how much overlap we have?

P.S: I do not want to remove any reverse reads. I want to keep them.

Thank you.

Armin

Hello Armin,

Thank you for posting this really great question along with the detailed screenshots. I think you are on the right track.

denoised: 99,046, merged: 57

:scream_cat:
Yep, that’s your problem. Let’s fix it!


Sure. A pair of Illumina reads sequences your amplicon from both ends, with some overlap in the middle. Merging joins the two reads into a single sequence using that overlap.

250 bp amplicon  |-------------------------|
150 bp read      |--------------->
150 bp read                <---------------|
50 bp overlap (correct!)    ^^^^^

Here’s how I work it out:

(forward read) + (reverse read) - (length of amplicon) = overlap
150 forward + 150 reverse - 250 amplicon = 50 overlap
This matches the diagram I showed above!
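
As a minimal sketch, the same arithmetic can be wrapped in a small shell function. The name `overlap` is hypothetical (it is not part of QIIME 2 or DADA2), and the three arguments are the forward read length, the reverse read length, and the amplicon length, all in bp:

```shell
# Hypothetical helper, not part of QIIME 2 or DADA2.
# Usage: overlap FORWARD_LEN REVERSE_LEN AMPLICON_LEN
# Prints forward + reverse - amplicon (in bp).
overlap() {
    echo $(( $1 + $2 - $3 ))
}

# The example from the diagram above: two 150 bp reads on a 250 bp amplicon.
overlap 150 150 250   # prints 50
```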

For your data, here are the numbers I know so far:

216 forward + (218 - 2) reverse - (16S-341F and 16S-805R) amplicon = overlap
216 forward + 216 reverse - (805 - 341) amplicon = overlap
432 - 464 = overlap
32 = overlap

32 overlap is OK, but not a lot… especially when the quality is dropping off.
(Also the dada2 plugin requires 0 mismatches in the area of overlap, which is really strict and will cause you to lose reads. :crying_cat_face: )

I agree! Let me know if this helps answer your questions, then we can see how we can fix it!

Colin

Thank you so very much Colin,

Your answers were really great and well explained!!
So, now I have to find a way to fix the merging problem.

Thanks again,

Armin

Right.

Or, the dada2 plugin could expose the maxMismatch parameter so we could pair reads even if they did have mismatches in them, which is the real solution. This is on the todo list for January 2020! :calendar:

Colin

Hello Armin,

I have made a mistake!!

432 - 464 = -32, not positive 32. (I messed up the math!) You know what this means… :sob:

464 bp amplicon  |----------------------------------------------|
216 bp read      |-------------------->
216 bp read                               <---------------------|
32 bp gap!!!                           ^^^

Your reads are not merging because they don’t overlap at all, at that length.
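
One way to avoid this kind of sign slip is to let a script interpret the result. The sketch below uses a hypothetical helper, `describe_overlap` (not a QIIME 2 or DADA2 command), that applies the forward + reverse - amplicon formula and reports a negative result as a gap:

```shell
# Hypothetical helper, not part of QIIME 2 or DADA2.
# Usage: describe_overlap FORWARD_LEN REVERSE_LEN AMPLICON_LEN
# A positive result is an overlap; a negative result is a gap between the reads.
describe_overlap() {
    result=$(( $1 + $2 - $3 ))
    if [ "$result" -gt 0 ]; then
        echo "${result} bp overlap"
    elif [ "$result" -lt 0 ]; then
        echo "$(( 0 - result )) bp gap"
    else
        echo "reads touch but do not overlap"
    fi
}

# 216 bp forward, 216 bp reverse, 464 bp amplicon (805 - 341).
describe_overlap 216 216 464   # prints "32 bp gap"
```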

If you truncate at 260 bp on both forward and reverse (--p-trunc-len-f and --p-trunc-len-r), you will get
260 + 260 - 464 = 56 bp of overlap
or if you truncate at 240 bp you will get
240 + 240 - 464 = 16 bp of overlap

Got to try some of these settings!
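
To compare several candidate settings at once, a rough sketch like the following can help. `overlap_table` is a hypothetical helper (not part of QIIME 2); it assumes the same truncation length is used for both reads and ignores any bases removed by --p-trim-left-f or --p-trim-left-r, which would shorten the reads further:

```shell
# Hypothetical helper, not part of QIIME 2 or DADA2.
# Usage: overlap_table AMPLICON_LEN TRUNC_LEN [TRUNC_LEN ...]
# Assumes forward and reverse reads are truncated to the same length.
# A negative number means the reads leave a gap instead of overlapping.
overlap_table() {
    amplicon=$1
    shift
    for trunc in "$@"; do
        echo "trunc-len ${trunc}: $(( trunc + trunc - amplicon )) bp"
    done
}

# 464 bp amplicon (805 - 341) with the truncation lengths discussed in this thread.
overlap_table 464 216 240 260
```

Whichever length looks best on paper, the reads still need acceptable quality out to that position, so it is worth checking the quality plots before settling on a value.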

Colin

P.S. This is a reminder to myself to always double check the math :abacus:

Hi Colin,

Thanks so very much for your comments. I am going to run it now and see what the result is.

Thank you,
Armin
