Hello @cookingrice
Thanks for reaching out to me again.
(I've made this post public so that this excellent discussion can help other users. You are asking great questions!)
No, I think it's important to understand each step in your analysis so you can defend it during review. Let's dive in!
In both cases, vsearch will try to find the best overlap that it can, considering only overlaps of at least 50 bp or at least 200 bp, depending on the setting. Sometimes vsearch finds a 100% perfect alignment that's very short and chooses it over a long alignment that has a few mismatches.
Which alignment is correct? Well, we can figure out how long the overlap should be by looking at the amplicon length and expected read length. Based on this, we can tell that the longer alignment is the correct one, and we set the minimum overlap length so that vsearch will return the correct overlap.
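To make that arithmetic concrete, here is a minimal Python sketch. The function name `expected_overlap` is my own and is not part of vsearch or QIIME 2. It assumes both reads start at opposite ends of the amplicon and that primers and adapters are ignored.

```python
def expected_overlap(amplicon_len: int, fwd_read_len: int, rev_read_len: int) -> int:
    """Expected overlap (bp) between two paired reads covering one amplicon.

    Assumes each read starts at one end of the amplicon (hypothetical helper,
    not a vsearch function). A negative result means the reads are too short
    to reach each other, so they cannot be merged.
    """
    overlap = fwd_read_len + rev_read_len - amplicon_len
    # The overlap can never be longer than the amplicon or either read.
    return min(overlap, amplicon_len, fwd_read_len, rev_read_len)


# The 250 bp amplicon and 150 bp reads used in the diagrams in this post:
print(expected_overlap(250, 150, 150))  # 50
```

If the number you get is much larger than the minimum overlap setting, short spurious overlaps are still allowed, so raising the minimum toward the expected value is a reasonable thing to try.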
Great! If I remember your setup correctly, you expected nearly full 250 bp overlap, so it's good news that this setting works well.
Now let's clear up some misconceptions.
That's true... When reads overlap, they 'double check' the base pairs in the area of overlap and fix errors, leading to higher quality. But we want the overlap that's correct, not just longer.
Here's an example of correct overlap:
250 bp amplicon           |-------------------------|
150 bp read               |--------------->
150 bp read                         <---------------|
50 bp overlap (correct!)             ^^^^^
Here's what could go wrong:
250 bp amplicon           |-------------------------|
150 bp read                    |--------------->
150 bp read                    <---------------|
150 bp overlap :-0              ^^^^^^^^^^^^^^^  longer, but wrong
                                also ends of true amplicon missing??

250 bp amplicon           |-------------------------|
150 bp read             |--------------->
150 bp read                            <---------------|
20 bp overlap                          ^^  100% matching :-) but also wrong
                                       also extra base pairs outside of true amplicon??
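The 'short but perfect' failure can be reproduced with a toy example. This is only a sketch: the scoring rule here (fewest mismatches wins) is a deliberate simplification of my own and is not how vsearch actually scores alignments, and the sequences and function names are made up for illustration. The point is just that a minimum overlap length filters out the spurious short match.

```python
def candidate_overlaps(fwd: str, rev_rc: str, min_overlap: int):
    """Yield (overlap_len, mismatches) for each end-to-start overlap.

    `rev_rc` is the reverse read, already reverse-complemented.
    """
    max_len = min(len(fwd), len(rev_rc))
    for n in range(max(min_overlap, 1), max_len + 1):
        tail, head = fwd[-n:], rev_rc[:n]
        mismatches = sum(a != b for a, b in zip(tail, head))
        yield n, mismatches


def best_overlap(fwd: str, rev_rc: str, min_overlap: int):
    """Toy rule (not vsearch's): fewest mismatches wins, ties go to the longer overlap."""
    candidates = list(candidate_overlaps(fwd, rev_rc, min_overlap))
    if not candidates:
        return None
    return min(candidates, key=lambda c: (c[1], -c[0]))


# Made-up reads: the true overlap is 10 bases with one sequencing error,
# but the last 2 bases of FWD also happen to match the first 2 of REV_RC.
FWD = "TTGACATGGTAACA"
REV_RC = "CATGCTAACAGGTC"

print(best_overlap(FWD, REV_RC, 2))  # (2, 0): short, perfect, and wrong
print(best_overlap(FWD, REV_RC, 5))  # (10, 1): the true overlap
```

With the minimum set to 2, the toy merger picks the 2-base chance match. Raising the minimum to 5 removes that candidate, and the real 10-base overlap with its single mismatch wins.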
Having really similar sequences is good! I'm not sure why similar sequences would cause problems. Maybe he's thinking of something else? You can have him post about it on the forums if he wants. Maybe there's something about pairing that I'm missing.
Colin