Following from question about minovlen

Hey, so sorry to disturb you but I’m still a bit puzzled about minovlen - what is the difference between setting it to 50 and 200? From your posts, I understand that a higher value is better cause it results in higher quality reads but my friend said that the sequences would be too similar? (which I don’t get and am not sure if he is right)…

I tried setting minovlen to 50 and 200, and I lost a lot of sequences for 50 but not so many for 200…

I am not so sure how to proceed as I have been stuck on this for a long time - am I worrying too much?

Thank you so much.

Hello @cookingrice

Thanks for reaching out to me again.

(I've made this post public so that this excellent discussion can help other users. You are asking great questions!)

No, you're not worrying too much. I think it's important to understand each step in your analysis so you can defend it during review. Let's dive in! :swimming_man:


In both cases, vsearch will try to find the best overlap that it can, either starting from 50 and up or 200 and up. Sometimes vsearch finds a 100% perfect alignment that's very short and chooses it over a long alignment that has a few mismatches.

Which alignment is correct? Well, we can figure out how long the overlap should be by looking at the amplicon length and the expected read length. Based on this, we can tell that the longer alignment is the correct one, and we set the minimum overlap length so that vsearch will return the correct overlap.
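Here's a quick sketch of that arithmetic in Python. The function name `expected_overlap` is just something I made up for illustration, and it assumes each read starts at one end of the amplicon and that nothing has been trimmed yet.

```python
def expected_overlap(amplicon_len, fwd_len, rev_len):
    """Rough expected overlap (in bp) between a forward and a reverse read.

    Assumption: each read starts at one end of the amplicon and no bases
    have been trimmed. This is back-of-the-envelope math, not vsearch output.
    """
    raw = fwd_len + rev_len - amplicon_len
    if raw <= 0:
        # The reads are too short to meet in the middle.
        return 0
    # The overlap cannot be longer than the amplicon or either read.
    return min(raw, amplicon_len, fwd_len, rev_len)


print(expected_overlap(250, 150, 150))  # 250 bp amplicon, 150 bp reads
print(expected_overlap(250, 250, 250))  # reads as long as the amplicon
```

If the number you get is well above the minimum overlap you set, that setting should not be throwing away correct merges.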

Great! If I remember your setup correctly, you expected a nearly full 250 bp overlap, so it's good news that the higher setting works well.


Now let's clear up some misconceptions.

That's true... When reads overlap, they 'double check' the base pairs in the area of overlap and fix errors, leading to higher quality. But we want the overlap that's correct, not just the longest one.

Here's an example of correct overlap: :slightly_smiling_face:

250 bp amplicon  |-------------------------|
150 bp read      |--------------->
150 bp read                <---------------|
50 bp overlap (correct!)    ^^^^^

Here's what could go wrong: :slightly_frowning_face:

250 bp amplicon  |-------------------------|
150 bp read           |--------------->
150 bp read           <---------------|
150 bp overlap :-0     ^^^^^^^^^^^^^^^ longer, but wrong
also ends of true amplicon missing??

250 bp amplicon  |-------------------------|
150 bp read   |--------------->
150 bp read                <---------------|
20 bp overlap               ^^ 100% matching :-) but also wrong
also extra base pairs outside of true amplicon??
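
To make the "short but 100% matching" failure concrete, here's a toy sketch in Python. To be clear, this is not how vsearch actually scores alignments: the function names and the "fewest mismatches wins" rule are simplifications I'm assuming just for illustration, and the sequences are made up. It also assumes the reverse read has already been reverse-complemented.

```python
def candidate_overlaps(fwd, rev_rc, min_overlap):
    """Yield (overlap_length, mismatches) for every allowed overlap length.

    The end of the forward read is compared with the start of the
    reverse-complemented reverse read.
    """
    max_len = min(len(fwd), len(rev_rc))
    for n in range(min_overlap, max_len + 1):
        tail = fwd[-n:]      # last n bases of the forward read
        head = rev_rc[:n]    # first n bases of the reverse read
        mismatches = sum(a != b for a, b in zip(tail, head))
        yield n, mismatches


def best_overlap(fwd, rev_rc, min_overlap):
    """Toy rule (an assumption): fewest mismatches wins, ties go to the longer overlap."""
    candidates = list(candidate_overlaps(fwd, rev_rc, min_overlap))
    if not candidates:
        return None
    return min(candidates, key=lambda c: (c[1], -c[0]))


# Made-up reads: the true overlap is 8 bp with one sequencing error,
# but the last 2 bases of fwd also happen to match the first 2 of rev_rc.
fwd = "GGGGACGTTGAC"
rev_rc = "ACGTAGACTTTT"

print(best_overlap(fwd, rev_rc, min_overlap=2))  # picks the short perfect match
print(best_overlap(fwd, rev_rc, min_overlap=5))  # forced to consider only longer overlaps
```

With the low minimum, the toy picks the 2 bp perfect match. Raising the minimum takes that spurious option off the table, so the 8 bp overlap with one mismatch wins instead.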

Having really similar sequences is good! I'm not sure why similar sequences would cause problems. Maybe he's thinking of something else? You can have him post about it on the forums if he wants. Maybe there's something about pairing that I'm missing. :woman_shrugging:

Colin
