Why do I lose 36.8% of reads while using PEAR for merging?

colinbrislawn · March 21, 2022, 4:30pm

OK! So vsearch is doing a little better, but it's still not finding enough overlap.

Could you re-run with --verbose and then attach that log? It should give us more clues about how joining is working. The log will look like this:

Merging reads 100%
199077 Pairs
190464 Merged (95.7%)
8613 Not merged (4.3%)

Pairs that failed merging due to various reasons:
792 too few kmers found on same diagonal
4 potential tandem repeat
1727 too many differences
6080 alignment score too low, or score drop to high
10 staggered read pairs

Statistics of all reads:
250.82 Mean read length

Statistics of merged reads:
454.51 Mean fragment length
13.02 Standard deviation of fragment length
0.62 Mean expected error in forward sequences
0.70 Mean expected error in reverse sequences
0.87 Mean expected error in merged sequences
0.44 Mean observed errors in merged region of forward sequences
0.53 Mean observed errors in merged region of reverse sequences
0.97 Mean observed errors in merged region

(That example is from this post)
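Once you have your own log, a quick way to see which failure reason dominates is to tally that section. Here is a minimal Python sketch, assuming the log has the same layout as the example above; the `failure_reasons` helper is just something I made up for illustration, not part of vsearch or QIIME 2:

```python
import re

# Excerpt of the example log above; swap in the text of your own --verbose output.
LOG = """\
199077 Pairs
190464 Merged (95.7%)
8613 Not merged (4.3%)

Pairs that failed merging due to various reasons:
792 too few kmers found on same diagonal
4 potential tandem repeat
1727 too many differences
6080 alignment score too low, or score drop to high
10 staggered read pairs

Statistics of all reads:
250.82 Mean read length
"""

def failure_reasons(log_text):
    """Return {reason: count} from the 'failed merging' section (hypothetical helper)."""
    reasons = {}
    in_section = False
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("Pairs that failed merging"):
            in_section = True
            continue
        if in_section:
            match = re.match(r"(\d+)\s+(.+)", line)
            if not match:
                break  # a blank or non-matching line ends the section
            reasons[match.group(2)] = int(match.group(1))
    return reasons

reasons = failure_reasons(LOG)
not_merged = sum(reasons.values())

# Print the reasons from most to least common, with their share of failed pairs.
for reason, count in sorted(reasons.items(), key=lambda kv: -kv[1]):
    print(f"{count:>6}  {100 * count / not_merged:5.1f}%  {reason}")
```

In the example, the counts add up to the 8613 unmerged pairs, and most of them fail on alignment score rather than on too many differences.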

Then we can start playing with the vsearch settings to give us the correct overlap. See --p-minovlen in the vsearch join-pairs docs.
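To get a feel for how much overlap to expect, you can do some rough arithmetic: if both reads are about the same length, the overlap is roughly twice the read length minus the fragment (amplicon) length. A small Python sketch using the numbers from the example log above; this is only an approximation, since it assumes the mean read length applies to both forward and reverse reads:

```python
def expected_overlap(mean_read_length, mean_fragment_length):
    """Rough overlap estimate: forward + reverse read length minus fragment length."""
    return 2 * mean_read_length - mean_fragment_length

# Values taken from the example log: mean read length and mean merged fragment length.
overlap = expected_overlap(250.82, 454.51)
print(f"Expected overlap: about {overlap:.0f} bp")
```

If the estimate for your data (after any trimming or truncation) comes out close to or below the minimum overlap you pass to --p-minovlen, that would explain a lot of unmerged pairs.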