I'm getting poor merging results with vsearch in QIIME. After trying a few things I found on this forum that didn't really change anything, I figured I'd reach out for advice in a post. Any suggestions on how to improve merging, or are my reads just crap?
Size range of this region: 267-511 bp amplicons with the primers I used (Figure 1 in the paper below)
Illumina sequencing length: 2 × 250 bp
Primers are from this paper: Taylor, D. L., Walters, W. A., Lennon, N. J., Bochicchio, J., Krohn, A., Caporaso, J. G., & Pennanen, T. (2016). Accurate estimation of fungal diversity and abundance through improved lineage-specific primers optimized for Illumina amplicon sequencing. Applied and Environmental Microbiology, 82(24), 7217-7226.
Uh... it's not just the ends that have low quality. The quality is highly variable throughout, which I guess is a common problem with variable length regions.
I like your idea of using trimmomatic (or vsearch itself) to cut off the ends of reads once their quality drops. This has to support variable-length trimming per read, because read length and quality also vary from read to read.
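To make the per-read trimming idea concrete, here is a minimal Python sketch of a sliding-window quality trim. It is a simplified illustration of the general approach, not trimmomatic's or vsearch's actual implementation. The function names, the window size of 4, the mean-quality threshold of 20, and the Phred+33 encoding are all assumptions chosen for the example.

```python
def phred33(qual_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(c) - 33 for c in qual_string]


def sliding_window_trim(seq, qual, window=4, min_mean_q=20):
    """Cut a read at the start of the first window whose mean quality
    drops below min_mean_q. Hypothetical helper for illustration only.

    Each read is trimmed independently, so reads end up with
    different lengths depending on where their quality drops.
    """
    scores = phred33(qual)
    if len(scores) < window:
        # Read is shorter than one window: keep it only if its
        # overall mean quality passes the threshold.
        if scores and sum(scores) / len(scores) >= min_mean_q:
            return seq, qual
        return "", ""
    for start in range(len(scores) - window + 1):
        mean_q = sum(scores[start:start + window]) / window
        if mean_q < min_mean_q:
            # Quality has dropped: discard everything from here on.
            return seq[:start], qual[:start]
    return seq, qual


# Two toy reads: one with a low-quality tail, one good throughout.
reads = [
    ("ACGTACGTAC", "IIIIII####"),
    ("ACGTACGTAC", "IIIIIIIIII"),
]
for seq, qual in reads:
    trimmed_seq, trimmed_qual = sliding_window_trim(seq, qual)
    print(len(seq), "->", len(trimmed_seq), trimmed_seq)
```

Keep in mind that any trimming shortens the reads, so it also reduces how much overlap is left for merging.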
Thanks for the update. I'm glad some of the reads are merging.
Would trimming more off help?
What other suggestions do you have?
I'm not sure....
Does the most common error, 'too few kmers found on same diagonal', mean that the reads just aren't similar enough to overlap?
Yes. This is an explanation of why it failed: the read pair can't be joined because it can't be aligned, and it can't be aligned because too few kmers were found. And this is expected sometimes. Remember:
An overlap of -11 is a gap of 11. You can't join when there's no overlap.
Choosing to join will functionally filter for short amplicons because only those will overlap.
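The overlap arithmetic behind these two points can be sketched in a few lines of Python. This is a back-of-the-envelope estimate, assuming the amplicon lengths quoted above (267-511 bp) include everything that gets sequenced and that both reads are full length unless trimmed. The function name and the trimmed lengths of 200 are made up for illustration.

```python
def expected_overlap(amplicon_len, fwd_len=250, rev_len=250):
    """Expected overlap (in bp) between forward and reverse reads.

    A negative value means a gap: the two reads do not reach each
    other, so the pair cannot be joined.
    """
    return fwd_len + rev_len - amplicon_len


# Untrimmed 2 x 250 bp reads across the amplicon size range.
for amplicon_len in (267, 400, 500, 511):
    print(amplicon_len, expected_overlap(amplicon_len))

# Trimming reads to 200 bp each reduces the overlap further.
for amplicon_len in (267, 400, 500, 511):
    print(amplicon_len, expected_overlap(amplicon_len, 200, 200))
```

With untrimmed reads the longest amplicons (511 bp) come out at -11, which is the gap of 11 mentioned above. Trimming more off the ends makes this worse for long amplicons, not better, so more aggressive trimming only helps if the bases removed were causing mismatches in pairs that still overlap.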
Perhaps it's best to analyze this data set twice: once with unjoined reads, and again with joined reads. This will let you view the data with and without length bias.
Thanks for such fast responses to all my posts. It makes a difference!
As for the protocol with unjoined reads: would I just continue the analysis as usual, skipping the merge-pairs step and heading straight to the quality filtering step? I've been looking for protocols to follow and not having much luck.