And I could not merge the forward and reverse reads without losing a lot of counts. So I have two ways to deal with that.
One is to use only about 120 bp of the forward sequence. The other is to use the joined forward and reverse sequence, around 400 bp, but with 15 ambiguous base pairs. Either way, I keep more than 80% of my reads. But I do not know which one gives the better result (e.g. diversity, taxa...). I assume the longer sequence with ambiguous base pairs gives a better result than the shorter sequence.
What primers did you use for V3? How long do you expect your amplicon to be?
Here is what I think happened: the MiSeq correctly sequenced your full region, then it had nothing left to sequence. The 'low quality region' is just noise.
That crazy high maxdiffs means that all reads will join if they can, and allowmergestagger will allow reads to be merged even with >100% overlap (staggered reads).
I like this idea. If your region is ~120 bp long, this is the perfect choice.
Thanks for your reply. I used V3-V4 primers, expecting a 460 bp amplicon. Sorry for the misleading mention of the V3 kit; I have changed my post to say V3-V4 region. So this is probably not an issue of read length.
Got it. So with 140 bp of overlap expected, you might be able to salvage this with vsearch, but that quality is still pretty rough. If you have tried that vsearch command, may I ask how many sequences were able to join?
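To make the overlap arithmetic concrete, here is a minimal sketch. It assumes 2 x 300 bp paired-end reads; the read length is not stated in this thread and is only inferred from the 140 bp figure for a 460 bp amplicon. The function name `overlap_summary` is hypothetical.

```python
from typing import Tuple


def overlap_summary(read_len: int, amplicon_len: int) -> Tuple[int, int]:
    """Return (overlap, read_through) in bp for one read pair.

    Assumes both reads are full length and start at opposite ends
    of the amplicon (an idealised model, primers not subtracted).
    A negative overlap means the reads leave a gap and cannot be merged.
    """
    # Bases covered by both reads; capped at the amplicon length
    # when each read alone already spans the whole amplicon.
    overlap = min(amplicon_len, 2 * read_len - amplicon_len)
    # Bases each read keeps sequencing past the end of the amplicon.
    read_through = max(0, read_len - amplicon_len)
    return overlap, read_through


# Assumed 2 x 300 bp run, 460 bp V3-V4 amplicon.
print(overlap_summary(300, 460))  # (140, 0)
# Same assumed run with a ~120 bp region: full overlap plus read-through.
print(overlap_summary(300, 120))  # (120, 180)
```

The second call is the case described earlier in the thread: when the region is shorter than the read, the sequencer runs off the end of the fragment, and the pair can only be merged as a staggered pair.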
You have probably thought of this already, but given the quality of R2, could you ask the sequencing core to resequence this run for you? I would consider R2 to be a failed run based on the Q scores, and some sequencing cores are willing to resequence.
The most practical option is to just use your forward read. The quality of R1 is pretty good, and dada2 should be able to deal with some of the dips in quality throughout the run process.
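As a rough illustration of what using only the forward read at a fixed length means, here is a minimal sketch of truncation on in-memory records. It is not how dada2 is implemented; it only mirrors the general idea of truncating every read to one length and discarding reads that are shorter. The record layout, the function name `truncate_reads`, and the 120 bp value (taken from the first post) are assumptions for illustration.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical record layout: (header, sequence, quality string).
FastqRecord = Tuple[str, str, str]


def truncate_reads(records: Iterable[FastqRecord],
                   trunc_len: int) -> Iterator[FastqRecord]:
    """Yield reads cut to exactly trunc_len bases.

    Reads shorter than trunc_len are dropped, so every surviving
    read has the same length.
    """
    for header, seq, qual in records:
        if len(seq) < trunc_len:
            continue  # too short to truncate, discard
        yield header, seq[:trunc_len], qual[:trunc_len]


# Toy forward reads: one long enough, one too short.
toy_reads = [
    ("@read1", "ACGT" * 40, "I" * 160),
    ("@read2", "ACGT" * 20, "I" * 80),
]
kept = list(truncate_reads(toy_reads, trunc_len=120))
print(len(kept), len(kept[0][1]))  # 1 120
```

In a real analysis the denoiser's own truncation setting would do this step; the sketch is only meant to show why a shorter truncation length keeps more reads but less sequence per read.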
We could also ask around and see if others have more experience with V3-V4 or difficult MiSeq runs.
Only 1-2% of the sequences were left after joining. We did ask for resequencing. Unfortunately, the sequencing core did not find any technical issue, so we did not get one.
That's a shame. I guess you will just have to use the forward read.
I'm sorry this run turned out poorly. You can still use the high quality part of this data and proceed with analysis! Let me know if you have any questions.