Short sequences or longer sequences with ambiguous bases for 16S v3-v4 sequencing

C.F_Zhang · March 18, 2020, 9:56pm

Hi all,

I used MiSeq 2 x 300 bp sequencing for the 16S rRNA v3-v4 region. However, I get some low-quality bases in the MIDDLE of the forward reads.

I could not merge the forward and reverse reads without losing a lot of counts, so I have two ways to deal with that.

One is to use only about 120 bp of the forward sequence. The other is to use the joined forward and reverse sequence of about 400 bp, but with 15 ambiguous base pairs. Either way, I keep more than 80% of my reads. But I do not know which one gives the better result (e.g. diversity, taxa...). I assume the longer sequence with ambiguous base pairs gives a better result than the shorter sequence.

Looking forward to your opinions!

Thanks,
Changfeng Zhang

colinbrislawn · March 19, 2020, 5:30pm

Hello @C.F_Zhang,

Welcome to the QIIME 2 forums! :qiime2:

What primers did you use for V3? How long do you expect your amplicon to be?

Here is what I think happened: the MiSeq correctly sequenced your full region, then it had nothing left to sequence. The 'low quality region' is just noise.

Here is one way you could check this:

vsearch --fastq_mergepairs R1.fastq --reverse R2.fastq \
--fastqout merged.fastq \
--fastq_maxdiffs 300 --fastq_allowmergestagger

That crazy-high maxdiffs value means that all reads will join if they can, and allowmergestagger allows reads to be merged with >100% area of overlap.
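To read the result of that check, it may help to count how many pairs merged and how long the merged reads are. The sketch below assumes uncompressed FASTQ files with exactly four lines per record; the function names are made up for illustration, and the file names are the ones from the command above. If the amplicon really were short, the merged lengths would be expected to cluster well below the 300 bp read length.

```shell
#!/usr/bin/env bash
# Assumption: plain-text FASTQ, four lines per record (no wrapped sequences).

# Count records in a FASTQ file.
count_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Print "length count" pairs for the sequence lines, shortest first.
length_histogram() {
    awk 'NR % 4 == 2 { n[length($0)]++ } END { for (l in n) print l, n[l] }' "$1" | sort -n
}

# Percentage of input pairs that merged, to one decimal place.
#   $1: forward reads given to the merge step, $2: merged output
merge_rate() {
    awk -v m="$(count_reads "$2")" -v t="$(count_reads "$1")" \
        'BEGIN { if (t > 0) printf "%.1f\n", 100 * m / t; else print "NA" }'
}

# Example calls:
#   merge_rate R1.fastq merged.fastq
#   length_histogram merged.fastq
```
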

I like this idea. If your region is ~120 bp long, this is the perfect choice.

Colin

C.F_Zhang · March 19, 2020, 8:23pm

Hi @colinbrislawn,

Thanks for your reply. I used v3-v4 primers and expect an amplicon length of 460 bp. Sorry for the misleading mention of a V3 kit; I have changed my post to say v3-v4 region. So this is probably not an issue of read length.

120 bp is relatively short in my case.

Changfeng

colinbrislawn · March 20, 2020, 1:35am

Hello Changfeng,

Got it. So with 140 bp of overlap expected, you might be able to salvage this with vsearch, but that quality is still pretty rough. If you have tried that vsearch command, may I ask how many sequences were able to join?
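For reference, the 140 bp figure is simple arithmetic on the numbers in this thread: two 300 bp reads covering a 460 bp amplicon. The sketch below assumes the 460 bp length spans the whole sequenced fragment; the function name is made up for illustration. The second call shows that a forward read cut to 120 bp would no longer reach a 300 bp reverse read at all.

```shell
# Expected overlap (bp) between the forward and reverse read of one amplicon.
#   $1: forward read length, $2: reverse read length, $3: amplicon length
# A negative result means the two reads do not reach each other.
expected_overlap() {
    echo $(( $1 + $2 - $3 ))
}

expected_overlap 300 300 460   # full-length reads, prints 140
expected_overlap 120 300 460   # forward read cut to 120 bp, prints -40
```
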

You have probably thought of this already, but given the quality of R2, could you ask the sequencing core to resequence this run for you? I would consider R2 to be a failed run based on the Q scores, and some sequencing cores are willing to resequence.

The most practical option is to just use your forward read. The quality of R1 is pretty good, and DADA2 should be able to deal with some of the dips in quality over the course of the run.
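A minimal sketch of the forward-read-only route with the QIIME 2 command line might look like the following. The artifact and directory names are hypothetical, the wrapper function is only there for illustration, and 120 is the truncation length mentioned earlier in this thread rather than a recommendation. As far as I know, `denoise-single` also accepts a paired-end artifact and then uses only the forward reads, but that is worth confirming against the plugin help for your version.

```shell
# Denoise forward reads only with DADA2 through the QIIME 2 CLI.
#   $1: demultiplexed reads artifact (hypothetical name, e.g. demux.qza)
#   $2: truncation length in bp (pick it from your own quality plot)
#   $3: output directory for the results; it must not exist yet
denoise_forward_only() {
    qiime dada2 denoise-single \
        --i-demultiplexed-seqs "$1" \
        --p-trunc-len "$2" \
        --output-dir "$3"
}

# Example call:
#   denoise_forward_only demux.qza 120 dada2-forward-only
```
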

We could also ask around and see if others have more experience with V3-V4 or difficult MiSeq runs.

Colin

C.F_Zhang · March 21, 2020, 2:48pm

Hi Colin,

Thanks for your reply again!

Only 1-2% of the sequences were left after joining. We also asked for resequencing. Unfortunately, the sequencing core did not find any technical issue, so we did not get a rerun.

Changfeng

colinbrislawn · March 21, 2020, 3:47pm

Good morning Changfeng,

That's a shame. I guess you will just have to use the forward read.

I'm sorry this run turned out poorly. You can still use the high quality part of this data and proceed with analysis! Let me know if you have any questions.

Colin