Most reads failed to merge due to staggered read pairs

amirza · October 4, 2019, 6:36pm

Dear q2-ers,

Almost none of my reads merge after primer removal with native cutadapt. According to the log from qiime vsearch join-pairs, the vast majority failed to merge due to staggered read pairs.

First, I used native cutadapt to remove forward and reverse primers from my paired-end Illumina sequences (V4 region) using the command below. My primers are 515fXT (GTGBCAGCMGCCGCGGTAA) and 806rXT (GGACTACHVGGGTWTCTAAT). As you can see in the log below, primers were detected in over 90% of the reads, and over 90% of the read pairs were written. This was the case for pretty much all of the fastq pairs.

Generic cutadapt command:
cutadapt --cores=0 -g GTGBCAGCMGCCGCGGTAA -G GGACTACHVGGGTWTCTAAT --discard-untrimmed -o R1.cutadapt.fastq.gz -p R2.cutadapt.fastq.gz R1.fastq.gz R2.fastq.gz >> primer_trim.log

Then I tried to merge using vsearch join-pairs:

qiime vsearch join-pairs --i-demultiplexed-seqs demux.cutadaptgG.qza --o-joined-sequences demux.cutadaptgG.merged2.qza --verbose

stdout:

Here are the quality graphs:

Any ideas why nearly 100% do not merge?

colinbrislawn · October 4, 2019, 6:50pm

Hello Ali,

Welcome back to the forums!

Yep. That's exactly what's going on here.

Fortunately, vsearch can merge staggered reads! Just add the --fastq_allowmergestagger flag, and vsearch can merge these reads no problem!

Let me know if that new flag works well with your data set.

Colin

amirza · October 4, 2019, 10:32pm

You meant the --p-allowmergestagger flag, correct?
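
For readers following along, here is a minimal sketch of the join-pairs call from the first post with the flag added. The input filename comes from the original command; the output filename is hypothetical. The command is stored in a variable and printed first, and it only runs if QIIME 2 and the input artifact are actually present.

```shell
# Sketch only: same call as in the first post, plus --p-allowmergestagger.
# The output filename (merged-stagger) is a made-up example.
JOIN_CMD="qiime vsearch join-pairs \
--i-demultiplexed-seqs demux.cutadaptgG.qza \
--o-joined-sequences demux.cutadaptgG.merged-stagger.qza \
--p-allowmergestagger \
--verbose"

# Print the command for review.
echo "$JOIN_CMD"

# Run it only where QIIME 2 is installed and the input artifact exists.
if command -v qiime >/dev/null 2>&1 && [ -f demux.cutadaptgG.qza ]; then
  $JOIN_CMD
fi
```

The --fastq_allowmergestagger spelling is the option name in standalone vsearch; the q2-vsearch plugin exposes it as --p-allowmergestagger.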

It worked. Thank you. I was surprised that this became an issue, because I'm sure many others are also using the V4 region and have used cutadapt without the --p-allowmergestagger flag but have not reported any problems. Why was this only an issue for me?

I salvaged a majority of my read pairs, but I am concerned about how many I am losing because the alignment score was too low or there were too many differences between the pairs. See below for an example. The total number of reads came down from 47,952,468 (mean 236,219 per sample) to 36,486,390 (mean 179,735 per sample). That's a ~24% loss. Some samples lost as much as ~40% of their reads. Is it worth trimming the ends of the reads (before merging) to increase read quality, at the cost of sacrificing sequence overlap?
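
As a quick check of the loss figure quoted in this post, the percentage can be recomputed from the two read totals. This is only an arithmetic sketch using the numbers stated above; the variable names are illustrative.

```shell
# Read totals quoted in the post above.
before=47952468   # reads going into join-pairs
after=36486390    # reads that merged successfully

# Percentage of reads lost at the merging step.
loss_pct=$(LC_ALL=C awk -v b="$before" -v a="$after" \
  'BEGIN { printf "%.1f", (b - a) / b * 100 }')

echo "Reads lost at merging: ${loss_pct}%"
# prints: Reads lost at merging: 23.9%
```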

colinbrislawn · October 5, 2019, 11:08pm

That's right, --p-allowmergestagger is the flag I meant. Good catch!

I'm glad you got more of your reads to pair.

Yes, trimming the low-quality ends before merging is a great idea.

Also, the default --p-maxdiffs 10 is super low. I would raise that to at least 20 (or 30!!) and see if your reads get paired.

The "alignment score too low" failure is harder to fix. Some reads just don't pair.

As for why this was only an issue for you, I'm not sure either. This is a good question to bring up with your sequencing provider or PI. Are these the EMP primers or from another organization? That could explain differences in the region sequenced, and thus in pairing.

Colin

amirza · October 7, 2019, 11:29pm

I set --p-maxdiffs to 30. That salvaged another ~14% reads. Many thanks!
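
For anyone landing on this thread later, a sketch of what the final call would look like with both changes discussed here (staggered merging allowed, --p-maxdiffs raised to 30). The output filename is hypothetical, and in newer QIIME 2 releases the action may be named differently (I believe it was renamed to merge-pairs), so check qiime vsearch --help for your version.

```shell
# Sketch only: join-pairs with both settings from this thread.
# The output filename (merged-maxdiffs30) is a made-up example.
MERGE_CMD="qiime vsearch join-pairs \
--i-demultiplexed-seqs demux.cutadaptgG.qza \
--o-joined-sequences demux.cutadaptgG.merged-maxdiffs30.qza \
--p-allowmergestagger \
--p-maxdiffs 30 \
--verbose"

# Print the command for review.
echo "$MERGE_CMD"

# Run it only where QIIME 2 is installed and the input artifact exists.
if command -v qiime >/dev/null 2>&1 && [ -f demux.cutadaptgG.qza ]; then
  $MERGE_CMD
fi
```

Raising --p-maxdiffs lets through pairs with more mismatches in the overlap, so it is worth checking the quality of the merged reads afterwards.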
