Looking for help joining paired-end reads

Thanks for sharing these, @Hui_Yang!

I agree. It isn't the worst I have seen, but there is a lot of reads being discarded during the merge step. As well, it looks like a few of your samples are losing many reads at the chimera filtering step.

I'm pretty sure 515f/806r is V4 only (can @jwdebelius or @SoilRotifer confirm that?), not V4-V5. Can you confirm what your target region is, and what primers you used? These are important to keep in mind when choosing your DADA2 trim and trunc parameters, and they have a direct relationship to the merging step - if you pick too aggressive trim and trunc params, you run the risk of making the reads unmergeable. Also, q2-dada2 needs at least 12 nts of overlap in order to merge. @Mehrbod_Estaki has a few good posts floating around the forum demonstrating how to calculate the overlap, check the "Search" functionality here in the forum for more details.


1 Like

Hi @Hui_Yang & @thermokarst,

Correct, 515-806 is V4 only. V4V5 is usually something near the 515F-958R range.



Thank you! Sorry I put the wrong primers, it was 515F and 926R.

I got demultiblexed raw data but I couldn’t remove the primers (somehow the tool they provided didn’t work) so I trimmed off 20 nts on the left side of R1 and R2. I set trunc len = 220, which should give me about 30nts of overlap, but that may be too short. I’ll try longer trunc length.


Thanks for clarifying, @Hui_Yang!

515F 926R is ~411 nts long. Based on the provenance in the files you shared, your trim and trunc params were:

  • trunc_len_f: 240
  • trim_left_f: 20

That means that the forward reads were trimmed to 220 nts

  • trunc_len_r: 200
  • trim_left_r: 20

That means the reverse reads were trimmed to 180 nts

I think that means there is no way the fwd and rev reads can overlap:

411 nt amplicon |----------------------------------------------|
240 nt fwd read |-------------------------->
200 nt rev read                    <---------------------------|
29 nt overlap                      ^^^^^^^^^

@SoilRotifer, am I doing this math right?

I think you will need to relax your params quite a bit, you need at least 23 nts more to ensure sufficient overlap.

Yeah that works out! :white_check_mark:

I think your forward reads look great @Hui_Yang! Based on the demux paired-end output, you should be able to go out to further for each read.

Based on the co-ordinates of the demux qzv file, I’d try the following truncation point values (or combinations thereof) :

  • FW: 268 | 283
  • REV: 211 | 243

I’d try the 268 - 211 pair first.


Why not try running q2-cutadapt on your demuxed data prior to dada2 / deblur? You can search the forum for many examples of running cutadapt. Then re-visualize your cutadapt output (quality plots) the same way you did your demuxed data. This will help you determine the appropriate truncation points after the primers have been removed, i.e. they’ll likely be 20-30 bp shorter. At this point there is likely no need to use the trim options.


1 Like

Oh I thought trunc len was the length after the trimming. Good to know!

I tested a few params and here’s what I got:

Trim: f = r = 10, Trunk len: f = r = 220, 294 features
Trim: f = r = 10, Trunk len: f = r = 240, 254 features
Trim: f = r = 20, Trunk len: f = r = 260, 255 features
Trim: f = r = 20, Trunk len: f = r = 290, 207 features
Trim: f = r = 0, Trunk len: f = r = 280, 205 features
Trim: f = r = 20, Trunk len: f = 240 r = 220, 272 features

Seems trimming didn’t make too much difference, and trunc length shouldn’t be too long or too short. Is it ok to proceed with the setting that gives the most features?

I’ll try that. Thanks!

1 Like

Well… this was specifically in reference to my comment about using cutadapt. That is, if you run cutadapt to remove the primers, your sequences will be shorter. So you’d have to re-visualize them.

Trimming the 5’ ends will not have an effect on the merging process, specifically. It is the quality and overlap of the 3’ ends that matter.

As long as the quality is good, and there is enough over-lap to merge the reads, being “too long” or “too short” won’t matter.

Did you try the specific truncation values I suggested? Especially the fw: 268 with rev: 211? There can be large differences just shifting a few bases over.


1 Like

Hi Mike,

I tried to trim off the primers with cutadapt. Here's my cmd:

qiime cutadapt trim-paired
--i-demultiplexed-sequences csv18-pair-end.qza
--p-error-rate 0
--o-trimmed-sequences csv18-pair-trim.qza

I got something slightly different but the length didn't change.
Before: csv18-pair-end.qzv (313.8 KB)
After: csv18-pair-trim.qzv (319.1 KB)

You’ll want to add

 --p-match-adapter-wildcards \
 --p-match-read-wildcard \
 --p-discard-untrimmed \

to your command. The first two allow matches to IUPAC ambiguity codes (e.g. N, M, R…) while the last discards any pairs in which both primers are not found. This is why there are two drop-offs at the end of the quality plots, some are not being trimmed.


Nice! Now it works. There are some scattered points in the reverse reading csv18-pair-trim2.qzv (317.4 KB)

That should be fine @Hui_Yang. These are just a few spurious long, and likely off-target, amplicons in your data. Which is normal. If you :computer_mouse: over them you’ll see they are low-count.


Awesome, thanks.

I just ran another round of DADA2, got 259 features.mrdna-tm2-tab.qzv (421.3 KB)

Also tried to join the trimmed ends for Deblur, using vsearch:
qiime vsearch join-pairs
--i-demultiplexed-seqs csv18-pair-trim2.qza
--o-joined-sequences trim2-joined.qza

Got a read of two ends stitched directly, no signs of overlap. trim2-joined.qzv (304.9 KB)

Looking at the provenance, you forgot to adjust the truncation length of my initial suggestion of fw:268 and rev:211 by subtracting the length of the primer. So, your new truncation lengths, after running cutadapt, should be something like:

fw : ~ 250
rev : ~ 190

1 Like

My apologies, still fixed on the thought that trunc length is applied after trimming.

Here’s another round following your suggestion and subtracted the primers: FW: 268 | 283 & REV: 211 | 243
–p-trunc-len-f 249 --p-trunc-len-r 191, --> 257 festures
–p-trunc-len-f 264 --p-trunc-len-r 223, --> 265 features


The trim length setting only applies when running DADA2 / deblur. Remember you ran cutadapt to remove the primers as a separate prior step. So the sequence are already shorter prior to running DADA2 / deblur. :slight_smile:

Does that make sense?


Yes thanks for clarifying!