sequence alignment

momay · May 6, 2026, 2:24am

Hello,

I have 16S rRNA sequencing data targeting the V3–V9 region, generated using paired-end 2×300 bp reads. I’ve been trying different truncation length combinations to improve read merging, but the merge rate consistently stays around 65–70%, even though filtering retention is always above 99%.

I would appreciate any suggestions on how to improve the merging percentage.

Thank you.

colinbrislawn · May 6, 2026, 2:30am

That's very helpful. How long do you expect the amplicon to be, and how much overlap do you expect between the forward and reverse reads?

Could you share the full command you are running?

I've heard of using FIGARO to optimize truncation for merging, but let's start with the expected overlap calculation!

momay · May 6, 2026, 3:06pm

Hello,

The amplicon length should be roughly 425–460 bp.

Here is my code. Again, I have tried many combinations of truncation lengths.

qiime dada2 denoise-paired \
    --i-demultiplexed-seqs trimmed.qza \
    --p-trunc-len-f 280 --p-trunc-len-r 250 \
    --p-n-threads 18 \
    --o-denoising-stats dns \
    --o-table table \
    --o-representative-sequences rep-seqs

Visualizing the denoising results:

qiime metadata tabulate \
    --m-input-file dns.qza \
    --o-visualization dns

qiime feature-table tabulate-seqs \
    --i-data rep-seqs.qza \
    --o-visualization rep-seqs
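With filtering retention above 99%, it can help to confirm sample by sample that the loss happens at merging. A minimal sketch, assuming the stats artifact is exported with `qiime tools export` to a `stats.tsv` file whose header includes columns named `filtered` and `merged` (check the header of your own file, since column names may differ between versions). It reports merged reads as a percentage of filtered reads:

```shell
# Per-sample merge rate (merged / filtered * 100) from an exported DADA2
# stats table. Columns are located by header name; the "filtered" and
# "merged" column names are an assumption, so check your file's header.
merge_rate() {
  awk -F '\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("filtered" in col) || !("merged" in col)) {
        print "expected columns \"filtered\" and \"merged\"" > "/dev/stderr"
        exit 1
      }
      next
    }
    /^#q2:types/ { next }   # skip the QIIME 2 column-type row, if present
    {
      f = $(col["filtered"]); m = $(col["merged"])
      pct = (f > 0) ? 100 * m / f : 0
      printf "%s\t%.1f\n", $1, pct
    }
  ' "$1"
}

# Usage (requires QIIME 2; the export directory name is arbitrary):
#   qiime tools export --input-path dns.qza --output-path dns-export
#   merge_rate dns-export/stats.tsv
```

Samples with an unusually low value stand out more clearly here than in the run-wide average.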


colinbrislawn · May 6, 2026, 8:30pm

Okay!

--p-trunc-len-f 280 --p-trunc-len-r 250

280 + 250 = 530 bp total
530 - 425 (shortest amplicon) = 105 bp of overlap
or
530 - 460 (longest amplicon) = 70 bp of overlap
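The same arithmetic as a small shell sketch, which may be handy when trying several truncation pairs. The 12 bp default minimum overlap is an assumption (it matches the minOverlap default of DADA2's mergePairs), and "amplicon length" here means the length covered by the reads, e.g. excluding primers if they were trimmed off:

```shell
# Expected overlap between truncated reads:
#   overlap = trunc_len_f + trunc_len_r - amplicon_length
overlap() {
  echo $(( $1 + $2 - $3 ))
}

# Overlap for the shortest and longest expected amplicons, flagged if it
# falls below a minimum (default 12 bp, assumed to match DADA2's default).
# Arguments: trunc_len_f trunc_len_r shortest longest [min_overlap]
check_overlap() {
  min_ov=${5:-12}
  for len in "$3" "$4"; do
    ov=$(overlap "$1" "$2" "$len")
    if [ "$ov" -lt "$min_ov" ]; then status="too short"; else status="ok"; fi
    echo "$len bp amplicon: $ov bp overlap ($status)"
  done
}

# Example with the truncation lengths from this thread:
#   check_overlap 280 250 425 460
#   425 bp amplicon: 105 bp overlap (ok)
#   460 bp amplicon: 70 bp overlap (ok)
```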

Have you tried truncating much shorter, down to about 20 bp of overlap?

For example, 460 bp (longest amplicon) + 20 bp of overlap = 480 bp total
480 bp total = 280 forward + 200 reverse, for example:
--p-trunc-len-f 280 --p-trunc-len-r 200
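Going the other way, here is a sketch that picks the reverse truncation length for a target overlap with the longest amplicon, then prints (without running) one `denoise-paired` command per target so a few combinations can be compared. The 300 bp upper limit comes from the 2×300 bp run described above; the per-combination output names are hypothetical:

```shell
# Reverse truncation length that leaves `target` bp of overlap for the
# longest amplicon, given a fixed forward truncation length:
#   trunc_len_r = longest_amplicon + target_overlap - trunc_len_f
# Arguments: trunc_len_f longest_amplicon target
rev_trunc_for_overlap() {
  r=$(( $2 + $3 - $1 ))
  # Reads are 300 bp long in this run, so longer truncation is impossible.
  if [ "$r" -gt 300 ] || [ "$r" -lt 1 ]; then
    echo "no valid reverse length for target $3 bp" >&2
    return 1
  fi
  echo "$r"
}

# Dry run: print one command per target overlap instead of executing it.
# Output names such as dns_280_200 are hypothetical; rename as you like.
# Arguments: trunc_len_f longest_amplicon target [target ...]
print_sweep() {
  f=$1 longest=$2
  shift 2
  for target in "$@"; do
    r=$(rev_trunc_for_overlap "$f" "$longest" "$target") || continue
    echo "qiime dada2 denoise-paired --i-demultiplexed-seqs trimmed.qza" \
      "--p-trunc-len-f $f --p-trunc-len-r $r --p-n-threads 18" \
      "--o-denoising-stats dns_${f}_${r} --o-table table_${f}_${r}" \
      "--o-representative-sequences rep-seqs_${f}_${r}"
  done
}

# Example: print_sweep 280 460 20 40 60
# prints three commands, with --p-trunc-len-r 200, 220 and 240.
```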

What does the quality score plot look like?

momay · May 8, 2026, 3:43pm

Yes, I tried shorter lengths. The quality scores are really high, above ~35.

cherman2 · May 11, 2026, 6:54pm

Hi @momay

Here is another possibility.

If this is a new sequencing run, the reads might have poly-G tails. These come from the newer two-color chemistry, where the sequencer can't tell the difference between no signal and a G base. This is further complicated by the fact that, in two-color chemistry, a G called this way is always given a quality score of 40.

For a recent sequencing run, I found that my quality looked really good, but after I removed the poly-G tails, the quality was not as good as I had originally thought. The poly-G tails could also affect merging.
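One rough way to check for poly-G tails before trimming is to count reads whose sequence ends in a run of G's. A minimal sketch, assuming plain or gzip-compressed FASTQ files with four lines per record (for example, the files produced by exporting the demultiplexed artifact with `qiime tools export`); the 10-base threshold is arbitrary:

```shell
# Count reads whose sequence ends in at least n G's (default n = 10).
# Assumes standard 4-line FASTQ records; .gz files are decompressed on the fly.
count_polyg_tails() {
  n=${2:-10}
  case "$1" in
    *.gz) gzip -dc "$1" ;;
    *) cat "$1" ;;
  esac | awk -v n="$n" '
    NR % 4 == 2 {                      # sequence line of each record
      total++
      if (length($0) >= n && substr($0, length($0) - n + 1) ~ /^G+$/) hits++
    }
    END { printf "%d of %d reads end in >= %d G\n", hits, total, n }
  '
}

# Usage (directory and file names are placeholders):
#   qiime tools export --input-path trimmed.qza --output-path trimmed-export
#   for fq in trimmed-export/*.fastq.gz; do echo "$fq"; count_polyg_tails "$fq"; done
```

A noticeably higher count in the reverse reads than in the forward reads would be consistent with the poly-G explanation, though it is only a rough indicator.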

Edit: here is the command I used:

qiime cutadapt trim-paired \
    --i-demultiplexed-sequences {input} \
    --p-nextseq-trim 1 \
    --o-trimmed-sequences {output} \
    --p-cores 12 \
    --verbose > polyg_trim.out

Just some more food for thought!