Merge or not merge? When merging introduces taxonomy bias

Dear Qiime community,
I'm using QIIME 2 (installed with conda) for the first time to analyze Illumina paired-end (PE) data, sequenced 2x250 bp over the V3-V4 region (341F-806R).
These are the reads:

1) Why is the quality of the reverse reads so bad (as is often seen)? Are these reverse reads still usable in a PE analysis, or is a single-end, forward-reads-only approach better? What can I do to avoid this quality drop in the reverse reads next time?
2) I've noticed that the trimming and maxEE parameters have a strong effect on taxonomic composition in this situation. Bifidobacterium sp. is a core species in this microbiota and is consistently present in all samples when I analyze only the forward reads. However, it is absent when I run a PE analysis with DADA2 using the following command:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-paired-end.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 249 \
  --p-trunc-len-r 240 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza

Bifidobacterium is present in only a few samples when I truncate the reverse reads further and increase maxEE to 10, with the following command:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs ../../../demux-paired-end.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 249 \
  --p-trunc-len-r 232 \
  --p-n-threads 4 \
  --p-max-ee-r 10.0 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza

Here are the results of taxonomy and the representative sequences:

rep-seqs.qzv (270.0 KB) taxonomy.same.silva_blast_v3v4_16S.qzv (1.3 MB)

How do you explain this situation? I've noticed that the Bifidobacterium sequences are only 445 bp long, while I expected something like 460 bp with these primers... Do you think this length variability could be responsible for the different species composition under different denoising parameters?

Hello Andrea,

Welcome to the forums! :wave:

Unfortunately, I think your reverse reads may have failed. Like you mentioned, it might be better to use only your single-end forward reads, as their quality is much higher.

How to avoid this quality drop next time is a great question to bring to the team or company that sequenced your amplicons!

It’s also possible that you did everything right and the run just failed. That happens sometimes. :man_shrugging:

Given that the quality really tanked on R2, your sequencing company may be willing to rerun your samples at a discount, or for free! It’s always worth asking :money_mouth_face:

It makes sense to me that trimming and maxEE change your taxonomy. DADA2 trains an error model based on the data you provide to it, which is why it’s important to trim and filter noisy data before training and denoising.

To make things more complicated, taxonomic assignment happens after denoising, and depends heavily on read length and the region sequenced. Some taxa have a very similar V3 region, but could easily be differentiated down to the species level because they have a distinctive V3-V4 region.

Because taxonomy assignment depends on both read length and quality, there’s a bunch of reasons why your Bifidobacterium could have gone missing. We need more clues… :female_detective: :male_detective: :mag_right:

What percentage of your reads do you lose after running those two denoise-paired commands? It’s possible that your Bifidobacterium reads have trouble joining when you use the low-quality R2 reads, and are simply absent from downstream analysis.
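
One way to sanity-check the joining question is simple arithmetic on the truncation lengths from the two commands above. This is only a rough sketch: it assumes the amplicon lengths quoted in this thread (about 460 bp expected, 445 bp observed for Bifidobacterium) are measured over the same span the reads cover, and it assumes DADA2's default minimum overlap of 12 bases. `overlap` is a hypothetical helper, not part of QIIME 2.

```python
# Rough overlap estimate: forward and reverse reads together must span the
# amplicon, and whatever is left over is the overlap available for merging.
MIN_OVERLAP = 12  # assumption: DADA2's default minimum overlap for merging


def overlap(trunc_len_f: int, trunc_len_r: int, amplicon_len: int) -> int:
    """Bases shared by the truncated forward and reverse reads (negative = gap)."""
    return trunc_len_f + trunc_len_r - amplicon_len


# Truncation settings taken from the two denoise-paired commands in this thread.
settings = {"trunc-len-r 240": (249, 240), "trunc-len-r 232": (249, 232)}

for label, (f_len, r_len) in settings.items():
    for amplicon_len in (445, 460):  # lengths quoted in the thread
        ov = overlap(f_len, r_len, amplicon_len)
        status = "ok" if ov >= MIN_OVERLAP else "too short"
        print(f"{label}, {amplicon_len} bp amplicon: {ov} bp overlap ({status})")
```

If these assumptions hold, the truncation lengths leave enough nominal overlap in every case, which would point at mismatches inside the low-quality overlap rather than at read length.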

Let me know what you find!

PS. Setting --p-max-ee-r to 10 is pretty high! Like, 10/(806-341) ≈ 2.2% error rate across the full amplicon :grimacing:

If that’s what it takes to get the reads to merge, this could be a good argument for only using your forward reads, or finding a way to resequence :money_with_wings:
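
For context, a read's expected errors are the sum of the per-base error probabilities implied by its Phred scores, and that sum is what the maxEE filter thresholds. Here is a minimal sketch of that definition. The quality values are made up for illustration, `expected_errors` is a hypothetical helper rather than a QIIME 2 or DADA2 function, and the threshold of 2 is assumed here to be the q2-dada2 default.

```python
def expected_errors(quals):
    """Sum of per-base error probabilities, p = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in quals)


# Made-up reverse read of 232 bases: a good start, then a long low-quality tail.
illustrative_read = [35] * 100 + [15] * 132

ee = expected_errors(illustrative_read)
print(f"expected errors: {ee:.2f}")
print(f"passes maxEE = 2 (assumed default): {ee <= 2}")
print(f"passes maxEE = 10: {ee <= 10}")

# The back-of-the-envelope rate from the post: 10 errors over the 341F-806R span.
print(f"10 / (806 - 341) = {10 / (806 - 341):.2%}")
```

A read like this made-up one is rejected at the stricter threshold and only passes once maxEE is raised, which is the trade-off described above.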


Thank you Colin for your answer,

Yeah, that's what I think may be happening here... In any case, I've attached screenshots of the denoising stats:

1. stringent conditions PE:

2. relaxed conditions PE:

3. forward reads only:

Yes, I agree. I think “2. relaxed conditions PE:” is especially interesting as it shows that even when most reads pass the filter (80-85%), a huge number of reads still can’t merge (40-60%).
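
The percentages discussed above can be recomputed from the per-sample counts in the denoising stats. Below is a minimal sketch, assuming the step names mirror the columns of the denoise-paired stats table. The counts are hypothetical and are not taken from the screenshots in this thread, and `percent_of_input` is an illustrative helper, not a QIIME 2 function.

```python
def percent_of_input(stats):
    """Percentage of input reads surviving each denoising step."""
    total = stats["input"]
    return {
        step: 100 * count / total
        for step, count in stats.items()
        if step != "input"
    }


# Hypothetical counts for one sample (illustration only).
sample = {
    "input": 50_000,
    "filtered": 41_000,
    "denoised": 40_000,
    "merged": 22_000,
    "non-chimeric": 20_000,
}

for step, pct in percent_of_input(sample).items():
    print(f"{step:>12}: {pct:5.1f}% of input")
```

A large drop between the filtered and merged steps, as in this made-up sample, is the pattern that suggests merging rather than filtering is where the reads are lost.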

Using only forward reads will let you set stronger filters, avoid bias from merging, and still keep most of your reads.

Forward it is! :arrow_forward:
