Where should I truncate my forward and reverse reads?

bkramer · December 17, 2021, 10:03pm

Hello,

I'm having issues determining where I should truncate my forward and reverse reads prior to denoising using dada2. Based on the reads (see attached .qzv file), it would seem I'd have to truncate to regions where the quality of the reads drops below a certain threshold, but I'm not sure where that should be. I've used a quality score of 22 as my cutoff for truncation: wherever it is lower than that is where I truncate. Yet almost nothing is passing the filters.

mcyE_121721_denoising-stats.qzv (1.2 MB) mcyE_121721_rep-seqs.qzv (201.7 KB) mcyE_121721_table.qzv (389.5 KB)

Best regards,

Ben

colinbrislawn · December 19, 2021, 5:02am

Hello Ben,

Could you run qiime demux summarize on your demultiplexed file and post a screenshot of the interactive quality plots, or the full .qzv file? Based on the size of your amplicon and the quality drop-off of your reads, let's see if we can find some settings that will get your reads to pair!

bkramer · December 19, 2021, 5:47pm

Of course! Thank you for your quick response. Here it is!

This includes reads that have NOT had their primers removed. These reads were amplified using primers for the hepatotoxin microcystin gene mcyE, specifically the HepF/R primers:

Hep F: 5'- TTT GGG GTT AAC TTT TTT GGG CAT AGT C -3'
Hep R: 5'- AAT TCT TGA GGC TGT AAA TCG GGT TT -3'

When I used cutadapt trim-paired in QIIME, the forward primer was added in the 5-3 direction, while the reverse complement of HepR in the 3-5 direction was put in the command (AA ACC CGA TTT ACA GCC TCA AGA ATT).

I THINK I did that right so hopefully that isn't the problem

Anyways, attached is the demultiplexed .qzv file. Thanks again for your help!

Ben

mcyE_demux.qzv (318.7 KB)

colinbrislawn · December 19, 2021, 6:29pm

Hey Ben,

Wow, I've never seen these primers before! How long do you expect the amplicon to be?

Here's a screenshot of the quality scores, for my own reference. Where we trim depends a lot on how long the intended amplicon is...

bkramer · December 20, 2021, 2:50pm

Hi Colin,

The amplicon should be 413 bp in length...based on the number of base pairs for the forward and reverse reads, you would think that the amplicon is already too big, although the reads I usually get from Mr DNA and visualize in QIIME are usually always less than or equal to 301 bp...yet somehow during denoising they're always slightly larger.

Ben

colinbrislawn · December 20, 2021, 6:05pm

I've seen Mr. DNA data on the forums before, and it's been stochastic. I've had a hard time understanding how they process data. What did Mr. DNA tell you about the sequencing method and the analysis their team performed before sending data to you?

If the amplicon is truly 413 bp long, we can do the math on overlap using dada2 denoise-paired and these settings:

--p-trunc-len-f 230
--p-trunc-len-r 180

forward + reverses - amplicon = overlap
230 + 180 - 413 = overlap
-3 = overlap
3 bp gap!

So even with these pretty lax quality settings, the reads can not join a 413 long amplicon.

I'm not sure about our next step here. Perhaps you could try using dada2 denoise-single on just your forward read?

bkramer · December 20, 2021, 7:16pm

Hi Colin,

So there's one paper that stated the amplicon length was ~413bp (Jankowiak et al. 2019) and there was another, the original, paper that said the amplicon using those primers was ~472bp in length. So it's possible that the amplicon length hasn't been very well established yet using these primers.

The protocol used for sequencing, as per the methodology in description in Jankowiak et al. (2019), was as follows:

HotStarTaq Plus Master Mix Kit

94oC for 3 min
28 cycles of 94oC for 30 seconds
53oC for 40 seconds (annealing)
72oC for 1 minute
72oC for 5 minutes (final elongation step)

When I don't truncate/trim at all, I get 13 representative sequences of various lengths ranging from 53 - 443 bp in length (a little too variable in my opinion). I'm mostly trying to figure out where to truncate/trim based on where the quality of the reads is too low. I'm hoping that will give me representative sequences in my samples using amplicons that are comparable in length to previous papers.

colinbrislawn · December 21, 2021, 1:44am

Maybe, or it could be due to how the reads fail to join.

Yikes! The highly variable length makes me really worried about this primer set, and if the reads are joining correctly. If you are expecting this to be a highly conserved region or a low diversity sample, 13 ASVs could be fine. Otherwise it's quite low. DADA2 denoise-paired only works with the reads that join, so this makes me think you are losing most of your data...

I'm going to double-down and suggest dada2 denoise-single again, as that will avoid any issues with amplicon length or joining. By avoiding these issues, we can be more confident in our results and continue our investigation.

bkramer · December 21, 2021, 3:05pm

Hi Colin!

So I wanted to try denoise-paired ONE MORE TIME to see if maybe I could get something to work...I tried setting the truncation cutoff for quality of reads at 35 and 25, but then decided to set it lower to 15.

Turns out that yielded 12 representative sequences of similar size: 11 were 443 bp in length and 1 was 463 bp, so within the range of lengths of base pairs that have come before (413-472bp). So I think I might have solved the problem while still keeping the reverse reads, but I'll see what my advisor has to say about this.

Thanks so much for your help!

Ben