Quality score and trimming length in DADA2

Yoha · May 31, 2021, 6:48pm

I am a first-time QIIME user, and I am working with paired-end sequences from an Illumina MiSeq (2 x 300 kit). It is worth noting that both reads (R1 and R2) are presented in forward order. I used a manifest file to import the demultiplexed fastq files:
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path manifest file \
  --output-path demux_seqs.qza \
  --source-format PairedEndFastqManifestPhred33
The sequence quality was then visualized using a qzv file (I have attached the quality score plots of the forward and reverse reads below). For the denoising step, I intend to use DADA2. However, compared to the forward reads, the reverse reads have very poor quality scores. Could you suggest trimming points for the forward and reverse reads, taking into account that the two reads must still be compatible (able to be joined) after trimming?

colinbrislawn · June 1, 2021, 3:29pm

Hello @Yoha,

Welcome to the forums!

Thank you for posting your read quality plots. MiSeq runs often lose quality near the end of R2, so this is not unexpected.

What region did you amplify and how long is this region?
(This will let us estimate how much overlap is expected so we don't trim off so many base pairs that the reads cannot be joined.)

I might start with

--p-trim-left-f 0 --p-trim-left-r 6

but I would like to know more about your amplicon before suggesting --p-trunc-len-* values.

Yoha · June 1, 2021, 4:17pm

Thanks so much for your reply. We sequenced the V4 hypervariable region of the bacterial 16S rRNA gene in paired-end mode (2 × 300 bp) on the Illumina MiSeq platform using primers 515F and 806R (the primers are from the article "Ultra-high-throughput microbial community analysis on the Illumina HiSeq and MiSeq platforms"). Thank you for your suggestion.

colinbrislawn · June 3, 2021, 2:45am

Hello Yoha,

I have also done this!

So if the forward primer starts the amplicon at 515 and the reverse primer ends the amplicon at 806
806 - 515 = 291
we would expect the resulting amplicon to be around 291 base pairs long.

Is this what you expect too?

Your sequencing kit reads 300 base pairs inward from each end of your amplicon.
So when we compare your sequencing length to your amplicon length
(300 + 300) - 291 = 600 - 291 = 309
we would expect 309 base pairs of overlap.
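
The estimate above can be reproduced with a few lines of shell arithmetic. This is only a rough sketch that follows the reasoning in this post: it treats the primer names 515F and 806R as start and end coordinates, and the variable names are made up for illustration.

```shell
#!/bin/sh
# Rough overlap estimate, following the arithmetic in this thread.
# Assumption: primer names are treated as amplicon start/end coordinates.
primer_f_pos=515   # forward primer position (from the name 515F)
primer_r_pos=806   # reverse primer position (from the name 806R)
read_len=300       # read length of a 2 x 300 kit

# Approximate amplicon length between the two primer positions.
amplicon_len=$(( $primer_r_pos - $primer_f_pos ))

# Total sequenced bases from both reads minus the amplicon length.
untrimmed_overlap=$(( 2 * $read_len - $amplicon_len ))

echo "Estimated amplicon length: ${amplicon_len} bp"
echo "Estimated overlap before truncation: ${untrimmed_overlap} bp"
```

If the overlap comes out larger than the amplicon itself, as it does here, each read on its own spans the whole amplicon.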

Wait...

That's the full length of your 16S V4 amplicon! I think your two reads fully cover your amplicon in both the forward and reverse direction.

(Did I understand this correctly or am I missing something?)

If you have full coverage of your amplicon in both directions, you have 2x coverage and can trim off almost half your data and still get it to join just fine.

I might try adding these trunc flags to the trim settings I mentioned above:
--p-trunc-len-f 220 --p-trunc-len-r 110
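
Put together, the trim and trunc flags could be passed to DADA2 roughly like this. It is a sketch, not a tested recipe: demux_seqs.qza is the artifact from the import step earlier in the thread, and the three output file names are hypothetical placeholders. The script only prints the command unless qiime and the input file are actually present.

```shell
#!/bin/sh
# Trim/truncation values suggested in this thread.
trim_left_f=0
trim_left_r=6
trunc_len_f=220
trunc_len_r=110

# Input comes from the import step; output names are hypothetical.
dada2_args="--i-demultiplexed-seqs demux_seqs.qza \
--p-trim-left-f $trim_left_f --p-trim-left-r $trim_left_r \
--p-trunc-len-f $trunc_len_f --p-trunc-len-r $trunc_len_r \
--o-table table.qza \
--o-representative-sequences rep_seqs.qza \
--o-denoising-stats denoising_stats.qza"

if command -v qiime >/dev/null 2>&1 && [ -f demux_seqs.qza ]; then
    # Word splitting of $dada2_args is intentional here.
    qiime dada2 denoise-paired $dada2_args
else
    # Dry run: show what would be executed.
    echo "Would run: qiime dada2 denoise-paired $dada2_args"
fi
```

Keeping the four values in variables makes it easy to rerun with different settings and compare the denoising stats.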

This means that your trimmed data would be
(220-0) + (110-6) = 324
about 324 bp long.

When you compare this to your expected amplicon length
324 - 291 = 33
the 33 base pairs of overlap should be enough for DADA2 to join just fine!
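
To try other settings quickly, the same back-of-the-envelope check can be scripted. This sketch repeats the arithmetic from this post with made-up variable names. The 12 bp threshold is an assumption based on the default minimum overlap that DADA2 documents for merging pairs, so check it against the version you are running.

```shell
#!/bin/sh
# Back-of-the-envelope overlap check, following the arithmetic in this thread.
amplicon_len=291   # estimate from the primer names (806 - 515)
trim_left_f=0
trim_left_r=6
trunc_len_f=220
trunc_len_r=110
min_overlap=12     # assumption: DADA2's documented default for merging

# Bases kept from each read after trimming the start and truncating the end.
kept_f=$(( $trunc_len_f - $trim_left_f ))
kept_r=$(( $trunc_len_r - $trim_left_r ))
total_kept=$(( $kept_f + $kept_r ))

# Estimated overlap between the two trimmed reads.
overlap=$(( $total_kept - $amplicon_len ))

echo "Kept bases: ${kept_f} (forward) + ${kept_r} (reverse) = ${total_kept}"
echo "Estimated overlap: ${overlap} bp"

if [ "$overlap" -ge "$min_overlap" ]; then
    verdict="ok"
    echo "Overlap looks sufficient for joining."
else
    verdict="too_short"
    echo "Overlap is probably too short; truncate less."
fi
```

Changing trunc_len_f or trunc_len_r at the top shows right away whether a candidate setting still leaves enough overlap.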

Colin

P.S. There is no 'right way' to do this, so try some settings and see what you find!
