Sequence Quality Control Help

Hi All,

My data are paired-end 16S reads. I am unsure what to use for --p-trunc-len-f/r or --p-trim-left-f/r in the dada2 denoise-paired method. Any suggestions would be greatly appreciated! Here is the output of my qiime demux summarize.

Thanks,
Collin

Hey @cgregg1227!
Making these decisions is covered lightly in the moving pictures tutorial, and has been discussed extensively on this forum (try searching :mag: DADA2 trim or DADA2 parameters).

Basically, you’re trying to keep as many sequences of your whole amplicon as possible. DADA2 drops sequences if their quality is too low, and drops sequences if the forward and reverse reads together aren’t long enough to overlap and cover the whole amplicon.

So, trim/trunc the low-quality data, ensuring that
len(forward) + len(reverse) - 12 (the minimum overlap) >= expected amplicon length.
Some extra overlap is generally a good thing - actual sequence lengths often vary for a given amplicon.
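To make the arithmetic concrete, here is a minimal Python sketch of that check. The function names are made up for illustration, and the 450-nt amplicon length in the example is only an assumed value, since the thread never states the real one. The read lengths are the bases kept from each read after trimming and truncating.

```python
MIN_OVERLAP = 12  # minimum overlap quoted in the formula above


def spare_overlap(len_forward, len_reverse, amplicon_len, min_overlap=MIN_OVERLAP):
    """Bases left over after covering the amplicon plus the minimum overlap.

    A negative result means the reads are too short to be joined.
    """
    return len_forward + len_reverse - min_overlap - amplicon_len


def reads_can_merge(len_forward, len_reverse, amplicon_len, min_overlap=MIN_OVERLAP):
    """True when len(forward) + len(reverse) - min_overlap >= amplicon length."""
    return spare_overlap(len_forward, len_reverse, amplicon_len, min_overlap) >= 0


# Hypothetical example: 269 nt kept per read, assumed 450-nt amplicon.
print(reads_can_merge(269, 269, 450), spare_overlap(269, 269, 450))
```

A comfortably positive spare value leaves room for the natural length variation mentioned above.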

Best,
Chris :ox:


Hey @ChrisKeefe,

Thank you for the help! This is what I was planning to use for trimming/truncating my sequences:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs ./mnt/nfs/labs/howell/qiime2_practice/paired-end-demux.qza \
  --p-trunc-len-f 280 \
  --p-trunc-len-r 280 \
  --p-trim-left-f 11 \
  --p-trim-left-r 11 \
  --o-table ./table-dada2.qza \
  --o-representative-sequences ./rep-seqs.qza \
  --o-denoising-stats ./dada2_stats.qza

Does this look good based on the visual I provided above?

Thanks,
Collin

That depends, @cgregg1227. What length is your target amplicon? If it’s not too long (see the formula above), I might truncate more aggressively. You have good-quality data, so you might be able to get away with choosing truncation lengths where the median quality scores stay at or above 30.
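One way to read "median quality stays at or above 30" is sketched below in Python. It is a rough illustration only: the function name is hypothetical, the median values are invented rather than read from the plot in this thread, and in practice you would take the per-position medians from the interactive quality plot.

```python
def suggest_trunc_len(median_quality, min_quality=30):
    """Length of the longest read prefix whose median quality stays >= min_quality.

    median_quality lists the median score per position, first base first.
    """
    trunc_len = 0
    for score in median_quality:
        if score < min_quality:
            break  # stop at the first position that dips below the threshold
        trunc_len += 1
    return trunc_len


# Invented medians for an 8-nt toy read, not data from this thread.
example_medians = [37, 37, 36, 35, 33, 31, 28, 30]
print(suggest_trunc_len(example_medians))
```

Whatever length this suggests still has to satisfy the overlap formula above, so treat it as an upper bound on how aggressively you can truncate for quality.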

Some notes:

  • You’re not required to pass matching arguments to the -f and -r parameters
  • Unless you are trimming 11 bases for a specific reason (for example, to remove an 11-bp primer you know is there), many people prefer not to trim or truncate unless necessary for quality/read-joining reasons. (This makes meta-analysis easier by keeping the start and end of your reads at commonly-used primer locations rather than cutting to an arbitrary point in the sequence.)
  • DADA2 produces a denoising-stats artifact (viewable as a .qzv via qiime metadata tabulate), which can be really useful in judging how well you chose parameters. If you lose a lot of sequences in the filtering step, you might need to trim or truncate more aggressively for quality. If you lose a lot in merging, you probably cut off too many nucleotides for proper joining.
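The rule of thumb in the last note can be sketched in Python as follows. This is only an illustration: the function name, the 30% loss threshold, and the read counts are all assumptions, and you would substitute the per-sample counts shown in your own denoising stats.

```python
def diagnose_losses(n_input, n_filtered, n_denoised, n_merged, max_loss=0.3):
    """Return the steps ("filtering", "merging") that lost more than max_loss.

    max_loss is an arbitrary illustrative threshold, not a DADA2 setting.
    """
    flagged = []
    # Fraction of input reads discarded by quality filtering.
    filter_loss = 1 - n_filtered / n_input if n_input else 0.0
    # Fraction of denoised reads that failed to join.
    merge_loss = 1 - n_merged / n_denoised if n_denoised else 0.0
    if filter_loss > max_loss:
        flagged.append("filtering")  # consider trimming/truncating harder for quality
    if merge_loss > max_loss:
        flagged.append("merging")    # reads may be too short to overlap
    return flagged


# Invented counts for one sample.
print(diagnose_losses(10000, 9000, 8800, 3000))
```

A sample flagged for "merging" points back to the overlap formula: the truncation lengths probably leave too little overlap for the amplicon.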

DADA2’s run time is much shorter now than it used to be (~1 hour for most of my sequence runs), so it might be worth trying a few things and seeing what works best.
