V3-V4 amplicon trimming adapter

arlandan · April 13, 2020, 12:52pm

Continuing the discussion from Different taxa result from DADA2 and Deblur:

Thank you @Setiawan @benjjneb @Nicholas_Bokulich for your posts! May I ask a few more questions? Any suggestions are appreciated.

Primer sequences are retained in the V3-V4 amplicons. Is it necessary to trim the primer sequences off? I am assuming not, since they are at the 5' end, right?
The stats file generated by the dada2 denoising step has percentage-of-input values for the filtered, denoised, merged, and non-chimeric steps. How do you evaluate these numbers? I assume the higher the better, but what is considered good enough? Are there any minimums? Thank you very much.

Stay well,
Arlandan

Nicholas_Bokulich · April 13, 2020, 3:39pm

Welcome back, @arlandan,

For dada2 it is always necessary to trim the primers, see here for more info.

It is all fairly subjective, based on your expectations and requirements. In general, filtering at the "filter" step is okay as long as you are getting a reasonable number of reads out the other end, but losing reads at the "merge" step is bad news, as this will bias the taxonomic composition of your samples (since shorter amplicons will be selectively excluded). See this post for a little "guide" I put together as a rule of thumb for interpreting read loss with dada2:

I hope that helps!

arlandan · April 14, 2020, 1:09am

Hi @Nicholas_Bokulich,
Thanks a lot for the reply. I trimmed the primers and reran dada2, but the results were surprisingly worse. I know this has been discussed a lot, and I have read through many posts regarding V3-V4 amplification, but I still have no idea why that happened. Can you take a look, please?

I am working on data generated from a 2x250 bp run, where the V3-V4 primers were used to prepare the library following the Illumina 16S sequencing library prep protocol. As you know, the amplicon size for V3-V4 is ~460 bp = ~420 nt of biological sequence + ~20 bp of primer sequence at each end. I ran dada2 twice on the same data set, with and without trimming the primer sequences, and the results were totally different.

Without primer trimming:
Sequences were imported and denoised directly with dada2 using command:

qiime dada2 denoise-paired \
  --p-trim-left-f 1 \
  --p-trim-left-r 1 \
  --p-trunc-len-f 250 \
  --p-trunc-len-r 250 \
  --i-demultiplexed-seqs dataimported.qza \
  --o-representative-sequences repseqs_denoised.qza \
  --o-table table_denoised.qza \
  --o-denoising-stats stats_denoised.qza

The stats file from this step:

With primer trimming:
The 341/785 primers (see below for sequences) were used. Data were imported and trimmed, then denoised using the following commands:

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences dataimported.qza \
  --p-front-f CCTACGGGNGGCWGCAG \
  --p-front-r GACTACHVGGGTATCTAATCC \
  --p-error-rate 0 \
  --o-trimmed-sequences dataimported_trimP.qza \
  --verbose

qiime dada2 denoise-paired \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 251 \
  --p-trunc-len-r 250 \
  --i-demultiplexed-seqs dataimported.qza \
  --o-representative-sequences repseqs_denoised_trimP.qza \
  --o-table table_denoised_trimP.qza \
  --o-denoising-stats stats_denoised_trimP.qza

And the stats file is:

Looks like most of the reads have been removed at the filtering step. Not to mention merging and chimera removal...

For your reference, I am also attaching the interactive quality plot.

Additional questions:

Should I replace the W, H, and V with N? Does it make a difference?
As I asked previously, the primer sequences are at the 5' end, so trimming them should not change the length of the overlap, right?
2x300 bp is better for V3-V4 amplicons, but a 2x250 PE run was done instead. Do you think there is enough overlap for a paired-end analysis, or should I use only the forward reads?

Thanks in advance!
Arlandan

Nicholas_Bokulich · April 14, 2020, 4:37pm

This is a mystery, sounds a lot like the topic I linked to above.

In your case, though, I think I see the issue. You are removing primers but then truncating at the same lengths even though your reads are now ~20 nt shorter, so basically you are filtering everything out because they are shorter than the trunc length.
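To check this diagnosis on real data, one option is to count how many reads are shorter than the intended truncation length, since dada2 discards those reads instead of keeping them at their shorter length. The following is a minimal sketch, not part of QIIME 2: the function name `count_short_reads` is hypothetical, and it assumes an uncompressed FASTQ file with exactly four lines per record.

```shell
# Count reads in a FASTQ file that are shorter than a given truncation length.
# Assumes an uncompressed FASTQ with exactly 4 lines per record.
count_short_reads() {
    # $1 = FASTQ file, $2 = truncation length
    awk -v t="$2" 'NR % 4 == 2 && length($0) < t { n++ } END { print n + 0 }' "$1"
}

# Toy data: one 10 nt read and one 6 nt read.
demo_fastq=$(mktemp)
printf '@r1\nACGTACGTAC\n+\nIIIIIIIIII\n@r2\nACGTAC\n+\nIIIIII\n' > "$demo_fastq"

# With a truncation length of 8, the 6 nt read would be discarded.
short_at_8=$(count_short_reads "$demo_fastq" 8)
echo "reads shorter than 8 nt: $short_at_8"
```

On a real data set, the per-sample FASTQ files would first have to be exported from the primer-trimmed artifact (for example with `qiime tools export`) and decompressed. If nearly every read is shorter than the chosen trunc-len, the filtering step will remove nearly everything.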

My recommendation:

generate a new demux summary after removing primers
set trunc-len based on those results

OR just use the trim parameters in q2-dada2 to remove primers (trimming will happen after truncation)
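The two options above could be sketched as follows. The script only prints the commands for review and does not run them. It assumes that each primer starts at the first base of every read (no spacers of variable length), that the raw reads are 250 nt long, and that the output file names, which are hypothetical, suit your project.

```shell
# Primer sequences as given in this thread (341F / 785R).
FWD_PRIMER="CCTACGGGNGGCWGCAG"
REV_PRIMER="GACTACHVGGGTATCTAATCC"

# The primer lengths become the 5' trim lengths.
TRIM_F=${#FWD_PRIMER}
TRIM_R=${#REV_PRIMER}

# Assumption: raw reads are 250 nt; take the real values from a demux summary.
TRUNC_F=250
TRUNC_R=250

# Option 1: summarize the cutadapt-trimmed reads, then choose trunc-len from that.
SUMMARY_CMD="qiime demux summarize \
--i-data dataimported_trimP.qza \
--o-visualization dataimported_trimP.qzv"

# Option 2: let dada2 trim the primers from the untrimmed reads.
# trunc-len is measured on the original read, so it can stay at the raw length.
DADA2_CMD="qiime dada2 denoise-paired \
--i-demultiplexed-seqs dataimported.qza \
--p-trim-left-f $TRIM_F \
--p-trim-left-r $TRIM_R \
--p-trunc-len-f $TRUNC_F \
--p-trunc-len-r $TRUNC_R \
--o-representative-sequences repseqs_trimleft.qza \
--o-table table_trimleft.qza \
--o-denoising-stats stats_trimleft.qza"

echo "$SUMMARY_CMD"
echo "$DADA2_CMD"
```

With option 2, the denoised forward and reverse reads end up `TRUNC_F - TRIM_F` and `TRUNC_R - TRIM_R` nt long, so nothing is thrown away for being shorter than the truncation length.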

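On replacing W, H, and V with N: no, probably not.

On whether trimming the 5' primers changes the overlap: right, it should not.

On whether 2x250 gives enough overlap: it's getting close, but you should still have enough since your read quality looks good. Looks like you are getting good merging without removing primers, so this should work equally well after trimming primers.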
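The overlap questions come down to simple arithmetic, sketched below with the approximate numbers from this thread. The ~460 bp amplicon length is the poster's estimate and varies between taxa, and the minimum overlap of 12 nt is what I understand the dada2 merging default to be, so treat both as assumptions.

```shell
# Approximate values taken from the thread; real amplicon lengths vary by taxon.
READ_LEN=250
AMPLICON_LEN=460      # primers included
FWD_PRIMER_LEN=17     # CCTACGGGNGGCWGCAG
REV_PRIMER_LEN=21     # GACTACHVGGGTATCTAATCC
MIN_OVERLAP=12        # assumed dada2 default for merging

# Overlap when the primers are left on the reads.
overlap_untrimmed=$(( 2 * READ_LEN - AMPLICON_LEN ))

# Overlap after trimming the primers from the 5' ends:
# both the reads and the amplicon get shorter by the same total amount.
trimmed_amplicon=$(( AMPLICON_LEN - FWD_PRIMER_LEN - REV_PRIMER_LEN ))
overlap_trimmed=$(( (READ_LEN - FWD_PRIMER_LEN) + (READ_LEN - REV_PRIMER_LEN) - trimmed_amplicon ))

echo "overlap with primers:    $overlap_untrimmed nt"
echo "overlap without primers: $overlap_trimmed nt"

if [ "$overlap_trimmed" -ge "$MIN_OVERLAP" ]; then
    echo "enough overlap to merge at full read length"
fi
```

The two overlap values are identical, which is why trimming 5' primers does not affect merging. What does reduce the overlap is truncating the 3' ends: every base removed by trunc-len comes straight out of the overlapping region.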

Good luck!

arlandan · April 14, 2020, 5:42pm

Thanks a lot! That was indeed the issue. Most of the samples were filtered and merged well, except for one sample whose percentage of input merged was only 7.81. Should I drop it, or is there anything I could do to save the sample?

Also, checking repseqs_denoised_trimP.qzv, 2% of the sequences are very short (~264 bp), only about half the length of the V3-V4 region. Should they be included in further analysis?


Thanks again,
Arlandan

Nicholas_Bokulich · April 14, 2020, 5:55pm

Strange. I'd investigate that low-merging sample some more, since the differences in it could be biologically interesting, but ultimately dropping it may be the only solution unless you can retrieve more reads, e.g., by increasing the trunc-len.

Hmm... same as above for the short (~264 bp) sequences: I'd investigate further (these could be biologically interesting, or contaminants), but ultimately filtering them out may be the way to go, since they seem unusually short and my guess is that they are non-bacterial.
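If you do decide to drop the unusually short sequences, a length filter on the exported representative sequences is one way to do it. This is a minimal sketch and not a QIIME 2 command: the function name `filter_fasta_min_len` and the cutoff are hypothetical, and it assumes a FASTA file with each sequence on a single line.

```shell
# Keep only FASTA records whose sequence is at least a minimum length.
# Assumes each sequence sits on a single line directly after its header.
filter_fasta_min_len() {
    # $1 = FASTA file, $2 = minimum sequence length
    awk -v min="$2" '
        /^>/ { header = $0; next }
        length($0) >= min { print header; print $0 }
    ' "$1"
}

# Toy data: one 5 nt sequence and one 12 nt sequence.
demo_fasta=$(mktemp)
printf '>short\nACGTA\n>long\nACGTACGTACGT\n' > "$demo_fasta"

# With a cutoff of 10 nt, only the record named "long" is kept.
kept_count=$(filter_fasta_min_len "$demo_fasta" 10 | grep -c '^>')
echo "records kept: $kept_count"
```

For this data set the cutoff would sit somewhere between the ~264 bp outliers and the expected V3-V4 length. The same features would also have to be removed from the feature table so that the table and the sequences stay in sync.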

arlandan · April 14, 2020, 7:37pm

Thank you @Nicholas_Bokulich for your quick response.
Much appreciated!

arlandan · April 14, 2020, 11:32pm

Hi @Nicholas_Bokulich, I have one more question regarding this comment. I have also read the shared link and understand "The primers aren’t sequences from the sample, they are sequences that were added into the PCR reaction, and the ambiguous nucleotides in the primers interfere with denoising and chimera detection".

In my case, the library prep included the primer sequence on reads, as described by benjjneb's response and this picture:
Screen Shot 2020-04-14 at 6.15.10 PM

The region-of-interest-specific primers (in the picture) are the 341/785 primers used to amplify the V3-V4 region, and they are actually part of the biological sequence from the sample. The overhang adapters attached to the 341/785 primers have been removed, as far as I can tell from the original fastq file.

There is a big difference depending on whether or not these 341/785 primers are removed from the reads: 50-70% of reads were discarded in the dada2 denoising step when the primer-trimmed reads were used, while only 10-20% were removed when the untrimmed reads were used. I am just wondering: whatever the primers are, they are at the 5' end and should not have much impact... Can you please suggest anything? Thanks!

Best,
Arlandan

Nicholas_Bokulich · April 15, 2020, 2:21am

That's precisely why the dada2 developer says that primers must be trimmed. The impact you are seeing appears to be in the chimera filtering step, and this is caused, e.g., by ambiguous bases in the primers.
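To see how many ambiguous positions the two primers from this thread carry, the degenerate IUPAC codes can be counted directly. This is only an illustrative sketch, and the function name `count_ambiguous` is hypothetical.

```shell
# Count degenerate IUPAC positions (anything other than A, C, G, T) in a primer.
count_ambiguous() {
    # $1 = primer sequence in upper case
    echo $(( $(printf '%s' "$1" | tr -cd 'RYSWKMBDHVN' | wc -c) ))
}

FWD_PRIMER="CCTACGGGNGGCWGCAG"        # 341F, as given in the thread
REV_PRIMER="GACTACHVGGGTATCTAATCC"    # 785R, as given in the thread

fwd_ambiguous=$(count_ambiguous "$FWD_PRIMER")
rev_ambiguous=$(count_ambiguous "$REV_PRIMER")

echo "ambiguous positions in forward primer: $fwd_ambiguous"
echo "ambiguous positions in reverse primer: $rev_ambiguous"
```

Each degenerate position is filled by whichever primer variant happened to anneal, so otherwise identical templates can yield reads that differ at those sites. That variation comes from the primer mix and not from the sample.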
