Primer sequences are retained in the V3-V4 amplicons. Is it necessary to trim the primer sequences off? I am assuming not, since they are at the 5' end, right?
The stats file generated by the dada2 denoising step reports the percentage of input reads that passed the filter, were denoised, were merged, and were non-chimeric. How do you evaluate these numbers? I assume the higher the better, but what is considered good enough? Are there any minimums? Thank you very much.
For dada2 it is always necessary to trim the primers; see here for more info.
It is all fairly subjective, based on your expectations and requirements. In general, filtering at the "filter" step is okay as long as you are getting a reasonable number of reads out the other end, but losing reads at the "merge" step is bad news, as this will bias the taxonomic composition of your samples (since shorter amplicons will be selectively excluded). See this post for a little "guide" I put together as a rule of thumb for interpreting read loss with dada2:
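To make the "where am I losing reads?" question concrete, here is a minimal Python sketch that turns per-stage read counts into step-by-step retention percentages and points at the stage with the largest loss. The counts are made up for illustration, and the stage names are only assumed to mirror the columns of the dada2 stats file.

```python
# Hypothetical per-sample read counts at each dada2 stage (illustrative, not real data).
stats = {
    "sample_A": {"input": 50000, "filtered": 42000, "denoised": 41000,
                 "merged": 36000, "non-chimeric": 33000},
    "sample_B": {"input": 48000, "filtered": 40000, "denoised": 39000,
                 "merged": 9000, "non-chimeric": 8500},
}

# Stage names are an assumption modelled on the stats file columns.
STAGES = ["input", "filtered", "denoised", "merged", "non-chimeric"]


def stepwise_retention(counts):
    """Percentage of reads kept at each stage, relative to the previous stage."""
    result = {}
    for prev, curr in zip(STAGES, STAGES[1:]):
        kept = 100.0 * counts[curr] / counts[prev] if counts[prev] else 0.0
        result[curr] = round(kept, 1)
    return result


def worst_stage(counts):
    """Name of the stage with the lowest retention relative to the previous stage."""
    retention = stepwise_retention(counts)
    return min(retention, key=retention.get)


for sample, counts in stats.items():
    print(sample, stepwise_retention(counts), "largest loss at:", worst_stage(counts))
```

Looking at loss relative to the previous stage, rather than relative to the input, makes it easier to tell a sample that loses reads at filtering from one that loses them at merging, which is the case flagged above as the worrying one.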
Hi @Nicholas_Bokulich,
Thanks a lot for the reply. I trimmed the primers and reran dada2, but the results were surprisingly worse. I know this has been discussed a lot, and I have read through many posts regarding V3-V4 amplification, but I still have no idea why that happened. Can you take a look, please?
I am working on data generated from a 2x250 bp run, where V3-V4 primers were used to prepare the library following the Illumina 16S sequencing library prep protocol. As you know, the amplicon size for V3-V4 is ~460 bp = ~420 nt of biological sequence + ~20 bp of primer sequence at each end. I ran dada2 twice on the same data set, with and without trimming the primer sequences, but the results were totally different.
Without primer trimming:
Sequences were imported and denoised directly with dada2 using command:
Should I replace the W, H, and V with N? Does it make a difference?
As I asked previously, the primer sequences are at the 5' end, so trimming should not change the length of the overlap, right?
2x300 bp is better for V3-V4 amplicons, but a 2x250 PE run was done instead. Do you think there is enough overlap for paired-end analysis, or should I use only the forward reads?
This is a mystery, sounds a lot like the topic I linked to above.
In your case, though, I think I see the issue. You are removing primers but then truncating at the same lengths even though your reads are now ~20 nt shorter, so basically you are filtering everything out because they are shorter than the trunc length.
My recommendation:
generate a new demux summary after removing primers
set trunc-len based on those results
OR just use the trim parameters in q2-dada2 to remove primers (trimming will happen after truncation)
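The reason a fixed trunc-len wipes out primer-trimmed reads can be sketched in a few lines of Python. This is only a toy model of the rule described above (reads shorter than the truncation length are discarded, and trimming is applied after truncation), not the actual q2-dada2 code; the 250 and 225 values are hypothetical, and 20 nt is the rough primer length quoted in this thread.

```python
def truncate_or_discard(read_length, trunc_len):
    """Toy truncation rule: reads shorter than trunc_len are discarded (None),
    longer reads are cut to trunc_len. trunc_len == 0 means no truncation."""
    if trunc_len == 0:
        return read_length
    if read_length < trunc_len:
        return None
    return trunc_len


def truncate_then_trim(read_length, trunc_len, trim_left):
    """Truncate at the 3' end first, then trim trim_left bases from the 5' end."""
    kept = truncate_or_discard(read_length, trunc_len)
    return None if kept is None else kept - trim_left


PRIMER_LEN = 20   # rough primer length quoted in the thread
RAW_LEN = 250     # read length from a 2x250 run
TRUNC_LEN = 250   # hypothetical trunc-len chosen on the untrimmed reads

# Untrimmed read: survives truncation at full length.
print(truncate_or_discard(RAW_LEN, TRUNC_LEN))
# Primer removed beforehand, same trunc-len: the read is now too short and is discarded.
print(truncate_or_discard(RAW_LEN - PRIMER_LEN, TRUNC_LEN))
# Primer removed beforehand, trunc-len lowered to a hypothetical 225: the read is kept.
print(truncate_or_discard(RAW_LEN - PRIMER_LEN, 225))
# Primer left in place and removed via 5' trimming after truncation: 230 nt remain.
print(truncate_then_trim(RAW_LEN, TRUNC_LEN, PRIMER_LEN))
```

Either route in the list above ends with reads of comparable length; what matters is that the truncation length is chosen against the read lengths that actually reach the truncation step.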
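On replacing W, H, and V with N: no, probably not.
On whether primer trimming leaves the overlap unchanged: right.
On whether 2x250 gives enough overlap: it's getting close, but you should still have enough since your read quality looks good. It looks like you are getting good merging without removing primers, so this should work equally well after trimming primers.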
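The overlap arithmetic behind the last two answers can be sketched as follows, using the rough numbers from this thread (2x250 reads, a ~460 bp amplicon including ~20 nt of primer at each end). The minimum overlap of 12 nt is an assumption based on the default of DADA2's merging step; real amplicon lengths also vary between taxa, so treat the result as a ballpark figure.

```python
def expected_overlap(fwd_len, rev_len, amplicon_len):
    """Overlap between paired reads that together span an amplicon of amplicon_len."""
    return fwd_len + rev_len - amplicon_len


def truncation_budget(overlap, min_overlap=12):
    """Total number of 3' bases (forward + reverse) that can be truncated
    before the overlap drops below min_overlap (assumed default: 12 nt)."""
    return overlap - min_overlap


READ_LEN = 250               # 2x250 run
AMPLICON_WITH_PRIMERS = 460  # rough V3-V4 amplicon size from the thread
PRIMER_LEN = 20              # rough primer length at each end

# Primers left on the reads.
with_primers = expected_overlap(READ_LEN, READ_LEN, AMPLICON_WITH_PRIMERS)

# Primers removed: each read loses ~20 nt at its 5' end, and the amplicon
# loses the same ~20 nt at each end, so the overlap is unchanged.
without_primers = expected_overlap(
    READ_LEN - PRIMER_LEN,
    READ_LEN - PRIMER_LEN,
    AMPLICON_WITH_PRIMERS - 2 * PRIMER_LEN,
)

print(with_primers, without_primers)    # both 40
print(truncation_budget(with_primers))  # 28 nt of 3' truncation to share between reads
```

So primer removal does not change the overlap, but every base truncated from the 3' end of either read does, which is why the margin on a 2x250 run is tight.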
Thanks a lot! That was indeed the issue. Most of the samples were filtered and merged well, except for 1 sample with a "percentage of input merged" value of only 7.81. Should I drop it, or is there anything I could do to save the sample?
Also, looking at repseqs_denoised_trimP.qzv, 2% of the sequences are very short (~264 bp), which is only about half the length of the V3-V4 region. Should they be included in further analysis?
Strange. I'd investigate some more (the differences in this sample could be biologically interesting), but ultimately dropping that sample may be the only solution unless you can retrieve more reads, e.g., by increasing the trunc-len.
hmm... same as above: I'd investigate further (these could be biologically interesting! or contaminants) but ultimately filtering these may be the way to go, since these seem unusually short and my guess is that they are non-bacterial.
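As a starting point for that investigation, a small Python sketch like the one below could list which representative sequences fall under a length cutoff. It assumes the sequences have been exported to a plain FASTA file; the records here are made-up stand-ins and the 400 nt cutoff is a hypothetical choice, not a recommended threshold.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a dict of {sequence id: sequence}."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def split_by_length(records, min_len):
    """Return (kept ids, flagged ids), flagging sequences shorter than min_len."""
    kept = [name for name, seq in records.items() if len(seq) >= min_len]
    flagged = [name for name, seq in records.items() if len(seq) < min_len]
    return kept, flagged


# Made-up stand-ins: one sequence of ~420 nt and one of ~264 nt, as in the thread.
example = [">asv1", "ACGT" * 105, ">asv2", "ACGT" * 66]

records = read_fasta(example)
kept, flagged = split_by_length(records, min_len=400)  # hypothetical cutoff
print("kept:", kept)
print("flagged as unusually short:", flagged)
```

The flagged sequences are the ones worth checking by hand (for example against a reference database) before deciding whether to filter them out.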
Hi @Nicholas_Bokulich, I have one more question regarding this comment. I have also read the shared link and understand "The primers aren’t sequences from the sample, they are sequences that were added into the PCR reaction, and the ambiguous nucleotides in the primers interfere with denoising and chimera detection".
In my case, the library prep included the primer sequence on reads, as described by benjjneb's response and this picture:
The "region of interest-specific primers" (in the picture) are the 341/785 primers used to amplify the V3-V4 region, and they are actually part of the biological sequence from the sample. The overhang adapters attached to the 341/785 primers have been removed, as far as I can tell from the original fastq file.
It makes a big difference whether or not these 341/785 primers are removed from the reads. 50-70% of reads were discarded in the dada2 denoising step when the primer-removed reads were used, while only 10-20% were removed when untrimmed reads were used. I am just wondering: whatever the primers are, they are at the 5' end and should not have much impact... Can you please suggest anything? Thanks!
That's precisely why the dada2 developer recommends that primers must be trimmed. The impact you are seeing appears to be in the chimera filtering step, and this is caused, e.g., by ambiguous bases in the primers.
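To illustrate what "ambiguous bases" means here, the Python sketch below expands IUPAC codes such as N and W into a regular expression and strips a matching primer from the start of a read. The primer shown is the commonly cited 341F sequence and is an assumption on my part, so check it against your own protocol; the read is made up. This exact-match toy is for illustration only, since dedicated trimming tools also tolerate mismatches.

```python
import re

# Standard IUPAC nucleotide codes mapped to the bases they can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_to_regex(primer):
    """Compile a degenerate primer into a regex anchored at the 5' end of a read."""
    return re.compile("^" + "".join(IUPAC[base] for base in primer.upper()))


def strip_primer(read, primer):
    """Return the read without its leading primer, or None if the primer is absent."""
    match = primer_to_regex(primer).match(read)
    return read[match.end():] if match else None


FWD_PRIMER = "CCTACGGGNGGCWGCAG"  # assumed 341F sequence; verify against your protocol

# Made-up read: one concrete version of the primer followed by arbitrary sequence.
read = "CCTACGGGAGGCAGCAG" + "TGGGGAATATTGCACAATGG"

print(strip_primer(read, FWD_PRIMER))       # primer found and removed
print(strip_primer("ACGT" * 10, FWD_PRIMER))  # None: no primer at the 5' end
```

Because each ambiguous position can be filled by several bases, reads from the same organism can differ within the primer region for reasons that have nothing to do with the sample, which is the kind of variation that gets in the way downstream when primers are left on.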