Thank you! Sorry I put the wrong primers, it was 515F and 926R.
I got demultiplexed raw data but couldn’t remove the primers (somehow the tool they provided didn’t work), so I trimmed 20 nt off the left side of R1 and R2. I set trunc len = 220, which should give me about 30 nt of overlap, but that may be too short. I’ll try a longer trunc length.
I think your forward reads look great @Hui_Yang! Based on the demux paired-end output, you should be able to go out further on each read.
Based on the coordinates in the demux qzv file, I’d try the following truncation points (or combinations thereof):
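For anyone checking the overlap estimate above, here is a minimal sketch of the arithmetic. It assumes a nominal 515F-926R amplicon of about 411 bp (simply 926 minus 515 from the primer names, which is my assumption, not a measured value; real amplicons vary in length). The function name is made up for illustration.

```python
def expected_overlap(trunc_f: int, trunc_r: int, amplicon_len: int) -> int:
    """Bases shared by the forward and reverse reads after truncation.

    trunc_f and trunc_r are truncation positions measured on the raw reads
    (primers still attached), and amplicon_len includes both primers.
    A result of zero or less means the reads cannot overlap at all.
    """
    return trunc_f + trunc_r - amplicon_len


# ASSUMPTION: nominal amplicon length, taken as 926 - 515. Adjust for your data.
ASSUMED_AMPLICON_LEN = 411

# Print the estimated overlap for two truncation pairs mentioned in this thread.
for trunc_f, trunc_r in [(220, 220), (268, 211)]:
    print(trunc_f, trunc_r, expected_overlap(trunc_f, trunc_r, ASSUMED_AMPLICON_LEN))
```

Trimming the primers off the left end of each read does not change this number, because the primers sit at the outer ends of the amplicon and not in the overlapping middle.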
FW: 268 | 283
REV: 211 | 243
I’d try the 268 - 211 pair first.
Why not try running q2-cutadapt on your demuxed data prior to dada2 / deblur? You can search the forum for many examples of running cutadapt. Then re-visualize your cutadapt output (quality plots) the same way you did your demuxed data. This will help you determine the appropriate truncation points after the primers have been removed, i.e. they’ll likely be 20-30 bp shorter. At this point there is likely no need to use the trim options.
Oh I thought trunc len was the length after the trimming. Good to know!
I tested a few params and here’s what I got:
Trim: f = r = 10, Trunc len: f = r = 220, 294 features
Trim: f = r = 10, Trunc len: f = r = 240, 254 features
Trim: f = r = 20, Trunc len: f = r = 260, 255 features
Trim: f = r = 20, Trunc len: f = r = 290, 207 features
Trim: f = r = 0, Trunc len: f = r = 280, 205 features
Trim: f = r = 20, Trunc len: f = 240 r = 220, 272 features
Seems trimming didn’t make too much difference, and trunc length shouldn’t be too long or too short. Is it ok to proceed with the setting that gives the most features?
to your command. The first two allow matches to IUPAC ambiguity codes (e.g. N, M, R…), while the last discards any pairs in which both primers are not found. This is why there are two drop-offs at the end of the quality plots: some reads are not being trimmed.
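To illustrate what matching against IUPAC ambiguity codes means, here is a rough Python sketch. It is not how cutadapt works internally (cutadapt also tolerates a configurable error rate, which this toy ignores); it only expands each ambiguity code in the primer into the bases it stands for. The 515F sequence below is the commonly published one and is an assumption on my part, so substitute the primer you actually used. The function names are hypothetical.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to the bases they cover.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_to_regex(primer: str) -> re.Pattern:
    """Compile a primer containing IUPAC codes into a regular expression."""
    return re.compile("".join(f"[{IUPAC[base]}]" for base in primer.upper()))


def trim_primer(read: str, primer: str):
    """Return the read without its leading primer, or None if the primer
    is not found at the very start of the read (exact matching only)."""
    match = primer_to_regex(primer).match(read)  # .match anchors at position 0
    return read[match.end():] if match else None


# ASSUMPTION: commonly published 515F sequence; replace with your own primer.
PRIMER_515F = "GTGYCAGCMGCCGCGGTAA"
```

A read that returns `None` here corresponds to one where the primer was not found; depending on the options used, such reads are either discarded or passed through untrimmed, and the untrimmed ones stay longer than the rest.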
Looking at the provenance, you forgot to adjust the truncation length of my initial suggestion of fw:268 and rev:211 by subtracting the length of the primer. So, your new truncation lengths, after running cutadapt, should be something like:
The trim length setting only applies when running DADA2 / deblur. Remember you ran cutadapt to remove the primers as a separate prior step, so the sequences are already shorter before you run DADA2 / deblur.
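As a rough sketch of that adjustment: once cutadapt has removed the primers, each read is shorter by the primer length, so a truncation position picked from the raw quality plot shifts left by the same amount. The primer lengths below (19 nt for 515F, 20 nt for 926R) are assumptions based on the commonly published sequences; check them against the primers you actually used. The function name is made up for illustration.

```python
# ASSUMPTION: lengths of the commonly published 515F and 926R primers.
PRIMER_LEN_F = 19
PRIMER_LEN_R = 20


def adjust_trunc(trunc_f: int, trunc_r: int,
                 primer_len_f: int = PRIMER_LEN_F,
                 primer_len_r: int = PRIMER_LEN_R) -> tuple:
    """Shift truncation positions chosen on raw reads so that they apply to
    reads whose primers have already been removed."""
    return trunc_f - primer_len_f, trunc_r - primer_len_r


# Truncation pairs suggested earlier in the thread, before primer removal.
for raw_pair in [(268, 211), (283, 243)]:
    print(raw_pair, "->", adjust_trunc(*raw_pair))
```

With the primers already gone, the trim setting in DADA2 can stay at 0.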
Gotcha. Thanks for clarifying:) I think that just brought me back to my initial questions:
Why didn't my reads merge despite the overlap? When I use vsearch, it simply stitches the two reads together instead of merging them. I want to make sure nothing is wrong with my sequences.
Because of that, now I wonder if DADA2 truly joined my reads or did the same thing, and is that why I lost about 40% of my features when denoising paired end reads?
If it helps, here's a brief recap of what I did so far:
Attempts to denoise with DADA2, params and feature counts:
Trim: f = r = 20, Trunc len: f = r = 240 -- 396 features
Trim: f = r = 10, Trunc len: f = r = 220 -- 294 features
Trim: f = r = 10, Trunc len: f = r = 240 -- 254 features
Trim: f = r = 20, Trunc len: f = r = 180 -- 48 features
Trim: f = r = 20, Trunc len: f = r = 260 -- 255 features
Trim: f = r = 20, Trunc len: f = r = 290 -- 207 features
Trim: f = r = 20, Trunc len: f = 220 r = 200 -- 42 features
Trim: f = r = 20, Trunc len: f = 240 r = 220 -- 274 features
Trim: f = r = 30, Trunc len: f = r = 150 -- 44 features
Trim: f = r = 0, Trunc len: f = r = 280 -- 205 features
Trim: f = r = 0, Trunc len: f = r = 295 -- 212 features
Trimmed paired-end reads (trim = 0):
Trunc len: f = 249 r = 191, -- 257 features
Trunc len: f = 264 r = 223, -- 265 features
Let me know if you want me to upload any table or status files.
@Hui_Yang, can you provide explicit sequence examples where you think the reads are simply being stitched together end-to-end? This should not be the case, especially as vsearch requires a minimum overlap of 10 bases for a successful merge by default (unless you are changing this value), while DADA2 requires a 12-base overlap (which currently cannot be altered via the dada2 plugin at the command-line interface).
What would be most helpful is the output from the --o-denoising-stats of DADA2 or --o-stats from deblur. These outputs will tell you where most of your data is being lost, that is, at merging, denoising, etc.
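To make the difference between merging and end-to-end stitching concrete, here is a toy sketch. It is not the algorithm vsearch or DADA2 uses (both tolerate mismatches and take quality scores into account); it only demands an exact overlap of at least `min_overlap` bases between the end of the forward read and the start of the reverse-complemented reverse read. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence (uppercase A, C, G, T, N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(fwd: str, rev: str, min_overlap: int = 10):
    """Merge a read pair on the longest exact overlap.

    Returns the merged sequence, or None when no exact overlap of at
    least min_overlap bases exists (the pair would fail to merge).
    """
    rc = reverse_complement(rev)
    # Try the longest possible overlap first, then shrink down to the minimum.
    for k in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        if fwd[-k:] == rc[:k]:
            return fwd + rc[k:]
    return None
```

A quick check on real data follows from this: a truly merged sequence is shorter than len(forward) + len(reverse) by the overlap length, whereas a stitched one has exactly the summed length.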
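Once the stats are exported to a TSV, something like the following sketch can point to the step where each sample loses the most reads. The column names are modeled on the DADA2 paired-end stats table as I remember it (the real file also carries percentage columns and a type-annotation row, which this sketch does not handle), and the numbers are HYPOTHETICAL, made up purely for illustration and not taken from this thread.

```python
import csv
import io

# HYPOTHETICAL example data for illustration only.
EXAMPLE_STATS = (
    "sample-id\tinput\tfiltered\tdenoised\tmerged\tnon-chimeric\n"
    "sampleA\t10000\t8000\t7800\t3000\t2900\n"
    "sampleB\t12000\t6000\t5900\t5500\t5400\n"
)

# ASSUMPTION: processing steps in pipeline order, named after the stats columns.
STEPS = ["input", "filtered", "denoised", "merged", "non-chimeric"]


def biggest_loss(row: dict) -> tuple:
    """Return (step, reads_lost) for the step that dropped the most reads."""
    counts = [int(row[step]) for step in STEPS]
    losses = {STEPS[i + 1]: counts[i] - counts[i + 1]
              for i in range(len(STEPS) - 1)}
    worst = max(losses, key=losses.get)
    return worst, losses[worst]


def summarize(tsv_text: str) -> dict:
    """Map each sample id to the step where it lost the most reads."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["sample-id"]: biggest_loss(row) for row in reader}


for sample, (step, lost) in summarize(EXAMPLE_STATS).items():
    print(f"{sample}: largest loss at '{step}' ({lost} reads)")
```

If the largest drop is at the merged step, the truncation lengths are probably leaving too little overlap; if it is at the filtered step, the reads are more likely being truncated into low-quality regions.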