Good Morning,
I am brand new to QIIME 2 and have spent a lot of time on the forum reading over posts to get familiar with problems I may run into. I have successfully imported my data and I am currently working on denoising using DADA2. I received paired-end demultiplexed data in which, by the looks of it, barcodes and primers have already been removed.
(I hope the image uploads; if not, I apologize. I was having some issues.)
From reading a lot of different posts on the forum, it seems many people have had the same issue I am having, which is determining proper trim/trunc parameters for denoising. From what I understand, --p-trunc-len-f and --p-trunc-len-r are based on where the quality score begins to drop off. For mine, I am thinking --p-trunc-len-f 240 and --p-trunc-len-r 220? However, from what I've read, I have to make sure the trunc parameters leave enough overlap for joining pairs and, despite reading several posts, I cannot understand how to determine whether there will be sufficient overlap (~20 nt?). Since barcodes and primers have been removed, would my --p-trim-left values be zero?
Thank you in advance for any help. I went through many tutorials before getting my data to get familiar with QIIME 2, but now that I am doing it on my own it all feels brand new. I hope my quality plots upload properly.
Hi @ycastro715,
Welcome to the forum!
Thanks for reading previous discussions before posting and providing details about your case, very helpful.
The rough calculations for this are also discussed in other threads, which you may or may not have seen, but they go something like this.
Your expected V3-V4 amplicon size: 806 - 341 = ~465 bp
Your total sequenced length: 2 x 250 = 500 bp
DADA2 minimum overlap required: 12 nt (this has been changed from the 20 nt required in previous versions)
So, your total overlap is ~500 - 465 = 35 bp. You need a minimum of 12 nt for proper merging, which means 35 - 12 = 23 bp can be cut (combined) from your forward and reverse reads without compromising merging.
BUT consider that there is natural variation in amplicon length at V3-V4, so some taxa are naturally shorter and, more importantly, some are longer in that region. This means that if you truncate all of those available 23 bp, you may systematically remove naturally longer taxa because they will fail to merge. This can introduce bias into your downstream analysis. In my experience, amplicon length at V3-V4 in human and mouse fecal samples can vary by at least ±20 bp around the expected 465. For an amplicon 20 bp longer than expected, the overlap is already down to about 15 bp, barely above the 12 nt minimum. That means you basically can't afford to truncate any bp from your 3’ ends, lest you introduce some bias.
This is the downside of sequencing V3-V4: unless you do a 2x300 run, you can't really afford to truncate any reads. You can try running DADA2 paired-end here with minimal or no truncation and see how the output looks. If the number of merged reads is too low, as I expect it to be, I would suggest running only the forward reads and discarding the reverse.
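The arithmetic above can be written as a small Python sketch. This is only an illustration of the rough calculation, not something DADA2 or QIIME 2 provides: the function name `truncation_budget` and its parameters are hypothetical, and the 465 bp amplicon length, 12 nt minimum overlap, and ±20 bp variation are the estimates from this thread.

```python
def truncation_budget(amplicon_len, trunc_len_f, trunc_len_r,
                      min_overlap=12, length_variation=0):
    """Return (overlap, spare) in bp for a pair of truncated reads.

    overlap: bases shared by the forward and reverse reads for the
             longest amplicon considered (amplicon_len + length_variation).
    spare:   bases that could still be truncated (forward and reverse
             combined) before overlap drops below min_overlap.
             A negative value means the reads would fail to merge.
    """
    longest_amplicon = amplicon_len + length_variation
    overlap = trunc_len_f + trunc_len_r - longest_amplicon
    return overlap, overlap - min_overlap


# Expected 465 bp amplicon, full-length 2 x 250 reads.
print(truncation_budget(465, 250, 250))                       # (35, 23)

# Same reads, but allowing for taxa that are 20 bp longer.
print(truncation_budget(465, 250, 250, length_variation=20))  # (15, 3)

# Truncating at 240 (forward) and 220 (reverse), as proposed in the question.
print(truncation_budget(465, 240, 220))                       # (-5, -17)
```

With truncation at 240/220 the reads no longer reach each other even for the expected amplicon length, which is why those values would not work here.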
Yes, there is no need for trimming from the 5’ side to remove primers or barcodes. However, your reverse reads do have an initial dip in quality within the first 20 bp, which may cause you to lose some reads in the initial filtering step. Trimming ~20 bp there might be useful, and because this is at the 5’ end, it won’t affect merging.
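To see why 5’ trimming leaves the overlap alone, here is a minimal Python sketch that places both reads on amplicon coordinates. The function `read_layout` is hypothetical, and it assumes that the truncation position is counted from the start of the original read (so a read's final length is truncation position minus 5’ trim) and that the amplicon is longer than either read.

```python
def read_layout(amplicon_len, trunc_len_f, trunc_len_r,
                trim_left_f=0, trim_left_r=0):
    """Return (overlap, merged_len) in bp on amplicon coordinates.

    The forward read starts at the left end of the amplicon and the
    reverse read starts at the right end, so their 3' ends meet in the
    middle. 5' trimming only shortens the outer ends.
    """
    fwd_start, fwd_end = trim_left_f, trunc_len_f
    rev_start = amplicon_len - trunc_len_r   # 3' end of the reverse read
    rev_end = amplicon_len - trim_left_r     # 5' end of the reverse read
    overlap = fwd_end - rev_start
    merged_len = rev_end - fwd_start
    return overlap, merged_len


# No trimming: 35 bp overlap, full 465 bp merged sequence.
print(read_layout(465, 250, 250))                   # (35, 465)

# Trim 20 bp from the 5' end of the reverse read: the overlap is
# unchanged, only the merged sequence gets 20 bp shorter.
print(read_layout(465, 250, 250, trim_left_r=20))   # (35, 445)
```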
Hope this helps!