From this output, trimmed-seq.qzv (323.2 KB), I analyzed the interactive plot and decided to truncate only my reverse reads, because my forward reads have high quality scores. I tried truncating at 195 bp, using a Phred score threshold of 15:
As you can see, I have odd results: the percentage of reads passing the filter varies a lot (from 0.01 to 69%), the percentage of merged reads is low, and the numbers of reads and ASVs are low. I have never seen this much variability...
Welcome Back to the forum!
Thank you for including so many details in your post.
So there are two things:
It looks like the quality of your sequences may require you to trim more. I would mess around with the forward read trunc length:
--p-trunc-len-f 0
This is a balance between truncating enough to keep good quality and keeping enough sequence length to merge your forward and reverse reads.
There is some quick math that I typically use as a guide to see if my sequences will merge. This is definitely not foolproof, because sequence length varies within the 16S region.
combined forward + reverse read length = ([Reverse Primer BP Location] - [Forward Primer BP Location]) + Necessary overlap
So for the V3-V4 region those primer locations are typically 341F and 785R. Please check that these are your primer locations to confirm that they are the same. DADA2 requires 12 nt of overlap. So your equation should look something like (785 - 341) + 12 = 444 + 12 = 456. So your forward and reverse reads, when added together, should be around 456 nt for a successful merge.
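The quick math above can be written as a few lines of Python. This is only a sketch of the rule of thumb: the function name is made up for illustration, and the positions are the nominal 341F/785R coordinates, not values measured from your data.

```python
# Rule-of-thumb sketch; the function name is hypothetical and the primer
# positions are nominal coordinates, so treat the result as a rough guide.
def required_combined_length(fwd_primer_pos, rev_primer_pos, min_overlap=12):
    """Approximate forward + reverse read length needed for a merge."""
    amplicon_span = rev_primer_pos - fwd_primer_pos
    return amplicon_span + min_overlap

needed = required_combined_length(341, 785)
print(needed)  # 456
```

If your primers sit at different positions, swap in those numbers and compare the result with the sum of your two truncation lengths.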
So my advice would be to play around with your truncation values and see if you get better filtering and merge percentages. If this doesn't work, let us know. We would be happy to continue debugging this with you!
I tried again following your suggestions. I changed my commands to --p-trunc-len-f 240 and --p-trunc-len-r 220, with the aim of increasing the combined length of my reads to 460 nt.
I'm thinking that a length greater than 195 nt for the reverse read includes some positions with a Phred score < 15... Could this be having an impact on filtering and merging?
I don't think my sequences are bad enough to explain these low percentages. So, before I consider using only the forward reads, are there more tests to try?
Hi @joaomiranda,
You are right, these do not look better.
I think this is one of the reasons: the reverse read has some pretty low quality scores, so those reads are getting filtered out. We may struggle to get enough sequence length on this reverse read to merge successfully.
I would continue playing around with the trim and truncation values. For example, have you tried -f 240, -r 195? That would be a little shorter than my math for sequence merging above, but amplicon regions vary in length, so you may have decent luck merging if you can get the reads past filtering.
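To see how far each pair of truncation values sits from the rule-of-thumb target, here is a small sketch. It assumes the nominal 341F/785R primer positions and the 12 nt overlap mentioned earlier in the thread; the constant and function names are only illustrative.

```python
# Assumed values: nominal V3-V4 primer positions and DADA2's 12 nt overlap.
AMPLICON_SPAN = 785 - 341
MIN_OVERLAP = 12

def overlap_margin(trunc_f, trunc_r):
    """nt to spare (positive) or missing (negative) versus the target."""
    return trunc_f + trunc_r - (AMPLICON_SPAN + MIN_OVERLAP)

candidates = [(240, 220), (240, 195)]
margins = {pair: overlap_margin(*pair) for pair in candidates}
for (f, r), margin in margins.items():
    print(f"-f {f} -r {r}: combined {f + r} nt, margin {margin:+d} nt")
```

With these assumptions, 240/220 comes out 4 nt over the target and 240/195 comes out 21 nt under it, so the second pair can only merge for amplicons shorter than the nominal length.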
I would continue to mess around with the trunc values, but single-end is always a decent backup.
Hope this helps!