Hi, I have just trimmed primers, joined my paired-end reads, and created a summary plot: joined.qzv (314.6 KB).
I am using qiime2 v2020.11 and have looked at some online tutorials which say that the red boxes should give me an indication of how to truncate my sequences during the deblur step. I cannot see any red boxes in my plots. Does this mean that I do not need to trim my sequences any further?
I am not aware of any tutorials that recommend truncating at the red boxes - would you mind sharing that tutorial with us? The viz you linked to will sometimes have “red boxes” in it to indicate that not all of your sequences are the same length - if any sequences are shorter than the others, the histograms after that nt position will be colored red to let you know, “hey, just so you know the sample size of observations at this bp position isn’t the same as the other histograms.”
Thank you for your reply. The tutorial that I was referring to was ‘Alternative methods of read joining in qiime 2’, although I may have interpreted it slightly wrong. I also read in a paper (Estaki et al. 2020) that I should truncate the sequences at the point where the median quality score drops below 30. I checked my plot and it looks like the median quality is consistently above 30. Does this mean that I should not truncate my sequences at all, or is it best to truncate a little bit? I have already done the deblur quality filter check with the default values and lost no reads from that step.
Thanks for posting your .qzv file! That's very helpful.
If I was working with your data set, I would trim off the low quality region after 360 bp or so (at the start of the reverse R2 read), and continue onto denoising. But that's just me.
In fairness, Estaki et al. 2020 is very flexible about how they filter data:
I think your joined data looks pretty good, even with the drop in quality near the end. A Q-score of 20 is considered low, but here 'low' means 99% accurate. If you choose to trim off the end, then the quality is very good throughout!
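To put numbers on what a Q-score means, here is a minimal sketch of the standard Phred conversion, where the error probability is 10^(-Q/10). The function names are just ones I made up for illustration; they are not part of QIIME 2.

```python
def phred_to_error_prob(q: float) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def phred_to_accuracy(q: float) -> float:
    """Probability that a base call with Phred score q is correct."""
    return 1.0 - phred_to_error_prob(q)


# Compare a few commonly quoted Q-scores.
for q in (20, 30, 40):
    print(f"Q{q}: error = {phred_to_error_prob(q):.4%}, "
          f"accuracy = {phred_to_accuracy(q):.2%}")
```

So Q20 corresponds to roughly 1 error in 100 bases and Q30 to roughly 1 in 1,000, which is why a dip to Q20 near the end of a read is not necessarily a disaster.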
I think these are both good options. The choice is up to you!
It is very hard to make a general recommendation that works on every dataset, so in that paper we did our best to make one that should be OK in most situations. Don’t feel bound by that recommendation; your best bet is to go with what works best for your data.
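If it helps to see the "median quality drops below 30" rule of thumb spelled out, here is a rough sketch. It assumes you already have the per-position median quality scores as a plain list; the helper name and the toy numbers are made up for illustration and do not come from your .qzv.

```python
from typing import Optional, Sequence


def first_position_below(median_quality: Sequence[float],
                         threshold: float = 30) -> Optional[int]:
    """Return the 0-based index of the first position whose median
    quality is below `threshold`, or None if no position drops below it."""
    for position, quality in enumerate(median_quality):
        if quality < threshold:
            return position
    return None


# Toy medians (hypothetical values, not real data).
toy_medians = [38, 38, 37, 36, 34, 31, 29, 25, 22]
cut = first_position_below(toy_medians)

if cut is None:
    print("Median quality never drops below the threshold; "
          "no truncation is suggested by this rule.")
else:
    print(f"Rule of thumb suggests keeping the first {cut} bases.")
```

When the function returns None, as it would for a plot where the median stays above 30 throughout, the rule on its own gives no reason to truncate. Treat the result as a starting point and not a hard cutoff.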
However, I wanted to add a further recommendation. Even though Deblur can operate on pre-merged reads, it probably is not your best option here. This is because as read length increases, the number of sequences you retain with Deblur drops dramatically. See here for an example of the numbers you might see with 300 bp, and then consider your case, which has much longer reads. Deblur was really designed to work with shorter, single-end reads, and it performs really well in those cases. However, if you are working with paired-end reads, I would recommend going with DADA2, which does not penalize read length to the same extent, so you would end up with many more reads. There are lots of posts on this forum about selecting truncation parameters for DADA2 with merging of paired-end reads in mind.
Thank you @Mehrbod_Estaki I will try DADA2 as well and compare the results.