One of my colleagues has made me aware of the possibility of using Trimmomatic before running DADA2, truncating reads at the point where a window of 4 bases falls below a predefined Phred score. The upside of this might be that you’re able to skip the subjective evaluation of the quality plots, generating more reproducible results, and frankly I think it’s quite difficult to decide how to truncate the runs.
However, I gather that the developers and other bright minds already have considered this possibility and I would really appreciate some input on this.
(Matthew Ryan Dillon)
The subjectivity of the truncation parameters is certainly not ideal, and from experience I know that in some cases it can even affect the experiment's outcome (I’m actually working on a demo of this at the moment), but I’m not sure how the approach you mention improves reproducibility, since user-defined choices are still required, e.g. the number of consecutive low-quality bases or the Phred score cutoff. As far as I am aware there isn’t a thorough comparison of different quality-control approaches or parameters with this in mind, and in fact it would be near impossible to generalize these across the board given that the needs of each project and dataset are different.
A couple of additional comments: inside of QIIME 2 you can already do what you proposed using the qiime quality-filter q-score action.
DADA2 also has several internal quality controls built into it which take quality scores into account; I recommend having a look through the docs and the official DADA2 documentation to understand these parameters. But perhaps more importantly, DADA2 works by creating an error model based on your quality scores, so doing additional filtering and quality trimming prior to DADA2 might actually interfere with building an appropriate error model. The DADA2 developer generally recommends inputting raw reads without prior filtering.
I know this didn’t exactly answer your question, but perhaps that’s because there is no real answer here.
Some further comments, though:
Setting a predefined Phred score threshold over a number of consecutive bases is more reproducible, because the criterion is objective. If you gave your code to a colleague, she would be able to replicate the applied criterion instead of trying to figure out exactly how you found that in RUN3 you truncated the reverse read at position 245, whereas in the two previous runs you truncated at position 223.
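To make that criterion concrete, here is a minimal sketch of a Trimmomatic-style sliding-window trim: truncate the read at the first window of 4 bases whose mean Phred score falls below a chosen threshold. The function and parameter names are illustrative, not Trimmomatic's actual implementation; the point is just that the same inputs always give the same cut site.

```python
def sliding_window_trim(quals, window=4, min_mean_q=20):
    """Return the truncation position for a list of per-base Phred scores.

    Scans left to right and cuts at the start of the first window whose
    mean quality drops below min_mean_q (Trimmomatic SLIDINGWINDOW-like).
    """
    for i in range(len(quals) - window + 1):
        if sum(quals[i:i + window]) / window < min_mean_q:
            return i  # cut here, keeping bases [0, i)
    return len(quals)  # quality never dropped; keep the full read

# Example: quality collapses toward the 3' end of the read.
quals = [38, 37, 36, 35, 30, 12, 10, 9, 8]
print(sliding_window_trim(quals))  # -> 4: first window with mean < 20 starts at index 4
```

Because the rule is fully specified by two numbers (window size and mean-quality cutoff), a colleague rerunning the code reproduces the exact same truncation without any visual judgment call.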
But perhaps more importantly, DADA2 works by creating an error model based on your quality scores, so doing additional filtering and quality trimming prior to DADA2 might actually interfere with building an appropriate error model.
I somewhat struggle to understand how truncation based on visual screening of the quality plots does not lead to this issue as well? You still remove some of the data that the error models are built on, don’t you?
Not quite, though, since you still have to make a subjective choice about the Phred score threshold and the number of consecutive bases you use in your code. And your choice might be affected by the purpose of your project: for example, you may be more lenient with your parameters if the end goal is diagnostics, or more stringent if the goal is community characterization where higher depth is preferred. The same principles apply when choosing your truncating/trimming parameters.
The need to pass this information on to your colleagues for replication is the same in both cases. Luckily, in QIIME 2 the provenance tab keeps track of all of this for you, so passing along your feature-table.qza will hold all that information for them (including trunc/trim values) and they can reproduce your parameters to the letter.
As a side tangent (I know you were just making a point), regarding your specific example about combining different runs: with DADA2, if the end goal is to merge samples from separate runs, those trunc/trim parameters actually need to be identical across runs in order for the ASVs to be comparable.
Yes and no. Yes in that you are removing some ‘portions’ of the data. The key here is that the error model makes an assumption about the probability of a base call being true based on the quality scores at that position from the whole run (or at least a training subset), not just the specific Q score of that particular base call. For example, if the overall average Q score at position 120 is 5, our confidence in inferring that a G call is a true G is not great. I think the concern is that if you were to remove all the reads in your run that had poor Q scores at that spot, you would artificially inflate the average Q score at that position, leading to the false assumption that we are now fairly confident about our base call. With trimming/truncating you are simply removing portions of the reads globally and not changing the overall Q scores at a particular position. At least this is how I’ve come to understand the process/concern; hopefully someone will correct me if I am way off target.
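A quick toy simulation (not DADA2's actual error model; the quality values are arbitrary) illustrates the inflation effect described above: discarding whole reads that score poorly at one position shifts that position's average quality upward, while a fixed-length truncation would remove the position from all reads equally.

```python
import random

random.seed(0)

# Simulate the per-read Phred scores observed at a single read position:
# a mix of low-quality (5, 10) and high-quality (35, 38) base calls.
n_reads = 10_000
quals_at_pos = [random.choice([5, 10, 35, 38]) for _ in range(n_reads)]

mean_before = sum(quals_at_pos) / len(quals_at_pos)

# Read-level filtering: drop any read with q < 20 at this position.
survivors = [q for q in quals_at_pos if q >= 20]
mean_after = sum(survivors) / len(survivors)

print(f"mean q before filtering: {mean_before:.1f}")
print(f"mean q after filtering:  {mean_after:.1f}")  # inflated upward
```

The surviving reads now suggest the position is far more reliable than it really was, which is the worry about feeding pre-filtered reads to a model that learns error rates from quality scores.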
I agree that it’s not objective strictly speaking, but the cut-off is easy to reproduce in other projects, whereas when you truncate based on quality plots, it is harder to reproduce the process by which the actual values were chosen.
Nevertheless, are you sure about needing to choose the same truncating values for all runs in a project (i.e. if I have 15 MiSeq runs in one project, do I choose the same truncating values for all of them)? Could @benjjneb perhaps comment on this?
Absolutely yes if the reads are single-end. The reason is that a single sequence trimmed at 3 different places will be considered 3 distinct ASVs — so even slight differences in trimming between single-end runs will cause ALL ASVs in those runs to be unique to that run.
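A trivial sketch of why this is the case: ASVs are exact sequence variants, so the trimmed strings themselves are the units of comparison, and any difference in truncation length yields a different string (the read and lengths below are made up for illustration).

```python
# One underlying biological sequence, observed in two single-end runs
# that were truncated at different lengths.
read = "ACGTACGTACGTACGTACGT"

asv_run1 = read[:16]  # run 1 truncated at 16 bp
asv_run2 = read[:15]  # run 2 truncated at 15 bp

# Even though both come from the same molecule, the strings differ,
# so a feature table would treat them as two distinct ASVs.
print(asv_run1 == asv_run2)  # -> False
```

This is why every ASV from a run trimmed differently ends up unique to that run when single-end data are merged.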
Even if they are paired, I would keep things as standardized as possible since truncating at different read lengths could impact read join quality and hence bias results from different runs — so you could trim at different sites if needed due to per-run quality, but I would carefully inspect to make sure bias isn’t creeping in. Using dynamic q-score trimming (e.g., with qiime quality-filter q-score or the trunc-q parameter for dada2) is a similar story for paired-end — I think that would still be okay as long as it does not cause read joining to fail on a particular run.
I am jumping into the discussion here - working with Rune, not with the same samples, but with Illumina data being analyzed similarly through DADA2.
Concerning the discussion on trimming: Estaki's information on trimming somewhat contradicts Callahan's feedback here: https://github.com/benjjneb/dada2/issues/175 - taking into consideration that we do have paired-end sequencing.
For our sequences, --p-trim-left-f and --p-trim-left-r are constant for all runs; they remove the primer regions. Callahan emphasizes this as important, since an inconsistent trim-left option will give sequences of different lengths after merging.
Meanwhile, --p-trunc-len-f and --p-trunc-len-r will be adjusted after inspection of the quality plots, and therefore vary somewhat between runs. Given the answer from Callahan, this has not been a concern to me so far! And I still do not understand why varying the truncation length in order to keep Q scores above a chosen limit is more troubling than the varying Q-score levels that would be the consequence of setting a strict truncation length for all runs…? That being said, we are fortunate enough not to encounter the issue of having to truncate so early that overlap is a concern.
To summarize my understanding of things just to see if you need to correct me:
Pre-DADA2 quality filtering, as with Trimmomatic, can be troublesome because you interfere with the error model.
With paired-end sequences, make sure the trim-left option is kept consistent between all your runs so that sequences are of the same length.
The truncation length need not be exactly the same across runs, but the more similar the better, and do not forget that you need a minimum overlap!
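On that last point, a back-of-the-envelope overlap check can be sketched like this (the amplicon and truncation lengths below are illustrative; the 12 bp figure is DADA2's default minimum overlap for merging read pairs):

```python
# For a paired-end run, the merged overlap is roughly:
#   trunc_len_f + trunc_len_r - amplicon_length
amplicon_len = 253           # e.g. a ~253 bp 16S V4 amplicon
trunc_len_f, trunc_len_r = 200, 150

overlap = trunc_len_f + trunc_len_r - amplicon_len
print(overlap)               # 97 bp: comfortably above DADA2's 12 bp minimum

# Truncating too aggressively removes the overlap entirely:
overlap_bad = 140 + 100 - amplicon_len
print(overlap_bad)           # negative: these read pairs cannot be merged
```

Running this arithmetic before committing to truncation values is a cheap way to confirm that per-run adjustments will not push any run below the merging threshold.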
Ben’s comments are pretty consistent with my advice above (for paired-end data), with the exception that I prefer minimizing the differences in parameters used for each sequencing run, if possible, to avoid introducing bias. My concerns are unsubstantiated, and we do not have benchmarks to inform best practices one way or another; they are based on speculation, particularly regarding issues with merging, which can cause clear problems if trimming is too short (obvious run bias, especially if amplicons are not all of equal length, as is common). So yes, I agree, parameters may need to be adjusted between runs, but:
It may be unnecessary, because you can use the trunc-q parameter to more or less replicate what Trimmomatic does (and qiime quality-filter q-score performs more or less the same operation if you want to trim in the same way as Trimmomatic, where you look for a run of multiple low-quality bases).
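As I understand the trunc-q semantics (truncate the read at the first base with a quality score at or below the threshold), a simplified sketch would look like this; it is an illustration of the behavior as documented, not DADA2's source code:

```python
def trunc_q(quals, threshold=2):
    """Return the truncation position: the index of the first base whose
    Phred score is <= threshold, or the full read length if none is."""
    for i, q in enumerate(quals):
        if q <= threshold:
            return i  # keep bases [0, i)
    return len(quals)

quals = [30, 28, 25, 2, 20, 18]
print(trunc_q(quals))  # -> 3: cut before the first base at or below Q2
```

Note that unlike the sliding-window rule, this cuts each read at a position determined by its own quality profile, so reads from the same run can end up with different lengths.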
Yes, absolutely keep this consistent (unless your runs are not consistent, e.g., some have primers and some do not). All sequences should be trimmed to the same exact site on the 5’ ends.
My apologies if I added further confusion to the topic; I probably should have expanded more on that comment, I just didn’t want to derail the topic too much. But to clarify: yes, exactly as Ben’s comment says and @Nicholas_Bokulich clarified. The trim AND truncate parameters need to be the same when comparing single-end reads, but with paired-end reads the truncation doesn’t need to be the same, though @Nicholas_Bokulich’s point about minimizing differences even in that region is interesting; it would be neat if someone benchmarked that notion further. In short, I just meant that in order to compare ASVs they need to span the exact same positions; even a single bp off makes them non-comparable. Unless of course later on you collapse everything down to species or genus level.