After chatting with a collaborator, I am now questioning my interpretation of the terms trimming and truncation. What's the difference between them to you?
My interpretation is that trimming is used when we remove a fixed number of bases from the read ends, while truncation is used when we remove a variable number of bases from the read ends in order to get a homogeneous set of reads as a result.
For instance, consider the following set of reads:
AAAAAA
AAAAAAAAA
AAAAAAA
i) After trimming the last 2 bases
AAAA
AAAAAAA
AAAAA
ii) After truncation at the 5th position
AAAAA
AAAAA
AAAAA
That is, when we trim the action is the same for all reads (remove 2); when we truncate the result is the same for all reads (same length). Of course we can get the result observed in (ii) by trimming a variable number of bases per read, so that's maybe another way to think about it: truncation is trimming a variable number of bases per read in order to make the result homogeneous. Is this correct? Am I missing something obvious?
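To make the example above concrete, here is a minimal Python sketch of this interpretation. The function names `trim_right` and `truncate_at` are hypothetical and chosen only for illustration; they do not come from any particular tool.

```python
def trim_right(reads, n):
    """Remove a fixed number of bases (n) from the 3' end of every read."""
    # max(..., 0) keeps the slice valid when a read is shorter than n.
    return [r[:max(len(r) - n, 0)] for r in reads]


def truncate_at(reads, position):
    """Cut every read at the same position, counted from the 5' end."""
    return [r[:position] for r in reads]


reads = ["AAAAAA", "AAAAAAAAA", "AAAAAAA"]

trimmed = trim_right(reads, 2)     # same action for every read
truncated = truncate_at(reads, 5)  # same resulting length for every read

print(trimmed)    # ['AAAA', 'AAAAAAA', 'AAAAA']
print(truncated)  # ['AAAAA', 'AAAAA', 'AAAAA']
```

In this sketch, trimming leaves reads of different lengths, while truncation leaves reads of equal length (as long as every read is at least 5 bases long).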
I would also like to ask: does this nomenclature change somehow when we go from reads to ASVs? Is the way we call it influenced by how "raw" the data is?
Thanks for reaching out, these are great questions! I'll discuss below the differences between trimming and truncating (as they are utilized within the QIIME 2 core framework).
Trimming: The number of base pairs that will be removed from either the 3' or the 5' end. In QIIME 2, you'll most commonly see trim-left (which will trim off the requested number of base pairs from the 5' end), but in qiime feature-classifier extract-reads, there is also a trim-right parameter, which refers to the 3' end.
Truncating: The position at which reads will be truncated (counting from the 5' end). Reads that are shorter than the value provided will be discarded. The parameter you'll see this action represented by within QIIME 2 is trunc-len. Something to note is that in the denoising pipelines within the QIIME 2 core framework, truncation occurs before trimming.
Here is an example that may help further clarify. Take the following words as sequence examples: QIIME2, TRIMMING, TRUNCATION
I will first truncate the above sequences with this command: trunc-len 7
This is what I will be left with:
TRIMMIN, TRUNCAT
Note that QIIME2 was dropped because it is shorter than the truncation of 7 that was specified.
I will now trim the remaining sequences with this command: trim-left 2
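The two steps above can be sketched in plain Python. This is only an illustration of the behavior described in this reply, not QIIME 2's actual implementation; the helper names `truncate` and `trim_left` are hypothetical and simply mirror the trunc-len and trim-left parameters.

```python
def truncate(seqs, trunc_len):
    """Keep the first trunc_len characters of each sequence.

    Sequences shorter than trunc_len are discarded, as described above.
    """
    return [s[:trunc_len] for s in seqs if len(s) >= trunc_len]


def trim_left(seqs, n):
    """Remove the first n characters (the 5' end) of each sequence."""
    return [s[n:] for s in seqs]


seqs = ["QIIME2", "TRIMMING", "TRUNCATION"]

# Truncation happens first, then trimming.
truncated = truncate(seqs, 7)
result = trim_left(truncated, 2)

print(truncated)  # ['TRIMMIN', 'TRUNCAT']
print(result)     # ['IMMIN', 'UNCAT']
```

Because truncation runs first, the final sequences in this sketch have length trunc-len minus trim-left (7 - 2 = 5).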
This helps a lot! It is nice that you mentioned that the practical meaning of these terms will not always be the same, so I guess it really depends on the tools we are using. I'll mark your reply as the solution because this answers whether I should say trimming or truncating in a presentation I've been working on.
I wanted to reach back out because it was brought to my attention that my understanding of trimming vs. truncation wasn't quite correct - I've gone back and modified my original response from above to more accurately reflect how trimming and truncating are used within the QIIME 2 core framework. Apologies for the mix-up on my end!
Thank you for the clarification. I guess now your answer is much more in line with my previous notion, but much better explained, and the examples really help! We both probably went back and forth on what these terms mean within the QIIME 2 core framework throughout this topic, which is good evidence that their meanings are not trivial. Hence, I assume this topic will help a lot of other people in the QIIME 2 community with similar doubts.