I'm trying to train a V3V4-specific classifier for use with deblur, and would like to truncate all reference sequences to a fixed length of 400 bp before training (the same length as my reads after deblur-denoise). I use qiime2-2025.4.
But it seems that the --p-trunc_len parameter in extract-reads does not discard shorter reads. I thought truncate (as also used in dada2 denoise-paired and deblur denoise-16S) should make all reads the same length, by discarding the shorter sequences and trimming bases off the longer ones? The following command results in sequences in the range 64-400:
This is a totally reasonable assumption, but it's not the case here.
Zooming out a little bit, most qiime2 plugins expose parameters from the underlying program they are calling, and different programs work in different ways.
DADA2 denoise-paired was designed for reads that achieve full coverage of their amplicons, so it makes sense to discard reads that don't cover the full amplicon length, especially when we know they should.
In contrast, extract-reads pulls these regions from a larger database, so having the choice to keep or omit shorter reads can be helpful, hence the option to do it both ways.
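To make the two meanings of "truncate" concrete, here is a minimal Python sketch. It is not QIIME 2 code: the function names and the toy sequences are made up for illustration, and it only mimics the behavior described above (cut long sequences and keep short ones, versus cut long sequences and drop short ones).

```python
def truncate_only(seqs, trunc_len):
    """Cut every sequence to at most trunc_len bases; shorter ones are kept as-is."""
    return [s[:trunc_len] for s in seqs]


def truncate_and_discard(seqs, trunc_len):
    """Cut to trunc_len bases and drop any sequence shorter than trunc_len."""
    return [s[:trunc_len] for s in seqs if len(s) >= trunc_len]


# Toy "reference amplicons" (hypothetical data): lengths 402, 64 and 400.
refs = ["ACGT" * 100 + "AC", "ACGT" * 16, "ACGT" * 100]

kept_all = truncate_only(refs, 400)          # the behavior you observed
fixed_len = truncate_and_discard(refs, 400)  # the behavior you expected

print([len(s) for s in kept_all])   # [400, 64, 400]
print([len(s) for s in fixed_len])  # [400, 400]
```

If I remember the extract-reads options correctly, there is also a `--p-min-length` parameter that is applied after truncation, so setting it equal to the truncation length should give the second behavior. Please check `qiime feature-classifier extract-reads --help` in your version before relying on that.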
Fair! Do you think we should change this or update the examples to clarify?
Thanks, it's just a little confusing when "truncate" means different things in different qiime functions, but now I understand why. Then I think it is enough to state what the parameter does (whether it also discards shorter sequences or not) in the extract-seqs help, the rescript tutorial and the classifier-training tutorial. And also update the deblur denoise-16S help text to specify that the parameter --p-trim-length also discards shorter seqs.