--p-trunc-len of extract-reads not discarding shorter sequences

I am trying to train a V3V4-specific classifier for use with deblur, and would like to truncate all reference sequences to a fixed length of 400 bp before training (the same length as my reads after deblur denoising). I use qiime2-2025.4.

But it seems that the --p-trunc-len parameter in extract-reads does not discard shorter reads. I thought truncation (as also used in dada2 denoise-paired and deblur denoise-16S) should make all reads the same length, by discarding the shorter sequences and trimming bases off the longer ones. The following command results in sequences in the range 64-400 bp:

qiime feature-classifier extract-reads \
  --i-sequences silva-138.2-ssu-nr99-seqs-cleaned-filt-derep_uniq.qza \
  --p-f-primer ACTCCTACGGGAGGCAGCAG \
  --p-r-primer GGACTACHVGGGTWTCTAAT \
  --p-read-orientation 'forward' \
  --p-trunc-len 400 \
  --o-reads silva-138.2-ssu-nr99-seqs-cleaned-filt-derep_uniq-319f_806r_forward_400bp.qza

If I also add --p-min-length 400, the short sequences are removed as well, but this should not be necessary imo.

best,
Kristian

Hello Kristian,

This is a totally reasonable assumption, but it's not the case here.

Zooming out a little bit, most QIIME 2 plugins expose parameters from the underlying program they are calling, and different programs work in different ways.

DADA2 denoise-paired was designed for reads that achieve full coverage of their amplicons, so it makes sense to discard reads that don't cover the full amplicon length, especially when we know they should.

In contrast, extract-reads pulls these regions from a larger database, so having the choice to keep or omit shorter reads can be helpful, hence the option to do it both ways.
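
To make the difference concrete, here is a minimal Python sketch of the two behaviours. It is only an illustration on toy strings, not the plugin's actual code: the function names `truncate_only` and `truncate_and_discard` are made up for this example, and the second one simply mimics what you get by combining --p-trunc-len with --p-min-length set to the same value.

```python
def truncate_only(seqs, trunc_len):
    """Cut longer sequences down to trunc_len; keep shorter ones unchanged.

    Hypothetical helper: mirrors what --p-trunc-len alone appears to do
    in extract-reads, based on the behaviour described in this thread.
    """
    return [s[:trunc_len] for s in seqs]


def truncate_and_discard(seqs, trunc_len):
    """Cut longer sequences down to trunc_len and drop any shorter ones.

    Hypothetical helper: mirrors the 'fixed length' behaviour you expected,
    i.e. truncation plus a minimum-length filter at the same value.
    """
    return [s[:trunc_len] for s in seqs if len(s) >= trunc_len]


# Toy "amplicons" of length 3, 5 and 8
amplicons = ["ACG", "ACGTA", "ACGTACGT"]

# Truncation alone: lengths still vary, up to trunc_len
print([len(s) for s in truncate_only(amplicons, 5)])         # [3, 5, 5]

# Truncation plus length filter: every sequence has the same length
print([len(s) for s in truncate_and_discard(amplicons, 5)])  # [5, 5]
```

With your command, the first case corresponds to the 64-400 range you saw, and the second to what you get after adding --p-min-length 400.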

Fair! Do you think we should change this or update the examples to clarify?


Thanks, it is just a little confusing when "truncate" means different things in different qiime functions, but now I understand why. I think it is enough to state what the parameter does (whether it also discards shorter sequences or not) in the extract-reads help, the rescript tutorial and the classifier-training tutorial. It would also help to update the deblur denoise-16S help text to specify that the --p-trim-length parameter also discards shorter seqs.

best,
Kristian

okay!

Here's the real question: Are you willing and able to help make this change to save future users from this confusion and frustration?

Here's what I've found so far:

I'm keenly aware that Misunderstood parameter ... impacts the correctness of bioinformatics workflows, so I think this is worth fixing.
