Filtering output sequences below a certain length


I’m filtering merged sequences and trying to set a cutoff for the minimum length of the output sequences, but I’m not getting the results I’m looking for. This is what I’m trying:

qiime quality-filter q-score \
  --i-demux SampleName.qza \
  --p-min-length-fraction 0.97 \
  --o-filtered-sequences SampleNameOutPut.qza \
  --o-filtered-stats SampleName_stats.qza

My desired minimum sequence length is 400. I’ve tried different --p-min-length-fraction values (0, 0.9, 0.97), but I don’t see a clear change. The mean length of my sequences is 445.02, with a standard deviation of 26.82.

Thank you!

Hi @rosave,

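It looks like you are using the wrong method. qiime quality-filter q-score is specifically for filtering sequences based on their quality scores (in fastq format), not on length alone. Its --p-min-length-fraction parameter is relative to each read's own input length (how much of the read must survive quality truncation), so it cannot enforce an absolute minimum such as 400.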
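To illustrate why changing the fraction makes little difference here, this is a simplified sketch of a fraction-based length check. It is not the plugin's actual code: the function name is made up for this example, and the exact comparison the plugin uses may differ.

```python
def passes_fraction_check(original_len: int, truncated_len: int,
                          min_length_fraction: float) -> bool:
    """Hypothetical helper: keep a read only if enough of its OWN
    original length survived quality-based truncation."""
    return truncated_len >= min_length_fraction * original_len


# A 350 nt read with good quality throughout is not truncated at all,
# so its fraction is 1.0 and it is kept, even though it is shorter than 400.
short_high_quality = passes_fraction_check(350, 350, 0.97)

# A 450 nt read truncated to 410 nt keeps only ~91% of its length,
# so it is dropped at 0.97, even though 410 is longer than 400.
long_truncated = passes_fraction_check(450, 410, 0.97)

print(short_high_quality, long_truncated)
```

In other words, the threshold scales with each read, so no value of the fraction translates into a fixed cutoff in nucleotides.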

If your sequences are demultiplexed but not yet denoised, the denoising methods (deblur or dada2) have options to set a trim or truncation length. Reads shorter than that length are discarded (and longer reads are cut down to it), so this is one way to remove short reads, but only if your sequences are still in fastq format (pre-denoising). If that applies to you, see the help documentation for those methods to learn more.

Another option is to use RESCRIPt (a plugin that must be installed separately) to filter sequences explicitly by length with its filter-seqs-length action. This method takes FeatureData[Sequence] artifacts as input (i.e., fasta format sequences after denoising). You can read more about that method in the RESCRIPt documentation.
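If it helps to see what an explicit length filter amounts to, here is a minimal plain-Python sketch on fasta-style input. It is only an illustration of the idea, not RESCRIPt's implementation; the function names and the example sequences are invented for this post.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from fasta-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_min_length(records: Iterable[Tuple[str, str]],
                         min_length: int) -> Iterator[Tuple[str, str]]:
    """Keep only records whose sequence has at least min_length characters."""
    for header, seq in records:
        if len(seq) >= min_length:
            yield header, seq


# Made-up example records: 440 nt, 360 nt, and exactly 400 nt.
example = [">seq1", "ACGT" * 110, ">seq2", "ACGT" * 90, ">seq3", "ACGT" * 100]
kept = [h for h, _ in filter_by_min_length(read_fasta(example), 400)]
print(kept)
```

Unlike the fraction-based check, the cutoff here is an absolute number of nucleotides, which is what a minimum of 400 calls for.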

Good luck!
