How to truncate sequences without denoising


I am trying to cluster my sequences into OTUs. For demultiplexing I used another pipeline (i.e., Stacks) due to the nature of my data. Then I quality-filtered the reads and trimmed primer sequences and Illumina adapters with Trimmomatic. I have now imported the data into QIIME 2 and would like to produce a feature table. I read in a QIIME 2 tutorial that the data need to meet these requirements:

  • Paired-end reads are merged

  • Non-biological sequences are removed

  • Reads are trimmed to the same length

  • Low quality reads are discarded

With Trimmomatic I performed all the steps except for the truncation of the reads to the same length. I set a minimum length of 100 bp for the sequences to be kept. So now my reads are at least 100 bp long, the primer and adapter sequences are removed, and they were quality-filtered based on Q-score, but they are of varying lengths.

My sequences look like this:

This quality plot corresponds to the merged, quality-filtered reads with primer and adapter sequences removed. This is the summary of lengths:


Is there a way to cut the sequences to the same length in QIIME 2 without doing anything else? I would like to cut all the sequences to a length of 300 since I am working with ITS-2. I am aware that DADA2 and Deblur can do this before denoising. I know that before OTU clustering I need to dereplicate my sequences with:

qiime vsearch dereplicate-sequences \
  --i-sequences joined-ITS-2.qza \
  --o-dereplicated-table table-dereplicated-ITS-2.qza \
  --o-dereplicated-sequences rep-seqs-ITS-2.qza

and then I can perform the clustering with:

qiime vsearch cluster-features-de-novo \
  --i-table table-dereplicated-ITS-2.qza \
  --i-sequences rep-seqs-ITS-2.qza \
  --p-perc-identity 0.99 \
  --p-threads 12 \
  --o-clustered-table table-dn-99-ITS-2.qza \
  --o-clustered-sequences rep-seqs-dn-99-ITS-2.qza

Thank you very much for your help!! :grinning:

Thank you for your detailed post, including quality graphs. :bar_chart: :+1:

Not that I know of... The issue is that many QIIME 2 plugins are pipelines that do multiple things at once (e.g. cutadapt), so you may not have as much control as if you were using the original program...

I think both cutadapt and vsearch could do this outside of Qiime2 using their full set of options.
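
To make the idea concrete, here is a minimal, tool-independent sketch of fixed-length truncation done on the FASTQ file before importing into QIIME 2. The function name `truncate_fastq` is made up for this illustration, and the sketch assumes an uncompressed FASTQ with exactly four lines per record. Reads shorter than the target are dropped, so with a minimum read length of 100 bp and a target of 300 bp, every read under 300 bp would be lost. As far as I know, the equivalent options in the standalone tools are `--length` in cutadapt and `--fastq_trunclen` in vsearch, but please check their documentation before relying on that.

```shell
# Truncate every read in a 4-line-per-record FASTQ to a fixed length.
# Reads shorter than the target length are dropped, so all output reads
# end up with exactly the same length.
# Hypothetical helper; usage: truncate_fastq LENGTH < in.fastq > out.fastq
truncate_fastq() {
    awk -v len="$1" '
        { rec[(NR - 1) % 4] = $0 }            # buffer header, sequence, "+", quality
        NR % 4 == 0 {                         # a complete record has been read
            if (length(rec[1]) >= len) {      # keep only reads that are long enough
                print rec[0]
                print substr(rec[1], 1, len)  # truncated sequence
                print rec[2]
                print substr(rec[3], 1, len)  # quality string truncated to match
            }
        }
    '
}
```

For example, `truncate_fastq 300 < merged.fastq > merged-300.fastq` (file names are placeholders) would write only reads of exactly 300 bp, which could then be imported and dereplicated as usual.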

Check out RESCRIPt. I wonder if one of these options would be an even better fit for your data set...
qiime rescript filter-seqs-length-by-taxon
qiime feature-classifier extract-reads

