I wonder if it is possible to perform OTU clustering at 99% identity on taxonomy-labeled sequences within QIIME.
I found the vsearch tool for a similar purpose, but it requires a frequency table. I don't have, and don't need, such a table for this analysis; I only have the original sequences with taxonomy labels.
I would also like the clustering to work as a filter, so that similar sequences are merged into a cluster carrying the most representative label while the others remain unchanged.
The closest thing I found is the "dereplicate" function in RESCRIPt, but it works like 100% clustering, and I need the same thing at 99%.
Is this possible within QIIME? If not, could you suggest other software for it?
This is how vsearch works. Also, it is more efficient to dereplicate the data first and then cluster the dereplicated set. In particular, if you’d like to cluster at several different similarity levels, there is no need to dereplicate each time.
But I suppose a pipeline for this would be something to consider.
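To make the "dereplicate once, then cluster at several levels" idea concrete, here is a minimal sketch that only builds the command lines for the standalone vsearch program (not the QIIME plugin). It assumes vsearch is installed separately; the file names (`seqs.fasta`, `derep.fasta`, `centroids_99.fasta`) and the helper functions are hypothetical and only for illustration. Note that plain vsearch knows nothing about taxonomy labels, so this does not choose a representative label for each cluster.

```python
def derep_command(fasta_in, fasta_out):
    """Build the vsearch call that merges identical full-length sequences."""
    # --sizeout records how many input sequences each unique sequence represents.
    return ["vsearch", "--derep_fulllength", fasta_in,
            "--output", fasta_out, "--sizeout"]


def cluster_command(derep_fasta, centroids_out, identity):
    """Build the vsearch call that clusters the dereplicated set."""
    # --cluster_size sorts by abundance before clustering; --sizein reads the
    # abundance annotations written by the dereplication step.
    return ["vsearch", "--cluster_size", derep_fasta,
            "--id", str(identity),
            "--centroids", centroids_out, "--sizein"]


def build_pipeline(fasta_in, identities=(0.99, 0.97)):
    """Dereplicate once, then reuse that output for every identity level."""
    derep_fasta = "derep.fasta"  # hypothetical intermediate file name
    commands = [derep_command(fasta_in, derep_fasta)]
    for identity in identities:
        out = f"centroids_{int(round(identity * 100))}.fasta"
        commands.append(cluster_command(derep_fasta, out, identity))
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them; each list could be passed
    # to subprocess.run(cmd, check=True) once the input file exists.
    for cmd in build_pipeline("seqs.fasta"):
        print(" ".join(cmd))
```

The dereplication command appears only once regardless of how many identity levels are requested, which is the efficiency gain mentioned above.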
I looked at the RESCRIPt dereplicate manual, and there is already a --p-perc-identity parameter. So would it be better to use RESCRIPt for this purpose instead of vsearch, if it does the same thing?
If your intent is to make a reference database based on clustered sequences, then yes, you can use this approach. But if you are simply clustering your reads to generate OTUs for analyses, then you should use vsearch.
In general, we do not recommend clustering reads for the purposes of making a reference database, as your ability to correctly assign taxonomy to your reads declines. This is covered in our RESCRIPt manuscript. Usually, clustering reference sequences is performed to reduce the file size and memory footprint of the reference database when computational resources are limited.
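If limited resources do make a clustered reference database necessary, the RESCRIPt call discussed above might look roughly like the sketch below, which only assembles the command line. The `--p-perc-identity` parameter is the one mentioned in this thread; the other option names follow the usual QIIME 2 input/output naming for `rescript dereplicate` and should be checked against `qiime rescript dereplicate --help` for your installed version. The helper function and all file names are hypothetical.

```python
def rescript_dereplicate_command(seqs_qza, taxa_qza, perc_identity=0.99):
    """Build a `qiime rescript dereplicate` call at the given identity."""
    tag = int(round(perc_identity * 100))
    return [
        "qiime", "rescript", "dereplicate",
        "--i-sequences", seqs_qza,                 # reference sequences artifact
        "--i-taxa", taxa_qza,                      # matching taxonomy artifact
        "--p-perc-identity", str(perc_identity),   # 1.0 means exact duplicates only
        # Output names below are assumptions chosen for this example.
        "--o-dereplicated-sequences", f"seqs-derep-{tag}.qza",
        "--o-dereplicated-taxa", f"taxa-derep-{tag}.qza",
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before running it in a shell.
    print(" ".join(rescript_dereplicate_command("seqs.qza", "taxa.qza")))
```

Unlike the plain vsearch route, this takes the taxonomy alongside the sequences, so the labels are handled in the same step.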