Hi,
I downloaded the Greengenes2 sequence files (2024.09.backbone.full-length.nb.qza, 2024.09.backbone.tax.qza) and used the following command to create a classifier for the V3-V4 region. Is it necessary to cluster the Greengenes2 sequences at 99% similarity before creating the classifier?
The command is as follows:
Again, I'd advise against this. You risk losing classification robustness, because some distinct members within a family or genus can be erroneously clustered together, even at a 99% similarity threshold.
The historical reason for distributing reference databases pre-clustered at 99%, 94%, etc. was to let users run classification on machines with limited resources.
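To make the clustering concern concrete, here is a rough Python sketch of how many mismatches a 99% threshold tolerates. The lengths used (about 1500 bp for a full-length 16S sequence, about 450 bp for a V3-V4 amplicon) are my own illustrative assumptions, and the helper names are hypothetical; this is simple ungapped arithmetic, not how any particular clustering tool computes identity.

```python
def max_mismatches(length: int, identity_pct: int) -> int:
    """Most mismatches two equal-length, ungapped sequences can have
    while still meeting the identity threshold (integer percent)."""
    return length * (100 - identity_pct) // 100


def percent_identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


# Assumed, approximate lengths (not taken from the Greengenes2 files).
full_length = max_mismatches(1500, 99)
v3v4 = max_mismatches(450, 99)

# Toy 400 bp sequences that differ at exactly three positions.
swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
seq_a = "ACGT" * 100
bases = list(seq_a)
for pos in (10, 200, 390):
    bases[pos] = swap[bases[pos]]
seq_b = "".join(bases)

identity = percent_identity(seq_a, seq_b)
same_cluster = identity >= 0.99  # would fall inside one 99% cluster

print(full_length, v3v4, identity, same_cluster)
```

Under these assumptions, two references can differ at several positions and still collapse into one 99% cluster. If those positions are the ones that separate two closely related taxa, that distinction is gone before the classifier is ever trained.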