Hello. In the open-reference clustering tutorial, the reference sequences passed to --i-reference-sequences are clustered at 85% identity. The reference sequences provided by SILVA, however, are clustered at 97% or 99% identity. Could you please tell me how to make reference sequences at 98% identity? And if I use de novo clustering with the 99% reference sequences as input, how do I get the reference taxonomy for the resulting 98% reference sequences?
Hi @nmgduan,
The point of de novo clustering is that, unlike closed-reference and open-reference clustering, it does not depend on a reference database. So if you want your OTU table clustered at 98% identity, you can simply provide your feature-table and rep-seqs file and set your desired clustering identity (--p-perc-identity 0.98 in qiime vsearch cluster-features-de-novo), as per the Clustering sequences tutorial.
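To make the idea concrete, here is a toy sketch of greedy, abundance-sorted centroid clustering, roughly the approach vsearch takes when clustering de novo. It is a simplification, not vsearch's implementation: it assumes equal-length, pre-aligned sequences and defines identity as the fraction of matching positions. The names `identity` and `greedy_de_novo` are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("this toy identity needs equal-length, pre-aligned sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_de_novo(abundances: dict[str, int], perc_identity: float) -> dict[str, list[str]]:
    """Greedy centroid clustering; returns {centroid: [member sequences]}."""
    clusters: dict[str, list[str]] = {}
    # Visit sequences from most to least abundant, so abundant ones become centroids.
    for seq in sorted(abundances, key=abundances.get, reverse=True):
        for centroid, members in clusters.items():
            # Join the first existing centroid that is similar enough.
            if identity(seq, centroid) >= perc_identity:
                members.append(seq)
                break
        else:
            # No centroid within the threshold: this sequence starts a new cluster.
            clusters[seq] = [seq]
    return clusters


base = "ACGT" * 12 + "AC"       # 50 bp toy sequence
one_off = "T" + base[1:]        # 1 mismatch vs base -> 98% identity
two_off = "TT" + base[2:]       # 2 mismatches vs base -> 96% identity
reads = {base: 100, one_off: 10, two_off: 5}

print(len(greedy_de_novo(reads, 0.98)))  # 2 clusters
print(len(greedy_de_novo(reads, 0.96)))  # 1 cluster
```

Note that two_off is within 98% of one_off, yet at 0.98 it still forms its own cluster, because each sequence is compared only against centroids, not against every cluster member. No reference database is involved at any point.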
If you want to use open-reference clustering, simply pass your SILVA 99% identity reference sequences to the --i-reference-sequences parameter and set the identity to 0.98 (--p-perc-identity 0.98). You can always cluster at a lower identity than that of your reference sequences, but not at a higher one.
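Open-reference clustering is, in outline, a closed-reference pass against the reference set followed by de novo clustering of whatever did not hit a reference. The toy below sketches that two-step flow under the same assumptions as a plain identity-threshold model (equal-length, pre-aligned sequences; identity as the fraction of matching positions). The helper `open_reference` and the labels `ref_1` and `new_N` are hypothetical. The single reference stands in for a 99% SILVA sequence and the threshold is 0.98. In this toy, sequences that hit a reference are labelled with that reference's ID, and leftovers get new labels.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("this toy identity needs equal-length, pre-aligned sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def open_reference(abundances: dict[str, int], refs: dict[str, str],
                   perc_identity: float) -> dict[str, str]:
    """Return {sequence: cluster label} using a closed-reference pass, then de novo."""
    labels: dict[str, str] = {}
    unmatched: dict[str, int] = {}
    # Step 1 (closed-reference): keep each sequence's best reference hit at or above the threshold.
    for seq, count in abundances.items():
        best_id, best_pid = None, 0.0
        for ref_id, ref_seq in refs.items():
            pid = identity(seq, ref_seq)
            if pid >= perc_identity and pid > best_pid:
                best_id, best_pid = ref_id, pid
        if best_id is None:
            unmatched[seq] = count
        else:
            labels[seq] = best_id
    # Step 2 (de novo): greedily cluster the leftovers, most abundant first.
    centroids: list[str] = []
    for seq in sorted(unmatched, key=unmatched.get, reverse=True):
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= perc_identity:
                labels[seq] = f"new_{i + 1}"
                break
        else:
            centroids.append(seq)
            labels[seq] = f"new_{len(centroids)}"
    return labels


ref = "ACGT" * 12 + "AC"                 # stands in for one 99% reference sequence
refs = {"ref_1": ref}
near_ref = "T" + ref[1:]                 # 98% identical to ref_1
novel = "G" * 50                         # far from every reference
novel_variant = "A" + novel[1:]          # 98% identical to novel
reads = {near_ref: 50, novel: 20, novel_variant: 3}

labels = open_reference(reads, refs, 0.98)
print(labels[near_ref], labels[novel], labels[novel_variant])  # ref_1 new_1 new_1
```

Raising the threshold to 0.99 in this toy makes near_ref miss ref_1 and fall through to the de novo step.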