I am working with the V3-V4 region of the human gut microbiota. It is a single paired-end sample that I am practicing on. I have denoised the sample and now want to cluster it.
The QIIME 2 documentation mentions that clustering at a given % identity requires the denoised data (which I have) and a reference database clustered at the same % identity. I want to use the SILVA database.
To process the database, I will follow part 1 of the RESCRIPt tutorial for preparing the SILVA database, up until this step:
There is no need to cluster your reference sequences to match the clustering of your data, for reasons previously mentioned within these threads:
Additionally:
RE 3:
Simply add the parameter --p-perc-identity 0.85 to that command. I would not recommend using 85%, though. It is only provided as an example to drastically reduce the database size.
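To show why a lower --p-perc-identity shrinks the database, here is a toy Python sketch of greedy centroid clustering. It is not how vsearch works internally. It assumes pre-aligned, equal-length sequences with no gaps and scores identity as the fraction of matching positions, whereas the real tool aligns sequences and applies its own sort order. The function names and sequences are hypothetical. It only illustrates the direction of the effect: at a lower threshold, more sequences fall into existing clusters, so fewer centroids (reference sequences) remain.

```python
from typing import List


def percent_identity(a: str, b: str) -> float:
    """Fraction of identical positions in two pre-aligned, equal-length sequences.

    Toy assumption: no gaps and no alignment step (vsearch aligns sequences itself).
    """
    if len(a) != len(b):
        raise ValueError("toy version expects pre-aligned sequences of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def greedy_cluster(seqs: List[str], threshold: float) -> List[str]:
    """Greedy centroid clustering in input order: a sequence starts a new
    cluster only if it is below `threshold` identity to every existing centroid."""
    centroids: List[str] = []
    for seq in seqs:
        if not any(percent_identity(seq, c) >= threshold for c in centroids):
            centroids.append(seq)
    return centroids


# Hypothetical 20-nt "reference" sequences (not real SILVA entries).
toy_seqs = [
    "ACGTACGTACGTACGTACGT",  # first sequence always becomes a centroid
    "TCGTACGTACGTACGTACGT",  # 1 mismatch -> 95% identical to the first
    "TTTTACGTACGTACGTACGT",  # 3 mismatches -> 85% identical to the first
    "T" * 20,                # very different from all of the above
]

print(len(greedy_cluster(toy_seqs, 0.99)))  # -> 4 centroids
print(len(greedy_cluster(toy_seqs, 0.85)))  # -> 2 centroids
```

At 99% every toy sequence stays its own centroid, while at 85% the two near neighbours collapse into the first one, which is the database-size reduction described above, along with the loss of resolution that comes with it.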
RE 4:
Yes, you can classify your 85% clustered reads against the 99% SILVA database. Again, I'd avoid 85%. Also, the content I linked for RE 2 applies here too.
Thank you for the reply; it explained a lot of things I did not know.
I do, however, have another question as I am trying to cluster.
Is it better to cluster against a full-length dereplicated database or against an amplicon-region-specific classifier?