Reference database for clustering

Rakaya · September 5, 2023, 6:52am

Hello,

I am working with the V3-V4 region of human gut microbiota. It is one paired-end sample that I am practicing on. I have denoised the sample and now I want to cluster it.

The QIIME 2 documentation mentions that clustering at a certain % identity requires the denoised data that I have and a reference database clustered at the same % identity. I want to use the SILVA database.

To process the database, I will follow the first part of the RESCRIPt tutorial (preparing the SILVA database) up to this step:

qiime rescript dereplicate \
    --i-sequences silva-138.1-ssu-nr99-seqs-515f-806r.qza \
    --i-taxa silva-138.1-ssu-nr99-tax-derep-uniq.qza \
    --p-mode 'uniq' \
    --o-dereplicated-sequences silva-138.1-ssu-nr99-seqs-515f-806r-uniq.qza \
    --o-dereplicated-taxa  silva-138.1-ssu-nr99-tax-515f-806r-derep-uniq.qza

1. By this I would have the dereplicated database, is this correct?
2. Can I use a dereplicated database instead of a clustered one for clustering?
3. If I want to cluster my database at an 85% identity threshold, what would be the command to use?
4. Can I use the 99% SILVA database to cluster at 85%?

Thank you!

SoilRotifer · September 14, 2023, 8:16pm

Hi @Rakaya,

RE 1:

Yep.

RE 2:

There is no need to cluster your reference sequences to match the clustering of your data, for reasons previously mentioned within these threads:

Additionally:

RE 3:

Simply add the parameter --p-perc-identity 0.85 to that command. Note: I would not recommend using 85%. This value is just provided as an example of a threshold that drastically reduces the database size.
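
For illustration, here is a minimal sketch of the command from the original post with that parameter added. It is wrapped in a small helper function so the threshold can be changed easily. The function name and the output file names (with the threshold appended) are hypothetical; the input file names are the ones from the post above.

```shell
# Dereplicate the amplicon-region SILVA reference and cluster it at a chosen
# identity threshold in a single step.
# Usage: derep_cluster_reference <perc-identity>
derep_cluster_reference() {
    # Threshold as a proportion, e.g. 0.85 (thread example only; not recommended).
    perc_identity="$1"

    # Output names below are hypothetical; adjust them to your own layout.
    qiime rescript dereplicate \
        --i-sequences silva-138.1-ssu-nr99-seqs-515f-806r.qza \
        --i-taxa silva-138.1-ssu-nr99-tax-derep-uniq.qza \
        --p-mode 'uniq' \
        --p-perc-identity "$perc_identity" \
        --o-dereplicated-sequences "silva-138.1-ssu-nr99-seqs-515f-806r-uniq-${perc_identity}.qza" \
        --o-dereplicated-taxa "silva-138.1-ssu-nr99-tax-515f-806r-derep-uniq-${perc_identity}.qza"
}
```

Calling `derep_cluster_reference 0.85` would run the example from this thread; a higher value such as 0.97 or 0.99 is likely the safer choice for real data.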

RE 4:

Yes, you can classify your 85% clustered reads against the 99% SILVA database. Again, I'd avoid 85%. Also, the content I linked for RE 2 applies here too.
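
If the goal is to cluster the denoised data itself against the 99% SILVA reference, a closed-reference run could be sketched roughly as below. This is only an illustration: `table.qza` and `rep-seqs.qza` stand in for your denoised feature table and representative sequences, and the function name and output names are hypothetical. The reference file is the dereplicated output from the original post.

```shell
# Closed-reference clustering of denoised features against the dereplicated
# SILVA reference, at a chosen identity threshold.
# Usage: cluster_closed_reference <perc-identity>
cluster_closed_reference() {
    # Threshold as a proportion, e.g. 0.97.
    perc_identity="$1"

    # table.qza and rep-seqs.qza are placeholder names for the denoising outputs.
    qiime vsearch cluster-features-closed-reference \
        --i-table table.qza \
        --i-sequences rep-seqs.qza \
        --i-reference-sequences silva-138.1-ssu-nr99-seqs-515f-806r-uniq.qza \
        --p-perc-identity "$perc_identity" \
        --o-clustered-table "table-cr-${perc_identity}.qza" \
        --o-clustered-sequences "rep-seqs-cr-${perc_identity}.qza" \
        --o-unmatched-sequences "unmatched-cr-${perc_identity}.qza"
}
```

Note that the reference here does not have to be clustered at the same threshold as the data, which is the point made under RE 2.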

Rakaya · September 28, 2023, 10:08am

Thank you for the reply, it explained a lot of things I did not know.
I do, however, have another question as I am trying to cluster.
Is it better to cluster against a full-length dereplicated database or an amplicon-region-specific classifier?

SoilRotifer · September 28, 2023, 2:50pm

I personally like to use the amplicon-specific classifier, for reasons outlined by Werner et al. 2011.
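
As a rough sketch of how an amplicon-specific reference can be produced, the reads between your own primer pair can be extracted from the full-length sequences. The function name, the input file name (assumed to be the full-length dereplicated sequences from an earlier tutorial step) and the output name are assumptions. The primers are passed in as arguments because they depend on your V3-V4 protocol.

```shell
# Extract the amplicon region defined by a primer pair from a full-length
# reference, to build an amplicon-specific database.
# Usage: extract_amplicon_region <forward-primer> <reverse-primer>
extract_amplicon_region() {
    fwd_primer="$1"   # forward primer sequence used for your amplicon
    rev_primer="$2"   # reverse primer sequence used for your amplicon

    # Input and output file names are assumptions; substitute your own.
    qiime feature-classifier extract-reads \
        --i-sequences silva-138.1-ssu-nr99-seqs-derep-uniq.qza \
        --p-f-primer "$fwd_primer" \
        --p-r-primer "$rev_primer" \
        --p-read-orientation 'forward' \
        --o-reads silva-138.1-ssu-nr99-seqs-amplicon.qza
}
```

The extracted reads would then go through the same dereplication step shown in the original post.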
