Clustering of amplicon variants does not change the cluster ID?

Paul_Czechowski · March 27, 2018, 8:09pm

Dear Qiime developers,

I am using VSEARCH to cluster the amplicon variants (AV) produced by DADA2 into OTUs. The taxonomy of the un-clustered AV can be established as outlined in the Qiime 2 documentation:

I am assuming that I can also use the AV taxonomy assignments with the clustered OTUs, since the identifiers are strings that won't change during the clustering procedure.
That is, the clustered OTUs are always a subset of the unclustered amplicon variants, and the sequence IDs are hashes or random strings because they are meant to always be unique.

Am I right? Otherwise I would need to do taxonomy assignment after each possible clustering script.

Thank you for maintaining and supporting Qiime.

Kind regards,

Paul

ebolyen · March 28, 2018, 5:46pm

Hi @Paul_Czechowski!

What form of clustering are you using? De-novo, open-reference, closed-reference, closed-reference merged with unmatched ASVs?

I think you have the right idea, but I need to know the OTU picking strategy to be sure I'm giving you the right advice.

Paul_Czechowski · March 28, 2018, 7:23pm

Hello @ebolyen,

thank you for your reply on my forum post.

In reply to your question - I have been using de-novo clustering.

Kind regards,

Paul

ebolyen · April 3, 2018, 2:37pm

Hi @Paul_Czechowski,

Sorry for the very delayed response.

Regarding your original question then, you are completely correct. You would expect the clustered ASVs to be a subset of the entire set. You will also find that your representative sequences after clustering are still ASVs themselves, so you can use all of the same FeatureData[...] artifacts that you would typically use with the original ASVs. This works because most methods allow the FeatureData to be a superset of the data you are working with.

In particular, you can reuse the FeatureData[Taxonomy] artifact just fine with your de-novo clustered data.
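
To make the subset idea concrete, here is a minimal sketch in plain Python (it does not call QIIME 2 or VSEARCH). It assumes the feature IDs are derived from the sequences themselves, here as MD5 hashes, and that each cluster is represented by one of the original ASVs. The sequences, taxonomy strings, and the choice of centroids are all made up for illustration.

```python
import hashlib


def feature_id(sequence: str) -> str:
    """Return the MD5 hex digest of a sequence (assumed ID scheme)."""
    return hashlib.md5(sequence.encode("utf-8")).hexdigest()


# Hypothetical ASVs before clustering.
asv_sequences = ["ACGTACGTAC", "ACGTACGTAT", "TTGGCCAATT"]
asv_ids = {feature_id(seq) for seq in asv_sequences}

# Hypothetical taxonomy assigned once, to every un-clustered ASV.
taxonomy = {
    feature_id("ACGTACGTAC"): "k__Bacteria; p__Firmicutes",
    feature_id("ACGTACGTAT"): "k__Bacteria; p__Firmicutes",
    feature_id("TTGGCCAATT"): "k__Bacteria; p__Proteobacteria",
}

# Pretend de-novo clustering merged the first two ASVs and kept the
# first one as the cluster representative (an assumption for this sketch).
centroid_sequences = ["ACGTACGTAC", "TTGGCCAATT"]
otu_ids = {feature_id(seq) for seq in centroid_sequences}

# The representatives are unchanged ASVs, so their IDs are a subset
# of the original ASV IDs.
ids_are_subset = otu_ids <= asv_ids

# The original taxonomy is a superset and can simply be looked up.
otu_taxonomy = {fid: taxonomy[fid] for fid in otu_ids}

print(ids_are_subset)
print(len(otu_taxonomy))
```

Because the representative sequences are not altered, their IDs do not change, and the taxonomy table built for the full ASV set already contains an entry for every clustered feature.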

Paul_Czechowski · April 3, 2018, 3:12pm

Hello @ebolyen, thanks very much for helping out here. I appreciate it.