Clustering of amplicon variants does not change the cluster ID?

Dear Qiime developers,

I am clustering the Amplicon Variants (AV) produced by the DADA2 script using the VSEARCH script to yield clustered OTUs. The taxonomy of the un-clustered AV can be established as outlined in the Qiime 2 documentation:

  • I am assuming that I can also use the AV taxonomy assignments with the clustered OTUs, since the identifiers are strings that won’t change during the clustering procedure.
  • This means that the clustered OTUs are always a subset of the unclustered amplicon variants, and the sequence IDs are hashes or random strings because they are meant to be always unique.

Am I right? Otherwise I would need to do taxonomy assignment after each possible clustering script.

Thank you for maintaining and supporting Qiime.

Kind regards,

Paul

Hi @Paul_Czechowski!

What form of clustering are you using? De-novo, open-reference, closed-reference, closed-reference merged with unmatched ASVs?

I think you have the right idea, but I need to know the OTU picking strategy to be sure I’m giving you the right advice.

Hello @ebolyen,

thank you for your reply on my forum post.

In reply to your question - I have been using de-novo clustering.

Kind regards,

Paul

Hi @Paul_Czechowski,

Sorry for the very delayed response.

Regarding your original question, then: you are completely correct. You would expect the clustered ASVs to be a subset of the entire set. You will also find that your representative sequences after clustering are still ASVs themselves, so you can use all of the same FeatureData[...] artifacts that you would typically use with the original ASVs, as most methods allow the FeatureData to be a superset of the data you are working with.

In particular, you can reuse the FeatureData[Taxonomy] artifact just fine with your de-novo clustered data.
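
To make the subset idea concrete, here is a minimal sketch in plain Python (not QIIME 2 code). It assumes feature IDs are derived from the sequence itself, here as an MD5 hex digest; the `feature_id` helper, the sequences, and the taxonomy strings are all made up for illustration, and the clustering step is only pretended.

```python
import hashlib


def feature_id(sequence: str) -> str:
    """Hypothetical helper: build an ID from the sequence (MD5 hex digest)."""
    return hashlib.md5(sequence.encode("utf-8")).hexdigest()


# Toy ASVs with invented sequences and taxonomy strings.
asv_taxonomy = {
    feature_id("ACGTACGTAA"): "k__Bacteria; p__Firmicutes",
    feature_id("ACGTACGTAT"): "k__Bacteria; p__Firmicutes",
    feature_id("TTGGCCAATT"): "k__Bacteria; p__Proteobacteria",
}

# Pretend de-novo clustering merged the first two ASVs and kept the first
# one as the cluster centroid. The centroid is still one of the ASVs.
centroid_sequences = ["ACGTACGTAA", "TTGGCCAATT"]
otu_ids = [feature_id(seq) for seq in centroid_sequences]

# Every OTU ID should already be a key in the ASV taxonomy (the superset).
missing = [otu_id for otu_id in otu_ids if otu_id not in asv_taxonomy]

# Look up taxonomy for the clustered features only.
otu_taxonomy = {otu_id: asv_taxonomy[otu_id] for otu_id in otu_ids}
```

If `missing` comes back empty for your own data, the existing taxonomy covers every clustered feature and there is nothing to re-assign.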

Hello @ebolyen, thanks very much for helping out here. I appreciate it.
