How to know which sequences are included in an OTU?


I have a number of unidentified OTUs after running the feature-classifier plugin, and I am trying to determine how many sequences, and which ones, from my DADA2 output were clustered together to create each OTU. Is there a way to determine this in QIIME 2?



DADA2 and Deblur do not cluster sequences. They return only unique features, or exact sequence variants (ESVs), which you can think of as denoised 100% identity OTUs. To obtain the counts for a given feature/ESV, simply look at the feature-table summary visualization.
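If you would rather get the per-feature counts programmatically than read them off the visualization, here is a minimal sketch. It assumes you have already exported the feature table to a BIOM-style TSV (comment lines starting with `#`, then one row per feature with one count column per sample) and that the table has no extra metadata columns such as taxonomy. The function name `feature_totals` and the example IDs are made up for illustration.

```python
import csv
import io


def feature_totals(tsv_text):
    """Sum counts across samples for each feature in a BIOM-style TSV table."""
    totals = {}
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and comment/header lines, which start with '#'.
        if not row or row[0].startswith("#"):
            continue
        feature_id, counts = row[0], row[1:]
        # Assumption: every remaining column is a numeric per-sample count.
        totals[feature_id] = sum(float(c) for c in counts)
    return totals


# Hypothetical two-feature, two-sample table.
example = (
    "# Constructed from biom file\n"
    "#OTU ID\tsampleA\tsampleB\n"
    "asv1\t10\t5\n"
    "asv2\t0\t3\n"
)
print(feature_totals(example))
```

For a real table you would read the exported file's text and pass it in instead of the `example` string.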

Sorry, I'm aware that DADA2 doesn't cluster. I forgot to mention that I used VSEARCH to run de novo clustering on the DADA2 output. I'm wondering how I can determine which sequences make up each OTU created by the VSEARCH clustering.


Ahh okay! That makes sense. Is there a reason why you are clustering your sequences? Information, including subtle patterns, is often lost when doing this. You can read about an exquisite example here.

Anyway, regarding your question, I think there is a potential solution outlined within this thread.

Let us know if this works.

Thanks for the response. That thread helped somewhat, although I still can't find a way to generate an "OTU map". Other threads mention using the metadata tabulate command to discover which sequences clustered where, but there is no example code to go off of and the instructions were a bit vague. Any additional help with this issue would be really appreciated.

Hi @KWyssmann,

It looks like this is still an open issue. Others on the forum might have ideas to work around this in the meantime.
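One possible workaround, offered as a sketch rather than a supported QIIME 2 feature: export the DADA2 representative sequences to FASTA, run standalone VSEARCH clustering on them with its `--uc` option, and build the OTU map from the resulting `.uc` file. The parser below assumes the standard tab-separated `.uc` layout, where `S` records mark a cluster centroid, `H` records mark a sequence assigned to a centroid (query label in column 9, centroid label in column 10), and `C` records are per-cluster summaries. The function name `otu_map_from_uc` and the example IDs are hypothetical, and clusters from a separate standalone run are not guaranteed to match the ones QIIME 2 produced.

```python
from collections import defaultdict


def otu_map_from_uc(uc_lines):
    """Map each centroid (OTU) label to the list of sequence labels in its cluster."""
    otu_map = defaultdict(list)
    for line in uc_lines:
        line = line.rstrip("\n")
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        record_type, query, target = fields[0], fields[8], fields[9]
        if record_type == "S":
            # A centroid seeds its own cluster.
            otu_map[query].append(query)
        elif record_type == "H":
            # A hit: `query` was clustered under centroid `target`.
            otu_map[target].append(query)
        # "C" (cluster summary) records carry no new membership information.
    return dict(otu_map)


# Hypothetical .uc content: asv2 clusters with asv1, asv3 is a singleton.
example_uc = [
    "S\t0\t250\t*\t*\t*\t*\t*\tasv1\t*",
    "H\t0\t250\t98.4\t+\t0\t0\t250M\tasv2\tasv1",
    "S\t1\t251\t*\t*\t*\t*\t*\tasv3\t*",
    "C\t0\t2\t*\t*\t*\t*\t*\tasv1\t*",
    "C\t1\t1\t*\t*\t*\t*\t*\tasv3\t*",
]

for otu, members in otu_map_from_uc(example_uc).items():
    print(otu, members)
```

With a real file you would pass the open file handle instead of `example_uc`. Labels are used verbatim, so any annotations VSEARCH appends to them will show up in the map as well.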