Technical question: I'm curious how feature names are assigned.

Let's say I am analyzing 2 sample groups that come from different regions of the 16S gene. Before taxonomic assignment, cryptic names are assigned to the ASVs detected in my files. Suppose some of those ASVs will later be classified as E. coli by the classifier. Should I expect their cryptic ASV names to be the same, or is a name randomly assigned to each detected ASV?
I hope this is the right place to ask. My guess is that they won't be the same. Thank you!


ASV IDs are generated from the sequences themselves, so identical sequences (100% identity in length and base order) will always have the same ID.
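A minimal Python sketch of that idea, assuming the ID is the hex MD5 digest of the sequence string (as explained further down in this thread). The helper name `asv_id` and the short sequences are hypothetical, chosen only for illustration:

```python
import hashlib


def asv_id(sequence: str) -> str:
    """Return a feature ID as the hex MD5 digest of the DNA sequence (assumed scheme)."""
    return hashlib.md5(sequence.encode("utf-8")).hexdigest()


# Hypothetical ASVs detected in two sample groups.
group_a = ["TACGGAGGGTGC", "TACGTAGGGTGC"]
group_b = ["TACGGAGGGTGC", "GACGGAGGGTGC"]

ids_a = {asv_id(s) for s in group_a}
ids_b = {asv_id(s) for s in group_b}

# Only the sequence present in both groups produces a shared ID.
shared = ids_a & ids_b
print(len(shared))  # 1
```

Because the ID depends only on the sequence, no lookup table or run-specific counter is needed: any tool that hashes the same sequence the same way gets the same ID.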

Please check this topic.


This is the perfect place to ask! Welcome to the forums, Açelya. :qiime2:

Here's a real ASV with a cryptic name :closed_lock_with_key: and DNA sequence :dna: :


The name (4b5ee...) is the MD5 hash of the sequence.

Try it for yourself with the Linux command md5sum:

4b5eeb300368260019c1fbc7a3c718fc  -

As Timur mentioned, the MD5 hash changes if even a single base changes.
But the same sequence always produces the same MD5 hash.
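If you'd rather check this in Python than with md5sum, here is a small sketch (the sequence is made up for illustration, and it assumes the hash is taken over the sequence alone with no trailing newline). It also shows a common pitfall when reproducing the hash on the command line: plain `echo` appends a newline unless you use `echo -n`, and that extra byte changes the digest.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Hex MD5 digest of a string: the same digest md5sum reports for those exact bytes."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


seq = "TACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAG"  # hypothetical amplicon fragment
mutant = "C" + seq[1:]  # same length, first base substituted

print(md5_hex(seq) == md5_hex(seq))          # True: same sequence, same hash
print(md5_hex(seq) == md5_hex(mutant))       # False: one base differs
print(md5_hex(seq) == md5_hex(seq + "\n"))   # False: trailing newline from plain `echo`
```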

So to your question:

It will not be the same (because the sequence is not the same).
It will look random (but once you know it's the MD5 hash, you know it's 100% deterministic).

'Cryptic' is a great word for it, because MD5 "was widely used as a cryptographic hash function"!


Thanks a lot! Knowing that the name is the MD5 hash of the sequence makes a lot of sense to me!

