Curious about how feature names are selected, technical question.

Acelya_Dalgic · March 13, 2024, 7:08pm

Let's say I am analyzing two sample groups sequenced from different regions of the 16S gene. Before taxonomic assignment, cryptic names are assigned to the ASVs detected in my files. Suppose some of those ASVs will later be assigned to E. coli by the classifier. Should I expect their cryptic ASV names to be the same, or are the names randomly assigned to each detected ASV?
I hope this is the right place to ask. I assume they will not be the same. Thank you!

timanix · March 13, 2024, 7:21pm

Hello!
ASV IDs are generated from the sequences themselves, so identical sequences (100% identity in both length and letter order) will have the same IDs.

Please check this topic.

colinbrislawn · March 13, 2024, 7:33pm

This is the perfect place to ask! Welcome to the forums, Açelya.

Here's a real ASV with its cryptic name and DNA sequence:

>4b5eeb300368260019c1fbc7a3c718fc
TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGATGGATGTTTAAGTCAGTTGTGAAAGTTTGCGGCTCAACCGTAAAATTGCAGTTGATACTGGATATCTT

The name (4b5ee...) is the MD5 hash of the sequence.

Try it for yourself with the Linux command md5sum:

echo -n TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGATGGATGTTTAAGTCAGTTGTGAAAGTTTGCGGCTCAACCGTAAAATTGCAGTTGATACTGGATATCTT | md5sum
4b5eeb300368260019c1fbc7a3c718fc  -

As Timur mentioned, that MD5 hash changes if even a single base changes.
But the same sequence always produces the same MD5 hash.
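
For anyone who would rather check this in Python than in the shell, here is a minimal sketch using the standard-library hashlib module. The helper name asv_id and the variable names are made up for illustration. This is not necessarily how any particular tool computes its IDs internally; it only reproduces the "MD5 of the sequence string" idea described above.

```python
import hashlib


def asv_id(sequence: str) -> str:
    """Return the hex MD5 digest of a DNA sequence string (hypothetical helper)."""
    return hashlib.md5(sequence.encode("ascii")).hexdigest()


# The example ASV sequence from this thread.
seq = "TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGATGGATGTTTAAGTCAGTTGTGAAAGTTTGCGGCTCAACCGTAAAATTGCAGTTGATACTGGATATCTT"

# Swap the final base (T -> A) to simulate a single-base difference.
mutated = seq[:-1] + "A"

print(asv_id(seq))       # should match the md5sum output shown above
print(asv_id(mutated))   # a different 32-character ID
print(asv_id(seq) == asv_id(seq))  # hashing the same sequence twice gives the same ID
```

Note that the sequence is hashed with no trailing newline, which is why the shell version uses echo -n.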

So to your question:

Neither!
It will not be the same (because sequences from different 16S regions are not the same, even when they come from the same organism).
It will look random (but once you know it's the MD5 hash, you know it's 100% deterministic).

'Cryptic' is a great word because MD5 "was widely used as a cryptographic hash function"!

Acelya_Dalgic · March 13, 2024, 8:33pm

Thanks a lot! Learning that the name is the MD5 hash of the sequence made a lot of sense to me!

system · April 14, 2024, 2:34am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.