Is it safe to assume the representative sequence cannot be “reverse engineered” if one only has the ASV ID?
I would argue that it is not safe to assume the sequence can’t be recovered, but how much that matters depends on the level of privacy you need. If someone were so inclined, it would not be hard to hash the sequences for a given 16S region from a reference database such as Greengenes and compare those hashes against the ASV IDs in a data set; any match gives back the original sequence. Of course, this would not work for sequences that are not in the reference database, and someone would need a lot of time on their hands and a real interest in what you were doing.
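A minimal sketch of that lookup in Python. It assumes the ASV IDs are MD5 hex digests of the uppercase sequence string (check how your own pipeline builds its IDs), and that the reference sequences have already been trimmed to exactly the same amplicon region. The function names and the short sequences are hypothetical placeholders, not real data.

```python
import hashlib

def md5_id(seq: str) -> str:
    # Assumption: the ASV ID is the MD5 hex digest of the uppercase sequence.
    return hashlib.md5(seq.upper().encode("ascii")).hexdigest()

def build_lookup(reference_seqs):
    # Precompute hash -> sequence for every reference sequence.
    return {md5_id(s): s for s in reference_seqs}

def recover(asv_ids, lookup):
    # Any ID found in the lookup reveals its sequence; unknown IDs map to None.
    return {asv_id: lookup.get(asv_id) for asv_id in asv_ids}

# Hypothetical reference sequences, trimmed to the same region as the ASVs.
reference = [
    "TACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAG",
    "TACGTAGGTGGCAAGCGTTATCCGGATTTATTGGGCGTAAAG",
]

# Hypothetical ASV IDs from a shared feature table: one matches a reference
# sequence, one comes from a sequence the reference does not contain.
observed = [md5_id(reference[1]), md5_id("ACGTACGTACGT")]
result = recover(observed, build_lookup(reference))
```

Only IDs whose sequences appear in the reference are recovered, which matches the caveat above about sequences missing from the database.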
This question has actually come up in my work, and I am curious about others’ thoughts as well.
We need to state unambiguously that MD5 is too fast to be cryptographically secure, and assume that folks can figure out the source sequence. Hashing sequences into IDs should not be taken as a privacy or security measure.
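To get a rough feel for how cheap MD5 is, the sketch below times hashing a batch of synthetic 250-nt sequences. The sequences are random stand-ins rather than real 16S data, the length and batch size are arbitrary choices, and the rate you see will vary by machine; the point is only that hashing an entire reference database takes very little time.

```python
import hashlib
import random
import time

def hashes_per_second(n_seqs=100_000, length=250, seed=0):
    # Synthetic random sequences standing in for trimmed reference amplicons.
    rng = random.Random(seed)
    seqs = ["".join(rng.choices("ACGT", k=length)) for _ in range(n_seqs)]

    # Time only the hashing step, not the sequence generation.
    start = time.perf_counter()
    for s in seqs:
        hashlib.md5(s.encode("ascii")).hexdigest()
    elapsed = time.perf_counter() - start
    return n_seqs / elapsed

if __name__ == "__main__":
    rate = hashes_per_second()
    print(f"~{rate:,.0f} MD5 hashes per second on this machine")
```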