Are Feature IDs hashed from reads?

Hi @Gil_Sharon! Unfortunately, there is no way to reverse the MD5 hash. MD5 is a cryptographic hash algorithm, designed to be a one-way transformation; we use it in QIIME 2 because it is fast and relatively cheap to compute. As @jakereps mentioned, you can toggle hashing at runtime with q2-dada2 and q2-deblur, but this would require you to re-run your analyses from this step.
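To make the "one-way" point concrete, here is a minimal sketch of how a hashed ID can be derived from a sequence. It assumes the feature ID is the MD5 hex digest of the sequence string encoded as UTF-8; the `feature_id` helper and the example sequence are made up for illustration and are not taken from the plugins themselves.

```python
import hashlib


def feature_id(sequence: str) -> str:
    """Return the MD5 hex digest of a sequence (assumed ID scheme)."""
    return hashlib.md5(sequence.encode("utf-8")).hexdigest()


seq = "ACGTACGTACGT"  # made-up sequence, for illustration only
fid = feature_id(seq)

print(fid)                           # 32 hexadecimal characters
print(fid == feature_id(seq))        # same sequence gives the same ID
print(fid == feature_id(seq + "A"))  # a one-base change gives a different ID
```

Going from `seq` to `fid` is cheap, but nothing in the digest lets you compute `seq` back from `fid`; you can only look it up if you kept the sequences.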

As for using the hashed IDs for comparison purposes: the MD5 sum of a given sequence is always the same (hash functions are deterministic), so if you see the same feature ID across datasets, it almost certainly comes from the same sequence. Technically, hash collisions exist in MD5, meaning that different sequences can hash to the same MD5 sum, but in practice this isn't a problem for our purposes. You can run feature-table tabulate-seqs to tabulate your sequences; the output displays each feature ID alongside the actual sequence, which is pretty helpful if you want to get back to the original sequences for investigation purposes.
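If you have the representative sequences on hand, the same idea can be sketched in Python: build an ID-to-sequence lookup for each dataset, then intersect the IDs. This again assumes the IDs are MD5 hex digests of the UTF-8 sequence strings; `feature_id`, `build_lookup`, and the sequences below are hypothetical stand-ins, not QIIME 2 API.

```python
import hashlib


def feature_id(sequence: str) -> str:
    """Return the MD5 hex digest of a sequence (assumed ID scheme)."""
    return hashlib.md5(sequence.encode("utf-8")).hexdigest()


def build_lookup(sequences):
    """Map each hashed ID back to the sequence that produced it."""
    return {feature_id(s): s for s in sequences}


# Made-up sequences standing in for two datasets' representative sequences.
dataset_a = ["ACGTACGT", "TTGACCAA", "GGGCCCAT"]
dataset_b = ["TTGACCAA", "CATCATCA"]

lookup_a = build_lookup(dataset_a)
lookup_b = build_lookup(dataset_b)

# IDs present in both datasets, translated back to sequences via the lookup.
shared_ids = lookup_a.keys() & lookup_b.keys()
shared_sequences = sorted(lookup_a[i] for i in shared_ids)
print(shared_sequences)
```

The lookup only works because the sequences were kept next to their IDs; it does not reverse the hash.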
