I’m in the middle of writing up a manuscript, and I’m struggling with ASV identification. In the context of analysis, I like the 32-character hash because it’s an intrinsic property of the ASV. But 32 characters in my manuscript takes up a lot of space and looks like junk. (Yay random MD5 hash!) I’m trying to come up with an alternative way to identify ASVs. One thought has been to just number them, with the potential concern you see in most de novo OTU papers, where my OTU 1 and your OTU 1 are likely to be different things and no one walks away happy. Another thought was to use a genus-level designation, so if it’s a Prevotella, the ASV would be prev_01.
How are other people handling this? What would people like to see? What makes sense?
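For readers wondering why the hash counts as an intrinsic property: a minimal sketch, assuming the ASV ID is simply the MD5 hex digest of the sequence string (the thread does not say which pipeline produced the IDs, so treat that as an assumption). An MD5 digest is 16 bytes, which prints as 32 hexadecimal characters. The function name `asv_id` and the sequence are made up for illustration.

```python
import hashlib


def asv_id(sequence):
    """Return the MD5 hex digest of a sequence string (assumed ID scheme)."""
    return hashlib.md5(sequence.encode("ascii")).hexdigest()


# Made-up sequence fragment, for illustration only.
seq = "TACGGAGGGTGCGAGCGTTAATCGGAATAACTGGGCGTAAAG"

print(asv_id(seq))       # same sequence always gives the same ID
print(len(asv_id(seq)))  # length of the hex digest
```

Because the ID depends only on the sequence, two labs that recover the identical sequence get the identical ID, which is exactly what sequential numbering cannot offer. Note that the digest is sensitive to case and trailing whitespace, so the sequence would need to be normalized the same way everywhere.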
Great question, and something I’m currently struggling with myself. I think I’ve settled on a hybrid of the ideas you mentioned. I’m doing my analyses at the ASV level (for the most part), but when I need to discuss a specific ASV in the paper I use its highest annotated designation and provide the full hash and sequence in a supplement. That way the average reader gets a feel for which taxon is being discussed, the article reads more cleanly, and keen readers still have access to the exact ASV and sequence to compare with their own work. Not sure how well this will work logistically…
Would love to hear what others are doing though!
It’s a good idea to rename the hash to something more readable by us (humans) and, as was already mentioned, you can supply a table with the original hashes as a supplement to the article for other researchers. I did the same, and I will attach such a table when the manuscript is ready.
Thanks! I also tend to work at ASV level, and my text is something like this:
a member of genus Lactobacillus (ASV ID eca95390ce78c2e1fca8ef621ead6c18) and a Granulicatella spp. (ASV ID 7770387e217bc9cbc12dcb87a16f779d)…
And then, as both you and @timanix mentioned, I’ve got a table in the supplement that maps the hash to full taxonomy (and some other analysis information), as well as a fasta with the full sequences.
But, I really hate the full hash in the text?
You can rename it to, for example, “Pseudomonas_asv1” and explain the naming scheme at the beginning, so there is no need to type the hash in the text or figures. In the supplemental material every name will be listed with its original hash, so anyone who wants the hashes can get them from the table.
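A rough sketch of how the “Pseudomonas_asv1” style of naming and the supplement table could be generated together, so the two never drift apart. Everything here is an assumption made for illustration: the function name `number_asvs_by_genus`, the input dictionary, and the third hash (which is invented; the first two are the hashes quoted earlier in this thread).

```python
from collections import defaultdict


def number_asvs_by_genus(genus_by_hash):
    """Return {hash: "<Genus>_asv<N>"}, numbering ASVs within each genus."""
    counters = defaultdict(int)
    names = {}
    # Iterate in sorted hash order so the numbering is reproducible
    # for a given dataset.
    for asv_hash in sorted(genus_by_hash):
        genus = genus_by_hash[asv_hash]
        counters[genus] += 1
        names[asv_hash] = f"{genus}_asv{counters[genus]}"
    return names


# Hypothetical input: hash -> genus assignment.
genus_by_hash = {
    "eca95390ce78c2e1fca8ef621ead6c18": "Lactobacillus",
    "7770387e217bc9cbc12dcb87a16f779d": "Granulicatella",
    "0123456789abcdef0123456789abcdef": "Lactobacillus",  # invented hash
}

names = number_asvs_by_genus(genus_by_hash)

# Tab-separated mapping table (readable name, original hash) for the supplement.
rows = sorted(names.items(), key=lambda item: item[1])
table = "\n".join(f"{name}\t{asv_hash}" for asv_hash, name in rows)
print(table)
```

The numbers are only meaningful inside one dataset (the same concern raised about de novo OTU numbering), which is why the mapping table back to the hashes has to ship with the paper.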
Okay, I feel like I should mention my final solution to this because I saw the post… I ended up going with the first four letters of the genus and the first four characters of the hash:
a member of genus Lactobacillus (Lact-eca9) and a Granulicatella spp. (Gran-7770)
Which I kind of think works?
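The genus-prefix plus hash-prefix scheme is easy to script. Below is a minimal sketch, not the poster's actual code: the function names `short_label` and `label_asvs` and the `n_genus`/`n_hash` parameters are hypothetical. It also adds a uniqueness check, which the post does not mention but which seems prudent, since two ASVs can share a four-character prefix.

```python
def short_label(genus, asv_hash, n_genus=4, n_hash=4):
    """Build a label such as 'Lact-eca9' from a genus name and an ASV hash."""
    return f"{genus[:n_genus]}-{asv_hash[:n_hash]}"


def label_asvs(genus_by_hash, n_hash=4):
    """Label every ASV and fail loudly if two ASVs get the same label."""
    labels = {}
    owner = {}  # label -> hash that first claimed it
    for asv_hash, genus in genus_by_hash.items():
        label = short_label(genus, asv_hash, n_hash=n_hash)
        if label in owner:
            raise ValueError(
                f"{label} is ambiguous: {owner[label]} and {asv_hash}; "
                "try a larger n_hash"
            )
        owner[label] = asv_hash
        labels[asv_hash] = label
    return labels


# The two hashes quoted earlier in this thread.
genus_by_hash = {
    "eca95390ce78c2e1fca8ef621ead6c18": "Lactobacillus",
    "7770387e217bc9cbc12dcb87a16f779d": "Granulicatella",
}

labels = label_asvs(genus_by_hash)
for asv_hash, label in labels.items():
    print(label, asv_hash)
```

Four hexadecimal characters give 16^4 = 65,536 possible prefixes, so clashes within one genus are unlikely but not impossible. Genera that share their first four letters (Lactobacillus and Lactococcus, say) also land in the same "Lact" namespace, which is another reason to run the check.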
I did approximately the same for figures.