Hi there,
I just finished the DADA2 process and obtained an ASV table and sequences with MD5 hash headers. Is there any way to find out which original sequence IDs each ASV corresponds to? Thanks!
Each ASV represents many sequences from many samples. All these sequences have unique IDs.
Are you looking for a table like this?
Example:
ASV ID | Seq IDs |
---|---|
HASH1234 | Sample_1;read_4, Sample_7;read_245, Sample_9;read_4781, ... |
HASH5678 | Sample_1;read_38, Sample_2;read_113, Sample_9;read_5, ... |
Yes, that's exactly what I want. Do you have any ideas?
There's no easy way to do this within DADA2 or QIIME 2.
Here's the code that makes the hashes from input sequences:
To make that table, you would want to preserve both the original SequenceID and the new MD5 hash it receives after it is renamed.
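A minimal sketch of that idea in Python, assuming the feature IDs are the MD5 hex digest of the ASV sequence string (which is what the hash headers appear to be; check against your own rep-seqs before relying on it). `read_to_asv` is a hypothetical input: DADA2 doesn't export which read ended up in which ASV, so you would have to build it yourself.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict


def feature_id(seq: str) -> str:
    """MD5 hex digest of a sequence (assumed to match the ASV hash headers)."""
    return hashlib.md5(seq.encode("utf-8")).hexdigest()


def ids_by_feature(read_to_asv: dict[str, str]) -> dict[str, list[str]]:
    """Group original read IDs under the hash of the ASV each read belongs to.

    read_to_asv is hypothetical: it maps a read ID such as "Sample_1;read_4"
    to the ASV sequence that read was denoised into.
    """
    table = defaultdict(list)
    for read_id, asv_seq in read_to_asv.items():
        # Keep both identifiers: the original read ID and the ASV's MD5 hash.
        table[feature_id(asv_seq)].append(read_id)
    return dict(table)
```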
It may be easier to make this table another way: List all reads in your study along with their unique MD5 hashes, then map your original reads and their SeqIDs against this list. I think this can be done using vsearch, but that would take some custom scripting.
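Here's a rough Python sketch of the mapping step. It swaps vsearch for plain exact substring matching (a read is assigned to an ASV when the ASV sequence occurs inside it), which is much cruder; all function and variable names are hypothetical, and the hashing again assumes MD5 hex digests of the sequence.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict


def md5_id(seq: str) -> str:
    # Assumed to mirror how the ASV headers were produced.
    return hashlib.md5(seq.encode("utf-8")).hexdigest()


def map_reads_to_asvs(reads: dict[str, str], asvs: dict[str, str]):
    """Assign each read to the single ASV whose sequence occurs inside it.

    reads: read ID -> raw read sequence
    asvs:  ASV hash -> ASV sequence (e.g. taken from the exported rep-seqs)
    Returns (table, unassigned): table maps ASV hash -> list of read IDs;
    reads that contain zero or several ASVs are listed in `unassigned`.
    """
    table = defaultdict(list)
    unassigned = []
    for read_id, read_seq in reads.items():
        read_seq = read_seq.upper()
        hits = [h for h, asv in asvs.items() if asv.upper() in read_seq]
        if len(hits) == 1:
            table[hits[0]].append(read_id)
        else:
            unassigned.append(read_id)
    return dict(table), unassigned
```

Because DADA2 corrects sequencing errors, reads carrying even one error won't match exactly and will end up in `unassigned`, so an error-tolerant aligner such as vsearch is the more realistic choice for real data.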
@11131, can you clarify what you'd like to do with that information? As @colinbrislawn mentioned, we don't generate that mapping of identifiers directly, but there may be another way to get the information you need. For example, if you're interested in the sequence associated with each hashed identifier, that is generated by the DADA2 denoise methods: it is the rep-seqs-dada2.qza file generated in this step of the Moving Pictures tutorial, and you can turn it into a visualization to explore using qiime feature-table tabulate-seqs (see here).
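If you export that artifact (e.g. with `qiime tools export`, which typically writes a `dna-sequences.fasta`), you can sanity-check the hash headers yourself. A small sketch, assuming each header is the MD5 hex digest of its sequence; the function names are made up for illustration:

```python
from __future__ import annotations

import hashlib


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header ID: sequence}, joining wrapped lines."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].split()[0]  # keep only the ID before any spaces
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def mismatched_ids(records: dict[str, str]) -> list[str]:
    """Return header IDs that are NOT the MD5 hex digest of their sequence."""
    return [h for h, s in records.items()
            if hashlib.md5(s.encode("utf-8")).hexdigest() != h]
```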
Great! Thanks for your help.
Yes. I'm trying to reanalyze a published V3-V5 sequencing dataset, but I found that the ASV sequences did not start at the V3 primer position, even though the parameter was --trim-left 0. So I just want to trace back the original sequences that the shorter ASVs represent.
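Before tracing reads, a quicker check might be whether the primer occurs in the ASVs at all, and where. The sketch below searches each sequence for a primer that may contain IUPAC degenerate bases and reports the offset of the first match. The primer used in any example is only a hypothetical placeholder; substitute the one from the published study.

```python
from __future__ import annotations

import re

# IUPAC nucleotide codes -> regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer: str) -> re.Pattern:
    """Compile a primer into a regex; unknown codes raise KeyError."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def primer_offsets(seqs: dict[str, str], primer: str) -> dict[str, int | None]:
    """Offset of the first primer match per sequence (0 = starts at primer).

    None means the primer was not found, e.g. because it was removed
    upstream or the sequence begins downstream of it.
    """
    pattern = primer_regex(primer)
    out = {}
    for seq_id, seq in seqs.items():
        m = pattern.search(seq.upper())
        out[seq_id] = m.start() if m else None
    return out
```

Running the same check on a handful of raw reads would show whether the primer was already trimmed before the data were deposited.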
I've now found a way to recover the original sequences. Many thanks!