I successfully used QIIME 2 to get the ASV table and taxonomy for my ITS sequences without any particular problems. However, I was wondering whether there is a way to get, for each feature, the accession number of the UNITE reference sequence that matched that feature's sequence. In the taxonomy file I only get the feature ID, the taxonomic classification, and the confidence.
Thanks in advance and have a nice day,
Good question. The short answer is no, not usually. The q2-feature-classifier methods determine the most probable lineage and/or some consensus among the top hits, rather than identifying a single top hit, so in most cases there is no single accession ID associated with a given classification.
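To make that concrete, here is a toy sketch of a consensus over several hits. It is not the actual q2-feature-classifier code; the function name, the 0.51 threshold, the accession-like IDs, and the lineages are all made up for illustration.

```python
from collections import Counter


def consensus_lineage(hit_lineages, min_consensus=0.51):
    """Toy consensus: extend the lineage rank by rank while one prefix is
    shared by at least `min_consensus` of the hits (hypothetical threshold)."""
    split = [tuple(lineage.split(";")) for lineage in hit_lineages]
    depth = min(len(ranks) for ranks in split)
    consensus = ()
    for d in range(1, depth + 1):
        # Most common lineage prefix at this depth and how many hits share it.
        prefix, count = Counter(ranks[:d] for ranks in split).most_common(1)[0]
        if count / len(split) < min_consensus:
            break
        consensus = prefix
    return ";".join(consensus)


# Made-up reference IDs and lineages, for illustration only.
hits = {
    "REF_A": "k__Fungi;p__Ascomycota;g__Aspergillus;s__Aspergillus_niger",
    "REF_B": "k__Fungi;p__Ascomycota;g__Aspergillus;s__Aspergillus_flavus",
    "REF_C": "k__Fungi;p__Ascomycota;g__Aspergillus;s__Aspergillus_oryzae",
}
print(consensus_lineage(list(hits.values())))
```

The three hits disagree at the species level, so the result stops at the genus, and none of the three reference IDs can be said to be "the" match for that classification.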
However, it is possible to configure the classify-consensus-* methods to look only for the top hit, instead of performing a consensus classification, by using the max-hit parameters (see the documentation for details on each method). To see the reference sequence ID (the accession number) associated with these top hits, though, you would need to put the accession number in the taxonomy label somehow, so that it appears in the classification results.
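One way to get the accession into the label is to edit the reference taxonomy before importing it. Below is a minimal sketch, assuming the taxonomy is a headerless two-column TSV (reference ID, then a semicolon-delimited lineage). The function name, the `id__` prefix, and the example IDs are hypothetical, so adapt them to your own reference files.

```python
def add_accession_to_taxonomy(lines):
    """Append each reference ID to its own lineage as an extra final level.

    Assumes headerless TSV lines of the form '<ref_id>\t<lineage>'.
    The 'id__' prefix is an arbitrary choice for this sketch.
    """
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        ref_id, lineage = line.split("\t", 1)
        out.append(f"{ref_id}\t{lineage};id__{ref_id}")
    return out


# Made-up reference entries, for illustration only.
reference = [
    "REF_0001\tk__Fungi;p__Ascomycota;s__Aspergillus_niger",
    "REF_0002\tk__Fungi;p__Basidiomycota;s__Amanita_muscaria",
]
for row in add_accession_to_taxonomy(reference):
    print(row)
```

The modified file would then need to be imported again as a FeatureData[Taxonomy] artifact before classifying. Note that this only gives a meaningful ID when the classifier is restricted to a single top hit; with a real consensus over several hits, the ID level will usually be dropped because the hits disagree on it.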
Another option (especially if you are interested in the accession IDs for just a handful of sequences) would be to filter your FeatureData[Sequence] artifact to contain only the query sequences you want to match, then use qiime quality-control evaluate-seqs. That method just runs blastn under the hood and will show you the full BLAST report, so you can see the accession IDs for all potential matches. Or run blastn directly.
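If you run blastn directly with tabular output (`-outfmt 6`), picking the best reference per query is a few lines of Python. This is a rough sketch that assumes the default 12-column layout (qseqid, sseqid, pident, ..., evalue, bitscore) and ranks hits by bit score; the function name and the example rows are made up.

```python
def top_hits(blast6_lines):
    """Return {query_id: (subject_id, percent_identity, bitscore)} keeping,
    for each query, the hit with the highest bit score.

    Assumes default BLAST tabular columns: subject ID in column 2,
    percent identity in column 3, bit score in column 12.
    """
    best = {}
    for line in blast6_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        pident, bitscore = float(fields[2]), float(fields[11])
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, pident, bitscore)
    return best


# Made-up BLAST rows, for illustration only.
rows = [
    "asv1\tREF_A\t99.5\t200\t1\t0\t1\t200\t1\t200\t1e-100\t365",
    "asv1\tREF_B\t97.0\t200\t6\t0\t1\t200\t1\t200\t1e-90\t330",
    "asv2\tREF_C\t100.0\t180\t0\t0\t1\t180\t1\t180\t1e-95\t333",
]
for query, (subject, pident, bitscore) in top_hits(rows).items():
    print(query, subject, pident, bitscore)
```

Keep in mind that several references can tie for the best score, in which case this sketch simply keeps the first one it sees, so it is worth looking at the full report for the features you care about.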
Many thanks for the in-depth explanation. I will have a look and play around with the suggested parameters.