I have been trying to find out how to match the feature ID of a microorganism to an NCBI ID, because I want to learn more about specific organisms.
The pre-trained QIIME2 classifier I use (Pre-trained Silva 138) gives me the taxonomic name, but I want to know the accession number or NCBI ID of each organism, specifically when the taxonomy is exported from taxabarplot.qzv into a CSV file.
I'm assuming that you worked with ASVs and then did taxonomic classification? In that case, there isn't a direct one-to-one relationship between a taxonomic assignment and an accession in a public database: multiple ASVs can be mapped to the same taxonomic clade.
I can think of two ways to approach this. One is to simply look up the name in NCBI and get the accession number there; that's probably the easiest solution.
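The name lookup can be partly scripted. Below is a minimal sketch, assuming you query NCBI's Entrez E-utilities `esearch` endpoint against the Taxonomy database. It picks the deepest rank that is not a placeholder such as "uncultured" (the placeholder list is my own assumption) and builds the query URL. Opening that URL returns NCBI taxonomy IDs (TaxIDs), not sequence accessions. The helper names are hypothetical.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities esearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Assumed set of SILVA placeholder names that NCBI will not resolve usefully.
UNNAMED = {"", "uncultured", "unidentified", "metagenome"}


def deepest_named_rank(taxon: str) -> str:
    """Return the most specific real name in a SILVA-style taxonomy string."""
    names = []
    for level in taxon.split(";"):
        level = level.strip()
        # Drop the rank prefix such as 'f__' if present.
        name = level.split("__", 1)[1] if "__" in level else level
        names.append(name.strip())
    for name in reversed(names):
        if name.lower() not in UNNAMED:
            return name
    raise ValueError(f"no named rank in {taxon!r}")


def taxonomy_search_url(taxon: str) -> str:
    """Build an esearch URL that looks the name up in NCBI Taxonomy."""
    query = {"db": "taxonomy", "term": deepest_named_rank(taxon)}
    return f"{ESEARCH}?{urlencode(query)}"


taxon_string = ("d__Bacteria;p__Firmicutes;c__Clostridia;"
                "o__Oscillospirales;f__Oscillospiraceae;g__uncultured")
print(taxonomy_search_url(taxon_string))
```

Because the genus here is "uncultured", the sketch falls back to the family name, so the lookup can only resolve to Oscillospiraceae, not to a specific organism.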
The second option is to filter your representative sequences down to the ASVs assigned to that taxonomic clade, tabulate them, and then click the hyperlink to BLAST them against NCBI. This will give you an ID or set of IDs that you can work from.
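The same filtering can also be done on exported files. Below is a minimal sketch, assuming the taxonomy was exported as a tab-separated file with "Feature ID" and "Taxon" columns and the representative sequences as a FASTA file (the layout I would expect from `qiime tools export`). The function names and toy data are hypothetical. It keeps only the ASVs whose taxonomy string exactly matches the target and writes them as FASTA text that could be pasted into NCBI BLAST.

```python
import csv
import io


def normalize(taxon: str) -> str:
    # Remove spaces around ';' so 'd__Bacteria; p__X' equals 'd__Bacteria;p__X'.
    return ";".join(part.strip() for part in taxon.split(";"))


def features_with_taxon(taxonomy_tsv: str, target: str) -> list[str]:
    """Feature IDs whose 'Taxon' column exactly matches `target`."""
    rows = csv.DictReader(io.StringIO(taxonomy_tsv), delimiter="\t")
    return [r["Feature ID"] for r in rows
            if normalize(r["Taxon"]) == normalize(target)]


def read_fasta(fasta_text: str) -> dict[str, str]:
    """Parse FASTA text into {id: sequence}, joining multi-line sequences."""
    seqs: dict[str, str] = {}
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            current = line[1:].split()[0]
            seqs[current] = ""
        elif line and current is not None:
            seqs[current] += line
    return seqs


def subset_fasta(fasta_text: str, ids: list[str]) -> str:
    """FASTA text containing only the requested IDs, in the given order."""
    seqs = read_fasta(fasta_text)
    return "".join(f">{i}\n{seqs[i]}\n" for i in ids if i in seqs)


# Hypothetical toy data standing in for the exported taxonomy and sequences.
TARGET = ("d__Bacteria;p__Firmicutes;c__Clostridia;"
          "o__Oscillospirales;f__Oscillospiraceae;g__uncultured")
taxonomy_tsv = (
    "Feature ID\tTaxon\tConfidence\n"
    "asv1\td__Bacteria; p__Firmicutes; c__Clostridia; o__Oscillospirales; "
    "f__Oscillospiraceae; g__uncultured\t0.95\n"
    "asv2\td__Bacteria; p__Bacteroidota; c__Bacteroidia\t0.99\n"
    "asv3\t" + TARGET + "\t0.91\n"
)
rep_seqs = ">asv1\nACGT\nAC\n>asv2\nGGGG\n>asv3\nTTGA\n"

ids = features_with_taxon(taxonomy_tsv, TARGET)
print(subset_fasta(rep_seqs, ids))
```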
I have been working on the second option you suggested, because the organism I am interested in was classified as an uncultured genus of the family Oscillospiraceae.
However, now that I have filtered the sequences, I run into a problem: there are 5 sequences assigned to that taxonomy. I looked into my taxonomy.qza and realized that 5 sequences (ASVs) are classified with the same taxonomy (d__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__uncultured).
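To see which taxonomy strings are shared by more than one ASV, you can group the exported taxonomy table by its Taxon column. Below is a small sketch under the same export-format assumption. Spacing after the semicolons is normalized in case it differs between files, and the feature IDs are made up; real ones are long hashes.

```python
import csv
import io
from collections import defaultdict


def shared_taxa(taxonomy_tsv: str) -> dict[str, list[str]]:
    """Map each taxonomy string assigned to 2+ features to those feature IDs."""
    groups: dict[str, list[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(taxonomy_tsv), delimiter="\t"):
        # Normalize spacing after ';' so equivalent strings group together.
        taxon = ";".join(p.strip() for p in row["Taxon"].split(";"))
        groups[taxon].append(row["Feature ID"])
    return {t: ids for t, ids in groups.items() if len(ids) > 1}


# Hypothetical toy rows.
OSC = ("d__Bacteria;p__Firmicutes;c__Clostridia;"
       "o__Oscillospirales;f__Oscillospiraceae;g__uncultured")
taxonomy_tsv = (
    "Feature ID\tTaxon\tConfidence\n"
    f"asv1\t{OSC}\t0.95\n"
    "asv2\td__Bacteria;p__Bacteroidota;c__Bacteroidia\t0.99\n"
    f"asv3\t{OSC}\t0.91\n"
)

for taxon, ids in shared_taxa(taxonomy_tsv).items():
    print(len(ids), "features share", taxon, ids)
```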
My question is this: when the taxonomy bar plots are exported to a CSV, are all 5 ASVs combined into a single column of observed features? I only have one column with that taxonomic classification.
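One way to reason about the single column is to consider how the CSV might be built. Suppose the bar-plot CSV comes from collapsing the feature table on taxonomy at the selected level (my assumption about how the export works). Then every ASV with an identical label would be summed into one column. The sketch below uses hypothetical names and toy counts to show that collapse. Two ASVs with the Oscillospiraceae g__uncultured label end up as one column whose value is the sum of their counts. Level 6 is assumed to correspond to genus for SILVA's seven ranks.

```python
from collections import defaultdict


def collapse_by_taxon(counts, taxonomy, level):
    """Sum per-feature counts into per-taxon counts.

    counts:   {sample_id: {feature_id: read_count}}
    taxonomy: {feature_id: 'd__...;p__...;...'}
    level:    number of ranks kept in each label (assumed 6 = genus)
    """
    collapsed = {}
    for sample, per_feature in counts.items():
        totals = defaultdict(int)
        for feature, n in per_feature.items():
            ranks = [r.strip() for r in taxonomy[feature].split(";")]
            # Features whose truncated labels are identical share one column.
            totals[";".join(ranks[:level])] += n
        collapsed[sample] = dict(totals)
    return collapsed


OSC = ("d__Bacteria;p__Firmicutes;c__Clostridia;"
       "o__Oscillospirales;f__Oscillospiraceae;g__uncultured")
BAC = ("d__Bacteria;p__Bacteroidota;c__Bacteroidia;"
       "o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides")
taxonomy = {"asv1": OSC, "asv2": OSC, "asv3": BAC}
counts = {"S1": {"asv1": 10, "asv2": 5, "asv3": 7},
          "S2": {"asv1": 0, "asv2": 3, "asv3": 1}}

table = collapse_by_taxon(counts, taxonomy, level=6)
print(table["S1"][OSC])
```

If that assumption holds, the individual ASV counts would only be visible in the uncollapsed feature table, not in the bar-plot CSV.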