How to get unique feature ids from the exported BIOM feature table

buritta · August 25, 2022, 1:11pm

Hi everyone,

I have a question about how to get unique feature IDs from the exported BIOM feature table. Thanks in advance to anyone who can discuss the possibilities with me.

Here is a bit of background. I don't think it's that important, but I include it anyway. I used the pretrained sklearn classifier gg-13-8-99-classifier.qza (trained on Greengenes sequences) to classify my reads and then used "qiime metadata tabulate" to export the taxonomy. I then collapsed the taxonomy table at the genus level and, in the end, used "qiime tools export" to export it so I could look at it. The exported BIOM file was converted to txt and looks like this:

As you can see, a lot of ASVs only have blanks at higher resolution and don't have a name. I understand the difference between "k__Bacteria;_" and "k__Bacteria;p__" from this related question in the forum. However, if I understand it correctly, IDs like these are not unique: for example, you could get two "k__Bacteria;p__" entries in a feature table like this, both of which are phylum-level ASVs that don't have a name in the Greengenes database. This becomes a big issue if I want to compare the features classified in two different studies. How can I know that the "k__Bacteria;p__" in the result from one study is the same as the "k__Bacteria;p__" in the result from another study? Is it possible to get a unique ID for each of the features in the feature table, e.g. from Greengenes or a putative one, so that comparisons between feature tables are possible?

Best regards,
Burrita

Keegan-Evans · August 31, 2022, 7:46pm

@buritta,
I think "qiime feature-table tabulate-seqs" will do what you want.
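
As a rough illustration of why going back to the per-ASV sequences helps: the sketch below uses made-up sequences and labels (all names and values here are hypothetical, not taken from the thread) to show that several ASVs can share one collapsed label such as "k__Bacteria;p__", while an ID derived from the sequence itself stays distinct. It assumes an MD5 hash of the sequence as the ID, which, as far as I know, is what the DADA2 plugin in QIIME 2 produces by default; such IDs would only match across studies if the same region and trimming were used.

```python
import hashlib
from collections import defaultdict

# Hypothetical ASVs as (sequence, taxonomy label) pairs; values are invented
# purely for illustration and do not come from any real dataset.
asvs = [
    ("ACGTACGTAC", "k__Bacteria;p__"),
    ("TTGACCGTAA", "k__Bacteria;p__"),
    ("GGCATTCAGA", "k__Bacteria;p__Firmicutes"),
]


def sequence_id(seq: str) -> str:
    """Return the MD5 hex digest of the upper-cased sequence (assumed ID scheme)."""
    return hashlib.md5(seq.upper().encode("ascii")).hexdigest()


# Group sequence-derived IDs under the taxonomy label they were assigned.
by_label = defaultdict(list)
for seq, label in asvs:
    by_label[label].append(sequence_id(seq))

# A label that maps to more than one ID is not a unique feature identifier.
for label, ids in by_label.items():
    print(f"{label}\t{len(ids)} ASV(s)\t{', '.join(ids)}")
```

In this toy table, "k__Bacteria;p__" groups two different sequence-derived IDs, so comparisons between studies would be safer on the uncollapsed table keyed by those IDs than on the collapsed labels.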
