Obtaining specific feature IDs

Alan_Chan · March 14, 2019, 12:59am

How can I get the feature ID for a specific OTU so I can BLAST it? For example, I'm interested in a feature that was identified only as far as the genus level (Pseudomonas) and I want to see what shows up when I BLAST it. I've already converted my taxonomy.qza to a .tsv and checked for it there, but there are multiple OTUs identified to that same genus, so I can't tell which feature ID to go with.

Mehrbod_Estaki · March 14, 2019, 5:18am

Hi @Alan_Chan,
Your taxonomy file is certainly the right place to start. You can use the taxonomy visualization to look for the target taxa. The fastest way to BLAST a feature would be to copy its hash ID and look for that ID in your rep-seqs.qzv. That will show you the actual sequence, which is also a hyperlink that will run a BLAST search on that sequence for you.

Unfortunately that just means there are multiple ASVs that you need to be blasting for since they are not differentiated beyond the genus level.

Alan_Chan · March 14, 2019, 6:34pm

Thanks for your response. I find it hard to believe there isn't a way to find a specific feature ID. In my example, I am looking to BLAST a specific OTU in a collapsed taxonomy feature table. It is defined as far as the genus level; however, there are many OTUs defined only as far as that same genus, each with a unique feature ID and confidence (looking at taxonomy.qzv). There isn't a way to get the specific feature ID from a feature table?

thermokarst · March 14, 2019, 6:49pm

No, unfortunately there is a bit of machinery missing for that. What you can do, though, is filter the table based on the taxon(s) in question, then filter your FeatureData[Sequence]. From there you can either tabulate the filtered sequences and click the sequence links to have the BLAST query set up automatically, or tabulate the taxonomy together with the filtered sequences to see each taxon + feature ID + sequence.
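For readers working from exported files rather than inside QIIME 2, the same idea (restrict to the taxon, then line up feature ID + taxon + sequence) can be roughed out in plain Python. This is only a sketch under assumptions, not what QIIME 2 does internally: the inline data is made up, the function names are hypothetical, and the header `Feature ID` / `Taxon` / `Confidence` is assumed to match your exported taxonomy.tsv.

```python
import csv
import io

# Hypothetical stand-ins for an exported taxonomy.tsv and a rep-seqs FASTA file.
TAXONOMY_TSV = (
    "Feature ID\tTaxon\tConfidence\n"
    "aaa111\tk__Bacteria; p__Proteobacteria; g__Pseudomonas\t0.98\n"
    "bbb222\tk__Bacteria; p__Firmicutes; g__Bacillus\t0.91\n"
    "ccc333\tk__Bacteria; p__Proteobacteria; g__Pseudomonas; s__\t0.87\n"
)
REP_SEQS_FASTA = ">aaa111\nACGTACGT\n>bbb222\nTTGACCAA\n>ccc333\nACGTTCGT\n"


def read_fasta(text):
    """Parse FASTA text into {feature_id: sequence}."""
    seqs = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The feature ID is the first token after '>'.
            current = line[1:].split()[0]
            seqs[current] = ""
        elif current is not None:
            seqs[current] += line
    return seqs


def features_for_taxon(taxonomy_text, fasta_text, needle):
    """Return (feature_id, taxon, confidence, sequence) for taxa containing `needle`."""
    seqs = read_fasta(fasta_text)
    reader = csv.DictReader(io.StringIO(taxonomy_text), delimiter="\t")
    rows = []
    for row in reader:
        if needle in row["Taxon"]:
            fid = row["Feature ID"]
            rows.append((fid, row["Taxon"], float(row["Confidence"]), seqs.get(fid)))
    return rows


matches = features_for_taxon(TAXONOMY_TSV, REP_SEQS_FASTA, "g__Pseudomonas")
for fid, taxon, conf, seq in matches:
    print(fid, conf, seq, sep="\t")
```

Every feature assigned to the genus comes back as its own row, so there is still one candidate sequence per feature ID to BLAST; the filter narrows the list but cannot pick a single feature for you.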

Alan_Chan · March 14, 2019, 11:44pm

That would still leave me with pages of BLAST sequences to look through, wouldn't it? I will try to find another way outside of QIIME 2. Thanks anyway!

thermokarst · March 15, 2019, 12:30am

I guess I don't understand, can you provide some more context?

thermokarst · March 15, 2019, 12:34am

BTW, check out this example for a concrete representation of what I proposed earlier: Metadata in QIIME 2 — QIIME 2 2019.1.0 documentation

Alan_Chan · March 15, 2019, 4:03am

I actually didn't know about that feature. That's good to know. However, that still doesn't allow me to find the specific OTU sequence in a certain feature table, since it was only identified to the genus level and there are many OTUs only identified to that level in my rep-seqs and tabulated-feature-metadata. I don't know which one it is. Does that make sense?

Nicholas_Bokulich · March 15, 2019, 3:18pm

@Alan_Chan,
If you are looking for the sequence associated with a specific feature ID, you can use the tabulate approach to merge taxonomy + sequences and then search within that file, as described by @thermokarst above:

So how are you identifying this particular feature? Did you, e.g., run ANCOM on a feature table of OTUs to identify OTUs that are differentially abundant? In that case, you can use the OTU ID to find the sequence that you are looking for in the sequence summary described by @thermokarst.

However, you have mentioned a collapsed feature table. So perhaps the problem is that you ran ANCOM on a collapsed feature table and you know which genus is significant but you do not actually have the OTU ID. If that is the case, then you should still follow @thermokarst's advice to tabulate taxonomy + sequences. You can search for that genus in that visualization to only pull out sequences belonging to that genus, and then BLAST those. Perhaps that is what you have done, and this is what you mean when you say:

That is why something like ANCOM should be run on the OTUs themselves, not on the collapsed feature table: it is more specific and granular, and you can identify specific sequences, rather than conglomerated taxonomic groups, that differ between your sample groups.
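A toy illustration of why the collapsed table cannot point back to a single feature ID: collapsing sums every feature that shares a taxonomy label into one row. The counts and IDs below are invented, and this is a sketch of the idea only, not QIIME 2's collapse implementation.

```python
from collections import defaultdict

# Hypothetical per-feature counts (summed over samples) and taxonomy labels.
feature_counts = {"aaa111": 120, "bbb222": 40, "ccc333": 15}
taxonomy = {
    "aaa111": "k__Bacteria; p__Proteobacteria; g__Pseudomonas",
    "bbb222": "k__Bacteria; p__Firmicutes; g__Bacillus",
    "ccc333": "k__Bacteria; p__Proteobacteria; g__Pseudomonas",
}

collapsed = defaultdict(int)   # taxonomy label -> summed count (the collapsed table)
members = defaultdict(list)    # taxonomy label -> feature IDs that were merged

for fid, count in feature_counts.items():
    label = taxonomy[fid]
    collapsed[label] += count
    members[label].append(fid)

for label, total in collapsed.items():
    print(label, total, members[label], sep="\t")
```

The Pseudomonas row in `collapsed` holds a single summed count, while `members` shows that two feature IDs contributed to it. A result reported on the collapsed row therefore applies to the group, and only an analysis on the uncollapsed table can single out one of those IDs.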

system · April 15, 2019, 9:18pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.