How to dig deeper into specific features?

Marvin_Yeung · January 8, 2020, 5:36am

Hi guys, newbie question here. Is there a more or less standardized or recommended procedure for investigating specific features? I know there are annotation tools for shotgun/WGS data that link taxa to function, but what can I do with amplicon data? PICRUSt2 does a wonderful job producing stratified scores for features, but what if I'm only interested in one feature? Which databases or tools would you recommend? Say a feature is present in many of my samples, even though those samples are very different from one another, so it could carry important biological information. It is classified as "k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Moraxellaceae;g__Acinetobacter". What can I do with it? Beyond basic morphology information, is there a more efficient way to retrieve information about it? I know microbes are functionally redundant most of the time, but there must be some information specific to a particular taxon, right? How can I retrieve it efficiently? I tried simply searching the KEGG and BioCyc databases, but they seem geared toward WGS-style, molecular-level analysis. Is there anything I'm missing? Please share your wisdom. Thanks!!

jwdebelius · January 8, 2020, 8:05am

Hi @Marvin_Yeung,

If you tabulate your sequences and then click on the feature ID of interest, it will BLAST the sequence against NCBI, and you might get whole-genome hits that way. You can also check the taxonomic assignment itself. Your label has no species string at all, which suggests the feature didn't classify as well as it could have: normally there is one, even if it's empty and even though you should generally ignore species-level calls.
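As a rough illustration of the missing-species point, here is a small Python sketch that splits a Greengenes-style label, like the one in the question, into ranks and reports the deepest rank that was actually resolved. It is not part of QIIME 2; the function names `parse_taxonomy` and `deepest_rank` are hypothetical, and it assumes the `k__`/`p__`/.../`s__` prefix convention (labels from other reference databases may use different prefixes).

```python
from typing import Dict, Optional

# Assumed Greengenes-style rank prefixes, from kingdom (k__) down to species (s__).
RANKS = ["k", "p", "c", "o", "f", "g", "s"]


def parse_taxonomy(taxon: str) -> Dict[str, Optional[str]]:
    """Split a 'k__...;p__...' label into {rank_prefix: name}.

    Ranks that are absent, or present but empty (e.g. 's__'), map to None.
    """
    parsed: Dict[str, Optional[str]] = {rank: None for rank in RANKS}
    for field in taxon.split(";"):
        prefix, sep, name = field.strip().partition("__")
        # Keep only recognised prefixes that actually carry a name.
        if sep and prefix in parsed and name:
            parsed[prefix] = name
    return parsed


def deepest_rank(taxon: str) -> Optional[str]:
    """Return the prefix of the most specific rank with a name, or None."""
    parsed = parse_taxonomy(taxon)
    resolved = [rank for rank in RANKS if parsed[rank] is not None]
    return resolved[-1] if resolved else None


label = ("k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;"
         "o__Pseudomonadales;f__Moraxellaceae;g__Acinetobacter")
print(deepest_rank(label))          # g
print(parse_taxonomy(label)["s"])   # None
```

An empty `s__` and a missing `s__` field are treated the same here, since either way nothing was resolved at species level.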

In this case, I actually just googled it, and it has its own Wikipedia page, so that might also be a good start? It looks fairly well characterized as a genus, which also suggests that if your samples are culturable, you might be able to isolate and culture it yourself if it's important or relevant to you.

Note, though, that this approach won't work for all organisms, and sometimes you are left with only basic morphology information. Still, a quick Google search is always worth doing, because sometimes you're pleasantly rewarded. And sometimes you just discover that it's a contaminant or a nosocomial infection.

Best,
Justine

Marvin_Yeung · January 13, 2020, 4:46am

@jwdebelius Many thanks, Justine! That seems pretty straightforward. May I ask: say you identified a species from amplicon data with good matching confidence and retrieved its whole genome sequence through BLAST, what can you do with that? What tools would you recommend for analyzing its genome? For example, you could count the genes reported to carry out a certain function, say nitrogen fixation, to quantify the N-fixing ability of a given species. I've read a couple of reviews of these contig annotation tools, and there are so many choices that it seems like a whole other domain; it scares me a little lol. Is there anything you think is good to start with and worth checking out? Thanks!

jwdebelius · January 13, 2020, 8:47am

Hi @Marvin_Yeung,

My impression of the NCBI database is that the genomes there are mostly already assembled and annotated. What you do with one depends on what you're curious about and what's present in the genome. So you may just be able to look for nitrogen fixation genes if those are of interest. I think of contig assembly as something you need to do for new sequencing data.
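If the genome comes with an annotation, one way to look for genes of interest is to scan its GFF3 feature table. The sketch below uses only the standard library and assumes GFF3 input where gene features carry a `gene=` attribute (NCBI-style GFF3 files typically include one, but check yours). The helper names, the `nif` prefix default (which would catch e.g. nifH, nifD and nifK, the core structural nitrogenase genes) and the toy records are all illustrative, not real data. Counting gene names is only a rough proxy: finding nif genes suggests, but does not prove, nitrogen-fixing capacity.

```python
from collections import Counter
from typing import Iterable, Iterator


def gene_names(gff_lines: Iterable[str]) -> Iterator[str]:
    """Yield the 'gene=' attribute of every 'gene' feature in GFF3 lines."""
    for line in gff_lines:
        # Skip blank lines, directives (##) and comments (#).
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # GFF3 has 9 tab-separated columns; column 3 is the feature type.
        # Only 'gene' rows are used, so CDS/mRNA rows are not double counted.
        if len(cols) != 9 or cols[2] != "gene":
            continue
        for attr in cols[8].split(";"):
            key, sep, value = attr.partition("=")
            if sep and key == "gene":
                yield value


def count_prefixed_genes(gff_lines: Iterable[str], prefix: str = "nif") -> Counter:
    """Count gene names starting with `prefix` (case-insensitive)."""
    return Counter(
        name for name in gene_names(gff_lines)
        if name.lower().startswith(prefix.lower())
    )


# Toy, made-up records for illustration only.
toy_gff = [
    "##gff-version 3",
    "chr1\tRefSeq\tgene\t100\t900\t.\t+\t.\tID=gene-A;gene=nifH",
    "chr1\tRefSeq\tgene\t950\t2400\t.\t+\t.\tID=gene-B;gene=nifD",
    "chr1\tRefSeq\tCDS\t100\t900\t.\t+\t0\tID=cds-A;gene=nifH",
    "chr1\tRefSeq\tgene\t3000\t3600\t.\t-\t.\tID=gene-C;gene=recA",
]
print(sorted(count_prefixed_genes(toy_gff).items()))  # [('nifD', 1), ('nifH', 1)]
```

To run it on a real annotation, open the file (the name `my_genome.gff` would be a hypothetical example) and pass the file handle in place of `toy_gff`; iterating a text file yields lines just like the toy list.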

Best,
Justine