Has anyone used ‘Tax4Fun’ to predict functional features from QIIME 2 artifacts?

Hey @Phuc_Hu_nh_Van,
Can you be a bit more specific about what kind of input you are looking for? Are you looking for technical help going from QIIME 2 artifacts to Tax4Fun, or general thoughts like @lukasbeule's answer below?

I haven't used Tax4Fun personally but I have used BugBase and have documented how to go from QIIME 2 data to those predictions here.

I'd also like to add my 2 cents on the great points that @lukasbeule brought up.
First and foremost, as already pointed out, any tool that uses short amplicon data to predict functions has some very important limitations and biases that the user should be aware of. The PICRUSt2 website has a great write-up of these key limitations that I strongly recommend reading.

I think most would agree that these tools should be limited to generating hypotheses in an exploratory manner, which are then validated with some other method. For example, some years ago we hypothesized (using PICRUSt) that physically fit humans had a microbiome enriched with genes associated with fatty acid biosynthesis. That led us to look at the short-chain fatty acid (SCFA) profile of our samples using gas chromatography (GC), and we found that fit individuals did indeed have different SCFA profiles (i.e., higher butyrate) [ref]. I think that was a good use of that tool.

Since these approaches are reference-dependent, samples from well-characterized environments like the human or mouse gut can yield much more accurate results than something like soil. To be honest, I wouldn't even bother running these tools on soil samples. So at the end of the day it depends on your use case. If used properly they can be quite powerful, but they can just as easily be misused and add more noise to the field.

And finally, to touch on a few specific points that were mentioned:

Very true, though you can improve your classification to some extent by using weighted classifiers, for example with q2-clawback.

This is true for some but not all clades, and of course it also depends on the function. Recall that some clades are named after a general shared function, so there are lots of examples where all members of a genus share a function.

But of course this is again just genes and represents "potential", not activity, so it comes with its own set of limitations.

You're right that we don't have absolute abundance information (in most cases), but there is still lots of useful information to be gained from looking at ratios of genes/functions. In fact, a lot of biological signals are actually more informative in the context of ratios; think of how LDL:HDL or omega fatty acid ratios are often reported in blood work.
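To make the ratio point a bit more concrete, here is a minimal Python sketch. It is only an illustration: the function names and numbers are made up, and `log_ratio` is a hypothetical helper, not part of QIIME 2, PICRUSt2 or Tax4Fun. It shows that the log-ratio of two predicted functions within a sample stays the same whether you use raw predicted counts, counts at a different sequencing depth, or relative abundances.

```python
import math


def log_ratio(abundances, numerator, denominator):
    """Return the log2 ratio of two features within a single sample."""
    num = abundances[numerator]
    den = abundances[denominator]
    if num <= 0 or den <= 0:
        # Zeros need separate handling (e.g. a pseudocount); not covered here.
        raise ValueError("log-ratio is undefined for zero or negative abundances")
    return math.log2(num / den)


# Hypothetical predicted function abundances for one sample (made-up numbers).
sample = {"function_A": 120.0, "function_B": 480.0, "everything_else": 2400.0}

# The same community sequenced ten times deeper: every count scales by 10.
deeper = {name: value * 10 for name, value in sample.items()}

# The same sample expressed as relative abundances (sums to 1).
total = sum(sample.values())
relative = {name: value / total for name, value in sample.items()}

ratio_counts = log_ratio(sample, "function_A", "function_B")
ratio_deeper = log_ratio(deeper, "function_A", "function_B")
ratio_relative = log_ratio(relative, "function_A", "function_B")

print(ratio_counts, ratio_deeper, ratio_relative)
```

All three values agree (up to floating-point rounding), which is why a ratio of two functions can still be compared across samples even though the total microbial load is unknown. Real tables contain zeros, which this sketch deliberately refuses to handle rather than guessing a pseudocount.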

tl;dr: yes, they have lots of limitations, but they can also be very useful in the right circumstances.
