Did anyone use ‘Tax4Fun’ to predict functional features based on QIIME 2 artifacts?

Did anyone use ‘Tax4Fun’ to predict functional features based on QIIME 2 artifacts? I am wondering which functional analysis tool to choose: q2-picrust or Tax4Fun. Can anyone with experience give advice? I've been messing around with q2-picrust's post-analysis processing steps (KEGG pathways, ...), which is why I'm inclined to use Tax4Fun.

Hi all,
I want to add something which is also related to this question. This has been on my mind for years now:
I have a general concern regarding all tools that aim to predict functional features of ASVs/OTUs. We sequence taxonomically informative subregions of ITS or 16S rRNA genes and we know that we rarely reach species level (I am only talking about Illumina-based sequencing of short amplicons). We are often stuck at genus or family level. Given the functional differences that we can find at the isolate (!) level, how can this approach be valid at all? I understand that we want to know more than just the identities of our ASVs/OTUs, but can we really justify a functional annotation based on amplicon sequencing data?
I am a soil microbiologist and whenever we go to the field, we either measure the function of the microbes (e.g. greenhouse gas emissions, enzymatic activities, ...) or I quantify functional genes (e.g. N-cycling genes) using real-time PCR.
Finally, I want to add that the qualitative nature of amplicon sequencing data may make the functional annotation meaningless anyway, because we cannot truly know whether a community has a lower, unchanged or higher functional potential, right?
I fully understand that these tools are very exciting and I will be super happy if someone has a totally different opinion or corrects me :slightly_smiling_face:


Hey @Phuc_Hu_nh_Van,
Can you be a bit more specific with regard to what kind of input you are looking for? Are you looking for technical help going from QIIME 2 artifacts to Tax4Fun, or general thoughts like @lukasbeule's answer below?

I haven't used Tax4Fun personally but I have used BugBase and have documented how to go from QIIME 2 data to those predictions here.

I'd also like to add my 2 cents on the great points that @lukasbeule brought up.
First and foremost, and as already pointed out, any tool that uses short amplicon data to predict functions has some very important limitations and biases that the user should be aware of. The PICRUSt2 website has a great write-up on these key limitations that I strongly recommend reading.

I think most would agree that these tools should be limited to generating hypotheses in an exploratory manner, which are then validated with some other method. For example, some years ago we hypothesized (using PICRUSt) that physically fit humans had a microbiome enriched with genes associated with fatty acid biosynthesis. That led us to look at the SCFA profile of our samples using GC, and we found that fit individuals did indeed have different SCFA profiles (i.e. higher butyrate) [ref]. I think that was a good use of that tool.

Since these approaches are reference-dependent, samples from well-known environments like human or mouse gut can yield much more accurate results than something like soil. To be honest, I wouldn't even bother running these tools on soil samples. So at the end of the day it depends on your use case. If used properly they can be quite powerful, but they can just as easily be misused and add more noise to the field.

And finally, to touch on a few specific points that were mentioned:

Very true, though you can improve your classification to some extent by using weighted classifiers, for example with q2-clawback.

This is true for some but not all clades, and of course it also depends on the function. Recall that some clades are named after a general shared function, so there are lots of examples where all members of a genus share a function.

But of course this is again just genes and represents "potential" and not activity, so it comes with its own set of limitations.

You're right that we don't have absolute abundance information (in most cases), but there is still lots of useful information to be gained from looking at ratios of genes/functions. In fact, a lot of biological signals are actually more informative in the context of ratios; think of how LDL:HDL or omega fatty acids are often reported in blood work.
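To make the ratio point concrete, here is a minimal sketch in plain Python. The function names (`func_A`, `func_B`, `func_C`) and the abundance values are made-up placeholders, not output from PICRUSt2, Tax4Fun, or any other tool. It only illustrates that the log-ratio between two predicted functions stays the same whether you look at raw predicted abundances, relative abundances, or the same community sequenced ten times deeper.

```python
import math


def closure(abundances):
    """Convert abundances to relative abundances that sum to 1."""
    total = sum(abundances.values())
    return {name: value / total for name, value in abundances.items()}


def log_ratio(abundances, numerator, denominator):
    """Natural log of the ratio between two functions within one sample."""
    num = abundances[numerator]
    den = abundances[denominator]
    if num <= 0 or den <= 0:
        raise ValueError("log-ratios need strictly positive abundances")
    return math.log(num / den)


# Hypothetical predicted function abundances for one sample (placeholders).
sample = {"func_A": 120.0, "func_B": 30.0, "func_C": 850.0}

# Same community, but with ten times the sequencing depth.
deeper = {name: value * 10 for name, value in sample.items()}

lr_raw = log_ratio(sample, "func_A", "func_B")
lr_relative = log_ratio(closure(sample), "func_A", "func_B")
lr_deeper = log_ratio(closure(deeper), "func_A", "func_B")

# All three values agree: the ratio does not depend on the unknown total.
print(lr_raw, lr_relative, lr_deeper)
```

The catch is that zeros need special handling (for example a pseudocount), and that the ratio still only describes predicted gene content, not absolute abundance or activity.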

tl;dr: yes, they have lots of limitations, but they can also be very useful in the right circumstances.