q2-sapienns with HUMAnN 3.9 and MetaPhlAn 4

Hi Greg,
Thank you for this super useful tool! :dizzy:

I tried it with HUMAnN 3.9 and MetaPhlAn 4, and everything works smoothly.

I have some questions concerning the plugin.
Concerning the HUMAnN output: is it intentional that the plugin works only with the non-normalized RPK values? The HUMAnN tutorial recommends normalizing:

“It is important to normalize HUMAnN's default RPK values prior to performing statistical analyses. Sum-normalization to relative abundance or "copies per million" (CPM) units are common approaches for this.” (HUMAnN 3 tutorial)

It would be useful to have the option to import these normalized values as well. Please tell me if I've made a logical mistake here!
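For readers unfamiliar with the sum-normalization mentioned in the tutorial quote, here is a minimal sketch of the arithmetic for a single sample. The function name `sum_normalize` and the example gene names are hypothetical. HUMAnN ships its own `humann_renorm_table` utility for this, which also handles stratified tables, so this only illustrates the idea and is not a replacement for it.

```python
def sum_normalize(rpk_by_feature, total=1.0):
    """Scale one sample's RPK values so that they sum to `total`.

    total=1.0 gives relative abundance; total=1_000_000 gives CPM.
    """
    rpk_sum = sum(rpk_by_feature.values())
    if rpk_sum == 0:
        raise ValueError("cannot normalize a sample whose RPK values sum to zero")
    return {feature: value / rpk_sum * total
            for feature, value in rpk_by_feature.items()}


# Hypothetical RPK values for one sample.
sample_rpk = {"geneA": 30.0, "geneB": 10.0, "geneC": 60.0}

relab = sum_normalize(sample_rpk)             # relative abundance, sums to 1
cpm = sum_normalize(sample_rpk, 1_000_000)    # copies per million, sums to 1e6
```

Both results carry the same information about proportions; only the scale differs.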

Concerning the MetaPhlAn output:
I’m struggling to work with the resulting FeatureTable[RelativeFrequency] artifact.
I read in some posts (e.g. here) about multiplying by a pseudo-count. Do you know of a way to do this in QIIME 2, or are there any updates on this? I would like to use qiime diversity core-metrics, that does not accept a RelativeFrequency artefact.

I’m grateful for any opinion and advice!
Best,
Lydia

Hi @LPrunus,
Glad it's been helpful for you!

The RPK values are what is provided by the qiime sapienns humann_genefamily command. Different values are provided for the pathway table (I'm seeing that here, from the documentation link you shared). Are you running into this when working with pathway data? If so, I think that's what humann is providing for these data rather than RPK.

I would like to use qiime diversity core-metrics, that does not accept a RelativeFrequency artefact.

See the qiime sapienns frequency command. That will multiply all relative frequencies by a specified value (--p-target-freq), resulting in a FeatureTable[Frequency] that can be provided to qiime diversity core-metrics.
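The scaling described here can be sketched in a few lines. This is only an illustration of the idea, not the plugin's implementation: the function name `to_frequency` and the taxon names are hypothetical, the input is assumed to hold fractions that sum to 1 per sample, and rounding to whole numbers is an assumption about how the result becomes count-like.

```python
def to_frequency(relative_freqs, target_freq):
    """Multiply each relative frequency by `target_freq`.

    Rounding to integers is an assumption made for this sketch; the
    actual command may treat fractional values differently.
    """
    return {feature: round(value * target_freq)
            for feature, value in relative_freqs.items()}


# Hypothetical relative frequencies for one sample (fractions summing to 1).
sample_relfreq = {"s__A": 0.5, "s__B": 0.25, "s__C": 0.25}

sample_freq = to_frequency(sample_relfreq, 10_000)
```

A larger target value keeps more of the low-abundance features above zero after rounding, so the choice of target affects which rare features survive.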

Hope this helps!
Greg

Hi Greg,
Thank you for the answer!

Actually, the value issue already starts with the input type HumannGeneFamilyTable in qiime tools import. There, the following requirement is enforced:

“Expected sample ids (e.g., XXX_Abundance-RELAB) to end with unit descriptor RPKs”.

That might be an issue for users (or just for me, I don’t know...) who want to use the normalization offered in HUMAnN, after which the sample IDs can end with:

  • _Abundance-RPKs (not normalized)
  • _Abundance-RELAB
  • _Abundance-CPM

The same applies to the input type for the pathways, where the unit descriptor Abundance is required.
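To make the suffix issue concrete, here is a minimal sketch of how a sample ID could be split into its base name and unit descriptor. Everything in it is hypothetical: `split_unit` and `KNOWN_UNITS` are illustrative names, not part of q2-sapienns, and the spelling `RPKs` follows the error message quoted above.

```python
# Unit descriptors assumed for this sketch (not taken from the plugin source).
KNOWN_UNITS = ("RPKs", "RELAB", "CPM")
MARKER = "_Abundance-"


def split_unit(sample_id):
    """Split e.g. 'XXX_Abundance-CPM' into ('XXX', 'CPM')."""
    base, marker, unit = sample_id.rpartition(MARKER)
    if not marker or unit not in KNOWN_UNITS:
        raise ValueError(
            f"Expected sample id {sample_id!r} to end with one of: "
            + ", ".join(MARKER + u for u in KNOWN_UNITS)
        )
    return base, unit


base, unit = split_unit("XXX_Abundance-CPM")
```

An importer built along these lines could use the returned unit to decide whether the table holds non-normalized or normalized values, rather than rejecting everything that is not RPK.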

It might be that you did this intentionally, so that only non-normalized data can be imported and the resulting FeatureTable[Frequency] artefact is not confused with a relative frequency table.
But maybe it would also be useful for others to have an import option for already normalized HUMAnN data.
Thank you for the qiime sapienns frequency command, that’s exactly what I was looking for!

Thank you for the help!
Best,
Lydia

Hi @LPrunus, Would you mind sending me the file you're working with, so I can have something to experiment with for supporting differently normalized input? Also, the command(s) used to generate it would be helpful. You can send all of this to me through a private message on the forum, so it's not publicly accessible (in case that's important for this data).

Glad the frequency command was what you were looking for!

Hi @LPrunus,
Thanks for sharing those files - your hunch is correct, I specifically required the RPK values as input, not RELAB (relative abundance). Supporting RELAB does make sense so I'm adding this as a feature request on q2-sapienns here. We'll try to get to this ASAP.

Hi,
I have used q2-sapienns too, but with the example code that was provided, so I guess I used non-normalized data as well (I'm new to microbiome analysis).
Is it possible to try this out with CPM or RELAB input formats now?