Importing merged MetaPhlAn abundance table with relative frequencies

pippo_pippo · January 25, 2021, 3:58pm

Hello,

I would like to import a merged feature table with relative abundances (relative frequencies) obtained from MetaPhlAn into QIIME to use the diversity plugin. An example can be seen here. The table is in tab-separated values (TSV) format: merged_cases_profiled_reformatted.txt (78.8 KB)

If QIIME cannot handle the pipe character that divides taxonomic ranks, I can also reduce the table to a common taxonomic level, e.g. species.

I could not find a matching import option in your Import Tutorial. Can QIIME handle this data structure (i.e. features in rows, different samples in columns)?

Best
Philipp

Edit: I tried this command but received an error:

qiime tools import \
  --type 'FeatureTable[RelativeFrequency]' \
  --input-path /input/merged_cases_species.txt \
  --output-path /home/plicht/QIIME/123.qza \
  --input-format TSVTaxonomyFormat

Traceback (most recent call last):
File "/home/plicht/anaconda3/envs/qiime2-2020.11/lib/python3.6/site-packages/q2cli/builtin/tools.py", line 158, in import_data
view_type=input_format)
File "/home/plicht/anaconda3/envs/qiime2-2020.11/lib/python3.6/site-packages/qiime2/sdk/result.py", line 241, in import_data
validate_level='max')
File "/home/plicht/anaconda3/envs/qiime2-2020.11/lib/python3.6/site-packages/qiime2/sdk/result.py", line 266, in _from_view
recorder=recorder)
File "/home/plicht/anaconda3/envs/qiime2-2020.11/lib/python3.6/site-packages/qiime2/core/transform.py", line 59, in make_transformation
(self._view_type, other._view_type))
Exception: No transformation from <class 'q2_types.feature_data._format.TSVTaxonomyFormat'> to <class 'qiime2.plugin.model.directory_format.BIOMV210DirFmt'>

An unexpected error has occurred:

No transformation from <class 'q2_types.feature_data._format.TSVTaxonomyFormat'> to <class 'qiime2.plugin.model.directory_format.BIOMV210DirFmt'>

See above for debug info.

jwdebelius · January 25, 2021, 5:41pm

Hi @pippo_pippo,

The error tells you that TSVTaxonomyFormat is the wrong format for a feature table. That format maps between feature ID and feature metadata (i.e. the taxonomy string). Based on what I'm seeing in the import tutorial, you will need to import your data in biom format.

Best,
Justine

pippo_pippo · January 25, 2021, 6:00pm

Hi @jwdebelius

Thanks for your suggestion. Using biom format, I can only import sample-wise. Can I merge the separately imported files afterwards?

jwdebelius · January 25, 2021, 6:20pm

Hi @pippo_pippo,

In QIIME 2, the taxonomy is a separate artifact with its own semantic type, and you can work with it with or without the sample metadata. It gets incorporated where needed. You can learn more about the semantic type philosophy in QIIME 2 here, but essentially, the data is linked by an ID and kept in two files. In your case, you need to pull the taxonomy out into a two-column .tsv file and import that as a taxonomy semantic type.

You may also want to consider pre-filtering your table so you have a single level before you import. I might also replace the pipe with a semicolon.
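
A minimal sketch of that pre-filtering step in Python, using only the standard library. It assumes the merged table has one header row that does not start with "#", lineages joined by pipes in the first column, and species-level rows whose last rank starts with "s__". The function name `split_species_table`, the choice of the species label as feature ID, and the example data are all hypothetical.

```python
import csv
import io


def split_species_table(merged_tsv):
    """Split a merged table into species-level abundance rows and taxonomy rows."""
    reader = csv.reader(io.StringIO(merged_tsv), delimiter="\t")
    table_rows, taxonomy_rows = [], []
    header = None
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if header is None:
            header = row
            # "#OTU ID" is the header used by the classic tab-separated biom layout.
            table_rows.append(["#OTU ID"] + header[1:])
            # "Feature ID" / "Taxon" is the header of a two-column taxonomy TSV.
            taxonomy_rows.append(["Feature ID", "Taxon"])
            continue
        lineage = row[0].split("|")
        if not lineage[-1].startswith("s__"):
            continue  # keep species-level rows only
        feature_id = lineage[-1]  # assumption: species labels are unique
        table_rows.append([feature_id] + row[1:])
        # Replace the pipe between ranks with a semicolon.
        taxonomy_rows.append([feature_id, "; ".join(lineage)])
    return table_rows, taxonomy_rows


# Hypothetical example data, not taken from the attached file.
example = (
    "clade_name\tsampleA\tsampleB\n"
    "k__Bacteria\t1.0\t1.0\n"
    "k__Bacteria|p__Firmicutes\t1.0\t1.0\n"
    "k__Bacteria|p__Firmicutes|s__Species_one\t0.6\t0.3\n"
    "k__Bacteria|p__Firmicutes|s__Species_two\t0.4\t0.7\n"
)
table, taxonomy = split_species_table(example)
for row in table:
    print("\t".join(row))
```

Written out as tab-separated files, the first list is the table you would convert to biom before importing, and the second is the two-column taxonomy file.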

Best,
Justine

pippo_pippo · January 26, 2021, 10:58am

Many thanks for your help @jwdebelius!
So there is no easy procedure for importing the original TSV file into QIIME? When importing biom, I would have to convert each biom file into TSV, collapse it to a specified level, convert it back into biom, and then import it.

Best,
Philipp

jwdebelius · January 26, 2021, 3:59pm

Hi @pippo_pippo,

I'm not sure what you mean by this. If you have all your samples in a single TSV, it can be converted to biom and then imported into QIIME 2.

Best,
Justine

pippo_pippo · January 26, 2021, 5:55pm

Hi @jwdebelius,

I got it to import successfully.

I am relatively new to the microbiome field and would like to calculate alpha diversity on relative abundance data (the sum of all taxa at a given level [e.g. species] within a sample is 1). Do you know an appropriate metric for alpha diversity? As far as I understand, most approaches require full count data instead of relative abundances. However, Simpson and the Gini index, for example, should work fine. But when I try to calculate that with qiime diversity alpha, it says it requires a frequency table instead of a relative frequency table:

qiime diversity alpha --i-table merged_cases_species.qza --p-metric 'gini_index' --o-alpha-diversity gini.qza --verbose

Traceback (most recent call last):
File "/home/plicht/anaconda3/envs/qiime2-2020.11/lib/python3.6/site-packages/q2cli/commands.py", line 329, in call
results = action(**arguments)
File "", line 2, in alpha
File "/home/plicht/anaconda3/envs/qiime2-2020.11/lib/python3.6/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
output_types, provenance)
File "/home/plicht/anaconda3/envs/qiime2-2020.11/lib/python3.6/site-packages/qiime2/sdk/action.py", line 484, in callable_executor
outputs = self._callable(scope.ctx, **view_args)
File "/home/plicht/anaconda3/envs/qiime2-2020.11/lib/python3.6/site-packages/q2_diversity/_alpha/_pipeline.py", line 28, in alpha
vector, = action(table=table, metric=metric)
File "", line 2, in alpha_passthrough
File "/home/plicht/anaconda3/envs/qiime2-2020.11/lib/python3.6/site-packages/qiime2/sdk/action.py", line 208, in bound_callable
self.signature.check_types(**user_input)
File "/home/plicht/anaconda3/envs/qiime2-2020.11/lib/python3.6/site-packages/qiime2/core/type/signature.py", line 342, in check_types
name, spec.qiime_type, parameter.type))

TypeError: Parameter 'table' requires an argument of type FeatureTable[Frequency]. An argument of type FeatureTable[RelativeFrequency] was passed.

Plugin error from diversity:

Parameter 'table' requires an argument of type FeatureTable[Frequency]. An argument of type FeatureTable[RelativeFrequency] was passed.

See above for debug info.
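
For reference, the Simpson index mentioned in this post depends only on proportions, so it can be computed directly from relative abundances outside of QIIME. The sketch below uses the common definition 1 - sum(p_i^2) and is not QIIME's own implementation; the function name `simpson_index` and the example values are hypothetical.

```python
def simpson_index(rel_abundances):
    """Simpson's index of diversity, 1 - sum(p_i ** 2), for a single sample."""
    total = sum(rel_abundances)
    if total <= 0:
        raise ValueError("sample has no non-zero abundances")
    # Renormalise so the proportions sum to 1 (also covers input given in percent).
    proportions = [value / total for value in rel_abundances]
    return 1.0 - sum(p * p for p in proportions)


# Hypothetical samples: two evenly split taxa, then four evenly split taxa.
print(simpson_index([0.5, 0.5]))
print(simpson_index([0.25, 0.25, 0.25, 0.25]))
```

A sample dominated by a single taxon gives a value near 0, and the value approaches 1 as more taxa share the abundance evenly.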

jwdebelius · February 2, 2021, 4:45pm

Hi @pippo_pippo,

Sorry for the delay in answering. I am slowly losing my mind.

I think this is a QIIME-specific issue. The diversity calculations require a count table because of assumptions around other metrics. I'm not sure if you can get a frequency table. If not, you could spoof one (multiply everything by a constant, say 100,000 or something) and then import the data. It's not a perfect solution, but it might get you there.

Unfortunately, I don't work much with MetaPhlAn, so I don't know if you can do an abundance approximation.

Best,
Justine

pippo_pippo · February 3, 2021, 11:03am

Hi @jwdebelius

No worries, I am very pleased with your help and the forum in general, as I am new to metagenomics and soaking up information.

Because of how MetaPhlAn works (mapping shotgun reads against a precomputed database of clade-specific marker genes), it is not possible to get full read counts: I) it maps only against a few markers rather than whole genomes, and II) each clade has a varying number of marker genes. So multiplying a clade's relative abundance by the total number of reads in a sample's library will give a biased output. However, the MetaPhlAn author also proposed your idea of multiplying by a constant and then rounding to the closest integer to get "pseudo-counts".
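
The pseudo-count idea can be sketched in a few lines of Python. This assumes the relative abundances are fractions that sum to 1 per sample and uses the constant 100,000 suggested earlier in the thread; the function name `to_pseudo_counts` and the example values are hypothetical.

```python
def to_pseudo_counts(rel_abundances, scale=100_000):
    """Scale relative abundances to integer pseudo-counts for one sample."""
    # int(x + 0.5) rounds non-negative values to the closest integer and
    # avoids the round-half-to-even behaviour of Python's built-in round().
    return [int(value * scale + 0.5) for value in rel_abundances]


# Hypothetical sample with four taxa whose fractions sum to 1.
sample = [0.5, 0.3, 0.1999, 0.0001]
pseudo_counts = to_pseudo_counts(sample)
print(pseudo_counts)
```

Note that any taxon with a relative abundance below 1 / (2 * scale) is rounded down to zero, so the choice of constant sets the smallest abundance that survives.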

I am just searching for alternative analysis approaches that work with relative abundance data as this would fit the original idea of MetaPhlAn better.

Best
Philipp

jwdebelius · February 3, 2021, 5:27pm

Hi @pippo_pippo,

I guess multiplying by a constant to get pseudo-counts makes more sense then, and that will let you work with a "frequency" semantic type.

Best,
Justine

SoilRotifer · February 3, 2021, 10:42pm

Hi @pippo_pippo,

I'd highly recommend reading this paper by Baker et al. 2021. They provide some insight into how they processed and imported MetaPhlAn and HUMAnN data into QIIME 2. You can search more in the biobakery forum too.

-Mike
