Number of Features per Taxon

Stream_biofilm · January 18, 2018, 7:23pm

Hello everyone,

I was wondering if there is a way to use qiime taxa collapse and output the number of features within each taxon. For example, collapse my data by phylum and show how many features fall into each phylum per sample. Currently I believe the value given is read counts.

Cheers,
Danny

jakereps · January 18, 2018, 11:30pm

I'm not sure if this is immediately available through native plugin functionality, but if you wanted, you could use an interactive session and the Artifact API to figure this out with the help of something like pandas.

import qiime2
import pandas as pd

# Helper that pulls a single rank out of a ';'-separated taxonomy string
def get_level(taxon, level=5):
    taxa = taxon.split(';')
    # Pad to 7 ranks so that levels missing from the string come back as ''
    taxa.extend([''] * (7 - len(taxa)))
    return taxa[level].split('__')[-1]

# load and transform the feature metadata artifact into a series
features = qiime2.Artifact.load('path/to/my/feature/classification.qza')
features = features.view(pd.Series)

# Run the get_level function over the series, using the default level of 5 (genus)
# Alternatively you can run:
#     features.apply(lambda item: get_level(item, level=N))
# where N is 0-6 for Kingdom through Species, to get other levels
features = features.apply(get_level)

counts = features.value_counts()

After that you would have a series with the index being the genus and the values being the number of unique features classified as that genus. Note that this counts over the whole classification, not per sample.

Stream_biofilm · January 19, 2018, 12:54am

@jakereps I appreciate the suggestion. I'm just surprised there is no way to complete the same task through the current plugins in the Q2CLI. I think it could be a very useful addition.

Danny

ebolyen · January 19, 2018, 7:04pm

That does sound handy!

Which do you think would be more useful: another feature table of taxon-by-feature, like this:

               b680bc5baa75aad30af95e910bd99d1d 28643c7006d94784f9157251d9cdf0da
k__foo;p__bar                                50                               70
k__baz;p__qux                                 0                               20

Where the respective feature/read counts still exist, or something more like a "collapse summary":

k__foo;p__bar    2
k__baz;p__qux    1

Which I believe is what @jakereps's code will give you.

You could theoretically do some more downstream stuff with the feature-table, but I'm not certain how useful that is in practice.

Nicholas_Bokulich · January 19, 2018, 7:24pm

I think an easy way to implement this in the CLI would be to have collapse accept a FeatureTable[PresenceAbsence]. Then the collapsed table would contain the number of unique ASVs/OTUs that belong to each taxon. Is that what you are looking for, @Stream_biofilm?
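To make the idea concrete, here is a rough sketch in plain pandas, outside of QIIME 2. It is not how the plugin does or would implement this; the toy table, the feature IDs, the taxonomy strings, and the `truncate` helper are all made up for illustration. The table is first reduced to presence/absence, and then the features are summed within each phylum-level label, so each cell ends up holding the number of features observed in that sample for that taxon.

```python
import pandas as pd

# Toy feature table (hypothetical data): rows are samples, columns are
# feature IDs, values are read counts.
table = pd.DataFrame(
    {"feat1": [50, 0], "feat2": [70, 20], "feat3": [0, 5]},
    index=["sampleA", "sampleB"],
)

# Hypothetical taxonomy assignment for each feature.
taxonomy = pd.Series(
    {
        "feat1": "k__foo; p__bar",
        "feat2": "k__foo; p__bar",
        "feat3": "k__baz; p__qux",
    }
)


def truncate(taxon, n_levels=2):
    """Keep the first n_levels ranks of a ';'-separated taxonomy string."""
    ranks = [rank.strip() for rank in taxon.split(";")]
    return ";".join(ranks[:n_levels])


# Step 1: read counts -> presence/absence (1 if the feature was observed).
presence = (table > 0).astype(int)

# Step 2: label each feature with its truncated taxonomy and sum within
# each label. Transposing puts features on the rows so groupby can align
# the labels on feature ID.
labels = taxonomy.reindex(presence.columns).map(truncate)
features_per_taxon = presence.T.groupby(labels).sum().T

print(features_per_taxon)
```

The result has one row per sample and one column per taxon. Changing `n_levels` collapses at a different rank.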

Stream_biofilm · January 19, 2018, 8:33pm

@Nicholas_Bokulich Yes, I think that would be extremely useful to allow researchers to see how many ASVs/OTUs are present in each taxon. It can then also be used to calculate relative abundance of a particular taxon based on ASVs/OTUs instead of reads.

@ebolyen Would the collapsed table still be able to show this ASV/OTU count per taxon per sample?

Nicholas_Bokulich · January 22, 2018, 6:33pm

I am not really following this. Could you please clarify? This transformation would report the number of ASVs/OTUs belonging to each taxon in each sample, so I suppose this could be used to calculate the relative proportion of unique ASVs/OTUs from each clade, if that's what you are going for. That seems like a bit of a contrived/odd measurement to me, but yes, that should be doable with the transformation discussed above.
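For illustration only, the relative proportion mentioned here could be computed from a per-sample, per-taxon table of unique feature counts by dividing each row by its total. The numbers and labels below are hypothetical, and this is a pandas sketch rather than anything provided by a QIIME 2 plugin.

```python
import pandas as pd

# Hypothetical counts of unique features per taxon (columns) in each sample (rows).
features_per_taxon = pd.DataFrame(
    {"k__foo;p__bar": [2, 1], "k__baz;p__qux": [0, 1]},
    index=["sampleA", "sampleB"],
)

# Divide every row by that sample's total number of unique features.
totals = features_per_taxon.sum(axis=1)
proportions = features_per_taxon.div(totals, axis=0)

print(proportions)
```

A sample with no observed features would have a total of zero and produce NaN values, so such samples would need to be dropped or handled separately.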

No, the collapsed table looks like an overall summary of all samples (e.g., the unique ASVs/OTUs belonging to each taxon across all samples).

Sounds like you are going for the former (taxon-by-feature), which I think is what my suggestion would achieve, once implemented.

I have added this issue to track progress on this. We will post back here whenever such changes make it into a release. Thanks!