I was wondering if there is a way to use qiime taxa collapse and output the number of features within each taxon. For example, collapse my data by phylum and show how many features fall into each phylum per sample. Currently I believe the value given is read counts.
I'm not sure if this is immediately available through native plugin functionality, but if you wanted, you could use an interactive session and the Artifact API to figure this out with the help of something like pandas.
import qiime2
import pandas as pd
# Create helper function to split the taxonomy classification strings
def get_level(taxon, level=5):
    taxa = taxon.split(';')
    # Pad to 7 levels so ranks missing from the classification return ''
    taxa.extend([''] * (7 - len(taxa)))
    return taxa[level].split('__')[-1]
# load and transform the feature metadata artifact into a series
features = qiime2.Artifact.load('path/to/my/feature/classification.qza')
features = features.view(pd.Series)
# Run the get_level function through the series with a default of 5 for genus
# Alternatively you can run:
# features.apply(lambda item: get_level(item, level=#))
# with 0-6 being Kingdom through Species to get other levels
features = features.apply(get_level)
counts = features.value_counts()
After that you would have a series with the index being the genus, and the values being the number of unique features classified as that genus.
@jakereps I appreciate the suggestion. I'm just surprised there is no way to complete the same task through the current plugins in the Q2CLI. I think it could be a very useful addition.
I think an easy way to implement this in the CLI would be to have collapse accept a FeatureTable[PresenceAbsence]. The collapsed table would then contain the number of unique ASVs/OTUs that belong to each taxon in each sample. Is that what you are looking for, @Stream_biofilm?
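Until something like that exists in the plugin, the idea can be sketched in pandas. This is only a rough illustration under stated assumptions: the table, feature IDs, and taxonomy strings below are made-up toy data, `phylum_of` is a hypothetical helper, and this is not how q2-taxa itself is implemented.

```python
import pandas as pd

# Toy feature table (assumed data): rows are samples, columns are features,
# values are read counts.
table = pd.DataFrame(
    {"asv1": [10, 0], "asv2": [5, 3], "asv3": [0, 7], "asv4": [2, 0]},
    index=["sampleA", "sampleB"],
)

# Toy taxonomy strings (assumed data), indexed by feature ID.
taxonomy = pd.Series({
    "asv1": "k__Bacteria; p__Firmicutes",
    "asv2": "k__Bacteria; p__Firmicutes",
    "asv3": "k__Bacteria; p__Proteobacteria",
    "asv4": "k__Bacteria; p__Proteobacteria",
})

def phylum_of(taxon):
    """Hypothetical helper: return the phylum name, or '' if unclassified."""
    levels = [t.strip() for t in taxon.split(";")]
    return levels[1].split("__")[-1] if len(levels) > 1 else ""

# Convert read counts to presence/absence (1 if the feature was observed).
presence = (table > 0).astype(int)

# Map each feature to its phylum, then sum presence values within each phylum.
phyla = taxonomy.reindex(presence.columns).map(phylum_of)
features_per_taxon = presence.T.groupby(phyla).sum().T

print(features_per_taxon)
```

Each cell of `features_per_taxon` is the number of distinct features from that phylum observed in that sample, rather than the number of reads.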
@Nicholas_Bokulich Yes, I think that would be extremely useful to allow researchers to see how many ASVs/OTUs are present in each taxon. It can then also be used to calculate relative abundance of a particular taxon based on ASV/OTUs instead of reads.
@ebolyen Would the collapsed table still be able to show this ASV/OTU count per taxon per sample?
I am not really following this. Could you please clarify? This transformation would report the number of ASVs/OTUs belonging to each taxon in each sample, so I suppose this could be used to calculate the relative proportion of unique ASVs/OTUs from each clade, if that's what you are going for. That seems like a somewhat contrived measurement to me, but yes, that should be doable with the transformation discussed above.
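For what it's worth, once you have a per-sample table of feature counts per taxon, the relative proportion is just each row divided by its total. A minimal sketch, assuming a toy table with made-up numbers (the name `features_per_taxon` is only for illustration):

```python
import pandas as pd

# Toy table (assumed data): number of unique ASVs/OTUs per phylum per sample.
features_per_taxon = pd.DataFrame(
    {"Firmicutes": [2, 1], "Proteobacteria": [1, 1], "Bacteroidetes": [1, 2]},
    index=["sampleA", "sampleB"],
)

# Divide each row by its row total so every sample sums to 1.
proportions = features_per_taxon.div(features_per_taxon.sum(axis=1), axis=0)

print(proportions)
```

Here the denominator is the number of unique features observed in the sample, not the sequencing depth, so the values are not comparable to read-based relative abundances.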
No, the collapsed table looks like an overall summary of all samples (e.g., the unique ASVs/OTUs belonging to each taxon across all samples).
Sounds like you are going for the former (taxon-by-feature), which I think is what my suggestion would achieve, once implemented.
I have added this issue to track progress on this. We will post back here whenever such changes make it into a release. Thanks!