Is there a way to introspect on a qiime2.Artifact to find out the available classes for .view()?

I struggle every time I deal with a new output type to figure out how to pull it into Python if it's not a DataFrame. Unfortunately, I can't figure out how to get the class of an Artifact object: the keys of PluginManager.transformers are classes, and Artifact.format, although its str is the name of the backing class, itself seems to be a semantic type class.

I eventually just printed out a list directly from PluginManager:

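from qiime2.sdk import PluginManager

# outer keys are source formats; inner keys are the types they can be transformed to
for frm, targets in PluginManager().transformers.items():
    for to in targets:
        print(f'{frm.__module__}.{frm.__name__} -> {to.__module__}.{to.__name__}')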

and then looked back at the list to see what options there are for a given Artifact.format.

I’d really like a cleaner way to do this, preferably on a loaded Artifact. Anyone know of a way?

Well, actually, what I'd really like is for .view() to default to the backing object (e.g. biom, skbio.stats.distance.DistanceMatrix, etc.), but that's a feature request, not a question.

Hi @rrichter - you can use an Artifact's format property to look up the available transformers:

from qiime2 import Artifact, sdk


pm = sdk.PluginManager()


def transformable_to_view_types(artifact):
    from_format = artifact.format
    # a single-file directory format wraps one file, so look up that file's format
    if issubclass(from_format, sdk.plugin_manager.SingleFileDirectoryFormatBase):
        from_format = artifact.format.file.format
    return set(pm.transformers[from_format].keys())

# load our own data to check the function above out
table = Artifact.load('table.qza')
print(transformable_to_view_types(table))
# displays:
# {<class 'q2_types.feature_data._format.TSVTaxonomyFormat'>,
#  <class 'pandas.core.frame.DataFrame'>,
#  <class 'qiime2.metadata.metadata.Metadata'>,
#  <class 'biom.table.Table'>}

# neat, looks like we can transform to a pandas dataframe, let's confirm:
import pandas as pd
print(table.view(pd.DataFrame))
...

The Artifact.format is the default backing object's type. It is always a file-based format (because an Artifact's data must be serializable to disk). How you access that data depends on the Artifact format - a SingleFileDirectoryFormat will always have exactly one file, but a DirectoryFormat can have many - some DirectoryFormats have a fixed schema, while others use a regex pattern for a dynamic number of backing files.

Check out the dev docs for more details:

https://dev.qiime2.org/latest/storing-data/formats/
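Because the function above returns a set, the order in which the view types print can change from run to run. Here is a minimal sketch of a helper that turns such a set into a sorted list of fully qualified names for a reproducible listing. The name qualified_names is hypothetical, and the standard-library classes are only stand-ins so the snippet runs without QIIME 2 installed:

```python
from collections import OrderedDict
from fractions import Fraction


def qualified_names(view_types):
    """Return sorted 'module.ClassName' strings for an iterable of classes."""
    return sorted(f"{t.__module__}.{t.__qualname__}" for t in view_types)


# Stand-in classes (assumption: any set of classes works the same way).
# With QIIME 2 you would pass transformable_to_view_types(table) instead.
for name in qualified_names({OrderedDict, Fraction}):
    print(name)
```

The sorted names are easier to scan than the raw set repr when an artifact has many registered transformers.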

Thank you!
It looks like I can even use a variant of this to grab the first non-q2_types object and return that transformed version for just about any SingleFileDirectoryFormat type.

Something like

# reuses `sdk` and `pm` from the snippet above
def default_view(artifact):
    from_format = artifact.format
    if not issubclass(from_format, sdk.plugin_manager.SingleFileDirectoryFormatBase):
        raise NotImplementedError
    from_format = artifact.format.file.format
    for fmt in pm.transformers[from_format]:
        if 'q2_types' not in str(fmt):
            return artifact.view(fmt)

Other than sometimes not getting the DataFrame even when it's available, can you see a clear problem with this?
-Alex

This is unstable - the insertion order of the transformers isn't guaranteed, so you might run into different "defaults" across plugin manager instantiations.

Please review the docs I shared above - even if this dict was stable you still aren't guaranteed to get a non-file-based format using this approach.
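One way to sidestep the ordering problem is to make the choice explicit: pass an ordered list of the view types you prefer and take the first one that is actually available. The sketch below is only an illustration, not part of the QIIME 2 API; pick_view_type is a hypothetical name, and built-in types stand in for real view types so it runs on its own:

```python
def pick_view_type(available, preferred):
    """Return the first type in `preferred` that is also in `available`."""
    for view_type in preferred:
        if view_type in available:
            return view_type
    raise LookupError(f"none of {preferred!r} is available")


# Stand-in types (assumption); the result depends only on the order of
# `preferred`, never on the iteration order of `available`.
available = {dict, list, str}
print(pick_view_type(available, [tuple, list, dict]))

# With QIIME 2 this might look like (untested, assuming pandas and biom
# are installed and transformable_to_view_types is defined as above):
#   view_type = pick_view_type(transformable_to_view_types(artifact),
#                              [pd.DataFrame, biom.Table])
#   data = artifact.view(view_type)
```

This still does not guarantee that every artifact has a non-file-based view; it just fails loudly with a LookupError when none of the preferred types is registered.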