Investigating sequence files via Python API

I’m using the Python API to explore my data (and debug my plugin… :confounded:) - how can I access the data in a FeatureData[Sequence] artifact?

For example, I want the equivalent of doing:

import qiime2 as q2
import pandas as pd

table = q2.Artifact.load('path/to/table.qza')
df = table.view(view_type=pd.DataFrame)

… but on a file of sequences (e.g. the rep-seqs.qza output file from running DADA2 on the Atacama desert tutorial data). Is this possible?

More broadly, how do I find out what view_types are available for each semantic type? I’m specifically looking for the Python data types that I can convert each qiime2 data type into, so that I can do quick and easy analyses interactively.


Hey there @cduvallet! Sorry we took so long to get back to you.

The default format for this semantic type has transformers defined for viewing it as a pd.Series or as qiime2.Metadata (source). Any plugin can register additional transformers, though, so if you need to convert to another type, you can define that transformer in your own plugin, even if your plugin didn’t define the format.

Finding out which view types are available is a bit trickier - we haven’t implemented any kind of discoverability or user-facing API for transformations yet. This is complicated by things like directory formats (versus file formats), as well as transformer transitivity. For now, the truly brave and adventurous can poke around at the PluginManager to learn more about registered transformers:

import qiime2.sdk
pm = qiime2.sdk.PluginManager()
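As a rough sketch of what that poking around might look like: the snippet below assumes the PluginManager exposes a `transformers` attribute shaped like a nested mapping `{from_type: {to_type: record}}`. That attribute is an internal detail rather than a documented API, so treat it as an assumption that may differ between QIIME 2 releases. The helper `reachable_views` is a hypothetical name introduced here for illustration, and it only lists direct transformers, not the transitive ones mentioned above.

```python
def reachable_views(transformers, source):
    """List the types that `source` can be transformed into directly.

    `transformers` is assumed to be a nested mapping of the shape
    {from_type: {to_type: record}}.
    """
    # Sort by repr so that classes and other unorderable keys can be listed.
    return sorted(transformers.get(source, {}), key=repr)


try:
    import qiime2.sdk
except ImportError:
    pm = None  # QIIME 2 is not installed, so there is nothing to inspect
else:
    pm = qiime2.sdk.PluginManager()

if pm is not None:
    # Assumption: `pm.transformers` is the nested mapping described above.
    # Fall back to an empty mapping if this release does not expose it.
    registered = getattr(pm, 'transformers', {})
    for source in registered:
        print(source, '->', reachable_views(registered, source))
```

The helper is kept separate from the QIIME 2 lookup so it can be tried on any nested dictionary before pointing it at a real PluginManager.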

Sorry, wish I had a better answer for you!

:qiime2: :t_rex:

That’s what I wanted! Looks like the sequences are stored as skbio.sequence._dna.DNA objects in the pandas Series. For posterity’s sake, here’s what the code to access them looks like:

import qiime2 as q2
import pandas as pd
seq = q2.Artifact.load('seq.qza').view(view_type=pd.Series)
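From there, a quick interactive summary is only a few lines away. The sketch below is a minimal example, assuming each value in the Series can be turned into a plain string with `str()` (which scikit-bio DNA objects support). The helper name `sequence_lengths` and the stand-in feature IDs are hypothetical, and a plain dict is used in place of the real Series so the example runs without QIIME 2 installed.

```python
def sequence_lengths(seqs):
    """Map each feature ID to the length of its sequence.

    `seqs` can be anything with an `.items()` method that yields
    (feature_id, sequence) pairs, such as a pandas Series or a dict.
    """
    return {feature_id: len(str(seq)) for feature_id, seq in seqs.items()}


# Stand-in data (hypothetical IDs and sequences), used instead of a real
# artifact so the example is self-contained.
example = {'feature-1': 'ACGT', 'feature-2': 'ACGTAC'}
print(sequence_lengths(example))  # {'feature-1': 4, 'feature-2': 6}
```

With the real data, the same call would be `sequence_lengths(seq)` on the Series loaded above.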

Ah, ok. Well, consider this an upvote for this sort of feature. Like I wrote in my “tools to explore artifacts directly” post, being able to know how to interact with qiime2 data with familiar and flexible tools is super important. :slight_smile:

