Working with FeatureData[Sequence] in Jupyter

Hello!

I'm just moving from the command line to the QIIME 2 API (q2-api) in a Jupyter notebook. The issue I'm facing right now is that I need to see the repseqs (FeatureData[Sequence]) produced by DADA2 as a pd.DataFrame.

After considering the classical strategy of using a visualizer, I chose to put this information in a pd.DataFrame because I want to explore some things on my own and need the info to be easily accessible. I tried repseqs.view(pd.DataFrame), but it doesn't work. The error is:
Exception: No transformation from <class 'q2_types.feature_data._formats.DNASequencesDirectoryFormat'> to <class 'pandas.core.frame.DataFrame'>

The main question is: how can I load the repseqs into a DataFrame? It would also be good to know whether there is a place where the available transformations for each data type are defined. I've been digging through the GitHub repository but couldn't find anything. Maybe there is another repository with more documentation?

Even if I manage to get the sequences into a pd.DataFrame, is it possible to turn this information back into an Artifact so I can continue with my workflow (i.e., taxonomy assignment)?

Thank you very much!
Víctor

Hi @vimh,

You can view the sequences as a pd.Series and then turn the series into a data frame:

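```python
import pandas as pd
from qiime2 import Artifact

seqs_q2 = Artifact.load('rep-seqs.qza')
seqs = seqs_q2.view(pd.Series)  # one sequence per feature, indexed by feature ID
seqs.to_frame()
```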
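
The values in that Series are scikit-bio DNA objects, so for quick exploration it can help to convert them to plain strings and add a few per-feature summaries. Below is a minimal sketch; `seqs_to_frame` is a hypothetical helper name, and it only relies on `str()` of a DNA object giving back the sequence. The toy Series in the usage line uses made-up IDs and sequences so it runs without QIIME 2.

```python
import pandas as pd


def seqs_to_frame(seqs: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: Series of sequences (indexed by feature ID) -> DataFrame.

    Works for skbio.DNA values (what view(pd.Series) gives you) or plain
    strings, because astype(str) calls str() on each value.
    """
    df = seqs.astype(str).to_frame(name="sequence")
    # Per-feature summaries that are handy for a first look.
    df["length"] = df["sequence"].str.len()
    gc_count = df["sequence"].str.upper().str.count("[GC]")
    df["gc_fraction"] = gc_count / df["length"]
    return df


# Toy, made-up data standing in for seqs_q2.view(pd.Series):
toy = pd.Series({"feat_a": "ACGTGC", "feat_b": "AATT"})
print(seqs_to_frame(toy))
```

From there the usual pandas tools apply, e.g. `df[df["length"] > 250]` to look at longer ASVs.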

You can 100% turn a series of sequences back into an Artifact.
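
The reply doesn't show that step, so here is a sketch of one way to do it. It assumes the pd.Series to FeatureData[Sequence] transformer shipped with q2-types and uses `Artifact.import_data(type, view)`. `frame_to_series` and `series_to_artifact` are hypothetical helper names, and the column name `"sequence"` is an assumption about how you built the DataFrame. The QIIME 2 and scikit-bio imports are kept inside the second helper so the first one runs on its own.

```python
import pandas as pd


def frame_to_series(df: pd.DataFrame, seq_col: str = "sequence") -> pd.Series:
    """Hypothetical helper: pull sequences back out, keeping feature IDs as the index."""
    return df[seq_col].astype(str)


def series_to_artifact(seqs: pd.Series):
    """Hypothetical helper: wrap a Series of sequences as a FeatureData[Sequence] Artifact.

    Assumes q2-types provides a pd.Series -> FeatureData[Sequence] transformer.
    """
    import skbio
    from qiime2 import Artifact

    # Wrap each string as skbio.DNA, matching what view(pd.Series) returns.
    dna = seqs.map(skbio.DNA)
    return Artifact.import_data("FeatureData[Sequence]", dna)
```

For example, filter the DataFrame first, then call `series_to_artifact(frame_to_series(filtered_df))` and pass the result on to the taxonomy-assignment step.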

I agree, it would be nice to have more documentation of transforms.

Best,
Justine


Thanks @jwdebelius, now I can continue with my workflow. Maybe it's a simple thing, but I couldn't have done it without help.

Best,
Víctor


Hi @vimh,

Glad we could help!

And I will say that either someone told me (Evan? Matt? Liz? Tomasz? IDK) or I read the code to find the transformer, which is non-trivial.

Best,
Justine
