Open FeatureData in Artifact API

samuel.drouin · July 24, 2018, 2:56am

Hi,

I am trying to write a small python script that process MiSeq samples through Qiime 2 with Artifact API and plugins to generate taxonomy data. The steps are as follow :
cutadapt -> import in Qiime 2 -> DADA2 -> classifier (classify_sklearn) -> output

Question 1 : Is it possible to access the FeatureData[Taxonomy] Artifact data (UUID, taxon and confidence level) at the classification step directly in python ? Something like when we generate the TSV file from the visualization but within the script.

Question 2 : I tried to export the taxonomy visualization with the following python code :
metadata.visualizers.tabulate(input=classification).visualization.save(os.path.join(directory['Qiime'] + "/taxonomy.qzv"))

Which I assumed would be the equivalent of :
qiime metadata tabulate --m-input-file taxonomy.qza --o-visualization taxonomy.qzv

However I get the following error :
TypeError: Argument to parameter 'input' is not a subtype of Metadata.

Is there any way to export the taxonomy visualization other than export the artifact and use the above command on subprocess to generate the visualization ?

Question 3 : Is there any Artifact/plugins API tutorial in the documentation ? I found a lot of informations about the CLI but not much about the API (other than the dev. docs and the Artifact API docs).

Thank you

thermokarst · July 24, 2018, 1:52pm

Hello @samuel.drouin!

Grabbing the UUID is straightforward (just a prop on the Artifact), while getting the other parts are pretty simple, too, but only if you know to use the view ~~API~~ method (we are lacking good docs in the dept, sorry!)

import qiime2
import pandas as pd

# here I am loading, but this could also be results passed
# down from a previous step.
taxonomy_artifact = qiime2.Artifact.load('taxonomy.qza')
# Get the UUID of the Artifact
print(taxonomy_artifact.uuid)
7224759e-03bb-4179-a382-485165ca8c86

# `view` as a pd.DataFrame to get the taxon and conf.
taxonomy_df = taxonomy_artifact.view(pd.DataFrame)
print(taxonomy_df.head())
                                                                              Taxon  \
Feature ID                                                                            
7fa8f515ee1ac5cb8b529b13e6a89790  k__Bacteria; p__Proteobacteria; c__Gammaproteo...   
40f13b077d41d6837ca6fae65763cf01  k__Bacteria; p__Actinobacteria; c__Actinobacte...   
2e55763e77234a7bfc3a311067a0c7e1  k__Bacteria; p__Firmicutes; c__Clostridia; o__...   
d5ef913e6aee3067001dc610ac8af1d1  k__Bacteria; p__Actinobacteria; c__Actinobacte...   
bc15061b61cf6b5002c58284591f97d4  k__Bacteria; p__Firmicutes; c__Clostridia; o__...   

                                          Confidence  
Feature ID                                            
7fa8f515ee1ac5cb8b529b13e6a89790  0.9998282422613309  
40f13b077d41d6837ca6fae65763cf01  0.9999994725274034  
2e55763e77234a7bfc3a311067a0c7e1  0.9177375883240979  
d5ef913e6aee3067001dc610ac8af1d1  0.9971204554883859  
bc15061b61cf6b5002c58284591f97d4  0.9999262723979453

You will need to view as qiime2.Metadata, first:

import qiime2
from qiime2.plugins import metadata

taxonomy_artifact = qiime2.Artifact.load('taxonomy.qza')
taxonomy_md = taxonomy_artifact.view(qiime2.Metadata)
taxa_tab, = metadata.visualizers.tabulate(input=taxonomy_md)

Still a bit light in this department (would welcome some contributions!):

Artifact API — QIIME 2 2018.6.0 documentation (sounds like you found this already)
Moving Pictures Python API tutorial (QIIME2 2018.4) (thanks @tomasz!)

Hope that helps! :qiime2:

samuel.drouin · July 25, 2018, 12:31am

Thank you for your help @thermokarst ! This is exactly what I was looking for.

I would like to contribute to the Python API documentation in some way and will probably write some documentation about what I found out while writing the script.

samuel.drouin · July 25, 2018, 2:49am

Is there any way to know what classes are accepted as view_type for the artifact view function ?

I tried the same thing on the DADA2 representative_sequences : FeatureData[Sequences] and I get the following error :

rep_seq.view(pandas.DataFrame)
Traceback (most recent call last):
File "", line 1, in
File "/Users/samueldrouin/conda/envs/qiime2/lib/python3.5/site-packages/qiime2/sdk/result.py", line 254, in view
return self._view(view_type)
File "/Users/samueldrouin/conda/envs/qiime2/lib/python3.5/site-packages/qiime2/sdk/result.py", line 265, in _view
recorder=recorder)
File "/Users/samueldrouin/conda/envs/qiime2/lib/python3.5/site-packages/qiime2/core/transform.py", line 59, in make_transformation
(self._view_type, other._view_type))
Exception: No transformation from <class 'qiime2.plugin.model.directory_format.DNASequencesDirectoryFormat'> to <class 'pandas.core.frame.DataFrame'>

I am not sure what class I can use to get this class fasta file.

I figured out that I can get the fasta file from the directory file structure but it most certainly is not the good way to do it :

rep_seq._archiver.root_dir
PosixPath('/var/folders/gh/9m466jhj1zn1jtk8w114w4dr0000gn/T/qiime2-archive-p_4lnjwb/fda193f1-e60c-475b-a0a8-a1cb01f14925')

thermokarst · July 27, 2018, 12:46am

That is a great question @samuel.drouin! We really need to implement a convenience API for this, but in the meantime, you can do something like this to learn more about what you can directly transform to:

import qiime2

pm = qiime2.sdk.PluginManager()
rep_seqs = qiime2.Artifact.load('/path/to/rep-seqs.qza')
transform_to = pm.transformers[rep_seqs.format.file.format].keys()
print('\n'.join([str(t) for t in transform_to]))

which returns:

<class 'pandas.core.series.Series'>
<class 'qiime2.metadata.metadata.Metadata'>
<class 'q2_types.feature_data._transformer.DNAIterator'>

Sorry for the inconvenience, hope that helps! :qiime2:

system · August 27, 2018, 6:46am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.