I am trying to write a small python script that process MiSeq samples through Qiime 2 with Artifact API and plugins to generate taxonomy data. The steps are as follow :
cutadapt -> import in Qiime 2 -> DADA2 -> classifier (classify_sklearn) -> output
Question 1 : Is it possible to access the FeatureData[Taxonomy] Artifact data (UUID, taxon and confidence level) at the classification step directly in python ? Something like when we generate the TSV file from the visualization but within the script.
Question 2 : I tried to export the taxonomy visualization with the following python code : metadata.visualizers.tabulate(input=classification).visualization.save(os.path.join(directory['Qiime'] + "/taxonomy.qzv"))
Which I assumed would be the equivalent of : qiime metadata tabulate --m-input-file taxonomy.qza --o-visualization taxonomy.qzv
However I get the following error : TypeError: Argument to parameter 'input' is not a subtype of Metadata.
Is there any way to export the taxonomy visualization other than export the artifact and use the above command on subprocess to generate the visualization ?
Question 3 : Is there any Artifact/plugins API tutorial in the documentation ? I found a lot of informations about the CLI but not much about the API (other than the dev. docs and the Artifact API docs).
Grabbing the UUID is straightforward (just a prop on the Artifact), while getting the other parts are pretty simple, too, but only if you know to use the viewAPI method (we are lacking good docs in the dept, sorry!)
import qiime2
import pandas as pd
# here I am loading, but this could also be results passed
# down from a previous step.
taxonomy_artifact = qiime2.Artifact.load('taxonomy.qza')
# Get the UUID of the Artifact
print(taxonomy_artifact.uuid)
7224759e-03bb-4179-a382-485165ca8c86
# `view` as a pd.DataFrame to get the taxon and conf.
taxonomy_df = taxonomy_artifact.view(pd.DataFrame)
print(taxonomy_df.head())
Taxon \
Feature ID
7fa8f515ee1ac5cb8b529b13e6a89790 k__Bacteria; p__Proteobacteria; c__Gammaproteo...
40f13b077d41d6837ca6fae65763cf01 k__Bacteria; p__Actinobacteria; c__Actinobacte...
2e55763e77234a7bfc3a311067a0c7e1 k__Bacteria; p__Firmicutes; c__Clostridia; o__...
d5ef913e6aee3067001dc610ac8af1d1 k__Bacteria; p__Actinobacteria; c__Actinobacte...
bc15061b61cf6b5002c58284591f97d4 k__Bacteria; p__Firmicutes; c__Clostridia; o__...
Confidence
Feature ID
7fa8f515ee1ac5cb8b529b13e6a89790 0.9998282422613309
40f13b077d41d6837ca6fae65763cf01 0.9999994725274034
2e55763e77234a7bfc3a311067a0c7e1 0.9177375883240979
d5ef913e6aee3067001dc610ac8af1d1 0.9971204554883859
bc15061b61cf6b5002c58284591f97d4 0.9999262723979453
Thank you for your help @thermokarst ! This is exactly what I was looking for.
I would like to contribute to the Python API documentation in some way and will probably write some documentation about what I found out while writing the script.
Is there any way to know what classes are accepted as view_type for the artifact view function ?
I tried the same thing on the DADA2 representative_sequences : FeatureData[Sequences] and I get the following error :
rep_seq.view(pandas.DataFrame)
Traceback (most recent call last):
File "", line 1, in
File "/Users/samueldrouin/conda/envs/qiime2/lib/python3.5/site-packages/qiime2/sdk/result.py", line 254, in view
return self._view(view_type)
File "/Users/samueldrouin/conda/envs/qiime2/lib/python3.5/site-packages/qiime2/sdk/result.py", line 265, in _view
recorder=recorder)
File "/Users/samueldrouin/conda/envs/qiime2/lib/python3.5/site-packages/qiime2/core/transform.py", line 59, in make_transformation
(self._view_type, other._view_type))
Exception: No transformation from <class 'qiime2.plugin.model.directory_format.DNASequencesDirectoryFormat'> to <class 'pandas.core.frame.DataFrame'>
I am not sure what class I can use to get this class fasta file.
I figured out that I can get the fasta file from the directory file structure but it most certainly is not the good way to do it :
That is a great question @samuel.drouin! We really need to implement a convenience API for this, but in the meantime, you can do something like this to learn more about what you can directly transform to:
import qiime2
pm = qiime2.sdk.PluginManager()
rep_seqs = qiime2.Artifact.load('/path/to/rep-seqs.qza')
transform_to = pm.transformers[rep_seqs.format.file.format].keys()
print('\n'.join([str(t) for t in transform_to]))