convert SampleData[DADA2Stats] to pandas dataframe?

hsapers · May 3, 2020, 5:10am

Hello - I'm trying to work with the stats artifact output of denoise-paired as a data frame using the artifact API and Jupyter notebooks. To generate the qiime2 artifact I ran the following:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs run-1-demux-paired-end.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 230 \
  --p-trunc-len-r 224 \
  --p-trunc-q 2 \
  --p-max-ee-f 2 \
  --p-max-ee-r 2 \
  --p-chimera-method consensus \
  --p-hashed-feature-ids TRUE \
  --o-representative-sequences 230_truncation/run-1-224truncr-rep-seqs.qza \
  --o-table 230_truncation/run-1-224truncr-table.qza \
  --o-denoising-stats 230_truncation/run-1-224truncr-stats.qza

To import the stats.qza artifact into python I ran the following:

import qiime2
import pandas as pd
from qiime2.sdk import Artifact
table_stats = Artifact.load('230_truncation/run-1-224truncr-stats.qza')
table_stats_df = table_stats.view(pd.DataFrame)

This gave the following error:

Exception: No transformation from <class 'qiime2.plugin.model.directory_format.DADA2StatsDirFmt'> to <class 'pandas.core.frame.DataFrame'>

When checking the data class:

table_stats.type
SampleData[DADA2Stats]

I know I could create a .qzv visualization, download the TSV, then import it with read_csv, but I figure there has to be a more direct way. I found a couple of forum posts about the Artifact API: exporting .qzv objects and viewing artifacts as metadata and generating the FeatureData[Taxonomy] artifact. The latter also seemed to be looking for a way to avoid generating and re-importing the .tsv file from the .qzv, but I couldn't adapt the discussion there to this case.

I think I'm confused about which classes and modules to import, and I don't understand the difference between the following pairs of import statements:

from qiime2.sdk import Artifact
from qiime2 import Artifact

from qiime2.metadata.metadata import Metadata
from qiime2.plugins import metadata

Thank you!

I'm using the following versions:
qiime2 2020.2.0
python 3.6.7
jupyterlab 1.2.6
pandas 0.26.3
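
On the import question above, one way to find out whether two import paths name the same thing is to compare the objects they resolve to. The helper below is a minimal sketch; `same_object` is a hypothetical name, not part of QIIME 2, and the demonstration uses the standard library's `json` package so that it runs without QIIME 2 installed.

```python
import importlib


def same_object(module_a, name_a, module_b, name_b):
    """Return True if both import paths resolve to the very same object."""
    obj_a = getattr(importlib.import_module(module_a), name_a)
    obj_b = getattr(importlib.import_module(module_b), name_b)
    return obj_a is obj_b


# Standard-library illustration: the json package re-exports JSONDecoder
# from json.decoder, so both paths name one class.
print(same_object("json", "JSONDecoder", "json.decoder", "JSONDecoder"))
```

With QIIME 2 installed, `same_object("qiime2", "Artifact", "qiime2.sdk", "Artifact")` would be expected to return True, on the assumption that the top-level package simply re-exports the class. The second pair is probably different in kind: `qiime2.metadata.metadata.Metadata` is the metadata class, while `qiime2.plugins.metadata` appears to be the module holding the q2-metadata plugin's actions, so the two are not interchangeable.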

timanix · May 3, 2020, 6:25am

Hi!
I know it's not the best way to do it, but I prefer to unzip the archive and read the table with pandas:

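import pandas as pd
# IPython/Jupyter only: the "!" lines run shell commands
qza = 'beta-group-sign-pairwise.qzv'
a = !unzip $qza
# the archive's top-level directory name is parsed from unzip's output
digest = a[1].split('/')[0].replace('  inflating: ','')
# the file under data/ depends on the artifact or visualization type
meta = pd.read_csv(digest+'/data/raw_data.tsv', sep='\t')
# remove the extracted directory
!rm -r $digest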

This works for both .qzv and .qza files.
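
The same idea can be sketched without shell magics, since .qza and .qzv files are zip archives. The function below is a hedged illustration using only the standard library; `read_archive_tsv` is a hypothetical helper name, and the member path you pass (for example `/data/stats.tsv` for a DADA2 stats artifact) is an assumption that you should confirm by listing the archive contents first.

```python
import csv
import zipfile


def read_archive_tsv(archive_path, member_suffix):
    """Return the rows (as dicts) of the first TSV whose path ends with member_suffix."""
    with zipfile.ZipFile(archive_path) as zf:
        # Members look like "<uuid>/data/<file>", so match on the path suffix.
        matches = [name for name in zf.namelist() if name.endswith(member_suffix)]
        if not matches:
            raise FileNotFoundError(f"no member ending with {member_suffix!r}")
        text = zf.read(matches[0]).decode("utf-8")
    # Drop blank lines and QIIME 2 metadata directive rows such as "#q2:types".
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#q2:")]
    return list(csv.DictReader(lines, delimiter="\t"))
```

The returned list of dicts can be passed to `pd.DataFrame(rows)`; all values are strings at that point, so numeric columns would still need converting. Nothing is extracted to disk, so there is no directory to clean up afterwards.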

Nicholas_Bokulich · May 3, 2020, 2:36pm

Hi @hsapers,
Great question.

It looks like a transformer from SampleData[DADA2Stats] --> pd.DataFrame does not exist, since the dada2 stats are usually handled as metadata... try this slight tweak to your code instead:

import qiime2
import pandas as pd
table_stats = qiime2.Artifact.load('230_truncation/run-1-224truncr-stats.qza')
table_stats_df = table_stats.view(qiime2.Metadata).to_dataframe()

That should do it! Let me know if you have any more questions.

hsapers · May 3, 2020, 6:46pm

Thanks, @Nicholas_Bokulich! Your modification worked perfectly. And I'll keep @timanix's code on hand; it looks like it will be useful going forward as well.
