Import FASTQ with manifest via Artifact API?

I would like to start using the Artifact API =-O

I use a mixture of the CLI and the Studio now. I see that when I have an Active Job running, I can click on the job and see some Python code. For a DADA2 job, it might start like this:

def denoise_paired(demultiplexed_seqs: SingleLanePerSamplePairedEndFastqDirFmt,
                   trunc_len_f: int, trunc_len_r: int,
                   trim_left_f: int=0, trim_left_r: int=0,
                   max_ee: float=2.0, trunc_q: int=2,
                   chimera_method: str='consensus',
                   min_fold_parent_over_abundance: float=1.0, n_threads: int=1,
                   n_reads_learn: int=1000000, hashed_feature_ids: bool=True
                   ) -> (biom.Table, DNAIterator):

It doesn’t seem like importing FASTQ files into an artifact with the green + button creates an Active Job that I can examine.

How can I learn how to do this import step with the Artifact API (for example, in Jupyter)?

Thanks!

Hi @mamillerpa!

Importing is a built-in action in QIIME 2, and since it's not a part of a plugin, the Studio doesn't automatically show the Python code like it does for a plugin action.

We have a very basic Artifact API tutorial that shows how to import data using the Python API. The example in the tutorial shows how to use the Artifact.import_data() method to import a pandas.DataFrame into a FeatureTable[Frequency] artifact. The import_data() method can be used to import other types of data (as long as they are in a supported format). In addition to Python objects (e.g. the pandas.DataFrame example), files and directories can be imported, which is what you'll want to do with the FASTQ manifest.
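
For instance, a minimal sketch of that DataFrame import might look like this (the sample and feature labels are made up for illustration):

import pandas as pd
from qiime2 import Artifact

# Rows are samples, columns are features, and values are counts.
df = pd.DataFrame([[10, 2, 0], [4, 7, 1]],
                  index=['sample-1', 'sample-2'],
                  columns=['feature-1', 'feature-2', 'feature-3'])

table = Artifact.import_data('FeatureTable[Frequency]', df)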

We'll be providing an expanded Artifact API tutorial within the next release or two, and will follow up here when that's available. In the meantime, here's an example of how to import a FASTQ manifest using the Artifact API. I'm using the importing tutorial's example data to import a manifest for single-end reads with Phred 33 quality scores, but the process is similar for the other FASTQ manifest variants.

Here's the code I ran in an IPython interactive shell -- you can run this same code in a Jupyter notebook or with regular Python.

In [1]: !ls
se-33  se-33-manifest  se-33.zip

In [2]: !ls se-33
sample1.fastq.gz  sample2_S1_L001_R1_001.fastq.gz

In [3]: from qiime2 import Artifact

In [4]: artifact = Artifact.import_data('SampleData[SequencesWithQuality]', 'se-33-manifest', view_type='SingleEndFastqManifestPhred33')

In [5]: artifact
Out[5]: <artifact: SampleData[SequencesWithQuality] uuid: 2ce29984-8aff-4d5a-80c1-f4d895c43e7f>

In [6]: artifact.save('single-end-demux.qza')
Out[6]: 'single-end-demux.qza'

In [7]: !ls
se-33  se-33-manifest  se-33.zip  single-end-demux.qza

In this example, we have the se-33-manifest file and a se-33/ directory of FASTQ files to import. The import step happens in Cell 4, where we use Artifact.import_data(). Here are the components of the import_data() call (the equivalent CLI command is sketched after this list):

  • The first argument is the semantic type of the artifact ('SampleData[SequencesWithQuality]'). This argument corresponds to the CLI's --type option.

  • The second argument is the FASTQ manifest file path ('se-33-manifest') that's in the current working directory (I'm using a relative file path; absolute file paths also work). This argument corresponds to the CLI's --input-path option.

  • The third argument is the view type, which in this case is the name of the FASTQ manifest file format we're using (view_type='SingleEndFastqManifestPhred33'). This argument corresponds to the CLI's --source-format option.
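
For reference, the equivalent CLI invocation would look roughly like this (option names match the release used in this example):

qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path se-33-manifest \
  --source-format SingleEndFastqManifestPhred33 \
  --output-path single-end-demux.qza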

In Cell 5, we see that the artifact variable holds the new Artifact object we imported the data into. To save this artifact to disk, we use Artifact.save() in Cell 6, which writes it to a file called single-end-demux.qza. This artifact file can then be used with any other QIIME 2 interface, such as the CLI, Studio, or other Artifact API scripts/sessions.
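
For example, a later Artifact API session (or a different script) could pick the saved artifact back up with Artifact.load() -- here's a minimal sketch:

from qiime2 import Artifact

demux = Artifact.load('single-end-demux.qza')  # load the artifact saved above
print(demux.type)                              # SampleData[SequencesWithQuality]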

Let me know how importing via the Artifact API works out for you!

Hi,

I had a follow-up question for this thread. I've used QIIME 2's CLI interface to complete an analysis using the same parameters as below (so I know it works), but am now hoping to build my pipeline using the Artifact API and the Python plugins as much as possible.

My question is related to passing the artifact generated by Artifact.import_data() into q2_dada2.denoise_paired().

I am passing in demultiplexed, paired-end, Illumina FASTQs. I use the following python code to import the data into python and save it as an artifact (QZA):

from qiime2 import Artifact
import q2_dada2 as dada2

# Import data from demultiplexed FASTQs and save as an artifact
demux_unfiltered = Artifact.import_data('SampleData[PairedEndSequencesWithQuality]',
                                        'markerXXX_manifest.csv',
                                        view_type='PairedEndFastqManifestPhred33')
demux_unfiltered.save('markerXXX_manifest.qza')

When I ask Python for the help file for DADA2's denoise_paired algorithm, I get the following. Am I correct in reading that it expects only single lane per sample data?

help(dada2.denoise_paired)

Help on function denoise_paired in module q2_dada2._denoise:

denoise_paired(demultiplexed_seqs:q2_types.per_sample_sequences._format.SingleLanePerSamplePairedEndFastqDirFmt, trunc_len_f:int, trunc_len_r:int, trim_left_f:int=0, trim_left_r:int=0, max_ee:float=2.0, trunc_q:int=2, chimera_method:str='consensus', min_fold_parent_over_abundance:float=1.0, n_threads:int=1, n_reads_learn:int=1000000, hashed_feature_ids:bool=True) -> (<class 'biom.table.Table'>, <class 'q2_types.feature_data._transformer.DNAIterator'>)

Just to check, I try to use the artifact I previously created in denoise_paired() and am presented with the following error:

# Filter, denoise, pair, and remove chimeras
dada2.denoise_paired(Artifact.load('markerXXX_manifest.qza'),
                     trunc_len_f[idx], trunc_len_r[idx],
                     trim_left_f[idx], trim_left_r[idx], trunc_q[idx],
                     n_threads=cpus)
AttributeError: 'Artifact' object has no attribute 'sequences'

I must be missing something basic here - like a conversion to a different artifact type. Casava 1.8, perhaps? I can’t find anything clear in the documentation and when I inspect the _denoise.py source, it clearly requires that my artifact class have a .sequences object, which it does not. Maybe there is a format that constructs a .sequences object.

Any advice would be appreciated.

Thanks!
Josh

Hi @Joshua_Barnes,

The trouble you are having comes from this line:

import q2_dada2 as dada2

You want that to be:

import qiime2.plugins.dada2.actions as dada2

What you are using right now is what we tend to call the "View API" of the plugin. It's just the plain undecorated Python that takes simple objects and returns other objects. The Framework layers on all the extra view-conversion and provenance tracking before and after the view API is called. We call the decorated version with all that extra functionality the Artifact API, because it needs artifacts instead of views.
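
To illustrate, here's a rough sketch of calling the decorated action (the truncation values are placeholders, and the output names can vary slightly between releases):

from qiime2 import Artifact
import qiime2.plugins.dada2.actions as dada2_actions

demux = Artifact.load('markerXXX_manifest.qza')

# The decorated action accepts Artifacts and returns a Results object of Artifacts,
# with provenance recorded automatically.
results = dada2_actions.denoise_paired(demultiplexed_seqs=demux,
                                       trunc_len_f=240,
                                       trunc_len_r=200)

results.table.save('markerXXX_table.qza')
results.representative_sequences.save('markerXXX_rep_seqs.qza')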

Incidentally, if you for some reason really hated keeping track of provenance, you could call:

import q2_types.per_sample_sequences

a = Artifact.load('markerXXX_manifest.qza')
demux = a.view(q2_types.per_sample_sequences.SingleLanePerSamplePairedEndFastqDirFmt)  # that's long to type!

and pass demux into q2_dada2.denoise_paired, since you would then have the view that the undecorated function expects.

Hope that helps!
