Import FASTQ with manifest via Artifact API?

Hi @mamillerpa!

Importing is a built-in action in QIIME 2 rather than part of a plugin, so the Studio doesn't automatically show its Python code the way it does for plugin actions.

We have a very basic Artifact API tutorial that shows how to import data using the Python API. Its example uses the Artifact.import_data() method to import a pandas.DataFrame into a FeatureTable[Frequency] artifact. The import_data() method can import other types of data too, as long as they're in a supported format. Besides Python objects like that pandas.DataFrame, it can import files and directories, which is what you'll need for the FASTQ manifest.
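For reference, a minimal sketch of the DataFrame case might look like this. It assumes the usual q2-types layout for FeatureTable[Frequency] (samples as rows, features as columns). The counts, the IDs, and the to_feature_table() helper are made up for illustration, and qiime2 is imported inside the helper so the DataFrame part runs even without QIIME 2 installed.

```python
import pandas as pd

# Hypothetical counts: rows are samples, columns are features
# (assumed q2-types layout for FeatureTable[Frequency]).
counts = pd.DataFrame(
    {"feature-a": [10, 0, 3], "feature-b": [2, 7, 0]},
    index=["sample-1", "sample-2", "sample-3"],
)


def to_feature_table(df):
    """Import a samples-by-features DataFrame as a FeatureTable[Frequency] artifact."""
    # Imported here so the rest of this snippet works without QIIME 2.
    from qiime2 import Artifact

    # view_type is omitted: QIIME 2 is expected to infer it from the
    # DataFrame's Python type (assumed default behaviour).
    return Artifact.import_data("FeatureTable[Frequency]", df)
```

Leaving out view_type should be fine for an in-memory object like this; for a file path such as the manifest, view_type is what tells QIIME 2 which file format to expect.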

We'll be providing an expanded Artifact API tutorial within the next release or two, and will follow up here when that's available. In the meantime, here's an example of how to import a FASTQ manifest using the Artifact API. I'm using the importing tutorial's example data to import a manifest for single-end reads with Phred 33 quality scores, but the process is similar for the other FASTQ manifest variants.
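If you'd rather generate a manifest than write one by hand, here is a small sketch for the single-end case. It assumes the comma-separated manifest layout used by the (non-V2) FASTQ manifest formats in the importing tutorial: a sample-id,absolute-filepath,direction header and one row per FASTQ file, with "forward" for single-end reads. The helper name write_single_end_manifest() and the sample IDs are hypothetical.

```python
import csv
import os


def write_single_end_manifest(manifest_path, fastq_by_sample):
    """Write a single-end FASTQ manifest (hypothetical helper).

    fastq_by_sample maps each sample ID to the path of its FASTQ file.
    """
    with open(manifest_path, "w", newline="") as fh:
        # Use "\n" line endings; csv's default terminator is "\r\n".
        writer = csv.writer(fh, lineterminator="\n")
        writer.writerow(["sample-id", "absolute-filepath", "direction"])
        for sample_id, fastq_path in fastq_by_sample.items():
            # The column expects absolute paths, so resolve relative ones.
            writer.writerow([sample_id, os.path.abspath(fastq_path), "forward"])
    return manifest_path
```

For example, write_single_end_manifest('se-33-manifest', {'sample-1': 'se-33/sample1.fastq.gz'}) run from the tutorial data directory would write a manifest you could pass to import_data().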

Here's the code I ran in an interactive IPython shell. You can run the same code in a Jupyter Notebook, or with regular Python apart from the !ls shell commands, which are IPython/Jupyter syntax.

In [1]: !ls
se-33  se-33-manifest  se-33.zip

In [2]: !ls se-33
sample1.fastq.gz  sample2_S1_L001_R1_001.fastq.gz

In [3]: from qiime2 import Artifact

In [4]: artifact = Artifact.import_data('SampleData[SequencesWithQuality]', 'se-33-manifest', view_type='SingleEndFastqManifestPhred33')

In [5]: artifact
Out[5]: <artifact: SampleData[SequencesWithQuality] uuid: 2ce29984-8aff-4d5a-80c1-f4d895c43e7f>

In [6]: artifact.save('single-end-demux.qza')
Out[6]: 'single-end-demux.qza'

In [7]: !ls
se-33  se-33-manifest  se-33.zip  single-end-demux.qza

In this example, we have the se-33-manifest file and an se-33/ directory of FASTQ files to import. The import happens in Cell 4, where we call Artifact.import_data(). Here are the components of that call:

  • The first argument is the semantic type of the artifact ('SampleData[SequencesWithQuality]'). This argument corresponds to the CLI's --type option.

  • The second argument is the FASTQ manifest file path ('se-33-manifest') that's in the current working directory (I'm using a relative file path; absolute file paths also work). This argument corresponds to the CLI's --input-path option.

  • The third argument is the view type, which in this case is the name of the FASTQ manifest file format we're using (view_type='SingleEndFastqManifestPhred33'). This argument corresponds to the CLI's --source-format option.
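To see the correspondence at a glance, the hypothetical helper below takes the same values as the import_data() call, plus the output path you'd later pass to Artifact.save(), and builds the equivalent qiime tools import argument list. It's only an illustration of the mapping, using the CLI option names from the list above together with --output-path, which is where the CLI writes the .qza.

```python
def cli_import_args(semantic_type, input_path, view_type, output_path):
    """Return the `qiime tools import` argv matching an import_data() call.

    Hypothetical helper for illustration; option names follow the CLI
    described in this post (--source-format).
    """
    return [
        "qiime", "tools", "import",
        "--type", semantic_type,        # 1st import_data() argument
        "--input-path", input_path,     # 2nd import_data() argument
        "--source-format", view_type,   # view_type= keyword argument
        "--output-path", output_path,   # what Artifact.save() writes
    ]
```

Passing the result to shlex.join() gives a shell-ready command, with the bracketed semantic type quoted for you.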

Cell 5 shows that the artifact variable holds the new Artifact object containing the imported data. To save it to disk, Cell 6 calls Artifact.save(), which writes the artifact to a file called single-end-demux.qza and returns the file path. That .qza file can then be used with any other QIIME 2 interface, such as the CLI, Studio, or other Artifact API scripts and sessions.
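Outside IPython the !ls lines aren't available, so here is roughly the same workflow as plain Python. It's a sketch: the function name import_single_end_manifest() is hypothetical, the type and format strings are the ones from the session above, and Artifact.load() is used only to check that the saved file reads back as an artifact.

```python
import os


def import_single_end_manifest(manifest_path, output_path):
    """Import a single-end Phred 33 FASTQ manifest and save it as a .qza file."""
    # Imported here so this module can be loaded without QIIME 2 installed.
    from qiime2 import Artifact

    if not os.path.exists(manifest_path):
        raise FileNotFoundError(manifest_path)

    # Same call as Cell 4 of the session.
    artifact = Artifact.import_data(
        "SampleData[SequencesWithQuality]",
        manifest_path,
        view_type="SingleEndFastqManifestPhred33",
    )
    # save() returns the path it wrote, as seen in Out[6].
    saved_path = artifact.save(output_path)

    # Reload to confirm the file is a readable artifact.
    print(Artifact.load(saved_path))
    return saved_path
```

Run from the directory holding the tutorial data, import_single_end_manifest('se-33-manifest', 'single-end-demux.qza') should reproduce Cells 4 to 6.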

Let me know how importing via the Artifact API works out for you!
