How to capture a value from a summary and pipe it?

In an effort to automate work, I would like to know how to capture a value from a summary visualization and feed that value to the next command. I searched and failed, and am running qiime2-2021.2 through conda.

I have run:

qiime tools import --type SampleData[PairedEndSequencesWithQuality] --input-path raw_data/ --output-path reads_qza/reads.qza --input-format CasavaOneEightSingleLanePerSampleDirFmt

qiime cutadapt trim-paired --i-demultiplexed-sequences reads_qza/reads.qza --p-cores $NCORES --p-front-f GTGYCAGCMGCCGCGGTAA --p-front-r CCGYCAATTYMTTTRAGTTT --p-discard-untrimmed --p-no-indels --o-trimmed-sequences reads_qza/reads_trimmed.qza

qiime vsearch join-pairs --i-demultiplexed-seqs reads_qza/reads_trimmed.qza --o-joined-sequences reads_qza/reads_trimmed_joined.qza

qiime quality-filter q-score --i-demux reads_qza/reads_trimmed_joined.qza --o-filter-stats filt_stats.qza --o-filtered-sequences reads_qza/reads_trimmed_joined_filt.qza

qiime demux summarize --i-data reads_qza/reads_trimmed_joined_filt.qza --o-visualization reads_qza/reads_trimmed_joined_filt_summary.qzv

and when I view reads_trimmed_joined_filt_summary.qzv, under the interactive quality tab, I see this:


[Screenshot: the "Demultiplexed sequence length summary" table, with the 2% row highlighted]

I want a way to capture whatever number is in that 2% row of the Demultiplexed sequence length summary, which in this example I have highlighted in gray and is 368, and to insert it as the --p-trim-length parameter into my next qiime command, which here would be

qiime deblur denoise-16S --i-demultiplexed-seqs reads_qza/reads_trimmed_joined_filt.qza --p-trim-length 368 .... etc.

I tried qiime tools export --input-path reads_qza/reads_trimmed_joined_filt_summary.qzv --output-path ., but none of its outputs easily gave me what I wanted. (Its outputs: data.jsonp, demultiplex-summary-forward.pdf, demultiplex-summary-forward.png, dist/, forward-seven-number-summaries.tsv, index.html, overview.html, per-sample-fastq-counts.tsv, q2templateassets/, quality-plot.html.)

Do I have what I want and just don't see it? Or if I need to do more work now so I can do less work later, what's next in my automating journey? I'm sure this is Python-soluble, but I have no Python-skill. Thanks to any who can help!


Hi @wburgess!

Thanks for reaching out, this is a great question.

The QIIME framework doesn’t currently support piping results directly from one command to the next. However, one of our community contributors has built a plugin that can pull values directly from an output file, which may help with this workflow. Details on this plugin, along with installation instructions, can be found here.

As another alternative, you could achieve this with a small script that contains all of your QIIME commands (above) along with the desired value from your output file that you’d extract using Python (or another language of your choice). Here is an example of how you could do this in Python. That being said, this may be a heavier lift since you are not familiar with Python.
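A minimal sketch of that script idea, assuming the `qiime` executable is on the PATH of the activated conda environment. `run_steps` is a hypothetical helper, not part of QIIME 2; it runs each command in order and stops at the first failure. The two commands are copied from the question.

```python
import subprocess


def run_steps(steps, dry_run=False):
    """Run each command (a list of arguments) in order.

    Hypothetical helper, not part of QIIME 2. With dry_run=True the
    commands are only printed, which is handy for checking them first.
    """
    for cmd in steps:
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a step fails,
            # so later steps never run on missing inputs.
            subprocess.run(cmd, check=True)
    return len(steps)


steps = [
    ["qiime", "quality-filter", "q-score",
     "--i-demux", "reads_qza/reads_trimmed_joined.qza",
     "--o-filter-stats", "filt_stats.qza",
     "--o-filtered-sequences", "reads_qza/reads_trimmed_joined_filt.qza"],
    ["qiime", "demux", "summarize",
     "--i-data", "reads_qza/reads_trimmed_joined_filt.qza",
     "--o-visualization",
     "reads_qza/reads_trimmed_joined_filt_summary.qzv"],
]

# Print the commands only; switch to dry_run=False to execute them.
run_steps(steps, dry_run=True)
```

Keeping each command as a list of arguments avoids shell quoting problems and makes it easy to splice in a value computed by an earlier step.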

Hopefully this helps provide you with a couple of options for automation in this workflow - please let us know if you have any additional questions moving forward!

Cheers,
Liz


Thanks for that, @lizgehret!

@wburgess, I put together a brief concrete example of how to read the viz results using QIIME 2, and parse the HTML file using pandas to get the information you're looking for:

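import os
import tempfile

import pandas
import qiime2

# Load the visualization produced by `qiime demux summarize`.
viz_fp = 'demux.qzv'
viz = qiime2.Visualization.load(viz_fp)

with tempfile.TemporaryDirectory() as tmpdir:
    # Unpack the viz into a temporary directory.
    viz.export_data(tmpdir)
    fp = os.path.join(tmpdir, 'quality-plot.html')
    # read_html returns one DataFrame per HTML table (requires lxml).
    dfs = pandas.read_html(fp, index_col=0)

# The first table holds the forward-read summary.
fwd = dfs[0]
# Take the first column of the '2%' row by position.
print(fwd.loc['2%'].iloc[0])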

The key things to note here are:

  1. Using qiime2.Visualization.load to load the viz
  2. Using qiime2.Visualization.export_data to export the contents of the viz to a temporary directory
  3. Reading the content of the quality plot HTML file into a list of pandas dataframes
  4. Extracting the row and column I need

To be clear, this did require some trial-and-error on my part to figure out what file I needed to grab out of the viz, and then identifying a way to parse that file took another couple of minutes. This is certainly something that could be improved upon in future versions of QIIME 2!
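To close the loop with the original question, the extracted cell can be turned into an integer and placed into the next command. This is a minimal sketch with some assumptions: `to_trim_length` and `build_deblur_command` are hypothetical helper names, the cell may arrive either as a number or as text with a unit suffix (hence the regular expression), and any further deblur options, which the question elides, are passed through `extra_args` untouched.

```python
import re


def to_trim_length(cell):
    """Return the first integer found in a table cell.

    Assumption: the cell is a number (368, 368.0) or text that starts
    with one, possibly followed by a unit such as "368 nts".
    """
    match = re.search(r"\d+", str(cell))
    if match is None:
        raise ValueError(f"no number found in {cell!r}")
    return int(match.group())


def build_deblur_command(seqs_qza, trim_length, extra_args=()):
    """Build the argument list for the deblur step from the question."""
    return [
        "qiime", "deblur", "denoise-16S",
        "--i-demultiplexed-seqs", seqs_qza,
        "--p-trim-length", str(trim_length),
        *extra_args,  # remaining options, supplied by the caller
    ]


cmd = build_deblur_command(
    "reads_qza/reads_trimmed_joined_filt.qza", to_trim_length("368 nts"))
print(" ".join(cmd))
```

Once the remaining output options are added through `extra_args`, passing `cmd` to `subprocess.run(cmd, check=True)` would execute the step.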


@thermokarst, first, I have to thank you for apparently just making a tool to do the thing I wanted to do. But it turns out that I lack the lxml module? So my mind is truly deep into the weeds of the interplay between ubuntu running conda running qiime running python, and I have no idea where the problem is.

I've copied my terminal from trying it below. As always, I'm grateful for any assistance.

[Edited to say I tried conda install lxml too, and that claimed to find outrageous numbers of conflicting packages, which wasn't solved after running conda upgrade --all.]

(qiime2-2021.2) cory@cory-Thinkstation-P520:~/Documents/wlb/thesis-qiime/FP$ python
Python 3.6.13 | packaged by conda-forge | (default, Feb 19 2021, 05:36:01)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> import pandas
>>> import qiime2
>>> import tempfile
>>> viz_fp = 'reads_qza/reads_trimmed_joined_filt_summary.qzv'
>>> viz = qiime2.Visualization.load(viz_fp)
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     viz.export_data(tmpdir)
...     fp = os.path.join(tmpdir, 'quality-plot.html')
...     dfs = pandas.read_html(fp, index_col=0)
...
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/home/cory/anaconda3/envs/qiime2-2021.2/lib/python3.6/site-packages/pandas/util/_decorators.py", line 296, in wrapper
    return func(*args, **kwargs)
  File "/home/cory/anaconda3/envs/qiime2-2021.2/lib/python3.6/site-packages/pandas/io/html.py", line 1101, in read_html
    displayed_only=displayed_only,
  File "/home/cory/anaconda3/envs/qiime2-2021.2/lib/python3.6/site-packages/pandas/io/html.py", line 894, in _parse
    parser = _parser_dispatch(flav)
  File "/home/cory/anaconda3/envs/qiime2-2021.2/lib/python3.6/site-packages/pandas/io/html.py", line 851, in _parser_dispatch
    raise ImportError("lxml not found, please install it")
ImportError: lxml not found, please install it
>>> install lxml
  File "<stdin>", line 1
    install lxml
               ^
SyntaxError: invalid syntax
>>> pip install lxml
  File "<stdin>", line 1
    pip install lxml
              ^
SyntaxError: invalid syntax
>>> pip3 install lxml
  File "<stdin>", line 1
    pip3 install lxml
               ^
SyntaxError: invalid syntax

@lizgehret, thank you for promptly helping, and with more than one idea, at that. I’m currently trying @thermokarst’s idea, but you bringing those options to my awareness was still valuable, and I have a few things cooking in the background—that qiime2R plugin looks promising.


Hi @wburgess, sorry, I forgot to mention that the snippet above was run in a QIIME 2 2021.4 environment, which comes with lxml automatically. It looks like you're running 2021.2. You can install lxml in that env with the following command, run from the shell (not the Python prompt) once the env has been activated:

conda install -c conda-forge -c bioconda -c defaults lxml

Once I installed that in my 2021.2 env I was able to re-run the code above. I hope that helps.
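A quick way to confirm the install worked, sketched under the assumption that Python is started from the activated environment: `importlib.util.find_spec` reports whether a module can be found without importing it. `has_module` is a hypothetical helper name, not part of QIIME 2 or pandas.

```python
import importlib.util


def has_module(name):
    """Return True if the top-level module `name` can be found."""
    return importlib.util.find_spec(name) is not None


# Report on the modules the snippet above depends on.
for name in ("lxml", "pandas", "qiime2"):
    status = "found" if has_module(name) else "MISSING"
    print(f"{name}: {status}")
```

If `lxml` is reported as missing, the install command belongs at the shell prompt; typing it at the Python `>>>` prompt is what produced the `SyntaxError` messages earlier in the thread.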


Et voila. Thanks so much!

