Command line interface/tools to explore artifacts directly

cduvallet · January 29, 2018, 11:42pm

Not sure if this is feasible, but something that could dramatically improve the user experience with qiime2 would be to have a set of tools that can directly explore artifacts like OTU tables, sequences, etc. Basically an equivalent to key bash commands that would work directly on artifacts (my personal favorites are: less, grep, ls -lh, and wc -l). This way, you could look at your data without having to call an export command each time you try something new.

I have a feeling a lack of these tools will be a large barrier to adoption for many people, especially established microbiome researchers who are used to digging into their data and looking into what's going on after every processing step.

thermokarst · January 30, 2018, 3:06pm

Hi @cduvallet!

Have you had a chance to check out the relevant visualizers for those types of data? Just to name a few:

Feature Tables: feature-table summarize
Sequences: feature-table tabulate-seqs
Metadata: metadata tabulate

And to get at a complete list of Methods and Visualizers: Available plugins — QIIME 2 2017.12.0 documentation

As far as "power users" go, we have been recommending that they leverage the Artifact API, since it offers a little more control than q2cli for things like exploratory data analysis. The Artifact API lets you view artifacts in a variety of ways (the following example is plucked from the docs):

>>> import biom
>>> from qiime2 import Artifact
>>> from qiime2.plugins import feature_table
>>> unrarefied_table = Artifact.load('table.qza')
>>> rarefy_result = feature_table.methods.rarefy(table=unrarefied_table, sampling_depth=100)
>>> rarefied_table = rarefy_result.rarefied_table
>>> biom_table = rarefied_table.view(biom.Table)
>>> print(biom_table.head())
# Constructed from biom file
#OTU ID      L1S105  L1S140  L1S208  L1S257  L1S281
b32621bcd86cb99e846d8f6fee7c9ab8     25.0    31.0    27.0    29.0    23.0
99647b51f775c8ddde8ed36a7d60dbcd     0.0     0.0     0.0     0.0     0.0
d599ebe277afb0dfd4ad3c2176afc50e     0.0     0.0     0.0     0.0     0.0
51121722488d0c3da1388d1b117cd239     0.0     0.0     0.0     0.0     0.0
1016319c25196d73bdb3096d86a9df2f     11.0    17.0    12.0    4.0     2.0

So with that said, I completely agree with you, this is especially cumbersome for users of q2cli (I just wanted to make sure that you were aware that exporting isn't the only option available right now).

One idea that we have tossed around is some kind of tweak to qiime tools peek, that would somehow let you get at a brief summary of data within an artifact. Maybe we could also do something like qiime tools cat. As well, visualizations also technically support multiple representation formats, so that might be another way to handle this (update vizs to create a full HTML representation, as well as a light text-based representation).
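To make the "brief summary" idea concrete, here is a minimal sketch of a peek-like helper written against the plain zip file rather than the QIIME 2 framework. It assumes the archive layout documented at dev.qiime2.org (a single root directory named after the artifact UUID, with a metadata.yaml directly inside it). The function name peek_artifact is hypothetical and is not part of q2cli or the Artifact API.

```python
import zipfile


def peek_artifact(path):
    """Return the key/value pairs stored in an artifact's top-level metadata.yaml.

    Hypothetical helper: assumes the layout <uuid>/metadata.yaml with simple
    "key: value" lines such as uuid, type, and format.
    """
    with zipfile.ZipFile(path) as archive:
        # Pick the metadata.yaml that sits directly under the root directory,
        # skipping any copies nested deeper (for example under provenance/).
        name = next(
            n for n in archive.namelist()
            if n.count("/") == 1 and n.endswith("/metadata.yaml")
        )
        text = archive.read(name).decode("utf-8")

    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip().strip("'\"")
    return info
```

Calling peek_artifact('table.qza') should return a small dictionary that can be printed directly. The line-based parsing is a deliberate simplification to avoid a YAML dependency; a YAML parser would be the more robust choice.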

Hopefully some of the alternatives are good enough for now (anecdotally, we haven't heard too much of an outcry about this, yet...), but please know that this is on our radar!

Thanks!

cduvallet · January 30, 2018, 4:01pm

Thanks for the thorough response, @thermokarst!

The visualizations and API tools you point to are very nice and definitely good enough for now. That said, I think they're still far more cumbersome than would be ideal. For example, generating a summary of a fairly small feature table takes about 10 seconds and requires two separate commands - which is far longer than just digging around with less on a text file. Same with using the API - you have to open a python interpreter, load in the file, and then call some commands to get a less interactive output than less gives you.

I think having easy and fast access to the underlying data is also really important for troubleshooting data processing at each step, especially when you're not getting the results you expect. For example, grepping for barcodes or primers to check whether they're in the sequences, counting the number of lines in files at each processing step to get a rough estimate of how many reads you're losing, opening up your sequences to see where your primer/barcodes start in each sequence, etc.

I think this sounds like a great idea! qiime is great for beginner users who are looking for their data to be nicely presented (e.g. in the visualization outputs like from qiime feature-table summarize), but a little more annoying for people who are used to dealing with raw text files and prefer things to be fast and versatile rather than nicely packaged.

Also an important caveat to all my comments: I haven't actually used qiime2 to process any data yet. I'm sure there exist many tools and alternatives for most go-to troubleshooting methods that I just don't know about. And, like you said, the alternatives are probably good enough for now and certainly good enough for the majority of qiime users.
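The grep and wc -l style checks described above can be approximated without an export step, because artifacts are zip files (as noted later in this thread). The following is a rough sketch, not an official tool: the helper names are hypothetical, and it assumes the data files live under <uuid>/data/ and are text-based (FASTA, TSV, fastq.gz). Binary formats such as BIOM tables would not produce meaningful lines.

```python
import gzip
import io
import re
import zipfile


def iter_data_lines(path):
    """Yield (member_name, line) for every line of every file under <uuid>/data/."""
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            # Keep only regular files inside the data directory.
            if len(parts) < 3 or parts[1] != "data" or name.endswith("/"):
                continue
            with archive.open(name) as raw:
                # Transparently decompress gzipped members such as fastq.gz.
                stream = gzip.open(raw) if name.endswith(".gz") else raw
                text = io.TextIOWrapper(stream, encoding="utf-8", errors="replace")
                for line in text:
                    yield name, line.rstrip("\n")


def grep_artifact(path, pattern):
    """Return (member_name, line) pairs whose line matches the regular expression."""
    regex = re.compile(pattern)
    return [(name, line) for name, line in iter_data_lines(path) if regex.search(line)]


def count_lines(path):
    """Return a dict mapping each data file to its line count (a rough `wc -l`)."""
    counts = {}
    for name, _ in iter_data_lines(path):
        counts[name] = counts.get(name, 0) + 1
    return counts
```

For example, grep_artifact('rep-seqs.qza', 'GTGCCAGC') would list the sequence lines containing that primer fragment; both the file name and the primer string here are placeholders.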

Matilda_H-D · February 14, 2018, 12:39am

I can add another (friendly, non-urgent) voice to this as I think many people in my lab group feel similarly and it has been one of the biggest adjustments in moving to QIIME 2 for us. It would be nice to be able to search directly in artifact files etc. without having to generate and then extract, export, or download visualisations at each step. We were also used to being able to comb through BIOM files, distance matrices etc. to help understand our data and it's a somewhat laborious process to create something equivalent to, say, an OTU table with per-sample abundances and taxonomy annotations in QIIME 2 (though I think I've figured it out now!).

Certainly not a dealbreaker and there are so many great things about QIIME 2, but just thought I would add that in since it came up.

thermokarst · February 15, 2018, 1:17pm

Thanks for the note, @Matilda_H-D! This all sounds good to me, and we will keep you posted as the discussion continues. Striking the right balance between portability, decentralized provenance tracking, and "quick-look" convenience is certainly tricky, and I am sure there are many more improvements we can incorporate in the months and years to come to streamline this. Thanks!

cduvallet · June 1, 2018, 4:10am

For what it's worth, my thoughts on this issue after attending the QIIME 2 developer conference:

I think that the knowledge that (1) the python API lets you convert artifacts into interactive-able data (e.g. pandas dataframes) and that (2) artifacts are just zip files you can unzip and explore yourself satisfies like 90% of my desires that motivated this forum post.

If I were to put in effort somewhere as a first-pass solution to this issue, it would be in documenting the different ways to extract non-QIIME 2 data objects from the Artifacts via the Python API. And making sure that everybody knows that thing about artifacts being zip files.
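As a starting point for that kind of documentation, here is a small sketch of the "artifacts are zip files" route using only the Python standard library. It gives a rough equivalent of ls -lh on an artifact and unpacks just the data files. The function names are hypothetical, and the <uuid>/data/ layout is assumed from the QIIME 2 archive documentation.

```python
import zipfile


def list_artifact(path):
    """Return sorted (relative_path, uncompressed_size_in_bytes) pairs for each file."""
    entries = []
    with zipfile.ZipFile(path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Drop the leading <uuid>/ directory so paths read like data/...
            _, _, relative = info.filename.partition("/")
            entries.append((relative, info.file_size))
    return sorted(entries)


def extract_data(path, destination):
    """Unpack only the files under <uuid>/data/ into destination.

    Returns the archive member names that were extracted; the <uuid>/data/
    prefix is preserved on disk.
    """
    with zipfile.ZipFile(path) as archive:
        members = []
        for name in archive.namelist():
            parts = name.split("/")
            if len(parts) >= 3 and parts[1] == "data" and not name.endswith("/"):
                members.append(name)
        archive.extractall(destination, members=members)
    return members
```

Once the data files are on disk, the usual less, grep, and wc -l workflow applies to any text-based formats among them.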

Anyway, that's my two cents...

andrea.telatin · November 22, 2020, 5:10pm

Based on my needs I made

Precompiled binaries are available, and it is conda installable.
Could it be of any use to you?

Cheers
A

thermokarst · November 25, 2020, 8:47pm

This looks super cool, thanks for sharing @andrea.telatin! Do you have some examples of how you are currently using this tool in your own work?

I took a look at the qax project README, and had a few things that I wanted to address:

"accessing metadata in the artifacts requires the full Qiime2 installation"

There actually isn't any requirement that a "full QIIME 2 installation" is present in order to read QIIME 2 results. All results in QIIME 2 are saved in whatever their "normal format" is (BIOM, TSV, fastq.gz, etc) - and zipped into an Artifact or Visualization. The Archive specification that we have developed has a standard directory and file layout, as you have leveraged in your tool:

https://dev.qiime2.org/latest/storing-data/archive/

The exciting thing (to me) is that tools like qax, q2view, or itol can all leverage the archive spec in order to parse the data and provenance for any QIIME 2-produced result.

"not to mention that every release of Qiime2 will produce incompatible artifacts"

This is not true - not only do new releases of QIIME 2 not necessarily produce new Archive Formats, but even if they do, backwards compatibility is always guaranteed! We have almost 25 releases of QIIME 2, and only 6 Archive formats. You can read a QIIME 2 2017.2 Artifact in QIIME 2 2020.8, pretty cool! For a real-world example, check out this filtered feature table from the QIIME 2 docs:

https://view.qiime2.org/provenance/?src=https%3A%2F%2Fdocs.qiime2.org%2F2020.8%2Fdata%2Ftutorials%2Ffiltering%2Ffeature-frequency-filtered-table.qza

It was imported using QIIME 2 2017.9, and then filtered using 2020.8!
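For anyone who wants to check which archive format and framework release a given result was written with, here is a hedged sketch that reads the VERSION file from the zip. It assumes the file sits at <uuid>/VERSION and holds a "QIIME 2" header line followed by "archive: N" and "framework: X" lines; that layout is an assumption here and may differ for very early archive formats. The function name read_versions is hypothetical.

```python
import zipfile


def read_versions(path):
    """Return the key/value pairs from an artifact's VERSION file as strings.

    Assumed file content (illustrative):
        QIIME 2
        archive: 5
        framework: 2020.8.0
    """
    with zipfile.ZipFile(path) as archive:
        # The VERSION file is expected directly under the <uuid>/ root directory.
        name = next(
            n for n in archive.namelist()
            if n.count("/") == 1 and n.endswith("/VERSION")
        )
        lines = archive.read(name).decode("utf-8").splitlines()

    versions = {}
    # Skip the first line, which is the "QIIME 2" header rather than a key/value pair.
    for line in lines[1:]:
        key, sep, value = line.partition(":")
        if sep:
            versions[key.strip()] = value.strip()
    return versions
```

Comparing the returned archive value across artifacts from different releases is one quick way to see how rarely the archive format itself changes.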

"100X times faster than Qiime2"

You might get some questions about this - since QIIME 2 is an ecosystem of plugins and interfaces, you might want to qualify (and quantify) what you mean by "faster" in this case. Faster than qiime tools peek?

Also, just a minor point, the project is called "QIIME 2" not "Qiime2," as you have written in the README. As well, a citation or a link to the QIIME 2 project might be helpful, too, that way users will be able to get an idea of how your project fits into the broader ecosystem.

If you're open to it, I would be happy to open a PR against the qax repo to start suggesting or providing some edits for the points raised above - just let me know!

Thanks again for sharing! This would be really great to get published up on the QIIME 2 Library soon, I will follow up with more details in the next few months!


Sam_Degregori · March 25, 2024, 9:58pm

Wondering if anything like this has been added yet? I know this would be for more advanced users who should be using the Python API anyway, but I would love some commands that let you bypass the table.qzv export steps, as these can get cumbersome over time.