Hey all,
I am fairly new to Qiime2 and feel like I am struggling a bit to transition to the Qiime way of doing things. Hopefully this doesn't come across as overly whiny, but relative to other bioinformatics tools, Qiime feels unusually black-box-y and unintuitive to me. Among other things, I think it's because 1) all files use the same two extensions (.qza and .qzv) and are zipped by default, 2) you frequently need to open files in a browser to get a sense of their contents, 3) files use somewhat cryptic semantic types, and 4) it feels tricky to figure out how to go from A to B in an analysis once you're ready to go beyond what's covered in the Moving Pictures tutorial. Note that I completely understand there are good and valid reasons for the above; I'm just describing the net effect on a (or at least this) new user.
Every couple of days, I'm left with browser windows that look like this, as I try to piece together information from across the forum:
For context, I'm aware of the following resources and have read, skimmed, or watched all of them:
- The Moving Pictures tutorial.
- The 2022 and 2024 ISB tutorials on YouTube.
- The current "amplicon docs", including the References sections.
- The "Using Qiime 2" docs.
- The old Qiime2 docs.
- The CLI --help outputs.
In the above, I can't seem to find the following kinds of information, which would be really helpful to me:
1. For any given artifact class (e.g. FeatureTable[Frequency]), illustrations of what the data usually looks like. I'm envisioning something like the Data Schema/Data format descriptions you can get in the UCSC Table Browser. Perhaps you could also include links to a few minimal examples that could be viewed in view.qiime2.org, like in the Moving Pictures tutorial.
2. For any given artifact class (e.g. FeatureTable[Frequency]), the set of qiime commands that can manipulate that class.
3. Basically, the inverse of 2. For any given qiime (or qiime plugin) command, the list of available classes it can process (i.e. its available inputs and outputs), with links to example data (e.g. developed as part of 1.) illustrating what the data looks like before and after being processed by said command. Ideally, there would also be links to the relevant paper describing the method behind the command and/or links to high-quality summaries on the forum of what the command is doing.
4. Some sort of (possibly AI-powered) graph/network-like utility that can map out how to get from A to B in an analysis. Put another way, a sort of interactive flowchart for the entire Qiime action/object universe, constrained by the data manipulations that are possible in Qiime2. I'm thinking of something along the lines of a neo4j graph database. Ideally, this utility could help answer questions like: given a FeatureTable[Frequency], what is the shortest path (and corresponding commands) for obtaining a FeatureData[Taxonomy]?
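To make the last idea concrete, here is a minimal sketch of the kind of lookup I have in mind: a breadth-first search over a tiny, hand-written graph of semantic types. The edge list is an assumption written for illustration, not something generated from Qiime2, and it is deliberately simplified: each edge ignores any additional inputs the action needs (for example, a classifier or a taxonomy).

```python
from collections import deque

# Hand-written, simplified edges: (input type, action, output type).
# ASSUMPTION: illustrative only; extra required inputs of each action are ignored.
EDGES = [
    ("SampleData[SequencesWithQuality]", "dada2 denoise-single", "FeatureTable[Frequency]"),
    ("SampleData[SequencesWithQuality]", "dada2 denoise-single", "FeatureData[Sequence]"),
    ("FeatureData[Sequence]", "feature-classifier classify-sklearn", "FeatureData[Taxonomy]"),
    ("FeatureTable[Frequency]", "taxa collapse", "FeatureTable[Frequency]"),
]


def shortest_path(start, goal, edges=EDGES):
    """Breadth-first search; returns a list of (action, output type) steps, or None."""
    neighbours = {}
    for src, action, dst in edges:
        neighbours.setdefault(src, []).append((action, dst))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, steps = queue.popleft()
        if node == goal:
            return steps
        for action, dst in neighbours.get(node, []):
            if dst not in seen:
                seen.add(dst)
                queue.append((dst, steps + [(action, dst)]))
    return None  # no route in this graph


if __name__ == "__main__":
    for start in ("SampleData[SequencesWithQuality]", "FeatureTable[Frequency]"):
        print(start, "->", shortest_path(start, "FeatureData[Taxonomy]"))
```

In this toy graph the search finds a two-step route from the demultiplexed sequences to a taxonomy, and returns `None` when starting from the feature table. A "no route from here" answer would already be useful feedback for a new user.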
For a more concrete example (which in part prompted me to post this), I processed a dataset through to ANCOM-BC following the Moving Pictures tutorial, and produced some differential abundance outputs like this.
I also classified the features with Greengenes2. However, I can't map the features in the differential abundance plot to their taxa because the Feature IDs don't match, and I don't know which transformation I should apply (or shouldn't have applied) to get from A to B here.
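The lookup in question can be sketched as a plain dictionary join on Feature ID. This assumes both outputs have already been exported to text and read into Python; the function name, IDs, and taxon strings below are made up for illustration. Listing the IDs that fail to match at least puts the two ID sets side by side.

```python
def annotate(feature_ids, taxonomy):
    """Split feature IDs into those found in the taxonomy and those missing from it."""
    matched = {fid: taxonomy[fid] for fid in feature_ids if fid in taxonomy}
    unmatched = [fid for fid in feature_ids if fid not in taxonomy]
    return matched, unmatched


# ASSUMPTION: hypothetical IDs and taxa, not taken from any real dataset.
taxonomy = {
    "feat-001": "d__Bacteria; p__Firmicutes",
    "feat-002": "d__Bacteria; p__Bacteroidota",
}
da_ids = ["feat-001", "feat-999"]  # IDs from the differential abundance output

matched, unmatched = annotate(da_ids, taxonomy)
print(f"{len(matched)} matched, {len(unmatched)} unmatched: {unmatched}")
```

If every ID ends up in `unmatched`, the two outputs are keyed on different identifiers, which is the situation described above.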
If I had a tool that showed me where I was in the Qiime2 pipeline universe and I could look at the inputs and outputs of nearby "nodes", I could potentially figure out what commands to run, but as it is, my main option is to bug you fine folks for help.
Aside from the above, I haven't been able to find information about the following:
- I know how to view the provenance of a file, but is there a way to output the actual Qiime2 CLI commands that were run to generate it?
- Is there any way to have Qiime2 automatically add a default suffix or prefix to the files it outputs for any given command? That would save me the trouble of naming all my files manually, and it would let me see at a glance how a file was processed. Perhaps once the filename gets long enough, the earliest portions could be reversibly hashed or something.
- I do most of my work on a remote server, and it's kind of a nuisance to have to transfer Qiime files to my local machine to view them in a browser. As it is, zipped files (and tons of other data types) can be viewed with VisiData, but I was wondering if you have any other suggestions for dealing with this, and/or whether there is any work in progress to make more data viewable on the command line?
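As a stopgap for the first and last points, a rough sketch like the one below can be run on the server with nothing but the Python standard library. It relies on assumptions about the archive layout: that a .qza/.qzv is a zip file with a single top-level directory containing `metadata.yaml`, and that provenance is stored in files whose paths end in `action/action.yaml` with `plugin:` and `action:` lines. The function name `peek` and the regex-based parsing are my own choices; this lists the recorded actions in no particular order and does not reconstruct the actual CLI commands.

```python
import re
import zipfile


def peek(qza):
    """Return (metadata.yaml text, sorted list of actions found in provenance)."""
    with zipfile.ZipFile(qza) as zf:
        names = zf.namelist()
        root = names[0].split("/")[0]  # ASSUMPTION: one top-level directory
        metadata = zf.read(f"{root}/metadata.yaml").decode()
        steps = set()
        for name in names:
            if not name.endswith("action/action.yaml"):
                continue
            text = zf.read(name).decode()
            # Indented keys only, so the top-level "action:" block header is skipped.
            plugin = re.search(r"^[ \t]+plugin:.*plugins:([\w-]+)", text, re.M)
            action = re.search(r"^[ \t]+action:[ \t]*(\S+)", text, re.M)
            kind = re.search(r"^[ \t]+type:[ \t]*(\S+)", text, re.M)
            if plugin and action:
                steps.add(f"{plugin.group(1)} {action.group(1)}")
            elif kind:
                steps.add(kind.group(1))  # e.g. a step with no plugin, such as an import
    return metadata, sorted(steps)
```

Usage would be along the lines of `meta, steps = peek("table.qza")` followed by `print(meta)` and `print("\n".join(steps))`, where `table.qza` is a placeholder filename.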
Ok, I think that's it. Looking forward to your feedback!