Input file names when viewing provenance

jpetteng · January 3, 2018, 3:33pm

Is there a way to see the actual file names that were the inputs for the various steps shown in the provenance chain when visualizing a .qza or .qzv object?

For example, I have a file taxa-bar-plots.qzv and I would like to make sure that the input files were all prefixed with 300bp, since I also created files with 400bp prefixes to see the influence of --p-trim-length on the deblur output before moving on to downstream steps.

The provenance chain just lists the steps but not the input files as far as I can tell.

Thanks!
jamie

thermokarst · January 4, 2018, 3:25pm

Hi @jpetteng!

The filename isn't recorded as part of provenance. I will ask @ebolyen to provide some detail as to why, but as I understand it, it boils down to the fact that filenames are mutable, while something like an artifact's UUID isn't. @ebolyen, is there a technical limitation or hurdle here, too?
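To illustrate why the UUID is the stable identifier: as far as I know, a .qza/.qzv file is a zip archive whose top-level directory is named after the result's UUID, and that UUID is also written to a metadata.yaml file inside it, so renaming the file on disk leaves it untouched. Below is a minimal standard-library sketch that reads the UUID straight from the archive. It assumes that layout (`<uuid>/metadata.yaml` containing a `uuid:` line), and `read_uuid` is a hypothetical helper, not part of the QIIME 2 API.

```python
import zipfile


def read_uuid(path):
    """Return the UUID recorded inside a .qza/.qzv archive.

    Assumed layout (not an official API): <uuid>/metadata.yaml,
    containing a line of the form 'uuid: <value>'.
    """
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            parts = name.split('/')
            # Only the top-level metadata.yaml, not the copies under provenance/
            if len(parts) == 2 and parts[1] == 'metadata.yaml':
                for line in zf.read(name).decode('utf-8').splitlines():
                    if line.startswith('uuid:'):
                        return line.split(':', 1)[1].strip()
    raise ValueError(f'no top-level metadata.yaml with a uuid in {path}')
```

Because the UUID is read from the archive's contents rather than its name, `read_uuid` returns the same value before and after a file is renamed.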

Here is a simple python script to demonstrate how to get a mapping of Artifact UUIDs to filenames:

```python
import pathlib
import pprint

import qiime2

# Recursively find every .qza/.qzv file under the current directory
artifacts = pathlib.Path('.').glob('**/*.qz*')
# Artifact.peek reads only the archive's metadata (UUID, type, format), not its data
artifact_map = {qiime2.Artifact.peek(str(a)).uuid: str(a) for a in artifacts}
pprint.pprint(artifact_map)
```

```
{'00d4b3d4-a036-4d99-8a3d-25866fe519dd': 'beta-rarefaction.qzv',
 '158f42e6-0576-4ba6-9b7b-7ebf572a0b5b': 'alpha-rare-without-spaces.qzv',
 '2ecc209a-1e9f-4e2f-a417-c0e01c313170': 'table.qza',
 '45158415-22ed-4601-9373-f461042311f6': 'alpha-rare-with-spaces.qzv'}
```

That still requires a bit of manual work to identify which file is which, though. We still have plenty of plans in the provenance (and citation) departments, so stay tuned!
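Going one step further, a UUID-to-filename mapping like the one above could be cross-referenced against the UUIDs recorded in a visualization's provenance to list which files on disk fed into it. The sketch below uses only the standard library and rests on assumptions about the archive layout, not on a documented API: a .qza/.qzv is a zip archive whose top-level directory is named after its UUID, with a `metadata.yaml` containing a `uuid:` line, and every ancestor result is stored under `<uuid>/provenance/artifacts/<ancestor-uuid>/`. The function names (`read_uuid`, `provenance_uuids`, `inputs_on_disk`) are hypothetical helpers. It is worth checking the layout against your own files (e.g. with `unzip -l taxa-bar-plots.qzv`) before relying on it.

```python
import pathlib
import zipfile


def read_uuid(path):
    """UUID from the archive's top-level <uuid>/metadata.yaml (assumed layout)."""
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            parts = name.split('/')
            if len(parts) == 2 and parts[1] == 'metadata.yaml':
                for line in zf.read(name).decode('utf-8').splitlines():
                    if line.startswith('uuid:'):
                        return line.split(':', 1)[1].strip()
    raise ValueError(f'no uuid found in {path}')


def provenance_uuids(path):
    """UUIDs of all ancestors stored under <uuid>/provenance/artifacts/ (assumed layout)."""
    uuids = set()
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            parts = name.split('/')
            # e.g. <root>/provenance/artifacts/<ancestor-uuid>/metadata.yaml
            if len(parts) > 3 and parts[1:3] == ['provenance', 'artifacts'] and parts[3]:
                uuids.add(parts[3])
    return uuids


def inputs_on_disk(result_path, search_dir='.'):
    """Map ancestor UUIDs of result_path to .qza/.qzv files found under search_dir."""
    ancestors = provenance_uuids(result_path)
    found = {}
    for candidate in pathlib.Path(search_dir).glob('**/*.qz*'):
        if candidate.suffix not in ('.qza', '.qzv'):
            continue  # skip anything that merely starts with .qz
        uuid = read_uuid(candidate)
        if uuid in ancestors:
            found[uuid] = str(candidate)
    return found
```

Calling `inputs_on_disk('taxa-bar-plots.qzv')` would then return only the files whose UUIDs appear somewhere in that visualization's provenance, so checking whether every returned path starts with 300bp answers the original question. Files that were deleted or overwritten after being used cannot be matched, because only archives still on disk are scanned.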

system · February 4, 2018, 9:39pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.