Is there a SemanticType for plain text files?

cduvallet · January 29, 2018, 1:25am

I’m starting to develop a plugin to percentile-normalize case-control data, which will hopefully be published soon (the preprint is on bioRxiv).

As currently written, the method requires as inputs an OTU table and lists of case and control samples. The lists of samples are currently provided as plain text files with one sample per line: one file for cases and one for controls.
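For reference, here is a minimal sketch of reading the current input format (one sample ID per line) in Python. The function name `read_sample_ids` and the file names `cases.txt` / `controls.txt` are hypothetical and are not part of any QIIME 2 API.

```python
from pathlib import Path


def read_sample_ids(path):
    """Return the sample IDs in a plain text file, one ID per line.

    Surrounding whitespace is stripped and blank lines are skipped.
    """
    sample_ids = []
    for line in Path(path).read_text().splitlines():
        sample_id = line.strip()
        if sample_id:  # ignore blank lines
            sample_ids.append(sample_id)
    return sample_ids


# Hypothetical usage with one file per group:
# cases = read_sample_ids("cases.txt")
# controls = read_sample_ids("controls.txt")
```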

How do I declare plain text files in the plugin setup? Or is there a different approach you think would be better for passing this information to the plugin? Happy to discuss.

Thanks!

Nicholas_Bokulich · January 29, 2018, 1:40am

Hi @cduvallet,
Thanks for posting!

Others may have different opinions on this (and/or provide much better explanations), but information about each sample (e.g., case/control status) strikes me as sample metadata.

I would recommend putting this information in a metadata file. That file would list all samples, with one row per sample (the row header is the sample ID) and one column per metadata variable (e.g., case/control status). So instead of passing two files of sample IDs (one for cases, one for controls), pass a single metadata column as a CategoricalMetadataColumn (users provide it as a plain text metadata file; it does not need to be imported as an artifact). Users will then supply the metadata file and the name of the column that holds the relevant information (case/control status). Metadata can be converted to a pandas DataFrame (and a single metadata column to a pandas Series) for easy manipulation inside the plugin method, e.g., to split the samples into separate lists of case and control IDs to pass to your method.
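The splitting step described above can be sketched as follows. This assumes the categorical column has already been turned into a pandas Series indexed by sample ID (for example via the column's `to_series()` method); the function itself only relies on `.items()`, so a plain dict works too. The function name and the default labels `"case"` and `"control"` are hypothetical; in a real plugin the labels would likely be user-supplied parameters.

```python
def split_case_control(status, case_label="case", control_label="control"):
    """Split sample IDs into case and control lists.

    `status` is any mapping-like object whose .items() yields
    (sample_id, label) pairs, e.g. a pandas Series indexed by sample ID.
    Samples whose label matches neither value (including missing values)
    are left out of both lists.
    """
    cases, controls = [], []
    for sample_id, label in status.items():
        if label == case_label:
            cases.append(sample_id)
        elif label == control_label:
            controls.append(sample_id)
    return cases, controls
```

For example, `split_case_control(column.to_series(), "CD", "healthy")` would return the two ID lists for hypothetical labels `CD` and `healthy`. Ignoring unlabeled samples rather than raising an error is a design choice; a stricter plugin might instead report samples with missing status.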

Lots of qiime2 plugins handle metadata and can be used as a template, but I think that q2-diversity might give a good example of how to declare CategoricalMetadataColumn (and other metadata) inputs in a plugin.

I hope that helps!

cduvallet · January 29, 2018, 3:09pm

Thanks @Nicholas_Bokulich! That’s probably what we should have done in the first place, so I’ll implement it the way you suggested.

I’m having trouble playing around with the metadata, however: I’m getting an ImportError when I try to import MetadataColumn. I think this might be related to a very recently closed issue (https://github.com/qiime2/qiime2/issues/326), where it seems that the name was just changed from MetadataCategory to MetadataColumn.

I just installed the QIIME 2 Core 2017.12 distribution tonight. What is the best way to update qiime2 to get the very latest changes?

Nicholas_Bokulich · January 29, 2018, 3:34pm

Hi @cduvallet,

Yes, you are correct: the advice I gave you applies to the dev version, where some very big changes to metadata are in the works for the next release.

You can install the dev version into a separate conda environment. Follow the install instructions here, but download the dev version of the environment file that is appropriate for your system instead. For example:

wget https://raw.githubusercontent.com/qiime2/environment-files/master/latest/staging/qiime2-latest-py35-linux-conda.yml
conda env create -n qiime2-2017.12.0-dev --file qiime2-latest-py35-linux-conda.yml

(These instructions are for Linux; to install on macOS, swap in the macOS environment file from the link I provided.)

I hope that helps!

cduvallet · January 29, 2018, 4:03pm

Sure thing, thanks!

FYI for posterity: using wget on the regular GitHub page link for the file returns the HTML page rather than the YAML file, which conda env cannot read. You can either open the raw file in your browser and copy/paste the text into a file (e.g., using a text editor), or use wget directly on the raw text link:

wget https://raw.githubusercontent.com/qiime2/environment-files/master/latest/staging/qiime2-latest-py35-linux-conda.yml
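The difference between the two links can be sketched in Python: a file's GitHub page URL (`https://github.com/<owner>/<repo>/blob/<ref>/<path>`) maps to a raw URL on `raw.githubusercontent.com` with the `blob` segment dropped. This is a minimal sketch assuming that standard layout; the function name `github_blob_to_raw` is hypothetical, and the `blob` URL used in the example is constructed for illustration from the raw link above.

```python
from urllib.parse import urlparse


def github_blob_to_raw(url):
    """Convert a github.com 'blob' page URL into its raw.githubusercontent.com URL.

    Assumes the layout https://github.com/<owner>/<repo>/blob/<ref>/<path>.
    URLs that already point at raw.githubusercontent.com are returned unchanged.
    """
    parsed = urlparse(url)
    if parsed.netloc == "raw.githubusercontent.com":
        return url
    parts = parsed.path.strip("/").split("/")
    if parsed.netloc != "github.com" or len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    owner, repo, _, ref = parts[:4]  # parts[2] is the literal "blob"
    path = "/".join(parts[4:])
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{path}"


print(github_blob_to_raw(
    "https://github.com/qiime2/environment-files/blob/master/"
    "latest/staging/qiime2-latest-py35-linux-conda.yml"
))
```

Note that this sketch treats the fourth path segment as the ref, so a branch or tag name containing a slash would be split incorrectly.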