Changing the metadata file between analysis steps

I have a query regarding the metadata file. It’s more of a conceptual question, and I would like to clear up my fundamental understanding before going into deeper analyses.

The metadata file has categorical and numeric columns, and for some analyses, such as diversity, gneiss, etc., the input only allows categorical or numeric columns (depending on the plugin). Suppose I wish to analyse an attribute (say Nitrogen dose, a column in my metadata file) which I have denoted as ‘categorical’, whereas the particular plugin only accepts ‘numeric’:

a) Can I re-run the analysis by removing the alphabetic characters from all the rows under that attribute and re-labelling the attribute as ‘numeric’? I know the plugin will run, but would my results be inaccurate? I ask because I wish to see how trends appear with respect to the Nitrogen dose.

b) Will this make previous analyses (e.g. rarefaction, diversity, taxonomy etc.) which I ran with ‘Nitrogen dose’ denoted as ‘categorical’ inaccurate?

PS: I denote the rows under ‘Nitrogen dose’ as numerical values by giving only the magnitude of the dose in the metadata file. If I wish to make a value appear categorical, I simply append “ppm” to the magnitude.

The macro-question would be:

Can I make changes to the metadata file during my analyses, when certain scripts/plugins take the metadata in a format different from the one taken by previous scripts/plugins?

Thanks a lot for your help. :slight_smile:
Best regards,

Hi @anirban.mcgill,

Let’s start with the high-level question.

The short answer is yes, you can change the metadata between one step and the next. Currently, QIIME doesn’t track the metadata for you, beyond the column/covariate used. In general, it’s good practice to make sure that you know how the new columns are derived or where they come from. (I say this from spending far too many hours trying to figure out where “disease2” came from.)

I think the validity of this approach depends on the nature of the data and the question you’re trying to answer. In your case in particular, I’m assuming that nitrogen was a continuous measurement over some range and that you have unique or close to unique values for each sample. In this case, adding a string to “cast” the numeric data to categorical will probably not give you the answer you’re looking for, because nearly every sample would end up in its own category; you probably want to find ways to split your data into categories instead. You might find this article about quantiles helpful, but a statistician or domain expert will be in a better position to help you transform your data. (I also recommend keeping two columns, one with the raw values and one with the transformed values, so you can do the analyses side by side.)
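As a rough illustration of the quantile idea, here is a minimal sketch in plain Python that bins a numeric column into quartiles and stores the result in a second column, leaving the raw values untouched. The column names (`nitrogen_dose`, `nitrogen_quartile`), the sample values, and the choice of four bins are all assumptions made up for the example; the right cut points for real data are a judgment call for you or a statistician.

```python
import bisect
import statistics


def add_quantile_column(rows, raw_col, new_col, n_bins=4):
    """Return copies of `rows` with an extra categorical column.

    The new column holds a label ("Q1", "Q2", ...) for the quantile bin
    that each row's numeric `raw_col` value falls into. The raw column
    is kept as-is so both versions can be analysed side by side.
    """
    values = [float(row[raw_col]) for row in rows]
    # n_bins - 1 cut points that split the observed values into n_bins groups
    cuts = statistics.quantiles(values, n=n_bins, method="inclusive")
    binned = []
    for row, value in zip(rows, values):
        # Index of the first cut point >= value, i.e. the bin number (0-based)
        idx = bisect.bisect_left(cuts, value)
        new_row = dict(row)  # copy, so the input rows are not modified
        new_row[new_col] = f"Q{idx + 1}"
        binned.append(new_row)
    return binned


# Hypothetical metadata rows: one dict per sample, values read as strings
samples = [
    {"sample-id": "S1", "nitrogen_dose": "0"},
    {"sample-id": "S2", "nitrogen_dose": "10"},
    {"sample-id": "S3", "nitrogen_dose": "20"},
    {"sample-id": "S4", "nitrogen_dose": "30"},
    {"sample-id": "S5", "nitrogen_dose": "40"},
    {"sample-id": "S6", "nitrogen_dose": "50"},
    {"sample-id": "S7", "nitrogen_dose": "60"},
    {"sample-id": "S8", "nitrogen_dose": "70"},
]

binned = add_quantile_column(samples, "nitrogen_dose", "nitrogen_quartile")
for row in binned:
    print(row["sample-id"], row["nitrogen_dose"], row["nitrogen_quartile"])
```

With both columns in the file, the raw one can go to plugins that want numeric input and the binned one to plugins that want categorical input.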

Re-coding the metadata won’t change most of the base statistical analyses, but you will probably want to re-calculate any of your qzvs that require metadata. Typically, changing your metadata doesn’t require changing the base objects (tree, taxonomy, table, distance matrices) unless it changes the samples or sample IDs.
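One way to check that a re-coded file only changed column values, and not the samples themselves, is to compare the sample IDs of the old and new versions. The sketch below is plain Python, not a QIIME command; the `sample-id` column name and the example rows are assumptions for illustration.

```python
def compare_sample_ids(old_rows, new_rows, id_col="sample-id"):
    """Compare the sample IDs of two metadata versions.

    Returns (same, removed, added): `same` is True when both versions
    contain exactly the same IDs; `removed` and `added` list the IDs
    that are only in the old or only in the new version.
    """
    old_ids = {row[id_col] for row in old_rows}
    new_ids = {row[id_col] for row in new_rows}
    removed = sorted(old_ids - new_ids)
    added = sorted(new_ids - old_ids)
    return old_ids == new_ids, removed, added


# Hypothetical metadata before and after re-coding the dose column
old_metadata = [
    {"sample-id": "S1", "nitrogen_dose": "10ppm"},
    {"sample-id": "S2", "nitrogen_dose": "20ppm"},
]
recoded_metadata = [
    {"sample-id": "S1", "nitrogen_dose": "10"},
    {"sample-id": "S2", "nitrogen_dose": "20"},
]
# A version where a sample was dropped by accident
broken_metadata = [
    {"sample-id": "S1", "nitrogen_dose": "10"},
]

recoded_check = compare_sample_ids(old_metadata, recoded_metadata)
broken_check = compare_sample_ids(old_metadata, broken_metadata)
print(recoded_check)
print(broken_check)
```

If the IDs match, the existing table, tree, taxonomy and distance matrices should still line up with the new file, and only the metadata-dependent visualizations need re-running.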

