Hello,
Thank you for the addition of ancombc to QIIME2.
I have used the example data provided with the plugin (q2 composition ancombc --example-data PATH) to understand the ancombc results, which are laid out differently from those of ancom(1).
There are a few questions left:
The example data (table.qza and metadata.tsv) do not match perfectly (in the single and multiple examples). Is it true that the 'toe' samples are not in the feature table?
In the multiple formula example, the intercept is defined by --p-reference-levels bodysite::tongue animal::dog, and these groups are listed in the lfc visualization of the differentials artifact (Groups used to define the intercept:). So, I assume that the lfc of the other groups is relative to the reference levels.
In the single formula example, the --p-reference-levels parameter is not necessary (according to the example instructions), but it seems that the intercept is calculated from 'gut'. This sample group is the first one in the metadata.tsv: is this what determines the sample group used for the intercept?
I have tried to use --p-reference-levels bodysite::tongue for the single formula ancombc and observed two effects: First, this parameter is accepted and bodysitetongue is no longer in the lfc table visualization. Second, the lfc intercept statement looks like Groups used to define the intercept: b, o, d, y, s, i, t, e, :, :, t, o, n, g, u, e ; is this just a visualization bug?
@lizgehret : I have posted a question related to q2 composition da-barplot in another post, "error using qiime composition da_barplot". Could you look at this error message and tell me whether it is related to the plugin or to my Linux system?
Thanks for reaching out! Happy to address these questions below:
Yes that is true - those samples are present in the metadata, but not in the table. This metadata file is used for usage examples as well as internal unit testing, so there is some additional data present that's used to validate expected behavior in ancombc.
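If you want to confirm for yourself which metadata samples are absent from a feature table, a quick set comparison is enough. The following is only a minimal sketch: the sample IDs, the tiny inline TSV, and the helper name samples_missing_from_table are hypothetical stand-ins, not the contents of the real example files or any QIIME 2 API.

```python
import csv
import io

# Hypothetical stand-in for metadata.tsv (the real example file has more rows and columns).
METADATA_TSV = "sample-id\tbodysite\nS1\tgut\nS2\ttongue\nS3\ttoe\n"

# Hypothetical set of sample IDs found in the feature table.
TABLE_SAMPLE_IDS = {"S1", "S2"}


def samples_missing_from_table(metadata_tsv, table_ids):
    """Return (sample-id, bodysite) pairs for metadata rows not present in the table."""
    reader = csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t")
    return [
        (row["sample-id"], row["bodysite"])
        for row in reader
        if row["sample-id"] not in table_ids
    ]


missing = samples_missing_from_table(METADATA_TSV, TABLE_SAMPLE_IDS)
print(missing)
```

With real data, you would fill the set of table IDs from your own feature table and read the metadata from the actual file instead of the inline string.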
That's correct. The --p-reference-levels parameter is used to define the intercept.
Correct - the --p-reference-levels parameter is optional for defining the intercept. If it is left blank, the default behavior will set the intercept to the column::value group that comes first in alphabetical order (this is the default behavior from the R method). In the single formula example, gut is selected since it comes first alphabetically within the bodysite column.
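To illustrate the default described above, here is a rough sketch of how a reference level could be picked: use the explicit column::value entry if one is given, otherwise fall back to the alphabetically first value in the column. This is not the plugin's or the R package's actual code; the function name pick_reference_level and the bodysite values are assumptions for illustration, and Python's plain string sort may differ from R's locale-aware ordering in edge cases.

```python
def pick_reference_level(column, values, reference_levels=None):
    """Pick the reference level for one metadata column.

    reference_levels is an optional list of 'column::value' strings
    (hypothetical helper, mirroring the format of --p-reference-levels).
    """
    for entry in reference_levels or []:
        ref_column, _, ref_value = entry.partition("::")
        if ref_column == column:
            return ref_value
    # No explicit reference for this column: fall back to the
    # alphabetically first distinct value.
    return sorted(set(values))[0]


# Hypothetical bodysite values for a handful of samples.
bodysite = ["tongue", "gut", "left palm", "gut", "right palm"]

print(pick_reference_level("bodysite", bodysite))
print(pick_reference_level("bodysite", bodysite, ["bodysite::tongue"]))
```

The first call falls back to gut, and the second returns tongue because it was requested explicitly.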
Since you chose bodysite::tongue, this group is used to define the intercept, which is why it does not show up in the visualization. As for the string splitting you're observing, can you copy/paste the entire command you used to produce your .qzv file, share that qzv file, and share a screenshot of where you're seeing this in QIIME 2 View? That will help me to better identify where this issue is coming from. Thanks!
Thanks for sharing those details! You'll want to wrap the --p-reference-levels value in quotes so that string splitting doesn't occur, like so: 'bodysite::tongue'
I did not develop that visualization, but I will look into the issue you mentioned in your other post - I haven't had time to do so yet, but have time set aside this morning for that. I'll follow up with you in that post with next steps!
Thanks for following up - I was able to replicate this on my end with a single value in the --p-reference-levels parameter. This looks like it is a small bug in the tabulate visualizer, thanks for bringing this to our attention! I'll work on getting this fixed in our current development cycle, so this should be fixed in our 2023.5 release. Thanks for your patience!
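For anyone curious why a single value can end up displayed one character at a time, the sketch below shows the general class of bug in Python: joining a bare string iterates over its characters, while joining a list of strings keeps each value whole. This is only an assumption about the kind of mistake involved, not the actual visualizer code; both function names are hypothetical.

```python
def format_intercept_groups(groups):
    """Join the groups for display; iterates over whatever it is given."""
    return "Groups used to define the intercept: " + ", ".join(groups)


def as_list(value):
    """Wrap a single string in a list so it is not iterated per character."""
    return [value] if isinstance(value, str) else list(value)


single = "bodysite::tongue"

# Passing the bare string splits it into characters.
broken = format_intercept_groups(single)

# Normalising to a list first keeps the value intact.
fixed = format_intercept_groups(as_list(single))

print(broken)
print(fixed)
```

With several reference levels the input is already a list, which would explain why the problem only appears when a single value is supplied.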