Evaluating covariates in Songbird

JoaoGabrielMoraes · October 12, 2020, 8:25pm

Hello,

I am new to Songbird. I am trying to use the standalone version of Songbird to perform a multivariate analyzis on our 16S sequencing data.

Following the tutorial (GitHub - biocore/songbird: Vanilla regression methods for microbiome differential abundance analysis), I first ran an null model using the command below:

##Null Model
songbird multinomial
--input-biom Collapsed-only-horns_no-miss-table-dada2.biom
--metadata-file POSTPARTUM_mapping_sample_key.txt
--formula "1"
--epochs 10000
--differential-prior 0.5
--summary-interval 1
--summary-dir Null_results

And got the following output (first page only):

Then, I ran two extra models. One full model (all covariates included) and another model removing only one covariate.

##Full model example
songbird multinomial
--input-biom Collapsed-only-horns_no-miss-table-dada2.biom
--metadata-file POSTPARTUM_mapping_sample_key.txt
--formula "OBS+Class+TRT+Cow_ID+Site+Cycling"
--epochs 10000
--differential-prior 0.5
--summary-interval 1
--summary-dir results

In the Cross validation graphs, the CV error is much higher than the one shown in the tutorial. Are my results for this first two models within the acceptable range? Any recommendations for improvement?

Based on the pictures above, it doesn't seem to be a big difference between "results_full_formula_10_12_2020" and the "results_no-TRT" (perhaps the latter slightly better).

I also ran a third model excluding an additional covariate. However, the results showing on TensorBoard included only the null model and the last one I ran (the two previous models were not available on Tensorboard). Is there a way to fix this so I can compare multiple models in Tensorboard?

Using Gneiss, it's very easy to check the effect of each covariate using the regression summary output. Is there a equivalente way in Songbird to evaluate the effect of individual covariates included in the model?

@mortonjt - If at all possible, I would really appreciate to hear your thoughts on this!

mortonjt · October 12, 2020, 9:27pm

Hi @JoaoGabrielMoraes, the good news is that you are using Songbird as it was designed to be used

Regarding the cross-validation error, it depends on the dataset. In the tutorial, we used a dataset that was highly predicted by the covariates (something like 70% of the variance is explained by 6 enivironmental factors). So that bar is quite high.

Looking at your CV error, it looks reasonable - your predictions are off by +/- 400 counts, which is expected for categorical predictors. Your error is also much higher in the null model than the other two models, which is a good sign. One thing that is curious though is the dip in the model with no treatment - dips in the CV error are often indicative of overfitting, you may need to play around with the prior.

I'm not sure why all of the models aren't showing in TB. Did you run everything through qiime2 or the standalone? There have been conflicts between qiime2 and TB in the past. But the qiime2 plugin will also record these summary statistics.

JoaoGabrielMoraes · October 12, 2020, 11:32pm

Hi @mortonjt,

Thank you for your prompt reply and feedback! I am glad to know that I am on the right track.

The input biom table was created on Qiime2, and then I used Songbird standalone for the analysis. As a side note, I do get the following error message when running TensorBoard:

W1012 14:44:26.655411 123145334247424 plugin_event_accumulator.py:323] Found more than one graph event per run, or there was a metagraph containing a graph_def, as well as one or more graph events.  Overwriting the graph with the newest event.

When there is more than 3 models (the null and 2 more) the cross validation graphs only present information for 2 runs (null and full model in this case). Please see pictures below. Is there a way of changing the settings to show all runs?

Also, what are you looking for when evaluating the qbeta plots?

Regarding the effect of each covariate in the model, is there a coefficient equivalent to the R2diff (used in Gneiss) to evaluate how each covariate affects the model?

Thanks again, Joao

mortonjt · October 13, 2020, 1:43am

RE the tensorboard runs - I honestly don't know without additional information about the file structure.
If you have all of the *event files, it should load.

There is no R2diff - but in the qiime2 plugin there is a Q2 score which serves the same purpose.

JoaoGabrielMoraes · October 14, 2020, 4:14pm

Thanks @mortonjt!

I do have all the result files in the TensorFlow folder. I do not know why I am getting this error. It does work when there is up to 3 models but I get that error when additional models are compared.

Thank you for pointing out the pseudo-Q2 score. I will also use it when comparing my models.

Thanks again, Joao

JoaoGabrielMoraes · October 14, 2020, 7:21pm

Hi @mortonjt

I have another question that you might be able to help.

When running qurro using the 'differentials.qza' output generated from my full model I get the following message:

863 feature(s) in the BIOM table were not present in the feature rankings.

These feature(s) have been removed from the visualization.

69 sample(s) in the sample metadata file were not present in the BIOM table.

These sample(s) have been removed from the visualization.

Thus, I decided to check the differentials.qza file to see how many taxa were present there. Not surprisingly, only 66 OTUs out of 929 OTUs from my collapsed feature-table were present.

My question to you is why the differentials are not being calculated for the majority of my OTUs?

Thank you, Joao

fedarko · October 15, 2020, 2:38am

Songbird does some filtering on the table before running the regression in order to remove "rare" features and samples. Please see the FAQs for details.

By default, right now samples with < 1,001 counts—and features present in < 11 samples—will be filtered out, although these thresholds are configurable.

Note that these thresholds should be < 1,000 counts and < 10 samples -- we just fixed a bug where the filtering code was off by one. This should be updated in a new version of Songbird soon. If you have a lot of features present in exactly 10 samples, you may want to try installing the latest version of Songbird's code from GitHub (something like pip install git+https://github.com/biocore/songbird.git should work) in order to have the filtering work correctly.

JoaoGabrielMoraes · October 15, 2020, 3:32pm

Hi @fedarko,

I will adjust these parameters and will re-run Songbird.

Thank you, Joao

system · November 15, 2020, 9:32pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.