Longitudinal feature-volatility: same data, different results each time

JessJarett · December 19, 2018, 4:24am

Hi all,

I have some questions about feature-volatility and its requirements and limitations. I ended up re-running a dataset several times and noticed that I don’t get the same features in the same order, the same number of features, or the same numeric values for feature importance, every time that I run the same input data. Assuming that this is not the expected behavior of feature-volatility, I suspect this is because my dataset is too small, but I would like to know more about this issue so I can make a better design next time, and know when I should and shouldn’t use feature-volatility.

My current dataset consists of samples from 30 animals collected over 3 time points, and the animals comprise 4 treatment groups (2 groups of 8, 2 groups of 7), a total of 90 samples. Is this a totally insufficient number of samples/timepoints for feature-volatility? Is the key limitation here the total number of animals, the number of time points, or possibly the magnitude of differences in taxa that change between time points? If I have a small number of samples and timepoints, is it better to use --p-parameter-tuning or turn it off, or does it not matter much?

I’ve attached 3 examples of the same data run with the same command for reference. I’ve also tried ANCOM on each of my 3 time points (see the topic "Best approaches/model for longitudinal differences in taxa abundance?") but I don’t get any taxa that are significantly different between my treatment groups with that method.

Any advice or suggestions would be great. Thanks!

The command I used to generate these results was:

qiime longitudinal feature-volatility --i-table feature-table-genus-collapse.qza --m-metadata-file metadata_for_q2_anon.txt --p-state-column Day --p-individual-id-column Animal_ID --p-parameter-tuning --verbose --output-dir volatility-genus

volatility_plot1.qzv (499.6 KB)
volatility_plot2.qzv (502.5 KB)
volatility_plot3.qzv (416.4 KB)

Nicholas_Bokulich · December 19, 2018, 4:34am

This behavior is expected because there are several random processes in this action. You can use the --p-random-state parameter to make this consistent. The fact that you see wide variation between runs indicates that sample size is probably smaller than you need and/or the temporal signal is not very strong.
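As a rough sketch of what that could look like, here is the command from the first post wrapped in a small helper with --p-random-state added. The helper name `run_volatility`, the seed value, and the output directory names are placeholders made up for illustration; everything else is copied from the original command.

```shell
# Hypothetical helper: re-runs the command from the first post with a fixed seed.
# Usage: run_volatility SEED OUT_DIR
run_volatility() {
  seed="$1"
  out_dir="$2"
  qiime longitudinal feature-volatility \
    --i-table feature-table-genus-collapse.qza \
    --m-metadata-file metadata_for_q2_anon.txt \
    --p-state-column Day \
    --p-individual-id-column Animal_ID \
    --p-parameter-tuning \
    --p-random-state "$seed" \
    --verbose \
    --output-dir "$out_dir"
}

# Example (arbitrary seed, separate directories so the two runs can be compared):
# run_volatility 42 volatility-genus-seed42-a
# run_volatility 42 volatility-genus-seed42-b
```

With the same seed, the two runs should report the same features and importances. A fixed seed only makes a run repeatable; it does not make the underlying signal any stronger.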

As for whether to use --p-parameter-tuning with a small dataset: it does not make a big difference.

Check out Blautia — that's the only feature I see with a somewhat clear difference between Protein groups. Could be worth testing with linear-mixed-effects or pairwise-differences.

Some other test options: use ANCOM directly in R, which will allow you to test a multi-way model, e.g., with Protein, time, and animal as factors. q2-gneiss could also be useful.

JessJarett · December 21, 2018, 8:28pm

Thanks for your help with this! I am trying out the --p-random-state parameter along with more trees to see if I can at least get the same 10-20 taxa every time I run it, even if they're in a slightly different order, which will help me believe the results more.
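A minimal sketch of such a run, assuming "more trees" means the --p-n-estimators parameter of feature-volatility. The tree count, the seed list, the helper name `run_seeds`, and the output directory pattern are all arbitrary placeholders, not recommendations.

```shell
# Placeholder forest size; pick whatever the runtime allows.
N_TREES=1000

# Hypothetical helper: one feature-volatility run per seed given as an argument.
# Usage: run_seeds SEED [SEED ...]
run_seeds() {
  for seed in "$@"; do
    qiime longitudinal feature-volatility \
      --i-table feature-table-genus-collapse.qza \
      --m-metadata-file metadata_for_q2_anon.txt \
      --p-state-column Day \
      --p-individual-id-column Animal_ID \
      --p-parameter-tuning \
      --p-n-estimators "$N_TREES" \
      --p-random-state "$seed" \
      --output-dir "volatility-genus-trees${N_TREES}-seed${seed}"
  done
}

# Example: three runs that differ only in the seed.
# run_seeds 1 2 3
```

Comparing the top taxa across the resulting directories shows whether the same features keep coming up when only the seed changes, which is a different question from whether one fixed seed reproduces itself.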
