I have some questions about feature-volatility and its requirements and limitations. I re-ran the same dataset several times and noticed that I don't get the same features, the same ordering of features, the same number of features, or the same feature-importance values from one run to the next, even though the input data is identical. Assuming this is not the expected behavior of feature-volatility, I suspect my dataset is too small, but I would like to understand the issue better so I can design a better study next time and know when I should and shouldn't use feature-volatility.
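For what it's worth, here is a small sketch of the kind of run-to-run variation I mean. This is not the actual feature-volatility internals; it just uses a scikit-learn `RandomForestRegressor` (my assumption about the type of supervised estimator involved) on synthetic data to show that ensemble feature importances shift when the random seed is not fixed:

```python
# Hedged illustration: ensemble regressors produce slightly different
# feature importances on identical data when the random seed varies.
# The data and estimator here are made up for demonstration only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(90, 20))           # 90 samples, like my dataset
y = 2 * X[:, 0] + rng.normal(size=90)   # feature 0 is truly informative

importances = []
for seed in (1, 2, 3):
    model = RandomForestRegressor(n_estimators=50, random_state=seed)
    model.fit(X, y)                     # same X, y every time
    importances.append(model.feature_importances_)

# With different seeds, the importance values (and sometimes the
# ranking of weaker features) differ across otherwise identical runs.
print(importances[0][:5])
print(importances[1][:5])
```

If this is roughly what is happening inside feature-volatility, it would explain why my three runs disagree, and why a small dataset might make the disagreement worse.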
My current dataset consists of samples from 30 animals collected at 3 time points, with the animals split into 4 treatment groups (2 groups of 8 and 2 groups of 7), for a total of 90 samples. Is this an insufficient number of samples/time points for feature-volatility? Is the key limitation the total number of animals, the number of time points, or possibly the magnitude of the differences in taxa that change between time points? With a small number of samples and time points, is it better to use `--p-parameter-tuning` or to turn it off, or does it not matter much?
I've attached 3 example outputs from running the same data with the same command for reference. I've also tried ANCOM on each of my 3 time points separately (see Best approaches/model for longitudinal differences in taxa abundance?), but that method finds no taxa that are significantly different between my treatment groups.
Any advice or suggestions would be great. Thanks!
The command I used to generate these results was:
```
qiime longitudinal feature-volatility \
  --i-table feature-table-genus-collapse.qza \
  --m-metadata-file metadata_for_q2_anon.txt \
  --p-state-column Day \
  --p-individual-id-column Animal_ID \
  --p-parameter-tuning \
  --verbose \
  --output-dir volatility-genus
```