Hi @nara,
Welcome to the QIIME 2 Forum!
This is a bit of a different direction than you're describing here, but one data set you could consider is the full gut-to-soil data set, which we have archived under a CC-BY license (meaning you can use it for whatever you want; you just need to cite the original paper) in Zenodo here. Probably the best way to learn about the data and the study is via the gut-to-soil tutorial and the links therein. While this data isn't specifically focused on human gut health or food microbiology, my pitch is that it's related to both: the composted material can support the growth of high-quality, fiber-rich food (e.g., in areas where it is challenging to grow high-quality food due to poor soil quality), and that food in turn can support human gut health. In other words, the data can be used to understand how a material that we are used to thinking of as waste can be cycled to support human health, while also supporting sanitary management of human excrement and environmental sustainability (e.g., through reduced reliance on fertilizers).
One problem, off the top of my head, that could be interesting for a data science PhD is how to align the samples in a time series based on characteristics of the samples (e.g., their phylogenetic composition) rather than strictly on the time when each sample was collected. This could be relevant for studies like this one, where we're trying to understand a microbially driven process through replicated time series data, and the process might take more or less time in different replicates. Here the process is composting, but it could just as well be fermentation, the development of plant-supporting microbial communities in soil, etc. Aligning on composition would make it easier to get insight into the "microbial phases" that occur throughout the process, or to quantify or rank the rates at which the process occurs in different replicates.
For example, in this data you might align the 15 replicates (buckets) based on their phylogenetic composition, and then layer on the point at which E. coli begins to disappear, based on the paired culturing and qPCR data. That could tell you whether there are certain patterns in the composition of the samples when this important step (an indicator of the safety of the material) happens, and whether that pattern is disrupted or missing when E. coli doesn't disappear. This could inform optimization of the process.
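To make the alignment idea a bit more concrete, here's a minimal sketch of one possible starting point: dynamic time warping (DTW) between two buckets' time series. It assumes each sample has already been reduced to an abundance vector over the same features (e.g., exported from a feature table), listed in collection order. This is only an illustration under those assumptions, not an established pipeline for these data. Bray-Curtis is used here as a simple, non-phylogenetic stand-in for a phylogenetic metric such as UniFrac, and the function names and toy profiles are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical input format: one sample's feature abundances,
# with the same feature order for every sample.
Profile = Sequence[float]


def bray_curtis(u: Profile, v: Profile) -> float:
    """Bray-Curtis dissimilarity (0 = identical, 1 = no shared features)."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den > 0 else 0.0


def dtw_align(
    reference: Sequence[Profile], replicate: Sequence[Profile]
) -> Tuple[float, List[Tuple[int, int]]]:
    """Dynamic time warping of two time-ordered sample series.

    Returns the total alignment cost and the warping path as
    (reference_index, replicate_index) pairs.
    """
    n, m = len(reference), len(replicate)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i reference samples
    # with the first j replicate samples
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = bray_curtis(reference[i - 1], replicate[j - 1])
            # a sample may be matched to one or several samples in the other series
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end to recover which samples were matched
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


def aligned_reference_samples(path: List[Tuple[int, int]], replicate_idx: int) -> List[int]:
    """Reference sample indices that DTW matched to one replicate sample."""
    return [i for i, j in path if j == replicate_idx]


if __name__ == "__main__":
    # Toy two-feature profiles (hypothetical): the replicate lingers in each phase longer
    reference = [[10, 0], [5, 5], [0, 10]]
    replicate = [[10, 0], [10, 0], [5, 5], [5, 5], [0, 10]]
    dist, path = dtw_align(reference, replicate)
    print(dist)  # 0.0
    print(path)  # [(0, 0), (0, 1), (1, 2), (1, 3), (2, 4)]
    # If E. coli dropped out at replicate sample 3 (hypothetical),
    # which reference "phase" does that correspond to?
    print(aligned_reference_samples(path, 3))  # [1]
```

DTW only matches samples monotonically in time, which suits a process that moves through its phases in order. A real analysis would probably need to deal with uneven sampling, replicates that skip or never reach a phase, and aligning all 15 buckets jointly rather than pairwise, which is where the interesting data science questions would start.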
We are in the early stages of sequencing metagenomes and metatranscriptomes from a follow-up thermophilic HEC study here, and within about 12-18 months I expect that all of those data will be public as well. That data set will have lots of multi-omics data integration challenges that we won't solve in our initial publications, as well as questions related to quantifying and exploring functional dark matter (i.e., active genes of unknown function) throughout the process. At this stage we have collected all of the samples and are just about to start sequencing, so the caveat is that the data doesn't exist yet, and it's probably a little risky to plan a PhD around. Depending on how early you are in your PhD, though, this could be something to look forward to if you're interested in the system and these problems.
Hope this helps a little, good luck!
Update: I edited this a little after posting to describe a possible data science project; my initial idea was more of a biology project. Note that there is probably existing work in this area, so as always, it's good to start a PhD project with a thorough literature review.