Hello all!
Firstly, I'd like to thank everyone who puts so much effort into making this forum so interactive and informative. I started a few days ago without any experience running Linux commands, let alone using QIIME 2, but I've finally smoothed out my QIIME 2 pipeline and am now able to process and analyze my samples.
Now for my actual question. I've read that there is no universal consensus on how to choose a sampling depth, and I've gone through a couple of posts here with similar questions. I (kind of) understand that it's a trade-off: a higher depth retains more sequences per sample but drops more samples from the analysis, while a lower depth keeps more samples but discards more sequences.
I guess what everyone dreams of is a lowest-frequency sample that is still close to the maximum. Unfortunately, in my case it's almost exactly the opposite. My samples are bird gut samples collected as a time series (supposedly 5 birds per time point, although some birds have no samples or died before collection, so the number of samples varies between time points). I sequenced 16S V4 amplicons and profiled the microbiome.
So I generated my summary.qzv and looked at the frequency distribution across my samples. The highest frequency is around 124k (most other samples fall between 100k and 120k) and the lowest is 60k (the three lowest are 60k, 89k, and 90k, which, ironically, all came from the same time point).
I'd like to retain all of them if possible, but I've read about cases where people drop samples that are way off from the rest.
So there are two possible scenarios:
A) If I keep the 60k sample, I'd retain 55% of the features but keep 100% of the samples (that time point would have n=5);
or
B) If I drop the 60k sample and set the sampling depth to 89k instead, I'd retain 76% of the features, but one sample would be dropped from downstream analyses (that time point would have n=4 instead of 5, adding to the unevenness in sample numbers across time points).
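To make the trade-off between the two scenarios concrete, here is a minimal Python sketch of the arithmetic. It assumes that rarefying drops every sample below the chosen depth and subsamples every remaining sample to exactly that depth, so the retained total is depth × (number of samples kept). The sample names, time points and frequencies are hypothetical placeholders shaped like the distribution I described, not my real data, and `tradeoff_at_depth` is a hypothetical helper, not part of QIIME 2.

```python
from collections import Counter

# HYPOTHETICAL per-sample frequencies (sequences per sample) and time points.
# Made-up placeholders roughly matching the distribution described above;
# swap in the real values from the summary.qzv sample table.
SAMPLES = {
    "t1_bird1": (124_000, "t1"),
    "t1_bird2": (118_000, "t1"),
    "t1_bird3": (112_000, "t1"),
    "t1_bird4": (108_000, "t1"),
    "t1_bird5": (101_000, "t1"),
    "t2_bird1": (60_000, "t2"),
    "t2_bird2": (89_000, "t2"),
    "t2_bird3": (90_000, "t2"),
    "t2_bird4": (104_000, "t2"),
    "t2_bird5": (115_000, "t2"),
    "t3_bird1": (120_000, "t3"),
    "t3_bird2": (110_000, "t3"),
    "t3_bird3": (106_000, "t3"),
    "t3_bird4": (100_000, "t3"),
}


def tradeoff_at_depth(samples, depth):
    """Hypothetical helper: summarize what rarefying at `depth` would keep.

    Samples with fewer than `depth` sequences are dropped; every remaining
    sample is subsampled to exactly `depth`, so the retained total is
    depth * (number of samples kept).
    """
    total = sum(freq for freq, _ in samples.values())
    kept = [tp for freq, tp in samples.values() if freq >= depth]
    retained_fraction = depth * len(kept) / total
    per_time_point = Counter(kept)
    return retained_fraction, len(kept), per_time_point


# Compare the two candidate depths from Scenarios A and B.
for label, depth in [("A", 60_000), ("B", 89_000)]:
    frac, n_kept, per_tp = tradeoff_at_depth(SAMPLES, depth)
    print(
        f"Scenario {label}: depth={depth}, retained {frac:.1%} of sequences "
        f"in {n_kept}/{len(SAMPLES)} samples; "
        f"per time point: {dict(sorted(per_tp.items()))}"
    )
```

With these placeholder numbers the lower depth keeps all 14 samples but a smaller share of sequences, while the higher depth keeps a larger share but leaves t2 with 4 samples. That is the same pattern as Scenarios A and B, although the exact percentages differ from my real 55% and 76%.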
I've never done an analysis this intensive before, so this is my first time facing this dilemma. I'm leaning towards Scenario B, but I'd like to hear your opinions based on your own experience with the same dilemma.
Thanks a lot and regards!