Sorry in advance for the probably dumb question.
Looking at this thread, Statistical analysis - looking for core microbiom, it is said that qiime feature-table core-features can be used to look for features that are shared between groups within some percentage. Should I first filter the table to only the 2 groups I want to compare (in terms of core microbiome) and then use the core-features function? Also, if the 2 groups have the same number of samples, I guess I need to set a core-feature threshold of about 75% to be sure that those features are really shared between the 2 groups? Or maybe even higher (90%?). Is there any gold standard for this?
Not a dumb question at all and all questions are welcome!
If you are only interested in determining the features shared between 2 groups, then you should, as you suggested, filter your feature table to retain only those 2 groups. The core-features action looks through all the samples in the table and does not take group classification into account.
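To make the filtering step concrete, here is a minimal sketch in plain Python rather than QIIME 2 itself. The sample IDs, group names, and counts are all made up for illustration. It keeps only the samples from the two groups of interest and then drops any feature that is no longer observed in what remains.

```python
# Hypothetical metadata: sample ID -> group label.
metadata = {
    "s1": "control", "s2": "control",
    "s3": "treated", "s4": "treated",
    "s5": "other",
}

# Hypothetical feature table: sample ID -> {feature ID: count}.
counts = {
    "s1": {"featA": 10, "featB": 3, "featC": 0},
    "s2": {"featA": 7, "featB": 0, "featC": 0},
    "s3": {"featA": 4, "featB": 5, "featC": 0},
    "s4": {"featA": 9, "featB": 1, "featC": 0},
    "s5": {"featA": 2, "featB": 0, "featC": 8},
}

# The two groups to compare.
keep_groups = {"control", "treated"}

# Step 1: keep only samples that belong to one of the two groups.
filtered = {s: fc for s, fc in counts.items() if metadata[s] in keep_groups}

# Step 2: drop features that have no observations left after filtering.
observed = {f for fc in filtered.values() for f, c in fc.items() if c > 0}
filtered = {
    s: {f: c for f, c in fc.items() if f in observed}
    for s, fc in filtered.items()
}

print(sorted(filtered))   # remaining sample IDs
print(sorted(observed))   # remaining feature IDs
```

In this toy data, featC is only observed in the "other" group, so it disappears once that group is filtered out.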
As far as I'm aware, there is no consensus on what percentage of samples must share a feature for it to count as part of a core. This is rather subjective and depends on your field, your samples, and the question being asked.
However, just so you know, core-features will actually give you results across a range of these thresholds. For example, you set a min and max fraction (--p-min-fraction and --p-max-fraction), let's say 0.5 (50%) and 1 (100%), and set the number of steps between those two values using --p-steps. You can then see the core microbiome at each threshold from 50% to 100% and choose what you want to consider the 'core'. That way you don't have to rerun the command several times.
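As a rough illustration of what a range of thresholds looks like, here is a minimal sketch in plain Python, not the QIIME 2 implementation. The toy table and the helper names core_features and fraction_steps are made up, and I am assuming a feature counts as core at a given fraction when it is observed in at least that fraction of samples; check the core-features documentation for the exact rule.

```python
# Toy presence/absence data: feature ID -> set of samples where it was observed.
# All IDs are hypothetical.
samples = ["s1", "s2", "s3", "s4"]
table = {
    "featA": {"s1", "s2", "s3", "s4"},  # 4 of 4 samples
    "featB": {"s1", "s2", "s3"},        # 3 of 4 samples
    "featC": {"s1", "s2"},              # 2 of 4 samples
    "featD": {"s1"},                    # 1 of 4 samples
}


def core_features(table, samples, fraction):
    """Return features observed in at least `fraction` of `samples` (assumed rule)."""
    sample_set = set(samples)
    n = len(sample_set)
    return sorted(
        f for f, present in table.items()
        if len(present & sample_set) / n >= fraction
    )


def fraction_steps(min_fraction, max_fraction, steps):
    """Evenly spaced fractions from min to max, inclusive (steps must be >= 2)."""
    width = (max_fraction - min_fraction) / (steps - 1)
    return [round(min_fraction + i * width, 6) for i in range(steps)]


# One core set per threshold, analogous to min 0.5, max 1.0, 3 steps.
core_by_fraction = {
    f: core_features(table, samples, f) for f in fraction_steps(0.5, 1.0, 3)
}

for fraction, features in core_by_fraction.items():
    print(fraction, features)
```

The core shrinks as the threshold rises, which is what lets you pick a cutoff after seeing all of them side by side.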
Hi @Mehrbod_Estaki, thank you for your clarification!
Regarding the first question, my concern is this: suppose I filter my table to the 2 groups I want to include in the core microbiome calculation, and those 2 groups have the same number of samples. If a certain feature is present in 100% of the samples from group 1 and in 0% of the samples from group 2, then with a core-feature threshold of 50% that feature would be considered part of the shared core microbiome of those 2 groups, even though it is present in only one of the two. Hope I managed to explain it clearly…
Hi @alfanon,
That is correct. Technically, a threshold of 50% in that situation could include a feature that is found in only one group, so 50% is probably not sufficient for your goal and you should use higher values. If your groups are so different from each other that they don’t share many taxa, then I would reconsider the usefulness of trying to detect a core microbiome between them. Determining a core is more useful for similar samples/groups.
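The scenario above can be checked with a small sketch in plain Python. Everything here is hypothetical: two groups of 5 made-up samples each, one made-up feature found in every group 1 sample and no group 2 sample, and a helper called prevalence that I wrote for this example. The per-group numbers are only a diagnostic I added to show where the feature sits; they are not something the thread says core-features reports.

```python
# Hypothetical metadata: 5 samples in each of two equally sized groups.
groups = {f"g1_{i}": "group1" for i in range(5)}
groups.update({f"g2_{i}": "group2" for i in range(5)})

# Hypothetical feature: observed in every group1 sample, in no group2 sample.
present_in = {s for s, g in groups.items() if g == "group1"}


def prevalence(present_in, sample_ids):
    """Fraction of `sample_ids` in which the feature was observed."""
    sample_ids = list(sample_ids)
    return sum(s in present_in for s in sample_ids) / len(sample_ids)


# Prevalence over the pooled table, ignoring group labels.
pooled = prevalence(present_in, groups)

# Prevalence within each group, as a diagnostic only.
per_group = {
    g: prevalence(present_in, [s for s, sg in groups.items() if sg == g])
    for g in sorted(set(groups.values()))
}

passes_50 = pooled >= 0.5    # would be called core at a 50% threshold
passes_75 = pooled >= 0.75   # would not be called core at a 75% threshold

print("pooled prevalence:", pooled)
print("per-group prevalence:", per_group)
print("core at 50%:", passes_50, "| core at 75%:", passes_75)
```

With equal group sizes, the pooled prevalence of a feature confined to one group can never exceed 50%, so any threshold above 50% forces it to appear in at least some samples of the other group.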