Hi,
I am working on a new filtering method to remove spurious taxa, and I want to validate it by applying it to mock community data. I am looking for a dataset where I know how many taxa are true signal (true taxa) versus noise (contaminants,…). Any help finding such a dataset would be appreciated.
Welcome to the forum @Elle!
Sounds useful! I hope you develop a QIIME 2 plugin for your method too.
I think I have just the thing. A few years ago (pre-QIIME 2!) I put together a database of mock communities: mockrobiota (caporaso-lab/mockrobiota on GitHub), a public resource for microbiome bioinformatics benchmarking using artificially constructed (i.e., mock) communities.
Most of the ~30 communities in there were designed to look like gut samples (other compositions would be useful to add if you have some to contribute!). There are no mock vaginal communities, unfortunately, but maybe these could serve as a starting point for your tests? Expected compositions (with different databases) are in there.
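In case it helps with the validation step, here is a minimal sketch of how you could score a filter against a mock community's expected composition. It treats taxa as plain strings and counts true positives, false positives, and false negatives. The function name `score_filter` and the toy genus names are hypothetical, made up for illustration; they are not taken from mockrobiota, and you would need to load the expected and observed taxa from your own files.

```python
def score_filter(observed_taxa, expected_taxa):
    """Compare observed taxa against the known (expected) mock composition.

    Both arguments are iterables of taxon labels (strings). Labels must use
    the same naming scheme and taxonomic level to be comparable.
    """
    observed = set(observed_taxa)
    expected = set(expected_taxa)

    true_pos = observed & expected    # expected taxa that were detected
    false_pos = observed - expected   # spurious taxa (noise, contaminants)
    false_neg = expected - observed   # expected taxa that were lost

    precision = len(true_pos) / len(observed) if observed else 0.0
    recall = len(true_pos) / len(expected) if expected else 0.0
    return {
        "true_pos": len(true_pos),
        "false_pos": len(false_pos),
        "false_neg": len(false_neg),
        "precision": precision,
        "recall": recall,
    }


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    expected = ["Bacteroides", "Lactobacillus", "Escherichia"]
    before = ["Bacteroides", "Lactobacillus", "Escherichia",
              "Ralstonia", "Pseudomonas"]
    after = ["Bacteroides", "Lactobacillus", "Ralstonia"]

    print("before filtering:", score_filter(before, expected))
    print("after filtering: ", score_filter(after, expected))
```

A good filter should raise precision (fewer spurious taxa) without lowering recall (no true taxa lost), so comparing the two dictionaries before and after filtering gives a quick sanity check.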
I hope that's useful to you.
Thanks so much! Either gut or vaginal works. I just skimmed through it and it is very helpful. I will look into it in more detail and ask you any questions I might have. I will also go through your paper; it is interesting. Thank you for your help.