Open Mock-communities for 16S biome analysis


I have taxonomies from several databases (SILVA, GTDB, etc.) and I want to check how well each of them performs. My plan is to take several mock communities with known species proportions, run QIIME 2 with each database, and compare the results.

So, first question: could you recommend open resources with such 16S sequences? Specifically, I need the V4 region and human gut bacteria.

Second question: I already took one from the ZymoBIOMICS resource, so I have one FASTQ file with 16S sequences. I then need to extract the V4 region from it. I know how to do this with a FASTA file, but what about FASTQ? Is it possible at all? And is it possible without creating a manifest file (I only need to convert one FASTQ into one QZA, so a manifest seems unnecessary)?

Thank you very much for your attention.


There is a nice database collecting mock communities for amplicon data, mockrobiota.

All the data are at:

Hope it helps!


Looks fantastic! From the description it is exactly what I need, but I couldn't find the raw fastq files there. Could you help me?

Shame on me, but I cannot find the fastqs either...
Let's see if @Nicholas_Bokulich can help us!



Links to the raw data are given in the dataset-metadata.tsv file for each dataset.

All raw fastq data are hosted externally and cannot be stored in that repository (much too large for GitHub).


@Nicholas_Bokulich very helpful, thank you!

The only question is that it seems the "" script is needed to correctly preprocess the fastq files.

As I understand it, this script is from the QIIME 1 environment. Is there an analogous method in the QIIME 2 environment? Or would the best way be to install and use legacy QIIME 1?

Hi @biojack ,

QIIME 1 is by no means needed here... that repository is just that old, and there has been no time to update the usage instructions to demonstrate how to import and process with QIIME 2 (contributions welcome :wink: ).

The importing and demultiplexing instructions in the "moving pictures" tutorial should be the most appropriate for multiplexed data (but this may differ by dataset, as some might already be demultiplexed).


This is just a note for future searchers of mock communities.

I looked at one mock community in the mockrobiota resource (mock-1, to be precise) and want to note that although the metadata says V4, all sequences in the FASTQ are ~100 nt long (instead of the expected ~250-300 nt). That looks insufficient for precise downstream QIIME 2 analysis.
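For anyone who wants to repeat this read-length check on another dataset, here is a minimal Python sketch. It assumes a plain 4-line-per-record FASTQ (optionally gzipped) with no wrapped sequence lines; the function name `read_length_counts` is just an illustrative choice, not part of QIIME 2.

```python
import gzip
from collections import Counter
from pathlib import Path


def read_length_counts(fastq_path):
    """Return a Counter mapping read length -> number of reads.

    Assumption: every record is exactly 4 lines (header, sequence, '+', quality).
    """
    path = Path(fastq_path)
    # Use gzip for .gz files, plain open otherwise.
    opener = gzip.open if path.suffix == ".gz" else open
    counts = Counter()
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record is the sequence
                counts[len(line.strip())] += 1
    return counts
```

Calling `read_length_counts("forward.fastq.gz").most_common(5)` (the file name is a placeholder) lists the five most frequent read lengths, which is usually enough to tell ~100 nt data from ~250 nt data.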

The mock communities are basically numbered according to age (1 being the oldest). That specific community is more than 11 years old and uses 100 nt reads, which were the latest technology at the time...

Not useful for species-level analysis for sure, but we have used it a number of times for at least genus-level analysis (as is quite typical for V4).

You should check out the higher numbers in mockrobiota for newer mock communities (still at least 6 years old) that have longer read lengths, if that is what you are after.


@Nicholas_Bokulich your note was extremely helpful, thanks!

I should also add another note for future investigators. I checked the forward reads of mocks 12-23; they are V4 regions (except mock-17). Mocks 13-15 have, IMO, "so-so" quality profiles, so I would recommend using mocks 16-23 (except 17) for analysis, plus maybe mock-12. They look more or less fine. I also think reads should be trimmed by about 10 nucleotides from the 5' end and truncated to a length of ~225.
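To make the trimming suggestion concrete, here is a small Python sketch of what "trim 10 from the 5' end and truncate at 225" does to a single read. It follows the convention, as I understand it from DADA2, that reads shorter than the truncation length are discarded and the surviving reads end up with length `trunc_len - trim_left`. The function name `trim_and_truncate` is hypothetical; in QIIME 2 the corresponding options would be `--p-trim-left` and `--p-trunc-len` of `qiime dada2 denoise-single`.

```python
def trim_and_truncate(seq, qual, trim_left=10, trunc_len=225):
    """Trim `trim_left` bases from the 5' end and cut the read at `trunc_len`.

    Returns (sequence, quality), or None when the read is shorter than
    `trunc_len` (assumed convention: such reads are dropped).
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    if len(seq) < trunc_len:
        return None
    # Positions are counted on the original read, so the result
    # has length trunc_len - trim_left.
    return seq[trim_left:trunc_len], qual[trim_left:trunc_len]
```

With the defaults above, a 250 nt read comes out 215 nt long, and a 100 nt read (as in mock-1) is dropped entirely.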


@Nicholas_Bokulich, Hi!

I just want to "check the clock", if you don't mind. After running and analyzing mock-23 from the mockrobiota resource, I got acceptable results at the L6 (genus) level: about 90% recognition, reproducible across different taxonomies. But at the L7 (species) level I got only 30% recognition. Do you remember whether that corresponds to your results on this mock? Or could you tell me whether that result is expected? At L7 I got a lot of placeholders (so in fact many clusters were classified only one level up, at L6).

By recognition I mean that species which are expected (present in the theoretical list) also appear in the output of the QIIME pipeline. So it is just the fact of occurrence, without comparing abundances. For example, if there are 20 species theoretically and 10 species from that theoretical list are present in the output, then I say that recognition is 50%, no matter how many false species are also in the QIIME output.
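In code, the "recognition" definition above could be sketched like this (Python; `recognition_rate` and the species names are placeholders, not anything from QIIME 2):

```python
def recognition_rate(expected, observed):
    """Fraction of expected taxa that also occur in the observed taxa.

    Abundances are ignored, and extra (false) observed taxa do not
    change the score.
    """
    expected = set(expected)
    observed = set(observed)
    if not expected:
        raise ValueError("expected taxa list is empty")
    return len(expected & observed) / len(expected)


# The example from the post: 20 expected species, 10 of them found,
# plus some false positives that do not affect the result.
expected = [f"species_{i}" for i in range(20)]
observed = expected[:10] + ["false_1", "false_2", "false_3"]
rate = recognition_rate(expected, observed)  # 0.5
```

Note that taxon labels must be spelled identically in both lists for the comparison to work, so labels from different databases may need harmonizing first.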


Hi @biojack ,
I recommend checking the 3 papers that I suggested to you here:

All of the mockrobiota mock communities were used in at least the first paper, and we did quite a few benchmarks there and in the 2 others, which also describe various metrics for quantitatively assessing the accuracy of (a) methods and (b) databases.

90% genus-level detection sounds about right. 30% species-level sounds low, but it depends on the quality and length of the mock community data so this I cannot remember off-hand...

good luck!


It seems that there is no species-level comparison in the article. The genus level is quite comparable, but for species it is hard to say. But certainly 30% recognition is not a result that anyone would like to see :grinning:

Are you looking at the right one? I just pulled this figure from the article:

and to quote the abstract:

...QIIME 2 meet or exceed the species-level accuracy...

The comparisons are all looking at class to species levels (but with a focus on species in most figures)

EDIT: I see that I also linked a 4th article in my previous post, the original RDP classifier article from 2007. That article indeed only looked at genus level. I was referring to the other 3 linked articles for species-level comparisons and metrics.


I see. It's strange, but somehow I missed that article. It is probably the most useful one.

I looked at the Bray-Curtis values for the 16S mock communities in your article, and it looks like my values fall within the IQR, since I also got a BC of around 0.75 (a totally crap one) at the species level.
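For reference, this is how I understand the Bray-Curtis dissimilarity between an expected and an observed composition: the sum of absolute abundance differences divided by the sum of all abundances. A minimal Python sketch, assuming each composition is a dict mapping taxon label to abundance (the function name and inputs are illustrative):

```python
def bray_curtis(expected, observed):
    """Bray-Curtis dissimilarity between two abundance dicts.

    0.0 means identical compositions; 1.0 means no shared taxa.
    Taxa missing from one dict are treated as abundance 0.
    """
    taxa = set(expected) | set(observed)
    numerator = sum(abs(expected.get(t, 0.0) - observed.get(t, 0.0)) for t in taxa)
    denominator = sum(expected.get(t, 0.0) + observed.get(t, 0.0) for t in taxa)
    if denominator == 0:
        raise ValueError("both compositions are empty")
    return numerator / denominator
```

Because unmatched labels count fully toward the numerator, species-level placeholders in the observed taxonomy push the value up quickly even when the genus-level picture is fine.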

I also noticed that you use the metric I described above (I called it recognition %, and in the article it is called TDR), but the graphs show another metric, TAR. It would be interesting to compare TDR too.

From the article I gathered that the best way to improve results would be to upgrade to a bespoke classifier. But:

  1. If I train a weighted classifier on, for example, human gut samples, would it really be effective on mockrobiota samples? Would it be effective on other mock communities mixed from non-human, non-gut, or rare strains?

  2. Is it possible to improve results just by "playing around" with the parameters of the naive Bayes classifier (changing the confidence threshold or something else)? At least to improve TDR.


Mock communities are by definition artificial, so they are most useful for comparing unweighted methods (as natural weights are not a good representation, but this sort of misses the point of both weighting and mock communities). You should see our other work on the topic (which uses simulated communities) if you are interested in seeing how we compare weighted vs. unweighted classifiers.

Yes. See the extensive parameter sweeps in that article.

We did. This might be shown in the supplement or in some other figures, along with some other metrics.

good luck!


@Nicholas_Bokulich thanks for the advice! I will try to find the additional resources you mentioned. I'm closing the topic.