Generating and analyzing mock communities

Hi @micro_guy ,

Great questions.

You are starting the in silico analysis at the wrong point:

If you are trying to compare the observed vs. theoretical composition (the goal of evaluate-composition), then you should start with a feature table of the expected taxonomic composition, using the taxonomic labels from the same reference database that you use to classify your observed sequences. To build this table, take the list of species in your mock community (Zymo standard), find how these species are annotated in your reference database (SILVA 138), and create a table of their relative abundances in the mock community (example).
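As a rough illustration of that table-building step, here is a minimal Python sketch. Everything specific in it is an assumption for demonstration only: the taxonomy strings are placeholders that you would need to replace with the exact labels found in your copy of SILVA 138, the abundance values are made up (they are not the Zymo standard's real composition), and the `normalize` and `to_tsv` helpers and the `mock-expected` sample ID are hypothetical names. It only writes a tab-separated table; importing that table into QIIME 2 is a separate step not shown here.

```python
import csv
import io


def normalize(abundances):
    """Scale raw abundances so that they sum to 1 (relative abundances)."""
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    return {taxon: value / total for taxon, value in abundances.items()}


def to_tsv(relative, sample_id):
    """Render a one-sample table as tab-separated text, one taxon per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["#OTU ID", sample_id])
    for taxon in sorted(relative):
        writer.writerow([taxon, f"{relative[taxon]:.4f}"])
    return buffer.getvalue()


# Placeholder taxonomy labels and made-up abundances (NOT real Zymo values).
# Replace the keys with the exact strings used in your reference database.
raw_expected = {
    "d__Bacteria; p__Firmicutes; c__Bacilli; o__Bacillales; f__Bacillaceae; g__Bacillus": 40,
    "d__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Enterococcaceae; g__Enterococcus": 35,
    "d__Bacteria; p__Firmicutes; c__Bacilli; o__Staphylococcales; f__Staphylococcaceae; g__Staphylococcus": 25,
}

expected = normalize(raw_expected)
table_tsv = to_tsv(expected, "mock-expected")
print(table_tsv)
```

The key point is that the row labels must match, character for character, the taxonomy strings your classifier will assign to the observed sequences; otherwise the expected and observed taxa will not line up in the comparison.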

This tutorial is a bit old, but shows what this process should look like (though it does not show how the expected composition table should be constructed):

Starting from the expected composition table is a more direct way to get from point A to point B. Synthesizing a FASTQ only to dereplicate it, etc., adds a few unnecessary steps, which might also distort the actual expected abundances.

Good luck!
