I'm analysing my data using the SILVA reference database, and I'm just wondering... When I open the taxonomy and rep_set files for the 94% and 97% OTU similarity thresholds and compare some taxa, I cannot find any differences. Does anyone know how these files are constructed? What do the sequences inside these files represent?
Hi @Dzana_Basic,
These files contain the representative sequences (cluster centroids) for SILVA reference sequences clustered at different % identity thresholds (94% and 97%).
So these files should contain different sequence IDs and different numbers of sequences. However, many of the same centroid sequences may be present in both files, which may explain the redundancy you have found.
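If you want to check this directly, a quick comparison of the sequence IDs in the two rep_set FASTA files shows how many centroids the files share and how many are unique to each. This is a minimal sketch, assuming the rep_set files are plain FASTA where the ID is the first whitespace-separated token after `>`; the file paths are whatever you pass on the command line, and the function names are just illustrative.

```python
import sys


def read_fasta_ids(lines):
    """Return sequence IDs (first token after '>') from FASTA-formatted lines."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:  # skip malformed empty headers
                ids.append(parts[0])
    return ids


def compare_rep_sets(ids_a, ids_b):
    """Summarise overlap between two collections of sequence IDs."""
    a, b = set(ids_a), set(ids_b)
    return {
        "count_a": len(a),
        "count_b": len(b),
        "shared": len(a & b),
        "only_a": len(a - b),
        "only_b": len(b - a),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_rep_sets.py <94% rep_set FASTA> <97% rep_set FASTA>
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        summary = compare_rep_sets(read_fasta_ids(fa), read_fasta_ids(fb))
    for key, value in summary.items():
        print(f"{key}: {value}")
```

If `only_a` and `only_b` are both zero and the counts match, the two rep_set files contain the same centroids; a higher threshold would normally be expected to yield more centroids, not the same set.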
Does this explain the similarities that you see, or are the files literally replicates of each other?
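To see whether the taxonomy files differ for the taxa you compared, you can check, for every ID present in both files, whether the assigned lineage string is identical. This is a sketch, assuming each taxonomy file has two tab-separated columns (sequence ID, then lineage), as in the QIIME-formatted SILVA releases; the function names are hypothetical.

```python
import sys


def read_taxonomy(lines):
    """Map sequence ID -> lineage string from a two-column, tab-separated file."""
    taxonomy = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        seq_id, _, lineage = line.partition("\t")
        taxonomy[seq_id.strip()] = lineage.strip()
    return taxonomy


def diff_shared_taxonomy(tax_a, tax_b):
    """For IDs present in both mappings, list those whose lineage differs."""
    shared = tax_a.keys() & tax_b.keys()
    differing = sorted(i for i in shared if tax_a[i] != tax_b[i])
    return {"shared": len(shared), "differing": differing}


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python diff_taxonomy.py <94% taxonomy file> <97% taxonomy file>
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        tax_a, tax_b = read_taxonomy(fa), read_taxonomy(fb)
    result = diff_shared_taxonomy(tax_a, tax_b)
    print(f"IDs in 94% file: {len(tax_a)}, IDs in 97% file: {len(tax_b)}")
    print(f"Shared IDs: {result['shared']}, with differing lineage: {len(result['differing'])}")
```

Identical lineages for shared IDs are expected, since a centroid keeps the same reference annotation regardless of the threshold; the differences between thresholds show up mainly in which IDs are present at all.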