I am attempting to create a classifier using my dataset and am planning on using 13_8 from the data resources. I found another QIIME forum question that said I could just use the rep_set 99_otus.fasta, but I am confused as to why there are files from 61_otus.fasta all the way to 99_otus.fasta, and why I would choose one over the other. Does the 99 mean that 99% of the OTU data is contained in that file?

No — 99 stands for 99% OTUs, i.e., the reference sequences are clustered into OTUs that share ≥ 99% similarity with one another.

Hence, we recommend the 99% OTUs because they give the most specificity. A lower clustering threshold merges more reference sequences into each OTU and keeps only one representative per OTU, so sequence information is lost. The discarded sequences may include ones that would have been good matches to your query sequences.
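To make the effect of the threshold concrete, here is a toy sketch in Python. It is not how the reference OTUs were actually built: the sequences are made up, `identity` is a simple position-by-position comparison on equal-length strings, and `pick_representatives` is a hypothetical greedy helper written only for this illustration. It just shows that a lower threshold leaves fewer representative sequences.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("toy identity requires equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pick_representatives(seqs: list[str], threshold: float) -> list[str]:
    """Greedy toy clustering: a sequence becomes a new representative
    unless it is at least `threshold` identical to an existing one."""
    reps: list[str] = []
    for seq in seqs:
        if not any(identity(seq, rep) >= threshold for rep in reps):
            reps.append(seq)
    return reps


# Made-up 10-base sequences, for illustration only.
toy_reference = [
    "ACGTACGTAC",
    "ACGTACGTAT",  # 90% identical to the first sequence
    "ACGTACGGTT",  # 70% identical to the first sequence
    "TGCAACGTAC",  # 60% identical to the first sequence
]

# Lower thresholds collapse more sequences into each cluster.
for threshold in (0.99, 0.85, 0.61):
    reps = pick_representatives(toy_reference, threshold)
    print(f"{threshold:.0%} threshold: {len(reps)} representative(s)")
```

At the 99% threshold all four toy sequences survive as their own representatives, while at 61% only two remain, so two sequences that might have matched a query are no longer in the reference.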

So, for example, the 61% OTUs probably contain very few sequences. Those sequences likely have limited similarity to your query sequences and limited taxonomic resolving power, since each one represents a cluster spanning a large swath of reference sequences.
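If you want to check this on your own download, a minimal sketch like the one below counts the records in each rep_set file. The helper names `count_fasta_records` and `summarize_rep_sets` are hypothetical, and the directory path in the usage comment is an assumption about where you unpacked the files, so adjust it to your setup.

```python
from pathlib import Path


def count_fasta_records(path: Path) -> int:
    """Count sequences in a FASTA file by counting '>' header lines."""
    with path.open() as handle:
        return sum(1 for line in handle if line.startswith(">"))


def summarize_rep_sets(rep_set_dir: Path) -> dict[str, int]:
    """Map each *_otus.fasta file name in a directory to its record count."""
    return {
        fasta.name: count_fasta_records(fasta)
        for fasta in sorted(rep_set_dir.glob("*_otus.fasta"))
    }


# Example usage (the path is an assumption, change it to your own location):
# for name, n_seqs in summarize_rep_sets(Path("gg_13_8_otus/rep_set")).items():
#     print(name, n_seqs)
```

The counts should grow as the threshold in the file name goes up, with 99_otus.fasta holding the most sequences.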

I would not recommend using anything lower than 97% similarity. 99% will contain the most sequence information.

I hope that helps!

That helped! Thank you so much!

