Just a question regarding the “Assign taxonomy” function of QIIME 2. I noticed that in another post, people were asking whether we have to use the rep_sets and taxonomy_sets with the same identity (99, 97, 94, 90, etc.), and the answer was yes. However, I am a bit confused about the perc-identity parameter in QIIME 2. Isn’t this overruled by the 99 artifacts that I chose for my taxonomy identification? That is, if I chose a perc-identity of 98 but took the 99 files from SILVA, what would be the point of the 98 that I selected?
The reason the reference sequences and taxonomy must match is that their IDs must match, so you must use the paired files. If you use, e.g., the 97% rep seqs with the 99% taxonomy, the IDs will not match and you will get an error. You can’t assign taxonomy if the reference sequences don’t have any associated taxonomy info.
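To illustrate the ID-matching point, here is a minimal sketch in plain Python (not QIIME 2 code). The sequence IDs, taxonomy strings, and function names are all made up for the example, and it assumes a FASTA file of reference sequences plus a tab-separated taxonomy table whose first column is the sequence ID.

```python
def parse_fasta_ids(fasta_text):
    """Return the set of sequence IDs found in FASTA-formatted text."""
    ids = set()
    for line in fasta_text.splitlines():
        # Header lines start with ">"; the ID is the first whitespace-delimited token.
        if line.startswith(">") and line[1:].strip():
            ids.add(line[1:].split()[0])
    return ids


def parse_taxonomy_ids(tsv_text):
    """Return the set of IDs in the first column of a tab-separated taxonomy table."""
    return {line.split("\t")[0] for line in tsv_text.splitlines() if line.strip()}


def unmatched_ids(fasta_text, tsv_text):
    """IDs present in the reference sequences but missing from the taxonomy."""
    return parse_fasta_ids(fasta_text) - parse_taxonomy_ids(tsv_text)


# Hypothetical data: a "99%" sequence set, its paired taxonomy, and a mismatched taxonomy.
seqs_99 = ">A1\nACGT\n>A2\nACGA\n>A3\nTTGA\n"
tax_99 = "A1\tk__Bacteria; p__X\nA2\tk__Bacteria; p__Y\nA3\tk__Bacteria; p__Z\n"
tax_97 = "A1\tk__Bacteria; p__X\n"

paired = unmatched_ids(seqs_99, tax_99)      # empty: every sequence has taxonomy
mismatched = unmatched_ids(seqs_99, tax_97)  # A2 and A3 have no taxonomy entry
print(paired, mismatched)
```

With the paired files nothing is left over; with the mismatched files some reference sequences have no taxonomy, which is the situation that produces the error.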
You are comparing apples and oranges.
The SILVA OTU cluster percentages really have little bearing on how the taxonomy classifier works. Those numbers just indicate the percent similarity threshold used for clustering the SILVA reference sequences. The purpose of that clustering is to dereplicate the database, reducing it to a manageable size while still maintaining sufficient taxonomic information. The amount of clustering does not change how the taxonomy classifier fundamentally works, just the complexity of the reference sequences.
The perc-identity parameter, on the other hand, alters the behavior of the taxonomic classifier. It specifies the similarity threshold that a reference sequence must meet to be considered a hit to the query.
So there is no need to set perc-identity to match the reference database's OTU clustering percentage.
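A toy sketch may make the distinction clearer. This is not how QIIME 2 computes identity: it assumes the sequences are already aligned and of equal length, and the reference IDs, sequences, and function names are invented for illustration. The point is only that the reference set stays fixed while the threshold decides which references count as hits.

```python
def percent_identity(a, b):
    """Fraction of matching positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def hits(query, references, perc_identity):
    """Return IDs of references whose identity to the query meets the threshold."""
    return [
        ref_id
        for ref_id, seq in references.items()
        if percent_identity(query, seq) >= perc_identity
    ]


# Hypothetical reference set; how it was clustered upstream plays no role here.
references = {
    "ref1": "ACGTACGTAC",  # identical to the query
    "ref2": "ACGTACGTAA",  # one mismatch in ten positions
    "ref3": "ACGTTTTTAC",  # three mismatches in ten positions
}
query = "ACGTACGTAC"

strict = hits(query, references, 0.95)
moderate = hits(query, references, 0.85)
loose = hits(query, references, 0.60)
print(strict, moderate, loose)
```

Raising or lowering the threshold changes the hit list for the same query against the same references, whatever clustering level was used to build them.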