Can someone help me understand the "consensus" column in my taxonomy file after running classify-consensus-vsearch or point me in the right direction?
For example, one ASV has this output with a consensus of 0.556: d__Eukaryota; p__Annelida; c__Polychaeta; o__Phyllodocida; f__Phyllodocida; g__Phyllodocida
Is this similar to NCBI BLAST "percent match", where there is only a 55.6% match to the sequence? If so, is there any kind of quality control I should be doing to filter out low matches?
Hi @areaume,
The consensus value is the fraction of the top hits for your sequence whose taxonomy agrees with the assigned label. It is not a percent identity like the BLAST "percent match". So for your example, 55.6% of the hits for your sequence agreed that it is d__Eukaryota; p__Annelida; c__Polychaeta; o__Phyllodocida; f__Phyllodocida; g__Phyllodocida.
The default minimum is 0.51 (51%), but if you want a stricter threshold you can raise it with the --p-min-consensus parameter.
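To make the idea concrete, here is a simplified sketch of how a consensus fraction like 0.556 can arise. It is not the plugin's actual code: the function name and the hit labels are made up for illustration, and the 5-of-9 split is just one combination that happens to round to 0.556.

```python
from collections import Counter


def consensus_fraction(hit_taxonomies):
    """Return the most common label and the fraction of hits carrying it."""
    counts = Counter(hit_taxonomies)
    label, n = counts.most_common(1)[0]
    return label, n / len(hit_taxonomies)


# Hypothetical hits: 5 of 9 share the same label (labels are illustrative).
hits = ["g__Phyllodocida"] * 5 + ["g__Other"] * 4
label, frac = consensus_fraction(hits)
print(label, round(frac, 3))
```

With a min consensus of 0.51 this label would be accepted (0.556 >= 0.51); with a stricter value such as 0.8 it would not.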
Thanks @cherman2! This helps clear things up a lot.
I also have a follow-up question: do you know why order, family, and genus are all "Phyllodocida"? My understanding is that Phyllodocida is an order. I have a few other assignments like this as well.
Hi @areaume,
That is decided by the database you are querying. In your case, you are using the Silva database.
For more information on specific labels, I would look into Silva taxonomy.
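If you want to spot labels like this across a whole taxonomy file, one option is to check where a rank just repeats the name of the rank above it. This is only a rough sketch: it assumes the "d__; p__; ..." prefix format shown above, the function name is hypothetical, and a repeated name is treated as a hint rather than proof that the lower ranks carry no extra information.

```python
def deepest_distinct_rank(taxon):
    """Return the last rank whose name differs from the rank above it."""
    ranks = [r.strip() for r in taxon.split(";") if r.strip()]
    deepest = ranks[0]
    previous_name = ranks[0].split("__", 1)[-1]
    for rank in ranks[1:]:
        # Drop the rank prefix (e.g. "o__") and compare the bare names.
        name = rank.split("__", 1)[-1]
        if name != previous_name:
            deepest = rank
        previous_name = name
    return deepest


taxon = (
    "d__Eukaryota; p__Annelida; c__Polychaeta; "
    "o__Phyllodocida; f__Phyllodocida; g__Phyllodocida"
)
print(deepest_distinct_rank(taxon))  # o__Phyllodocida
```

For the example in this thread, the deepest rank with its own name is the order, which matches the expectation that Phyllodocida is an order-level name.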