After finishing my first attempt at using the hybrid vsearch/sklearn feature classifier, I was surprised to find that every single sequence (~2,600 unique sequences) was classified by sklearn, and not a single one by a vsearch exact match!?
After looking at the code governing the classifier’s behavior, I think my confusion comes from how I had misinterpreted what “exact” means. I was hoping that if the vsearch alignment of my sequence feature was 100% identical to a reference sequence, the match would be retained, but that’s not quite right. It sounds like the query and reference must be identical not only in sequence composition but also in length. For those of us using query sequences that are shorter than the reference sequences, I’m guessing the current approach isn’t what is desired.
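To make the distinction concrete, here’s a toy sketch of the two interpretations as I understand them. It treats sequences as plain strings and ignores gaps, alignment scoring, and reverse complements, so it’s a simplification of what vsearch actually does; the function names are my own, not anything from the plugin.

```python
def is_exact_match(query: str, reference: str) -> bool:
    """Strict "exact" match: same bases AND same length."""
    return query.upper() == reference.upper()


def is_full_identity_hit(query: str, reference: str) -> bool:
    """100% identity over the whole query: the query appears unchanged
    somewhere inside a (possibly longer) reference sequence."""
    return query.upper() in reference.upper()


# A short amplicon that sits entirely within a longer reference.
reference = "ACGTACGTTTGACCA"
query = "ACGTTTGA"
print(is_exact_match(query, reference))        # False
print(is_full_identity_hit(query, reference))  # True
```

Under the first definition, every query shorter than its reference fails, which would be consistent with everything falling through to sklearn.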
Instead, I was hoping to modify this hybrid classifier so that the user could either (1) use the existing, faster exact-match approach, or, if that won’t work for their data, (2) input the typical --p-query-cov parameters of classify-consensus-vsearch. Obviously the latter approach will be slower, but it is still a hybrid classifier, and it lets a user work with a reference database containing a mixture of sequence lengths. Another benefit of including those two parameters is that you could perform not only exact alignments but also a hybrid classification with whatever alignment parameters you want: maybe you want 99% identity over 98% query coverage, followed by LCA consensus, with the sklearn part of the hybrid classifier kicking in for whatever is left?
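Roughly, the workflow I have in mind could be sketched like this. It’s a toy, pure-Python stand-in, not the plugin’s code: identity and coverage come from ungapped overlaps rather than vsearch’s gapped alignments, the consensus is a strict LCA (shared taxonomy prefix) rather than the plugin’s own consensus settings, and `fallback` is a hypothetical placeholder for the sklearn step. All function and parameter names here are my own.

```python
from typing import Callable, Dict, List, Tuple


def passes_thresholds(query: str, ref: str, min_identity: float, query_cov: float) -> bool:
    """True if some ungapped placement of query against ref meets both the
    identity and query-coverage thresholds (fractions, 0-1). Toy stand-in
    for a real aligner; hypothetical helper."""
    n, m = len(query), len(ref)
    if n == 0:
        return False
    # offset = position of the query's first base relative to the reference's first base
    for offset in range(-(n - 1), m):
        q_start, r_start = max(0, -offset), max(0, offset)
        overlap = min(n - q_start, m - r_start)
        if overlap / n < query_cov:
            continue  # not enough of the query is covered at this placement
        matches = sum(query[q_start + i] == ref[r_start + i] for i in range(overlap))
        if matches / overlap >= min_identity:
            return True
    return False


def lca(taxonomies: List[str]) -> str:
    """Strict consensus: keep only the leading ranks shared by every hit."""
    common = []
    for ranks in zip(*(t.split(";") for t in taxonomies)):
        if all(r == ranks[0] for r in ranks):
            common.append(ranks[0])
        else:
            break
    return ";".join(common)


def hybrid_classify(
    queries: Dict[str, str],
    references: Dict[str, Tuple[str, str]],
    fallback: Callable[[str], str],
    min_identity: float = 0.99,
    query_cov: float = 0.98,
) -> Dict[str, Tuple[str, str]]:
    """Alignment-based consensus first; queries with no passing hit go to fallback."""
    results = {}
    for qid, qseq in queries.items():
        hits = [
            taxon
            for rseq, taxon in references.values()
            if passes_thresholds(qseq, rseq, min_identity, query_cov)
        ]
        if hits:
            results[qid] = (lca(hits), "alignment")
        else:
            results[qid] = (fallback(qseq), "fallback")
    return results


refs = {
    "r1": ("ACGTACGTTTGACCA", "k__Bacteria;p__A;c__X"),
    "r2": ("ACGTACGTTTGACCT", "k__Bacteria;p__A;c__Y"),
    "r3": ("GGGGCCCCAAAATTTT", "k__Bacteria;p__B;c__Z"),
}
queries = {"q1": "ACGTTTGA", "q2": "TTTTTTTTTT"}
print(hybrid_classify(queries, refs, fallback=lambda s: "Unassigned"))
# {'q1': ('k__Bacteria;p__A', 'alignment'), 'q2': ('Unassigned', 'fallback')}
```

Setting `min_identity=1.0` and `query_cov=1.0` in this toy reduces to “the query occurs at 100% identity somewhere in a reference”, which is still looser than requiring equal lengths.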
There’s an existing complication, however. The current hybrid approach includes an optional pre-filter step that relies on those very same --p-query-cov parameters I’d like to incorporate. If a modification along these lines were made, it would likely require either removing the pre-filter step or renaming one set of the redundant parameters. I’d vote for keeping the pre-filtering option, but amending the parameters specific to the pre-filtering to be
Thanks @Nicholas_Bokulich for the new tool!