I’ve been researching taxonomy assignment, in terms of both the choice of database and the method of classification, on the Q2 forums and in the papers and official websites cited there. I am still not convinced that I should use sklearn, as I don’t find it straightforward. Furthermore, compared to BLAST+, I find it more challenging to decide on parameter settings for sklearn, primarily because I am inexperienced with it. This is despite the good explanations on the forums of the confidence parameter and of how to alter the k-mer specifications for sklearn.
My question is: are there any resources available for using SILVA or Greengenes with BLAST consensus classification on Q2 (i.e. separate reference sequence files and their associated taxonomy files)? I know there are pre-trained classifiers available for sklearn, but with such large databases, it is difficult to prepare the reference database inputs for BLAST.