I have eukaryotic data (18S, V3-V4) and classified it with the SILVA database (silva-138-99-nb-classifier). When I compared the sequences against NCBI, I found that the results are inconsistent: the same sequences are assigned to different species by the classifier and by NCBI. I tried PR2, but it wasn’t much better.
I also ran into a problem: the sequences listed in the annotated_rep-set file don’t match the NCBI results when pasted into the browser, and the seventh level of taxonomy is missing.
I tried to build my own reference database from NCBI: I downloaded sequences for the desired region and groups, concatenated them into one .fna file, and got stuck. Does anyone have experience with this?
By default, this plugin returns only domain, phylum, class, order, family, and genus. You can return any other SILVA ranks by specifying them via the --p-ranks flag. This will not work for species, as SILVA does not curate species-level taxonomy. We've added an option to append the organism name as the species label, but these labels can be unreliable (as outlined in the tutorial you linked). If you would really like to use them, you'll need to enable the --p-include-species-labels flag when parsing the SILVA taxonomy. Note the warnings we provide in the tutorial!
Also, different databases curate taxonomy differently, so expect some differences.
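If you want to see which ranks are actually present in your classification output, a quick check along these lines may help. This is only a rough Python sketch: it assumes the lineage strings use the `d__`, `p__`, ... `s__` prefixes separated by semicolons, and the function names and the example lineage are made up for illustration (the taxon names are placeholders, not real SILVA entries).

```python
# Rank prefixes assumed to match QIIME 2-formatted SILVA taxonomy strings.
# Adjust this list if your classifier output uses a different layout.
RANK_PREFIXES = ["d__", "p__", "c__", "o__", "f__", "g__", "s__"]


def parse_lineage(taxon):
    """Split a semicolon-delimited lineage into {prefix: name}.

    Fields with an empty name (e.g. a bare "s__") are treated as absent.
    """
    ranks = {}
    for field in taxon.split(";"):
        field = field.strip()
        for prefix in RANK_PREFIXES:
            if field.startswith(prefix):
                name = field[len(prefix):]
                if name:
                    ranks[prefix] = name
                break
    return ranks


def missing_ranks(taxon):
    """Return the rank prefixes that have no name in this lineage."""
    parsed = parse_lineage(taxon)
    return [prefix for prefix in RANK_PREFIXES if prefix not in parsed]


# Hypothetical lineage with placeholder names, annotated only down to genus.
example = "d__Eukaryota; p__PhylumX; c__ClassX; o__OrderX; f__FamilyX; g__GenusX"
print(missing_ranks(example))  # ['s__']
```

A lineage that stops at genus reports `s__` as missing, which is what you would expect when species labels were not included while parsing the SILVA taxonomy.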
Can you share the classification output files (QZVs)? It is difficult to help without looking at the data. What taxonomy are you expecting?
Thank you for sharing these. Nothing appears out of the ordinary. It looks like you are detecting everything but Entamoeba. I spot-checked a few sequences via BLAST, and they appear to match the classifier taxonomy. Can you please provide explicit examples where the taxonomy is different or problematic?
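One way to put such examples together is to line up, for each sequence, the lineage from the classifier against the lineage you expected (for instance from the top BLAST hit) and note the first level at which they disagree. The Python sketch below is only an illustration: it assumes both lineages are semicolon-delimited strings in the same rank order, and the function names and example lineages are hypothetical placeholders.

```python
from itertools import zip_longest


def split_lineage(taxon):
    """Split a semicolon-delimited lineage into a list of non-empty fields."""
    return [field.strip() for field in taxon.split(";") if field.strip()]


def first_disagreement(classifier_taxon, expected_taxon):
    """Return (level, classifier_name, expected_name) for the first level
    at which the two lineages differ, or None if they are identical.

    Levels are 1-based. A lineage that ends early yields None for its name.
    """
    classifier_fields = split_lineage(classifier_taxon)
    expected_fields = split_lineage(expected_taxon)
    pairs = zip_longest(classifier_fields, expected_fields)
    for level, (got, wanted) in enumerate(pairs, start=1):
        if got != wanted:
            return level, got, wanted
    return None


# Hypothetical lineages with placeholder names, not real assignments.
from_classifier = "d__Eukaryota; p__PhylumX; g__GenusA"
from_blast = "d__Eukaryota; p__PhylumX; g__GenusB"
print(first_disagreement(from_classifier, from_blast))
# (3, 'g__GenusA', 'g__GenusB')
```

Posting the feature ID, both lineages, and the level returned here for a handful of sequences would make the mismatches much easier to discuss.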