Difficult to say.
Is there a reason why you link to the fasta file from the SILVA web site? The name you provide does not match the file name in the link. Was this intended?
I guess I'd need more details on how the reference data is handled and/or further processed within R prior to classification. I am unfamiliar with how this tool and its commands (e.g. makeTaxonomyFasta_SilvaNR) work. So, I'd suggest classifying through QIIME 2 as a comparison and sanity check.
I assume you followed all of the "Make amplicon-region specific classifier" parts of the RESCRIPt tutorial? That is, the sequence and taxonomy dereplication steps, prior to importing them into R? Just asking to make sure I understand all the steps you've taken.
Check out this thread for some additional insights: "training classifiers: performance of full-length vs. extract-reads".
Do you have anything to add, @Nicholas_Bokulich?
-Mike