RESCRIPt Plugin for 18S

I have been using the RESCRIPt plugin to train a classifier on the SILVA reference database in order to generate a taxa barplot for 18S data (following the thread here: [Processing, filtering, and evaluating the SILVA database (and other reference sequence data) with RESCRIPt]). However, I am still getting Bacterial sequence matches for some samples. Could this be a problem with the classifier training, do I need to filter my results further to include only Eukaryotic results, or could this be due to low data quality? I have included only the taxonomic filtering script block below, where I filtered to include only Eukaryota. Is this the correct method of taxonomic restriction for classifying 18S data?

qiime rescript filter-seqs-length-by-taxon \
  --i-sequences silva-138-ssu-nr99-seqs-cleaned.qza \
  --i-taxonomy silva-138-ssu-nr99-tax.qza \
  --p-labels Eukaryota \
  --p-min-lens 1400 \
  --o-filtered-seqs silva-138-ssu-nr99-seqs-filt.qza \
  --o-discarded-seqs silva-138-ssu-nr99-seqs-discard.qza

Hi @lmjackson, welcome to QIIME 2!

Can you provide details on which primer set you used to generate your sequence data? Many primer sets will amplify off-target sequences. That is just the nature of the process. How much of your sequence data is being returned as Bacterial? Few? Many?
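One rough way to answer the "few or many" question is to count how many features received a Bacterial label. The sketch below assumes the taxonomy artifact has been exported to a tab-separated file (for example with `qiime tools export`) with a header row and the columns Feature ID, Taxon, and Confidence, and that labels follow the SILVA 138 style (`d__Bacteria; ...`). The function name `bacterial_fraction` and the file name `taxonomy.tsv` are placeholders, not part of QIIME 2.

```shell
# Sketch: report how many features have a taxon string starting with d__Bacteria.
# Assumes a tab-separated file: header row, then Feature ID <TAB> Taxon <TAB> Confidence.
bacterial_fraction() {
  awk -F'\t' '
    NR == 1 { next }                      # skip the header row
    { total++ }                           # count every classified feature
    $2 ~ /^d__Bacteria/ { bacteria++ }    # taxon string begins with the Bacteria domain
    END {
      if (total == 0) { print "no features found"; exit 1 }
      printf "%d of %d features (%.1f%%) classified as Bacteria\n", bacteria, total, 100 * bacteria / total
    }
  ' "$1"
}
```

Usage would look like `bacterial_fraction taxonomy.tsv`. Note that this counts features, not reads, so a single abundant Bacterial feature can still dominate a sample's barplot.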

I would advise against removing groups of taxa just because you are unable to identify what you want. In fact, it is best to leave these "outgroup" taxa within your database. This way you can more confidently remove off-target sequence data. Otherwise, you may falsely classify many sequences as simply Eukaryotes, when in fact they are not.

What sequencing protocol was used? That is, is your sequence data in mixed orientation? This can fool you into thinking your data are composed of bacterial sequences. See the following forum threads:


Hi @SoilRotifer,

Thank you for your quick response!

I am using the E572F and E1009R primer set to generate the sequence data, and one entire sample was returned as 100% bacterial. These samples were sent to me by a colleague, but I am guessing that they could be of mixed orientation. After reading the forum threads you suggested, I will try using the orient-seqs method and re-running the classifier to see if this helps.
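For reference, here is a minimal sketch of what "mixed orientation" means and of how the orientation step might be invoked. A read in the opposite orientation is the reverse complement of the expected sequence, which the small `revcomp` helper illustrates. The `orient_reads` function is only defined, not run; its file names (`rep-seqs.qza` and the output names) are placeholders, and the flags are my understanding of `qiime rescript orient-seqs`, so they should be checked against `qiime rescript orient-seqs --help` for the installed version.

```shell
# Reverse-complement a DNA string: reverse it, then swap complementary bases.
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

# Sketch of the orientation step (defined only; call orient_reads to run it).
# Input and output file names are placeholders for illustration.
orient_reads() {
  qiime rescript orient-seqs \
    --i-sequences rep-seqs.qza \
    --i-reference-sequences silva-138-ssu-nr99-seqs-cleaned.qza \
    --o-oriented-seqs rep-seqs-oriented.qza \
    --o-unmatched-seqs rep-seqs-unmatched.qza
}
```

The oriented sequences would then be passed to the classifier in place of the original representative sequences.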

Also, thanks for the helpful suggestion not to filter out specific taxonomic groups because of the potential for false classifications. I will be sure to retain these outgroup taxa within my database in the future.