Training SILVA 18S feature classifiers and Taxonomic annotation

SoilRotifer · February 21, 2023, 5:31pm

I would recommend using the latest version of the QIIME 2 formatted SILVA database (version 138), as provided on our Data resources page. The base sequence and taxonomy files, from which you can extract your amplicon region and use as input to train your own classifier, are provided here. These files were generated using RESCRIPt.

Alternatively, you can download and curate your own version of the SILVA database in any way you'd like, using this tutorial as a guide.

This is generally explained in this part of the linked tutorial. Feel free to curate the database in a way that best suits your needs. Skip, reorder, or change the various steps as needed.

Not all reference sequences within SILVA may have a region that matches your primer pairs, or the primers may simply not match well enough. For more details, read this tutorial, which can also be used in conjunction with the base RESCRIPt SILVA tutorial.
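As a rough sketch of the extract-and-train step mentioned above, the commands might look something like the following. The function name and all file names (`silva-138-seqs.qza`, `silva-138-tax.qza`, and the outputs) are placeholders I made up for illustration, and the primers are passed in as arguments since they depend on your study. References without a good enough primer match are left out of the extracted file, so it can contain fewer sequences than the full-length one.

```shell
# Sketch only: the function name and all file names are placeholders.
# Usage: extract_amplicons_and_train FORWARD_PRIMER REVERSE_PRIMER
extract_amplicons_and_train() {
    # Cut the amplicon region out of the full-length reference sequences.
    qiime feature-classifier extract-reads \
        --i-sequences silva-138-seqs.qza \
        --p-f-primer "$1" \
        --p-r-primer "$2" \
        --o-reads silva-138-amplicon-seqs.qza || return 1

    # Train a naive Bayes classifier on the extracted amplicons.
    qiime feature-classifier fit-classifier-naive-bayes \
        --i-reference-reads silva-138-amplicon-seqs.qza \
        --i-reference-taxonomy silva-138-tax.qza \
        --o-classifier silva-138-amplicon-classifier.qza
}
```

After defining the function, you would call it with your own primer sequences, e.g. `extract_amplicons_and_train "$FWD" "$REV"`.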

You can run the following command to generate a taxonomy file that only contains the taxonomy of your representative sequences. Run it twice, once on the full-length references and once on the extracted amplicons:

qiime rescript filter-taxa \
    --i-taxonomy taxonomy.qza \
    --m-ids-to-keep-file rep-seqs.qza \
    --o-filtered-taxonomy rep-seqs-taxonomy.qza

Then you can tabulate (visualize) each of the outputs like so:

qiime metadata tabulate \
    --m-input-file rep-seqs-taxonomy.qza \
    --o-visualization rep-seqs-taxonomy.qzv

You can also download the TSV file from each visualization and view it in a spreadsheet program.
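If you would rather get the table without opening the visualization, exporting the artifact directly is another option. This is only a sketch: the function name and the default output directory are placeholders of mine. For a taxonomy artifact, the exported directory should contain a `taxonomy.tsv` file that opens in any spreadsheet program.

```shell
# Sketch only: the function name and the output directory are placeholders.
# Usage: export_taxonomy_tsv [OUTPUT_DIR]
export_taxonomy_tsv() {
    # Unpack the data stored in the artifact into a plain directory.
    qiime tools export \
        --input-path rep-seqs-taxonomy.qza \
        --output-path "${1:-exported-taxonomy}"
}
```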

There is no simple answer to this. Sometimes an amplicon-specific classifier is better, other times not so much. However, you can use some of the RESCRIPt tools to evaluate the reference databases against one another and see which might be better overall.
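One way to make that comparison, sketched here under assumptions: the function names and the `PREFIX-seqs.qza` / `PREFIX-tax.qza` naming scheme are placeholders, not something from your files. `evaluate-fit-classifier` trains a classifier on a reference and tests it on that same reference, so treat its accuracy as a best-case estimate rather than real-world performance. `evaluate-taxonomy` then summarizes two or more taxonomies side by side.

```shell
# Sketch only: function names and file naming scheme are placeholders.
# Usage: evaluate_reference PREFIX
# Expects PREFIX-seqs.qza and PREFIX-tax.qza in the working directory.
evaluate_reference() {
    # Fit a classifier on the reference and test it against the same reference.
    qiime rescript evaluate-fit-classifier \
        --i-sequences "$1-seqs.qza" \
        --i-taxonomy "$1-tax.qza" \
        --o-classifier "$1-classifier.qza" \
        --o-evaluation "$1-evaluation.qzv" \
        --o-observed-taxonomy "$1-predicted-tax.qza"
}

# Usage: compare_taxonomies PREFIX_A PREFIX_B
compare_taxonomies() {
    # Summarize both taxonomies in a single visualization.
    qiime rescript evaluate-taxonomy \
        --i-taxonomies "$1-tax.qza" "$2-tax.qza" \
        --p-labels "$1" "$2" \
        --o-taxonomy-stats taxonomy-comparison.qzv
}
```

For example, `evaluate_reference full-length` and `evaluate_reference amplicon`, followed by `compare_taxonomies full-length amplicon`, would give you one evaluation per reference plus a combined taxonomy summary to look at.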