SILVA 132 database: which taxonomy and reference sequence files should I select for classifier training?

Hi all!

I was hoping you guys could help me out. This is my first time working with 16S samples. I am trying to train a classifier using the SILVA 132 database, so I need a reference sequence file and a taxonomy file. However, when I look in the SILVA_132_QIIME_release folder, I see many files I could select.

In the taxonomy/16S_only directory there are subdirectories that, I assume, correspond to the percent identity at which the database sequences were clustered into OTUs. Within each of those directories (take the 99 directory as an example) there are 7 different taxonomy files. How do I know which taxonomy file to use?
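A minimal sketch for comparing the candidate taxonomy files, assuming each one uses the headerless ID<TAB>taxonomy layout with semicolon-separated ranks (the layout HeaderlessTSVTaxonomyFormat expects). It counts how many ranks each line carries, so a file trimmed to a fixed number of levels should report a single depth, while an untrimmed one may report several. The function name `rank_depths` and the example lines are hypothetical, not taken from the SILVA release.

```python
import io
from collections import Counter
from typing import Iterable


def rank_depths(lines: Iterable[str]) -> Counter:
    """Count lines by their number of semicolon-separated ranks.

    Assumes a headerless ID<TAB>taxonomy layout; empty ranks are ignored.
    A line without a tab raises ValueError when unpacked.
    """
    depths: Counter = Counter()
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        _feature_id, taxonomy = line.split("\t", 1)
        ranks = [r for r in (part.strip() for part in taxonomy.split(";")) if r]
        depths[len(ranks)] += 1
    return depths


# Hypothetical lines in the assumed layout. On a real file:
#   with open(path) as fh: print(rank_depths(fh))
example = io.StringIO(
    "seq1\tD_0__Bacteria;D_1__Firmicutes;D_2__Bacilli\n"
    "seq2\tD_0__Bacteria;D_1__Proteobacteria\n"
)
print(rank_depths(example))
```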

I have a similar issue when choosing a reference sequence file. In SILVA_132_QIIME_release there appear to be two directories to choose from: rep_set and rep_set_aligned. How do I know which directory holds the right reference sequence file?
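One quick way to tell the two directories apart, assuming aligned files mark alignment gaps with '-' or '.', is to scan the sequence lines for gap characters. The 'FeatureData[Sequence]' import in this post expects unaligned sequences (aligned ones would correspond to 'FeatureData[AlignedSequence]'), so a file that this hypothetical `looks_aligned` check flags is probably not the one to import there.

```python
import io
from typing import Iterable

# Assumption: aligned FASTA files mark gaps with '-' or '.'.
GAP_CHARS = frozenset("-.")


def looks_aligned(lines: Iterable[str]) -> bool:
    """Return True if any sequence line contains an alignment gap character."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA headers (IDs may contain '-' or '.')
        if GAP_CHARS.intersection(line):
            return True
    return False


unaligned = io.StringIO(">seq1\nACGTACGT\n>seq2\nGGCCTTAA\n")
aligned = io.StringIO(">seq1\nACG--TACGT\n>seq2\nGGCCT..TAA\n")
print(looks_aligned(unaligned), looks_aligned(aligned))  # False True
```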

Last question: how do I decide which clustering percentage to use? I would assume 99% clustering gives the most accurate identification, so is there any reason not to use the files in the 99 directories?
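For a rough sense of what the directory percentages mean, here is one simplified definition of percent identity between two already-aligned sequences: at 99%, one mismatch per 100 aligned positions sits right at the threshold. Clustering tools differ in how they count gaps and end overhangs, so treat this hypothetical `percent_identity` helper as an illustration, not the exact rule used to build the SILVA clusters.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two sequences already aligned to equal length.

    Simplified definition: columns where both sequences have a gap are
    dropped; every other column counts, and only identical bases match.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper())
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("no comparable columns")
    # Both-gap columns were removed, so x == y always means a shared base.
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# One mismatch in 100 aligned positions sits exactly at a 99% threshold.
ref = "A" * 100
query = "A" * 99 + "C"
print(percent_identity(ref, query))  # 99.0
```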

I am running QIIME 2 version 2020.11, installed with conda.

Essentially, I am looking for which files to use in place of the INSERT_REF_SEQ_FILE and INSERT_TAXONOMY_FILE placeholders in these commands:

qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path INSERT_REF_SEQ_FILE.fasta \
  --output-path ref_seq.qza

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path INSERT_TAXONOMY_FILE.txt \
  --output-path ref-taxonomy.qza
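Before running the two imports, a cheap offline check is that every sequence ID in the reference FASTA has a row in the headerless taxonomy file; taking the two files from different clustering directories is one plausible way to end up with a mismatch. A minimal sketch with hypothetical helper names, assuming FASTA IDs are the first whitespace-delimited token after '>' and the taxonomy file is tab-separated with no header.

```python
import io
from typing import Iterable, Set


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """IDs from FASTA headers: the first whitespace-delimited token after '>'."""
    ids: Set[str] = set()
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                ids.add(header.split()[0])
    return ids


def taxonomy_ids(lines: Iterable[str]) -> Set[str]:
    """IDs from a headerless, tab-separated ID<TAB>taxonomy file."""
    return {line.split("\t", 1)[0].strip() for line in lines if line.strip()}


def missing_taxonomy(fasta_lines: Iterable[str], taxonomy_lines: Iterable[str]) -> Set[str]:
    """IDs present in the reference FASTA but absent from the taxonomy file."""
    return fasta_ids(fasta_lines) - taxonomy_ids(taxonomy_lines)


fasta = io.StringIO(">seq1\nACGT\n>seq2\nGGCC\n")
taxonomy = io.StringIO("seq1\tD_0__Bacteria;D_1__Firmicutes\n")
print(missing_taxonomy(fasta, taxonomy))  # {'seq2'}
```

With real files, open both (for example `with open(fasta_path) as f, open(taxonomy_path) as t:`, where the paths are whichever files you picked) and call `missing_taxonomy(f, t)`; an empty set means every sequence has a taxonomy entry.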

Thank you in advance for your support! :slight_smile:

Hi @DannyBoi97,

If you’d like to make use of SILVA 132 specifically, I think you’ll find everything you need within the RESCRIPt tutorial.

Otherwise, you can download pre-made SILVA 138 classifiers, along with the input files used to make them, from the Data resources page.

-Good luck!
-Mike

Hi Mike! Thank you for your quick reply :slight_smile:

I read somewhere that using pre-made classifiers can pose a security risk. Is this still an issue?

Dan

As long as you download the classifiers from a trusted source, you should be fine. In fact, even if you made the classifiers yourself and shared them with someone else, the same security risk would exist: that person would have to trust you. :wink:

But as I said, the input files used to make the classifiers are on the Data resources page too; just scroll down to that section and train the classifier yourself. :train:
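On the trust question: as I understand it, the risk with pre-made classifiers is that they contain pickled scikit-learn objects, and loading a pickle can run arbitrary code, so what matters is where the file came from. If the download source publishes a checksum, comparing it against the file you received at least confirms you got the file the source published. A minimal sketch using Python's `hashlib`; the helper names are hypothetical, and in real use the expected value would be whatever checksum the source publishes.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Hex digest of a file, read in 1 MiB chunks so large .qza files are not loaded at once."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path: Path, expected_hex: str, algorithm: str = "md5") -> bool:
    """True if the file's digest equals the published checksum (case and whitespace ignored)."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()


# Demo on a throwaway file; for a real download you would call
# matches_published(Path("<downloaded file>"), "<published checksum>").
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.bin"
    demo.write_bytes(b"hello")
    print(file_digest(demo))  # 5d41402abc4b2a76b9719d911017c592
```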

I hear ya! Would downloading the classifier from the location you pointed to be considered safe?

Well, these are made by the QIIME 2 team. I use these classifiers myself. :slight_smile:

Gotcha :grin: Can't be too careful, I guess. Thanks so much for your help, Mike!
