SILVA 138.1 classifier trained on version 2023.5 and used in 2024.5

I am using qiime2-2024.5 for my analysis of 16S rRNA V3-V4 region data. I have generated the taxonomic profile of my data in two ways:

  1. Using a classifier for the V3-V4 region that was previously trained with version 2023.5. I used version 2023.5 only at the 'qiime feature-classifier classify-sklearn' step, and I am not getting any file compatibility issues. Is it right to analyse the data in version 2024.5 when the taxonomy was assigned with version 2023.5? Is this justifiable in a paper?
  2. To avoid the above problem, I tried training my classifier again with version 2024.5 using the following commands:
    qiime rescript get-silva-data \
    --p-version '138.1' \
    --p-target 'SSURef_NR99' \
    --o-silva-sequences silva-138.1-ssu-nr99-rna-seqs.qza \
    --o-silva-taxonomy silva-138.1-ssu-nr99-tax.qza

qiime rescript reverse-transcribe \
--i-rna-sequences silva-138.1-ssu-nr99-rna-seqs.qza \
--o-dna-sequences silva-138.1-ssu-nr99-seqs.qza

qiime feature-classifier extract-reads \
--i-sequences silva-138.1-ssu-nr99-seqs.qza \
--p-f-primer CTACGGGNGGCWGCAG \
--p-r-primer GGACTACNNGGGTATCTAAT \
--o-reads silva-138.1_V3-V4_16S-seqs.qza \
--verbose

qiime rescript dereplicate \
--i-sequences silva-138.1_V3-V4_16S-seqs.qza \
--i-taxa silva-138.1-ssu-nr99-tax.qza \
--p-mode 'uniq' \
--o-dereplicated-sequences silva-138.1_V3-V4_16S-derep-seqs.qza \
--o-dereplicated-taxa silva-138.1-_V3-V4_16S-derep-tax.qza

qiime metadata tabulate --m-input-file silva-138.1-_V3-V4_16S-derep-tax.qza --o-visualization taxonomy_summary.qzv
qiime metadata tabulate --m-input-file silva-138.1_V3-V4_16S-derep-seqs.qza --o-visualization seqs_summary.qzv

The sequences and taxonomy here included d__Eukaryota members, so I removed them from the TSV files using Python code. Using the cleaned sequence and taxonomy files, I trained a classifier on QIIME 2 version 2024.5 with the following command:

qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads qiime_sequences.qza --i-reference-taxonomy qiime_taxonomy.qza --o-classifier silva-138.1-16S-nr99-V3_V4-classifier-HS.qza

I got the classifier from the command above, but when I run:

qiime rescript evaluate-fit-classifier --i-sequences qiime_sequences.qza --i-taxonomy qiime_taxonomy.qza --o-classifier silva-138.1-evaluation-16S-nr99-V3_V4-classifier-HS.qza --o-observed-taxonomy silva-138.1-predicted-V3_V4-taxonomy-HS.qza --o-evaluation silva-138.1-V3_V4-taxonomy-fit-classifier-evaluation-HS.qza

I get the following error: 'Plugin error from rescript: Passing a set as an indexer is not supported. Use a list instead.'

The classifier that I trained with 'qiime feature-classifier fit-classifier-naive-bayes' on qiime2-2024.5 does not assign the same set of sequences to family or genus level, while my other classifier, trained on version 2023.5, successfully assigns the sequences down to genus level. Both classifiers were trained on SILVA 138.1 data.

What is the potential cause of this problem, and how can I resolve it?

Also, if I proceed with the taxonomy classified by the 2023.5 classifier, the number of features is around 800, while with the newly trained 2024.5 classifier the feature count is only around 400. Please help me, I am really stuck with this!

Hi @Harshita_Sharma1,

Let's see if we can figure this out...

I assume you mean that you downloaded a pre-curated set of files from the QIIME 2 website?

If you are using SILVA 138.1, then you should be fine. Nothing should have changed within the SILVA database files between QIIME2 versions.

There is nothing wrong with the SILVA processing commands you used. However, there are some other steps that were carried out on the premade QIIME 2 files. These differences could affect how things are classified. I think all you need to do is run cull-seqs on your extracted reads to clean things up a bit.
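
As a rough sketch of that cull-seqs step: the input below is the extract-reads output from the commands in the question, while the output file name is made up for illustration. The `command -v` guard is only there so the snippet fails gently outside a QIIME 2 environment.

```shell
# Input: the extract-reads output from the question.
in_seqs="silva-138.1_V3-V4_16S-seqs.qza"
# Hypothetical output name, chosen only for this example.
out_seqs="silva-138.1_V3-V4_16S-seqs-cleaned.qza"

if command -v qiime >/dev/null 2>&1; then
    # With default settings, cull-seqs drops reference sequences that have
    # too many degenerate bases or long homopolymer runs.
    qiime rescript cull-seqs \
        --i-sequences "$in_seqs" \
        --o-clean-sequences "$out_seqs"
else
    echo "qiime not found on PATH; activate the QIIME 2 environment first" >&2
fi
```

The cleaned file would then replace the extract-reads output as the `--i-sequences` input to `qiime rescript dereplicate`.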

This is fine and intended. I strongly suggest that you leave these reference sequences within your database! Many 16S rRNA primer sets can also amplify eukaryotic rRNA gene sequences. It is best to be able to identify them, if they exist, so that you can remove them from your data afterwards. If not, you run the risk of erroneously classifying them as Bacteria when they are in fact eukaryotes. Reference sequence / taxonomy classifiers should always contain outgroup taxa.

Again, it is recommended that you do not do this. Also, there is no need to write your own scripts. You can simply run qiime taxa filter-seqs ...
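
If eukaryotic hits do show up after classification, filtering them out of the study data (not the reference) might look something like this. It is only a sketch: all three artifact names are hypothetical placeholders, and `--p-exclude` relies on the default substring matching against the taxonomy strings.

```shell
# Hypothetical artifact names for the study data, not the reference database.
rep_seqs="rep-seqs.qza"
taxonomy="taxonomy.qza"
filtered="rep-seqs-no-euk.qza"

if command -v qiime >/dev/null 2>&1; then
    # Remove every feature whose assigned taxonomy contains "d__Eukaryota".
    qiime taxa filter-seqs \
        --i-sequences "$rep_seqs" \
        --i-taxonomy "$taxonomy" \
        --p-exclude d__Eukaryota \
        --o-filtered-sequences "$filtered"
else
    echo "qiime not found on PATH; activate the QIIME 2 environment first" >&2
fi
```

`qiime taxa filter-table` accepts the same kind of `--p-exclude` filter, so the feature table can be kept in sync with the filtered sequences.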

It looks like you are running an older version of 2024.5 which had a bug. I'd re-install this version... or better yet, simply install 2025.4.

Again, I'd install the latest version of QIIME 2 to make sure there are no other environment errors. Also, can you share the QZV of the assigned taxonomy for both runs? I'd like to look at the provenance.
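
In the meantime, one quick way to compare how deep each classifier assigns is to export both taxonomy artifacts and count features by their deepest assigned rank. This is a minimal sketch, assuming the usual exported taxonomy.tsv layout (one header line, then tab-separated Feature ID, Taxon, Confidence) and SILVA-style labels such as `g__Lactobacillus`. The function name `deepest_rank_counts` is hypothetical.

```shell
# Print "<rank prefix><TAB><feature count>" for the deepest rank of each feature.
# $1: path to an exported taxonomy.tsv (header line + 3 tab-separated columns).
deepest_rank_counts() {
    awk -F'\t' 'NR > 1 {
        n = split($2, ranks, "; *")     # split the lineage on ";" plus optional spaces
        sub(/__.*/, "", ranks[n])       # keep only the rank prefix, e.g. "g" or "f"
        count[ranks[n]]++
    }
    END { for (r in count) print r "\t" count[r] }' "$1" | sort
}
```

Each classifier's taxonomy can be exported with `qiime tools export --input-path <taxonomy artifact> --output-path <directory>`, which writes a taxonomy.tsv into that directory. Running the function on both files shows at a glance whether one classifier stops at order or family where the other reaches genus.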

Also, later versions of RESCRIPt now use SILVA v138.2 by default. Not much is different other than updated taxonomy labels.

I am not sure what you are trying to say here. Are you referring to classified features in your data, or reference features for your classifier?

-Cheers!


Thank you so much for your reply. I am rerunning my whole analysis with version 2025.4.
