Problem training a taxonomic Naive Bayes classifier with the SILVA database

Dear colleagues,

I am having trouble training a 16S SILVA classifier in the QIIME 2 2024.2 environment. The goal is to obtain a classifier for the region defined by the primers B969F (ACGCGHNRAACCTTACC) and BA1406R (ACGGGCRGTGWGTRCAA), based on SILVA 138.1. It does not work: with a mock community of 8 known bacterial strains, I get a weird eukaryotic classification. I am following the tutorial "Processing, filtering, and evaluating the SILVA database (and other reference sequence data) with RESCRIPt".

To my surprise, after some debugging, I noticed that the classifier for the complete 16S region does not work either, so the issue does not seem to be related to the specific region. I am pasting the code here (only the part that builds the full-length 16S classifier). There is probably a mistake that I can't see.

By the way, the pre-trained full-length 16S classifier downloaded from the QIIME 2 website (https://data.qiime2.org/2024.2/common/silva-138-99-nb-classifier.qza), which is what I am trying to replicate here, does work (it gives the correct taxonomy for the mock sample). Could you share the exact code used to produce it, so I can try to replicate it?

I have also noticed that the classifier from the QIIME 2 website is approximately twice as large (519,178 KB) as the one I get with this script (213,312 KB), so I suspect that I might be losing a large part of the prokaryotic sequences at some step (?).
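One way to test the "losing prokaryotes" hypothesis directly, rather than through file sizes, is to export the reference taxonomy after each step (for example `qiime tools export --input-path silva-138.1-ssu-nr99-tax-derep-uniq.qza --output-path tax_export`, which writes a `taxonomy.tsv`) and count records per domain. A minimal sketch, assuming a tab-separated file with one header line and SILVA-style strings such as `d__Bacteria; p__...` in the second column; `count_domains` is a hypothetical helper name:

```shell
# Hypothetical helper: count taxonomy records per top-level (domain) label.
# Assumes: tab-separated, one header line, taxon string in column 2,
# ranks separated by ";" with the domain rank first.
count_domains() {
  local taxonomy_tsv="$1"
  awk -F '\t' 'NR > 1 { split($2, ranks, ";"); n[ranks[1]]++ }
               END { for (d in n) print d "\t" n[d] }' "$taxonomy_tsv" | sort
}
```

Usage: `count_domains tax_export/taxonomy.tsv`. Running it on the taxonomy before and after `filter-seqs-length-by-taxon` and `dereplicate` shows at which step, if any, the bacterial and archaeal counts drop.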

Thank you very much again,
Jose.

Import and prepare the SILVA 138.1 sequences

qiime rescript get-silva-data \
  --p-version '138.1' \
  --p-target 'SSURef_NR99' \
  --o-silva-sequences silva-138.1-ssu-nr99-rna-seqs.qza \
  --o-silva-taxonomy silva-138.1-ssu-nr99-tax.qza \
  --parallel \
  --verbose

qiime rescript reverse-transcribe \
  --i-rna-sequences silva-138.1-ssu-nr99-rna-seqs.qza \
  --o-dna-sequences silva-138.1-ssu-nr99-seqs.qza

qiime rescript cull-seqs \
  --i-sequences silva-138.1-ssu-nr99-seqs.qza \
  --o-clean-sequences silva-138.1-ssu-nr99-seqs-cleaned.qza
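The culling thresholds are not visible in the command above because the defaults are used, but the replayed provenance further down in this thread shows `--p-num-degenerates 5` and `--p-homopolymer-length 8`. As a rough illustration of what those two parameters control, here is an awk sketch. It is not RESCRIPt's implementation: the "N or more" cut-off, treating every non-ACGT character as degenerate, and single-line FASTA input are assumptions, and `cull_fasta` is a hypothetical name.

```shell
# Hypothetical approximation of the two cull-seqs filters (not RESCRIPt's code).
# Removes a record if it has >= max_degen non-ACGT characters, or a run of
# >= max_homopolymer identical characters. Assumes single-line FASTA records.
cull_fasta() {
  local max_degen="$1" max_homopolymer="$2" fasta="$3"
  awk -v maxdeg="$max_degen" -v hp="$max_homopolymer" '
    /^>/ { header = $0; next }
    {
      seq = toupper($0)
      tmp = seq
      ndeg = gsub(/[^ACGT]/, "", tmp)          # count non-ACGT (degenerate) characters
      run = 1; longest = 1
      for (i = 2; i <= length(seq); i++) {     # track the longest run of one character
        if (substr(seq, i, 1) == substr(seq, i - 1, 1)) {
          run++
          if (run > longest) longest = run
        } else {
          run = 1
        }
      }
      if (ndeg < maxdeg && longest < hp) print header "\n" $0   # keep the record
    }
  ' "$fasta"
}
```

Usage: `cull_fasta 5 8 dna-sequences.fasta > culled.fasta`.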

qiime rescript filter-seqs-length-by-taxon \
  --i-sequences silva-138.1-ssu-nr99-seqs-cleaned.qza \
  --i-taxonomy silva-138.1-ssu-nr99-tax.qza \
  --p-labels Archaea Bacteria Eukaryota \
  --p-min-lens 900 1200 1400 \
  --o-filtered-seqs silva-138.1-ssu-nr99-seqs-filt.qza \
  --o-discarded-seqs silva-138.1-ssu-nr99-seqs-discard.qza
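The `--p-labels` / `--p-min-lens` pairs above mean "keep Archaea of at least 900 nt, Bacteria of at least 1200 nt and Eukaryota of at least 1400 nt". A minimal awk sketch of that rule, useful for checking by hand how many bacterial references a given threshold removes. Assumptions: the label is matched against the first (domain) rank only, sequences matching no label are kept, the FASTA is single-line with headers equal to the taxonomy Feature IDs, and `filter_len_by_domain` is a hypothetical name. It is not RESCRIPt's code.

```shell
# Hypothetical sketch of per-domain minimum-length filtering.
# Inputs: exported taxonomy.tsv (one header line) and a single-line FASTA.
filter_len_by_domain() {
  local taxonomy_tsv="$1" fasta="$2"
  awk -F '\t' '
    BEGIN { min["d__Archaea"] = 900; min["d__Bacteria"] = 1200; min["d__Eukaryota"] = 1400 }
    NR == FNR {                                # first file: map Feature ID -> domain
      if (FNR > 1) { split($2, r, ";"); domain[$1] = r[1] }
      next
    }
    /^>/ { id = substr($0, 2); next }          # second file: remember the current ID
    {
      threshold = 0                            # unmatched domains are kept
      if ((id in domain) && (domain[id] in min)) threshold = min[domain[id]]
      if (length($0) >= threshold) print ">" id "\n" $0
    }
  ' "$taxonomy_tsv" "$fasta"
}
```

Usage: `filter_len_by_domain tax_export/taxonomy.tsv seq_export/dna-sequences.fasta > filtered.fasta`.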

qiime rescript dereplicate \
  --i-sequences silva-138.1-ssu-nr99-seqs-filt.qza \
  --i-taxa silva-138.1-ssu-nr99-tax.qza \
  --p-mode 'uniq' \
  --o-dereplicated-sequences silva-138.1-ssu-nr99-seqs-derep-uniq.qza \
  --o-dereplicated-taxa silva-138.1-ssu-nr99-tax-derep-uniq.qza \
  --p-threads 30
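With `--p-mode 'uniq'`, identical sequences are collapsed only when they also carry the same taxonomy; identical sequences with different taxonomies are all retained. The sketch below illustrates just that rule on an exported FASTA and taxonomy. It is a hypothetical helper (`derep_uniq`), assumes single-line FASTA, and ignores options such as `--p-perc-identity` and `--p-derep-prefix`, so it is not RESCRIPt's implementation.

```shell
# Hypothetical illustration of 'uniq'-style dereplication: keep the first
# record for each distinct (sequence, taxonomy) pair.
derep_uniq() {
  local taxonomy_tsv="$1" fasta="$2"
  awk -F '\t' '
    NR == FNR { if (FNR > 1) tax[$1] = $2; next }   # Feature ID -> taxon string
    /^>/ { id = substr($0, 2); next }
    {
      key = toupper($0) SUBSEP tax[id]              # sequence + taxonomy
      if (!(key in seen)) { seen[key] = 1; print ">" id "\n" $0 }
    }
  ' "$taxonomy_tsv" "$fasta"
}
```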

Train the full-length 16S classifier

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads silva-138.1-ssu-nr99-seqs-derep-uniq.qza \
  --i-reference-taxonomy silva-138.1-ssu-nr99-tax-derep-uniq.qza \
  --o-classifier silva-138.1-ssu-nr99-full16S-classifier.qza

An update that might help: I just tried the pre-trained classifier available on the QIIME 2 website for the newest version, 2024.5 (silva-138-99-nb-classifier.qza, https://data.qiime2.org/classifiers/sklearn-1.4.2/silva/silva-138-99-nb-classifier.qza). It also doesn't work for my mock (all assignments are eukaryotic), and, suspiciously, its size is almost exactly the same as that of the "wrong" classifier I generated with the previous pipeline in version 2024.2 (213,131 KB):

conda activate qiime2-amplicon-2024.5

qiime feature-classifier classify-sklearn \
  --i-classifier ~/qiime2/analysis/databases/16S/qiime_2024.5/silva-138-99-nb-classifier.qza \
  --i-reads dada2/representative_sequences.qza \
  --o-classification taxo/taxonomy.qza \
  --p-n-jobs 30 \
  --verbose

qiime taxa barplot --i-table dada2/table.qza \
  --i-taxonomy taxo/taxonomy.qza \
  --m-metadata-file metadata_debug.tsv \
  --o-visualization taxo/taxa_barplot_CHECK.qzv

Hi @ja.morillo,
I just ran a quick test run with the 2024.5 pre-trained classifier and it appears to be working fine.

So I am not sure why you are getting some strange results. Two possibilities come to mind:

  1. Try specifying the --p-read-orientation parameter with classify-sklearn. If your sequences are in mixed orientations, this could cause issues with the classifier; even if they are not, some of your query sequences could be causing the orientation detector to misbehave.
  2. The only real difference between the 2024.2 and 2024.5 classifiers is the database version: SILVA 138 vs. 138.1. This should not lead to massively different results, but it's worth investigating further if my first idea does not lead to more promising results.

Please give that a try and let me know what you find.

Regarding the exact code used to build the pre-trained classifiers: you can always inspect it directly in the provenance tab of any QZA or QZV file, or use provenance replay to extract this information into a script that you can use to replicate it.

Good luck!

First of all, thank you very much for the help! I have run some tests that I believe may be of interest to others on the forum. I downloaded the two full-length SILVA pre-trained classifiers again from the QIIME 2 resources website and extracted their provenance with provenance replay (very useful!). Main points:

  1. I still don't understand why the classifier built with the SILVA 138 database (q2.2024.2) is twice as large as the one built with SILVA 138.1 (q2.2024.5). I would expect a larger size if the database had been expanded.
  2. The provenance of the two classifiers shows quite a few differences in the commands, but most notably, in the case of q2.2024.5 a qiime rescript reverse-transcribe step is included that is NOT present in the provenance of q2.2024.2. Why?
  3. Regarding my test sample, everything works fine with q2.2024.2, but I get all eukaryotes with q2.2024.5 (i.e., the same error persists).
  4. I tried to train the classifier within q2.2024.5 while omitting the qiime rescript reverse-transcribe step, but without success.
  5. I tested all options of --p-read-orientation with classify-sklearn, but none of them solved the problem.

In summary, starting from exactly the same FASTQ files, everything works with q2.2024.2 but not with q2.2024.5. My original intention in updating QIIME 2 was to generate a classifier for the specific region of my sequences, so I believe this is worth investigating further.

Below are the scripts I used and the provenance of both pre-trained QIIME 2 classifiers.

###################### TAXONOMY q2-2024.2 ######################

conda activate qiime2-amplicon-2024.2

cd ~/qiime2/analysis/Starling2021/debug_cutadapt_q2-2024.2

mkdir taxo

qiime feature-classifier classify-sklearn \
  --i-classifier ~/qiime2/analysis/databases/16S/qiime_2024.2/silva-138-99-nb-classifier.qza \
  --i-reads dada2/representative_sequences.qza \
  --o-classification taxo/taxonomy.qza \
  --p-n-jobs 10 \
  --p-read-orientation auto \
  --verbose

qiime taxa barplot --i-table dada2/table.qza \
  --i-taxonomy taxo/taxonomy.qza \
  --m-metadata-file metadata_debug.tsv \
  --o-visualization taxo/taxa_barplot_CHECK.qzv

###################### TAXONOMY q2-2024.5 ######################

conda activate qiime2-amplicon-2024.5

cd ~/qiime2/analysis/Starling2021/debug_cutadapt_q2-2024.5

mkdir taxo

qiime feature-classifier classify-sklearn \
  --i-classifier ~/qiime2/analysis/databases/16S/qiime_2024.5/silva-138-99-nb-classifier.qza \
  --i-reads dada2/representative_sequences.qza \
  --o-classification taxo/taxonomy.qza \
  --p-n-jobs 10 \
  --p-read-orientation auto \
  --verbose

qiime taxa barplot --i-table dada2/table.qza \
  --i-taxonomy taxo/taxonomy.qza \
  --m-metadata-file metadata_debug.tsv \
  --o-visualization taxo/taxa_barplot_CHECK.qzv

Provenance of both classifiers:

###################### PROVENANCE q2-2024.2 ######################

conda activate qiime2-amplicon-2024.2

cd ~/qiime2/analysis/databases/16S/qiime_2024.2

wget https://data.qiime2.org/2024.2/common/silva-138-99-nb-classifier.qza

qiime tools replay-provenance \
  --in-fp silva-138-99-nb-classifier.qza \
  --out-fp silva-138-99-nb-classifier_provenance.txt \
  --verbose

cat silva-138-99-nb-classifier_provenance.txt

[only relevant part]

qiime rescript get-silva-data \
  --p-version 138 \
  --p-target SSURef_NR99 \
  --p-include-species-labels \
  --p-rank-propagation \
  --p-download-sequences \
  --o-silva-sequences silva-sequences-0.qza \
  --o-silva-taxonomy silva-taxonomy-0.qza

qiime rescript cull-seqs \
  --i-sequences silva-sequences-0.qza \
  --p-num-degenerates 5 \
  --p-homopolymer-length 8 \
  --p-n-jobs 1 \
  --o-clean-sequences clean-sequences-0.qza

qiime rescript filter-seqs-length-by-taxon \
  --i-sequences clean-sequences-0.qza \
  --i-taxonomy silva-taxonomy-0.qza \
  --p-labels Archaea Bacteria Eukaryota \
  --p-min-lens 900 1200 1400 \
  --o-filtered-seqs filtered-seqs-0.qza \
  --o-discarded-seqs XX_discarded_seqs

qiime rescript dereplicate \
  --i-sequences filtered-seqs-0.qza \
  --i-taxa silva-taxonomy-0.qza \
  --p-mode uniq \
  --p-perc-identity 1.0 \
  --p-threads 1 \
  --p-rank-handles silva \
  --p-no-derep-prefix \
  --o-dereplicated-sequences dereplicated-sequences-0.qza \
  --o-dereplicated-taxa dereplicated-taxa-0.qza

qiime rescript evaluate-fit-classifier \
  --i-sequences dereplicated-sequences-0.qza \
  --i-taxonomy dereplicated-taxa-0.qza \
  --p-reads-per-batch auto \
  --p-n-jobs 1 \
  --p-confidence 0.7 \
  --o-classifier classifier-0.qza \
  --o-evaluation XX_evaluation \
  --o-observed-taxonomy XX_observed_taxonomy

###################### PROVENANCE q2-2024.5 ######################

conda activate qiime2-amplicon-2024.5

cd ~/qiime2/analysis/databases/16S/qiime_2024.5

wget https://data.qiime2.org/classifiers/sklearn-1.4.2/silva/silva-138-99-nb-classifier.qza

qiime tools replay-provenance \
  --in-fp silva-138-99-nb-classifier.qza \
  --out-fp silva-138-99-nb-classifier_provenance.txt \
  --verbose

cat silva-138-99-nb-classifier_provenance.txt

[only relevant part]

qiime rescript get-silva-data \
  --p-version 138.1 \
  --p-target SSURef_NR99 \
  --p-no-include-species-labels \
  --p-rank-propagation \
  --p-download-sequences \
  --o-silva-taxonomy silva-taxonomy-0.qza \
  --o-silva-sequences silva-sequences-0.qza

qiime rescript reverse-transcribe \
  --i-rna-sequences silva-sequences-0.qza \
  --o-dna-sequences dna-sequences-0.qza

qiime rescript cull-seqs \
  --i-sequences dna-sequences-0.qza \
  --p-num-degenerates 5 \
  --p-homopolymer-length 8 \
  --p-n-jobs 1 \
  --o-clean-sequences clean-sequences-0.qza

qiime rescript filter-seqs-length-by-taxon \
  --i-sequences clean-sequences-0.qza \
  --i-taxonomy silva-taxonomy-0.qza \
  --p-labels Archaea Bacteria Eukaryota \
  --p-min-lens 900 1200 1400 \
  --o-filtered-seqs filtered-seqs-0.qza \
  --o-discarded-seqs XX_discarded_seqs

qiime rescript dereplicate \
  --i-sequences filtered-seqs-0.qza \
  --i-taxa silva-taxonomy-0.qza \
  --p-mode uniq \
  --p-perc-identity 1.0 \
  --p-threads 1 \
  --p-rank-handles domain phylum class order family genus species \
  --p-no-derep-prefix \
  --o-dereplicated-taxa dereplicated-taxa-0.qza \
  --o-dereplicated-sequences dereplicated-sequences-0.qza

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads dereplicated-sequences-0.qza \
  --i-reference-taxonomy dereplicated-taxa-0.qza \
  --p-classify--alpha 0.001 \
  --p-classify--chunk-size 20000 \
  --p-classify--class-prior null \
  --p-no-classify--fit-prior \
  --p-no-feat-ext--alternate-sign \
  --p-feat-ext--analyzer char_wb \
  --p-no-feat-ext--binary \
  --p-feat-ext--decode-error strict \
  --p-feat-ext--encoding utf-8 \
  --p-feat-ext--input content \
  --p-feat-ext--lowercase \
  --p-feat-ext--n-features 8192 \
  --p-feat-ext--ngram-range '[7, 7]' \
  --p-feat-ext--norm l2 \
  --p-feat-ext--preprocessor null \
  --p-feat-ext--stop-words null \
  --p-feat-ext--strip-accents null \
  --p-feat-ext--token-pattern '(?u)\b\w\w+\b' \
  --p-feat-ext--tokenizer null \
  --p-no-verbose \
  --o-classifier classifier-0.qza

Hi @ja.morillo,
Thanks for the update and information.

Regarding the reverse-transcribe step: this was done internally as part of the get-silva-data action in older versions of RESCRIPt. In the more recent release (used in 2024.5), it is done as a separate step. I would not expect this to impact the results.
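For readers wondering what that step changes: SILVA distributes its sequences in the RNA alphabet (U instead of T), and converting them to DNA is essentially a character substitution on the sequence lines. A minimal sketch for a plain FASTA file; `rna_to_dna` is a hypothetical name and this is not RESCRIPt's code:

```shell
# Hypothetical sketch: convert an RNA FASTA to the DNA alphabet by replacing
# U/u with T/t on sequence lines only; header lines are printed unchanged.
rna_to_dna() {
  awk '/^>/ { print; next } { gsub(/U/, "T"); gsub(/u/, "t"); print }' "$1"
}
```

Because only the alphabet changes, doing this inside get-silva-data or as a separate action should give the same sequences, consistent with the expectation above.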

Similarly, the other difference in provenance (use of qiime rescript evaluate-fit-classifier vs. qiime feature-classifier fit-classifier-naive-bayes ) should not matter, as evaluate-fit-classifier runs a pipeline that includes fit-classifier-naive-bayes with the same default parameters.

Regarding point 5 (testing all --p-read-orientation options): could you please clarify? Did this not impact the results at all?

Could you please share:

  1. the taxonomy barplot QZVs from each of your tests (each database version, and the results with the different read orientations)
  2. the QZVs from running qiime metadata tabulate on the taxonomy files (from 2024.2 and 2024.5)
  3. the output of running qiime feature-table tabulate-seqs on representative_sequences.qza.

Thank you for explaining the changes between both versions of RESCRIPt. It's clear now.

You are right, I was not very clear in my comments about the effect of the --p-read-orientation options with classify-sklearn. What I meant is that using "--p-read-orientation reverse-complement" with q2.2024.5 did not improve the results (the taxonomy is incorrect with all options). Of course, it did affect the q2.2024.2 taxonomy (wrong with reverse-complement; OK with auto and same).

This sample is a mock community of 8 strains that was mixed with a soil sample (for a particular test); in the q2.2024.2 barplot QZVs you will see the 8 main bacteria.

I am sending you all the files that you asked for along with the code I used.

q2.2024.2 qzv(s):

tabulate-seq_same-2024.2.qzv (346.0 KB)
tabulate-seq_auto-2024.2.qzv (346.5 KB)
tabulate-seq_reverse-complement-2024.2.qzv (341.4 KB)
taxa_barplot_same-2024.2.qzv (401.8 KB)
taxa_barplot_auto-2024.2.qzv (401.8 KB)
taxa_barplot_reverse-complement-2024.2.qzv (388.5 KB)
taxonomy_same-2024.2.qzv (1.3 MB)
taxonomy_auto-2024.2.qzv (1.3 MB)
taxonomy_reverse-complement-2024.2.qzv (1.3 MB)

q2.2024.5 qzv(s):

tabulate-seq_same-2024.5.qzv (348.1 KB)
tabulate-seq_auto-2024.5.qzv (344.0 KB)
tabulate-seq_reverse-complement-2024.5.qzv (344.1 KB)
taxa_barplot_same-2024.5.qzv (402.8 KB)
taxa_barplot_auto-2024.5.qzv (392.1 KB)
taxa_barplot_reverse-complement-2024.5.qzv (392.1 KB)
taxonomy_same-2024.5.qzv (1.3 MB)
taxonomy_auto-2024.5.qzv (1.3 MB)
taxonomy_reverse-complement-2024.5.qzv (1.3 MB)

Many thanks in advance for checking this!

############# q2.2024.2 #############

cd ~/qiime2/analysis/Starling2021/debug_classifiers/q2.2024.2

conda activate qiime2-amplicon-2024.2

Import data

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path ./manifest_debug.tsv \
  --output-path demux-paired-end.qza \
  --input-format PairedEndFastqManifestPhred33V2

qiime demux summarize \
  --i-data demux-paired-end.qza \
  --o-visualization demux-paired-end.qzv

Cutadapt

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux-paired-end.qza \
  --p-cores 30 \
  --p-front-f ACGCGHNRAACCTTACC \
  --p-adapter-f TTGYACWCACYGCCCGT \
  --p-front-r ACGGGCRGTGWGTRCAA \
  --p-adapter-r GGTAAGGTTYNDCGCGT \
  --p-error-rate 0.1 \
  --p-indels \
  --p-overlap 3 \
  --p-match-read-wildcards True \
  --p-match-adapter-wildcards True \
  --o-trimmed-sequences trimmed.qza \
  --p-times 1 \
  --p-discard-untrimmed \
  --verbose > cutadapt_log.txt
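The --p-adapter-f and --p-adapter-r sequences in this cutadapt call are the reverse complements of the opposite primers (BA1406R and B969F, respectively), which is what a read runs into when it extends past the end of the amplicon. A quick way to double-check such values with the standard `rev` and `tr` utilities, handling IUPAC ambiguity codes; `revcomp` is a hypothetical helper name and uppercase input is assumed:

```shell
# Reverse-complement an uppercase IUPAC nucleotide string.
# Pairs: A<->T, C<->G, R<->Y, K<->M, B<->V, D<->H; W, S and N map to themselves.
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTRYKMBVDHWSN' 'TGCAYRMKVBHDWSN'
}

revcomp ACGGGCRGTGWGTRCAA   # BA1406R -> TTGYACWCACYGCCCGT (--p-adapter-f)
revcomp ACGCGHNRAACCTTACC   # B969F   -> GGTAAGGTTYNDCGCGT (--p-adapter-r)
```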

qiime demux summarize \
  --i-data trimmed.qza \
  --o-visualization trimmed.qzv

DADA2

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs trimmed.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 205 \
  --p-trunc-len-r 215 \
  --output-dir dada2 \
  --p-n-threads 30 \
  --p-pooling-method pseudo \
  --verbose

qiime tools export --input-path dada2/denoising_stats.qza --output-path dada2/

DADA2 results:

qiime metadata tabulate \
  --m-input-file dada2/denoising_stats.qza \
  --o-visualization dada2/stats-dada2.qzv

ASV stats:

qiime feature-table summarize \
  --i-table dada2/table.qza \
  --o-visualization dada2/table.qzv

TAXONOMY

mkdir taxo

Check the three --p-read-orientation options:

qiime feature-classifier classify-sklearn \
  --i-classifier ~/qiime2/analysis/databases/16S/qiime_2024.2/silva-138-99-nb-classifier.qza \
  --i-reads dada2/representative_sequences.qza \
  --o-classification taxo/taxonomy_auto.qza \
  --p-n-jobs 15 \
  --p-read-orientation auto \
  --verbose

qiime feature-classifier classify-sklearn \
  --i-classifier ~/qiime2/analysis/databases/16S/qiime_2024.2/silva-138-99-nb-classifier.qza \
  --i-reads dada2/representative_sequences.qza \
  --o-classification taxo/taxonomy_same.qza \
  --p-n-jobs 15 \
  --p-read-orientation same \
  --verbose

qiime feature-classifier classify-sklearn \
  --i-classifier ~/qiime2/analysis/databases/16S/qiime_2024.2/silva-138-99-nb-classifier.qza \
  --i-reads dada2/representative_sequences.qza \
  --o-classification taxo/taxonomy_reverse-complement.qza \
  --p-n-jobs 15 \
  --p-read-orientation reverse-complement \
  --verbose

Barplot QZVs:

qiime taxa barplot --i-table dada2/table.qza \
  --i-taxonomy taxo/taxonomy_auto.qza \
  --m-metadata-file metadata_debug.tsv \
  --o-visualization taxo/taxa_barplot_auto.qzv

qiime taxa barplot --i-table dada2/table.qza \
  --i-taxonomy taxo/taxonomy_same.qza \
  --m-metadata-file metadata_debug.tsv \
  --o-visualization taxo/taxa_barplot_same.qzv

qiime taxa barplot --i-table dada2/table.qza \
  --i-taxonomy taxo/taxonomy_reverse-complement.qza \
  --m-metadata-file metadata_debug.tsv \
  --o-visualization taxo/taxa_barplot_reverse-complement.qzv

Tabulate the taxonomies:

qiime metadata tabulate \
  --m-input-file taxo/taxonomy_auto.qza \
  --o-visualization taxo/taxonomy_auto.qzv

qiime metadata tabulate \
  --m-input-file taxo/taxonomy_same.qza \
  --o-visualization taxo/taxonomy_same.qzv

qiime metadata tabulate \
  --m-input-file taxo/taxonomy_reverse-complement.qza \
  --o-visualization taxo/taxonomy_reverse-complement.qzv

Representative sequence QZVs:

qiime feature-table tabulate-seqs \
  --i-data dada2/representative_sequences.qza \
  --i-taxonomy taxo/taxonomy_auto.qza \
  --o-visualization taxo/tabulate-seq_auto.qzv

qiime feature-table tabulate-seqs \
  --i-data dada2/representative_sequences.qza \
  --i-taxonomy taxo/taxonomy_same.qza \
  --o-visualization taxo/tabulate-seq_same.qzv

qiime feature-table tabulate-seqs \
  --i-data dada2/representative_sequences.qza \
  --i-taxonomy taxo/taxonomy_reverse-complement.qza \
  --o-visualization taxo/tabulate-seq_reverse-complement.qzv

############# q2.2024.5 #############

cd ~/qiime2/analysis/Starling2021/debug_classifiers/q2.2024.5

conda activate qiime2-amplicon-2024.5

Import data

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path ./manifest_debug.tsv \
  --output-path demux-paired-end.qza \
  --input-format PairedEndFastqManifestPhred33V2

qiime demux summarize \
  --i-data demux-paired-end.qza \
  --o-visualization demux-paired-end-2024.5.qzv

Cutadapt

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux-paired-end.qza \
  --p-cores 30 \
  --p-front-f ACGCGHNRAACCTTACC \
  --p-adapter-f TTGYACWCACYGCCCGT \
  --p-front-r ACGGGCRGTGWGTRCAA \
  --p-adapter-r GGTAAGGTTYNDCGCGT \
  --p-error-rate 0.1 \
  --p-indels \
  --p-overlap 3 \
  --p-match-read-wildcards True \
  --p-match-adapter-wildcards True \
  --o-trimmed-sequences trimmed.qza \
  --p-times 1 \
  --p-discard-untrimmed \
  --verbose > cutadapt_log.txt

qiime demux summarize \
  --i-data trimmed.qza \
  --o-visualization trimmed-2024.5.qzv

DADA2

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs trimmed.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 205 \
  --p-trunc-len-r 215 \
  --output-dir dada2 \
  --p-n-threads 30 \
  --p-pooling-method pseudo \
  --verbose

qiime tools export --input-path dada2/denoising_stats.qza --output-path dada2/

DADA2 results:

qiime metadata tabulate \
  --m-input-file dada2/denoising_stats.qza \
  --o-visualization dada2/stats-dada2-2024.5.qzv

ASV stats:

qiime feature-table summarize \
  --i-table dada2/table.qza \
  --o-visualization dada2/table-2024.5.qzv

TAXONOMY

mkdir taxo

Check the three --p-read-orientation options:

qiime feature-classifier classify-sklearn \
  --i-classifier ~/qiime2/analysis/databases/16S/qiime_2024.5/silva-138-99-nb-classifier.qza \
  --i-reads dada2/representative_sequences.qza \
  --o-classification taxo/taxonomy_auto.qza \
  --p-n-jobs 15 \
  --p-read-orientation auto \
  --verbose

qiime feature-classifier classify-sklearn \
  --i-classifier ~/qiime2/analysis/databases/16S/qiime_2024.5/silva-138-99-nb-classifier.qza \
  --i-reads dada2/representative_sequences.qza \
  --o-classification taxo/taxonomy_same.qza \
  --p-n-jobs 15 \
  --p-read-orientation same \
  --verbose

qiime feature-classifier classify-sklearn \
  --i-classifier ~/qiime2/analysis/databases/16S/qiime_2024.5/silva-138-99-nb-classifier.qza \
  --i-reads dada2/representative_sequences.qza \
  --o-classification taxo/taxonomy_reverse-complement.qza \
  --p-n-jobs 15 \
  --p-read-orientation reverse-complement \
  --verbose

Barplot QZVs:

qiime taxa barplot --i-table dada2/table.qza \
  --i-taxonomy taxo/taxonomy_auto.qza \
  --m-metadata-file metadata_debug.tsv \
  --o-visualization taxo/taxa_barplot_auto-2024.5.qzv

qiime taxa barplot --i-table dada2/table.qza \
  --i-taxonomy taxo/taxonomy_same.qza \
  --m-metadata-file metadata_debug.tsv \
  --o-visualization taxo/taxa_barplot_same-2024.5.qzv

qiime taxa barplot --i-table dada2/table.qza \
  --i-taxonomy taxo/taxonomy_reverse-complement.qza \
  --m-metadata-file metadata_debug.tsv \
  --o-visualization taxo/taxa_barplot_reverse-complement-2024.5.qzv

Tabulate the taxonomies:

qiime metadata tabulate \
  --m-input-file taxo/taxonomy_auto.qza \
  --o-visualization taxo/taxonomy_auto-2024.5.qzv

qiime metadata tabulate \
  --m-input-file taxo/taxonomy_same.qza \
  --o-visualization taxo/taxonomy_same-2024.5.qzv

qiime metadata tabulate \
  --m-input-file taxo/taxonomy_reverse-complement.qza \
  --o-visualization taxo/taxonomy_reverse-complement-2024.5.qzv

Representative sequence QZVs:

qiime feature-table tabulate-seqs \
  --i-data dada2/representative_sequences.qza \
  --i-taxonomy taxo/taxonomy_auto.qza \
  --o-visualization taxo/tabulate-seq_auto-2024.5.qzv

qiime feature-table tabulate-seqs \
  --i-data dada2/representative_sequences.qza \
  --i-taxonomy taxo/taxonomy_same.qza \
  --o-visualization taxo/tabulate-seq_same-2024.5.qzv

qiime feature-table tabulate-seqs \
  --i-data dada2/representative_sequences.qza \
  --i-taxonomy taxo/taxonomy_reverse-complement.qza \
  --o-visualization taxo/tabulate-seq_reverse-complement-2024.5.qzv


Hi @ja.morillo,
Thanks for preparing all of these files. We can just focus on two:
2024.2 same orientation
2024.5 same orientation

These look more or less the same to me and do not contain Eukaryotes.

Indeed, the "auto" orientation is yielding bad results, as can happen when some query sequences cause the orientation detector to misfire, e.g., due to very noisy query or reference sequences. But it looks like you just need to set orientation to "same" for your samples to get it to classify just fine.
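To see the same-vs-auto difference feature by feature rather than only in the barplots, one could export each taxonomy (for example `qiime tools export --input-path taxo/taxonomy_same.qza --output-path taxo/exported_same`, and likewise for `taxonomy_auto.qza`) and compare the first rank of each assignment. A minimal sketch; `diff_domains` and the output folder names are hypothetical, and it assumes the exported `taxonomy.tsv` layout (Feature ID, Taxon, Confidence) with one header line:

```shell
# Hypothetical helper: print features whose domain (first rank) differs
# between two exported taxonomy.tsv files, as: ID <TAB> domain_a <TAB> domain_b
diff_domains() {
  local tax_a="$1" tax_b="$2"
  awk -F '\t' '
    FNR == 1 { next }                                    # skip each header line
    NR == FNR { split($2, r, ";"); a[$1] = r[1]; next }  # first file: ID -> domain
    ($1 in a) { split($2, r, ";"); if (r[1] != a[$1]) print $1 "\t" a[$1] "\t" r[1] }
  ' "$tax_a" "$tax_b"
}
```

Usage: `diff_domains taxo/exported_same/taxonomy.tsv taxo/exported_auto/taxonomy.tsv`. Pairing the listed Feature IDs with the tabulate-seqs QZV makes it easier to spot which query sequences are affected by the orientation choice.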

So I still do not see any issue with the 2024.5 release / SILVA 138.1 classifier based on these results.


Hello,

Thank you very much for checking the files, and I apologize for the delay in my response, but I needed time to verify everything properly. Indeed, the "same" orientation gives good results with QIIME 2024.5 (sorry, I had checked the QZVs incorrectly). It's interesting that "auto" also works in 2024.2. Since the query sequences are the same, I assume this is due to the changes in the SILVA version used for the QIIME 2024.5 classifier. We will take this into account for future updates. Thank you again for your time and explanations.

Cheers,
Jose
