Taxonomic classification at subspecies level

Hello everyone,

I want to do taxonomic assignment at sub-species level. But I am not sure I can do this with qiime. I used following commands while training my classifier.
qiime rescript get-silva-data
--p-version '138.1'
--p-target 'SSURef_NR99'
--p-include-species-labels
--o-silva-sequences silva-138.1-ssu-nr99-rna-seqs.qza
--o-silva-taxonomy silva-138.1-ssu-nr99-tax.qza

qiime rescript reverse-transcribe
--i-rna-sequences silva-138.1-ssu-nr99-rna-seqs.qza
--o-dna-sequences silva-138.1-ssu-nr99-seqs.qza

qiime rescript cull-seqs
--i-sequences silva-138.1-ssu-nr99-seqs.qza
--o-clean-sequences silva-138.1-ssu-nr99-seqs-cleaned.qza

qiime rescript filter-seqs-length-by-taxon
--i-sequences silva-138.1-ssu-nr99-seqs-cleaned.qza
--i-taxonomy silva-138.1-ssu-nr99-tax.qza
--p-labels Archaea Bacteria Eukaryota
--p-min-lens 900 1200 1400
--o-filtered-seqs silva-138.1-ssu-nr99-seqs-filt.qza
--o-discarded-seqs silva-138.1-ssu-nr99-seqs-discard.qza

qiime rescript dereplicate
--i-sequences silva-138.1-ssu-nr99-seqs-filt.qza
--i-taxa silva-138.1-ssu-nr99-tax.qza
--p-mode 'uniq'
--o-dereplicated-sequences silva-138.1-ssu-nr99-seqs-derep-uniq.qza
--o-dereplicated-taxa silva-138.1-ssu-nr99-tax-derep-uniq.qza

qiime feature-classifier extract-reads
--i-sequences silva-138.1-ssu-nr99-seqs-derep-uniq.qza
--p-f-primer CCTACGGGAGGCAGCAG
--p-r-primer CTACCAGGGTATCTAATCC
--p-n-jobs 2
--p-read-orientation 'forward'
--o-reads silva-138.1-ssu-nr99-seqs-v3v4.qza

qiime rescript dereplicate
--i-sequences silva-138.1-ssu-nr99-seqs-v3v4.qza
--i-taxa silva-138.1-ssu-nr99-tax-derep-uniq.qza
--p-mode 'uniq'
--o-dereplicated-sequences silva-138.1-ssu-nr99-seqs-v3v4_derep.qza
--o-dereplicated-taxa silva-138.1-ssu-nr99-tax-v3v4_derep.qza

qiime taxa filter-seqs
--i-sequences silva-138.1-ssu-nr99-seqs-v3v4_derep.qza
--i-taxonomy silva-138.1-ssu-nr99-tax-v3v4_derep.qza
--p-exclude Eukaryota,Unassigned
--o-filtered-sequences 16S_v3v4_derep_filt.qza

qiime feature-classifier fit-classifier-naive-bayes
--i-reference-reads 16S_v3v4_derep_filt.qza
--i-reference-taxonomy silva-138.1-ssu-nr99-tax-v3v4_derep.qza
--o-classifier classifier_16S_v3v4.qza

Then, I used following commands while preparing my data.

qiime tools import
--type 'SampleData[PairedEndSequencesWithQuality]'
--input-path Sample49
--input-format CasavaOneEightSingleLanePerSampleDirFmt
--output-path Sample49_Results/Import_Data/Sample49-demux-paired-end.qza

qiime demux summarize
--i-data Sample49_Results/Import_Data/Sample49-demux-paired-end.qza
--o-visualization Sample49_Results/Import_Data/Sample49-demux-paired-end.gzv

qiime cutadapt trim-paired
--i-demultiplexed-sequences Sample49_Results/Import_Data/Sample49-demux-paired-end.qza
--p-adapter-f TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
--p-adapter-r GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG
--p-match-read-wildcards
--p-match-adapter-wildcards
--o-trimmed-sequences Sample49_Results/Trimming/Sample49_trimmed_adapters.qza

qiime demux summarize
--i-data Sample49_Results/Trimming/Sample49_trimmed_adapters.qza
--o-visualization Sample49_Results/Trimming/Sample49_trimmed_adapters.gzv

qiime dada2 denoise-paired
--i-demultiplexed-seqs Sample49_Results/Trimming/Sample49_trimmed_adapters.qza
--p-trunc-len-f 290
--p-trunc-len-r 255
--p-trim-left-r 55
--p-n-threads 24
--o-representative-sequences Sample49_Results/Denoising/Sample49-rep-seqs.qza
--o-table Sample49_Results/Denoising/Sample49-table.qza
--o-denoising-stats Sample49_Results/Denoising/Sample49-stats.qza
--verbose

AND FINALLY, for taxonomic classification I used following

qiime feature-classifier classify-sklearn
--i-classifier Trained_SILVA/classifier_16S_v3v4.qza
--i-reads Sample49_Results/Denoising/Sample49-rep-seqs.qza
--o-classification Sample49_Results/Taxonomy/Sample49_v3v4_taxonomy.qza

BUT, I HAVE JUST TAXONOMIC CLASSIFICATION AT SPECIES LEVEL. BUT I WANT TO DO IT AT SUB-SPECIES LEVEL.

Is it possible to obtain taxonomic classification at sub-species level with qiime?

Hi @elifi ,

This is a wish that we all have. But there are some very clear reasons why this is not possible. First and foremost, you are using the SILVA database. This database is actually only curated to genus level, though it is annotated to species level (with the raw uncurated species-level information coming from Genbank). So the species annotations should be treated carefully, as many are wrong (not a fault of SILVA — they make no guarantees and only provide these as information).

If you have a database that is annotated to species level, then how can you classify to sub-species? You cannot. Getting species-level classification is already very good.

Another reason: your data appear to be sequences of a 16S rRNA gene domain. Even the full-length 16S rRNA gene has limited taxonomic resolution, and species-level classification is difficult enough. Subspecies classification is nearly impossible with 16S, as it is a fairly well-conserved gene. Usually proper subspecies classification is done with something like MLST or whole genome sequencing.

You could certainly look at the ASVs and treat them as subspecies-level variants. You could even build a phylogeny of these comparing against known subspecies of your clade of interest. But this is separate from taxonomic classification, then, which relies on the annotations available in the database (not possible with SILVA), and requires sufficient taxonomic resolution to differentiate true subspecies and find a reliable match in the database (not possible with 16S).

Yes, this is possible, e.g., with long-read sequencing data or shotgun metagenome sequence data. QIIME 2 has plugins for this. But with SILVA and 16S data you will not be able to get subspecies classifications.

I hope that clarifies! Good luck!

4 Likes