Ambiguous_taxa when assigning species ID

I am working on a study to describe hemoparasite communities in wildlife, sequenced on an Illumina MiSeq (300 bp paired-end). We targeted one genus, using primers traditionally used for diagnostic work with this group of parasites. The genus we are describing consists of a multitude of species with well-defined 18S V4 regions (~70 species are included in the Silva 18S database).

I used qiime2, with the dada2 plugin, to derive ASVs and have been trying to use the Silva.128 18S database to assign taxonomy. However, all (~420) parasite sequences are assigned only as D6 Theileria; Ambiguous taxa. When I BLAST the sequences, I get very high species-specific identity (99%-100%). Is there something wrong with my workflow, or is the issue related to the taxa available in the Silva database?

Here are the qiime2 commands I ran (I used both 99_otus_18S.fasta and 97_otus_18S.fasta and got the same results).

# move 97_otus_18S.fasta & consensus_taxonomy_all_levels.txt from the SILVA database into the working directory

qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path 97_otus_18S.fasta \
  --output-path 97_otus.qza

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --source-format HeaderlessTSVTaxonomyFormat \
  --input-path consensus_taxonomy_all_levels.txt \
  --output-path ref-taxonomy.qza

qiime feature-classifier extract-reads \
  --i-sequences 97_otus.qza \
  --p-trunc-len 512 \
  --o-reads ref-seqs.qza

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads ref-seqs.qza \
  --i-reference-taxonomy ref-taxonomy.qza \
  --o-classifier 97.classifier.qza

qiime feature-classifier classify-sklearn \
  --i-classifier training-feature-classifiers.97/97.classifier.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy.qza

qiime metadata tabulate \
  --m-input-file taxonomy.qza \
  --o-visualization taxonomy.qzv


Hi @ckg89,

This issue is unambiguously caused by the reference database, and is not an error with the classifier.

“Ambiguous taxa” must be a label in the reference database, because the classifiers in QIIME 2 do not produce it. They would give an “unclassified” label if a sequence does not resemble anything in the reference, or a shallow classification with no further labels (e.g., “D6 Theileria”) if they just can’t confidently classify that sequence at a deeper level.

One issue could be that you use the consensus_taxonomy_all_levels. The consensus taxonomy might apply that label when it can’t reach a reasonable consensus at level X, but I just don’t know enough about how SILVA is calculating that consensus. You could try the majority taxonomy and see if that helps…
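One way to compare the two taxonomy files before retraining anything is to count how many reference entries for the genus carry the label in each. This is only a minimal sketch: it assumes the label is spelled `Ambiguous_taxa` (as in the thread title), that the taxonomy files are two-column, tab-separated tables (ID, then taxonomy string), and that a `majority_taxonomy_all_levels.txt` file sits next to the consensus file in the SILVA release. The function name `count_label` is hypothetical.

```shell
#!/usr/bin/env bash
# count_label FILE GENUS [LABEL]
# Prints one line: FILE <tab> number of entries whose taxonomy string
# mentions GENUS <tab> number of those entries that also contain LABEL.
count_label() {
  local file=$1 genus=$2 label=${3:-Ambiguous_taxa}
  awk -F'\t' -v g="$genus" -v l="$label" -v f="$file" '
    index($2, g) { total++; if (index($2, l)) flagged++ }
    END { printf "%s\t%d\t%d\n", f, total, flagged }
  ' "$file"
}

# Usage: compare the consensus and majority files if they are present.
# The majority file name is an assumption about the release layout.
for f in consensus_taxonomy_all_levels.txt majority_taxonomy_all_levels.txt; do
  if [ -f "$f" ]; then
    count_label "$f" Theileria
  fi
done
```

If the majority file shows far fewer labelled Theileria entries than the consensus file, importing it in place of `consensus_taxonomy_all_levels.txt` and retraining the classifier is probably worth trying.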

Another possibility would be to filter out any taxa from the reference database that have that “Ambiguous” label — but I don’t know whether that may include valuable sequence data (e.g., taxa that really do exist but just aren’t well characterized yet).
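The filtering idea could be sketched roughly as follows. It assumes the label is spelled `Ambiguous_taxa`, that the taxonomy file is a two-column, tab-separated table, and that each FASTA header starts with the same ID used in the taxonomy file. The function name `drop_labelled` and the output file names are hypothetical.

```shell
#!/usr/bin/env bash
# drop_labelled TAXONOMY FASTA OUT_TAXONOMY OUT_FASTA [LABEL]
# Writes copies of the taxonomy table and the FASTA file without the
# entries whose taxonomy string contains LABEL.
drop_labelled() {
  local tax=$1 fasta=$2 out_tax=$3 out_fasta=$4 label=${5:-Ambiguous_taxa}

  # Keep only taxonomy rows whose second column lacks the label.
  awk -F'\t' -v l="$label" '!index($2, l)' "$tax" > "$out_tax"

  # While reading the filtered taxonomy, remember the kept IDs. While
  # reading the FASTA file, print a record (header plus sequence lines)
  # only if its header ID, the text after ">" up to the first blank,
  # is among the kept IDs.
  awk -F'\t' '
    FILENAME == ARGV[1] { keep[$1]; next }
    /^>/ { id = substr($1, 2); sub(/[ \t].*/, "", id); show = (id in keep) }
    show
  ' "$out_tax" "$fasta" > "$out_fasta"
}

# Usage with the file names from the question; output names are placeholders.
if [ -f consensus_taxonomy_all_levels.txt ] && [ -f 97_otus_18S.fasta ]; then
  drop_labelled consensus_taxonomy_all_levels.txt 97_otus_18S.fasta \
    taxonomy_noambig.txt 97_otus_noambig.fasta
fi
```

The two filtered files would then be imported and used for training in place of the originals. Note that this drops an entry entirely, so a genus whose references all carry the label would vanish from the reference, which is the risk mentioned above.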

I would try those steps and go from there — if the results are still unsatisfying, there may be other options to explore…

Good luck!

Thank you!

I tried running the same commands with a curated database and got species-level assignments, so the issue was that the SILVA database does not include enough 18S sequences (for my particular study).
