I am working on a study describing hemoparasite communities in wildlife, sequenced on an Illumina MiSeq (300 bp paired-end). We targeted one genus, using primers traditionally used for diagnostic work with this group of parasites. The genus contains many species with well-defined 18S V4 regions (~70 species are included in the SILVA 18S database).
I used qiime2 with the dada2 plugin to derive ASVs and have been trying to assign taxonomy with the SILVA 128 18S database. However, all (~420) parasite sequences are classified only as "D6 Theileria; Ambiguous taxa". When I BLAST the same sequences I get very high, species-specific identity (99%-100%). Is there something wrong with my workflow, or is the issue related to the taxa available in the SILVA database?
Here are the qiime2 commands I ran (I tried both 99_otus_18S.fasta and 97_otus_18S.fasta and got the same results).
# Move 97_otus_18S.fasta and consensus_taxonomy_all_levels.txt from the SILVA database into the working directory
qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path 97_otus_18S.fasta \
  --output-path 97_otus.qza
qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --source-format HeaderlessTSVTaxonomyFormat \
  --input-path consensus_taxonomy_all_levels.txt \
  --output-path ref-taxonomy.qza
qiime feature-classifier extract-reads \
  --i-sequences 97_otus.qza \
  --p-f-primer GAGGTAGTGACAAGAAATAACAATA \
  --p-r-primer TCTTCGATCCCCTAACTTTC \
  --p-trunc-len 512 \
  --o-reads ref-seqs.qza
qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads ref-seqs.qza \
  --i-reference-taxonomy ref-taxonomy.qza \
  --o-classifier 97.classifier.qza
qiime feature-classifier classify-sklearn \
  --i-classifier training-feature-classifiers.97/97.classifier.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy.qza
qiime metadata tabulate \
  --m-input-file taxonomy.qza \
  --o-visualization taxonomy.qzv
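One way to check whether the reference taxonomy itself is the limiting factor is to count how many Theileria entries in the taxonomy file carry an unresolved label. The following is only a minimal sketch: `count_genus_resolution` is a hypothetical helper (not part of qiime2 or SILVA), and it assumes the taxonomy file is a headerless TSV of ID and semicolon-separated taxonomy, with unresolved ranks containing the word "ambiguous" in some spelling.

```shell
# Hypothetical helper: count taxonomy lines mentioning a genus, and how many
# of those contain an "ambiguous" label anywhere in the taxonomy string.
# Assumes a headerless TSV: <feature id><TAB><semicolon-separated taxonomy>.
count_genus_resolution() {
  tax_file=$1   # e.g. consensus_taxonomy_all_levels.txt
  genus=$2      # e.g. Theileria
  awk -F'\t' -v g="$genus" '
    index($2, g) {                        # taxonomy string mentions the genus
      total++
      if (tolower($2) ~ /ambiguous/) ambiguous++
    }
    END { printf "%d %d\n", total, ambiguous }
  ' "$tax_file"
}

# Usage (prints "<matching lines> <lines with an ambiguous label>"):
# count_genus_resolution consensus_taxonomy_all_levels.txt Theileria
```

If nearly every matching line is ambiguous, the classifier has no species-level labels to learn from, which would point to the reference taxonomy rather than the workflow.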
rep-seqs.qza (23.3 KB)