QIIME2-Compatible Database for AMF Taxonomy Assignment – Issues with SILVA Lineage Depth

Salma_Sarker · June 4, 2025, 4:04am

Thanks so much for the guidance!

Yes, I agree that using qiime rescript get-silva-data is simpler for standard taxonomy-only workflows. However, in my case, I do need the sequence data to assign taxonomy to my experimental sequences (AMF OTUs), so I followed the full workflow @Nicholas_Bokulich previously recommended here:
Processing, filtering, and evaluating SILVA data with RESCRIPt

I’ve successfully formatted the SILVA 138.2 database into QIIME 2 .qza artifacts. The only step I did not include was the differential filtering of sequences by length and taxonomy (as outlined in the “Filtering sequences by length and taxonomy” section), because I wasn’t entirely sure if that would impact downstream AMF assignments. I understand the reasoning behind it — avoiding selection bias by preserving shorter Eukaryotic sequences — and I may revisit that step if needed.
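
For reference, the skipped step would look roughly like the sketch below. It assumes the `filter-seqs-length-by-taxon` action from the RESCRIPt tutorial. The minimum lengths (900/1200/1400) are believed to be the tutorial's values and should be checked against it, and all file names here are hypothetical placeholders (in the tutorial this step runs on the cleaned sequences before dereplication).

```shell
# Hypothetical file names: substitute the pre-dereplication artifacts you actually have.
SEQS_IN="silva-138.2-ssu-nr99-seqs-cleaned.qza"
TAX_IN="silva-138.2-ssu-nr99-tax.qza"
SEQS_KEPT="silva-138.2-ssu-nr99-seqs-filt.qza"
SEQS_DROPPED="silva-138.2-ssu-nr99-seqs-discard.qza"

if command -v qiime >/dev/null 2>&1; then
  # Apply a different minimum length per domain, so that shorter
  # eukaryotic 18S references are judged against their own threshold.
  qiime rescript filter-seqs-length-by-taxon \
    --i-sequences "$SEQS_IN" \
    --i-taxonomy "$TAX_IN" \
    --p-labels Archaea Bacteria Eukaryota \
    --p-min-lens 900 1200 1400 \
    --o-filtered-seqs "$SEQS_KEPT" \
    --o-discarded-seqs "$SEQS_DROPPED"
else
  echo "qiime not found: activate the QIIME 2 environment first"
fi
```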

Here’s what I’ve done so far:

Generated the reference files:
- silva-138.2-ssu-nr99-seqs-derep-uniq.qza
- silva-138.2-ssu-nr99-tax-derep-uniq.qza
Ran taxonomy assignment using the VSEARCH method (still running):

qiime feature-classifier classify-consensus-vsearch \
  --i-query amf_otu_sequences_97.qza \
  --i-reference-reads silva-138.2-ssu-nr99-seqs-derep-uniq.qza \
  --i-reference-taxonomy silva-138.2-ssu-nr99-tax-derep-uniq.qza \
  --p-perc-identity 0.90 \
  --p-maxaccepts 1 \
  --p-threads 4 \
  --o-classification amf_qiime2taxonomy.qza \
  --o-search-results search_qiime2results.qza

Trained a Naive Bayes classifier on the full-length reference sequences:

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads silva-138.2-ssu-nr99-seqs-derep-uniq.qza \
  --i-reference-taxonomy silva-138.2-ssu-nr99-tax-derep-uniq.qza \
  --o-classifier silva-138.2-ssu-nr99-classifier.qza

Then, I attempted taxonomy assignment using this classifier:

qiime feature-classifier classify-sklearn \
  --i-classifier silva-138.2-ssu-nr99-classifier.qza \
  --i-reads amf_otu_sequences_97.qza \
  --p-n-jobs 4 \
  --o-classification amf_qiime2_classifier_taxonomy.qza

However, this command failed immediately; no job was submitted or processed. I suspect the issue may be related to a mismatch between the classifier (trained on full-length sequences) and my OTU input (derived from the AMV4.5NF and AMDGR primers, which target a specific 18S region), but I’m not entirely sure.
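
Since the actual error message isn't captured above, one way to narrow this down before guessing at the cause is sketched below. It only assumes the standard QIIME 2 CLI (`qiime tools peek`, `qiime tools validate`, and the `--verbose` flag). The output name `amf_debug_taxonomy.qza` is a hypothetical placeholder so nothing existing is overwritten.

```shell
CLASSIFIER="silva-138.2-ssu-nr99-classifier.qza"
READS="amf_otu_sequences_97.qza"

if command -v qiime >/dev/null 2>&1; then
  # Confirm both artifacts exist, are readable, and have the expected types
  # (a TaxonomicClassifier and FeatureData[Sequence], respectively).
  qiime tools peek "$CLASSIFIER"
  qiime tools peek "$READS"
  qiime tools validate "$READS"

  # Rerun with --verbose so the full error is printed to the terminal;
  # a single job keeps the output easy to read.
  qiime feature-classifier classify-sklearn \
    --i-classifier "$CLASSIFIER" \
    --i-reads "$READS" \
    --p-n-jobs 1 \
    --o-classification amf_debug_taxonomy.qza \
    --verbose
else
  echo "qiime not found: activate the QIIME 2 environment first"
fi
```

Posting the resulting error text would make it much easier to tell whether the failure comes from the input, the classifier, or the environment.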

That brings me to my main question:
Would you recommend extracting reads using the primer pair AMV4.5NF (AAGCTCGTAGTTGAATTTCG) and AMDGR (CCCAACTATCCCTATTAATCAT) with qiime feature-classifier extract-reads, and then training an amplicon-specific classifier for this 18S region? Or is the full-length classifier still appropriate in this case?
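
If the amplicon-specific route turns out to be the better option, it might look like the sketch below. This is not a tested recipe: the primers are the ones given above (both written 5'->3'), the reference artifacts are the ones already generated, and the output file names are hypothetical. No length limits are set because the expected amplicon size isn't stated here.

```shell
F_PRIMER="AAGCTCGTAGTTGAATTTCG"    # AMV4.5NF
R_PRIMER="CCCAACTATCCCTATTAATCAT"  # AMDGR
REF_SEQS="silva-138.2-ssu-nr99-seqs-derep-uniq.qza"
REF_TAX="silva-138.2-ssu-nr99-tax-derep-uniq.qza"
# Hypothetical output names:
AMPLICON_SEQS="silva-138.2-ssu-nr99-seqs-AMV4.5NF-AMDGR.qza"
AMPLICON_CLF="silva-138.2-ssu-nr99-AMV4.5NF-AMDGR-classifier.qza"

if command -v qiime >/dev/null 2>&1; then
  # 1. Trim the reference sequences to the region amplified by the primer pair.
  qiime feature-classifier extract-reads \
    --i-sequences "$REF_SEQS" \
    --p-f-primer "$F_PRIMER" \
    --p-r-primer "$R_PRIMER" \
    --p-n-jobs 4 \
    --o-reads "$AMPLICON_SEQS" &&
  # 2. Train a Naive Bayes classifier on the extracted region only.
  qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads "$AMPLICON_SEQS" \
    --i-reference-taxonomy "$REF_TAX" \
    --o-classifier "$AMPLICON_CLF" &&
  # 3. Classify the AMF OTUs with the region-specific classifier.
  qiime feature-classifier classify-sklearn \
    --i-classifier "$AMPLICON_CLF" \
    --i-reads amf_otu_sequences_97.qza \
    --o-classification amf_amplicon_classifier_taxonomy.qza
else
  echo "qiime not found: activate the QIIME 2 environment first"
fi
```

Comparing these assignments against the VSEARCH results for the same OTUs would be a reasonable sanity check.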

Thanks again for all your help — the RESCRIPt resources and your input have been incredibly helpful.