Hi @SoilRotifer
Thanks so much for the guidance!
Yes, I agree that using qiime rescript get-silva-data is simpler for standard taxonomy-only workflows. However, in my case, I do need the sequence data to assign taxonomy to my experimental sequences (AMF OTUs), so I followed the full workflow @Nicholas_Bokulich previously recommended here:
Processing, filtering, and evaluating SILVA data with RESCRIPt
I’ve successfully formatted the SILVA 138.2 database into QIIME 2 .qza artifacts. The only step I did not include was the differential filtering of sequences by length and taxonomy (as outlined in the “Filtering sequences by length and taxonomy” section), because I wasn’t sure whether it would affect downstream AMF assignments. I understand the reasoning behind it (avoiding selection bias by preserving shorter eukaryotic sequences), and I may revisit that step if needed.
Here’s what I’ve done so far:
- Generated the reference files:
silva-138.2-ssu-nr99-seqs-derep-uniq.qza
silva-138.2-ssu-nr99-tax-derep-uniq.qza
- Started taxonomy assignment using the VSEARCH method (still running):
qiime feature-classifier classify-consensus-vsearch \
--i-query amf_otu_sequences_97.qza \
--i-reference-reads silva-138.2-ssu-nr99-seqs-derep-uniq.qza \
--i-reference-taxonomy silva-138.2-ssu-nr99-tax-derep-uniq.qza \
--p-perc-identity 0.90 \
--p-maxaccepts 1 \
--p-threads 4 \
--o-classification amf_qiime2taxonomy.qza \
--o-search-results search_qiime2results.qza
- Trained a Naive Bayes classifier on the full-length reference sequences:
qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads silva-138.2-ssu-nr99-seqs-derep-uniq.qza \
--i-reference-taxonomy silva-138.2-ssu-nr99-tax-derep-uniq.qza \
--o-classifier silva-138.2-ssu-nr99-classifier.qza
- Then attempted taxonomy assignment using this classifier:
qiime feature-classifier classify-sklearn \
--i-classifier silva-138.2-ssu-nr99-classifier.qza \
--i-reads amf_otu_sequences_97.qza \
--p-n-jobs 4 \
--o-classification amf_qiime2_classifier_taxonomy.qza
However, this command failed immediately: no job was submitted or processed. I suspect the issue may be related to a mismatch between the classifier (trained on full-length sequences) and my OTU input (derived from the AMV4.5NF and AMDGR primers, which target a specific 18S region), but I’m not sure.
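To pin down the actual cause, one thing I could try is rerunning the same classification with --verbose so the underlying error gets printed to the terminal. This is only a sketch using the same filenames as above. Dropping to a single job is my own guess at lowering resource use, not a confirmed fix, and the `command -v` check is there only so the snippet does nothing outside a QIIME 2 environment.

```shell
# Same inputs as the failed classify-sklearn run above
CLASSIFIER="silva-138.2-ssu-nr99-classifier.qza"
READS="amf_otu_sequences_97.qza"

if command -v qiime >/dev/null 2>&1; then
  # --verbose shows the action's stdout/stderr instead of hiding it
  # --p-n-jobs 1 is an assumption (fewer parallel jobs), not a known fix
  qiime feature-classifier classify-sklearn \
    --i-classifier "$CLASSIFIER" \
    --i-reads "$READS" \
    --p-n-jobs 1 \
    --o-classification amf_qiime2_classifier_taxonomy.qza \
    --verbose
else
  echo "qiime not found on PATH; activate the QIIME 2 environment first" >&2
fi
```

I can post the full error output here if that would help.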
That brings me to my main question:
Would you recommend extracting reads using the primer pair AMV4.5NF (AAGCTCGTAGTTGAATTTCG) and AMDGR (CCCAACTATCCCTATTAATCAT) with qiime feature-classifier extract-reads, and then training an amplicon-specific classifier for this 18S region? Or is the full-length classifier still appropriate in this case?
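For concreteness, the amplicon-specific route I have in mind would look roughly like the sketch below. The primers and reference artifacts are the ones from this post. The output prefix is a placeholder I made up, and I have left the extract-reads length settings at their defaults because I don't know what bounds are sensible for this region. The re-dereplication step is my reading of the RESCRIPt tutorial, so please correct me if it isn't needed here.

```shell
# Primer sequences as given in this post (5'->3')
F_PRIMER="AAGCTCGTAGTTGAATTTCG"     # AMV4.5NF
R_PRIMER="CCCAACTATCCCTATTAATCAT"   # AMDGR

# Reference artifacts generated earlier in this post
REF_SEQS="silva-138.2-ssu-nr99-seqs-derep-uniq.qza"
REF_TAX="silva-138.2-ssu-nr99-tax-derep-uniq.qza"

# Hypothetical output prefix (placeholder, not part of the workflow above)
OUT="silva-138.2-ssu-nr99-amf"

if command -v qiime >/dev/null 2>&1; then
  # 1. Trim the reference sequences to the region bounded by the primers
  qiime feature-classifier extract-reads \
    --i-sequences "$REF_SEQS" \
    --p-f-primer "$F_PRIMER" \
    --p-r-primer "$R_PRIMER" \
    --p-n-jobs 4 \
    --o-reads "${OUT}-seqs.qza"

  # 2. Dereplicate again, since distinct full-length sequences
  #    can become identical once cut down to the amplicon
  qiime rescript dereplicate \
    --i-sequences "${OUT}-seqs.qza" \
    --i-taxa "$REF_TAX" \
    --p-mode 'uniq' \
    --o-dereplicated-sequences "${OUT}-seqs-uniq.qza" \
    --o-dereplicated-taxa "${OUT}-tax-uniq.qza"

  # 3. Train the Naive Bayes classifier on the amplicon-only reference
  qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads "${OUT}-seqs-uniq.qza" \
    --i-reference-taxonomy "${OUT}-tax-uniq.qza" \
    --o-classifier "${OUT}-classifier.qza"
else
  echo "qiime not found on PATH; activate the QIIME 2 environment first" >&2
fi
```

The resulting classifier would then replace silva-138.2-ssu-nr99-classifier.qza in the classify-sklearn command above.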
Thanks again for all your help. The RESCRIPt resources and your input have been incredibly useful.