You could simply download the premade SILVA sequence and taxonomy files from the Data resources page and filter the sequences with qiime taxa filter-seqs ..., like so:
To provide only eukaryotic outgroup taxa and no SILVA nematode sequences, run the command below. It removes all bacterial and archaeal sequences, and also the Nematoda, since we do not want SILVA nematode records to pollute our new NemaBase database. Alternatively, you can remove only the Nematoda and leave the Archaea and Bacteria in place as decoys / outgroups. You may have to experiment to see which works best.
qiime taxa filter-seqs \
--i-sequences silva_sequences.qza \
--i-taxonomy silva_taxonomy.qza \
--p-exclude Nematoda,Bacteria,Archaea \
--o-filtered-sequences silva_euk_outgroup_seqs.qza
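Before filtering, it can help to see roughly how many SILVA records each exclusion term would hit. The sketch below is only an approximation of the substring ("contains") matching that --p-exclude uses by default: it counts lines in an exported taxonomy TSV with a plain, case-sensitive grep. The helper name count_matches and the export directory silva_taxonomy_export are assumptions made for illustration, not part of QIIME 2.

```shell
# Export the taxonomy first (needs QIIME 2; the output directory name is an assumption):
#   qiime tools export --input-path silva_taxonomy.qza --output-path silva_taxonomy_export

# Hypothetical helper: count taxonomy rows containing each search term.
# Usage: count_matches taxonomy.tsv TERM [TERM ...]
count_matches() {
    tsv="$1"
    shift
    for term in "$@"; do
        # Skip the header row, keep the Taxon column, count lines with a fixed-string match.
        # "|| true" keeps a zero-match grep from aborting scripts run with "set -e".
        n=$(tail -n +2 "$tsv" | cut -f2 | grep -c -F -- "$term" || true)
        printf '%s\t%s\n' "$term" "$n"
    done
}
```

For example, count_matches silva_taxonomy_export/taxonomy.tsv Nematoda Bacteria Archaea prints one tab-separated line per term. A count of zero for a term suggests a typo or a different spelling in the taxonomy strings.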
Now that we have our filtered SILVA sequences, we'll want to drop the taxonomy entries for the sequences we removed, so that the taxonomy file matches the sequence file. So, we can run:
qiime rescript filter-taxa \
--i-taxonomy silva_taxonomy.qza \
--m-ids-to-keep-file silva_euk_outgroup_seqs.qza \
--o-filtered-taxonomy silva_euk_outgroup_taxonomy.qza
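To confirm that the filtered sequences and filtered taxonomy now cover the same IDs, one option is to export both artifacts and compare their ID lists. This is a minimal sketch, assuming the default export file names dna-sequences.fasta and taxonomy.tsv; the helper name compare_ids and the export directory names are hypothetical.

```shell
# Export both artifacts first (needs QIIME 2; the output directory names are assumptions):
#   qiime tools export --input-path silva_euk_outgroup_seqs.qza --output-path seqs_export
#   qiime tools export --input-path silva_euk_outgroup_taxonomy.qza --output-path tax_export

# Hypothetical helper: print IDs found in only one of the two files.
# Column 1 = only in the FASTA, column 2 (tab-indented) = only in the taxonomy.
# Usage: compare_ids dna-sequences.fasta taxonomy.tsv
compare_ids() {
    seq_ids=$(mktemp)
    tax_ids=$(mktemp)
    # FASTA IDs: header lines without '>' and without any trailing description.
    grep '^>' "$1" | sed 's/^>//' | awk '{print $1}' | LC_ALL=C sort -u > "$seq_ids"
    # Taxonomy IDs: first column, header row skipped.
    tail -n +2 "$2" | cut -f1 | LC_ALL=C sort -u > "$tax_ids"
    # comm -3 suppresses the lines common to both files.
    LC_ALL=C comm -3 "$seq_ids" "$tax_ids"
    rm -f "$seq_ids" "$tax_ids"
}
```

Running compare_ids seqs_export/dna-sequences.fasta tax_export/taxonomy.tsv should print nothing if the two files are in sync.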
Now we have two files that we can merge with our NemaBase files, silva_euk_outgroup_seqs.qza and silva_euk_outgroup_taxonomy.qza, assuming you've been able to import the NemaBase files.
Now we can merge:
qiime feature-table merge-seqs \
--i-data silva_euk_outgroup_seqs.qza nemabase_seqs.qza \
--o-merged-data nemabase_w_silva_euk_outgroup_seqs.qza
qiime feature-table merge-taxa \
--i-data silva_euk_outgroup_taxonomy.qza nemabase_taxonomy.qza \
--o-merged-data nemabase_w_silva_euk_outgroup_taxonomy.qza
Assuming both taxonomies use similar taxonomic ranks (d__, p__, c__, o__, f__, g__, s__), you should now be able to train your classifier. If needed, you can use qiime rescript edit-taxonomy ... to fix things.
qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads nemabase_w_silva_euk_outgroup_seqs.qza \
--i-reference-taxonomy nemabase_w_silva_euk_outgroup_taxonomy.qza \
--o-classifier nemabase_w_silva_euk_outgroup_classifier.qza
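If the classifier behaves oddly, the rank-prefix assumption above is worth checking on the merged taxonomy. The sketch below summarises which rank-prefix patterns occur in an exported taxonomy TSV and how often; more than one pattern (for example some rows starting with k__ and others with d__) suggests the SILVA and NemaBase labels need harmonising. The helper name rank_patterns and the export directory name are assumptions for illustration.

```shell
# Export the merged taxonomy first (needs QIIME 2; the output directory name is an assumption):
#   qiime tools export --input-path nemabase_w_silva_euk_outgroup_taxonomy.qza \
#       --output-path merged_tax_export

# Hypothetical helper: list each distinct rank-prefix pattern with its row count.
# Usage: rank_patterns taxonomy.tsv
rank_patterns() {
    tail -n +2 "$1" | cut -f2 | awk -F';' '{
        sig = ""
        for (i = 1; i <= NF; i++) {
            r = $i
            gsub(/^[ \t]+/, "", r)   # drop leading whitespace after each ";"
            sub(/__.*/, "", r)       # keep only the prefix before "__"
            sig = sig (i > 1 ? "," : "") r
        }
        n[sig]++
    }
    END { for (s in n) print s "\t" n[s] }' | LC_ALL=C sort
}
```

Each output line is a comma-separated pattern such as d,p,c,o,f,g,s followed by a tab and the number of rows that use it.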
This should help you get started.