Hi @wasade,
I'm back with another question.
I am trying to run the following commands after DADA2 (16S V3-V4 region, Illumina 2x300, 341F/785R):
qiime rescript orient-seqs --i-sequences rep-denoise-trimmed-seqs.qza --i-reference-sequences 2022.10.backbone.full-length.fna.qza --o-oriented-seqs oriented-rep-denoise-trimmed-seqs.qza --o-unmatched-seqs unmatched-rep-denoise-trimmed-seqs.qza --p-threads 36
qiime feature-table merge-seqs --i-data oriented-rep-denoise-trimmed-seqs.qza --i-data unmatched-rep-denoise-trimmed-seqs.qza --o-merged-data rescript-rep-denoise-trimmed-seqs.qza
qiime feature-classifier classify-sklearn --i-reads rescript-rep-denoise-trimmed-seqs.qza --i-classifier 2022.10.backbone.full-length.nb.qza --o-classification sklrean-rescript-rep-denoise-trimmed-seqs.tax.qza --p-n-jobs 10
The last command (classify-sklearn) fails with this error:
Plugin error from feature-classifier:
The scikit-learn version (0.24.1) used to generate this artifact does not match the current version of scikit-learn installed (1.4.2). Please retrain your classifier for your current deployment to prevent data-corruption errors.
Debug info has been saved to /tmp/qiime2-q2cli-err-7zxl3qzv.log
qiime2-q2cli-err-7zxl3qzv.txt (1.9 KB)
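In case it helps with diagnosis, here is a minimal sketch of how the mismatch reported above could be confirmed from the shell. It assumes `python3` in the active QIIME 2 environment can import scikit-learn. `versions_match` is a hypothetical helper name, and `0.24.1` is copied by hand from the error message rather than read from the artifact.

```shell
# Version recorded in the classifier artifact, copied from the error message (assumption: typed by hand).
artifact_sklearn="0.24.1"

# Installed version; falls back to "unknown" if scikit-learn cannot be imported.
installed_sklearn=$(python3 -c 'import sklearn; print(sklearn.__version__)' 2>/dev/null || echo "unknown")

# Hypothetical helper: succeeds only when both version strings are identical.
versions_match() {
  [ "$1" = "$2" ]
}

if versions_match "$artifact_sklearn" "$installed_sklearn"; then
  echo "classifier and environment agree"
else
  echo "mismatch: artifact=$artifact_sklearn installed=$installed_sklearn"
fi
```

This only compares the two version strings; it does not inspect or repair the classifier itself.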
What could I do to remedy this?
Next, I tried the Greengenes2 plugin as shown below, and it seems to be working fine.
qiime greengenes2 non-v4-16s --i-table table-denoise-trimmed-seqs.qza --i-sequences rescript-rep-denoise-trimmed-seqs.qza --i-backbone 2022.10.backbone.full-length.fna.qza --o-mapped-table icu.gg2.biom.qza --o-representatives icu.gg2.fna.qza
qiime greengenes2 taxonomy-from-table --i-reference-taxonomy 2022.10.taxonomy.asv.nwk.qza --i-table icu.gg2.biom.qza --o-classification icu.gg2.taxonomy.qza
qiime metadata tabulate --m-input-file icu.gg2.taxonomy.qza --m-input-file icu.gg2.fna.qza --o-visualization gg2-before-filter-seqs.tax.qzv
qiime taxa barplot --i-table icu.gg2.biom.qza --i-taxonomy icu.gg2.taxonomy.qza --m-metadata-file 16S-seqs-metadata.tsv --o-visualization gg2-before-filter-vis-bar.qzv
qiime phylogeny align-to-tree-mafft-fasttree --i-sequences icu.gg2.fna.qza --o-alignment gg2-aligned-rep-trimmed-seqs.qza --o-masked-alignment gg2-masked-aligned-rep-trimmed-seqs.qza --o-tree gg2-unrooted-trimmed-tree.qza --o-rooted-tree gg2-rooted-trimmed-tree.qza
qiime diversity core-metrics-phylogenetic --i-phylogeny gg2-rooted-trimmed-tree.qza --i-table icu.gg2.biom.qza --m-metadata-file 16S-seqs-metadata.tsv --output-dir core-metrics-results
However, I noticed two differences compared with SILVA:
- Greengenes2 classified all of the sequences as Bacteria, whereas SILVA also assigned a very small percentage to Unassigned and Eukaryota.
- With SILVA, each sample still had a much higher number of reads, even after filtering out non-bacterial sequences (see the attachment below).
SILVA vs. Greengenes2.txt (6.5 KB)
Do you see any problems with the Greengenes2 plugin commands I'm using, or does everything look fine to you?
Best,
D_S