Hi every one,
I experience a problem with classification of my sequences with the plugin feature-classifier classify-sklearn.
Indeed, I have amplicon data targeting the 16S V3-V4 region. First, raw data were treated with Dada2 :
qiime dada2 denoise-paired
--i-demultiplexed-seqs Metabarcoding_16S/QIIME/1-Import/paired-end-demux.qza
--p-trim-left-f 17
--p-trunc-len-f 280
--p-trim-left-r 20
--p-trunc-len-r 260
--p-max-ee 2
--p-trunc-q 2
--p-chimera-method consensus
--p-min-fold-parent-over-abundance 1.0
--p-n-threads 18
--p-n-reads-learn 1000000
--p-hashed-feature-ids
--o-representative-sequences Metabarcoding_16S/QIIME/2-Dada2/rep-seqs-dada2.qza
--o-table Metabarcoding_16S/QIIME/2-Dada2/table-dada2.qza
--o-denoising-stats Metabarcoding_16S/QIIME/2-Dada2/stats-dada2.qza
Then I performed taxonomic classification using this command lines :
1- Pre-training of the classifier for the V3-V4 region
qiime feature-classifier extract-reads
--i-sequences Databases/SILVA_132_QIIME_release/rep_set/rep_set_all/99/silva132_99.qza
--p-f-primer CCTACGGGNGGCWGCAG
--p-r-primer GACTACHVGGGTATCTAATCC
--o-reads Databases/SILVA_132_QIIME_release/rep_set/rep_set_all/99/silva132_99_V3-V4.qza
qiime feature-classifier fit-classifier-naive-bayes
--i-reference-reads Databases/SILVA_132_QIIME_release/rep_set/rep_set_all/99/silva132_99_V3-V4.qza
--i-reference-taxonomy Databases/SILVA_132_QIIME_release/taxonomy/taxonomy_all/99/taxonomy_7_levels.qza
--o-classifier Databases/SILVA_132_QIIME_release/silva-132-99-nb-classifier_V3-V4.qza
2- Taxonomic classification with the V3-V4 pre-trained classifier
qiime feature-classifier classify-sklearn
--i-classifier Databases/SILVA_132_QIIME_release/silva-132-99-nb-classifier_V3-V4.qza
--i-reads Metabarcoding_16S/QIIME/2-Dada2/rep-seqs-dada2.qza
--o-classification Metabarcoding_16S/QIIME/3-Classification/taxonomy_db_V3-V4.qza
--p-n-jobs 10
--p-reads-per-batch 100
--p-confidence 0.7
--verbose
I also tried with the pre-trained classifier with full-length sequences that I downloaded from the QIIME2 website.
qiime feature-classifier classify-sklearn
--i-classifier /Databases/SILVA_132_QIIME_release/silva-132-99-nb-classifier.qza
--i-reads Metabarcoding_16S/QIIME/2-Dada2/rep-seqs-dada2.qza
--o-classification Metabarcoding_16S/QIIME/3-Classification/taxonomy_db_full-length.qza
--p-n-jobs 10
--p-reads-per-batch 100
--p-confidence 0.7
--verbose
Then I plot taxonomic classification with the plugin :
qiime taxa barplot
--i-table Metabarcoding_16S/QIIME/2-Dada2/table-dada2.qza
--i-taxonomy Metabarcoding_16S/QIIME/3-Classification/taxonomy_db_{V3-V4 or full-lenth}.qza
--m-metadata-file QIIME/APAZ1to27_metadata.csv
--o-visualization Metabarcoding_16S/QIIME/3-Classification/taxa-bar-plots.qzv
For each mode of taxonomic classification (V3-V4 vs full-length databases) I get 2 taxa barplots very different.
With the V3-V4
With full-lenth
I was not expecting so much eukaryota or archaea in the 2 datasets and I decided to pick sequences assigned as Eukaryota with the full-length db and Archaea with the V3-V4 db. Finally in each case sequences are not at all Eukaryota or Archaea but Bacteria (checked by BLASTn vs NR database and SILVA sequence search).
Furthermore when I classify my sequences with qiime feature-classifier classify-consensus-vsearch, everything seems good.
qiime feature-classifier classify-consensus-vsearch
--i-query Metabarcoding_16S/QIIME/2-Dada2/rep-seqs-dada2.qza
--i-reference-reads Databases/SILVA_132_QIIME_release/99-otus.qza
--i-reference-taxonomyDatabases/SILVA_132_QIIME_release/7_level_taxonomy.qza
--p-maxaccepts 0
--p-perc-identity 0.8
--p-query-cov 0.8
--p-strand both
--p-min-consensus 0.51
--p-unassignable-label Unassigned
--p-threads 40
--o-classification Metabarcoding_16S/QIIME/3-Classification/taxonomy_db_full-length_vsearch_classif.qza
I tested this pipeline with qiime2-2018.8 and qiime2-2019.1 with the same results.
How can these results be explained? What is going wrong with my pipeline?
Thank you so much for your replies.