Hi,
I would like to create a classifier from the EzBioCloud database. I downloaded the sequence and taxonomy files, trained the classifier, and used it on my dataset (16S, V3-V4 region). Out of 4389 ASVs, 4348 were classified as:
Bacteria;Proteobacteria;Deltaproteobacteria;Desulfobacterales;Desulfobacteraceae;Desulfamplus;Desulfobacterium niacini
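A tally like the one above could be reproduced from an exported taxonomy table roughly as follows. This is only a sketch: it assumes the classification result was exported with `qiime tools export` to a tab-separated `taxonomy.tsv` with a single header line and the taxon string in the second column. The function name `tally_taxa` and the example path are placeholders, not part of my actual workflow.

```shell
# tally_taxa FILE: count how many features were assigned to each taxon string.
# Assumption: FILE is tab-separated, has one header line, and holds the taxon
# string in column 2 (the layout expected for an exported taxonomy.tsv).
tally_taxa() {
    tail -n +2 "$1" | cut -f2 | sort | uniq -c | sort -rn
}

# Example call (hypothetical path):
# tally_taxa exported/taxonomy.tsv | head
```

The most frequent taxon is printed first, so a single dominating assignment shows up on the first output line.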
For human stool samples this is certainly not what the result should look like. With the Silva 138 classifier, built following the RESCRIPt tutorial, the results for the same data look completely normal.
I also downloaded the rep-seqs from https://data.qiime2.org/2022.2/tutorials/training-feature-classifiers/rep-seqs.qza and with these sequences the EzBioCloud classifier worked fine (results comparable to Silva 138). For both databases, the V3-V4 region was extracted using the same primer sequences.
I also took a small subset of reads from ref-seqs_EZ_V3V4.qza (after V3-V4 extraction) and compared them to the ASV sequences in my dataset; the starts and ends align well:
Start:
End:
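A comparison of sequence starts and ends like this could be scripted along these lines. It is a minimal sketch, assuming both the extracted reference reads and the ASVs have first been exported to FASTA (for example with `qiime tools export`, which should write a `dna-sequences.fasta`). The function name `seq_ends` and the example paths are hypothetical.

```shell
# seq_ends FASTA [N]: print the ID plus the first N and last N bases of each
# record, tab-separated (default N=20). Sequences wrapped over several lines
# are joined before slicing.
seq_ends() {
    awk -v n="${2:-20}" '
        function emit() {
            if (id != "")
                print id "\t" substr(seq, 1, n) "\t" substr(seq, length(seq) - n + 1)
        }
        /^>/ { emit(); id = substr($1, 2); seq = ""; next }
        { seq = seq $0 }
        END { emit() }
    ' "$1"
}

# Example calls (hypothetical paths):
# seq_ends exported_ref/dna-sequences.fasta 20 | head
# seq_ends exported_asv/dna-sequences.fasta 20 | head
```

Putting the two outputs side by side makes it easy to see whether both sets begin and end at the same positions and are in the same orientation.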
QIIME 2 version: 2021.8
Commands to make the classifier:
qiime tools import \
--type 'FeatureData[Sequence]' \
--input-path ezbiocloud_qiime_full.fasta \
--output-path ezbio.qza
qiime tools import \
--type 'FeatureData[Taxonomy]' \
--input-format HeaderlessTSVTaxonomyFormat \
--input-path ezbiocloud_id_taxonomy.txt \
--output-path ref-taxonomy.qza
qiime feature-classifier extract-reads \
--i-sequences ezbio.qza \
--p-f-primer CCTACGGGNGGCWGCAG \
--p-r-primer GACTACHVGGGTATCTAATCC \
--o-reads ref-seqs_EZ_V3V4.qza
qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads ref-seqs_EZ_V3V4.qza \
--i-reference-taxonomy ref-taxonomy.qza \
--o-classifier classifier_EZ_V3V4.qza
Do you have any idea why this EzBioCloud classifier does not work with my dataset? It works with the test rep-seqs (which I guess are V4 sequences), and my dataset itself looks fine and gives normal results with Silva 138.
Thank you in advance!!
Best,
Rahel