Training classifier for taxonomic analysis

Hi everyone! :grinning:

I tried to train a classifier on the Greengenes 13_8 99% OTU dataset for the V3-V4 region of the 16S rRNA gene, following the tutorial "Training feature classifiers with q2-feature-classifier".
For the "extract-reads" step I ran:
qiime feature-classifier extract-reads \
  --i-sequences 99_otus.qza \
  --p-min-length 400 \
  --p-max-length 500 \
  --o-reads ref-seqs.qza

After that, I trained the Naive Bayes classifier.
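For reference, the two steps described above can be sketched as follows. The command as posted shows no primer options, although `extract-reads` takes the amplicon primers through `--p-f-primer` and `--p-r-primer`. The primer sequences below (341F/805R, a common V3-V4 pair) and the file names `ref-taxonomy.qza` and `classifier.qza` are assumptions for illustration, not taken from the post; the primers should be replaced with the ones actually used for sequencing. The small `run` helper is also hypothetical: it only prints each command unless `DRY_RUN=0` is set, so the commands can be checked before running them.

```shell
# Hypothetical dry-run helper: prints the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# ASSUMPTION: 341F/805R primers for V3-V4; replace with your own primers.
F_PRIMER="CCTACGGGNGGCWGCAG"
R_PRIMER="GACTACHVGGGTATCTAATCC"

# Trim the reference sequences to the amplified region.
run qiime feature-classifier extract-reads \
  --i-sequences 99_otus.qza \
  --p-f-primer "$F_PRIMER" \
  --p-r-primer "$R_PRIMER" \
  --p-min-length 400 \
  --p-max-length 500 \
  --o-reads ref-seqs.qza

# Train the Naive Bayes classifier on the extracted reads.
# ASSUMPTION: ref-taxonomy.qza is the imported Greengenes 13_8 99% taxonomy.
run qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads ref-seqs.qza \
  --i-reference-taxonomy ref-taxonomy.qza \
  --o-classifier classifier.qza
```

Running the script with `DRY_RUN=0` executes the commands instead of printing them.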

Unfortunately, the "taxonomy.qzv" file (attached) already shows that many features are classified only to the domain level.

I then ran the taxa barplot command on my filtered table, and the results were not great. Furthermore, ANCOM found no statistically significant differences in abundance at taxonomic level 6 (genus).

For reference, I attach all the files:
bsf-l6-ancom-treatment.qzv (420.0 KB)
taxa-bar-plots-bsf.qzv (356.2 KB)
taxonomy.qzv (1.3 MB)

Some background on my work: I'm analysing the intestinal microbiome of an insect species raised on either a control diet or a treated diet, to understand whether the bacterial communities differ.

Do you have any suggestions for improving the analysis? The alpha and beta diversity analyses were all significant.

Thank you in advance :pray:


Hello Linda,

This sounds like a great project!
:beetle: :green_salad: :microbe:

Have you tried using Greengenes2 with non-v4-16s, as shown in the main tutorial?

EDIT: This does not make ASVs from your data like other pipelines do. Instead, it directly compares your reads against the GG2 database entries, which already have a taxonomy.

Simply counting database hits is also known as closed-reference clustering.
Its tradeoffs are very different from those of denoising -> taxonomy inference.

While GG2 should work well with a Naive Bayes classifier, using the GG2 plugin
qiime greengenes2 non-v4-16s
is a great place to start and provides a point of comparison.
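A minimal sketch of that starting point, based on the q2-greengenes2 documentation as I understand it. The input names `table.qza` and `rep-seqs.qza` and all output names are placeholders, and the two `2022.10.*` reference files are assumed to have been downloaded from the Greengenes2 release beforehand. The hypothetical `run` helper prints each command unless `DRY_RUN=0` is set.

```shell
# Hypothetical dry-run helper: prints the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# ASSUMPTION: placeholder names for your own feature table and sequences.
TABLE="table.qza"
SEQS="rep-seqs.qza"

# Map the sequences onto the GG2 full-length backbone.
run qiime greengenes2 non-v4-16s \
  --i-table "$TABLE" \
  --i-sequences "$SEQS" \
  --i-backbone 2022.10.backbone.full-length.fna.qza \
  --o-mapped-table gg2-table.qza \
  --o-representatives gg2-rep-seqs.qza

# Look up the taxonomy for the features in the mapped table.
run qiime greengenes2 taxonomy-from-table \
  --i-reference-taxonomy 2022.10.taxonomy.asv.nwk.qza \
  --i-table gg2-table.qza \
  --o-classification gg2-taxonomy.qza
```

The resulting `gg2-table.qza` and `gg2-taxonomy.qza` can then be passed to the taxa barplot step in place of the Naive Bayes results, which gives the point of comparison.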

Keep us posted!


Thank you so much, Colin! I will try your suggestion :grin: and keep you posted.
