poor classification using qiime2

MichelaRiba · March 29, 2023, 10:44am

Good morning,

I am having difficulty getting good results even though my pipeline has not changed.
Specifically, the classification I obtain is poor: half of the sequences (from an already very low number of OTUs, e.g. 900) are assigned only to Bacteria or OD1. So I think this is not a great result.

I include my commands

taxa_classi:
	$(CONDA_ACTIVATE) Miqiime2-2021.8; \
	qiime feature-classifier classify-sklearn \
	--i-classifier gg-13-8-99-nb-classifier.qza \
	--i-reads rep-seqs-or-85.qza \
	--o-classification taxonomy10C.qza

joined_import_filter_derep:
	export HDF5_USE_FILE_LOCKING='FALSE'; \
	$(CONDA_ACTIVATE) Miqiime2-2021.8; \
	qiime vsearch dereplicate-sequences \
	--i-sequences fil_joined.qza \
	--o-dereplicated-table table.qza \
	--o-dereplicated-sequences rep-seqs.qza

Could you please help me?

Thanks a lot

I should note that I have not yet checked whether the sequencing primers have changed.

Michela

I would very much appreciate your kind help.

crusher083 · March 29, 2023, 11:07am

Hello Michela,

The information about the primers is crucial. A change in primers is the most likely explanation for the poor performance of the classifier.

Cheers
Valentyn

MichelaRiba · March 29, 2023, 11:46am

Hi Valentyn, I will be back with information about the primers. I would certainly need advice on which classifier would be the best fit.

Thanks a lot for your support

Michela

crusher083 · March 29, 2023, 12:00pm

After you obtain primer sequences you can refer to a tutorial on building a reference database here:

Cheers
Valentyn

MichelaRiba · March 29, 2023, 12:13pm

Thanks a lot!
Could you please remind me which primers are compatible with this classifier database?
gg-13-8-99-nb-classifier.qza

This would help me a great deal to work out why classification has suddenly stopped working on data from the same facility.

I would appreciate it very much

Michela

colinbrislawn · March 29, 2023, 10:29pm

More details are available on the data resources page.

Naive Bayes classifiers trained on:

Greengenes 13_8 99% OTUs full-length sequences (MD5: 6bbc9b3f2f9b51d663063a7979dd95f1)

Greengenes 13_8 99% OTUs from 515F/806R region of sequences (MD5: 9e82e8969303b3a86ac941ceafeeac86)

gg-13-8-99-nb-classifier.qza is the first one. No primers were used to select a region, so the full-length 16S sequence is used for k-mer profiling and classification.
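
If you want to confirm which of the two files you actually have on disk, one option is to compare its MD5 with the values listed above. This is only a minimal sketch: it assumes GNU md5sum is installed, check_md5 is a hypothetical helper name, and the checksum will only match the file offered on the data resources page for that particular release.

```shell
#!/usr/bin/env bash
# check_md5 FILE EXPECTED_MD5
# Prints "match" and returns 0 if the MD5 of FILE equals EXPECTED_MD5,
# otherwise prints "mismatch" and returns 1.
# Assumes md5sum (GNU coreutils) is available; check_md5 is a hypothetical helper.
check_md5() {
    local file="$1" expected="$2" actual
    actual="$(md5sum "$file" | awk '{print $1}')"
    if [ "$actual" = "$expected" ]; then
        echo "match"
    else
        echo "mismatch"
        return 1
    fi
}

# Example with the file name from this thread and the full-length MD5 listed above:
# check_md5 gg-13-8-99-nb-classifier.qza 6bbc9b3f2f9b51d663063a7979dd95f1
```

A mismatch does not necessarily mean the file is corrupted; it may simply come from a different QIIME 2 release.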

Using RESCRIPt to build a database for just your region of interest should perform better because it's more specific.
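
As a rough sketch of what a region-specific classifier could look like, here is the q2-feature-classifier route (extract-reads followed by fit-classifier-naive-bayes) rather than RESCRIPt itself. Everything specific here is an assumption: ref-seqs.qza and ref-taxonomy.qza are hypothetical names for already imported Greengenes reference artifacts, the output names are made up, and the 515F/806R primers are only placeholders until the real primers are known. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: trim reference sequences to the amplified region, then train a
# Naive Bayes classifier on the trimmed reads.
# Assumptions: ref-seqs.qza and ref-taxonomy.qza are hypothetical names for
# imported reference artifacts; replace the primers with the ones actually used.

F_PRIMER="GTGCCAGCMGCCGCGGTAA"    # 515F (placeholder)
R_PRIMER="GGACTACHVGGGTWTCTAAT"   # 806R (placeholder)

# Print the command when DRY_RUN=1 (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

train_region_classifier() {
    # Step 1: keep only the region flanked by the primers.
    run qiime feature-classifier extract-reads \
        --i-sequences ref-seqs.qza \
        --p-f-primer "$F_PRIMER" \
        --p-r-primer "$R_PRIMER" \
        --o-reads ref-seqs-region.qza
    # Step 2: train the classifier on the extracted region.
    run qiime feature-classifier fit-classifier-naive-bayes \
        --i-reference-reads ref-seqs-region.qza \
        --i-reference-taxonomy ref-taxonomy.qza \
        --o-classifier region-classifier.qza
}

train_region_classifier
```

Setting DRY_RUN=0 inside an activated QIIME 2 environment would run the two steps for real; the resulting region-classifier.qza could then be passed to classify-sklearn via --i-classifier in place of the full-length classifier.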