Taxonomy assignment gives different results after re-training

Hi,

Previously I trained a classifier on a custom plant ITS2 database (using QIIME 2 2019.4, fit-classifier-naive-bayes), and the taxonomy assignment looked pretty accurate at the species level on the positive controls. However, yesterday I installed q2-clawback and got some warnings about an incompatible scikit-learn version and about the feature classifier's external joblib being deprecated. When I re-trained the classifier on this database and re-ran it on the reads, the taxonomy assignment was off at the species level on those positive controls. I have removed QIIME 2 2019.4 and reinstalled it, but I have not been able to reproduce the original result.

Not sure what is going on?

Thanks

Which results can you not reproduce? Your original results, or the “off the species level on those positive controls” results?

The original results.

I am attaching the qzv files here; the provenance might help.

The original run used QIIME 2 2019.4 when it had just been released.
Yesterday's run used the current QIIME 2 2019.4 install.

I think most of the package versions are the same though.

Positive controls are CV (Cirsium vulgare), SO (Sidalcea oregana), VC (Vicia cracca), and TM (Thermopsis montana).

original : original.qzv (757.5 KB)

yesterday: barplot_beepollen1.qzv (662.8 KB)

Thanks,

Awesome, great idea!

You appear to have used two different reference databases between the two runs (according to the provenance capture):

Original

Reference taxonomy md5sum: 7a6a191f859d769b73b26f13cf39146f
Reference reads md5sum: 564023273711720746140ddf7ab04bdb

Redo

Reference taxonomy md5sum: dc6759734fe261d55389c949c24b2a09
Reference reads md5sum: f2d0863e291be7105f6d44dd6e0ab6fc

The original taxonomy is 8 levels deep, while the redo is only 7 levels deep.
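If you want to double-check outside of QIIME 2 whether two reference files are really identical, comparing checksums is enough. Here is a minimal sketch, assuming the md5sums shown in provenance were computed over the imported files themselves; the function names and the file names in the usage comment are hypothetical:

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True only if both files are byte-for-byte identical (by MD5)."""
    return md5_of_file(path_a) == md5_of_file(path_b)


# Hypothetical usage:
# same_content("taxonomy_original.tsv", "taxonomy_redo.tsv")
```

Any edit to the file, even a single removed character per line, changes the checksum.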

Hope that helps!

Thank you for the reply. They are actually the same in content; only the format differs. In the redo one, I removed the trailing semicolon from the original one (before: …;s__xxx xxxx; now: …;s__xxx xxx). Would that affect the assignment?

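Hi @Xio_Lee, yes, removing one semicolon will remove an entire taxonomic level from the database. The trailing semicolon counts as a separator, so the original strings end with an extra (empty) eighth level that the redo strings lack.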
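To see why, here is a rough sketch of how a parser that splits taxonomy strings on semicolons would count levels. This is only an illustration, not the actual QIIME 2 code, and the lineage below is a made-up example rather than a line from your database:

```python
def taxonomy_levels(taxon):
    """Split a semicolon-delimited taxonomy string into its levels."""
    return [level.strip() for level in taxon.split(";")]


# Illustrative lineage only (an assumption, not taken from the real database).
without_trailing = (
    "k__Viridiplantae;p__Streptophyta;c__Magnoliopsida;"
    "o__Asterales;f__Asteraceae;g__Cirsium;s__Cirsium vulgare"
)
with_trailing = without_trailing + ";"

# The trailing semicolon produces one extra, empty level at the end.
print(len(taxonomy_levels(with_trailing)))     # 8
print(len(taxonomy_levels(without_trailing)))  # 7
print(repr(taxonomy_levels(with_trailing)[-1]))  # ''
```

So the two files describe the same lineages, but a tool that works with the number of levels treats them as different taxonomies.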
