Train a sequence classifier on your own data

chupperts · June 27, 2025, 9:17am

Dear all,

I’m currently trying to understand the process of building and using a classifier, but I haven’t been able to find clear answers in the tutorials or forums.

To make things clearer, I’ll briefly describe our project:

We aim to analyze the microbiome of piglets at three stages: weaning (4 weeks), post-weaning (8 weeks), and adulthood (6 months).

I already trained a classifier with files from the Greengenes release (2022) for a previous experiment, though I'm not sure I did it the best way. Based on this, I want to prepare a new classifier using the following files from the Greengenes2 (2024) release: 2024.09.seqs.fna.qza and 2024.09.taxonomy.id.tsv.qza. With this classifier, I will be able to assign taxonomy to my data.

Here are my questions:

1. From this point, is it possible to train a new classifier using my own sequences and taxonomy files, based on the taxonomy already assigned?
2. Would it be better to create separate classifiers for each piglet age group (e.g., one for young piglets and another for adults), or is a single classifier sufficient for all age groups?

Thank you in advance for your help.

timanix · June 27, 2025, 10:29am

Hello!

If you are asking whether it is technically possible, then yes, you can train a classifier on your own sequences and taxonomy.
However, I would not recommend this approach, as it limits the reference database to the taxa already annotated in your previous community. Taxa that were absent from that community cannot be assigned correctly, so you can lose a lot of information!

A single classifier for all age groups would be better.

Training a new classifier on the Greengenes2 files you listed looks like the best option to me!

Check out the RESCRIPt tutorials on how to train a classifier (though you will need to adapt the commands for the pre-downloaded GG2 sequences).

Or check whether the GG2 repo and this post offer any insights.
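
To make the adaptation a bit more concrete, here is a minimal sketch (not a tested pipeline) of how the pre-downloaded GG2 files could be passed to the standard q2-feature-classifier commands. I wrapped them in Python so the commands can be printed and checked before anything runs. The two reference file names are the ones from your post; the classifier name, rep-seqs.qza, and taxonomy.qza are placeholders I made up. It assumes an activated QIIME 2 environment with `qiime` on the PATH.

```python
import os
import shutil
import subprocess

# Reference files named in the original post (Greengenes2 2024.09 release).
SEQS = "2024.09.seqs.fna.qza"
TAXONOMY = "2024.09.taxonomy.id.tsv.qza"

# Hypothetical names, chosen only for this illustration.
CLASSIFIER = "gg2-2024.09-nb-classifier.qza"
REP_SEQS = "rep-seqs.qza"          # your own representative sequences
ASSIGNED = "taxonomy.qza"          # output of the classification step


def fit_classifier_cmd(seqs, taxonomy, classifier):
    """Build the CLI call that trains a naive Bayes classifier."""
    return [
        "qiime", "feature-classifier", "fit-classifier-naive-bayes",
        "--i-reference-reads", seqs,
        "--i-reference-taxonomy", taxonomy,
        "--o-classifier", classifier,
    ]


def classify_cmd(classifier, rep_seqs, classification):
    """Build the CLI call that assigns taxonomy with a trained classifier."""
    return [
        "qiime", "feature-classifier", "classify-sklearn",
        "--i-classifier", classifier,
        "--i-reads", rep_seqs,
        "--o-classification", classification,
    ]


if __name__ == "__main__":
    steps = [
        fit_classifier_cmd(SEQS, TAXONOMY, CLASSIFIER),
        classify_cmd(CLASSIFIER, REP_SEQS, ASSIGNED),
    ]
    for cmd in steps:
        print(" ".join(cmd))

    # Only execute when QIIME 2 and all input files are actually present.
    inputs_present = all(os.path.exists(p) for p in (SEQS, TAXONOMY, REP_SEQS))
    if shutil.which("qiime") and inputs_present:
        for cmd in steps:
            subprocess.run(cmd, check=True)
```

The classifier is trained once, and the same artifact can then be reused for the weaning, post-weaning, and adult samples, which is why a single classifier is enough.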

Best,

chupperts · June 27, 2025, 11:06am

Thank you very much!
