Extracting sequences identified as a certain taxon and classifying with another database

I’m trying to further classify some sequences that match to a certain taxonomic level. More specifically, I have sequences that match to Staphylococcus and I’d like to see what species these would be classified to with a different custom database.
What would be the best way to do this? I was thinking I could just filter the sequences down to the feature IDs matching this genus, then export those and do as I’ve done before, but I don’t think this would give me every single occurrence of each feature… instead I’d only have the representative ones.
Maybe I could just export those rep seqs, classify them, and then apply this classification to all of the sequences in my sample (basically assign this new taxonomy to the feature table). But I think this would require me to develop/train a classifier, and I’m not sure whether there’s some simpler way to do this that I’m not thinking of.
Any thoughts on how to do this?


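Perhaps I misunderstand your question, but the representative sequences are all you need: each feature in your table is represented by exactly one sequence, so classifying that sequence once covers every occurrence of the feature.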

So yes, use filter-seqs to grab the features of interest.
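To make the point about representative sequences concrete, here is a minimal Python sketch (plain dictionaries, not QIIME 2 code; the feature IDs, taxonomy strings, and counts are all made up). It picks the feature IDs whose taxonomy mentions Staphylococcus and shows that the feature table already holds the read counts behind each of those IDs:

```python
# Hypothetical taxonomy assignments: feature ID -> taxonomy string.
taxonomy = {
    "feat1": "k__Bacteria; p__Firmicutes; g__Staphylococcus",
    "feat2": "k__Bacteria; p__Firmicutes; g__Streptococcus",
    "feat3": "k__Bacteria; p__Firmicutes; g__Staphylococcus",
}

# Hypothetical feature table: sample -> {feature ID -> read count}.
feature_table = {
    "sampleA": {"feat1": 120, "feat2": 30, "feat3": 0},
    "sampleB": {"feat1": 5, "feat2": 80, "feat3": 42},
}


def select_features(taxonomy, term):
    """Return the sorted feature IDs whose taxonomy string contains `term`."""
    return sorted(fid for fid, taxon in taxonomy.items() if term in taxon)


# One representative sequence per ID is all that needs reclassifying.
staph_ids = select_features(taxonomy, "Staphylococcus")

# The reads behind those IDs are still counted in the feature table.
staph_reads = {
    sample: sum(counts[fid] for fid in staph_ids)
    for sample, counts in feature_table.items()
}

print(staph_ids)
print(staph_reads)
```

Whatever new label each of those IDs receives then applies to all of the reads counted under it.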

No need to export if you can use your custom reference database with one of QIIME 2’s classifiers.

Yep, you can apply the new classification back to all of your features! See qiime feature-table merge-taxa and read carefully how input order determines priority.
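To illustrate the priority rule, here is a minimal Python sketch (not the QIIME 2 implementation; the feature IDs and taxonomy strings are made up). It assumes, as I understand the merge-taxa documentation, that when the same feature ID appears in more than one input, the entry from the earliest input is kept:

```python
def merge_taxa(*taxonomies):
    """Merge feature ID -> taxonomy mappings; earlier inputs take priority."""
    merged = {}
    for tax in taxonomies:
        for fid, taxon in tax.items():
            # setdefault keeps the value from the first input that has this ID.
            merged.setdefault(fid, taxon)
    return merged


# Hypothetical original taxonomy for every feature in the table.
original = {
    "feat1": "g__Staphylococcus",
    "feat2": "g__Streptococcus",
    "feat3": "g__Staphylococcus",
}

# Hypothetical species-level reclassification of the Staphylococcus features.
reclassified = {
    "feat1": "g__Staphylococcus; s__aureus",
    "feat3": "g__Staphylococcus; s__epidermidis",
}

# Put the new classification first so it overrides the original labels.
merged = merge_taxa(reclassified, original)
print(merged)
```

Swapping the argument order would keep the original genus-level labels instead, which is why the input order matters.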

Well, whether you need to train a classifier depends on what you want to do. If you want to use a custom reference database, fine, but which classification method will you choose? If you use the classify-sklearn method in q2-feature-classifier, then yes, you will need to train a new classifier. But you could also use the classify-consensus-vsearch classifier if you want to, e.g., search by alignment. I might actually recommend this: if you are searching against a custom reference database (which does not contain outgroups), you may want to set a high % identity threshold, maybe even use the exact match or top hit options, to make sure you are hitting the correct species with very high precision (classify-sklearn can also increase precision by increasing the confidence parameter).
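As a toy illustration of why a high identity threshold helps when the reference has no outgroups, here is a small Python sketch. It is not how vsearch or classify-consensus-vsearch works internally; the function name, the hit tuples, and the 0.99 cutoff are all assumptions made for the example. A query only gets a species label if its best hit clears the threshold; otherwise it stays unassigned rather than being forced onto the nearest reference:

```python
def assign_top_hit(hits, min_identity=0.99):
    """Return the species of the best hit at or above min_identity.

    hits: list of (identity, species) tuples, identity as a fraction (0 to 1).
    Returns "Unassigned" when no hit reaches the threshold.
    """
    passing = [hit for hit in hits if hit[0] >= min_identity]
    if not passing:
        return "Unassigned"
    # Highest identity wins; on ties, the first hit listed is kept.
    return max(passing, key=lambda hit: hit[0])[1]


# Hypothetical alignment hits for two query sequences.
close_match = [
    (0.995, "Staphylococcus aureus"),
    (0.970, "Staphylococcus epidermidis"),
]
distant_match = [
    (0.960, "Staphylococcus aureus"),
]

print(assign_top_hit(close_match))
print(assign_top_hit(distant_match))
```

With a lower cutoff the second query would be labelled with its nearest reference, which is exactly the kind of low-precision hit a strict threshold is meant to avoid.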