Making the classifier small

Hi @jbbq

I think what you are proposing sounds reasonable, though I'd leave at least a handful of microbial sequences in your "non-microbe" classifier, and vice versa for your "microbe" classifier, just in case something is misclassified at the domain level. That is, there should always be several outgroup taxa (domains, in this case) in your classifier.
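To make the outgroup idea concrete, here's a rough Python sketch. It is not QIIME 2 or RESCRIPt code. It keeps every record from the target domains plus a few randomly sampled records from each other domain. The `(id, sequence, taxonomy)` tuples, the SILVA-style `d__` domain prefix, the function names, and treating Bacteria + Archaea as "microbe" are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def domain_of(taxonomy: str) -> str:
    """Return the first rank of a semicolon-delimited taxonomy string,
    e.g. 'd__Bacteria; p__Firmicutes' -> 'd__Bacteria' (assumed format)."""
    return taxonomy.split(";")[0].strip()


def subset_with_outgroups(records, target_domains, n_outgroup=5, seed=0):
    """Keep every record whose domain is in target_domains, plus up to
    n_outgroup randomly chosen records from each remaining domain.

    records: list of (seq_id, sequence, taxonomy) tuples (hypothetical layout).
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    kept = []
    others = defaultdict(list)
    for rec in records:
        dom = domain_of(rec[2])
        if dom in target_domains:
            kept.append(rec)
        else:
            others[dom].append(rec)
    # Sample a few representatives of every non-target domain as outgroups.
    for dom in sorted(others):
        pool = others[dom]
        kept.extend(rng.sample(pool, min(n_outgroup, len(pool))))
    return kept


records = [
    ("b1", "ACGT", "d__Bacteria; p__A"),
    ("b2", "ACGA", "d__Bacteria; p__B"),
    ("a1", "TTTT", "d__Archaea; p__C"),
    ("a2", "TTTA", "d__Archaea; p__D"),
    ("e1", "GGGG", "d__Eukaryota; p__E"),
    ("e2", "GGGA", "d__Eukaryota; p__F"),
    ("e3", "GGGC", "d__Eukaryota; p__G"),
]

# "Microbe" reference: all Bacteria/Archaea plus 2 eukaryote outgroups.
microbe = subset_with_outgroups(records, {"d__Bacteria", "d__Archaea"}, n_outgroup=2)
# "Non-microbe" reference: all Eukaryota plus 1 outgroup per microbial domain.
non_microbe = subset_with_outgroups(records, {"d__Eukaryota"}, n_outgroup=1)
```

Sampling per domain (rather than from the pooled outgroups) just makes sure each outgroup domain is represented at least once when it has any sequences at all.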

There are some other avenues you might try to exclude non-microbial reads:

Perhaps run one of the above two procedures, then add only a handful of representative non-microbial taxa to your microbial classifier as outgroups.

How are you making your reference database? Have you tried the RESCRIPt plugin? The tutorials hint at a few ways to reduce the size of the reference database, e.g. dereplicating the reference sequences or extracting just your amplicon region.
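For intuition on why dereplication shrinks the database, here's a toy Python sketch of exact-match dereplication. It collapses identical sequences into one entry and, when their taxonomies disagree, truncates to the shared leading ranks (an LCA-style collapse). This is only an illustration of the idea, not RESCRIPt's implementation. In practice you'd use RESCRIPt's `dereplicate` action, and `qiime feature-classifier extract-reads` for trimming to the amplicon region. The record layout and function names here are hypothetical.

```python
def lca(taxonomies):
    """Return the longest shared leading run of ranks across
    semicolon-delimited taxonomy strings."""
    split = [[r.strip() for r in t.split(";")] for t in taxonomies]
    shared = []
    # zip stops at the shortest lineage, so a shorter lineage bounds the result.
    for ranks in zip(*split):
        if all(r == ranks[0] for r in ranks):
            shared.append(ranks[0])
        else:
            break
    return "; ".join(shared)


def dereplicate(records):
    """Collapse records with identical (case-insensitive) sequences.

    records: iterable of (seq_id, sequence, taxonomy) tuples (hypothetical layout).
    Returns one (seq_id, sequence, taxonomy) per unique sequence, keeping the
    first ID seen and the shared taxonomy prefix of all collapsed members.
    """
    groups = {}  # dicts keep insertion order (Python 3.7+)
    for seq_id, seq, tax in records:
        groups.setdefault(seq.upper(), []).append((seq_id, tax))
    return [
        (members[0][0], seq, lca([tax for _, tax in members]))
        for seq, members in groups.items()
    ]


records = [
    ("r1", "ACGT", "d__Bacteria; p__Firmicutes; c__Bacilli"),
    ("r2", "acgt", "d__Bacteria; p__Firmicutes; c__Clostridia"),
    ("r3", "GGCC", "d__Archaea; p__X"),
]
derep = dereplicate(records)
```

Here `r1` and `r2` collapse into one entry labelled only down to `p__Firmicutes`, since they disagree at class level, so three references become two.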

-Good luck!
