Hi, I’m wondering if it makes sense to train classifiers using multiple marker genes. I found a related thread on here, and it appears that this is not recommended: Combine different reference sequences databases
I’m asking again because part of the reason given against this approach in the original thread is that we usually only sequence one marker gene at a time. What if you have a dataset that genuinely contains sequences from multiple marker genes? Is it necessary or preferable to bin by marker gene prior to analyzing with QIIME2, or would a “mixed classifier” be useful in this scenario?
I think the mixed classifier might be challenging in and of itself. But can I ask a more basic question to help me understand what you’re working with: do you have multiple samples where each sample has one marker gene, or do you have a set of samples where you’ve sequenced complementary marker genes (e.g. ITS and 16S)?
The most common approach is to treat the marker genes as separate, particularly if they come from different groups of organisms. So, a mixture of 16S and 18S would be processed separately because they’re looking at different aspects of the community. Additionally, I think it would be challenging to mix the data because of the compositional nature of most marker gene sequencing: read counts are only meaningful relative to the other reads from the same marker. So I guess the short answer is: theoretically possible, but maybe not advisable.
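If you do end up binning by marker gene first, here is a minimal sketch of one way it could be done, assuming each read still starts with its forward primer. The primer sequences, the `PRIMERS` table, and the `bin_reads` helper are illustrative assumptions, not part of QIIME 2; swap in the primers actually used in your run. In practice a dedicated primer-aware tool such as cutadapt is likely the more robust choice.

```python
# IUPAC degenerate base codes mapped to the concrete bases they match.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Hypothetical primer table (assumption): replace with your own forward primers.
PRIMERS = {
    "16S": "GTGYCAGCMGCCGCGGTAA",
    "ITS": "CTTGGTCATTTAGAGGAAGTAA",
}


def starts_with_primer(read, primer):
    """Return True if the read begins with the (possibly degenerate) primer."""
    if len(read) < len(primer):
        return False
    # zip stops at the primer length, so only the read prefix is compared.
    return all(base in IUPAC[code] for base, code in zip(read, primer))


def bin_reads(reads, primers=PRIMERS):
    """Split (read_id, sequence) pairs into one bin per marker gene.

    Reads that match no primer exactly go to the 'unassigned' bin.
    """
    bins = {name: [] for name in primers}
    bins["unassigned"] = []
    for read_id, seq in reads:
        seq = seq.upper()
        for name, primer in primers.items():
            if starts_with_primer(seq, primer):
                bins[name].append((read_id, seq))
                break
        else:
            bins["unassigned"].append((read_id, seq))
    return bins


if __name__ == "__main__":
    example_reads = [
        ("r1", "GTGCCAGCAGCCGCGGTAAACGT"),
        ("r2", "CTTGGTCATTTAGAGGAAGTAAGG"),
        ("r3", "AAAA"),
    ]
    for marker, members in bin_reads(example_reads).items():
        print(marker, [read_id for read_id, _ in members])
```

Each bin could then be written to its own file and imported, denoised, and classified as a separate dataset with a classifier trained for that marker. Note that this sketch only does exact prefix matching (no mismatches, no reverse-complement handling), so real data would probably leave more reads unassigned than a proper primer-trimming tool would.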