Training classifiers using multiple marker genes

rhizorick · September 22, 2020, 4:19am

Hi, I’m wondering whether it makes sense to train classifiers using multiple marker genes. I found a related thread here, and it appears this approach is not recommended: Combine different reference sequences databases

I’m asking again because part of the reason the original thread gave for not taking this approach is that we usually sequence only one marker gene at a time. What if a dataset genuinely contains sequences from multiple marker genes? Is it necessary (or preferable) to bin sequences by marker gene before analyzing them with QIIME2, or would a “mixed classifier” be useful in this scenario?
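To make the “bin by marker gene” option concrete, here is a minimal Python sketch, assuming each read still begins with its forward PCR primer. The primers used (515F for 16S V4 and ITS1F for fungal ITS) and the function names `primer_regex` and `bin_by_marker` are illustrative assumptions, not part of QIIME2; in practice a dedicated primer-trimming or demultiplexing tool would normally do this step.

```python
import re

# IUPAC degenerate nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a regex that matches a (possibly degenerate) primer at the start of a read."""
    return re.compile("^" + "".join(IUPAC[base] for base in primer.upper()))


# Hypothetical primer set (assumed for illustration): 515F (16S V4) and ITS1F (fungal ITS).
PRIMERS = {
    "16S": primer_regex("GTGYCAGCMGCCGCGGTAA"),
    "ITS": primer_regex("CTTGGTCATTTAGAGGAAGTAA"),
}


def bin_by_marker(reads):
    """Split (read_id, sequence) pairs into one bin per marker gene, keyed by forward primer."""
    bins = {marker: [] for marker in PRIMERS}
    bins["unassigned"] = []
    for read_id, seq in reads:
        for marker, pattern in PRIMERS.items():
            if pattern.match(seq.upper()):
                bins[marker].append((read_id, seq))
                break
        else:
            # No known primer at the start of the read.
            bins["unassigned"].append((read_id, seq))
    return bins


# Usage with made-up reads.
reads = [
    ("r1", "GTGCCAGCAGCCGCGGTAATACG"),
    ("r2", "CTTGGTCATTTAGAGGAAGTAAAAGTC"),
    ("r3", "ACGTACGT"),
]
bins = bin_by_marker(reads)
for marker, members in bins.items():
    print(marker, [read_id for read_id, _ in members])
```

Each bin could then be denoised and classified separately with a classifier trained on the matching reference database, which avoids needing a single mixed classifier.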

I appreciate any input you can provide.

jwdebelius · September 22, 2020, 7:30am

Hi @rhizorick,

I think the mixed classifier might be challenging in and of itself. But can I ask a more basic question to help me understand what you’re working with: do you have multiple samples where each sample has one marker gene, or do you have a set of samples where you’ve sequenced complementary marker genes (e.g., ITS and 16S)?

Best,
Justine

rhizorick · September 22, 2020, 7:47am

Hi Justine,

I appreciate your prompt response. Each sample has a mixture of marker genes.

Let me know if you need any more info.

jwdebelius · September 22, 2020, 7:52am

Hi @rhizorick,

The most common approach is to treat the marker genes separately, particularly if they come from different groups of organisms. So a mixture of 16S and 18S would be processed separately, because they’re looking at different aspects of the community. Additionally, I think it would be challenging to mix the data because of the compositional nature of most marker gene sequencing. So I guess the short answer is: theoretically possible, but maybe not advisable.
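The compositional point can be illustrated with a small Python sketch. The feature names and counts below are made up purely for illustration. Within one marker, counts are usually converted to proportions of the total. If 16S and 18S counts are pooled into a single table, a bacterium’s proportion then depends on how many 18S reads happened to be sequenced, which reflects amplification and sequencing depth rather than biology.

```python
def relative_abundance(counts):
    """Convert raw counts to proportions that sum to 1."""
    total = sum(counts.values())
    return {feature: n / total for feature, n in counts.items()}


# Hypothetical counts for a single sample (made-up numbers).
counts_16s = {"bacterium_A": 600, "bacterium_B": 400}
counts_18s = {"eukaryote_X": 1000}

# Marker genes kept separate: bacterium_A is 60% of the 16S reads.
separate = relative_abundance(counts_16s)

# Marker genes pooled into one table: bacterium_A drops to 30%.
pooled = relative_abundance({**counts_16s, **counts_18s})

# Same bacteria, but three times as many 18S reads: bacterium_A drops to 15%.
pooled_deeper = relative_abundance({**counts_16s, "eukaryote_X": 3000})

print(separate["bacterium_A"])       # 0.6
print(pooled["bacterium_A"])         # 0.3
print(pooled_deeper["bacterium_A"])  # 0.15
```

Keeping each marker in its own table means the proportions within each table are at least comparable across samples for that marker.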

Best,
Justine

rhizorick · September 22, 2020, 7:58am

Justine,

Thanks again for your input!

Take care

system · October 23, 2020, 1:58pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.