Training feature classifier after q2-fragment insertion

Li-Fang_Yeo · October 16, 2020, 6:00am

Hi,

I have two sets of data, one sequenced on Illumina with V3-V4 primers and the other with V4 (515F/806R) primers. I used q2-fragment-insertion to produce an insertion-tree.qza to combine both sets.

I want to train a feature classifier with q2-feature-classifier. Is it correct to train the classifier on the V3-V4 primers? I am assuming both primer sets overlap at the V4 region. Otherwise, how should I train the feature classifier?

Another question: I read in the forum (Reads processing with different primers - #3 by Lu_Yang) that q2-fragment-insertion isn't recommended for regions that overlap. Could you please elaborate? I suppose my data sets share an overlapping V4 region, so should I not use fragment insertion?

Thank you!

andrewsanchez · October 16, 2020, 9:43pm

Hi @Li-Fang_Yeo, and welcome to the forum!

I think this note from the Moving Pictures Tutorial says it all:

Taxonomic classifiers perform best when they are trained based on your specific sample preparation and sequencing parameters, including the primers that were used for amplification and the length of your sequence reads. Therefore in general you should follow the instructions in Training feature classifiers with q2-feature-classifier to train your own taxonomic classifiers. We provide some common classifiers on our data resources page, including Silva-based 16S classifiers, though in the future we may stop providing these in favor of having users train their own classifiers which will be most relevant to their sequence data.

I'm not too sure about the second question, but I think Bod's answer here might be relevant as well: Extracting V4 region from V3-V4 data - #2 by Mehrbod_Estaki.

Li-Fang_Yeo · October 17, 2020, 1:58pm

Hi Andrew,

Thanks for replying! I've got it! I can train a feature classifier separately for the V3-V4 and V4 sets, assign taxonomy to each, and then merge the taxonomy using qiime feature-table merge-taxa!

For the second question, that's the thread that I was following. Nicholas mentioned this:
"@Mehrbod_Estaki’s suggestion is excellent (thank you for the suggestion!) — comparing datasets amplified with different primers is indeed what q2-fragment-insertion was designed for (to my knowledge). I would think that plugin would be most advantageous, however, when comparing datasets with non-overlapping amplicons. So you have other options."

I was wondering whether the plugin is thus "disadvantageous" when comparing datasets WITH overlapping amplicons, and if that's what he meant, then why? I couldn't find anything in the original paper.

Nicholas_Bokulich · October 17, 2020, 2:02pm

No, it's not disadvantageous exactly. The emphasis should instead be placed on the "you have other options" part: because V3-V4 overlaps the V4, trimming to the V4 could be another way to merge these. Why trim instead of using fragment insertion? Fragment insertion is great for beta diversity analyses when you have different primers, but it does not help with unifying other analyses like alpha diversity... Trimming would throw out data (disadvantageous) but would allow you to compare both primer sets on "equal grounds". Just an opinion, though; I have not compared V3-4 to V4 myself, so I could be overlooking something.
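
To make the trimming idea concrete, here is a minimal Python sketch of cutting a V3-V4 read down to its V4 portion by locating the 515F forward primer and discarding everything up to and including it. It is only an illustration and not part of QIIME 2: the function names (find_primer, trim_to_v4) are hypothetical, the primer is the commonly cited 515F sequence (check it against your own protocol), and the example read is synthetic. It handles only the forward side and exact or near-exact matches. In a real workflow this step would normally be done with a dedicated trimming tool, for example the q2-cutadapt plugin.

```python
# IUPAC nucleotide codes mapped to the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Commonly cited 515F primer (assumption: confirm against your own protocol).
PRIMER_515F = "GTGCCAGCMGCCGCGGTAA"


def find_primer(read, primer, max_mismatches=0):
    """Return the start index of the first primer match in read, or -1."""
    for start in range(len(read) - len(primer) + 1):
        mismatches = 0
        for offset, code in enumerate(primer):
            # A read base matches if it is one of the bases the code allows.
            if read[start + offset] not in IUPAC[code]:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:
            # Inner loop finished without exceeding the mismatch budget.
            return start
    return -1


def trim_to_v4(read, primer=PRIMER_515F, max_mismatches=0):
    """Drop everything up to and including the forward V4 primer.

    Returns None when the primer is not found in the read.
    """
    start = find_primer(read, primer, max_mismatches)
    if start == -1:
        return None
    return read[start + len(primer):]


# Synthetic example read: upstream (V3-like) bases, the primer site with the
# degenerate M resolved to A, then the part we want to keep.
upstream = "ACGTACGTAC"
primer_site = "GTGCCAGCAGCCGCGGTAA"
v4 = "TACGGAGGGTGCAAGCGTTA"
read = upstream + primer_site + v4

print(trim_to_v4(read))
```

Reads where the primer cannot be found come back as None, so they can be counted and dropped instead of being silently kept at full length. This matches the trade-off described above: some data is lost, but what remains covers the same region in both data sets.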

Li-Fang_Yeo · October 17, 2020, 3:20pm

Dear Nicholas,

Haha! Got it! Sorry I had to nitpick your words; I was looking for clues to make informed decisions. Thank you very much! You guys have been immensely helpful!