Analysis of fungi at LSU (28S)

Hi @F18D029G,
The issue does indeed appear to be how the database is formatted: the taxonomy file does not place the same rank at the same level for every sequence. For example:

FP929119.18337.21721	D_0__Eukaryota;D_1__Fungi;D_2__Dikarya;D_3__Ascomycota;D_4__Pezizomycotina;D_5__Dothideomycetes;D_6__Pleosporomycetidae;D_7__Pleosporales;D_8__Pleosporineae;D_9__Leptosphaeriaceae;D_10__Leptosphaeria;D_11__Leptosphaeria maculans complex;D_12__Leptosphaeria maculans JN3;D_13__;D_14__;D_15__;D_16__;D_17__;D_18__;D_19__;D_20__;D_21__
AABX03000368.68941.72469	D_0__Eukaryota;D_1__Fungi;D_2__Dikarya;D_3__Ascomycota;D_4__Pezizomycotina;D_5__Sordariomycetes;D_6__Sordariomycetidae;D_7__Sordariales;D_8__Sordariaceae;D_9__Neurospora;D_10__Neurospora crassa OR74A;D_11__;D_12__;D_13__;D_14__;D_15__;D_16__;D_17__;D_18__;D_19__;D_20__;D_21__

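Both are fungi, yet the deepest annotation (species/strain) sits at “D_12” in the first record and “D_10” in the second, and the genus sits at “D_10” and “D_9” respectively. Because the same level holds different ranks across records, these uneven depths will cause problems when training a Naive Bayes classifier and also for consensus-based classifiers.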
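To get a feel for how widespread the problem is, you could tally the deepest annotated level across the whole taxonomy file. Here is a minimal sketch, assuming a two-column, tab-separated file with `D_<n>__` prefixes like the records above. The function names `deepest_level` and `depth_histogram` are my own, not part of QIIME 2 or SILVA.

```python
import re
from collections import Counter

# Matches one level such as "D_10__Leptosphaeria" or an empty "D_13__".
LEVEL = re.compile(r"D_(\d+)__(.*)")

def deepest_level(taxonomy):
    """Return the index of the deepest non-empty D_<n>__ level, or -1 if none."""
    deepest = -1
    for field in taxonomy.split(";"):
        match = LEVEL.fullmatch(field.strip())
        if match and match.group(2).strip():
            deepest = max(deepest, int(match.group(1)))
    return deepest

def depth_histogram(lines):
    """Count how many records bottom out at each level index."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        _, taxonomy = line.split("\t", 1)
        counts[deepest_level(taxonomy)] += 1
    return counts

# The two records from this post (trailing empty levels trimmed for brevity).
rows = [
    "FP929119.18337.21721\tD_0__Eukaryota;D_1__Fungi;D_2__Dikarya;D_3__Ascomycota;"
    "D_4__Pezizomycotina;D_5__Dothideomycetes;D_6__Pleosporomycetidae;D_7__Pleosporales;"
    "D_8__Pleosporineae;D_9__Leptosphaeriaceae;D_10__Leptosphaeria;"
    "D_11__Leptosphaeria maculans complex;D_12__Leptosphaeria maculans JN3;D_13__;D_14__",
    "AABX03000368.68941.72469\tD_0__Eukaryota;D_1__Fungi;D_2__Dikarya;D_3__Ascomycota;"
    "D_4__Pezizomycotina;D_5__Sordariomycetes;D_6__Sordariomycetidae;D_7__Sordariales;"
    "D_8__Sordariaceae;D_9__Neurospora;D_10__Neurospora crassa OR74A;D_11__;D_12__",
]

histogram = depth_histogram(rows)
```

For a real file you would pass the open file handle instead of `rows`. If the histogram spreads over many depths, the level index alone cannot tell you which rank a label belongs to.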

You will need to figure out a way to reformat these taxonomy strings to have even ranks. See this procedure that @SoilRotifer put together; a SILVA 7-level taxonomy formatting script is in there somewhere. As far as I know, that script was designed for SILVA 16S, not LSU, but it is worth a try!
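To illustrate what "even ranks" means in practice, here is a rough sketch of one possible approach. It is not @SoilRotifer's script. It assumes you can obtain a lookup from taxon name to rank; the `RANK_OF` dictionary below is a hypothetical, hand-filled stand-in covering only the two example records. The `k__`/`p__` style prefixes, dropping everything above kingdom, and treating the strain label as the species entry are also just illustrative choices on my part.

```python
# Hypothetical rank lookup, hand-filled for the two example records only.
# In practice this would have to come from a real rank source for the database.
RANK_OF = {
    "Fungi": "kingdom",
    "Ascomycota": "phylum",
    "Dothideomycetes": "class",
    "Sordariomycetes": "class",
    "Pleosporales": "order",
    "Sordariales": "order",
    "Leptosphaeriaceae": "family",
    "Sordariaceae": "family",
    "Leptosphaeria": "genus",
    "Neurospora": "genus",
    # Assumption: the strain-level label is used as the species entry.
    "Leptosphaeria maculans JN3": "species",
    "Neurospora crassa OR74A": "species",
}

RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]

def to_even_ranks(taxonomy):
    """Rebuild a taxonomy string with exactly seven fixed ranks.

    Names without a known rank (e.g. subphylum, suborder) are dropped;
    ranks with no matching name are left empty.
    """
    by_rank = {}
    for field in taxonomy.split(";"):
        # "D_5__Sordariomycetes" -> name "Sordariomycetes"; "D_13__" -> "".
        _, _, name = field.strip().partition("__")
        rank = RANK_OF.get(name)
        if rank:
            by_rank[rank] = name
    return ";".join(f"{rank[0]}__{by_rank.get(rank, '')}" for rank in RANKS)

example = (
    "D_0__Eukaryota;D_1__Fungi;D_2__Dikarya;D_3__Ascomycota;D_4__Pezizomycotina;"
    "D_5__Sordariomycetes;D_6__Sordariomycetidae;D_7__Sordariales;D_8__Sordariaceae;"
    "D_9__Neurospora;D_10__Neurospora crassa OR74A;D_11__;D_12__"
)
even = to_even_ranks(example)
# even == "k__Fungi;p__Ascomycota;c__Sordariomycetes;o__Sordariales;"
#         "f__Sordariaceae;g__Neurospora;s__Neurospora crassa OR74A"
```

After this kind of reformatting every record has the same number of levels, and a given position always holds the same rank, which is what the classifiers expect.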

I hope that helps!
