High relative abundance at family level. How can I increase the % at genus level?

Nicholas_Bokulich · January 14, 2020, 5:41pm

Thanks for clarifying. Enterobacteriaceae often bloom when certain sample types (like feces) are stored at room temperature for even a short time without some sort of stabilizer. It sounds like that is not the case with your samples, but I just wanted to check!

As @jwdebelius noted, it is difficult to differentiate genera in this family with many 16S regions, because those regions are identical between some of these genera. So your observation is common: most taxa are classified at genus level, but Enterobacteriaceae stick out like a sore thumb.

You might be able to do better with a bit of elbow grease. A few possibilities:

1. If you have prior information about species abundances in your sample type, you could give q2-clawback a spin; see "Using q2-clawback to assemble taxonomic weights".
2. You could check out the classify-consensus-vsearch classifier to get a "second opinion" on the Enterobacteriaceae sequences. While this classifier usually performs no better than classify-sklearn, it is a bit easier to fiddle with its parameters to adjust things like the % identity threshold, and it also has options for finding exact matches and only considering top hits.
3. You could reduce --p-confidence with classify-sklearn. This will increase recall and reduce precision: it lowers the risk of underclassification (what you have now!) but raises the risk of a false-positive genus or species classification.
4. You could create a custom database of Enterobacteriaceae species (i.e., grab an existing database and exclude all species that you know could not possibly exist in your insect specimens!). I am not a fan of this approach. I highly recommend approach #1 to making a custom database, since it uses more information instead of throwing information out and making dubious assumptions (in other words, "never say never").
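To see why the classify-consensus-vsearch knobs (% identity threshold, top hits only) can change the outcome, here is a minimal pure-Python sketch of majority-vote consensus over database hits. It is not q2-feature-classifier's code: the function name `consensus_taxonomy`, its parameters (loosely modeled on the plugin's options), the default values, and the hit list are all hypothetical and chosen for illustration.

```python
from collections import Counter


def consensus_taxonomy(hits, perc_identity=0.8, min_consensus=0.51,
                       top_hits_only=False):
    """Majority-vote taxonomy from (identity, taxonomy_string) hits (sketch)."""
    # Drop hits below the identity threshold.
    kept = [(ident, tax) for ident, tax in hits if ident >= perc_identity]
    if not kept:
        return "Unassigned"
    # Optionally keep only the hits tied for the best identity.
    if top_hits_only:
        best = max(ident for ident, _ in kept)
        kept = [(ident, tax) for ident, tax in kept if ident == best]

    lineages = [tax.split(";") for _, tax in kept]
    consensus = []
    for depth in range(1, max(len(lin) for lin in lineages) + 1):
        # Count whole lineage prefixes, so the same genus name under two
        # different families is never merged by accident.
        prefixes = Counter(tuple(lin[:depth]) for lin in lineages
                           if len(lin) >= depth)
        prefix, count = prefixes.most_common(1)[0]
        # Stop at the first rank where no lineage reaches min_consensus.
        if count / len(lineages) < min_consensus:
            break
        consensus = list(prefix)
    return ";".join(consensus) if consensus else "Unassigned"


# Hypothetical hits for one ASV; identities and labels are invented.
FAMILY = ("k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;"
          "o__Enterobacterales;f__Enterobacteriaceae")
hits = [
    (1.00, FAMILY + ";g__Escherichia"),
    (0.99, FAMILY + ";g__Shigella"),
    (0.99, FAMILY + ";g__Salmonella"),
    (0.98, FAMILY + ";g__Escherichia"),
]

print(consensus_taxonomy(hits))                      # stops at family: Escherichia has only 2/4 votes
print(consensus_taxonomy(hits, top_hits_only=True))  # reaches genus from the single best hit
```

Note the trade-off in the second call: a genus call resting on one top hit is only as trustworthy as that single reference sequence and its label.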
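The precision/recall trade-off of lowering --p-confidence comes from how the assignment is truncated. Roughly speaking, classify-sklearn sums the probabilities of all reference labels that share a lineage down to a given rank and stops at the first rank where that summed value falls below the confidence threshold (0.7 by default). The sketch below reproduces that idea in plain Python; the function `truncate_by_confidence` and the posterior probabilities are hypothetical, and this is a simplified illustration rather than the plugin's implementation.

```python
from collections import defaultdict


def truncate_by_confidence(class_probs, confidence=0.7):
    """Truncate at the deepest rank whose summed probability reaches
    `confidence`. Returns (lineage_string, confidence_at_that_rank), or
    ("Unassigned", None) if even the top rank falls short (sketch)."""
    lineages = {tuple(tax.split(";")): p for tax, p in class_probs.items()}
    best, best_conf = (), None
    depth = 1
    while True:
        # Sum the probability of every class extending the current best
        # lineage, grouped by its prefix at this depth.
        mass = defaultdict(float)
        for lineage, p in lineages.items():
            if len(lineage) >= depth and lineage[:depth - 1] == best:
                mass[lineage[:depth]] += p
        if not mass:
            break  # no deeper ranks left
        prefix, conf = max(mass.items(), key=lambda kv: kv[1])
        if conf < confidence:
            break  # not confident enough to go one rank deeper
        best, best_conf = prefix, conf
        depth += 1
    if not best:
        return "Unassigned", None
    return ";".join(best), best_conf


FAMILY = ("k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;"
          "o__Enterobacterales;f__Enterobacteriaceae")
# Hypothetical posterior probabilities for one ASV.
probs = {
    FAMILY + ";g__Escherichia": 0.45,
    FAMILY + ";g__Shigella": 0.30,
    FAMILY + ";g__Salmonella": 0.20,
    "k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;"
    "f__Enterococcaceae;g__Enterococcus": 0.05,
}

print(truncate_by_confidence(probs))                  # family level, about 0.95
print(truncate_by_confidence(probs, confidence=0.4))  # genus level, but 55% of the mass disagrees
```

Lowering the threshold turns this family-level call into a genus-level one, which is exactly the recall gain and precision risk described above.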
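The preference for q2-clawback weights over pruning the database can be seen with Bayes' rule. A naive Bayes classifier (the kind trained for classify-sklearn) scores each class roughly as likelihood × prior; q2-clawback supplies better-informed class weights to act as that prior, while deleting taxa from the database amounts to fixing their prior at zero. In this sketch the genera, likelihoods, and weights are all invented, and `posterior`/`top` are hypothetical helpers, not QIIME 2 functions.

```python
def posterior(likelihoods, priors):
    """Bayes' rule: P(genus | read) is proportional to P(read | genus) * P(genus)."""
    unnormalized = {g: likelihoods[g] * priors.get(g, 0.0) for g in likelihoods}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("all plausible genera have zero prior probability")
    return {g: v / total for g, v in unnormalized.items()}


def top(post):
    """Most probable genus."""
    return max(post, key=post.get)


# Hypothetical likelihoods: a read whose amplicon is nearly identical across genera...
ambiguous = {"g__Escherichia": 0.40, "g__Shigella": 0.38, "g__Enterobacter": 0.22}
# ...and a read with strong evidence for one genus.
clear = {"g__Escherichia": 0.95, "g__Shigella": 0.04, "g__Enterobacter": 0.01}

uniform = {g: 1 / 3 for g in ambiguous}
# Hypothetical habitat-informed weights (the kind of prior q2-clawback assembles).
weighted = {"g__Escherichia": 0.1, "g__Shigella": 0.1, "g__Enterobacter": 0.8}
# "Custom database" approach: excluded genera get a prior of exactly zero.
pruned = {"g__Escherichia": 0.0, "g__Shigella": 0.0, "g__Enterobacter": 1.0}

for name, prior in [("uniform", uniform), ("weighted", weighted), ("pruned", pruned)]:
    print(name, top(posterior(ambiguous, prior)), top(posterior(clear, prior)))
```

With weights, the prior tips the ambiguous read but strong evidence still wins; with a pruned database, no amount of evidence can bring an excluded genus back ("never say never").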