Hi everyone,
There have been several posts in this forum outlining a current limitation of Greengenes2 (GG2): it lacks annotations for mitochondria and chloroplasts (at least up to release 2022.10). If I understood correctly, the solution in the latest release, 2024.09, is still a workaround rather than a fully integrated phylogeny-derived taxonomy, because of the technical difficulty of including mitochondria/chloroplast sequences in the phylogenetic backbone.
However, we’re still interested in using GG2 because it seems to offer better species-level annotations. In their paper, the authors reported “good concordance at the species level when comparing 16S and shotgun metagenomic data analyzed with the Greengenes2 tree”, and that “46.5% of species-level leaves in the final tree are covered by a complete genome”, whereas SILVA carries a very clear warning about unreliable species labels.
I am fully aware of the resolution limits of 16S data, so I’m not expecting high accuracy, but we would still like to retain species labels for record-keeping and downstream reporting.
I was wondering what the best practice would be for addressing this limitation of GG2.
I ran taxonomy classification on two small 16S datasets (human oral and skin samples) comparing GG2 2024.09 and SILVA 138.2. In both datasets, sequences assigned to “d__Bacteria;p__Cyanobacteriota;c__Cyanobacteriia;o__Chloroplast;f__Incertae_Sedis;g__Incertae_Sedis;__” by SILVA were classified as “d__Bacteria;p__Cyanobacteriota;c__Cyanobacteriia;o__Cyanobacteriales;f__Coleofasciculaceae;g__Caldora;s__Caldora sp010672925” by GG2.
This aligns with what others have reported: SILVA and GG2 yield different classifications for chloroplast-associated sequences. (Note: my sample size was small, and I didn’t explicitly assess host/environmental contamination; this was more of a quick exploratory check.)
Therefore, I am considering this two-step approach:
- Pre-filter mitochondria and chloroplasts using the SILVA classifications, then
- Classify the remaining ASVs using GG2 to benefit from its species-level labeling.
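To make the two steps above concrete, here is a minimal sketch in plain Python. It assumes the SILVA and GG2 classifications have already been exported as simple ASV-to-lineage mappings. The function names, ASV IDs, and the elided placeholder lineages are all hypothetical and only for illustration; only the chloroplast and Caldora strings come from my actual results.

```python
# Minimal sketch: filter organelle ASVs by SILVA label, keep GG2 labels for the rest.
# All names here (is_organelle, filter_then_label, asv1..asv3) are hypothetical.

ORGANELLE_TERMS = ("chloroplast", "mitochondria")


def is_organelle(silva_taxon: str) -> bool:
    """Return True if a SILVA lineage string mentions chloroplast or mitochondria."""
    lowered = silva_taxon.lower()
    return any(term in lowered for term in ORGANELLE_TERMS)


def filter_then_label(silva_tax: dict, gg2_tax: dict):
    """Drop ASVs that SILVA flags as organelle; return GG2 labels for the others.

    Returns (kept, removed): kept maps ASV ID -> GG2 lineage,
    removed lists the ASV IDs that were filtered out.
    """
    kept = {}
    removed = []
    for asv_id, silva_taxon in silva_tax.items():
        if is_organelle(silva_taxon):
            removed.append(asv_id)
        else:
            # Fall back to "Unassigned" if GG2 has no entry for this ASV.
            kept[asv_id] = gg2_tax.get(asv_id, "Unassigned")
    return kept, removed


# Example input. asv1 uses the lineages from my comparison; asv2 and asv3 are
# placeholders with elided ranks, not real classifications.
silva_tax = {
    "asv1": "d__Bacteria;p__Cyanobacteriota;c__Cyanobacteriia;o__Chloroplast;"
            "f__Incertae_Sedis;g__Incertae_Sedis;__",
    "asv2": "d__Bacteria;...;f__Mitochondria;...",
    "asv3": "d__Bacteria;...;g__ExampleGenus;__",
}
gg2_tax = {
    "asv1": "d__Bacteria;p__Cyanobacteriota;c__Cyanobacteriia;o__Cyanobacteriales;"
            "f__Coleofasciculaceae;g__Caldora;s__Caldora sp010672925",
    "asv3": "d__Bacteria;...;g__ExampleGenus;s__ExampleGenus example_species",
}

kept, removed = filter_then_label(silva_tax, gg2_tax)
print("removed:", removed)
print("kept:", kept)
```

In QIIME 2 itself, I believe the filtering step would correspond to running `qiime taxa filter-table` with `--p-exclude mitochondria,chloroplast` against the SILVA taxonomy, and then classifying the filtered representative sequences with GG2. The sketch only illustrates the logic; the point is that the SILVA label is used solely as a filter and never reported.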
I’ve seen a similar approach in a recent paper where the authors “classified ASVs against the older GreenGenes (v13.8)” to remove mitochondria and chloroplasts. However, I believe the older GG is quite outdated. I wonder whether SILVA-based filtering would be more sensible here, or whether mixing SILVA and GG2 would introduce unwanted inconsistencies in analysis and interpretation.
I’m curious to hear any suggestions or experiences on this topic. Thank you in advance!