MAG filtering prior to functional annotation

Hello!

I am curious regarding the best approach for MAG filtering before functional annotation.

My workflow:

  • QC and host/feed DNA removal
  • Taxonomy (samples)
  • MAG binning
  • Filtering based on Completeness / Contamination
  • Dereplication
  • Taxonomy (MAGs)
  • Functional annotation (MAGs)

After getting the taxonomy profiles of all the samples (not MAGs), I noticed a high proportion of reads assigned to the host, the feed, and human (which is not the host), so I guess the first step didn't remove all the reads that should have been excluded. I removed these reads from the feature table by filtering on taxonomy, but they were still present among the reads in the samples. Then I got taxonomies for the dereplicated MAGs. Some of these MAGs are completely Unassigned/Unclassified, and some are annotated as "Cellular organism" only. My question is: is it recommended to remove any MAGs that were not annotated as at least "d__Bacteria" or "d__Archaea" before functional annotation, or is it better to use all dereplicated MAGs, even those not assigned to "d__Bacteria" or "d__Archaea"?

Best,
Timur


Hi Timur,

Your workflow looks reasonable to me :slight_smile: Filtering really depends on your research question, but for bulk functional annotation I’d drop any MAGs not confidently classified as "d__Bacteria" or "d__Archaea", unless you’re specifically chasing novel lineages. This keeps your annotations clean and focused on "bona fide" prokaryotes, and it avoids wasting computational resources!
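
In case it helps, here is a minimal sketch of that filter in Python. It assumes the MAG taxonomy is available as a tab-separated table with two columns that I have called `mag_id` and `taxon` (hypothetical names, adjust to your export), and that classified MAGs carry a semicolon-separated string starting with the domain rank. The example rows are made-up toy data, not real results.

```python
import csv
import io

# Domains to retain for bulk functional annotation.
KEEP_DOMAINS = ("d__Bacteria", "d__Archaea")


def is_prokaryote(taxon: str) -> bool:
    """Return True if the first rank of the taxonomy string is Bacteria or Archaea."""
    first_rank = taxon.split(";")[0].strip()
    return first_rank in KEEP_DOMAINS


def filter_mags(tsv_text: str) -> list[str]:
    """Return IDs of MAGs whose taxonomy starts with a retained domain.

    Column names 'mag_id' and 'taxon' are assumptions for this sketch.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row["mag_id"] for row in reader if is_prokaryote(row["taxon"])]


# Toy taxonomy table (invented rows for illustration only).
example = (
    "mag_id\ttaxon\n"
    "mag_01\td__Bacteria;p__Bacillota\n"
    "mag_02\tUnassigned\n"
    "mag_03\tcellular organisms\n"
    "mag_04\td__Archaea;p__Methanobacteriota\n"
    "mag_05\td__Eukaryota;p__Chordata\n"
)

kept = filter_mags(example)
print(kept)  # ['mag_01', 'mag_04']
```

Matching on the first rank only, rather than searching the whole string, means that labels such as "Unassigned" or "cellular organisms" are dropped without needing a separate exclusion list.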

All the best,

Paula


Hi Paula,

Thank you for your reply! I was already considering this, so I will add this step to the pipeline.
For now, I subset all my KO annotations to the pathways of interest and parsed the ortholog tables for each module in the selected pathways. I then extracted the KO annotations that originated from Bacteria and Archaea and filtered my KO table accordingly for differential abundance (DA) analyses. Otherwise, I am afraid the multiple-comparison penalty would be too strong, since it would also count KOs that are not even from Bacteria or Archaea.
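
Roughly, the KO filtering step looks like the sketch below. It is a simplified illustration, not my actual pipeline code: the function names, the `(mag_id, ko)` pair layout, and the count values are all hypothetical, and the KO identifiers are used only as placeholders. I use a Bonferroni threshold here purely to show how the per-test cutoff changes with the number of KOs tested; the DA tool you use may apply a different correction.

```python
def kos_from_kept_mags(annotations, kept_mags):
    """Return the set of KOs annotated in at least one retained MAG."""
    kept = set(kept_mags)
    return {ko for mag_id, ko in annotations if mag_id in kept}


def filter_ko_table(ko_table, allowed_kos):
    """Keep only the rows of the KO table whose KO is in allowed_kos."""
    return {ko: counts for ko, counts in ko_table.items() if ko in allowed_kos}


# Toy (mag_id, ko) annotation pairs; invented for illustration.
annotations = [
    ("mag_01", "K00001"),
    ("mag_02", "K00002"),
    ("mag_04", "K00003"),
    ("mag_05", "K00002"),
    ("mag_01", "K00003"),
]

# Toy KO count table: KO -> counts per sample (invented numbers).
ko_table = {
    "K00001": [10, 12, 3],
    "K00002": [0, 5, 1],
    "K00003": [7, 7, 9],
}

# MAGs retained after the Bacteria/Archaea filter.
kept_mags = ["mag_01", "mag_04"]

allowed = kos_from_kept_mags(annotations, kept_mags)
filtered = filter_ko_table(ko_table, allowed)
print(sorted(filtered))  # ['K00001', 'K00003']

# Fewer tested KOs means a less strict per-test threshold (Bonferroni shown).
alpha = 0.05
threshold_before = alpha / len(ko_table)
threshold_after = alpha / len(filtered)
print(threshold_before, threshold_after)
```

In this toy case K00002 is dropped because it only occurs in MAGs that were not retained, so the correction is applied over two KOs instead of three.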

Best,
Timur