Moshpit 2025.4.0: Co-assembling reads into contigs

Hello,

I have version 2025.4.0 of moshpit. I’m working with metagenomic data and want to co-assemble reads into contigs then perform downstream analyses.

In the “Assemble contigs with MEGAHIT” section of the 2025.4.0 documentation (Recovery of MAGs — MOSHPIT tutorials), the --p-coassemble parameter is described as “…still under development: you will not be able to use the generated contigs for further analysis.”

I was wondering whether that statement about co-assembly applies globally or only to MEGAHIT, and whether SPAdes co-assembly outputs can be used for downstream analysis in moshpit.

Thanks,

Cindy

Hey @Cindy,

Thank you for the question. Both the assemble-megahit and assemble-spades actions support the coassemble option, which makes the respective assembler use the reads from all samples to generate a single set of contigs. The resulting artifact is of type FeatureData[Contig]. The problem is that this artifact type is not yet accepted by any downstream action, so you would not be able to continue the analysis with those contigs. We want to make sure that all downstream actions treat this data type correctly (it no longer carries sample information), and we still need a bit of time to review them.
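
To make the loss of sample information concrete, here is a toy sketch in plain Python. It is not QIIME 2 code: the dictionaries only stand in for artifacts, and `assemble` is a hypothetical placeholder rather than a real assembler.

```python
# Toy illustration only: plain dicts and lists stand in for QIIME 2 artifacts.
reads_by_sample = {
    "sample_a": ["ACGT", "CGTA"],
    "sample_b": ["TTGA"],
}


def assemble(reads):
    """Hypothetical stand-in for an assembler: joins reads into one 'contig'."""
    return ["".join(reads)]


# Per-sample assembly: every contig stays keyed by the sample it came from.
per_sample_contigs = {
    sample: assemble(reads) for sample, reads in reads_by_sample.items()
}

# Co-assembly: reads from all samples are pooled before assembly, so the
# result is one flat set of contigs with no sample key left to look up.
pooled_reads = [read for reads in reads_by_sample.values() for read in reads]
coassembled_contigs = assemble(pooled_reads)
```

Any downstream step that looks up contigs by sample works on `per_sample_contigs` but has nothing to index into for `coassembled_contigs`, which is why each downstream action needs to be reviewed separately.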

Would you mind sharing what you intended to do with those contigs downstream? This would help us prioritize where to expand the support for those contigs next.

Thanks!

Cheers,
Michal

Thanks for getting back, Michal. Cindy and I are part of the same team.
We plan to compare data between different experimental groups. Each group comprises ~50 individuals and their associated WGS data.

Our analysis workflow is as follows:

  1. co-assemble reads from each group to obtain MAGs
  2. perform taxonomic and functional annotation of MAGs from each group
  3. for each group, map individuals back to their respective annotated MAGs to quantify microbial lineages and their (putative) functions
  4. subsequently perform within and between group statistical analysis (incl. differential abundance testing) on quantified lineages and (putative) functions

Happy to provide clarifications if necessary.

If you were to consider expanding support for these types of downstream analyses (excluding the statistical analysis, which we can do elsewhere once we have the frequency tables), how long would that take?

Many thanks,
Nsa

I'd like to add to this, as I'm also experimenting with this use case.

I previously built a small ad-hoc pipeline for going from reads -> contigs -> MAGs on a single sample (I'd run it for many samples, collate the results, and then do stats and analysis with external tools). It's messy; I'm relatively new to genome assembly and still experimenting with what works. But within Q2, I had:

  1. qiime tools import on a directory with a single sample's paired-end reads
  2. qiime assembly assemble-spades with --p-meta
  3. qiime assembly index-contigs
  4. qiime assembly map-reads
  5. qiime annotate bin-contigs-metabat

After that point I've been pulling the MAG fastas out of the QZAs for downstream work with external tools, which so far has included CheckM1+2, dRep, GTDB-Tk, Bakta, and others.

Additionally, after running CheckM and dRep, I've been importing the resulting dereplicated MAGs back into a FeatureData[MAG] QZA, and then running on them:

  1. qiime assembly index-derep-mags
  2. qiime annotate get-feature-lengths

And then once again per-sample:

  1. qiime assembly map-reads using the result of index-derep-mags
  2. qiime annotate estimate-abundance

And then merging the resulting feature tables for further downstream analysis. It's not strictly necessary to do everything on a per-sample basis where I've done it, but it's helped me keep things vaguely organized.
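
As a rough sketch of that merging step outside Q2 (plain Python, assuming each per-sample table has already been exported to a simple feature -> abundance mapping; `merge_tables` and the sample and MAG names are hypothetical):

```python
def merge_tables(tables):
    """Merge per-sample abundance tables into one samples x features table.

    tables: dict mapping sample ID -> {feature ID: abundance}.
    Returns (features, rows), where rows maps each sample ID to a list of
    abundances ordered like `features`. A feature that is absent from a
    sample is filled with 0.0.
    """
    # Union of all feature IDs seen in any sample, in a stable order.
    features = sorted({f for table in tables.values() for f in table})
    rows = {
        sample: [table.get(f, 0.0) for f in features]
        for sample, table in tables.items()
    }
    return features, rows


# Hypothetical per-sample tables.
per_sample = {
    "sample_a": {"mag_1": 12.5, "mag_2": 3.0},
    "sample_b": {"mag_2": 7.0, "mag_3": 1.5},
}
features, merged = merge_tables(per_sample)
```

Filling missing features with zero assumes that a MAG absent from a sample's table was simply not detected there, which only holds if every sample was mapped against the same dereplicated MAG set.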

I was glad to see this thread, as today I was going to start using the --p-coassemble parameter with per-subject inputs instead of per-sample ones. I was hoping this would work with minimal changes, but I see why the type needs to be different once the sample information is gone. In the near term I'm just going to do some questionable exporting and re-importing so I can get an idea of what I'm working with, and then I'll use spades, bowtie2, et al. without Q2 to get more robust results if/when that causes issues. Hopefully this is helpful; looking forward to seeing this in future releases.

Hey @Adam_Cantor and @nerdynella,

Thanks a lot for your invaluable input. Knowing what kind of analysis you are interested in helps us a great deal in deciding which work to prioritize. I think it would be reasonably easy for us to expand support for co-assembled contigs in the downstream actions to cover a workflow like the one you described, @nerdynella. I think we could see this in the 2026.7 release (this is my non-binding best-effort promise :wink: ). For slightly more complex workflows like the one you described, @Adam_Cantor, I'll need to see which elements are still missing and we can think about those in a second pass.

If you have any other input, wishes, or workflow suggestions, we are always open to feedback!

Thanks,
Michal
