Recommended Step for Merging Metagenomics Datasets

charlesalexandreroy · July 16, 2025, 1:50am

Hi all,

When merging multiple datasets, I know that for 16S data it's recommended to denoise each run separately and then merge the feature tables. What is the recommendation for metagenomics data? Per the MOSHPIT tutorial, a comparable point would seemingly be after the host read removal step of quality control, but I just want to get people's thoughts on this.

Thanks so much!

timanix · July 16, 2025, 6:37am

Hello!

This recommendation is specific to the DADA2 algorithm, which contains an error-training step. Deblur, for example, can denoise multiple runs together with no issues.
For metagenomic data, each sample is sequenced independently, so there is no sequencing-run factor to account for.
When working with big datasets, I split all the samples into batches, run each batch as an independent job on the server, and then merge the outputs. But I do this for parallelization (submitting batches in parallel), not because of the sequencing run.
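
To illustrate the batching idea, here is a minimal Python sketch of the split, run, and merge pattern. It is only an outline under stated assumptions: `process_batch` is a hypothetical placeholder for whatever per-batch job you actually submit (it is not a QIIME 2 or MOSHPIT call), the function names and the batch size are made up for the example, and a real setup would more likely submit jobs through the cluster scheduler than through a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(sample_ids, batch_size):
    """Split a list of sample IDs into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        sample_ids[i:i + batch_size]
        for i in range(0, len(sample_ids), batch_size)
    ]


def process_batch(batch):
    """Hypothetical placeholder for the real per-batch job.

    Here it only maps each sample ID to a made-up output name so that
    the sketch runs on its own.
    """
    return {sample_id: f"{sample_id}.processed" for sample_id in batch}


def merge_outputs(batch_outputs):
    """Merge per-batch result dicts, refusing duplicated sample IDs."""
    merged = {}
    for output in batch_outputs:
        overlap = merged.keys() & output.keys()
        if overlap:
            raise ValueError(f"duplicate sample IDs: {sorted(overlap)}")
        merged.update(output)
    return merged


def run_in_batches(sample_ids, batch_size, max_workers=4):
    """Process batches in parallel, then merge the per-batch outputs."""
    batches = split_into_batches(sample_ids, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input batches.
        outputs = list(pool.map(process_batch, batches))
    return merge_outputs(outputs)


if __name__ == "__main__":
    samples = ["S1", "S2", "S3", "S4", "S5"]
    print(split_into_batches(samples, 2))
    print(run_in_batches(samples, 2))
```

The batch boundaries here are arbitrary: they exist only to spread the work across jobs, which matches the point that the split is for parallelization and not tied to the sequencing run.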