Merging NGS with TGS?

As the title suggests, one of the clients at my core lab is asking me to analyze a batch of NGS data together with a new batch of TGS data. I have run DADA2 and then merged two NGS runs that used the same primers before, and I know there are protocols for merging results from two NGS runs that used different primers, but I have never heard of merging NGS with TGS results. For context, the samples are human stool; the NGS run targets the V3-V4 region and the TGS run targets the full-length gene.

Hello!

Full-length TGS - you mean Nanopore 16S reads?
If yes, I would avoid merging the datasets at the ASV/sequence level. Besides the different lengths and primers, Nanopore has a much higher error rate, so the sequences it produces are more variable (the same template yields many reads, each with different errors).

The only way to merge such data is to process both datasets through taxonomy annotation, using the same database (such as GTDB) for both, and then merge the taxonomy-collapsed count tables. But personally, I would avoid merging them at all.
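To make the "merge collapsed taxonomy counts" step concrete, here is a minimal standard-library sketch. It assumes each dataset has already been annotated against the same database and collapsed to the same rank; the function names, sample IDs, the genus rank, and the counts are all made up for illustration and are not part of any QIIME 2 API.

```python
def merge_collapsed_tables(table_a, table_b):
    """Combine two {sample: {taxon: count}} tables, filling absent taxa with 0."""
    overlap = set(table_a) & set(table_b)
    if overlap:
        # Identical sample IDs in both runs would silently overwrite each other.
        raise ValueError(f"sample IDs present in both tables: {sorted(overlap)}")
    combined = {**table_a, **table_b}
    taxa = sorted({taxon for counts in combined.values() for taxon in counts})
    return {
        sample: {taxon: counts.get(taxon, 0) for taxon in taxa}
        for sample, counts in combined.items()
    }


def relative_abundance(table):
    """Convert counts to per-sample proportions (depth differs between runs)."""
    result = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        result[sample] = {t: (c / total if total else 0.0) for t, c in counts.items()}
    return result


# Toy genus-level tables (hypothetical numbers, for illustration only).
illumina = {
    "ilm_1": {"g__Bacteroides": 120, "g__Faecalibacterium": 30},
    "ilm_2": {"g__Bacteroides": 80, "g__Blautia": 15},
}
pacbio = {"pb_1": {"g__Bacteroides": 60, "g__Akkermansia": 25}}

merged = merge_collapsed_tables(illumina, pacbio)
proportions = relative_abundance(merged)
```

The merge only lines up labels. It does nothing about the error-rate or primer differences between the two runs, so a taxon that is zero in one dataset may simply be undetectable with that protocol.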

Hope that helps.

Best,
Timur

I think our protocol is PacBio HiFi 16S reads, whose error rate should be lower, from what I have heard. In your opinion, does that change things, i.e. does it justify merging?


Yes, you are right - PacBio reads have a much lower error rate.
Still, there is an issue of different lengths.

Technically, you can run DADA2 on both datasets and then merge them in one of these ways:

  • using the GreenGenes2 or fragment insertion plugins within QIIME 2;
  • merging tables collapsed to taxonomy (same database for both);
  • extracting the V3-V4 region from the PacBio reads based on the primers used in the V3-V4 dataset.
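The last option, cutting the V3-V4 region out of full-length reads, amounts to an in-silico PCR. Below is a rough sketch of the idea using exact, IUPAC-aware primer matching. The toy primers and read are hypothetical; substitute the primer pair actually used for the V3-V4 run. Real reads usually need mismatch tolerance, which dedicated primer-trimming tools such as cutadapt provide and this sketch does not.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a (possibly degenerate) primer into a regex fragment."""
    return "".join(IUPAC[base] for base in primer.upper())


def extract_amplicon(read, fwd_primer, rev_primer):
    """Return the sequence between the primers (primers removed), or None.

    Both primers are given 5'->3' as ordered from the vendor; the read is
    tried in both orientations. Matching is exact (no mismatches allowed).
    """
    pattern = re.compile(
        primer_regex(fwd_primer)
        + "(.+?)"
        + primer_regex(reverse_complement(rev_primer))
    )
    read = read.upper()
    for candidate in (read, reverse_complement(read)):
        match = pattern.search(candidate)
        if match:
            return match.group(1)
    return None


# Hypothetical toy primers and read, NOT real 16S primers.
toy_fwd, toy_rev = "CCTAN", "GGACW"
toy_read = "TTTT" + "CCTAG" + "AAACCCGGG" + "AGTCC" + "TTTT"
region = extract_amplicon(toy_read, toy_fwd, toy_rev)
```

Reads that return None lack one or both primer sites and would be dropped. After trimming, the PacBio-derived fragments cover the same region as the Illumina amplicons, but the primer and PCR biases discussed in this thread still apply.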

However, you should be aware that you still get the biases due to different primers and PCRs - certain groups of bacteria can be amplified differently with each primer set, leading to biased abundances at the end. It is not crucial when all samples are processed in the same way, but it should be considered when merging datasets amplified with different primers.

Another question is how the samples are distributed across datasets. For example, you want to compare Group A with Group B. If both groups A and B have samples sequenced with both Illumina and PacBio, with equal proportions, then the biases I mentioned above shouldn't be a big problem since both groups have samples processed in the same way. Still, it increases the variability within each group, which affects statistical analyses. If Group A was sequenced with Illumina and Group B with PacBio, the differences between them could be due either to the group effect (what we usually check for) or to different primers used for amplification and to the sequencing approach. I would be skeptical of any conclusions based on such comparisons.
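Before merging anything, it may be worth tabulating how the samples fall across groups and platforms. The sketch below assumes the sample metadata is a list of rows with hypothetical "group" and "platform" fields; it counts samples per combination and flags the fully confounded case, where each platform contains a single group only.

```python
from collections import Counter, defaultdict


def platform_by_group(metadata):
    """Count samples for every (group, platform) combination."""
    return Counter((row["group"], row["platform"]) for row in metadata)


def is_fully_confounded(metadata):
    """True when several platforms are used and each holds only one group.

    In that design a group difference cannot be separated from a
    platform/primer difference.
    """
    groups_per_platform = defaultdict(set)
    for row in metadata:
        groups_per_platform[row["platform"]].add(row["group"])
    return len(groups_per_platform) > 1 and all(
        len(groups) == 1 for groups in groups_per_platform.values()
    )


# Hypothetical designs, for illustration only.
balanced = [
    {"sample": "s1", "group": "A", "platform": "Illumina"},
    {"sample": "s2", "group": "A", "platform": "PacBio"},
    {"sample": "s3", "group": "B", "platform": "Illumina"},
    {"sample": "s4", "group": "B", "platform": "PacBio"},
]
confounded = [
    {"sample": "s1", "group": "A", "platform": "Illumina"},
    {"sample": "s2", "group": "A", "platform": "Illumina"},
    {"sample": "s3", "group": "B", "platform": "PacBio"},
    {"sample": "s4", "group": "B", "platform": "PacBio"},
]

counts = platform_by_group(balanced)
```

A design that passes this check can still be unbalanced (for example, 9 of 10 Group A samples on one platform), so the counts themselves are worth reading and not just the flag.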

I am leaving this topic queued in case other mods have better suggestions. Others are welcome to join in!
