Kraken2 versus QIIME2 (DADA2) for PacBio HiFi reads


I have PacBio HiFi read data (full-length 16S rRNA gene amplicons). I ran it through the Kraken2 pipeline with the Greengenes database using Partek Flow, and I also ran the same dataset through QIIME2 with DADA2 and the Greengenes database, as described in Analyzing PacBio HiFi Mock Community 16S Data with QIIME 2 · PacificBiosciences/pb-16S-nf Wiki · GitHub

The issue is that the two outputs show opposite alpha diversity trends. The Kraken2 pipeline shows a significantly higher Shannon index in Control vs. KO, whereas the QIIME2-DADA2 output shows a lower Shannon index in Control vs. KO. In my experience, two different approaches rarely show opposite trends: the significance might change, but not the direction.
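One quick sanity check is to recompute the per-group mean Shannon index directly from each pipeline's exported feature table, so that both results go through exactly the same formula. The sketch below is a minimal Python version that assumes each sample is a plain vector of feature counts; the sample names and counts are hypothetical toy values, not data from this study. It also shows that the logarithm base (tools may use different defaults) only rescales H, so a base mismatch alone cannot flip the Control vs. KO direction.

```python
import math
from statistics import mean

def shannon(counts, base=math.e):
    """Shannon index H = -sum(p_i * log(p_i)) over features with non-zero counts."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p, base)
    return h

def group_means(table, groups, base=math.e):
    """Mean Shannon index per group; `table` maps sample -> list of feature counts."""
    by_group = {}
    for sample, counts in table.items():
        by_group.setdefault(groups[sample], []).append(shannon(counts, base))
    return {g: mean(vals) for g, vals in by_group.items()}

# Hypothetical toy counts (NOT real data): two samples per group.
table = {
    "ctrl_1": [50, 30, 20],
    "ctrl_2": [40, 40, 20],
    "ko_1": [90, 5, 5],
    "ko_2": [80, 10, 10],
}
groups = {"ctrl_1": "Control", "ctrl_2": "Control", "ko_1": "KO", "ko_2": "KO"}

# Changing the log base rescales every H by the same factor,
# so the direction of the group difference is unchanged.
for base in (2, math.e):
    m = group_means(table, groups, base)
    direction = "Control > KO" if m["Control"] > m["KO"] else "Control <= KO"
    print(f"base={base:.3g}: {m} -> {direction}")
```

If the recomputed directions still disagree between the two tables, that would point to differences in the feature tables themselves (for example, ASVs vs. reads assigned to reference taxa, filtering, or rarefaction depth) rather than to the diversity calculation.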

I need guidance on deciding which output is more reliable for publication.

Please help me figure this out.


Hello R,

This is tricky to compare because there are many differences between the components of these pipelines, and it's hard to isolate which one is causing this change. But, like you said, two approaches usually shouldn't produce opposite trends.

That is concerning! 😬

Did you include any mock communities with known compositions you could use to validate these two pipelines?

If not, you could use existing positive controls! When you run the ATCC MSA-1003 mock samples from that PacBio tutorial through Partek, does it produce comparable results, or are they different as well?
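To make that mock-community comparison concrete, one option is to score each pipeline's mock-sample output against the expected composition, for example with a Bray-Curtis dissimilarity plus lists of missed and unexpected taxa. This is a minimal Python sketch, not a standard tool: the taxon names and abundances are hypothetical placeholders, the real expected profile would have to come from ATCC's documentation for MSA-1003, and it assumes both pipelines' labels have already been harmonized to the same taxonomic rank.

```python
def relative(abund):
    """Convert raw counts (taxon -> count) into relative abundances, dropping zeros."""
    total = sum(abund.values())
    return {taxon: count / total for taxon, count in abund.items() if count > 0}

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two profiles: 0 = identical, 1 = no shared taxa."""
    taxa = set(a) | set(b)
    num = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    den = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    return num / den if den else 0.0

def score(expected, observed_counts):
    """Compare an observed mock-sample profile with the expected composition."""
    observed = relative(observed_counts)
    return {
        "bray_curtis": bray_curtis(expected, observed),
        "missed": sorted(set(expected) - set(observed)),
        "unexpected": sorted(set(observed) - set(expected)),
    }

# Hypothetical placeholders: replace with the vendor's expected composition and
# each pipeline's mock-sample output collapsed to the same taxonomic rank.
expected = {"taxon_A": 0.5, "taxon_B": 0.3, "taxon_C": 0.2}
pipelines = {
    "Kraken2/Partek": {"taxon_A": 480, "taxon_B": 310, "taxon_C": 190, "taxon_X": 20},
    "QIIME2/DADA2": {"taxon_A": 700, "taxon_B": 300},
}

for name, counts in pipelines.items():
    result = score(expected, counts)
    print(f"{name}: BC={result['bray_curtis']:.3f}, "
          f"missed={result['missed']}, unexpected={result['unexpected']}")
```

A pipeline whose mock output sits closer to the expected profile (lower Bray-Curtis, fewer missed or unexpected taxa) gives some evidence, though not proof, that its results on the real samples are more trustworthy.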

