Qiime2 and Kraken2

Hi,

I am writing because I have been asked whether Kraken2 would be a worthwhile substitute for QIIME 2, since it is faster and appears to classify well.

In a previous post, the forum kindly helped me think through the problem of database construction:

“Assessing the quality of matches can be an arduous process, especially if you have 100s or 1000s of sequences, and this is why other methods are used by QIIME 2 (and similar platforms) for taxonomic classification: to automate the process of taxonomic classification”

However, in a recent, non-peer-reviewed paper, the authors state:

Here we show that, using the same simulated 16S rRNA metagenomic data as
previous studies, Kraken 2 and Bracken are up to 300 times faster and
also more accurate at 16S profiling than QIIME 2.

Could you please comment on this?

Thanks a lot

Michela

Hi @MichelaRiba,
kraken2 is a taxonomy classifier, not an analysis platform, so it cannot be compared to QIIME 2 itself (which is a software platform for building custom analysis workflows). The appropriate comparison is against the q2-feature-classifier classify-sklearn naive Bayes classifier with uniform taxonomic weights (the taxonomy classifier that was actually compared in that pre-print), not against QIIME 2 or any of the other taxonomy classifiers or plugins available in QIIME 2.

So kraken2 could perhaps serve as a substitute for q2-feature-classifier (as another taxonomy classifier), but not for QIIME 2, as mentioned above.

From this point on it is worth acknowledging my biases, as a Q2 and q2-feature-classifier developer :wink:

Also full disclosure: I am not and was not in any way involved in the peer review of that pre-print, just in case anyone is speculating! But here’s my opinion:

That article shows some promising performance in terms of speed: kraken2 is faster than q2-feature-classifier (not QIIME 2; the authors should have done more to differentiate the two).

However, the authors’ own benchmarks show comparable accuracy to q2-feature-classifier, not better, so I disagree with that stated conclusion…

Further, they compare against the standard classify-sklearn method using default uniform class weights to show comparable accuracy. We have already shown that using q2-clawback to build habitat-specific taxonomic weights can improve accuracy further, and kraken2 would not capture those benefits:

https://www.nature.com/articles/s41467-019-12669-6
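
To make the class-weight point concrete, here is a toy sketch of how prior weights can tip a naive Bayes call when a read matches two taxa equally well. This is not the q2-feature-classifier or q2-clawback implementation; the function names, the 3-mer features, the two taxa, and the 0.1/0.9 weights are all made up for illustration.

```python
import math
from collections import Counter


def kmers(seq, k=3):
    """Split a sequence into overlapping k-mers."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train(reference, k=3):
    """Count k-mers per taxon. `reference` maps taxon -> list of sequences."""
    return {
        taxon: Counter(km for s in seqs for km in kmers(s, k))
        for taxon, seqs in reference.items()
    }


def score(read, model, weights, k=3, alpha=1.0):
    """Log posterior (up to a constant) of each taxon for one read.

    `weights` are the class priors: uniform, or habitat-specific.
    `alpha` is a Laplace smoothing constant (an assumption of this sketch).
    """
    vocab = set().union(*model.values())
    scores = {}
    for taxon, counts in model.items():
        total = sum(counts.values()) + alpha * len(vocab)
        log_like = sum(
            math.log((counts[km] + alpha) / total) for km in kmers(read, k)
        )
        scores[taxon] = math.log(weights[taxon]) + log_like
    return scores


def classify(read, model, weights, k=3):
    """Return the taxon with the highest score."""
    scores = score(read, model, weights, k)
    return max(scores, key=scores.get)


# Hypothetical reference: the two taxa differ only at the 3' end,
# so a read from the shared region fits both equally well.
reference = {
    "Taxon_A": ["ACGTACGTTT"],
    "Taxon_B": ["ACGTACGTGG"],
}
read = "ACGTACG"
model = train(reference)

uniform = {"Taxon_A": 0.5, "Taxon_B": 0.5}
habitat = {"Taxon_A": 0.1, "Taxon_B": 0.9}  # made-up habitat-specific weights

print(score(read, model, uniform))
print(classify(read, model, habitat))
```

With uniform weights the two taxa score identically, so the read is ambiguous. With weights that favor Taxon_B (as if it were far more common in the sampled habitat), the same read is assigned to Taxon_B.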

Some final additional thoughts:

  1. It would be great to see a kraken2 plugin! The faster runtime is an advantage, even if the accuracy is comparable to other methods. Ask the authors to make a plugin :wink: they can reach out to me directly for help.
  2. Stand-alone kraken2 would break QIIME 2 provenance, losing one of the many advantages of QIIME 2: that processing decisions are recorded in provenance.

So if you are already using QIIME 2 pipelines, I’d recommend sticking with taxonomy classifiers in QIIME 2 to preserve provenance, unless that faster runtime is critical (e.g., you are a service company and need to optimize turnaround time!). That holds at least until a kraken2 plugin for QIIME 2 gets built, so that you can run kraken2 and preserve provenance :wink:

Hi Nicholas,
thanks a lot for the very fast and precise answer.

I just commented about the plugin on the paper's forum and cross-referenced your reply. You should be able to find it there once it is approved (by bioRxiv).

Thanks a lot for putting everything into perspective.

I saw a comment in the paper’s discussion saying that using Kraken2 instead of QIIME 2 for 16S analysis would not be a good idea, but that it could be used to classify the OTU representative sequences. If I have understood correctly, that is similar to what you propose: use it in the classification step, and create a convenient plugin for that purpose.
Thanks a lot,

Michela