Qiime2 and Kraken2

MichelaRiba · November 9, 2020, 9:01am

Hi,

I am writing because I have been asked whether Kraken2 would be a valuable tool to use in place of Qiime2, since it is faster and seems to classify well.

In a previous post, the forum kindly helped me think through the problem of database construction:

“Assessing the quality of matches can be an arduous process, especially if you have 100s or 1000s of sequences, and this is why other methods are used by QIIME 2 (and similar platforms) for taxonomic classification: to automate the process of taxonomic classification”

However, a recent, non-peer-reviewed paper states:

Here we show that, using the same simulated 16S rRNA metagenomic data as
previous studies, Kraken 2 and Bracken are up to 300 times faster and
also more accurate at 16S profiling than QIIME 2.

Could you please comment on this?

Thanks a lot

Michela

Nicholas_Bokulich · November 9, 2020, 11:39am

Hi @MichelaRiba,
kraken2 is a taxonomy classifier, not an analysis platform, so it cannot be compared to QIIME 2 itself (which is a software platform for building custom analysis workflows). The appropriate comparison is with the q2-feature-classifier classify-sklearn naive Bayes classifier with uniform taxonomic weights (the taxonomy classifier that was actually compared in that pre-print), not with QIIME 2 or any of the other taxonomy classifiers or plugins available in QIIME 2.

So kraken2 could perhaps serve as a substitute for q2-feature-classifier (as another taxonomy classifier), but not for QIIME 2 itself, as mentioned above.

From this point on it is worth acknowledging my biases, as a Q2 and q2-feature-classifier developer.

Also full disclosure: I am not and was not in any way involved in the peer review of that pre-print, just in case anyone is speculating! But here's my opinion:

That article shows some promising performance: kraken2 is faster than q2-feature-classifier (not QIIME 2; the authors should have differentiated these more clearly).

However, the authors' own benchmarks show comparable accuracy to q2-feature-classifier, not better, so I disagree with that stated conclusion...

Further, they compare against the standard classify-sklearn method with default uniform class weights to show comparable accuracy. We have already shown that using q2-clawback to build habitat-specific taxonomic weights can improve accuracy further, and kraken2 would not capture those benefits:

https://www.nature.com/articles/s41467-019-12669-6
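
To illustrate what "class weights" means here, below is a toy sketch of how a prior enters a naive Bayes classifier over k-mers. It is not the q2-feature-classifier or q2-clawback implementation; the taxon names, sequences, and weight values are all made up for illustration, and the functions `kmers`, `train`, and `classify` are hypothetical helpers.

```python
import math
from collections import Counter


def kmers(seq, k=3):
    """Return the overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train(references, k=3):
    """Count k-mers per taxon; `references` maps taxon -> list of sequences."""
    counts = {
        taxon: Counter(km for seq in seqs for km in kmers(seq, k))
        for taxon, seqs in references.items()
    }
    vocab = set().union(*counts.values())
    return counts, vocab


def classify(query, counts, vocab, class_weights, k=3):
    """Return the taxon with the highest log posterior for `query`."""
    scores = {}
    for taxon, kmer_counts in counts.items():
        total = sum(kmer_counts.values()) + len(vocab)  # add-one smoothing
        score = math.log(class_weights[taxon])  # the prior (class weight)
        for km in kmers(query, k):
            score += math.log((kmer_counts[km] + 1) / total)
        scores[taxon] = score
    return max(scores, key=scores.get)


# Made-up reference sequences for two hypothetical taxa.
references = {
    "Taxon_A": ["ACGTACGTAA"],
    "Taxon_B": ["ACGTACGTCC"],
}
counts, vocab = train(references)

query = "ACGTACG"  # short read from the region both taxa share

uniform = {"Taxon_A": 0.5, "Taxon_B": 0.5}
habitat = {"Taxon_A": 0.1, "Taxon_B": 0.9}  # assumed: Taxon_B dominates this habitat

print(classify(query, counts, vocab, uniform))  # Taxon_A
print(classify(query, counts, vocab, habitat))  # Taxon_B
```

With uniform weights the slightly better k-mer match (a 1.5-fold likelihood difference in this toy data) decides the call. With the assumed habitat weights, the prior outweighs that small difference, which is the kind of shift that only matters for reads that are otherwise ambiguous.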

Some final additional thoughts:

It would be great to see a kraken2 plugin! The faster runtime is an advantage, even if the accuracy is comparable to other methods. Ask the authors to make a plugin; they can reach out to me directly for help.
Stand-alone kraken2 would break QIIME 2 provenance, losing one of the many advantages of QIIME 2: that processing decisions are recorded in provenance.

So if you are already using QIIME 2 pipelines, I'd recommend sticking with the taxonomy classifiers in QIIME 2 to preserve provenance, unless that faster runtime is critical (e.g., you are a service company and need to optimize turnaround time!), at least until a kraken2 plugin for QIIME 2 gets built so that you can run kraken2 and preserve provenance.

MichelaRiba · November 9, 2020, 12:46pm

Hi Nicholas,
thanks a lot for the very fast and precise answer.

I just commented about the plugin on the paper forum and cross-referenced to your reply.

You may be able to find it there once it has been approved (by bioRxiv).

Thanks a lot for putting everything in the overall perspective

I saw a comment in the paper's discussion saying that using Kraken2 instead of Qiime2 for 16S analysis would not be a good idea, but that it might be used to classify the OTU representative sequences. If I have understood correctly, that is similar to what you propose: use it in the classification step, and create a convenient plugin for that purpose.
Thanks a lot,

Michela

Angelica · January 23, 2021, 9:06pm

A Kraken2 plugin would be most useful, but can it really work at the end of the process as a classifier? The whole concept of Kraken2 analysis seems different to me from the ASV approach in QIIME2. Kraken2 would be more useful as a plugin on the QIIME2 platform, integrated as part of a new analysis pathway that follows a different concept. It could also provide a double pathway for analyzing both 16S data and WGS data, which, as I understand it, is the purpose for which Kraken2 was built in the first place. However, regarding 16S, a Kraken2 plugin would probably offer more accuracy in the analysis of data derived from the Ion Torrent platform. Most 16S analysis in QIIME2 is standardized for Illumina reads, as far as I have seen so far, despite recent efforts to create an Ion Torrent pipeline.

Nicholas_Bokulich · January 26, 2021, 7:09am

Yes, and the benefit of using it in such a pipeline (instead of classifying fastq seqs) is that the reads can be denoised first to remove/correct erroneous reads.

QIIME 2 is just a platform of many different tools — there are no required steps. Rather, the current tutorials follow a similar workflow that is typical for processing 16S reads. There is no reason why a kraken2 plugin for QIIME 2 could not be integrated in a different way if that is what the researcher desires...

Correct, kraken2 was originally designed for shotgun metagenome sequence data, and so it would be useful for the same application if integrated with QIIME 2.

Rather, I'd say that the tutorials are written specifically for Illumina. This is because most of the QIIME 2 developers are most familiar with Illumina data, and Illumina is overwhelmingly more common for 16S data than Ion Torrent.

But QIIME 2 is an openly developed platform and so we depend on the community to write tutorials tailored to their specific pipelines, etc. It would be wonderful to see a community contributed tutorial for Ion Torrent data or other platforms.

There is nothing preventing users from using QIIME 2 for Ion Torrent data, nor is QIIME 2 tailored specifically for Illumina data... it's just that nobody has "blazed the trail" by writing an Ion Torrent tutorial for others to follow. But that does not mean that QIIME 2 would necessarily deliver lower accuracy for Ion Torrent reads, just that users need to be savvy about the unique characteristics of their data that may require different processing steps. For example, those using Ion Torrent (particularly the 16S kit) report features like mixed-orientation reads and mixed amplicons from multiple variable regions. These can be handled in QIIME 2 and its plugins (e.g., q2-dada2 denoise-pyro for denoising Ion Torrent reads, RESCRIPt to orient reads, then classify with a full-length 16S classifier).
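
For anyone unsure what "mixed-orientation reads" implies in practice, here is a toy sketch of the idea behind orienting reads against a reference. It is not how RESCRIPt does it; it is a simple k-mer heuristic with a made-up reference sequence, and `reverse_complement`, `kmer_set`, and `orient` are hypothetical helper names.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_set(seq, k=4):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient(read, reference, k=4):
    """Return the read in whichever orientation shares more k-mers with the reference."""
    ref_kmers = kmer_set(reference, k)
    flipped = reverse_complement(read)
    forward_hits = len(kmer_set(read, k) & ref_kmers)
    reverse_hits = len(kmer_set(flipped, k) & ref_kmers)
    return read if forward_hits >= reverse_hits else flipped


# Made-up reference and reads, for illustration only.
reference = "ACGGTTCAGGATCCATGCAAGT"
forward_read = reference[3:15]                     # already in reference orientation
reversed_read = reverse_complement(forward_read)   # same fragment, opposite strand

# Both reads come out in the same (reference) orientation.
print(orient(forward_read, reference) == forward_read)   # True
print(orient(reversed_read, reference) == forward_read)  # True
```

Once all reads share one orientation, downstream steps such as classification see each fragment the same way regardless of which strand was sequenced.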

But this discussion of Ion Torrent strays from this topic, in my opinion, so if you have more questions regarding Ion Torrent analysis with QIIME 2 please open a separate topic and we can discuss there.