kraken2 vs qiime moshpit classify-kraken2

Dear all,
I have run the same analysis with kraken2 and with qiime moshpit classify-kraken2, using the same sample and the same database.

kraken2 --db nt_core_12_28_2024 --confidence 0.5 --threads 48
RESULT:
52.84 unclassified
47.16 root

qiime moshpit classify-kraken2 \
  --i-seqs seq.qza \
  --i-kraken2-db kraken2_nt_core_12282024.qza \
  --p-threads 48 \
  --p-confidence 0.5 \
  --p-minimum-base-quality 20 \
  --o-hits kraken2-hits.qza \
  --o-reports kraken2-reports.qza \
  --p-report-minimizer-data False \
  --verbose
RESULT:
67.9 unclassified
32.10 root

In my opinion the two methods should give the same results, but the difference between the classified and unclassified fractions is rather large. The FASTQ files have been prefiltered with kneaddata, so the minimum base quality should not have any effect. Do you have any suggestions about the reason for this difference?

I have also quickly inspected the two reports, and the taxonomy appears very much the same, but the number of hits changes.

I was also wondering whether there is a way to import into QIIME 2 the kraken2 report that I've already generated, in order to apply qiime moshpit estimate-bracken to it.
Kind regards
Carlo

Hi @Carlo77,
I just want to make sure that you ran the exact same command. Did you give Kraken2 a minimum-base-quality? I see it in your qiime moshpit classify-kraken2 but not in your Kraken2 command.

If you didn't give Kraken2 a minimum-base-quality filter, that could definitely explain the discrepancy. Bases that do not meet Kraken2's minimum-base-quality are turned into ambiguous bases, and that affects the classification of the read.

Hi Chloe,

Actually, that is the only difference between the two commands but, to be honest, I've filtered the raw sequences with kneaddata and the FastQC report looked good. Thus I thought it would not make any difference to include quality filtering...

I will try running it in QIIME 2 without quality filtering and see what happens.

Hi there,

I ran it again without --p-minimum-base-quality 20, and the two results are the same, as expected.

Now I wonder why this happens if I've already cleaned and filtered my raw sequences. Does anyone know the reason?

kind regards
Carlo

Hi @Carlo77,

To properly explain what is happening here, we need to discuss 2 things:

  1. How does Kraken2 deal with bases that are below the quality cutoff set by the --p-minimum-base-quality 20 parameter?

Kraken2 turns any base with a quality score below the defined threshold into an ambiguous N. This effectively removes the classification signal from any minimizers/k-mers that overlap the ambiguous N. Quality filtering within Kraken2 is therefore very stringent, and using this parameter may lead to fewer classified reads. This seems bad, but in general the classifications that remain will be based on higher-quality bases.
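To make the masking effect concrete, here is a minimal Python sketch. It is not Kraken2's code: the function names, the toy read, and the toy k-mer length of 5 are all assumptions chosen for illustration (Kraken2's real k-mers are much longer, so a single N knocks out proportionally more of a read). It only shows how one low-quality base, once replaced by N, invalidates every k-mer that overlaps it.

```python
def mask_low_quality(seq, quals, min_q):
    """Replace every base whose Phred score is below min_q with 'N'."""
    return "".join(b if q >= min_q else "N" for b, q in zip(seq, quals))


def usable_kmers(seq, k):
    """Count the k-mers in seq that contain no ambiguous 'N'."""
    return sum("N" not in seq[i:i + k] for i in range(len(seq) - k + 1))


# Hypothetical 20-base read with a single base of quality 15.
seq = "ACGTACGTACGTACGTACGT"
quals = [30] * len(seq)
quals[9] = 15

k = 5  # toy value, far shorter than the k-mers Kraken2 actually uses
masked = mask_low_quality(seq, quals, min_q=20)

print(masked)                   # ACGTACGTANGTACGTACGT
print(usable_kmers(seq, k))     # 16
print(usable_kmers(masked, k))  # 11
```

In this toy case one masked base removes 5 of the 16 k-mers, which is why a handful of low-quality bases can noticeably lower the number of classified reads.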

  2. How does quality filtering deal with bases that are below its quality cutoff?

It depends on which filtering method was used under the hood. If your filter was set to a minimum base quality and either truncated the read before any base with a quality below 20 or discarded any sequence containing bases with a quality score below 20, then we would expect the Kraken2 results to be the same with or without the --p-minimum-base-quality 20 parameter.

However, most quality filtering steps are a little more lenient than what I just described. Some filters use windows: for example, they have to see a base with a quality score below 20 three times in a row before they truncate the read.

I am not sure how your filtering method does this under the hood, but I am sure that when you look into it, the filtering won't exactly parallel what Kraken2 is doing.

Let's look at an example:

A read with 2 bases with a quality score of 15 would not be filtered at all by such a windowed quality filter. But if you gave Kraken2 the --p-minimum-base-quality 20 parameter, those 2 bases would be switched to ambiguous bases, reducing your classification compared to a Kraken2 run without that filtering parameter.
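The example can be sketched in Python. This is a simplified illustration, not the algorithm of any particular trimmer: the function names, the window of 4 bases, and the mean-quality threshold of 20 are assumptions picked for the demonstration. It shows a read passing a window-based filter untouched while still containing bases that a per-base cutoff of 20 would mask.

```python
def sliding_window_trim(quals, window=4, min_mean=20):
    """Return how many leading bases are kept.

    Simplified rule (an assumption for this sketch): cut the read at the
    start of the first window whose mean quality is below min_mean.
    """
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_mean:
            return start
    return len(quals)


def count_masked(quals, min_q=20):
    """Count bases that a per-base cutoff would turn into 'N'."""
    return sum(q < min_q for q in quals)


# Hypothetical 30-base read: high quality except two adjacent Q15 bases.
quals = [38] * 30
quals[10] = 15
quals[11] = 15

kept = sliding_window_trim(quals)
masked_count = count_masked(quals[:kept])

print(kept)          # 30 -> the window filter leaves the read untouched
print(masked_count)  # 2  -> but two bases would still be masked to N
```

The worst window here averages (15 + 15 + 38 + 38) / 4 = 26.5, which is above 20, so the read survives intact even though two of its bases sit below the per-base cutoff.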

Dear Chloe,

I greatly appreciate your feedback. I have carefully looked into the kneaddata pipeline, which, among other things, runs Trimmomatic with the following parameters: SLIDINGWINDOW:4:20 MINLEN:50. This confirms your explanation. Perhaps the FastQC graph misled me a bit, but now I know that even when everything seems to have excellent quality, this doesn't rule out the presence of a few 'low' quality bases.
Thank you so much for your time.

kind regards
Carlo
