In my opinion the two methods should give the same results, but the difference between the classified and unclassified counts is quite large. The FASTQ files were prefiltered with KneadData, so the minimum base quality setting should not make a difference. Do you have any suggestion about the reason for this difference?
I also quickly inspected the two reports: the taxonomy looks very much the same, but the number of hits changes.
I was also wondering if there is a way to import the Kraken2 report I've already generated into QIIME 2, in order to apply qiime moshpit estimate-bracken.
Kind regards
Carlo
Hi @Carlo77,
I just want to make sure that you ran the exact same command. Did you give Kraken2 a minimum base quality? I see it in your qiime moshpit classify-kraken2 command but not in your Kraken2 command.
If you didn't give Kraken2 a minimum-base-quality filter, that could definitely explain the discrepancy. Bases that do not meet Kraken2's minimum base quality are turned into ambiguous bases, and that affects the classification of the read.
Actually, that is the only difference between the two commands but, to be honest, I filtered the raw sequences with KneadData and checked them with FastQC. Thus I thought it would not make any difference to include a quality filter...
To properly explain what is happening here, we need to discuss 2 things:
How does Kraken2 deal with bases that are below the quality cutoff set by the --p-minimum-base-quality 20 parameter?
Kraken2 turns any base with a quality score below the defined threshold into an ambiguous N. This effectively knocks out every minimizer/k-mer that overlaps that N, so those k-mers can no longer contribute to classifying the read. Quality filtering within Kraken2 is therefore very stringent, and using this parameter may lead to fewer classified reads. That sounds bad, but in general the classifications that remain are based on higher-quality bases.
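To give a feel for how much a single N costs, here is a minimal Python sketch. It is a simplification under stated assumptions: it counts plain k-mers with k=35 (Kraken2's default k-mer length), whereas real Kraken2 works with minimizers and spaced seeds, and the helper names `mask_low_quality` and `usable_kmers` are hypothetical, not part of Kraken2 or q2-moshpit.

```python
def mask_low_quality(seq: str, quals: list[int], min_q: int) -> str:
    """Replace every base whose Phred score is below min_q with 'N'.

    Simplified stand-in for a minimum-base-quality setting; not Kraken2 code.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    return "".join(b if q >= min_q else "N" for b, q in zip(seq, quals))


def usable_kmers(seq: str, k: int) -> int:
    """Count k-mers that contain no ambiguous 'N' base."""
    return sum("N" not in seq[i:i + k] for i in range(len(seq) - k + 1))


# Hypothetical 150 bp read, all Q35 except two isolated Q15 bases.
seq = "ACGT" * 37 + "AC"
quals = [35] * 150
quals[40] = 15
quals[100] = 15

masked = mask_low_quality(seq, quals, 20)
print(usable_kmers(seq, 35))     # 116
print(usable_kmers(masked, 35))  # 46
```

In this toy case, each masked base invalidates every 35-mer that overlaps it (35 per base here), so just two low-quality bases remove 70 of the 116 k-mers that could otherwise have voted on the read's classification.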
How do quality filters deal with bases that are below their quality cutoff?
It depends on which filtering methods are used under the hood. If your filter used a minimum base quality and either truncated the read before any base with a quality below 20 or threw away any sequence containing bases with a quality score below 20, then we would expect Kraken2 to give the same results with or without the --p-minimum-base-quality 20 parameter.
However, most quality filtering steps are a little more lenient than what I just described. Some filters use windows: for example, a filter might only truncate the read after it sees a base with a quality score below 20 three times in a row.
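The difference between strict and windowed filtering can be sketched in a few lines of Python. This is only an illustration under assumptions: `strict_truncate` and `sliding_window_trim` are hypothetical helpers, the windowed version uses the average quality of a 4-base window against Q20 (one common windowed rule), and neither is a reimplementation of any specific tool.

```python
def strict_truncate(quals: list[int], min_q: int = 20) -> int:
    """Keep bases up to (not including) the first base below min_q."""
    for i, q in enumerate(quals):
        if q < min_q:
            return i
    return len(quals)


def sliding_window_trim(quals: list[int], window: int = 4,
                        min_avg: float = 20.0) -> int:
    """Scan windows from the 5' end and cut at the start of the first
    window whose mean quality drops below min_avg.

    Simplified windowed rule; real trimmers add extra details.
    """
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_avg:
            return start
    return len(quals)


# One isolated Q15 base: the strict rule cuts, the windowed rule does not.
single_dip = [35] * 20 + [15] + [35] * 20
print(strict_truncate(single_dip))      # 20
print(sliding_window_trim(single_dip))  # 41 (whole read kept)

# Three Q10 bases in a row: now the windowed rule also cuts.
low_run = [35] * 20 + [10] * 3 + [35] * 20
print(sliding_window_trim(low_run))     # 19
```

Because the window averages the isolated Q15 base with its high-quality neighbours, the read survives intact, while a per-base rule (like Kraken2's masking) would still treat that base as low quality.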
I am not sure how your filtering method does this under the hood, but I am sure that when you look into it, the filtering won't exactly parallel what Kraken2 is doing.
Let's look at an example:
A read with 2 isolated bases with a quality score of 15 would not be filtered at all by such a windowed quality filter. But if you gave Kraken2 the --p-minimum-base-quality 20 parameter, those 2 bases would be switched to ambiguous bases, reducing your classification compared to a Kraken2 run without that filtering parameter.
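The example above can be checked numerically with a short Python snippet. It assumes a hypothetical 32 bp read at Q35 with two isolated Q15 bases, and a windowed filter that keeps the read when every 4-base window averages at least Q20; it then counts how many bases a per-base Q20 cutoff (as with Kraken2's minimum base quality) would turn into N.

```python
# Hypothetical 32 bp read: all Q35 except two isolated Q15 bases.
quals = [35] * 32
quals[10] = 15
quals[21] = 15

WINDOW, THRESHOLD = 4, 20

# Windowed check: the read passes if no 4-base window averages below Q20.
window_means = [sum(quals[i:i + WINDOW]) / WINDOW
                for i in range(len(quals) - WINDOW + 1)]
passes_window_filter = min(window_means) >= THRESHOLD

# Per-base check: every base below Q20 would be masked to N.
bases_masked_per_base = sum(q < THRESHOLD for q in quals)

print(passes_window_filter)   # True
print(bases_masked_per_base)  # 2
```

The worst window still averages Q30, so the read passes the windowed filter untouched, yet a per-base Q20 cutoff masks two of its bases.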
I greatly appreciate your feedback. I have carefully looked into the KneadData pipeline, which, among other things, runs Trimmomatic with the following parameters: SLIDINGWINDOW:4:20 MINLEN:50. This confirms your explanation. Perhaps the FastQC graph misled me a bit, but now I know that even when everything seems to have excellent quality, that doesn't rule out the presence of a few bases with 'low' quality.
Thank you so much for your time.