Error converting Kraken 2 report into taxonomy artifact: Length of values does not match length of index

Hi,

I am currently following the qiime2 moshpit tutorial using version 2025.4.0, which I believe was installed using conda. I am getting an error message after running the mosh annotate kraken2-to-mag-features command.

Command I ran:

mosh annotate kraken2-to-mag-features \
--i-reports cache:kraken_reports_mags_derep_eukrayota \
--i-outputs cache:kraken_hits_mags_derep_eukaryota \
--o-taxonomy cache:taxonomy_mags_derep_eukaryota \
--use-cache cache \
--verbose

Error:

ValueError: Length of values (0) does not match length of index (28)
Plugin error from annotate:
Length of values (0) does not match length of index (28)

I’m not sure how to fix the issue. What does the length of values (which is appearing as 0) refer to? Any help would be appreciated, thanks!

Hello @Cindy,

It looks like this may be an edge case where some piece of data which was not expected to be empty is ending up empty. Are you comfortable sharing your inputs to this command (the kraken2 reports and outputs) so I can try to recreate the issue?
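On what the message itself means: the wording matches the error pandas raises when a list of values is assigned to a table whose index has a different number of labels. Below is a minimal sketch of that situation. It is not the plugin's code; the column name "Taxon" and the feature_N labels are hypothetical stand-ins chosen for illustration.

```python
import pandas as pd

# 28 row labels standing in for the index (hypothetical names)
index = [f"feature_{i}" for i in range(28)]
table = pd.DataFrame(index=index)

# A step that was expected to yield one value per label produced nothing
values = []

message = None
try:
    # pandas requires len(values) == len(table.index) for list-like values
    table["Taxon"] = values
except ValueError as err:
    message = str(err)
    print(message)
```

In other words, "length of values (0)" would be the number of entries that were computed, and "length of index (28)" the number of rows they were supposed to fill.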

Thanks,
Colin

Hi @colinvwood ,

Thank you for your response! I got the same error when using the tutorial data. Attached are the kraken2 reports from the tutorial data.

60443d6c-1d7a-4046-8834-39c70a646ec3.report (1).txt (58.1 KB)

79c9a156-a66b-44d0-b59a-427d1e0cb7ab.report.txt (17.1 KB)

It won’t let me post more than two links, so here are the other two kraken2 reports.

a69c066d-62fd-473d-843a-de60e584b514.report.txt (6.0 KB)

dd293d93-953d-4ac2-a292-6b367b6ccadd.report.txt (116.4 KB)

And the last two kraken hit outputs. Sorry for the many separate comments.

60443d6c-1d7a-4046-8834-39c70a646ec3.output.txt (679.9 KB)

79c9a156-a66b-44d0-b59a-427d1e0cb7ab.output.txt (958.3 KB)

Two of the kraken hits outputs:

a69c066d-62fd-473d-843a-de60e584b514.output.txt (1.1 MB)

dd293d93-953d-4ac2-a292-6b367b6ccadd.output.txt (2.4 MB)

Hello @Cindy,

Which tutorial are you referencing? And which pairs of the reports and outputs that you posted give you this issue?

Thanks,
Colin

Hello @colinvwood ,

I am referencing the cocoa fermentation tutorial (Cocoa fermentation - MOSHPIT documentation). I guess all pairs give the issue? The keys I provided as inputs in the command (e.g. kraken_reports_mags_derep_eukaryota) encompass the data for all four of the samples.

Hello @Cindy,

It looks like the software that finds the lowest common ancestor (LCA) among classifications for each MAG has a bug that does not allow it to properly handle the case when the LCA is the root of the taxonomy. We will work on a fix for this and let you know once it's done.

It sounds like you followed the tutorial you referenced exactly and with the tutorial-provided data, is that correct? Just checking so I can let the author(s) of the tutorial know that this step is broken.

Thank you,
Colin

Thanks @colinvwood! Yes, I used the data provided with the cocoa fermentation tutorial. I followed the tutorial, but as it turns out, not exactly: I had used the eukaryote lineage BUSCO dataset, while the tutorial uses the bacteria lineage for the BUSCO bin evaluation step. After switching to the bacteria lineage at that step and proceeding with those outputs, I can now get past the mosh annotate kraken2-to-mag-features step where I initially got the error.

Hello @Cindy,

That makes sense. Since the dataset contained bacterial sequences, using a eukaryotic lineage dataset to select MAGs left you with only low-quality (maybe spurious) eukaryotic MAGs. These were then classified with classify-kraken2 and given low-quality classifications that disagreed all the way up to the domain level. And since all four MAGs showed such disagreement, the edge case that caused the error you saw surfaced.
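To illustrate the edge case, here is a rough sketch of an LCA computed as the longest shared prefix of ranks. This is not the actual implementation in the annotate plugin; the function names, the lineages, and the "root" placeholder are all assumptions made for the example.

```python
def lowest_common_ancestor(lineages):
    """Return the ranks shared by every lineage, from the domain downward."""
    shared = []
    # zip walks the lineages rank by rank and stops at the shortest one
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break  # first rank where the classifications disagree
        shared.append(ranks[0])
    return shared


def format_lineage(lca):
    """Hypothetical helper: give an empty LCA an explicit placeholder."""
    return ";".join(lca) if lca else "root"


# Hypothetical classifications within a single MAG
agreeing = [
    ["d__Bacteria", "p__Firmicutes", "c__Bacilli"],
    ["d__Bacteria", "p__Firmicutes", "c__Clostridia"],
]
disagreeing = [
    ["d__Bacteria", "p__Firmicutes"],
    ["d__Eukaryota", "p__Ascomycota"],
]

lca_agree = lowest_common_ancestor(agreeing)    # shared down to the phylum
lca_root = lowest_common_ancestor(disagreeing)  # nothing shared: the root

print(format_lineage(lca_agree))
print(format_lineage(lca_root))
```

When classifications disagree at the domain level, the shared prefix is empty. Code that assumes at least one shared rank per MAG can then end up with zero values where it expected one per row, unless the empty case is handled explicitly as in format_lineage.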

We created a fix for this issue here. That fix will then eventually be available in the next release.

Thanks,
Colin
