Error when using qiime taxa barplot after fragment-insertion

Hi everyone,

I used fragment-insertion sepp to insert some sequences into a reference tree. After that, I used fragment-insertion filter-features to filter my feature table and fragment-insertion classify-otus-experimental to do the taxonomic classification.

I got filtered_table.qza and taxonomy.qza from the steps above, and then tried to use qiime taxa barplot to generate a barplot, but I got an error.

The STDERR output is:

Plugin error from taxa:
'float' object has no attribute 'split'
Debug info has been saved to /tmp/qiime2-q2cli-err-hgh9uksi.log

I converted taxonomy.qza to a TSV file and found that it doesn't have a "Confidence" column, unlike the taxonomy files produced by qiime feature-classifier classify-sklearn. So my questions are:

  1. Does the lack of a "Confidence" column cause this error?
  2. If yes, how can I fix it? If no, what causes this error?

I also attached the error file and .qza files I used. Hope someone can help me! Thank you all.

Best,
Julio

qiime2-q2cli-err-hgh9uksi.txt (2.3 KB)
filtered_table.qza (2.7 MB)
taxonomy.qza (2.3 MB)

Hello @Julio,

The issue is that some of the features in your taxonomy have no classification. These get encoded as missing values in Python, and missing values have type float, as you see in the error. The empty features are:

['4682b09297eb176b7b56c8b4d93ca220', '4eed6cfc74e8e86b7ff877ff1ad4034b', '8444dddefa8240c4296f0bd0ed5deb61', 'aed4f784049d42add23b04bb8ec5d6e1']
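As a minimal sketch of why an empty classification leads to this exact message, the plain-Python snippet below finds rows with an empty Taxon field and then shows that a float NaN has no .split method. The two-row TSV and its feature IDs are made up for illustration, and the assumption that the plugin calls .split on the taxon value is inferred from the error text, not taken from the q2-taxa source.

```python
import csv
import io

# Hypothetical two-column taxonomy export; "feat2" has no classification.
tsv_text = "Feature ID\tTaxon\nfeat1\td__Bacteria; p__Firmicutes\nfeat2\t\n"

rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))

# Collect features whose Taxon field is missing, empty, or only whitespace.
empty_ids = [r["Feature ID"] for r in rows if not (r["Taxon"] or "").strip()]
print(empty_ids)

# A missing value is commonly represented as NaN, which is a float,
# and floats have no .split method.
missing = float("nan")
try:
    missing.split(";")
except AttributeError as err:
    error_message = str(err)
    print(error_message)
```

The same scan over an exported taxonomy TSV is one way to list the affected feature IDs before filtering them.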

We generally assume in QIIME 2 that taxonomy files do not have missing classifications: if a classifier is unsure about the classification of a feature, it just gives the root taxonomic node that the classifier is aware of. Thus, there is probably a bug, or at least a lack of awareness of this precedent, in the fragment-insertion classify-otus-experimental action.

In the meantime, before this behavior is fixed, I think you can work around this problem by using the feature-table filter-features action to filter out the problematic features that I listed above and then reclassifying. Let us know if this works.
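One possible way to prepare that filtering step is sketched below: it writes the four IDs listed above into a small metadata file that can be used to exclude them. The file names are hypothetical, and the command in the trailing comment is an untested assumption about how the file would be passed to filter-features (in particular, that a metadata file containing only an ID column is accepted).

```python
import csv
import io

# The four feature IDs listed above as having no classification.
empty_ids = [
    "4682b09297eb176b7b56c8b4d93ca220",
    "4eed6cfc74e8e86b7ff877ff1ad4034b",
    "8444dddefa8240c4296f0bd0ed5deb61",
    "aed4f784049d42add23b04bb8ec5d6e1",
]


def build_exclusion_metadata(feature_ids):
    """Return TSV text with a 'feature-id' header and one ID per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["feature-id"])
    for fid in feature_ids:
        writer.writerow([fid])
    return buf.getvalue()


metadata_text = build_exclusion_metadata(empty_ids)

# Hypothetical output file name.
with open("features-to-exclude.tsv", "w") as fh:
    fh.write(metadata_text)

# The file could then be used roughly like this (assumed, not tested here):
#   qiime feature-table filter-features \
#     --i-table filtered_table.qza \
#     --m-metadata-file features-to-exclude.tsv \
#     --p-exclude-ids \
#     --o-filtered-table table_no_empty.qza
```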

Colin

Hi @colinvwood,

Thanks for your reply, it is really helpful!

I tried this and it worked. Besides that, I am thinking of manually editing the missing values.

First, I tried qiime rescript edit-taxonomy, hoping that it could edit the .qza file directly. But it didn't work, maybe because it cannot search for empty strings.

Then I wrote a simple shell script (it relies on q2cli) to fix this. In short, taxonomy.qza is converted to a .tsv file, the missing classifications are detected and fixed, and the .tsv file is converted back to taxonomy_fixed.qza. I used this fixed file to generate a barplot with qiime taxa barplot, and it worked. I have attached the shell script (only .txt files are allowed here, but you can rename it) and its usage below. I hope it will be helpful.

Best,
Julio

Usage:
        qiime_fg_taxo_fix.sh [-o OUTPUT_DIR] -r ROOT -i INPUT_FILE

Options:
        -r      Root of the reference taxonomy. e.g. 'd__Bacteria' or
                'd__Archaea'.
        -i      Taxonomy file to be fixed. QIIME2 format or TSV format. For
                QIIME2, it should be a .qza file getting from 'qiime
                fragment-insertion classify-otus-experimental'. For TSV, it
                should have two columns: 'Feature ID' and 'Taxon'.
        -o      Output directory for the results. Default is same as input.
        -h      Show this help message and exit.

qiime_fg_taxo_fix.txt (2.9 KB)
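For readers who prefer Python, the core "detect and fix" step can be sketched as follows. This is not the attached script, only a rough illustration of the same idea on an already exported TSV with 'Feature ID' and 'Taxon' columns; the function name and the example rows are hypothetical, and the export and import steps around it are left out.

```python
import csv
import io


def fill_missing_taxa(tsv_text, root):
    """Replace empty Taxon fields with `root`.

    Returns the fixed TSV text and the list of feature IDs that were changed.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    fixed_ids = []
    for row in reader:
        # A missing field is None, an empty one is ""; treat both as missing.
        if not (row["Taxon"] or "").strip():
            row["Taxon"] = root
            fixed_ids.append(row["Feature ID"])
        writer.writerow(row)
    return out.getvalue(), fixed_ids


# Made-up example input: "feat2" has no classification.
example = "Feature ID\tTaxon\nfeat1\td__Bacteria; p__Firmicutes\nfeat2\t\n"
fixed_text, fixed_ids = fill_missing_taxa(example, "d__Bacteria")
print(fixed_text)
```

Assigning the root label mirrors the convention described earlier in the thread, where an unsure classifier reports the root taxonomic node instead of leaving the field empty.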

Hello @Julio,

I'm glad you were able to resolve the issue. Thank you for taking the time to create that workaround script and share it! The only drawback of exporting, editing, and re-importing the taxonomy is that the provenance of the original artifact will be lost. But it's a good workaround until we have the issue fixed.

Hi @Julio,

Thanks for providing that workaround!

This issue has made me think we should consider adding a RESCRIPt action to replace taxonomy strings based on the feature IDs, partly to handle cases like this and partly to provide a simple way to update the taxonomy of specific IDs.

Just a quick aside... edit-taxonomy won't work here because there needs to be a string, even an empty string, for it to work. Right now your taxonomy file has no string after the ID for the unassigned taxa, not even an empty string. If it did, then simply using a regex like ^$ to find the empty string to replace would work. You could simply add a tab and a space character after the ID, then import, and then edit-taxonomy would work. :slight_smile:

-Cheers!

Hi @SoilRotifer,

Thanks for your reply!

Actually, it does have a tab after the IDs without a classification, but no space. I tried ^$ as the search string, but it didn't work. I am curious about how --p-search-strings matches patterns: can it match a tab as \t, like sed? Then maybe I could search for \t$ and replace it with \td__Bacteria.

Best,
Julio

Like I said, you need to have a tab followed by a space. When I edited your taxonomy file to add the space, I was able to replace these entries with any string I chose.

I am wondering if this part of the code is the culprit.

That is, if I explicitly run like so, I can use '' as a key:

In [30]: d = {"" : "d__Bact"}
In [31]: d
Out[31]: {'': 'd__Bact'}
In [32]: d['']
Out[32]: 'd__Bact'

But if I run the following, this does not work

In [33]: d2 = dict(zip('', 'd__Bact'))
In [34]: d2
Out[34]: {} # the '' is invalid?

Unless I convert the empty string '' to a string with a space ' ':

In [35]: d3 = dict(zip(' ', 'd__Bact'))
In [36]: d3
Out[36]: {' ': 'd'} # zip pairs single characters, so ' ' is paired with 'd' only
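A note on the snippets above: zip() iterates over a bare string one character at a time, so an empty string contributes no items at all, while a one-character string is paired with only the first character of the other string. The sketch below contrasts that with zipping lists of strings, where '' is a perfectly valid key. Whether the RESCRIPt code receives lists or bare strings at that point is not shown in this thread, so the variable names here are hypothetical.

```python
# zip() walks each bare string one character at a time.
pairs_from_empty = dict(zip("", "d__Bact"))   # no characters, so no pairs
pairs_from_space = dict(zip(" ", "d__Bact"))  # " " is paired with "d" only

# Wrapping the strings in lists pairs them whole, so "" works as a key.
search_strings = [""]
replacement_strings = ["d__Bact"]
pairs_from_lists = dict(zip(search_strings, replacement_strings))

print(pairs_from_empty)
print(pairs_from_space)
print(pairs_from_lists)
```

If the search and replacement values arrive as lists, the empty dictionary seen in the interactive session would point to the string-versus-list distinction and not to '' being an invalid key.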

There might be something else at play... but this is what I've been able to narrow it down to. Since the file you are trying to use was made incorrectly, i.e. with no taxonomy label, you'd normally not see this. In fact, the other tools one would try to use to edit or filter the taxonomy file won't work either, not even our qiime rescript filter-taxa, because there is no taxonomy. I might consider adding some code that looks for a missing string value, or takes a '' string and converts it to a ' ' string, so that empty or whitespace string searches like ^\s*$ will work.
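As a small illustration of the idea of normalizing missing values before searching, the sketch below uses Python's re module to show that ^\s*$ matches empty and whitespace-only strings, and that a NaN has to be turned into a string first. This is only a sketch under the assumption that missing taxa arrive as float NaN; it is not RESCRIPt code, and all names in it are made up.

```python
import re

# Matches strings that are empty or contain only whitespace.
empty_or_blank = re.compile(r"^\s*$")

candidates = ["", " ", "\t", "d__Bacteria"]
matches = [bool(empty_or_blank.search(s)) for s in candidates]
print(matches)

# A missing value (NaN) is not a string, so convert it before searching.
missing = float("nan")
as_text = "" if missing != missing else str(missing)  # NaN != NaN is True
replaced = empty_or_blank.sub("d__Bacteria", as_text)
print(replaced)
```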

I am curious as to how it is possible that a taxonomy file could even be written without a taxonomy string. Makes me think we need to update the test code to validate the taxonomy output. :thinking:

Thanks for your explanation! In my experience, the empty classifications have only occurred when using fragment-insertion classify-otus-experimental. For now, I will use the shell script to solve this problem.

Hi @Julio, thanks for reporting this issue! I am not yet clear about what is actually going wrong here. To debug this properly, I'd also need your insertion tree artifact as well as the reference_taxonomy artifact. I can deduce the representative_sequences artifact. Basically, I need all three inputs of the classify-otus-experimental method.

Could you post them as well?
Many thanks
Stefan
