Unicode error with bibliography file import in 2020.11

Thanks! Looking forward to playing with this update! (our 2020.2 env broke during a server migration, as good an excuse to update as ever!)

I'm running QIIME2 in a jupyter notebook. I was having some issues with two modules specifically on import - seems to be a Unicode Decode Error with the citations.bib file?

from qiime2.plugins import feature_table

ValueError: There was a problem loading the BiBTex file:'/export/data1/sw/tag_conda/envs/qiime2-2020.11/lib/python3.6/site-packages/q2_feature_table/citations.bib'

from qiime2.plugins.metadata.visualizers import tabulate

ValueError: There was a problem loading the BiBTex file:'/export/data1/sw/tag_conda/envs/qiime2-2020.11/lib/python3.6/site-packages/q2_feature_table/citations.bib'

Interestingly, I get the same error with python functions calling q2:

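# Imports (assumed to have been run in an earlier notebook cell)
import glob
import pandas as pd
import qiime2 as q2

stats_filenamelist = glob.glob('*stats.qza')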

def read_dada_denoise_stats(fn):
    table = q2.Artifact.load(fn)
    df = table.view(q2.Metadata).to_dataframe()
    df['seq_run'] = fn[0]
    df['f_trunc'] = fn[fn.find('-')+1:fn.find('_')]
    df['r_trunc'] = fn[fn.find('_')+1:fn.rfind('-')]
    df.index.names = ['sampleid']
    return df
        
stats_df = pd.concat([read_dada_denoise_stats(fn) for fn in stats_filenamelist])


print('Number of rows:', len(stats_df), '\nUnique sequence runs:', stats_df['seq_run'].unique())

Error: I've copied the full traceback only for the Python chunk above; for the import statements I've copied just the ValueError.

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
/export/data1/sw/tag_conda/envs/qiime2-2020.11/lib/python3.6/site-packages/qiime2/core/cite.py in load(cls, path, package)
     31             try:
---> 32                 db = bp.load(fh, parser=parser)
     33             except Exception as e:

/export/data1/sw/tag_conda/envs/qiime2-2020.11/lib/python3.6/site-packages/bibtexparser/__init__.py in load(bibtex_file, parser)
     70         parser = bparser.BibTexParser()
---> 71     return parser.parse_file(bibtex_file)
     72 

/export/data1/sw/tag_conda/envs/qiime2-2020.11/lib/python3.6/site-packages/bibtexparser/bparser.py in parse_file(self, file, partial)
    176         """
--> 177         return self.parse(file.read(), partial=partial)
    178 

/export/data1/sw/tag_conda/envs/qiime2-2020.11/lib/python3.6/encodings/ascii.py in decode(self, input, final)
     25     def decode(self, input, final=False):
---> 26         return codecs.ascii_decode(input, self.errors)[0]
     27 

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 804: ordinal not in range(128)

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
<ipython-input-27-bc8ff275b40c> in <module>
     10     return df
     11 
---> 12 stats_df = pd.concat([read_dada_denoise_stats(fn) for fn in stats_filenamelist])
     13 
     14 

<ipython-input-27-bc8ff275b40c> in <listcomp>(.0)
     10     return df
     11 
---> 12 stats_df = pd.concat([read_dada_denoise_stats(fn) for fn in stats_filenamelist])
     13 
     14 

<ipython-input-27-bc8ff275b40c> in read_dada_denoise_stats(fn)
      2 
      3 def read_dada_denoise_stats(fn):
----> 4     table = q2.Artifact.load(fn)
      5     df = table.view(q2.Metadata).to_dataframe()
      6     df['seq_run'] = fn[0]

/export/data1/sw/tag_conda/envs/qiime2-2020.11/lib/python3.6/site-packages/qiime2/sdk/result.py in load(cls, filepath)
     64     def load(cls, filepath):
     65         """Factory for loading Artifacts and Visualizations."""
---> 66         archiver = archive.Archiver.load(filepath)
     67 
     68         if Artifact._is_valid_type(archiver.type):

/export/data1/sw/tag_conda/envs/qiime2-2020.11/lib/python3.6/site-packages/qiime2/core/archive/archiver.py in load(cls, filepath)
    305         rec = archive.mount(path)
    306 
--> 307         return cls(path, Format(rec))
    308 
    309     @classmethod

/export/data1/sw/tag_conda/envs/qiime2-2020.11/lib/python3.6/site-packages/qiime2/core/archive/format/v1.py in __init__(self, archive_record)
     27 
     28     def __init__(self, archive_record):
---> 29         super().__init__(archive_record)
     30 
     31         self.provenance_dir = archive_record.root / self.PROVENANCE_DIR

/export/data1/sw/tag_conda/envs/qiime2-2020.11/lib/python3.6/site-packages/qiime2/core/archive/format/v0.py in __init__(self, archive_record)
     71         self.uuid = _uuid.UUID(uuid)
     72         self.type = sdk.parse_type(type)
---> 73         self.format = sdk.parse_format(format)
     74 
     75         self.path = path

/export/data1/sw/tag_conda/envs/qiime2-2020.11/lib/python3.6/site-packages/qiime2/sdk/util.py in parse_format(format_str)
     86         return None
     87 
---> 88     pm = qiime2.sdk.PluginManager()
     89     try:
     90         format_record = pm.formats[format_str]

/export/data1/sw/tag_conda/envs/qiime2-2020.11/lib/python3.6/site-packages/qiime2/sdk/plugin_manager.py in __new__(cls, add_plugins)
     52         if cls.__instance is None:
     53             self = super().__new__(cls)
---> 54             self._init(add_plugins=add_plugins)
     55             cls.__instance = self
     56         else:

/export/data1/sw/tag_conda/envs/qiime2-2020.11/lib/python3.6/site-packages/qiime2/sdk/plugin_manager.py in _init(self, add_plugins)
     79                 project_name = entry_point.dist.project_name
     80                 package = entry_point.module_name.split('.')[0]
---> 81                 plugin = entry_point.load()
     82 
     83                 self.add_plugin(plugin, package, project_name)

/export/data1/sw/tag_conda/envs/qiime2-2020.11/lib/python3.6/site-packages/pkg_resources/__init__.py in load(self, require, *args, **kwargs)
   2470         if require:
   2471             self.require(*args, **kwargs)
-> 2472         return self.resolve()
   2473 
   2474     def resolve(self):

/export/data1/sw/tag_conda/envs/qiime2-2020.11/lib/python3.6/site-packages/pkg_resources/__init__.py in resolve(self)
   2476         Resolve the entry point from its module and attrs.
   2477         """
-> 2478         module = __import__(self.module_name, fromlist=['__name__'], level=0)
   2479         try:
   2480             return functools.reduce(getattr, self.attrs, module)

/export/data1/sw/tag_conda/envs/qiime2-2020.11/lib/python3.6/site-packages/q2_feature_table/plugin_setup.py in <module>
     19                        feature_table_merge_three_tables_example)
     20 
---> 21 citations = Citations.load('citations.bib', package='q2_feature_table')
     22 plugin = Plugin(
     23     name='feature-table',

/export/data1/sw/tag_conda/envs/qiime2-2020.11/lib/python3.6/site-packages/qiime2/core/cite.py in load(cls, path, package)
     33             except Exception as e:
     34                 raise ValueError("There was a problem loading the BiBTex file:"
---> 35                                  "%r" % path) from e
     36 
     37         entries = collections.OrderedDict()

ValueError: There was a problem loading the BiBTex file:'/export/data1/sw/tag_conda/envs/qiime2-2020.11/lib/python3.6/site-packages/q2_feature_table/citations.bib'

Thanks!

Hi @hsapers,

Not an answer, but I moved this into a separate topic to make it easier to find and explore. Have you added any additional plugins?

Best,
Justine

Thanks - no additional plug-ins - just ran

wget https://data.qiime2.org/distro/core/qiime2-2020.11-py36-linux-conda.yml
conda env create -n qiime2-2020.11 --file qiime2-2020.11-py36-linux-conda.yml
# OPTIONAL CLEANUP
rm qiime2-2020.11-py36-linux-conda.yml

and

 jupyter serverextension enable --py qiime2 --sys-prefix

I do have several other Python packages in the env for JupyterLab and visualizations.

Don't know if that helps here but I had similar problems in the past and they were due to the wrong locale on a server. So it might be worth a shot to precede the qiime commands with export LC_ALL=en_US.UTF-8 and see if that works.
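
One quick way to see what the Python process itself thinks the locale is, before changing anything, is to ask the standard library. This is only a minimal sketch; the helper name describe_locale is made up for illustration and is not part of QIIME 2:

```python
import locale
import os
import sys


def describe_locale():
    """Collect the settings that decide how Python decodes text files by default."""
    return {
        # Environment variables the C library consults for the locale
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
        "LANG": os.environ.get("LANG"),
        # Encoding that open() falls back to when no encoding= argument is given
        "preferred_encoding": locale.getpreferredencoding(False),
        # Encoding used for file names
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for key, value in describe_locale().items():
        print(f"{key}: {value}")
```

If preferred_encoding comes back as something like ANSI_X3.4-1968 or US-ASCII rather than UTF-8, the process is most likely running under an ASCII (C/POSIX) locale. Running this both in a terminal and in a notebook cell should show whether the two environments differ.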


Thanks @cdiener, that didn't seem to help here. I'm wondering if there is something specific to the Jupyter install.

actually - this fixes the problem some of the time - looking for a pattern....

With simple commands in Jupyter, I can prefix them like this (@cdiener's response) and they run in the notebook fine:

!export LC_ALL=C.UTF-8; qiime metadata tabulate \
--m-input-file subset_metadata.tsv \
--o-visualization tabulated-sample-metadata.qzv

But for import statements and any more complicated command or function that calls q2, I still get the ValueError:

ValueError: There was a problem loading the BiBTex file:'/export/data1/sw/tag_conda/envs/qiime2-2020.11/lib/python3.6/site-packages/q2_feature_table/citations.bib'

and it always seems to be related to q2_feature_table/citations.bib independent of the original call. Seems specific to the Jupyter API.

If you set the locale at the system level rather than the command level, you should be all sorted out. Ultimately, you will need to ensure that you have a UTF-8 locale set for all QIIME 2 commands, and setting it at the system or user level is the easiest way to do that.

You can do this by adding that "export LC_ALL..." statement to your shell's rc or config file. Don't forget to log out and log back in, and relaunch jupyter.
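
For anyone wondering why the export has to happen before Jupyter is launched: a running process keeps the environment it started with, and an export inside a child shell disappears when that shell exits. A small sketch of that behaviour, assuming a POSIX shell is available (nothing here is QIIME 2 specific):

```python
import os
import subprocess

# Remember what this (parent) process has before the experiment
before = os.environ.get("LC_ALL")

# An export inside a child shell only lives as long as that shell,
# much like a "!export ...; command" line in a notebook cell
child = subprocess.run(
    "export LC_ALL=C.UTF-8; echo $LC_ALL",
    shell=True,
    stdout=subprocess.PIPE,
    universal_newlines=True,
)
print("inside child shell:", child.stdout.strip())

# The parent process (think: the notebook kernel) is unchanged
after = os.environ.get("LC_ALL")
print("parent before/after:", before, after)
```

The child sees the exported value, while the parent's LC_ALL is the same before and after. That would explain why a shell command prefixed with the export works while import statements in the kernel do not.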

Thanks @thermokarst!

I can certainly try setting the locale at the system level, I'm just not sure why this is an issue with v2020.11. Right before updating, 2020.2 did not seem to have any issues, and nothing has changed on the server that I'm aware of. Does the locale need to be explicitly exported for a change between QIIME2 versions?

No - UTF-8 has always been a requirement for QIIME 2. My guess is that something changed somewhere on your system. Please note, this isn't really specific to QIIME 2 - this is actually mostly driven by the underlying dependencies of QIIME 2. The specific error you are seeing here is with a third party library, not being able to read the bibliography file because there are non-ASCII characters in it, and your system hasn't communicated to this library how it should be reading the non-ASCII characters (this is why you will need to set the locale). Hope that helps!
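
To make that concrete, here is a small sketch that reproduces the same kind of failure without QIIME 2 or bibtexparser. The file contents and the micro sign are only an example; I'm not claiming that is the character in the real citations.bib:

```python
import os
import tempfile

# A tiny stand-in for a .bib file containing one non-ASCII character.
# "\u00b5" (micro sign) is encoded in UTF-8 as the two bytes 0xc2 0xb5.
content = "@article{example, title = {Sizes in \u00b5m}}\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.bib")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(content)

    # Reading as ASCII mimics what happens under a C/POSIX locale
    ascii_error = None
    try:
        with open(path, encoding="ascii") as fh:
            fh.read()
    except UnicodeDecodeError as e:
        ascii_error = e
        print("ascii failed:", e)

    # Reading the same bytes as UTF-8 succeeds
    with open(path, encoding="utf-8") as fh:
        decoded = fh.read()
    print("utf-8 ok:", decoded.strip())
```

The file on disk is identical in both reads; only the codec used to interpret the bytes changes, and that is what the locale controls when a library opens a file without naming an encoding.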

Thanks - I figured it's likely something with Jupyter lab, since QIIME 2 commands run fine on the command line. I'm going to try to isolate it. I just wasn't sure if it was something specific to the Python API, but it sounds like this hasn't changed. Will try a new isolated environment.

Hello, I'm jumping in to say that I have always had this problem with QIIME (1 and 2) and a few other analysis programs too. It is due to the way our university computers are set up. I now put the following 2 lines at the top of my command line scripts and it has mostly solved the problem. I must use both lines though! I'm not going to guarantee the call is written exactly correctly for you. I found this workaround several years ago on a forum and I had to play with the way utf-8 is written (caps vs lower case) and a few other things to get it to work correctly on my system. Good luck.

export LC_ALL=en_US.utf-8
export LANG=en_US.utf-8
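
Since the accepted spelling varies between systems, a rough way to find out which names a given machine understands is to let Python try them. This is just a sketch: the candidate list and the function name usable_locales are my own choices, not anything official:

```python
import locale

# Spellings to try; which ones exist depends on what is installed on the system
CANDIDATES = ["en_US.UTF-8", "en_US.utf8", "en_US.utf-8", "C.UTF-8", "C"]


def usable_locales(candidates=CANDIDATES):
    """Return the candidate names that setlocale() accepts on this machine."""
    # Calling setlocale() without a value only queries the current setting
    original = locale.setlocale(locale.LC_ALL)
    accepted = []
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_ALL, name)
                accepted.append(name)
            except locale.Error:
                pass  # not installed, or not spelled the way this system expects
    finally:
        locale.setlocale(locale.LC_ALL, original)  # put things back
    return accepted


if __name__ == "__main__":
    print(usable_locales())
```

Any name in the returned list should be a reasonable value to try in the export lines. On Linux, running locale -a in a terminal gives the full list of installed locales.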

Yeah, I agree with @Paige_M_Miller, I had the same problem with our computer system. I'm still using those two lines.

Thanks everyone - I've been working on this since I re-created the qiime2-2020.2 env and everything was fine there. I poked around and found some interesting peculiarities with nodejs that were causing jupyter lab build to fail. I believe this is what was causing Jupyter to default to the wrong locale (this only happened with qiime2-2020.11 and only with the Python API run in Jupyter lab). I traced the nodejs versions and here's what I found:

Edit: thanks @thermokarst for correcting me on this - node.js isn't a qiime2 dependency. In the notes below, I couldn't get Jupyter lab to build without the noted node.js versions for the different qiime2 releases. They were only 'required' by our system to run qiime2 in a Jupyter lab notebook.

qiime2-2020.2 required nodejs>=10.x to build. Installing using conda install -c conda-forge nodejs pulled in version 6.13.1. I tried forcing this using conda install "nodejs>=10.0" and everything worked. There's another workaround in the Stack Overflow post linked above.

qiime2-2020.11 required nodejs>12.x to build. Forcing with conda install "nodejs>=12.0" resulted in a PackagesNotFoundError. I decided to see what conda install -c conda-forge nodejs would pull in, and it pulled nodejs-12.19.0. Once I had nodejs sorted I could run jupyter lab build and so far no locale errors.

I have no idea why this seemed to fix the issues, or why with earlier releases there wasn't a mismatch with nodejs - would love to hear any explanations. Also curious if this might fix other locale issues people are running into.

I'll update as I finish testing - hoping this solves the locale issues.

Here's the full code I ran to update Q2 and run in Jupyter lab:

wget https://data.qiime2.org/distro/core/qiime2-2020.11-py36-linux-conda.yml
conda env create -n qiime2-2020.11 --file qiime2-2020.11-py36-linux-conda.yml
# OPTIONAL CLEANUP
rm qiime2-2020.11-py36-linux-conda.yml

conda activate qiime2-2020.11

conda install -c conda-forge jupyterlab

jupyter serverextension enable --py qiime2 --sys-prefix

conda install -c conda-forge nodejs #may need to force a specific version if build fails

jupyter lab build

Thanks for sharing, @hsapers. QIIME 2 doesn't ship with nodejs - perhaps that requirement is coming from some jupyter-related tools you're installing? Or maybe you have it installed in your root conda env? You can see an exhaustive list of all of the dependencies for the QIIME 2 core distribution, here:

Thanks @thermokarst - I'm really not sure where that dependency comes from - I couldn't get Jupyter lab to build without the 'right' node.js and the locale issues seemed to be solved with a build (could be something odd about our system, but I tested in a clean, separate conda distribution). The only additional package I installed other than those in the qiime2-2020.11-py36-linux-conda.yml for testing was conda install -c conda-forge jupyterlab. Without this, Jupyter lab would run, but wouldn't load the q2 module.

qiime2-2020.11 always ran just fine on the command line, just seemed to behave differently than 2020.2 in Jupyter on our system - I'm assuming as a result of some secondary dependencies for Jupyter Lab.

Update - this only fixed the issue in qiime2-2020.2. I've been very careful to track all steps in building the qiime2-2020.2 env and the qiime2-2020.11 env. The 2020.11 release behaves differently in Jupyter than the 2020.2 release with respect to the locale. It seems that using the 2020.11 release in Jupyter lab, bparser.py ends up reading the file with the ascii codec:

/export/data1/sw/tag_conda/envs/qiime2-2020.11/lib/python3.6/site-packages/bibtexparser/bparser.py in parse_file(self, file, partial)
    176         """
--> 177         return self.parse(file.read(), partial=partial)
    178 
/export/data1/sw/tag_conda/envs/qiime2-2020.11/lib/python3.6/encodings/ascii.py in decode(self, input, final)
     25     def decode(self, input, final=False):
---> 26         return codecs.ascii_decode(input, self.errors)[0]
     27 
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 804: ordinal not in range(128)

I know this is not really a QIIME2 issue, as it seems to be an issue with a dependency, but I'm not sure why this behaviour is different between releases. Adding export LC_ALL=en_US.UTF-8 to my .bashrc doesn't seem to make a difference; it needs to be explicitly set immediately before the q2 command call. Really not sure what to make of this or how to continue troubleshooting, so I will revert back to 2020.2 for now.
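
One thing that might help narrow it down is looking at the raw bytes around the position reported in the error (byte 0xc2 at position 804). A rough sketch of how that could be done; the function names are made up, and the sample bytes below are a stand-in rather than the real citations.bib:

```python
def non_ascii_positions(data):
    """Return (offset, byte) pairs for every byte outside the ASCII range."""
    return [(i, b) for i, b in enumerate(data) if b >= 0x80]


def show_context(data, offset, width=20):
    """Decode a small window around offset as UTF-8 for inspection."""
    start = max(0, offset - width)
    window = data[start:offset + width]
    # errors="replace" keeps this from failing if the window cuts a character in half
    return window.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Stand-in bytes; for the real case one would read the .bib file in
    # binary mode (open(..., "rb").read()) and pass those bytes in instead
    sample = "title = {10 \u00b5m filters}".encode("utf-8")
    for offset, byte in non_ascii_positions(sample):
        print(offset, hex(byte), repr(show_context(sample, offset)))
```

Reading in binary mode sidesteps the locale entirely, so this should work even in the environment where the import fails.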

Thanks for the update, @hsapers.

Are you adding it with the !? If so, please remove the exclamation, just add export LC_ALL=en_US.UTF-8 to your bashrc file. The ! is a jupyter notebook syntax that says "execute everything after the ! in the user's shell env," and isn't applicable in a traditional shell, just through jupyter/ipython.

A suggestion - do you need jupyterlab, specifically? If not, maybe you can use the built in jupyter notebook server that does come with your QIIME 2 env.

Thanks - sorry - yes, I did add export LC_ALL=en_US.UTF-8 without the ! to my bashrc.

I can run Jupyter Notebook (and haven't had the locale error on import), but I do use the module and IDE structure of Jupyter lab. I also seem to be getting errors on the console that suggest holoviews (which I've been using for interactive visualization) may not work in the notebook.

I found a script to compare envs and attached the output here. qiime2-2020_2-vs-qiime2020-11.txt (53.2 KB) Nothing jumps out right away, except maybe icu:

icu :
qiime2-2020.2  58.2      hf484d3e_1000       conda-forge         
qiime2-2020.11 67.1      he1b5a44_0          conda-forge
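
For anyone who wants to do a similar comparison, here is a rough sketch of the idea (not the script I used). It assumes each env was dumped with conda list --export, which writes name=version=build lines; the function names are made up, and only the icu entry is real data from above:

```python
def parse_export(text):
    """Parse `conda list --export` style lines (name=version=build) into a dict."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        parts = line.split("=")
        name = parts[0]
        version = parts[1] if len(parts) > 1 else ""
        build = parts[2] if len(parts) > 2 else ""
        packages[name] = (version, build)
    return packages


def diff_envs(old, new):
    """Return {name: (old_entry, new_entry)} for packages that differ or are missing."""
    names = sorted(set(old) | set(new))
    return {n: (old.get(n), new.get(n)) for n in names if old.get(n) != new.get(n)}


if __name__ == "__main__":
    # "samepkg" is a made-up entry to show that unchanged packages are dropped
    env_old = parse_export("icu=58.2=hf484d3e_1000\nsamepkg=1.0=0")
    env_new = parse_export("icu=67.1=he1b5a44_0\nsamepkg=1.0=0")
    for name, (before, after) in diff_envs(env_old, env_new).items():
        print(name, before, "->", after)
```

A package present in only one env shows up with None on the other side, which makes additions and removals easy to spot.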

thanks

icu does seem like it could likely be involved. For what it's worth, I just spent the last thirty minutes or so looking into this, and apparently the localization situation in jupyterlab still has many rough edges, looks like you might be bumping into one. I was able to recreate the issue you originally reported, in a 2020.11 env. I was able to fix it by prefixing my script with this:

# begin: manually set locale in situ
import locale
locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
# end

# the rest of my QIIME 2 code...
import qiime2
denoising_stats = qiime2.Artifact.load('stats-dada2.qza')
print(denoising_stats)
...

Give that a shot and let me know.