Feature-classifier character encoding error between unicode and ascii


#1

Hi everyone! I’m looking forward to training my own classifier, but I’m running into this problem:

➜ qiime feature-classifier extract-reads \
  --i-sequences 99_otus.qza \
  --p-f-primer CCTACGGGNGGCWGCAG \
  --p-r-primer GACTACHVGGGTATCTAATCC \
  --o-reads ref-seqs.qza

Plugin error from feature-classifier:

  'ascii' codec can't encode character '\xa0' in position 17: ordinal not in range(128)

Debug info has been saved to /tmp/qiime2-q2cli-err-riuzgdfo.log

Log file:

Traceback (most recent call last):
  File "/home/adrian/anaconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
    results = action(**arguments)
  File "<decorator-gen-330>", line 2, in extract_reads
  File "/home/adrian/anaconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
    output_types, provenance)
  File "/home/adrian/anaconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 362, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/home/adrian/anaconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/q2_feature_classifier/_cutter.py", line 155, in extract_reads
    first_read = next(reads)
  File "/home/adrian/anaconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/q2_feature_classifier/_cutter.py", line 102, in _gen_reads
    f_primer = skbio.DNA(f_primer)
  File "/home/adrian/anaconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/skbio/sequence/_grammared_sequence.py", line 335, in __init__
    interval_metadata, lowercase)
  File "/home/adrian/anaconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/skbio/sequence/_sequence.py", line 630, in __init__
    sequence = sequence.encode("ascii")
UnicodeEncodeError: 'ascii' codec can't encode character '\xa0' in position 17: ordinal not in range(128)

The 99_otus.qza file I’m using was built with:

qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path ./gg_13_8_otus/rep_set/99_otus.fasta \
  --output-path 99_otus.qza

and that corresponding fasta file was downloaded from ftp://greengenes.microbio.me/greengenes_release/gg_13_5/gg_13_8_otus.tar.gz

➜ file -i ./gg_13_8_otus/rep_set/99_otus.fasta
./gg_13_8_otus/rep_set/99_otus.fasta: text/plain; charset=us-ascii
(qiime2-2018.8) 

➜ sha256sum ./gg_13_8_otus/rep_set/99_otus.fasta
56c567f8f20a3e8d7e8cd3e1f915dcaa98b635cdced66ab2759f1e9d56499205  ./gg_13_8_otus/rep_set/99_otus.fasta
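
For anyone who wants to repeat the two checks above without `file` or `sha256sum`, here is a minimal Python sketch. The helper name `inspect_file` is hypothetical (it is not part of QIIME 2); it only hashes the file and reports the offset of the first non-ASCII byte, if any.

```python
import hashlib


def inspect_file(path, chunk_size=1 << 20):
    """Return (sha256_hex, offset_of_first_non_ascii_byte_or_None) for a file."""
    digest = hashlib.sha256()
    first_bad = None
    offset = 0
    with open(path, "rb") as handle:
        # Read in chunks so large reference files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            if first_bad is None:
                try:
                    chunk.decode("ascii")
                except UnicodeDecodeError as err:
                    # err.start is the index of the first undecodable byte in this chunk.
                    first_bad = offset + err.start
            offset += len(chunk)
    return digest.hexdigest(), first_bad
```

Calling `inspect_file("./gg_13_8_otus/rep_set/99_otus.fasta")` should give the same digest as `sha256sum` for an identical file, and `None` as the second value if the file really is pure ASCII.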

(Nicholas Bokulich) #2

(Matthew Ryan Dillon) #3

Hey there @elsa, looking at the traceback, it looks like the issue might be related to the forward primer string. Could you run env and post the output here? If you have exported any passwords, tokens, or other sensitive info to an environment variable, please redact it before sharing. Thanks!
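
A note for readers following along: the error names the character '\xa0' (a non-breaking space) at position 17, and the forward primer CCTACGGGNGGCWGCAG is 17 characters long. One plausible reading, which is an assumption and not something confirmed in this thread, is that a non-breaking space ended up right after the primer on the command line. The sketch below reproduces that situation in plain Python; `find_non_ascii` is a hypothetical helper, not part of QIIME 2 or scikit-bio.

```python
def find_non_ascii(text):
    """Return (index, character) pairs for every non-ASCII character in text."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 127]


primer = "CCTACGGGNGGCWGCAG"
# Assumption: a non-breaking space (U+00A0) was pasted directly after the primer.
suspect = primer + "\xa0"

print(len(primer))              # length of the clean primer
print(find_non_ascii(suspect))  # offending characters with their 0-based positions

try:
    # The traceback shows the sequence being encoded to ASCII like this.
    suspect.encode("ascii")
except UnicodeEncodeError as err:
    print(err)
```

If this is what happened, retyping the command by hand (instead of pasting it) should be enough to get rid of the hidden character.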


(Matthew Ryan Dillon) #4

#5

Sorry, I don’t understand. What results do you want me to provide?

I’m using qiime2-2018.8 conda environment.

➜ qiime info
System versions
Python version: 3.5.5
QIIME 2 release: 2018.8
QIIME 2 version: 2018.8.0
q2cli version: 2018.8.0

Installed plugins
alignment: 2018.8.0
composition: 2018.8.0
cutadapt: 2018.8.0
dada2: 2018.8.0
deblur: 2018.8.0
demux: 2018.8.0
diversity: 2018.8.0
emperor: 2018.8.0
feature-classifier: 2018.8.0
feature-table: 2018.8.0
gneiss: 2018.8.0
longitudinal: 2018.8.0
metadata: 2018.8.0
phylogeny: 2018.8.0
quality-control: 2018.8.0
quality-filter: 2018.8.0
sample-classifier: 2018.8.0
taxa: 2018.8.0
types: 2018.8.0
vsearch: 2018.8.0

Application config directory
/home/adrian/.config/q2cli

Getting help
To get help with QIIME 2, visit https://qiime2.org

(Matthew Ryan Dillon) #6
Please run env and paste its output here.

Thanks!


#7
adrian/disco/1810_microbioma via 🅒 qiime2-2018.8
➜ env
PATH=/home/adrian/anaconda3/envs/qiime2-2018.8/bin:/home/adrian/anaconda3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
LANG=en_US.UTF-8
DESKTOP_SESSION=/usr/share/xsessions/plasma
KONSOLE_DBUS_SESSION=/Sessions/1
XDG_SESSION_PATH=/org/freedesktop/DisplayManager/Session0
XDG_CONFIG_DIRS=/etc/xdg/xdg-/usr/share/xsessions/plasma:/etc/xdg:/usr/share/mint-artwork-kde/kf5-settings:/usr/share/mint-artwork-kde/kde4-profile/default/config
XDG_VTNR=7
KDE_MULTIHEAD=false
XDG_RUNTIME_DIR=/run/user/1000
HOME=/home/adrian
KONSOLE_PROFILE_NAME=Linux Mint
XAUTHORITY=/tmp/xauth-1000-_0
QT_AUTO_SCREEN_SCALE_FACTOR=0
XDG_SESSION_CLASS=user
SSH_AUTH_SOCK=/tmp/*idk.removed*
GTK2_RC_FILES=/etc/gtk-2.0/gtkrc:/home/adrian/.gtkrc-2.0:/home/adrian/.config/gtkrc-2.0
GS_LIB=/home/adrian/.fonts
LC_MONETARY=es_AR.UTF-8
KONSOLE_DBUS_WINDOW=/Windows/1
OLDPWD=/home/adrian
KDE_SESSION_UID=1000
XDG_SESSION_DESKTOP=KDE
MANDATORY_PATH=/usr/share/gconf//usr/share/xsessions/plasma.mandatory.path
XDG_SESSION_ID=1
_=/home/adrian/anaconda3/bin/env
PWD=/media/adrian/disco/1810_microbioma
XDG_CURRENT_DESKTOP=KDE
DBUS_SESSION_BUS_ADDRESS=*idk.removed*
LANGUAGE=
QT_LINUX_ACCESSIBILITY_ALWAYS_ON=1
LC_IDENTIFICATION=es_AR.UTF-8
LC_PAPER=es_AR.UTF-8
XDG_SEAT=seat0
SESSION_MANAGER=local/mx:@/tmp/.ICE-unix/2358,unix/mx:/tmp/.ICE-unix/2358
LC_ADDRESS=es_AR.UTF-8
XDG_SEAT_PATH=/org/freedesktop/DisplayManager/Seat0
WINDOWID=60817413
LC_MEASUREMENT=es_AR.UTF-8
KONSOLE_DBUS_SERVICE=:1.112
XDG_SESSION_TYPE=x11
GTK_MODULES=gail:atk-bridge
SHELL=/usr/bin/zsh
QT_ACCESSIBILITY=1
TERM=xterm
COLORFGBG=15;0
XDG_DATA_DIRS=/home/adrian/.local/share/flatpak/exports/share:/var/lib/flatpak/exports/share/:/usr/share//usr/share/xsessions/plasma:/usr/local/share/:/usr/share/
DISPLAY=:0
PROFILEHOME=
SHELL_SESSION_ID=*idk.removed*
SHLVL=1
GTK_RC_FILES=/etc/gtk/gtkrc:/home/adrian/.gtkrc:/home/adrian/.config/gtkrc
LC_TELEPHONE=es_AR.UTF-8
LOGNAME=adrian
LC_NAME=es_AR.UTF-8
KDE_SESSION_VERSION=5
XCURSOR_SIZE=0
XCURSOR_THEME=breeze_cursors
DEFAULTS_PATH=/usr/share/gconf//usr/share/xsessions/plasma.default.path
KDE_FULL_SESSION=true
XDG_SESSION_COOKIE=472b0889742f415085eca68425cbc8bc-1541514305.828377-1702966000
USER=adrian
LC_NUMERIC=en_US.UTF-8
SSH_AGENT_PID=2153
ZSH=/home/adrian/.oh-my-zsh
PAGER=less
LESS=-R
LC_CTYPE=en_US.UTF-8
LSCOLORS=Gxfxcxdxbxegedabagacad
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:
http_proxy=http://192.168.10.1:8080
https_proxy=http://192.168.10.1:8080
SPACESHIP_VERSION=3.6.0
SPACESHIP_ROOT=/home/adrian/anaconda3/lib/node_modules/spaceship-prompt
CONDA_SHLVL=1
CONDA_DEFAULT_ENV=qiime2-2018.8
CONDA_EXE=/home/adrian/anaconda3/bin/conda
CONDA_PREFIX=/home/adrian/anaconda3/envs/qiime2-2018.8
CONDA_PROMPT_MODIFIER=(qiime2-2018.8)
CONDA_PYTHON_EXE=/home/adrian/anaconda3/bin/python
MPLBACKEND=Agg
R_LIBS_USER=/home/adrian/anaconda3/envs/qiime2-2018.8/lib/R/library/
PYTHONNOUSERSITE=/home/adrian/anaconda3/envs/qiime2-2018.8/lib/python*/site-packages/

Thanks.


(Nicholas Bokulich) #8

(Matthew Ryan Dillon) #9

This seems unlikely, but is it possible that a locale setting is at play here? What do the LC_* env vars look like when you haven’t activated your QIIME 2 conda env?
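
One way to answer that question without pasting the whole environment is a short Python script run both inside and outside the conda env. This is only a sketch; `locale_vars` is a hypothetical helper name, and the script simply lists the locale-related variables alongside the encodings Python reports.

```python
import locale
import os
import sys


def locale_vars(environ):
    """Return the LANG, LANGUAGE and LC_* entries of an environment mapping."""
    return {
        key: value
        for key, value in environ.items()
        if key in ("LANG", "LANGUAGE") or key.startswith("LC_")
    }


if __name__ == "__main__":
    for key, value in sorted(locale_vars(os.environ).items()):
        print("{}={}".format(key, value))
    # Encodings Python derives from the locale settings above.
    print("preferred encoding:", locale.getpreferredencoding())
    print("stdout encoding:", sys.stdout.encoding)
```

Comparing the two outputs shows only the variables that matter for this question and leaves out anything sensitive.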


(Matthew Ryan Dillon) #10

#11

I just tried to reproduce the error at home (on a laptop running Linux Mint) before switching LC_MONETARY. The command ran successfully this time…

~/lab/1810_microbioma via 🅒 qiime2-2018.8 took 1m 0s
➜ qiime feature-classifier extract-reads \
  --i-sequences 99_otus.qza \
  --p-f-primer CCTACGGGNGGCWGCAG \
  --p-r-primer GACTACHVGGGTATCTAATCC \
  --o-reads ref-seqs.qza
Saved FeatureData[Sequence] to: ref-seqs.qza

We can close this now. Please let me know if you want any other output for comparison and debugging.

By the way, here’s the env output from the machine where the command worked successfully.

XCURSOR_SIZE=48
GTK2_RC_FILES=/etc/gtk-2.0/gtkrc:/home/kzkt/.gtkrc-2.0:/home/kzkt/.config/gtkrc-2.0
DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-zecb0BHyHn,guid=f3cd4c2559a33e2362df87de5be736c0
LOGNAME=kzkt
LANGUAGE=
QT_QPA_PLATFORM=xcb
SSH_AUTH_SOCK=*idk.removed*
LC_COLLATE=es_AR.UTF-8
USER=kzkt
SHELL_SESSION_ID=*idk.removed*
DEFAULTS_PATH=/usr/share/gconf//usr/share/xsessions/plasma.default.path
_=/home/kzkt/miniconda3/bin/env
DISPLAY=:0
HOME=/home/kzkt
XDG_SEAT=seat0
LC_IDENTIFICATION=es_AR.UTF-8
XAUTHORITY=/tmp/xauth-1000-_0
LANG=en_US.UTF-8
WINDOWID=31457285
KONSOLE_DBUS_SESSION=/Sessions/1
GTK_MODULES=gail:atk-bridge
KDE_FULL_SESSION=true
KDE_SESSION_VERSION=5
GS_LIB=/home/kzkt/.fonts
LC_PAPER=es_AR.UTF-8
LC_MONETARY=es_AR.UTF-8
PWD=/home/kzkt/lab/1810_microbioma
XDG_CONFIG_DIRS=/etc/xdg/xdg-/usr/share/xsessions/plasma:/etc/xdg:/usr/share/mint-artwork-kde/kf5-settings:/usr/share/mint-artwork-kde/kde4-profile/default/config
XDG_SESSION_ID=2
TERM=screen-256color
OLDPWD=/home/kzkt
SESSION_MANAGER=local/mosfet:@/tmp/.ICE-unix/2645,unix/mosfet:/tmp/.ICE-unix/2645
EVENT_NOEPOLL=1
COLORFGBG=15;0
DESKTOP_SESSION=/usr/share/xsessions/plasma
GTK_RC_FILES=/etc/gtk/gtkrc:/home/kzkt/.gtkrc:/home/kzkt/.config/gtkrc
KDE_MULTIHEAD=false
KDE_SESSION_UID=1000
KONSOLE_DBUS_SERVICE=:1.27
KONSOLE_PROFILE_NAME=mynt
LC_ADDRESS=es_AR.UTF-8
LC_MEASUREMENT=es_AR.UTF-8
LC_NAME=es_AR.UTF-8
LC_NUMERIC=es_AR.UTF-8
LC_TELEPHONE=es_AR.UTF-8
LC_TIME=es_AR.UTF-8
MANDATORY_PATH=/usr/share/gconf//usr/share/xsessions/plasma.mandatory.path
PATH=/home/kzkt/miniconda3/envs/qiime2-2018.8/bin:/home/kzkt/miniconda3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
PROFILEHOME=
QT_ACCESSIBILITY=1
QT_AUTO_SCREEN_SCALE_FACTOR=0
QT_LINUX_ACCESSIBILITY_ALWAYS_ON=1
SHELL=/usr/bin/zsh
SHLVL=1
SSH_AGENT_PID=2326
TMUX=/tmp/tmux-1000/default,3073,0
TMUX_PANE=%5
TMUX_PLUGIN_MANAGER_PATH=/home/kzkt/.tmux/plugins/
XCURSOR_THEME=DMZ-Black
XDG_CURRENT_DESKTOP=KDE
XDG_DATA_DIRS=/home/kzkt/.local/share/flatpak/exports/share:/var/lib/flatpak/exports/share/:/usr/share//usr/share/xsessions/plasma:/usr/local/share/:/usr/share/
XDG_RUNTIME_DIR=/run/user/1000
XDG_SEAT_PATH=/org/freedesktop/DisplayManager/Seat0
XDG_SESSION_CLASS=user
XDG_SESSION_COOKIE=472b0889742f415085eca68425cbc8bc-1541879488.293346-2099713057
XDG_SESSION_DESKTOP=KDE
XDG_SESSION_PATH=/org/freedesktop/DisplayManager/Session0
XDG_SESSION_TYPE=x11
XDG_VTNR=7
ZSH=/home/kzkt/.oh-my-zsh
PAGER=less
LESS=-R
LC_CTYPE=en_US.UTF-8
LSCOLORS=Gxfxcxdxbxegedabagacad
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:
SPACESHIP_VERSION=3.6.0
SPACESHIP_ROOT=/home/kzkt/.oh-my-zsh/custom/themes/spaceship-prompt
CONDA_SHLVL=1
CONDA_DEFAULT_ENV=qiime2-2018.8
CONDA_EXE=/home/kzkt/miniconda3/bin/conda
CONDA_PREFIX=/home/kzkt/miniconda3/envs/qiime2-2018.8
CONDA_PROMPT_MODIFIER=(qiime2-2018.8) 
CONDA_PYTHON_EXE=/home/kzkt/miniconda3/bin/python
MPLBACKEND=Agg
R_LIBS_USER=/home/kzkt/miniconda3/envs/qiime2-2018.8/lib/R/library/
PYTHONNOUSERSITE=/home/kzkt/miniconda3/envs/qiime2-2018.8/lib/python*/site-packages/

Thanks a lot!
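
For comparison purposes, two env listings like the ones in this thread can be diffed with a few lines of Python. This is a rough sketch: `parse_env` and `diff_env` are hypothetical helper names, and the sample strings are small excerpts copied from the two listings above, not the full output.

```python
def parse_env(text):
    """Parse KEY=VALUE lines (as printed by env) into a dict."""
    pairs = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines without an '=' sign
            pairs[key] = value
    return pairs


def diff_env(left, right, prefix="LC_"):
    """Return {key: (left_value, right_value)} for keys with the prefix that differ."""
    keys = {k for k in set(left) | set(right) if k.startswith(prefix)}
    return {
        k: (left.get(k), right.get(k))
        for k in sorted(keys)
        if left.get(k) != right.get(k)
    }


# Excerpt from the machine where the command failed.
failing = parse_env(
    "LC_MONETARY=es_AR.UTF-8\nLC_NUMERIC=en_US.UTF-8\nLC_CTYPE=en_US.UTF-8"
)
# Excerpt from the machine where the command succeeded.
working = parse_env(
    "LC_MONETARY=es_AR.UTF-8\nLC_NUMERIC=es_AR.UTF-8\n"
    "LC_CTYPE=en_US.UTF-8\nLC_TIME=es_AR.UTF-8"
)
print(diff_env(failing, working))
```

A difference found this way is only a lead, not a diagnosis: the original error points at a character inside the primer argument itself, so the locale variables may turn out to be unrelated.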

