Hey @nick-youngblut, this was addressed in the 2019.10 release of QIIME 2; all you need to do is upgrade.
From the 2019.10 Changelog:
For some more details, I have outlined two scenarios below.
Scenario A
Taxonomy with whitespace imported prior to QIIME 2 2019.10 (this example uses the same FeatureData[Sequence] for training and classification).
ref-taxonomy.qza (5.1 KB)
rep-seqs.qza (5.2 KB)
# first, export ref-taxonomy.qza to confirm there is whitespace present
qiime tools export \
--input-path ref-taxonomy.qza \
--output-path whitespace-check
cat whitespace-check/taxonomy.tsv
The cat command will show something like this:
Feature ID Taxon
f1 t1
f2 t2
f3 t3
f4 t4
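Plain cat output does not make leading or trailing spaces easy to spot. Below is a minimal sketch of a more explicit check. It assumes the exported taxonomy.tsv is tab-separated with a "Feature ID"/"Taxon" header line (as in the output above) and that the whitespace of interest is spaces around the taxon string. check_taxon_whitespace is a hypothetical helper, not part of QIIME 2.

```shell
# Hypothetical helper (not part of QIIME 2): print each data row of a
# taxonomy TSV whose Taxon field (2nd tab-separated column) starts or
# ends with a space, wrapping the taxon in [] so the space is visible.
check_taxon_whitespace() {
  # -F'\t' splits on tabs only, so spaces stay inside the Taxon field;
  # NR > 1 skips the "Feature ID<TAB>Taxon" header line.
  awk -F'\t' 'NR > 1 && $2 ~ /^ | $/ { printf "%s\t[%s]\n", $1, $2 }' "$1"
}

# Usage: no output means no row was flagged.
# check_taxon_whitespace whitespace-check/taxonomy.tsv
```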
Next, train a classifier and classify the FeatureData[Sequence].
qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads rep-seqs.qza \
--i-reference-taxonomy ref-taxonomy.qza \
--o-classifier classifier.qza
qiime feature-classifier classify-sklearn \
--i-classifier classifier.qza \
--i-reads rep-seqs.qza \
--o-classification taxonomy.qza
qiime metadata tabulate \
--m-input-file taxonomy.qza \
--o-visualization taxonomy.qzv
taxonomy.qzv (1.2 MB)
Please note the whitespace has been stripped.
qiime tools extract \
--input-path taxonomy.qzv \
--output-path .
# note: your path will differ, since each UUID is unique
cat 45f0c56c-9ff7-48be-9e49-7d5118ece5f9/data/metadata.tsv
The results:
Feature ID Taxon Confidence
#q2:types categorical categorical
f1 t4 0.9970119521912352
f2 t3 0.9970119521912352
f3 t2 0.9970119521912347
f4 t1 0.9970119521912347
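Because the UUID-named directory changes on every run, one way to avoid copying it by hand is to extract into a fresh directory and let the shell glob find metadata.tsv. Below is a minimal POSIX sh sketch. find_extracted_metadata and the taxonomy-extract directory name are hypothetical, and the <uuid>/data/metadata.tsv layout is assumed from the path shown above.

```shell
# Hypothetical helper (not part of QIIME 2): print the path of the single
# <uuid>/data/metadata.tsv under an extraction directory, or fail if there
# are zero or several matches (e.g. after repeated extractions into one place).
find_extracted_metadata() {
  # Expand the glob into this function's own positional parameters.
  set -- "$1"/*/data/metadata.tsv
  # With no match the pattern stays literal, so also require a real file.
  if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
    echo "expected exactly one */data/metadata.tsv" >&2
    return 1
  fi
  printf '%s\n' "$1"
}

# Usage, assuming taxonomy.qzv from the steps above:
# qiime tools extract --input-path taxonomy.qzv --output-path taxonomy-extract
# cat "$(find_extracted_metadata taxonomy-extract)"
```

Extracting into a dedicated, empty directory keeps the glob unambiguous. The helper refuses to guess if more than one UUID directory is present.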
Scenario B
Importing taxonomy with whitespace in QIIME 2 2019.10 and newer.
taxonomy.tsv (40 Bytes)
qiime tools import \
--type 'FeatureData[Taxonomy]' \
--input-format HeaderlessTSVTaxonomyFormat \
--input-path taxonomy.tsv \
--output-path ref-taxonomy-stripped.qza
qiime tools export \
--input-path ref-taxonomy-stripped.qza \
--output-path stripped
cat stripped/taxonomy.tsv
The results:
Feature ID Taxon
f1 t1
f2 t2
f3 t3
f4 t4
Note that the whitespace in the taxon strings has been stripped.
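To see exactly which lines changed on import, one option is to diff the original file against the exported copy. Below is a minimal POSIX sh sketch. It assumes the original taxonomy.tsv is headerless (as imported with HeaderlessTSVTaxonomyFormat above), that the exported file adds a single header line, and that whitespace stripping is the only other change. compare_with_stripped is a hypothetical name.

```shell
# Hypothetical helper (not part of QIIME 2): diff the original headerless
# taxonomy file against the exported one, skipping the export's
# "Feature ID<TAB>Taxon" header line. Exit status is 0 only if the
# remaining lines are identical.
compare_with_stripped() {
  # $1: original headerless TSV, $2: exported taxonomy.tsv
  tail -n +2 "$2" | diff "$1" -
}

# Usage with the files from Scenario B:
# compare_with_stripped taxonomy.tsv stripped/taxonomy.tsv
```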
The machine classifier doesn't work like that: it is a binary file, and editing it isn't recommended.
i) There's no need to remove the whitespace; simply upgrade. Please note that 2019.10 is the only version of QIIME 2 currently supported.
ii) You are generalizing from your experience trying to edit a binary pickle. Exporting and extracting data are first-class citizens in QIIME 2, and the resulting data is in whatever format the Semantic Type represents it as (TSV, JSON, FASTQ, pkl, etc.). You are simply trying to do something that doesn't really make sense for this kind of data.
Hope that helps!