Rescript merge-taxa non-urgent bug - blank taxon values created when using 'super'

I recently ran into an issue when merging taxonomies using the "super" method. The merge completed successfully, but when the merged taxonomy was evaluated using rescript evaluate-taxonomy, the following error was given:

Plugin error from rescript: 'float' object has no attribute 'split'

I was able to get the merged taxonomy file to work in evaluate-taxonomy by editing blank values present in the taxonomy file to match the formatting of other taxon values (i.e. tax=K__kingdom;phylum;etc;;;;). I did this outside of QIIME2.
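For anyone who would rather script the manual edit than do it by hand, here is a minimal sketch using awk. It is only an illustration: the sample rows, the three-column layout, and the `Unassigned` placeholder are assumptions, not necessarily what the exported file contains or what was typed in by hand for the workaround above.

```shell
# Hypothetical sample standing in for the exported taxonomy.tsv
workdir=$(mktemp -d)
{
  printf 'Feature ID\tTaxon\tConfidence\n'
  printf 'feat1\ttax=k__Animalia;Arthropoda;Insecta\t0.98\n'
  printf 'feat2\t\t0.75\n'
} > "$workdir/taxonomy.tsv"

# Count data rows whose Taxon column (field 2) is empty or only spaces
count_blank() {
  awk -F'\t' 'NR > 1 && $2 ~ /^ *$/ { n++ } END { print n + 0 }' "$1"
}
blank_before=$(count_blank "$workdir/taxonomy.tsv")

# Fill blank Taxon cells; "Unassigned" is an assumed placeholder value
awk -F'\t' -v OFS='\t' -v fill='Unassigned' '
  NR > 1 && $2 ~ /^ *$/ { $2 = fill }
  { print }
' "$workdir/taxonomy.tsv" > "$workdir/taxonomy_copy.tsv"

blank_after=$(count_blank "$workdir/taxonomy_copy.tsv")
echo "blank Taxon rows: before=$blank_before after=$blank_after"
```

The filled copy can then be re-imported with `qiime tools import` as in the code further down. Swap the placeholder for whatever string matches the rest of your taxonomy.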

It appears the blank values are created during the merge process anywhere there is no match at the Kingdom level between the taxonomies being merged. Most features that ended up with blank values after merging had "Unassigned" in one of the original taxonomies. One had been classified as Protozoa in one taxonomy and Animalia in the other.
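To check this pattern on your own data, one option is to export both original taxonomies to TSV and list the features whose first rank differs. The sketch below assumes two hypothetical files, `a.tsv` and `b.tsv`, each with a header row and the taxon string in the second column; the sample rows are made up to mirror the cases described above.

```shell
# Hypothetical stand-ins for the two exported taxonomy.tsv files
workdir=$(mktemp -d)
{
  printf 'Feature ID\tTaxon\n'
  printf 'feat1\ttax=k__Animalia;Arthropoda;Insecta\n'
  printf 'feat2\tUnassigned\n'
  printf 'feat3\ttax=k__Protozoa;Amoebozoa\n'
} > "$workdir/a.tsv"
{
  printf 'Feature ID\tTaxon\n'
  printf 'feat1\ttax=k__Animalia;Arthropoda;Insecta;Lepidoptera\n'
  printf 'feat2\ttax=k__Animalia;Chordata\n'
  printf 'feat3\ttax=k__Animalia;Arthropoda\n'
} > "$workdir/b.tsv"

# Print features whose first rank differs between the two files
mismatches=$(awk -F'\t' '
  FNR == 1 { next }                          # skip each header row
  { split($2, ranks, ";"); top = ranks[1] }  # first rank of this row
  NR == FNR { first[$1] = top; next }        # remember ranks from a.tsv
  ($1 in first) && first[$1] != top { print $1 "\t" first[$1] "\t" top }
' "$workdir/a.tsv" "$workdir/b.tsv")

printf '%s\n' "$mismatches"
```

With the sample rows, this lists `feat2` (Unassigned vs. Animalia) and `feat3` (Protozoa vs. Animalia), which are the kinds of features that ended up blank in my merged file.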

Disclaimer: this issue could have arisen because of discrepancies between the QIIME2 and RESCRIPt versions I used. The taxonomy merging and all evaluation were done in QIIME2 v2021.4.0 and RESCRIPt v2021.8.0.dev0+3.g1ce2142. The taxonomies I merged, and the naive Bayes classifiers used to create them, were created with QIIME2 v2021.2 and an unknown RESCRIPt version installed in April 2021.

Files & code here

Code:

#setup
#activate qiime
conda activate qiime2-2021.4
#set working directory
cd /home/smayne11/miniconda3/envs/qiime2-2021.4/Final/MergeTaxonomies

#merge taxonomies
qiime rescript merge-taxa \
  --i-data SM_taxonomy_EastUSCAOnt_50.qza SM_taxonomy_anmlUSCA_50.qza \
  --p-mode super \
  --o-merged-data SM_taxNB50_EaAU_sup.qza

#visualize
qiime metadata tabulate \
  --m-input-file SM_taxNB50_EaAU_sup.qza \
  --o-visualization SM_taxNB50_EaAU_sup.qzv
#notice blank values in "Taxon" column

qiime rescript evaluate-taxonomy \
  --i-taxonomies SM_taxNB50_EaAU_sup.qza \
  --p-labels SM_taxNB50_EaAU_sup \
  --o-taxonomy-stats SM_taxNB50_EaAU_sup_comp.qzv
#error: 'float' object has no attribute 'split'
#need to adjust formatting of file it seems

#export the taxonomy as TSV
qiime tools export \
  --input-path SM_taxNB50_EaAU_sup.qza \
  --output-path SM_taxNB50_EaAU_sup
#taxonomy should be present as tsv at this point in this folder:
cd /home/smayne11/miniconda3/envs/qiime2-2021.4/Final/MergeTaxonomies/SM_taxNB50_EaAU_sup
#named "taxonomy.tsv"

###outside qiime2###
#Check which cells under "Taxon" are blank & adjust them to match formatting in other cells
#correct formatting example: tax=k__Animalia;Arthropoda;Insecta;Lepidoptera;Geometridae;Operophtera;Operophtera bruceata

cp taxonomy.tsv taxonomy_copy.tsv
#reimport to qiime2
qiime tools import --type 'FeatureData[Taxonomy]' --input-format TSVTaxonomyFormat --input-path taxonomy_copy.tsv --output-path taxonomy_table.qza

#visualize to make sure it worked
qiime metadata tabulate \
  --m-input-file taxonomy_table.qza \
  --o-visualization taxonomy_table.qzv
#no blank values under "Taxon"

qiime rescript evaluate-taxonomy \
  --i-taxonomies taxonomy_table.qza \
  --p-labels taxonomy_table \
  --o-taxonomy-stats taxonomy_table_eval.qzv
#executes successfully

A potentially related issue I was also having with merge-taxa
Similar issues arising in different contexts

Update:

Still not an issue I need resolved, but I installed qiime2 v2021.2.0 and rescript v2021.2.0 and am still having the same issue as above.

Looking back into the citations of the original taxonomy artifacts I'm seeing qiime2 v2021.2.0 and rescript v2020.6.1+3.g39f608e, though I'm not sure why I would have installed such an old version of rescript since I hadn't started using qiime until this year.

I did not confirm that my workaround works in the older version, but I assume it would. The merged taxonomy still contains blank "Taxon" values.

Hope this helps troubleshoot if anyone ends up trying to solve this.

#setup
#activate qiime
conda activate qiime2-2021.2
#set working directory
cd /home/smayne11/miniconda3/envs/qiime2-2021.2/Final/MergeTaxonomies

#merge taxonomies
qiime rescript merge-taxa \
  --i-data SM_taxonomy_EastUSCAOnt_50.qza SM_taxonomy_anmlUSCA_50.qza \
  --p-mode super \
  --o-merged-data SM_taxNB50_EaAU_sup.qza

#visualize
qiime metadata tabulate \
  --m-input-file SM_taxNB50_EaAU_sup.qza \
  --o-visualization SM_taxNB50_EaAU_sup.qzv
#notice blank values in "Taxon" column

qiime rescript evaluate-taxonomy \
  --i-taxonomies SM_taxNB50_EaAU_sup.qza \
  --p-labels SM_taxNB50_EaAU_sup \
  --o-taxonomy-stats SM_taxNB50_EaAU_sup_comp.qzv
#error: 'float' object has no attribute 'split'
#need to adjust formatting of file it seems

#didn't confirm that my previous workaround works in v2021.2.0, but I assume it would

Thanks for reporting this @smayne11 !

You are correct in your diagnosis, and this is indeed a minor bug: The LCA methods will leave blank taxonomies if there is no LCA (e.g., because one is "Unclassified"). I have opened an issue here and we will get this fixed soon.
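To illustrate why a missing LCA yields a blank value, here is a deliberately simplified sketch of a rank-by-rank LCA over two semicolon-delimited taxon strings. The `lca` helper is hypothetical and is not RESCRIPt's actual implementation (the `super` mode does more than this); it only shows that when the first ranks already differ, the shared prefix is the empty string.

```shell
# Hypothetical helper: longest shared prefix of ranks between two taxon strings
lca() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, ra, ";"); m = split(b, rb, ";")
    out = ""
    for (i = 1; i <= n && i <= m; i++) {
      if (ra[i] != rb[i]) break                 # stop at the first disagreement
      out = (i == 1) ? ra[i] : out ";" ra[i]    # extend the shared prefix
    }
    print out
  }'
}

# Two labels that agree down to phylum
shared=$(lca 'tax=k__Animalia;Arthropoda;Insecta' 'tax=k__Animalia;Arthropoda;Arachnida')
# One unassigned label and one classified label: nothing in common
none=$(lca 'Unassigned' 'tax=k__Animalia;Arthropoda')

echo "shared: [$shared]"
echo "none:   [$none]"
```

In this toy version, the second call returns an empty string, which is analogous to the blank "Taxon" cells reported above.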

In the meantime, I am glad that you figured out a workaround!
