KeyError when attempting phylo-RPCA with Gemelli

Hi, I'm attempting to use Gemelli to run phylo-RPCA on a dataset processed using dada2 in R, then imported into Qiime2. When running the phylogenetic-rpca-with-taxonomy command I get a KeyError of "C" (see log file below). I believe this error stems from something in the transfer from R into Qiime2, as I'm able to recreate the error by importing the output of the Moving Pictures tutorial into R, then exporting it back into Qiime2, but I can't figure out what the issue is.

In R, I'm gathering all the data and metadata into a Phyloseq object then exporting the ASV table, taxonomy table, metadata, and rep-seqs separately (see code below).

Any insight anyone can offer would be greatly appreciated!

CODE
#R
#Export taxonomy table as "tax.txt" "ps" is phyloseq object
tax<-as(tax_table(ps),"matrix")
tax_cols <- colnames(tax)
tax<-as.data.frame(tax)
tax$taxonomy<-do.call(paste, c(tax[tax_cols], sep=";"))
for(co in tax_cols) tax[co]<-NULL
write.table(tax, "tax.txt", quote=FALSE, col.names=FALSE, sep="\t")

#Export ASV table as .biom file
library(biomformat)
asv<-t(as(otu_table(ps),"matrix"))
asv_biom<-make_biom(data=asv)
write_biom(asv_biom,"asv_biom.biom")

#Export metadata as .txt file
write.table(data.frame("sampleid"=sample_names(ps),sample_data(ps)),"sample-metadata.txt", sep="\t", row.names=FALSE, col.names=TRUE, quote=FALSE)

#Export ASVs as .fna file
ShortRead::writeFasta(refseq(ps), "rep-seqs.fna")

#QIIME2
#Import ASV table
qiime tools import --input-path asv_biom.biom --type 'FeatureTable[Frequency]' --input-format BIOMV100Format --output-path ts_feature-table.qza

#Import taxonomy table
qiime tools import --type 'FeatureData[Taxonomy]' --input-format HeaderlessTSVTaxonomyFormat --input-path tax.txt --output-path taxonomy.qza

#Import ref-seqs
qiime tools import --input-path rep-seqs.fna --type 'FeatureData[Sequence]' --output-path rep-seqs.qza

#Align ref-seqs and create a phylogenetic tree
qiime phylogeny align-to-tree-mafft-fasttree --i-sequences rep-seqs.qza --o-alignment aligned-rep-seqs.qza --o-masked-alignment masked-aligned-rep-seqs.qza --o-tree unrooted-tree.qza --o-rooted-tree rooted-tree.qza

#Run phylo-RPCA with gemelli
qiime gemelli phylogenetic-rpca-with-taxonomy --i-table ts_feature-table.qza --i-phylogeny rooted-tree.qza --m-taxonomy-file taxonomy.qza --p-min-feature-count 10 --p-min-sample-count 500 --o-biplot phylo-ordination.qza --o-distance-matrix phylo-distance.qza --o-counts-by-node-tree phylo-tree.qza --o-counts-by-node phylo-table.qza --o-t2t-taxonomy phylo-taxonomy.qza

DEBUG INFO FROM ERROR
/Users/Nathan/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/gemelli/preprocessing.py:416: RuntimeWarning: divide by zero encountered in log
mat = np.log(matrix_closure(matrix_closure(mat) * branch_lengths))
Traceback (most recent call last):
File "/Users/Nathan/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2cli/commands.py", line 352, in call
results = action(**arguments)
File "", line 2, in phylogenetic_rpca_with_taxonomy
File "/Users/Nathan/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 234, in bound_callable
outputs = self.callable_executor(scope, callable_args,
File "/Users/Nathan/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 381, in callable_executor
output_views = self._callable(**view_args)
File "/Users/Nathan/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/gemelli/rpca.py", line 82, in phylogenetic_rpca_with_taxonomy
output = phylogenetic_rpca(table=table,
File "/Users/Nathan/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/gemelli/rpca.py", line 269, in phylogenetic_rpca
traversed_taxonomy = retrieve_t2t_taxonomy(phylogeny, taxonomy)
File "/Users/Nathan/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/gemelli/preprocessing.py", line 112, in retrieve_t2t_taxonomy
constrings = pull_consensus_strings(tree)
File "/Users/Nathan/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/gemelli/preprocessing.py", line 150, in _pull_consensus_strings
_add_name_to_consensus_string(node.name, consensus_string)
File "/Users/Nathan/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/gemelli/preprocessing.py", line 138, in _add_name_to_consensus_string
rank_idx = rank_order[node_name[0]]
KeyError: 'C'

1 Like

Hello @NathanStewart and welcome to the forum,

If you share the ts_feature-table.qza, rooted-tree.qza, and taxonomy.qza artifacts we can try to recreate the problem.

Thanks.

1 Like

Thank you @colinvwood!
Here are the tree and taxonomy:
rooted-tree.qza (35.1 KB)
taxonomy.qza (7.8 KB)

…and the feature table:
ts_feature-table.qza (15.4 KB)

I'm going to tag @cmartino whom I believe is the developer of this plugin. He may be interested in this issue and will be quicker to troubleshoot than me.

1 Like

Thanks, @colinvwood. Hi @NathanStewart,

Thanks for using Gemelli and for posting this error!

The issue is with the way the taxonomy is formatted. I assumed QIIME2 required a rank order delimiter as a requirement for import but it looks like it does not. I will put in an issue to Gemelli to force a more descriptive error when the rank delimiter is missing (which is needed to determine the LCA along the phylogeny).

To fix this for your taxonomy, the following python code with do the job. But I have also attached the revised taxonomy here: taxonomy_fixed.qza (8.0 KB)

import qiime2 as q2
import pandas as pd

taxonomy = q2.Artifact.load('taxonomy.qza')
RANK_ORDER = ['d', 'p', 'c', 'o', 'f', 'g', 's']
taxonomy_df = taxonomy.view(q2.Metadata).to_dataframe()
taxonomy_df['Taxon'] = ['; '.join([r  + '__' + t_r for t_r, r in zip(t.split(';'), RANK_ORDER)])  + '; s__'
                        for t in taxonomy_df.Taxon.values]
taxonomy_fixed = q2.Artifact.import_data('FeatureData[Taxonomy]', taxonomy_df)
taxonomy_fixed.save('taxonomy_fixed.qza')

I was able to run the command with the new taxonomy but let me know if you have any other problems.

On a related note, this command works better (more accurate LCA) with GG2 here, so it might be worth trying it out for generating your phylogeny/taxonomy.

Thanks!

Cameron

3 Likes

Thank you for the help and the recommendations, @cmartino! I'll look into using GG2 for the taxonomy.
-Nathan