Hi, I'm attempting to use Gemelli to run phylo-RPCA on a dataset processed using dada2 in R, then imported into Qiime2. When running the phylogenetic-rpca-with-taxonomy command I get a KeyError of "C" (see log file below). I believe this error stems from something in the transfer from R into Qiime2, as I'm able to recreate the error by importing the output of the Moving Pictures tutorial into R, then exporting it back into Qiime2, but I can't figure out what the issue is.
In R, I'm gathering all the data and metadata into a Phyloseq object then exporting the ASV table, taxonomy table, metadata, and rep-seqs separately (see code below).
Any insight anyone can offer would be greatly appreciated!
CODE
#R
#Export taxonomy table as "tax.txt" "ps" is phyloseq object
tax<-as(tax_table(ps),"matrix")
tax_cols <- colnames(tax)
tax<-as.data.frame(tax)
tax$taxonomy<-do.call(paste, c(tax[tax_cols], sep=";"))
for(co in tax_cols) tax[co]<-NULL
write.table(tax, "tax.txt", quote=FALSE, col.names=FALSE, sep="\t")
#Export ASV table as .biom file
library(biomformat)
asv<-t(as(otu_table(ps),"matrix"))
asv_biom<-make_biom(data=asv)
write_biom(asv_biom,"asv_biom.biom")
#Export metadata as .txt file
write.table(data.frame("sampleid"=sample_names(ps),sample_data(ps)),"sample-metadata.txt", sep="\t", row.names=FALSE, col.names=TRUE, quote=FALSE)
#Export ASVs as .fna file
ShortRead::writeFasta(refseq(ps), "rep-seqs.fna")
#QIIME2
#Import ASV table
qiime tools import --input-path asv_biom.biom --type 'FeatureTable[Frequency]' --input-format BIOMV100Format --output-path ts_feature-table.qza
#Import taxonomy table
qiime tools import --type 'FeatureData[Taxonomy]' --input-format HeaderlessTSVTaxonomyFormat --input-path tax.txt --output-path taxonomy.qza
#Import ref-seqs
qiime tools import --input-path rep-seqs.fna --type 'FeatureData[Sequence]' --output-path rep-seqs.qza
#Align ref-seqs and create a phylogenetic tree
qiime phylogeny align-to-tree-mafft-fasttree --i-sequences rep-seqs.qza --o-alignment aligned-rep-seqs.qza --o-masked-alignment masked-aligned-rep-seqs.qza --o-tree unrooted-tree.qza --o-rooted-tree rooted-tree.qza
#Run phylo-RPCA with gemelli
qiime gemelli phylogenetic-rpca-with-taxonomy --i-table ts_feature-table.qza --i-phylogeny rooted-tree.qza --m-taxonomy-file taxonomy.qza --p-min-feature-count 10 --p-min-sample-count 500 --o-biplot phylo-ordination.qza --o-distance-matrix phylo-distance.qza --o-counts-by-node-tree phylo-tree.qza --o-counts-by-node phylo-table.qza --o-t2t-taxonomy phylo-taxonomy.qza
DEBUG INFO FROM ERROR
/Users/Nathan/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/gemelli/preprocessing.py:416: RuntimeWarning: divide by zero encountered in log
mat = np.log(matrix_closure(matrix_closure(mat) * branch_lengths))
Traceback (most recent call last):
File "/Users/Nathan/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2cli/commands.py", line 352, in call
results = action(**arguments)
File "", line 2, in phylogenetic_rpca_with_taxonomy
File "/Users/Nathan/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 234, in bound_callable
outputs = self.callable_executor(scope, callable_args,
File "/Users/Nathan/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 381, in callable_executor
output_views = self._callable(**view_args)
File "/Users/Nathan/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/gemelli/rpca.py", line 82, in phylogenetic_rpca_with_taxonomy
output = phylogenetic_rpca(table=table,
File "/Users/Nathan/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/gemelli/rpca.py", line 269, in phylogenetic_rpca
traversed_taxonomy = retrieve_t2t_taxonomy(phylogeny, taxonomy)
File "/Users/Nathan/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/gemelli/preprocessing.py", line 112, in retrieve_t2t_taxonomy
constrings = pull_consensus_strings(tree)
File "/Users/Nathan/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/gemelli/preprocessing.py", line 150, in _pull_consensus_strings
_add_name_to_consensus_string(node.name, consensus_string)
File "/Users/Nathan/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/gemelli/preprocessing.py", line 138, in _add_name_to_consensus_string
rank_idx = rank_order[node_name[0]]
KeyError: 'C'