Hey there,
I get a 'MissingNodeError' when trying to calculate beta diversity on a merged feature table.
I want to compare samples from several runs, so I ran the upstream steps (up to dada2) on each run separately, which gave me a representative-sequences .qza and a feature-table .qza per run.
I merged the rep-seqs .qza files using the CLI:
qiime feature-table merge-seqs \
$(for rpsq in ./*/*rep-seqs.qza; do echo --i-data "$rpsq"; done) \
--o-merged-data merged_repseqs.qza
This successfully created merged_repseqs.qza.
Afterwards, I created a rooted phylogenetic tree as described in the “Moving Pictures” tutorial. That is:
qiime alignment mafft --i-sequences merged_repseqs.qza --o-alignment aligned_merged_repseqs.qza --p-n-threads 50 --verbose;
qiime alignment mask --i-alignment aligned_merged_repseqs.qza --o-masked-alignment masked_aligned_merged_repseqs.qza --verbose;
qiime phylogeny fasttree --i-alignment masked_aligned_merged_repseqs.qza --o-tree unrooted-tree.qza --p-n-threads 50 --verbose;
qiime phylogeny midpoint-root --i-tree unrooted-tree.qza --o-rooted-tree rooted-tree.qza --verbose;
(Successfully finished without errors).
As for the feature tables (frequencies): since I want an easier way to query the data for relevant samples (and mostly because I'm far more comfortable working in Python than in bash), I joined the feature tables using the Artifact API and pandas:
import pandas as pd
from qiime2 import Artifact
from qiime2.plugins import diversity

tables_artifacts = (Artifact.load(p) for p in tables_paths)  # generator of artifacts
tables_dataframes = (a.view(pd.DataFrame) for a in tables_artifacts)  # DataFrame view of each artifact
# evaluate and concatenate the tables; fill NaNs with zeros so that skbio doesn't throw annoying warnings
all_samples_dataframe = pd.concat(tables_dataframes).fillna(0)
some_samples_dataframe = all_samples_dataframe.loc[....]  # query whatever I need from the full table
# create a 'FeatureTable[Frequency]' artifact out of the table of interest
merged_table_art = Artifact.import_data('FeatureTable[Frequency]',
                                        some_samples_dataframe,
                                        view_type=pd.DataFrame)
bdv = diversity.methods.beta_phylogenetic(table=merged_table_art,
                                          metric="unweighted_unifrac",
                                          phylogeny=rooted_phylogeny)
# where rooted_phylogeny is a 'Phylogeny[Rooted]' artifact loaded from the merged_repseqs.qza file.
Now, to my understanding, every feature in my table should also be present in the rooted tree, but alas I get this error message:
---------------------------------------------------------------------------
MissingNodeError Traceback (most recent call last)
~/.conda/envs/qiime2-2017.12/lib/python3.5/site-packages/q2_diversity/_beta/_method.py in beta_phylogenetic(table, phylogeny, metric, n_jobs)
69 pairwise_func=sklearn.metrics.pairwise_distances,
---> 70 n_jobs=n_jobs
71 )
~/.conda/envs/qiime2-2017.12/lib/python3.5/site-packages/skbio/diversity/_driver.py in beta_diversity(metric, counts, ids, validate, pairwise_func, **kwargs)
347 metric, counts_by_node = _setup_multiple_unweighted_unifrac(
--> 348 counts, otu_ids=otu_ids, tree=tree, validate=validate)
349 counts = counts_by_node
~/.conda/envs/qiime2-2017.12/lib/python3.5/site-packages/skbio/diversity/beta/_unifrac.py in _setup_multiple_unweighted_unifrac(counts, otu_ids, tree, validate)
484 counts_by_node, _, branch_lengths = \
--> 485 _setup_multiple_unifrac(counts, otu_ids, tree, validate)
486
~/.conda/envs/qiime2-2017.12/lib/python3.5/site-packages/skbio/diversity/beta/_unifrac.py in _setup_multiple_unifrac(counts, otu_ids, tree, validate)
448 if validate:
--> 449 _validate_otu_ids_and_tree(counts[0], otu_ids, tree)
450
~/.conda/envs/qiime2-2017.12/lib/python3.5/site-packages/skbio/diversity/_util.py in _validate_otu_ids_and_tree(counts, otu_ids, tree)
105 (n_missing_tip_names,
--> 106 " ".join(missing_tip_names)))
107
MissingNodeError: All ``otu_ids`` must be present as tip names in ``tree``. ``otu_ids`` not corresponding to tip names (n=15002): e2cc357ffe57e5d5d20d4cc929a9803e 40570145f37809857b3fd113bedfe52a e0b19cf3a8136f7a6bb5e569a71030e5 f53a1bf1752fc1438d5f3211c9a269a0 ...)
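For reference, the check that fails here boils down to a set difference between the feature IDs in the table and the tip names of the tree. Below is a minimal sketch of that comparison, assuming toy IDs; `find_missing_tips` is a hypothetical helper name, not part of QIIME 2 or skbio, and with real data the two lists would presumably come from the table's columns and from the tips of the tree viewed as an skbio.TreeNode.

```python
def find_missing_tips(feature_ids, tip_names):
    """Return the feature IDs that have no matching tip name (sorted for stable output)."""
    return sorted(set(feature_ids) - set(tip_names))


# Hypothetical toy inputs. With real data these might be, for example,
# list(some_samples_dataframe.columns) and the names of the tips in the rooted tree.
feature_ids = ["feat_a", "feat_b", "feat_c"]
tip_names = ["feat_a", "feat_c", "feat_d"]

missing = find_missing_tips(feature_ids, tip_names)
print(len(missing), missing)
```

Running the same comparison on the real table and tree should show whether all 15002 reported IDs are truly absent from the tree or whether the two artifacts simply use different identifiers.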
I also tried building the rooted phylogenetic tree without the masking step (to make sure all features are included in the tree), and got the same error.
What did help was removing features whose total count across all samples was below 20 (following the bottom line in the linked issue). But since I'm working with ~2000 samples, that removes roughly 20000 of my 40000 features, and I don't want to lose that many.
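To make that filtering step concrete, here is a minimal sketch of it on a toy table, assuming the table is held as a plain dict of sample ID to per-feature counts. `filter_min_frequency` and the sample and feature names are hypothetical; on the pandas table, something like `df.loc[:, df.sum(axis=0) >= 20]` should be the equivalent.

```python
from collections import Counter


def filter_min_frequency(table, min_total):
    """Drop features whose summed count across all samples is below min_total."""
    totals = Counter()
    for counts in table.values():
        totals.update(counts)  # adds each feature's count to its running total
    keep = {feature for feature, total in totals.items() if total >= min_total}
    return {
        sample: {f: c for f, c in counts.items() if f in keep}
        for sample, counts in table.items()
    }


# Hypothetical toy table: sample ID -> {feature ID: count}
table = {
    "sample1": {"feat_a": 15, "feat_b": 3},
    "sample2": {"feat_a": 10, "feat_b": 0, "feat_c": 25},
}
filtered = filter_min_frequency(table, min_total=20)
print(filtered)
```

In this toy case only feat_b (total 3) falls below the threshold, so it is the only feature removed.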
I want to understand whether I'm building the phylogenetic tree incorrectly (losing too much information somewhere down the road), or whether I should actually remove some features before computing phylogenetic beta diversity.
Thanks,
Uria
Edit:
I have found qiime phylogeny filter-table... Is this what I should use?
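As far as I understand it, that command removes features from the table when their IDs are not tip names in the tree. A minimal sketch of that idea on a toy table follows; this is only an illustration under that assumption, not the plugin's actual code, and `filter_table_to_tips` is a hypothetical name.

```python
def filter_table_to_tips(table, tip_names):
    """Keep only the features whose IDs appear among the tree's tip names."""
    tips = set(tip_names)
    return {
        sample: {f: c for f, c in counts.items() if f in tips}
        for sample, counts in table.items()
    }


# Hypothetical toy data: sample ID -> {feature ID: count}, plus the tree's tip names
table = {
    "sample1": {"feat_a": 4, "feat_b": 7},
    "sample2": {"feat_b": 1, "feat_c": 9},
}
tip_names = ["feat_a", "feat_c"]

tree_only = filter_table_to_tips(table, tip_names)
print(tree_only)
```

Note that this would silence the error by dropping every feature that is missing from the tree, which in my case could again mean losing a large share of the table.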