Dear QIIME 2 community!
I still have not figured out how to solve the problem of mismatched tree tips and sequence tables. This is also discussed here.
The user with the same problem did not post the solution.
The error I get is the same as in that post:
/usr/appli/freeware/miniconda/3.6/envs/qiime2-2018.2/lib/python3.5/site-packages/sklearn/utils/validation.py:475: DataConversionWarning: Data with input dtype int64 was converted to bool by check_pairwise_arrays.
  warnings.warn(msg, DataConversionWarning)
Traceback (most recent call last):
  File "/usr/appli/freeware/miniconda/3.6/envs/qiime2-2018.2/lib/python3.5/site-packages/q2_diversity/_alpha/_method.py", line 46, in alpha_phylogenetic
    tree=phylogeny)
  File "/usr/appli/freeware/miniconda/3.6/envs/qiime2-2018.2/lib/python3.5/site-packages/skbio/diversity/_driver.py", line 170, in alpha_diversity
    counts, otu_ids, tree, validate, single_sample=False)
  File "/usr/appli/freeware/miniconda/3.6/envs/qiime2-2018.2/lib/python3.5/site-packages/skbio/diversity/alpha/_faith_pd.py", line 136, in _setup_faith_pd
    _validate_otu_ids_and_tree(counts[0], otu_ids, tree)
  File "/usr/appli/freeware/miniconda/3.6/envs/qiime2-2018.2/lib/python3.5/site-packages/skbio/diversity/_util.py", line 106, in _validate_otu_ids_and_tree
    " ".join(missing_tip_names)))
skbio.tree._exception.MissingNodeError: All otu_ids must be present as tip names in tree. otu_ids not corresponding to tip names (n=23324): 4b0f96635e87ecb3e7903c0b4ab0bfb2abe2856a 5a299a483a1212f879ee358b2ec00495c256ef1f [omitting feature_ids]
After this command, the feature_ids in the table and in the sequences still have the same names:
qiime vsearch dereplicate-sequences \
  --i-sequences OUT_DIR/out4-qual-filter.qza \
  --o-dereplicated-table OUT_DIR/out5-derep-table.qza \
  --o-dereplicated-sequences OUT_DIR/out5-derep-sequences.qza
[I checked by exporting the files and manually checking some of the feature_ids.]
Hence, somewhere during these commands the feature_ids get changed:
qiime alignment mafft \
  --i-sequences out5-derep-sequences.qza \
  --o-alignment out5-derep-sequences_alignment_mafft.qza \
  --p-n-threads 8
qiime alignment mask \
  --i-alignment out5-derep-sequences_alignment_mafft.qza \
  --o-masked-alignment out5-derep-sequences_alignment_masked.qza
qiime phylogeny fasttree \
  --i-alignment out5-derep-sequences_alignment_masked.qza \
  --o-tree out5-derep-sequences_tree.qza
qiime phylogeny midpoint-root \
  --i-tree out5-derep-sequences_tree.qza \
  --o-rooted-tree out5-derep-sequences_rooted_tree.qza
qiime tools export \
  out5-derep-sequences_tree.qza \
  --output-dir ./
I played around with my data and used the tools feature-table summarize (the third tab, ‘Feature Detail’) and feature-table tabulate-seqs to visualize the names of my feature_ids. I then exported my tree and viewed it in MEGA. There I found why the tips do not match.
The tips of the tree are all changed the same way:
e0330553235196aa25c184cd0b1a1f8284706dd3
becomes
e0330553235196aa25c184cd0b1a1f8284706dd3 UU3micro-18S-12_S14_L001_132201
UU3micro-18S-12_S14_L001 is the name of one of the fasta files I used, and _132201 is appended as well. That number is not the number of reads.
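For anyone who wants to check this without MEGA, here is a rough sketch in plain Python of how the tip labels could be listed and compared with the feature_ids. It is only a sketch under assumptions: the function names (newick_tip_names, first_token) are my own, the tiny Newick string is made up around the one real label from above (tipB, tipC and all branch lengths are invented), and I am assuming that labels containing a space are single-quoted in the exported tree file, which I have not verified. It does not handle bracketed Newick comments.

```python
import re

# Minimal Newick tokenizer (hypothetical helper, not part of QIIME 2 or skbio).
_TOKEN = re.compile(r"""
    (?P<quoted>'(?:[^']|'')*')   # quoted label; '' inside is an escaped quote
  | (?P<length>:[^(),;]*)        # branch length, ignored
  | (?P<punct>[(),;])            # tree structure
  | (?P<bare>[^\s(),:;']+)       # unquoted label
""", re.VERBOSE)


def newick_tip_names(newick):
    """Return tip labels in order; labels following ')' are internal nodes."""
    tips = []
    label_is_tip = True  # a label right after '(' or ',' names a tip
    for match in _TOKEN.finditer(newick):
        kind, text = match.lastgroup, match.group()
        if kind == "punct":
            label_is_tip = text in "(,"
        elif kind in ("quoted", "bare"):
            if kind == "quoted":
                text = text[1:-1].replace("''", "'")
            if label_is_tip:
                tips.append(text)
            label_is_tip = False
    return tips


def first_token(label):
    """Keep only the part of a label before the first whitespace."""
    parts = label.split(None, 1)
    return parts[0] if parts else label


# One real feature_id from my data; the rest of the tree is invented.
feature_ids = {"e0330553235196aa25c184cd0b1a1f8284706dd3"}
newick = (
    "('e0330553235196aa25c184cd0b1a1f8284706dd3 "
    "UU3micro-18S-12_S14_L001_132201':0.1,(tipB:0.2,tipC:0.3)0.95:0.05);"
)

tips = newick_tip_names(newick)
missing_exact = feature_ids - set(tips)                        # IDs with no identical tip
missing_trimmed = feature_ids - {first_token(t) for t in tips}  # after trimming the suffix
```

With labels like mine, the exact comparison reports the feature_id as missing, while the comparison on the trimmed labels does not, which is consistent with the extra text after the space being the only difference.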
The problem is that I do not know where the fasta filename is added, or why. When I used mafft and fasttree outside of the pipeline, they never added names to the sequences. I think this is the only reason that I cannot get the qiime diversity core-metrics-phylogenetic command to run.
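One workaround I am considering, in case the extra text turns out to be the description part of the FASTA header (I have not confirmed that this is where it comes from), is to trim every header down to the ID before aligning. A minimal sketch in plain Python; strip_fasta_descriptions is a name I made up, and the sequence in the example is invented:

```python
def strip_fasta_descriptions(lines):
    """Yield FASTA lines with each header cut down to its first token.

    Header lines start with '>'; everything after the first whitespace
    in a header is dropped. Sequence lines are passed through unchanged
    (minus the trailing newline).
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split(None, 1)
            yield ">" + (fields[0] if fields else "")
        else:
            yield line


# Header taken from my tree tip above; the sequence itself is invented.
example = [
    ">e0330553235196aa25c184cd0b1a1f8284706dd3 UU3micro-18S-12_S14_L001_132201\n",
    "ACGTACGT\n",
]
cleaned = list(strip_fasta_descriptions(example))
```

The cleaned file would then have to be imported into QIIME 2 again before running the alignment, so this only helps if the headers of the exported sequences really carry the extra text.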
I also tried the command recommended in the moving pictures tutorial:
qiime phylogeny align-to-tree-mafft-fasttree
but I get this error (I should update my qiime version...):
Error: QIIME 2 plugin 'phylogeny' has no action 'align-to-tree-mafft-fasttree'.
So thanks for your help so far! I am struggling to export the mafft data, so until I figure that out I cannot tell where the names are changed and why. It is also possible that the export itself changes the tip names and the error lies somewhere else.
I will also post the solution if I can find it, and I will update my QIIME 2 version (it's from February)!
Flo