Q2-ghost-tree: which pre-built tree should I use?

pierrj · December 6, 2018, 1:02am

Hello! Not sure if this is the right place to post this, but I am interested in using one of your pre-built trees to analyse a small ITS dataset in qiime2, and I'm having trouble deciding which one to use. I used the 97% UNITE database from 1.12.2017 to make my OTU table, so I know I am supposed to use the ghost-tree that corresponds to that database, but there are 6 different options to choose from. From the little bit of searching I've done, it sounds like the number after "ghost_tree" in the file names on GitHub corresponds to the similarity threshold used to put the trees together. Is that correct? If so, what threshold would you recommend for my analysis? Also, half of the folders corresponding to the versions of the UNITE databases have an "s_" before them. Could you explain what this means and which folder to choose? I'd really appreciate your input!

Thanks,
Pierre

Jennifer_Fouquier · December 6, 2018, 8:19pm

Hi Pierre! Posting here works.

The 01.12.2017 ghost trees are linked here, and this screenshot shows the options for the UNITE 97% database:

You can use the 100, 90 or 80 tree. I like the 80% and that's what I had the best results with, but you may have different results depending on your dataset. You're right that the number is the similarity level: it is a ghost-tree option that regroups the OTUs at that % similarity threshold (100 does not regroup them; it keeps them the same). The problem with the 100% clustered seqs is that the seqs that have no names get discarded, because ghost-tree needs nomenclature. Using the 80% ghost-tree means you make bigger clusters of OTUs so that ghost-tree can find a way to graft them to the 18S tree, which has a better overall understanding of the phylogeny. As you can imagine, this decreases the quality of the tree, but it captures more "unidentified" organisms. So the 100% tree throws away more data, while the 80% tree is slightly less accurate but contains more OTUs.

Depending on your curiosity, you could try a few. I would compare the PCoA results from ghost-tree against qualitative (binary Jaccard) and quantitative (Bray-Curtis) results. Just pay attention to artifacts in your data. If for some reason your dataset does not work well with ghost-tree, there is a chance that you just can't use UniFrac for your analysis. Hopefully it works though! Most people I have helped have had good results, but a few said it didn't look right to them. I don't know if it was user error or if ghost-tree discarded too much data.
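
To make the comparison above a bit more concrete, here is a minimal Python sketch of what the two non-phylogenetic metrics measure. The sample counts are invented for illustration and the function names are my own; QIIME 2 computes these distances for you, so this is only meant to show why a qualitative and a quantitative metric can disagree on the same pair of samples.

```python
def binary_jaccard(a, b):
    """Qualitative distance: 1 - (shared OTUs / OTUs present in either sample)."""
    present_a = {i for i, count in enumerate(a) if count > 0}
    present_b = {i for i, count in enumerate(b) if count > 0}
    union = present_a | present_b
    if not union:
        return 0.0  # two empty samples are treated as identical
    return 1.0 - len(present_a & present_b) / len(union)


def bray_curtis(a, b):
    """Quantitative distance: summed count differences over summed counts."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Made-up OTU counts for three samples (same OTU order in each list).
sample_1 = [10, 0, 5, 1]
sample_2 = [1, 0, 5, 10]   # same OTUs present as sample_1, different abundances
sample_3 = [10, 3, 0, 0]   # different set of OTUs present

for name, other in (("sample_2", sample_2), ("sample_3", sample_3)):
    print(name,
          "binary Jaccard:", round(binary_jaccard(sample_1, other), 3),
          "Bray-Curtis:", round(bray_curtis(sample_1, other), 3))
```

In this toy data, sample_1 and sample_2 contain the same OTUs, so the presence/absence metric sees no difference while the abundance-based metric does. If the ghost-tree UniFrac PCoA shows structure that neither kind of metric supports, that is the sort of artifact worth a closer look.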

The "s_" UNITE database ghost trees are in a different folder from the one I linked. I'm not the best person to explain what the "s_" means. UNITE has made updates to their databases, and they likely explain what it means and why people should or should not select it. If you look at the database you used and you do not see the "s_" like you do here under the "media" tab, then you won't need it. If you figure out more details about this, let me know.

Let me know if you have any more questions.

Jennifer

pierrj · December 13, 2018, 7:13pm

Thank you so much for the info! That helps a lot.

I am now having issues running the diversity plugin with the premade ghost-tree in qiime2 (see screenshot and debug info below). The issue seems to be that the tip names don't match the IDs in my ID-filtered feature table. I am pretty confused as to why this is happening, since I used the tip names from the ghost-tree to filter the table, so all of the IDs should match.

I would appreciate any advice you might have on possible solutions or where to look for one.

Thanks!
Pierre

/home/qiime2/miniconda/envs/qiime2-2018.8/lib/python3.5/site-packages/sklearn/utils/validation.py:475: DataConversionWarning: Data with input dtype float64 was converted to bool by check_pairwise_arrays.
warnings.warn(msg, DataConversionWarning)
/home/qiime2/miniconda/envs/qiime2-2018.8/lib/python3.5/site-packages/skbio/stats/ordination/_principal_coordinate_analysis.py:152: RuntimeWarning: The result contains negative eigenvalues. Please compare their magnitude with the magnitude of some of the largest positive eigenvalues. If the negative ones are smaller, it's probably safe to ignore them, but if they are large in magnitude, the results won't be useful. See the Notes section for more details. The smallest eigenvalue is -0.042913273085820366 and the largest is 0.8932142584482181.
RuntimeWarning
/home/qiime2/miniconda/envs/qiime2-2018.8/lib/python3.5/site-packages/skbio/stats/ordination/_principal_coordinate_analysis.py:152: RuntimeWarning: The result contains negative eigenvalues. Please compare their magnitude with the magnitude of some of the largest positive eigenvalues. If the negative ones are smaller, it's probably safe to ignore them, but if they are large in magnitude, the results won't be useful. See the Notes section for more details. The smallest eigenvalue is -0.07262095678211977 and the largest is 0.8777920791084088.
RuntimeWarning
Traceback (most recent call last):
File "/home/qiime2/miniconda/envs/qiime2-2018.8/lib/python3.5/site-packages/q2_diversity/_alpha/_method.py", line 46, in alpha_phylogenetic
tree=phylogeny)
File "/home/qiime2/miniconda/envs/qiime2-2018.8/lib/python3.5/site-packages/skbio/diversity/_driver.py", line 170, in alpha_diversity
counts, otu_ids, tree, validate, single_sample=False)
File "/home/qiime2/miniconda/envs/qiime2-2018.8/lib/python3.5/site-packages/skbio/diversity/alpha/_faith_pd.py", line 136, in _setup_faith_pd
_validate_otu_ids_and_tree(counts[0], otu_ids, tree)
File "/home/qiime2/miniconda/envs/qiime2-2018.8/lib/python3.5/site-packages/skbio/diversity/_util.py", line 104, in _validate_otu_ids_and_tree
" ".join(missing_tip_names)))
skbio.tree._exception.MissingNodeError: All otu_ids must be present as tip names in tree. otu_ids not corresponding to tip names (n=21): SH020854.07FU_JN622205_reps SH025243.07FU_AF461639_reps SH021890.07FU_KY104337_reps SH013736.07FU_AY015439_refs SH004915.07FU_AF444469_refs SH026748.07FU_AF444575_refs SH022127.07FU_KT799159_reps SH016817.07FU_EF452449_refs SH013735.07FU_AY015438_refs SH000526.07FU_EF568057_refs SH493451.07FU_KU515747_reps SH013109.07FU_AF444329_refs SH013449.07FU_AB030323_refs SH014188.07FU_AJ244232_refs SH015477.07FU_EF126366_reps SH006339.07FU_DQ449990_refs SH030064.07FU_HM148081_refs SH010820.07FU_FM246501_refs SH032431.07FU_AF444373_refs SH020855.07FU_AF444668_refs SH013590.07FU_AF444627_refs

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/qiime2/miniconda/envs/qiime2-2018.8/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
results = action(**arguments)
File "", line 2, in core_metrics_phylogenetic
File "/home/qiime2/miniconda/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
output_types, provenance)
File "/home/qiime2/miniconda/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 455, in callable_executor
outputs = self._callable(scope.ctx, **view_args)
File "/home/qiime2/miniconda/envs/qiime2-2018.8/lib/python3.5/site-packages/q2_diversity/_core_metrics.py", line 55, in core_metrics_phylogenetic
metric='faith_pd')
File "", line 2, in alpha_phylogenetic
File "/home/qiime2/miniconda/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
output_types, provenance)
File "/home/qiime2/miniconda/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 362, in callable_executor
output_views = self._callable(**view_args)
File "/home/qiime2/miniconda/envs/qiime2-2018.8/lib/python3.5/site-packages/q2_diversity/_alpha/_method.py", line 50, in alpha_phylogenetic
raise skbio.tree.MissingNodeError(message)
skbio.tree._exception.MissingNodeError: All feature_ids must be present as tip names in phylogeny. feature_ids not corresponding to tip names (n=21): SH020854.07FU_JN622205_reps SH025243.07FU_AF461639_reps SH021890.07FU_KY104337_reps SH013736.07FU_AY015439_refs SH004915.07FU_AF444469_refs SH026748.07FU_AF444575_refs SH022127.07FU_KT799159_reps SH016817.07FU_EF452449_refs SH013735.07FU_AY015438_refs SH000526.07FU_EF568057_refs SH493451.07FU_KU515747_reps SH013109.07FU_AF444329_refs SH013449.07FU_AB030323_refs SH014188.07FU_AJ244232_refs SH015477.07FU_EF126366_reps SH006339.07FU_DQ449990_refs SH030064.07FU_HM148081_refs SH010820.07FU_FM246501_refs SH032431.07FU_AF444373_refs SH020855.07FU_AF444668_refs SH013590.07FU_AF444627_refs
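
One way to narrow down an error like the one above is to compare the feature IDs against the tree's tip names outside of QIIME 2. The sketch below is only an illustration under some assumptions: the tree is plain Newick text, the regular expression handles simple labels (no Newick comments or escaped quotes), and the tree and IDs are invented stand-ins, not the real UNITE data. For real files, a proper parser such as scikit-bio's `TreeNode.read(...)` with `.tips()` is the safer choice.

```python
import re

# A tip label follows "(" or ","; internal node labels follow ")" and are skipped.
# Labels may be single-quoted or unquoted. Simplified: no comments, no escaped quotes.
TIP_PATTERN = re.compile(r"[(,]\s*('[^']*'|[^(),:;\s]+)")


def tip_names(newick):
    """Return the set of tip labels found in a simple Newick string."""
    return {label.strip("'") for label in TIP_PATTERN.findall(newick)}


def missing_from_tree(feature_ids, newick):
    """Return the feature IDs that do not appear as tip labels, sorted."""
    return sorted(set(feature_ids) - tip_names(newick))


# Hypothetical tree and feature IDs, shaped like UNITE IDs but made up.
toy_tree = (
    "((SH000001.07FU_AB000001_refs:0.1,SH000002.07FU_AB000002_reps:0.2)n1:0.3,"
    "SH000003.07FU_AB000003_refs:0.4);"
)
toy_ids = [
    "SH000001.07FU_AB000001_refs",
    "SH000003.07FU_AB000003_refs",
    "SH000004.07FU_AB000004_reps",  # not in the toy tree
]

print("tips:", sorted(tip_names(toy_tree)))
print("missing:", missing_from_tree(toy_ids, toy_tree))
```

If the IDs reported as missing here match the list in the error message, the table still contains features that are genuinely absent from the tree, which would point at the filtering step rather than at the diversity plugin.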

Jennifer_Fouquier · December 14, 2018, 6:56pm

Hi Pierre, bummer. I can't look at it right now, but I will have a chance this weekend (hopefully it's a quick fix). If you wouldn't mind, could you send your input files (or a Dropbox or Google link) to jennifer DOT fouquier@ucdenver.edu? Thanks!

pierrj · December 16, 2018, 12:51am

I sent you the files. Thank you!!

microB · February 6, 2019, 9:10am

Was this ever resolved? I am running into similar issues, and I think it may have something to do with the underscores in the IDs, but replacing them with spaces didn't do the trick.

Thanks for any help!
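
The underscore idea can be tested directly. In the Newick format, underscores in unquoted labels are conventionally read as spaces, so a tip written with underscores can come back from a parser with spaces instead. The sketch below, which uses made-up IDs and hypothetical helper names, checks whether each unmatched feature ID would match a tip once quotes are stripped and underscores and spaces are treated alike. It is a diagnostic under those assumptions, not a confirmed explanation of the error in this thread.

```python
def normalize(label):
    """Strip surrounding quotes and treat underscores as spaces."""
    return label.strip("'\"").replace("_", " ")


def explain_mismatches(feature_ids, tip_labels):
    """Map each unmatched feature ID to the tip it would match after
    normalization, or to None if no tip matches even then."""
    exact = set(tip_labels)
    by_normalized = {normalize(tip): tip for tip in tip_labels}
    report = {}
    for feature_id in feature_ids:
        if feature_id in exact:
            continue  # exact match, nothing to explain
        report[feature_id] = by_normalized.get(normalize(feature_id))
    return report


# Hypothetical IDs: the second tip has had its underscores turned into spaces.
tips = ["SH1_A_refs", "SH2 B reps"]
ids = ["SH1_A_refs", "SH2_B_reps", "SH3_C_refs"]

for feature_id, match in explain_mismatches(ids, tips).items():
    if match is None:
        print(feature_id, "is absent from the tree")
    else:
        print(feature_id, "differs only in underscores or quotes from", repr(match))
```

If every unmatched ID comes back as absent, underscores are unlikely to be the cause and the features are simply not in the tree. If they come back with a near match, the label handling when the tree was written or read is the place to look.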