Q2-ghost-tree: which pre-built tree should I use?


(Pierre Joubert) #1

Hello! Not sure if this is the right place to post this, but I am interested in using one of your pre-built trees to analyze a small ITS dataset in QIIME 2, and I'm having trouble deciding which pre-built tree to use. I used the 97% UNITE database from 01.12.2017 to make my OTU table, so I know I am supposed to use the ghost-tree that corresponds to that database, but there are 6 different options to choose from. From the little bit of searching I've done, it sounds like the number after "ghost_tree" in the file names on GitHub corresponds to the similarity threshold used to put the trees together. Is that correct? If so, what threshold would you recommend for my analysis? Also, half of the folders corresponding to versions of the UNITE database have an "s_" before them. Could you explain what this means and which folder to choose? I'd really appreciate your input!

Thanks,
Pierre


Q2-ghost-tree Plugin: Community Tutorial for Creating Hybrid-Gene Phylogenetic Trees
(Jennifer Fouquier) #2

Hi Pierre! Posting here works. :grinning:

The link to the 01.12.2017 ghost trees is here, and this screenshot shows the options for the UNITE 97% database:

You can use the 100, 90, or 80 tree. I prefer the 80% tree, which is what gave me the best results, but your results may differ depending on your dataset. Yes, you're right that the number is the similarity level. It is a ghost-tree option that regroups the OTUs at the chosen percent-similarity threshold (100% does not regroup them; it keeps them as they are).

The problem with the 100% clustered seqs is that sequences without names get discarded, because ghost-tree requires nomenclature. The 80% ghost-tree makes bigger OTU clusters, which gives more of them a way to be grafted onto the 18S tree, and the 18S tree captures the overall phylogeny better. As you can imagine, this decreases the quality of the tree, but it captures more "unidentified" organisms. So the 100% tree throws away more data, while the 80% tree is slightly less accurate but contains more OTUs. If you're curious, you could try a few.

I would compare the PCoA results you get with ghost-tree (e.g., UniFrac) to those from a qualitative metric (binary Jaccard) and a quantitative one (Bray-Curtis). Just pay attention to artifacts in your data. If for some reason your dataset does not work well with ghost-tree, there is a chance that you just can't use UniFrac for your analysis. Hopefully it works, though! Most people I have helped have had good results, but a few said the results didn't look right to them. I don't know whether that was user error or ghost-tree discarding too much data.
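To make the binary Jaccard vs. Bray-Curtis comparison a bit more concrete, here is a minimal pure-Python sketch. It is not part of ghost-tree or QIIME 2, and the OTU table is made up for illustration. Binary Jaccard only looks at which OTUs are present in each sample, while Bray-Curtis also weighs how abundant they are. As a rough numeric check to go alongside eyeballing the PCoA plots, the sketch correlates the pairwise distances of two matrices (the statistic behind a Mantel test, without its permutation step). That correlation step is my own suggested addition, not something ghost-tree does; in a real analysis you would feed it distance matrices exported from QIIME 2, e.g. a UniFrac matrix built with a ghost-tree.

```python
from itertools import combinations
from math import sqrt

# Toy OTU table (hypothetical numbers): sample -> OTU counts,
# with the same OTU order in every sample.
otu_table = {
    "S1": [10, 0, 5, 0],
    "S2": [8, 2, 0, 0],
    "S3": [0, 0, 7, 3],
}

def binary_jaccard(a, b):
    """Qualitative distance: only presence/absence of each OTU matters."""
    present_a = {i for i, c in enumerate(a) if c > 0}
    present_b = {i for i, c in enumerate(b) if c > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1.0 - len(present_a & present_b) / len(union)

def bray_curtis(a, b):
    """Quantitative distance: differences in OTU counts are weighted."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total

def pairwise(table, metric):
    """Distance for every pair of samples, keyed by (sample, sample)."""
    names = sorted(table)
    return {(p, q): metric(table[p], table[q]) for p, q in combinations(names, 2)}

def pearson(dm1, dm2):
    """Pearson correlation of two distance matrices over the same sample pairs
    (Mantel-style statistic, no permutation test)."""
    keys = sorted(dm1)
    x = [dm1[k] for k in keys]
    y = [dm2[k] for k in keys]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

jaccard_dm = pairwise(otu_table, binary_jaccard)
bc_dm = pairwise(otu_table, bray_curtis)
print(jaccard_dm)
print(bc_dm)
print(round(pearson(jaccard_dm, bc_dm), 3))
```

A high correlation only means the two metrics rank sample pairs similarly. It is not proof that the ghost-tree results are right, so the PCoA plots and an eye on artifacts still matter.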

The "s_" UNITE database ghost trees are in a different folder from the link I sent you. I'm not the best person to explain what the "s_" means. UNITE has updated its databases, and it likely explains what this means and why people should or shouldn't select it. If the database you used does not have the "s_" like you see here under the "media" tab, then you won't need it. If you figure out more details about this, let me know.

Let me know if you have any more questions.

Jennifer