Ghost tree filtering?


(Irene) #1

Hi, I am trying to use a ghost tree for fungal ITS sequences and I’ve been following the tutorial but I am having trouble using the pre-made ghost trees and filtering the table to match the feature_ids. I thought there might be some file that just lists all the ghost tree IDs somewhere and you filter using that somehow, but I couldn’t find one.

"All feature_ids must be present as tip names in phylogeny. feature_ids not corresponding to tip names (n=3430): "

Can someone explain how exactly you filter your table so that it gets rid of anything not in the ghost tree?

I used a 2/2/2019 UNITE version to do the vclust (I am also confused as to why you cannot use the output from the taxonomy classifier). I could only find 2017 ghost trees, so I originally picked the one matching the classifier I was using (unite-ver7-dynamic-classifier-01.12.2017.qza). Do those versions have to be identical?

Any help would be very appreciated!
Thank you!

(Matthew Ryan Dillon) #2

In the meantime, @ihoxie, have you seen this tutorial?

(Jennifer Fouquier) #3

Hi @ihoxie, glad you are using the pre-made ghost tree, but it sounds like it’s not working well for you for some reason. n=3430 is not good! It could be because you’re using a tree that doesn’t match the names in your table.

To clarify, there is some confusion with the terms (no fault of yours). There is the ghost-tree software tool, and there are also ghost trees, which are just the .nwk trees that come out of running the ghost-tree tool. We call them ghost trees because they are not true phylogenetic trees; each one is a hybrid of two databases.

There is an accession file included in each ghost tree folder that you can use to filter your table to match the ghost-tree.nwk files. These are the most recent trees I have made, and each folder contains the file you can use, although I see that there is now a 2018 UNITE version. I can make trees for that version in a few weeks. Where did you find a 2019 UNITE version? :slight_smile: Either way, you need to make sure the IDs in your table exactly match the IDs in the ghost tree .nwk file. If you want a custom ghost tree, you would have to install the ghost-tree plugin and build a new one.
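
One way to check the match is to pull the tip names out of the .nwk file and compare them with the feature IDs from your table. The snippet below is only a rough sketch in plain Python: `tip_names` and `missing_from_tree` are names made up for illustration, the regex assumes unquoted tip labels, and the example tree and IDs are placeholders rather than real UNITE accessions.

```python
import re


def tip_names(newick):
    """Return the set of tip labels found in a Newick string.

    Assumption: labels are unquoted. A tip label directly follows "(" or ",",
    whereas an internal node label follows ")" and is therefore skipped.
    """
    labels = re.findall(r"[(,]\s*([^():,;]+)", newick)
    return {label.strip() for label in labels if label.strip()}


def missing_from_tree(feature_ids, newick):
    """Return the feature IDs that are not tips of the tree, sorted."""
    return sorted(set(feature_ids) - tip_names(newick))


# Placeholder data for illustration only.
example_tree = "((ID1:0.1,ID2:0.2)node1:0.3,ID3:0.4);"
example_ids = ["ID1", "ID3", "ID9"]
print(missing_from_tree(example_ids, example_tree))
```

If the list that comes back is long, the table and the tree were probably built from different ID sets; if it is empty, every feature ID already has a matching tip.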

You cannot use the output from the taxonomy classifier because ghost-tree requires an OTU map. This is something I need to look into at a later time. ghost-tree was developed in 2014-2015 and published in 2016 when there was always an OTU map. So for now, ghost-tree just needs one. :slight_smile: However, if you are using the pre-built trees to run your diversity analysis, you should not need to install and run the “ghost-tree” software tool. You just need a phylogenetic tree (a prebuilt ghost tree .nwk) to run your diversity analysis. If you are following the ghost-tree community tutorial, make sure you’re not mixing up method 1 with method 2.

The link Matthew provided will help you filter your table to keep only sequences that are in the ghost tree so you can run your diversity analysis.

Let me know if you have any more questions. Responses may be delayed due to travel. -Jennifer

(Irene) #4

Hi, thanks for responding!

Yes I was trying to use a pre-built tree. I have the plug-in as well, but I didn’t try using that yet (I will give that a try, to make a custom tree instead).

When I try to filter using the accession file, I get a plugin error from diversity:

The rarefied table contains no samples or features.

That sounds like I filtered everything out and ended up with an empty table.
If I don’t filter, I get:

Plugin error from diversity:

All feature_ids must be present as tip names in phylogeny. feature_ids not corresponding to tip names (n=2094)

But the trees and the database match, I’ve opened the files and the format is the same, and if I search sequences, they’re in the tree, accession, and ref seqs files.

I have to use the metadata-based filtering, as that’s the only one that accepts a non-.qza file. If I try to import the accessions file, I get errors:

Traceback (most recent call last):

File “/Users/ihoxie/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/q2cli/”, line 140, in import_data

….(more errors… saying the same thing with different lines)

IndexError: list index out of range

As for the UNITE database, I see I downloaded a file named UNITE_public_02.02.2019.fasta (I was using a 2017 version for the ghost tree so the names would match), but now I can’t find where I got it from. I can only find the 2018 ones, so I’m not sure what happened there!

(Jennifer Fouquier) #5

Hi @ihoxie, sorry for the delay, I was at a conference.

Yes, you should be using the filter command that does not require a .qza. A .tsv or .txt should work.
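
In case it helps, metadata-based filtering reads the file as QIIME 2 metadata, which expects a header line whose first column is an ID header such as `feature-id`. As a sketch only, assuming the accession file is a plain list with one ID per line (I have not confirmed that against your copy), something like this would write a metadata-style .tsv. The function name and file names are hypothetical, and the demo IDs are placeholders.

```python
import os
import tempfile


def write_feature_metadata(accession_path, tsv_path):
    """Write the IDs from a one-ID-per-line file to a TSV with a header.

    Assumption: each non-blank line holds exactly one ID. Blank lines and
    repeated IDs are skipped. Returns the number of IDs written.
    """
    seen = set()
    ids = []
    with open(accession_path) as handle:
        for line in handle:
            feature_id = line.strip()
            if feature_id and feature_id not in seen:
                seen.add(feature_id)
                ids.append(feature_id)
    with open(tsv_path, "w") as out:
        out.write("feature-id\n")
        for feature_id in ids:
            out.write(feature_id + "\n")
    return len(ids)


# Small demo on placeholder IDs written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    accession_path = os.path.join(tmp, "accession_ids.txt")
    tsv_path = os.path.join(tmp, "tree_ids.tsv")
    with open(accession_path, "w") as handle:
        handle.write("ID1\nID2\n\nID1\n")
    print(write_feature_metadata(accession_path, tsv_path))
```

The resulting file could then be passed to `qiime feature-table filter-features` through `--m-metadata-file`. Whether a missing header is what emptied your table I can't say without seeing the files.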

I’m a little bit confused. Do you have a ghost tree you would like to use that matches the IDs in the table? If so, can you send me your files (everything you would need to run your diversity analysis)? At this point I’m not sure why they’re not working together. My email is jennifer.fouquier (at) ucdenver (dot) edu and I will see if I can find the issue. If the files are too large to attach, a Google Drive link works.