Ghost tree filtering?

ghost-tree
its
(Irene) #1

Hi, I am trying to use a ghost tree for fungal ITS sequences and I’ve been following the tutorial but I am having trouble using the pre-made ghost trees and filtering the table to match the feature_ids. I thought there might be some file that just lists all the ghost tree IDs somewhere and you filter using that somehow, but I couldn’t find one.

"All feature_ids must be present as tip names in phylogeny. feature_ids not corresponding to tip names (n=3430): "

Can someone explain how exactly you filter your table so that it gets rid of anything not in the ghost tree?

I used a 2/2/2019 UNITE version to do the vclust (I am also confused as to why you cannot use the output from the taxonomy classifier). I could only find 2017 ghost trees so I originally picked the one for the classifier I was using (unite-ver7-dynamic-classifier-01.12.2017.qza), so do those versions have to be identical?

Any help would be very appreciated!
Thank you!

2 Likes
(Matthew Ryan Dillon) #2

Pinging @Jennifer_Fouquier! :qiime2:

In the meantime, @ihoxie, have you seen this tutorial?

1 Like
(Jennifer Fouquier) #3

Hi @ihoxie, glad you are using the pre-made ghost tree, but it sounds like it’s not working well for you for some reason. n=3430 is not good! It could be because you’re using a tree that doesn’t match the names in your table.

So to clarify there is some confusion with the terms (no fault of yours). There is the ghost-tree software tool (italicized) and there are also ghost trees that are just the .nwk trees that are the output of running the ghost-tree tool. Basically we just call them ghost trees because they’re not a true phylogenetic tree… it’s a hybrid of two databases.

There is an accession file included in each ghost tree folder that you can use to filter your table to match the ghost-tree.nwk files. These are the most recent trees I have made, and each folder contains the file you can use, although I see that there is now a 2018 UNITE version. I can make trees for it in a few weeks. Where did you find a 2019 UNITE version? :slight_smile: Either way, you need to make sure the IDs in your table exactly match the IDs in the ghost tree .nwk file. If you want a custom ghost tree, you would have to install the ghost-tree plugin and build a new one.

You cannot use the output from the taxonomy classifier because ghost-tree requires an OTU map. This is something I need to look into at a later time. ghost-tree was developed in 2014-2015 and published in 2016 when there was always an OTU map. So for now, ghost-tree just needs one. :slight_smile: However, if you are using the pre-built trees to run your diversity analysis, you should not need to install and run the “ghost-tree” software tool. You just need a phylogenetic tree (a prebuilt ghost tree .nwk) to run your diversity analysis. If you are following the ghost-tree community tutorial, make sure you’re not mixing up method 1 with method 2.

The link Matthew provided will help you filter your table to keep only sequences that are in the ghost tree so you can run your diversity analysis.

Let me know if you have any more questions. Responses may be delayed due to travel. -Jennifer

2 Likes
(Irene) #4

Hi, thanks for responding!

Yes I was trying to use a pre-built tree. I have the plug-in as well, but I didn’t try using that yet (I will give that a try, to make a custom tree instead).

It seems that when I try to filter using the accession file, I get a Plugin error from diversity:

The rarefied table contains no samples or features.

Which sounds like I filtered everything out into an empty table.
If I don’t filter I get:

Plugin error from diversity:

All feature_ids must be present as tip names in phylogeny. feature_ids not corresponding to tip names (n=2094)

But the trees and the database match, I’ve opened the files and the format is the same, and if I search sequences, they’re in the tree, accession, and ref seqs files.

I have to filter using the metadata-based filtering, as that’s the only one that accepts a non-qza file. If I try to import the accessions file, I get errors:

Traceback (most recent call last):
  File "/Users/ihoxie/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/q2cli/tools.py", line 140, in import_data
….(more errors… saying the same thing with different lines)
IndexError: list index out of range

As for the UNITE database, I see I downloaded a file UNITE_public_02.02.2019.fasta (I was using a 2017 version for the ghost tree so the names would match), but now I can’t find where I got it from. I can only find the 2018 ones, so I’m not sure what happened there!

1 Like
(Jennifer Fouquier) #5

Hi @ihoxie, sorry for the delay, I was at a conference.

Yes, you should be using the filter command that does not require a .qza. A .tsv or .txt should work.
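
For anyone landing here with the same question, here is a minimal sketch of turning an accession list into a file that metadata-based filtering should accept. It assumes the accession file has one ID per line, optionally followed by tab-separated columns; I have not confirmed that layout for every ghost tree folder. The function name, file names, and IDs are placeholders for illustration, not part of ghost-tree.

```python
import os
import tempfile


def accessions_to_metadata(accession_path, metadata_path):
    """Write the IDs from an accession list as a one-column metadata TSV.

    Assumption: one ID per line, optionally followed by tab-separated
    columns, which are ignored. Blank lines, lines starting with '#',
    and repeated IDs are skipped.
    """
    ids = []
    seen = set()
    with open(accession_path) as handle:
        for line in handle:
            feature_id = line.split('\t')[0].strip()
            if not feature_id or feature_id.startswith('#') or feature_id in seen:
                continue
            seen.add(feature_id)
            ids.append(feature_id)
    with open(metadata_path, 'w') as handle:
        # 'feature-id' is one of the header names QIIME 2 treats as the ID column.
        handle.write('feature-id\n')
        for feature_id in ids:
            handle.write(feature_id + '\n')
    return len(ids)


# Tiny demonstration with made-up IDs written to a temporary directory.
workdir = tempfile.mkdtemp()
accession_file = os.path.join(workdir, 'accessions.txt')
metadata_file = os.path.join(workdir, 'tree-ids.tsv')
with open(accession_file, 'w') as handle:
    handle.write('ID_A\textra column\nID_B\n\nID_A\n')
print(accessions_to_metadata(accession_file, metadata_file), 'IDs written')
```

The resulting file can then be passed to qiime feature-table filter-features with --i-table, --m-metadata-file, and --o-filtered-table. If the filtered table comes out empty, that usually means none of the table's IDs appear in the list at all, which points to an ID mismatch rather than a filtering problem.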

I’m a little bit confused. Do you have a ghost tree you would like to use that matches the IDs in the table? If so, can you send me your files (everything you would need to run your diversity analysis)? At this point I’m not sure why they’re not working together. My email is jennifer.fouquier (at) ucdenver (dot) edu and I will see if I can find the issue. If the files are too large, a Google Drive link works.

1 Like
(Irene) #6

Hi @Jennifer_Fouquier ,
thanks for the reply, I attached the files. The table is the non-filtered table. I’ve been using the same versions of the ghost-tree, UNITE, and accessions file (01.12.2017), but the table doesn’t seem to match the accessions file even though the format is the same. I am sure the issue is probably just some file is not quite the right version. I’m about to try again.

Also, I tried making a custom tree using the plugin following the tutorial, and got a confusing error. Just wondering if this looks like an error resulting from one of the files or an error from maybe an incomplete download or something?

qiime ghost-tree scaffold-hybrid-tree-foundation-alignment \
  --i-otu-map extensions_otu_map_90.qza \
  --i-extension-taxonomy m2taxonomy2.qza \
  --i-extension-sequences sh_refs_qiime_ver7_dynamic_01.12.2017.qza \
  --i-foundation-alignment silva_fungi_only_full_aligned_132_FILTERED.qza \
  --o-ghost-tree ghost-tree-foundation-allignment-90-otus.qza \
  --verbose

Traceback (most recent call last):
  File "/Users/ihoxie/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
    results = action(**arguments)
  File "<decorator-gen-458>", line 2, in scaffold_hybrid_tree_foundation_alignment
  File "/Users/ihoxie/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
    output_types, provenance)
  File "/Users/ihoxie/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/qiime2/sdk/action.py", line 362, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/Users/ihoxie/q2-ghost-tree/q2_ghost_tree/_scaffold_hybrid_tree_foundation_alignment.py", line 44, in scaffold_hybrid_tree_foundation_alignment
    gt_path, graft_level, None)[0]
  File "/Users/ihoxie/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/ghosttree/scaffold/hybridtree.py", line 104, in extensions_onto_foundation
    graft_level)
  File "/Users/ihoxie/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/ghosttree/scaffold/hybridtree.py", line 205, in _extension_genus_accession_dict
    taxonomy = accession_taxonomy_dic[i]
KeyError: '296d0566e223449e84f48b2f3fafded4'

Plugin error from ghost-tree:

  '296d0566e223449e84f48b2f3fafded4'

See above for debug info.

Thanks for the help!

table-cr-973.qza (263.0 KB)
ghost-tree-midpoint-root2.qza (493.8 KB)
Metadatajustsamples.tsv (2.7 KB)

1 Like
(Jennifer Fouquier) #7

Sorry for the delay, as I have classes this semester. I got the files and will start looking into these issues tomorrow after a meeting I have. :slight_smile: -Jennifer

1 Like
(Jennifer Fouquier) #8

Hello Irene,

I have had a chance to look into this and unfortunately I haven’t been able to get to the bottom of it yet. :slightly_frowning_face::ghost::evergreen_tree:

I’m also getting the “All feature_ids must be present as tip names in phylogeny. feature_ids not corresponding to tip names” error when running qiime diversity, which I believe is related to underscores in the names in Newick files. The IDs definitely match! I can see them. See this issue if you’re interested: Errors when importing biomtable and tree files created by Ucluster into QIIME2

Previously, the UNITE files I have used did not have underscores, so I wasn’t aware that this would be a problem within the QIIME 2 environment. Underscores in names and IDs are pretty common. I tried replacing every _ with '_ in the ghost tree Newick file (the ' is the escape character that was recommended) and reran your diversity analysis, but it still failed. This time it is a new failure unrelated to the IDs not matching. I’m now getting the following traceback, which ultimately ends in a “list index out of range” plugin error from the qiime diversity plugin.

/Users/jenniferfouquier/anaconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/sklearn/utils/validation.py:595: DataConversionWarning: Data with input dtype float64 was converted to bool by check_pairwise_arrays.
  warnings.warn(msg, DataConversionWarning)
/Users/jenniferfouquier/anaconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/skbio/stats/ordination/_principal_coordinate_analysis.py:152: RuntimeWarning: The result contains negative eigenvalues. Please compare their magnitude with the magnitude of some of the largest positive eigenvalues. If the negative ones are smaller, it's probably safe to ignore them, but if they are large in magnitude, the results won't be useful. See the Notes section for more details. The smallest eigenvalue is -0.07012680524153422 and the largest is 3.719931633058869.
  RuntimeWarning
Traceback (most recent call last):
  File "/Users/jenniferfouquier/anaconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/q2cli/commands.py", line 274, in __call__
    results = action(**arguments)
  File "</Users/jenniferfouquier/anaconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/decorator.py:decorator-gen-375>", line 2, in core_metrics_phylogenetic
  File "/Users/jenniferfouquier/anaconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
    output_types, provenance)
  File "/Users/jenniferfouquier/anaconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/sdk/action.py", line 458, in _callable_executor_
    outputs = self._callable(scope.ctx, **view_args)
  File "/Users/jenniferfouquier/anaconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/q2_diversity/_core_metrics.py", line 55, in core_metrics_phylogenetic
    metric='faith_pd')
  File "</Users/jenniferfouquier/anaconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/decorator.py:decorator-gen-482>", line 2, in alpha_phylogenetic
  File "/Users/jenniferfouquier/anaconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/sdk/action.py", line 225, in bound_callable
    spec.view_type, recorder)
  File "/Users/jenniferfouquier/anaconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/sdk/result.py", line 287, in _view
    result = transformation(self._archiver.data_dir)
  File "/Users/jenniferfouquier/anaconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/core/transform.py", line 70, in transformation
    new_view = transformer(view)
  File "/Users/jenniferfouquier/anaconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/qiime2/core/transform.py", line 213, in wrapped
    return transformer(view.file.view(self._wrapped_view_type))
  File "/Users/jenniferfouquier/anaconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/q2_types/tree/_transformer.py", line 26, in _2
    return skbio.TreeNode.read(fh, format='newick', verify=False)
  File "/Users/jenniferfouquier/anaconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/skbio/io/registry.py", line 652, in read
    return registry.read(file, into=cls, format=format, **kwargs)
  File "/Users/jenniferfouquier/anaconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/skbio/io/registry.py", line 513, in read
    return self._read_ret(file, format, into, verify, kwargs)
  File "/Users/jenniferfouquier/anaconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/skbio/io/registry.py", line 520, in _read_ret
    return reader(file, **kwargs)
  File "/Users/jenniferfouquier/anaconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/skbio/io/registry.py", line 998, in wrapped_reader
    return reader_function(fhs[-1], **kwargs)
  File "/Users/jenniferfouquier/anaconda3/envs/qiime2-2019.1/lib/python3.6/site-packages/skbio/io/format/newick.py", line 307, in _newick_to_tree_node
    while current_depth == tree_stack[-1][1]:
IndexError: list index out of range

Maybe @thermokarst or @ebolyen know what this issue is or can point me somewhere.

Regarding running ghost-tree, it looks like you’re using the qiime ghost-tree scaffold-hybrid-tree-foundation-alignment command correctly. I’m sorry you’re having so much trouble with all of this. It’s definitely not you. I would love to see these files. Can you attach all of them or send them to jennifer.fouquier (at) ucdenver.edu?

1 Like
(Irene) #10

Hi @Jennifer_Fouquier, sorry for the delay! I’ve attached the files for the qiime ghost-tree scaffold-hybrid-tree-foundation-alignment command. If it’s a qiime2 environment issue, I could try it in QIIME1 or maybe earlier versions of QIIME2 or with earlier versions of the files/UNITE to see if it works then.

extensions_otu_map_90.qza (490.9 KB)
m2taxonomy2.qza (1.3 MB)
sh_refs_qiime_ver7_dynamic_01.12.2017.qza (3.6 MB)
silva_fungi_only_full_aligned_132_FILTERED.qza (458.2 KB)

1 Like
(Matthew Ryan Dillon) #11

Hey there @Jennifer_Fouquier! I started back at the beginning of this thread and worked my way through, and I have some ideas. The main issue was a Feature ID mismatch. I was just reviewing the Ghost Tree Tutorial and noticed a discrepancy between what you suggest there and what @ihoxie has done here. By looking at the provenance for extensions_otu_map_90.qza, I noticed that this file was built using the extensions_cluster command in q2-ghost-tree, but the input sequences were @ihoxie’s FeatureData[Sequence] from q2-dada2! My understanding is that the extensions_cluster step should have used the FeatureData[Sequence] file sh_refs_qiime_ver7_dynamic_01.12.2017.qza, because that is the list of features @ihoxie is attempting to build the foundation alignment from.

As well, it appears that the Feature IDs are subtly different between the extension taxonomy and the extension sequences, @ihoxie: the IDs in the taxonomy have suffixes after the Feature ID, which changes the entire ID, so even after recomputing the OtuMap, you will still have issues there.

TLDR: input files to this command are a bit of a jumble right now, I think if you get things in order you will be all set.
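
To make the KeyError above concrete, here is a rough sketch of the lookup that fails. It assumes the OTU map is a tab-separated text file in which every field on a line is a sequence ID; if the first column in your file is a cluster label instead, drop it before checking. The function name and the SH-style IDs are made up for illustration. Only the hash is taken from the traceback.

```python
def otu_map_ids_missing_from_taxonomy(otu_map_lines, taxonomy_ids):
    """Return every ID in the OTU map that has no entry in the taxonomy.

    Assumption: each OTU-map line is a tab-separated row of sequence IDs.
    """
    missing = []
    for line in otu_map_lines:
        for member in line.rstrip('\n').split('\t'):
            if member and member not in taxonomy_ids:
                missing.append(member)
    return missing


# Toy data: the second cluster was built from sequences whose IDs are hashes
# (as denoised features typically are), so they cannot be looked up in the
# reference taxonomy.
taxonomy_ids = {'SH0001', 'SH0002'}
otu_map = ['SH0001\tSH0002', '296d0566e223449e84f48b2f3fafded4']

missing = otu_map_ids_missing_from_taxonomy(otu_map, taxonomy_ids)
print(missing)
```

Any ID reported here would be a candidate for the same KeyError, since the tree-building step looks each OTU-map ID up in the taxonomy.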

(Jennifer Fouquier) #13

Hey all! So sorry, I had a family emergency come up and I’m really behind on everything. Realistically, I hope to catch up next week. Thanks in advance. -Jennifer

1 Like
(Matthew Ryan Dillon) #14

Sorry to hear that @Jennifer_Fouquier, I hope all is well :heart_decoration:


@ihoxie – I think you can start moving forward based on my recommendations in the meantime.

(Irene) #15

No rush @Jennifer_Fouquier, hope everything goes okay!
@thermokarst Thanks for checking it out!
The feature ID issue was from trying to match the pre-made trees to my data as opposed to using the plugin, but I guess there’s a mismatch in both.
When I’d tried using the sh_refs_qiime_ver7_dynamic_01.12.2017.qza for the extensions_cluster step and then running the ghost-tree scaffold-hybrid-tree-foundation-alignment,
I got an error:
Traceback (most recent call last):
  File "/Users/ihoxie/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
    results = action(**arguments)
  File "", line 2, in scaffold_hybrid_tree_foundation_alignment
  File "/Users/ihoxie/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
    output_types, provenance)
  File "/Users/ihoxie/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/qiime2/sdk/action.py", line 362, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/Users/ihoxie/q2-ghost-tree/q2_ghost_tree/_scaffold_hybrid_tree_foundation_alignment.py", line 44, in scaffold_hybrid_tree_foundation_alignment
    gt_path, graft_level, None)[0]
  File "/Users/ihoxie/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/ghosttree/scaffold/hybridtree.py", line 104, in extensions_onto_foundation
    graft_level)
  File "/Users/ihoxie/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/ghosttree/scaffold/hybridtree.py", line 205, in _extension_genus_accession_dict
    taxonomy = accession_taxonomy_dic[i]
KeyError: 'SH124384.07FU_AY997045_refs_singleton'

Plugin error from ghost-tree:

  'SH124384.07FU_AY997045_refs_singleton'

See above for debug info.

I looked through the files and can’t find that ID in all of them, but the format appears identical, so I’m not sure how to tell whether the files match. Sorry for the confusion!

(Matthew Ryan Dillon) #16

That error is because of this:

This is an issue with the source data — the IDs should be the same between the taxonomy and the sequences, otherwise how can you figure out which one belongs to which? Maybe you should double check that you used the right input files (for example, didn’t mix-and-match database versions).
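
One way to double check, sketched here under some assumptions: the sequences are in a FASTA file whose record ID is the header text up to the first whitespace, and the taxonomy is a tab-separated file with the ID in the first column. The function names and the REF_ IDs are invented for the example. The near-miss search is quadratic, so it is meant for spot checks rather than for a full database.

```python
def fasta_ids(lines):
    """Return the ID of every FASTA record (header text up to the first space)."""
    ids = set()
    for line in lines:
        if line.startswith('>'):
            header = line[1:].split()
            if header:
                ids.add(header[0])
    return ids


def taxonomy_ids(lines):
    """Return the first tab-separated field of every non-blank taxonomy line."""
    return {line.split('\t')[0].strip() for line in lines if line.strip()}


def compare_ids(seq_ids, tax_ids):
    """Summarise the overlap and list IDs that differ only by trailing text."""
    only_seqs = seq_ids - tax_ids
    only_tax = tax_ids - seq_ids
    near_misses = sorted(
        (s, t) for s in only_seqs for t in only_tax
        if s.startswith(t) or t.startswith(s)
    )
    return {
        'shared': len(seq_ids & tax_ids),
        'only_in_sequences': len(only_seqs),
        'only_in_taxonomy': len(only_tax),
        'near_misses': near_misses,
    }


# Made-up records: the second taxonomy ID carries an extra suffix.
fasta = ['>REF_0001', 'ACGT', '>REF_0002', 'TTGA']
taxonomy = ['REF_0001\tk__Fungi', 'REF_0002_suffix\tk__Fungi']
report = compare_ids(fasta_ids(fasta), taxonomy_ids(taxonomy))
print(report)
```

A non-empty near_misses list suggests the two files use different ID schemes for the same records, which is what you would expect when database variants have been mixed.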

1 Like
(Jennifer Fouquier) #17

@thermokarst thanks for looking into this for me. I agree with your comment about the feature IDs not matching, but that was later when she was trying to build a custom tree after the pre-built tree use failed for her.

Earlier in the thread (on her first file upload, not the second one) there is still an issue that she identified that is possibly caused by underscores in the IDs. I’m having the same issue she is where I see the IDs matching between her feature table and the pre-built ghost tree .nwk but it’s still failing when we run qiime diversity. I tried escaping the underscores but it still failed. This part should be super simple for users because they’re just using the qiime diversity plugin with a pre-built ghost tree .nwk file, not q2-ghost-tree. Any ghost tree .nwk is just a phylogenetic tree that they can use in qiime diversity analysis. Do you or any of the team have any insight about this underscore issue? Thank you!

1 Like
(Matthew Ryan Dillon) #19

Hi @Jennifer_Fouquier! I don’t think there is an underscore related issue — I dove down into the provenance of @ihoxie’s input files and it looks like two different versions of the database were accidentally used (taxonomy from one db variant, seqs from another).

Screengrab from the UNITE DB:

| QZA | UUID | Source File | DB variant |
|---|---|---|---|
| m2taxonomy.qza | 5fc4a97c-d600-420f-8e5d-c921c055747b | sh_taxonomy_qiime_ver7_dynamic_s_01.12.2017.txt | “Includes singletons set as RefS (in dynamic files).” |
| sh_refs_qiime_ver7_dynamic_01.12.2017.qza | 37fdc128-7b5d-4ab0-b49c-ee30021f02e8 | sh_refs_qiime_ver7_dynamic_01.12.2017.fasta | “Includes global and 97% singletons.” |

So, the taxonomy was imported from the first row above (“Includes singletons set as RefS”), while the seqs were imported from the second row (“Includes global and 97% singletons”). These two different versions of the database have two different ID schemes that don’t overlap. @ihoxie, my suspicion is that this was done by accident. If that is the case, go ahead and choose two files from the same database, then try again.

Thanks!

(Jennifer Fouquier) #20

@thermokarst, we’re looking at different files. :slight_smile: I’m not talking about when she tried to build her own ghost tree. She should be able to use the pre-built trees, and that still wasn’t working for her. That’s why she later tried to build a custom ghost tree (and yes, those IDs are mismatched). These are the files reposted from her earlier post on this thread. I think if you read this and then re-read my previous comment it should make sense. She should just be able to use these files and run qiime diversity. Let me know if I need to clarify more.

Thanks!

table-cr-973.qza (263.0 KB)
ghost-tree-midpoint-root2.qza (493.8 KB)
Metadatajustsamples.tsv (2.7 KB)

1 Like
(Matthew Ryan Dillon) #21

Yep, all makes sense. I think that is actually the same problem, a case of mismatched IDs. Check out this demonstration:

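import qiime2
import skbio
import biom

# Load the feature table and the ghost tree artifacts
table_artifact = qiime2.Artifact.load('table-cr-973.qza')
tree_artifact = qiime2.Artifact.load('ghost-tree-midpoint-root2.qza')

# View each artifact as its in-memory Python type
table = table_artifact.view(biom.Table)
tree = tree_artifact.view(skbio.TreeNode)

# Feature IDs in the table vs. tip names in the parsed tree
table_ids = set(table.ids(axis='observation'))
tip_ids = {tip.name for tip in tree.tips()}

# Number of table IDs, number of tips, and number of IDs found in both
print(len(table_ids))
print(len(tip_ids))
print(len(table_ids.intersection(tip_ids)))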

Those last three print:

3157
23574
0

As to why, I think your underscore comment leads us to the answer here:

So if ghost-tree is producing trees with underscores in the IDs, those IDs will need to be wrapped in single quotes; otherwise the underscores will turn into spaces!
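
Here is a small standard-library sketch of that behaviour. It only mimics the Newick convention that underscores in unquoted labels are read as spaces; it is not the scikit-bio parser. The helper names and IDs are made up. Note that the whole label goes inside a matching pair of single quotes, which is different from putting a quote in front of each underscore.

```python
def quote_newick_label(label):
    """Wrap a label in single quotes so a Newick reader keeps it verbatim.

    A literal single quote inside the label is written as two single quotes.
    """
    return "'" + label.replace("'", "''") + "'"


def read_unquoted_label(label):
    """Mimic how an unquoted Newick label is read: underscores become spaces."""
    return label.replace('_', ' ')


# Made-up feature IDs that contain underscores.
table_ids = {'SH0001_refs', 'SH0002_reps'}

# What the tip names look like after an unquoted tree has been parsed.
tips_as_parsed = {read_unquoted_label(i) for i in table_ids}

print(len(table_ids & tips_as_parsed))
print(sorted(quote_newick_label(i) for i in table_ids))
```

With unquoted labels none of the IDs survive the round trip, which is consistent with the zero-sized intersection above even though the IDs look identical when you open the .nwk file in a text editor.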

Hope that helps! :qiime2: :t_rex: