Errors with .qza when normally this test file works well

I am stuck on plugin development and would love some guidance.

In the q2-ghost-tree plugin I am getting an error from ghost-tree (“taxonomy file must contain genera”). That is encouraging, because it means execution is at least reaching ghost-tree code, but the file does contain genera and I know this error well. This is a test file that I’ve used hundreds of times. q2-ghost-tree errors out on the .qza, yet ghost-tree accepted the same file just fine when it was a .txt. The .qza is in TSVTaxonomyDirectoryFormat and was imported successfully using qiime tools import. Any suggestions on why the .qza is not being read correctly?

Finally, how do I safely update to 2017.8? I see install directions, but no upgrade directions. I’m scared to mess up my qiime2 + q2-ghost-tree development when I can see some results. :blush:

Thank you!

Hi @Jennifer_Fouquier!

Hmm, as you mention, this is a ghost-tree error, which means that ghost-tree is expecting data in a different shape/format than what you are currently handing it. There are a few possible reasons: you might be missing a view call somewhere, you might be passing around the wrong object, or it could be something else altogether. Is this code available somewhere so that I could take a peek? Also, have you looked at something like q2-alignment to see an example of how a typical plugin method might look? If you can provide a concrete minimum working example of your method's function, that would be super helpful!

Because we are using conda environments, we recommend users just install the latest version outright, in a new conda environment (which will be the case if you copy-and-paste our native install commands). Once you have done that and activated the new environment, you can install q2-ghost-tree in development mode (pip install -e ., or whatever it is you did previously!).

Thanks! :t_rex:


Thanks for the help! First, I updated to qiime2-2017.8 and verified that I'm still getting this error with the current qiime code base.

I pushed a draft to this branch.

This is the error I'm getting, so it's entering ghost-tree code.

Traceback (most recent call last):
  File "/Users/jenniferfouquier/miniconda3/envs/qiime2-2017.8/lib/python3.5/site-packages/q2cli/commands.py", line 222, in __call__
    results = action(**arguments)
  File "<decorator-gen-310>", line 2, in scaffold_hybrid_tree
  File "/Users/jenniferfouquier/miniconda3/envs/qiime2-2017.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 201, in callable_wrapper
    output_types, provenance)
  File "/Users/jenniferfouquier/miniconda3/envs/qiime2-2017.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 334, in _callable_executor_
    output_views = callable(**view_args)
  File "/Users/jenniferfouquier/repos/q2-ghost-tree/q2_ghost_tree/_scaffold_hybrid_tree.py", line 36, in scaffold_hybrid_tree
    foundation_alignment_fh, gt_path )[0]
  File "/Users/jenniferfouquier/repos/q2-ghost-tree/ghost-tree-master/ghosttree/scaffold/hybridtree.py", line 93, in extensions_onto_foundation
    extension_taxonomy_fh)
  File "/Users/jenniferfouquier/repos/q2-ghost-tree/ghost-tree-master/ghosttree/scaffold/hybridtree.py", line 161, in _extension_genus_accession_dic
    accession_taxonomy_dic = _create_taxonomy_dic(extension_taxonomy_fh)
  File "/Users/jenniferfouquier/repos/q2-ghost-tree/ghost-tree-master/ghosttree/scaffold/hybridtree.py", line 199, in _create_taxonomy_dic
    raise ValueError("Taxonomy file must contain genera")
ValueError: Taxonomy file must contain genera

I previously had the main code working with a custom ghost-tree type for taxonomy files (this is on the master branch), but after further discussion I want to avoid using a custom taxonomy type. So I'm currently trying to get the generic FeatureData[Taxonomy] to work.

This is what my .qza for this file looks like. It was imported via:

qiime tools import --input-path minitaxonomy.txt --type FeatureData[Taxonomy] --output-path headerlesstaxonomy.qza --source-format HeaderlessTSVTaxonomyFormat


Hey @Jennifer_Fouquier!

I suspect ghost-tree still needs the view type to be HeaderlessTSVTaxonomyFormat. An artifact’s internal representation is standardized to a single format (TSVTaxonomyFormat in this case).

That means the first line of that file will actually be:

Feature ID<tab>Taxon

Which is why the current code thinks there’s missing genera data.

It turns out there is no transformer defined for TSVTaxonomyFormat -> HeaderlessTSVTaxonomyFormat (I think @jairideout had a good reason for that, but I don’t remember the details).

So I bet if you change this line to:

 extension_taxonomy: pd.DataFrame,

and then write that DataFrame to a file without the header, ghost-tree would work.

Alternatively, you could have ghost-tree skip the header if it’s present.


“It turns out there is no transformer defined for TSVTaxonomyFormat -> HeaderlessTSVTaxonomyFormat (I think @jairideout had a good reason for that, but I don’t remember the details).”

See my comment on @Jennifer_Fouquier's corresponding GitHub issue:

@JTFouquier we intentionally didn't create a transformer to turn files with headers into headerless files, in order to discourage using/generating these types of files (it's generally considered a bad practice in data science). Thus, we support importing headerless taxonomy files in order to be compatible with popular reference databases (Greengenes, for example, doesn't have headers in its taxonomy files), but we don't support exporting into a headerless format.


Thanks Evan, this makes sense.

I remember this discussion with @jairideout about headerless files; I just wasn’t sure how I was going to implement things. I don’t really think it makes sense to have so many ghost-tree-specific types.

So I was able to remove the header from the DataFrame, but now I get 'DataFrame' object has no attribute 'close'… and this reminds me that I used .close() in ghost-tree. I could change ghost-tree to use with file.open() but I don’t think I can .open() a DataFrame.

So maybe the best option is to make ghost-tree look for headers like you suggest? Or add a try/except to close a file? hmmm…

Hey @Jennifer_Fouquier,

You would need to write the dataframe to a file for ghost-tree (it’s expecting to read a file, not a Python object like a DataFrame).
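A minimal sketch of that step, assuming the taxonomy arrives as a pandas DataFrame indexed by feature ID with a single Taxon column. The helper name `dataframe_to_headerless_tsv` and the two taxonomy rows are made up for illustration; they are not part of QIIME 2 or ghost-tree. The returned object is a real file handle, so code that calls `.close()` on it keeps working:

```python
import tempfile

import pandas as pd


def dataframe_to_headerless_tsv(taxonomy: pd.DataFrame):
    """Write `taxonomy` as headerless TSV and return a readable file handle.

    Hypothetical helper for illustration only.
    """
    fh = tempfile.TemporaryFile(mode="w+", newline="")
    # header=False drops the column names; index=True keeps the feature IDs
    # as the first column of each row.
    taxonomy.to_csv(fh, sep="\t", header=False, index=True)
    fh.flush()
    fh.seek(0)  # rewind so the consumer reads from the first row
    return fh


# Made-up example rows, not taken from the original test file.
taxonomy = pd.DataFrame(
    {"Taxon": ["k__Fungi;p__Ascomycota;g__Candida",
               "k__Fungi;p__Basidiomycota;g__Malassezia"]},
    index=pd.Index(["A1", "A2"], name="Feature ID"),
)

taxonomy_fh = dataframe_to_headerless_tsv(taxonomy)
lines = taxonomy_fh.read().splitlines()
taxonomy_fh.close()  # the temporary file is removed when closed
```

In a plugin method, the handle would be passed on to ghost-tree instead of being read back here; the read is only to show that no header line was written.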

However, since you’ll need to modify this function to accept grafting at generic taxonomic levels, you would probably have an easier time checking for the header there (and ignoring it if present). Then you can leave the kind of input that ghost-tree expects (a filehandle) alone.
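One way the header check could look, as a rough sketch rather than ghost-tree's actual code. It assumes the header's first field is exactly `Feature ID`, as described earlier in the thread; the generator name `iter_taxonomy_lines` and the sample row are hypothetical:

```python
import io

# Assumption: a QIIME 2 taxonomy file with a header starts with this field.
HEADER_FIRST_FIELD = "Feature ID"


def iter_taxonomy_lines(taxonomy_fh):
    """Yield non-empty taxonomy lines, skipping a leading header if present."""
    for line_number, line in enumerate(taxonomy_fh):
        line = line.rstrip("\r\n")
        if not line:
            continue
        # Only the very first line is checked against the header marker.
        if line_number == 0 and line.split("\t")[0] == HEADER_FIRST_FIELD:
            continue
        yield line


# The same made-up row, with and without the header line.
with_header = io.StringIO("Feature ID\tTaxon\nA1\tk__Fungi;g__Candida\n")
without_header = io.StringIO("A1\tk__Fungi;g__Candida\n")

rows_with_header = list(iter_taxonomy_lines(with_header))
rows_without_header = list(iter_taxonomy_lines(without_header))
```

Both inputs produce the same rows, so the calling code can keep accepting a filehandle regardless of whether the file came from a .qza or from a plain headerless .txt.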
