Transformer help


(Jennifer Fouquier) #1

ghost-tree needs Silva, which is in RNA format.

I discussed making AlignedRNAFASTAFormat (because there was only AlignedDNAFASTAFormat) with the original Silva developer (he sits next to me :slight_smile:) and he thinks we should indeed just convert the RNA to DNA prior to importing, or during import I suppose. So is this a transformer? What’s the easiest way to do this? Examples, please?

This is what I did but I don’t think it’s the best way to do things.

q2-ghost-tree works, I just need to get the transformer stuff fixed, conda envs sorted out and unit testing done.

Thx :slight_smile:


(Matthew Ryan Dillon) #3

Hey there @Jennifer_Fouquier!

Cool, I think you have a few options here. I will lay them out based on the answer to the following question:

Is FeatureData[AlignedSequence] the right semantic type for this data?

(Why am I asking this question?)

I ask because, as a programmer-turned-pseudo-biologist, I am not sure what the considerations of using this semantic type on RNA seqs would be. Pinging @gregcaporaso & @Nicholas_Bokulich for their input.

If “Yes”…

You could define a new format, AlignedRNAFASTAFormat (and a corresponding directory format, AlignedRNASequencesDirectoryFormat; examples of this below). Since the semantic type FeatureData[AlignedSequence] already exists, you don’t need to register it. Then you can define a transformer that converts AlignedRNAFASTAFormat to AlignedDNAFASTAFormat:

def _my_great_transformer(ff: AlignedRNAFASTAFormat) -> AlignedDNAFASTAFormat:
    # convert RNA to DNA, output is a new instance of AlignedDNAFASTAFormat
    ...

With q2-ghost-tree, a user would then import their RNA seqs as AlignedRNAFASTAFormat:

qiime tools import \
  --input-path my-ghosttree-RNA-seqs.fasta \
  --type 'FeatureData[AlignedSequence]' \
  --source-format AlignedRNAFASTAFormat \
  --output-path my-ghosttree-DNA-seqs.qza

The transformer will be invoked while importing, so the data will be converted from RNA to DNA as it is loaded. The user will not have access to the RNA sequences in this artifact, which may or may not be a problem. I just want you to be aware that by the time the data makes it into an Artifact, it will be DNA.
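
For illustration, the conversion itself can be sketched in plain Python, with no QIIME 2 install needed. The helper names `rna_to_dna` and `convert_fasta` are hypothetical, and the sketch assumes a simple FASTA layout in which header lines start with `>`. In a real plugin, logic along these lines would live inside the transformer function that gets registered with the plugin.

```python
import io


def rna_to_dna(seq: str) -> str:
    """Replace uracil with thymine, preserving case and gap characters."""
    return seq.replace("U", "T").replace("u", "t")


def convert_fasta(src, dst) -> None:
    """Copy FASTA records from src to dst, converting sequence lines only."""
    for line in src:
        if line.startswith(">"):
            dst.write(line)  # header lines may legitimately contain "U"
        else:
            dst.write(rna_to_dna(line))


# In-memory stand-ins for the source (RNA) and destination (DNA) files.
rna = io.StringIO(">a\n-A-AUU-\n>b\n-----U-\n>c\nCCCCCC-\n")
dna = io.StringIO()
convert_fasta(rna, dna)
print(dna.getvalue())
```

Only sequence lines are rewritten, so an identifier that happens to contain a "U" is left untouched.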


As promised, some examples of how the different types, file formats, and directory formats fit together in the case of aligned DNA seqs:

The artifact_format for FeatureData[AlignedSequence] is AlignedDNASequencesDirectoryFormat, which in turn is the directory format representation of AlignedDNAFASTAFormat.

Also, what I am proposing above (defining a new transformer) is very similar to how the two BIOM formats work in QIIME 2 (V1.0.0 & V2.1.0). Any FeatureTable[Frequency | RelativeFrequency | PresenceAbsence | Balance | Composition] created in QIIME 2 will be saved as BIOMV210DirFmt, because the artifact_format is specified as such. There is also a transformer defined from BIOMV100Format to BIOMV210Format. When you import and specify the source format, that transformer is invoked, which is what allows users to import other formats of feature tables.

If “No”…

In this case, it probably just makes more sense to create a new Method on your plugin that will accept a FeatureData[AlignedRNASequence] as input and will return a new FeatureData[AlignedSequence] as output (the input would be an all new semantic type). An example of that would be creating a relative frequency feature table: the method accepts FeatureTable[Frequency] as input and produces FeatureTable[RelativeFrequency] as output. The method’s signature annotation looks like this:

def relative_frequency(table: biom.Table) -> biom.Table:
    # etc
    ...

In your case, this would look something like this:

def convert_rna_to_dna(data: AlignedRNAFASTAFormat) -> AlignedDNAFASTAFormat:
    # etc
    ...

Conclusion

Yowza, that was a mouthful! Okay, I probably missed something, so if you have any questions, please don’t hesitate to ask them! :t_rex: :qiime2:


(Jennifer Fouquier) #5

Hey! Thanks so much for the details and links!

So I pushed commits last night after the import seemed to work with the command below, but the data format is incorrect:

qiime tools import \
  --type FeatureData[AlignedSequence] \
  --input-path ../small_test_files/silva_fungi_only_tiny.txt \
  --source-format AlignedRNAFASTAFormat \
  --output-path silva-fungi-only-082718-tiny-k.qza

Also, I tried testing this morning and the same commit is not working as it did last night. I’ve been getting inconsistent results, which is really painful.

If you or any developer out there has a few mins to help me with the transformer that would be awesome! My community tutorial is drafted :slight_smile: Will post soon I hope.

q2-ghost-tree repository
AlignedRNAFASTAFormat
Here’s the transformer attempt
Not sure if plugin setup is registering things correctly…

Thank you!


(Matthew Ryan Dillon) #7

Hey there @Jennifer_Fouquier! We are trying to get the release out the door today, so you might not hear back from us on this until early next week. Thanks for your patience! :heart:


(Matthew Ryan Dillon) #8

Hey there @Jennifer_Fouquier!

I tried installing your plugin today, but ran into the following issue:

qiime dev refresh-cache                                                                                                                                                                       master e963529
QIIME is caching your current deployment for improved performance. This may take a few moments and should only happen once per deployment.
Traceback (most recent call last):
  File "/Users/matthew/.conda/envs/q2dev/bin/qiime", line 11, in <module>
    sys.exit(qiime())
  File "/Users/matthew/.conda/envs/q2dev/lib/python3.5/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Users/matthew/.conda/envs/q2dev/lib/python3.5/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/Users/matthew/.conda/envs/q2dev/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/matthew/.conda/envs/q2dev/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/matthew/.conda/envs/q2dev/lib/python3.5/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/matthew/.conda/envs/q2dev/lib/python3.5/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/Users/matthew/.conda/envs/q2dev/lib/python3.5/site-packages/q2cli/dev.py", line 27, in refresh_cache
    import q2cli.cache
  File "/Users/matthew/.conda/envs/q2dev/lib/python3.5/site-packages/q2cli/cache.py", line 301, in <module>
    CACHE = DeploymentCache()
  File "/Users/matthew/.conda/envs/q2dev/lib/python3.5/site-packages/q2cli/cache.py", line 61, in __init__
    self._state = self._get_cached_state(refresh=refresh)
  File "/Users/matthew/.conda/envs/q2dev/lib/python3.5/site-packages/q2cli/cache.py", line 107, in _get_cached_state
    self._cache_current_state(current_requirements)
  File "/Users/matthew/.conda/envs/q2dev/lib/python3.5/site-packages/q2cli/cache.py", line 200, in _cache_current_state
    state = self._get_current_state()
  File "/Users/matthew/.conda/envs/q2dev/lib/python3.5/site-packages/q2cli/cache.py", line 238, in _get_current_state
    plugin_manager = qiime2.sdk.PluginManager()
  File "/Users/matthew/.conda/envs/q2dev/lib/python3.5/site-packages/qiime2/sdk/plugin_manager.py", line 44, in __new__
    self._init()
  File "/Users/matthew/.conda/envs/q2dev/lib/python3.5/site-packages/qiime2/sdk/plugin_manager.py", line 59, in _init
    plugin = entry_point.load()
  File "/Users/matthew/.conda/envs/q2dev/lib/python3.5/site-packages/pkg_resources/__init__.py", line 2325, in load
    return self.resolve()
  File "/Users/matthew/.conda/envs/q2dev/lib/python3.5/site-packages/pkg_resources/__init__.py", line 2331, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/Users/matthew/src/JTFouquier/q2-ghost-tree/q2_ghost_tree/plugin_setup.py", line 15, in <module>
    from ._tip_to_tip_distances import tip_to_tip_distances
ImportError: No module named 'q2_ghost_tree._tip_to_tip_distances'

Looking at the source repo, I don’t see a module called _tip_to_tip_distances.

Anyway, it also looks like while you have defined a transformer, it might not actually be registered. In order to do that, the module the transformer lives in needs to be imported in plugin_setup.py. One option to do that is like this. Another option is to just move all of the transformer code into plugin_setup.py.

Also, I think you need to remove this registration, since it is overwriting the type-to-format registration that should be there (cc @ebolyen, this seems like the kind of thing we should guard against in the framework…).


(Jennifer Fouquier) #10

@thermokarst somehow I excluded or ignored that file because I think it’s still WIP…but it’s not essential (utility script that exists in other worlds anyways). The transformer is essential :slight_smile: Sorry about that. Added and pushed some changes, but I’m still having issues with importing. I’m probably just not understanding things.

So I made the AlignedRNAFASTAFormat because somehow the transformer needs to convert it from RNA to DNA… so trying to import an RNA seq just gives me this error. But when I look at the transformer it doesn’t make sense to me how it knows to go from AlignedRNAFASTAFormat to AlignedDNAFASTAFormat from the import I use. But it needs a type and a format which is what I have…

Thank you!

(qiime2-2018.8) $ qiime tools import --input-path silva_fungi_only_tiny.txt --type FeatureData[AlignedSequence] --input-format AlignedRNAFASTAFormat --output-path silva_fungi_only_tiny.qza
Traceback (most recent call last):
  File "/Users/jenniferfouquier/anaconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/q2cli/tools.py", line 140, in import_data
    view_type=input_format)
  File "/Users/jenniferfouquier/anaconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/result.py", line 219, in import_data
    return cls.from_view(type, view, view_type, provenance_capture)
  File "/Users/jenniferfouquier/anaconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/result.py", line 244, in _from_view
    result = transformation(view)
  File "/Users/jenniferfouquier/anaconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/core/transform.py", line 70, in transformation
    new_view = transformer(view)
  File "/Users/jenniferfouquier/anaconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/core/transform.py", line 220, in wrapped
    file_view = transformer(view)
  File "/Users/jenniferfouquier/repos/q2-ghost-tree/q2_ghost_tree/_transformer.py", line 64, in _my_great_transformer
    write_fasta(ff2, seqs)
  File "/Users/jenniferfouquier/repos/q2-ghost-tree/q2_ghost_tree/_transformer.py", line 50, in write_fasta
    for desc, seq in seqs:
  File "/Users/jenniferfouquier/repos/q2-ghost-tree/q2_ghost_tree/_transformer.py", line 30, in parse_fasta
    f = iter(f)
TypeError: 'AlignedRNAFASTAFormat' object is not iterable

An unexpected error has occurred:

'AlignedRNAFASTAFormat' object is not iterable

See above for debug info.


(Jennifer Fouquier) #11

@thermokarst, hold up, I think I see the issue.


(Matthew Ryan Dillon) #12

Just a quick follow-up — your changes look great! I just had a quick peek through the repo :heart: I was able to get the import to work, with the transformation being invoked, but it required a minor tweak to your transformer:

index e5d7f8c..5d1494b 100644
--- a/q2_ghost_tree/_transformer.py
+++ b/q2_ghost_tree/_transformer.py
@@ -58,10 +58,11 @@ def write_fasta(f, seqs):
 def _my_great_transformer(ff: AlignedRNAFASTAFormat) -> \
         AlignedDNAFASTAFormat:

-    ff2 = AlignedDNAFASTAFormat()
-    seqs = parse_fasta(ff)
-
-    write_fasta(ff2, seqs)
+    with ff.open() as fh:
+        seqs = parse_fasta(fh)
+        ff2 = AlignedDNAFASTAFormat()
+        with ff2.open() as fh2:
+            write_fasta(fh2, seqs)

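This opens both format objects and passes the resulting file handles, rather than the format objects themselves, to parse_fasta and write_fasta. It also keeps the file handle for the source data open long enough for the conversion to finish. There are a few other ways to get this to work, but this approach is pretty straightforward.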
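
The reason the source handle has to stay open can be reproduced with a small standalone sketch. It assumes, as the traceback suggests, that parse_fasta is a generator that reads lazily from whatever handle it is given. The `parse_fasta` below is a simplified, hypothetical stand-in, not the plugin's code.

```python
import os
import tempfile


def parse_fasta(handle):
    """Lazily yield (description, sequence) pairs from an open FASTA handle."""
    desc, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if desc is not None:
                yield desc, "".join(chunks)
            desc, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if desc is not None:
        yield desc, "".join(chunks)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "seqs.fasta")
    with open(path, "w") as fh:
        fh.write(">a\n-A-AUU-\n>b\n-----U-\n")

    # Consuming the generator while the handle is still open works.
    with open(path) as fh:
        inside = [(d, s.replace("U", "T")) for d, s in parse_fasta(fh)]

    # Creating the generator inside the block but consuming it afterwards
    # fails, because nothing is read until iteration starts.
    with open(path) as fh:
        lazy = parse_fasta(fh)
    try:
        list(lazy)
        error = None
    except ValueError as exc:  # reading from a closed file
        error = exc

print(inside)
print(type(error).__name__)
```

This is why, in the diff above, the write happens inside the `with ff.open() as fh:` block: the records are only pulled from the source file as write_fasta iterates over them.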

My test file before import:

>a
-A-AUU-
>b
-----U-
>c
CCCCCC-

Then, after import:

>a
-A-ATT-
>b
-----T-
>c
CCCCCC-

And, the peek:

UUID:        ac46460f-de1c-4962-beae-00c59f5acc64
Type:        FeatureData[AlignedSequence]
Data format: AlignedDNASequencesDirectoryFormat

Woohoo! Looks like you got it! :tada: :champagne:


(Jennifer Fouquier) #13

Thanks @thermokarst! Yes, it’s working now. :partying_face: Thanks for all your help. I know you guys must be swamped!

@epruesse sat down to help me sort through a lot of stuff (funny since he’s the original Silva developer :smile: ). So transformer is good and ghost-tree is now formally merged into Bioconda. I know ghost-tree didn’t need to be in Bioconda, but I needed to learn anyways and it streamlines things a bit.

So just polishing docs/community tutorial and getting q2-ghost-tree on our Anaconda channel. :star_struck: