Transformer for existing data type

jwdebelius · April 7, 2020, 9:01pm

In another variation on the theme of “I’m in quarantine and programming sounds easier than writing”… I have a library of Python code that I’m trying to shoehorn into QIIME 2 for multiple reasons.

I have a handful of objects I’m operating on, most of which are DataFrames of unaligned sequences where the index is a sequence ID and the columns give positions. For example:

>>> import pandas as pd
>>> 
>>> seq_array = pd.DataFrame(data=[list('CAT'), list("WANT"), list('CAN')],
                             index=['1', '2', '3'])    
>>> print(seq_array)           
   0  1  2     3
1  C  A  T  None
2  W  A  N     T
3  C  A  N  None                        

>>> sequences = [seq_array]

If I’m feeling fancy, I may also convert those to dask delayed objects, because that’s my best attempt at parallelization, in which case I have a data representation that is a list of delayed sequence arrays.

>>> import dask
>>> 
>>> @dask.delayed
... def f(x):
...     return x
... 
>>> sequences = [f(seq_array)]

I have a function that is a cheap transformer: it takes an existing sequence format and converts it to a seq array (which should just be a feature data representation). I’m trying to figure out whether I should just call this function inside my own function or create a new format and converter.

Thanks,
Justine

thermokarst · April 8, 2020, 10:00pm

My gut is telling me to just keep it as a util function, rather than format/transformer/view type/etc. The advantage is that you can always convert it into the more idiomatic QIIME 2 stuff later, but if you don’t have a clear need for it now, then don’t worry about it.
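
As a rough sketch of the util-function route, assuming the existing sequences can be read as a plain mapping of sequence ID to sequence string: the names `to_seq_array` and `my_method` are made up for illustration and are not part of QIIME 2 or the library discussed above.

```python
import pandas as pd


def to_seq_array(records):
    """Hypothetical helper: turn {sequence ID: sequence string} into a
    DataFrame indexed by sequence ID with one column per position."""
    return pd.DataFrame(data=[list(seq) for seq in records.values()],
                        index=list(records.keys()))


def my_method(records):
    """Hypothetical method that converts on entry instead of relying on
    a registered format and transformer."""
    seq_array = to_seq_array(records)
    # Stand-in for the real work: count the filled positions per sequence.
    return seq_array.notna().sum(axis=1)


lengths = my_method({'1': 'CAT', '2': 'WANT', '3': 'CAN'})
```

Because the conversion lives in one small function, it could later be wrapped in a format and transformer without touching the method body.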

jwdebelius · April 8, 2020, 10:06pm

Thank you! That’s probably much easier.

Best,
Justine