In another variation on the theme of “I’m in quarantine and programming sounds easier than writing”… I have a library of Python code that I’m trying to shoehorn into QIIME 2 for multiple reasons.
I have a handful of objects I’m operating on, most of which are dataframes of unaligned sequences where the index is the sequence ID and the columns give positions. For example:
>>> import pandas as pd
>>>
>>> seq_array = pd.DataFrame(data=[list('CAT'), list('WANT'), list('CAN')],
...                          index=['1', '2', '3'])
>>> print(seq_array)
   0  1  2     3
1  C  A  T  None
2  W  A  N     T
3  C  A  N  None
>>> sequences = [seq_array]
If I’m feeling fancy, I may also convert those to dask delayed objects, because that’s my best attempt at parallelization. In that case my data representation is a list of delayed sequence arrays.
>>> import dask
>>>
>>> @dask.delayed
... def f(x):
...     return x
...
>>> sequences = [f(seq_array)]
I have a function that is a cheap transformer: it takes an existing sequence format and converts it to a seq array (which should just be a feature data representation). I’m trying to figure out whether I should just call this function inside my own function or create a new format and converter.
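To make the first option concrete, here is a rough sketch of what “calling it inside my function” could look like. Everything in it is a stand-in rather than my real code: I’m assuming the existing format is FASTA text, and the names `fasta_to_seq_array` and `my_method` are made up for illustration.

```python
import io

import pandas as pd


def fasta_to_seq_array(handle):
    """Hypothetical cheap converter: FASTA text -> seq array.

    Returns a DataFrame with one row per sequence ID and one column
    per position.
    """
    records = {}
    seq_id = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            # Header line: keep the first whitespace-delimited token as the ID
            seq_id = line[1:].split()[0]
            records[seq_id] = []
        elif seq_id is not None:
            # Sequence line: append one character per position
            records[seq_id].extend(line)
    # Unaligned sequences differ in length; pandas pads the short rows
    # with missing values
    return pd.DataFrame.from_dict(records, orient='index')


def my_method(fasta_handle):
    # Option 1: do the conversion inside the function itself, then
    # carry on with the list-of-seq-arrays representation
    seq_array = fasta_to_seq_array(fasta_handle)
    return [seq_array]


fasta = io.StringIO(">1\nCAT\n>2\nWANT\n>3\nCAN\n")
sequences = my_method(fasta)
```

The second option would move the body of `fasta_to_seq_array` into a registered converter, so that my function receives the seq array directly.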
Thanks,
Justine