In another variation on the theme of “I’m in quarantine and programming sounds easier than writing”… I have a library of Python code that I’m trying to shoehorn into QIIME 2 for multiple reasons.
I have a handful of objects I’m working with, most of which are DataFrames of unaligned sequences where the index is a sequence ID and the columns give positions. For example:
    >>> import pandas as pd
    >>>
    >>> seq_array = pd.DataFrame(data=[list('CAT'), list("WANT"), list('CAN')], index=['1', '2', '3'])
    >>> print(seq_array)
       0  1  2     3
    1  C  A  T  None
    2  W  A  N     T
    3  C  A  N  None
    >>> sequences = [seq_array]
If I’m feeling fancy, I may also convert those to dask delayed objects, because that’s my best attempt at parallelization, in which case my data representation is a list of delayed sequence arrays.
    >>> import dask
    >>>
    >>> @dask.delayed
    ... def f(x):
    ...     return x
    ...
    >>> sequences = [f(seq_array)]
I have a function that is a cheap transformer: it takes an existing sequence format and converts it to a seq array (which should just be a feature data representation). I’m trying to figure out whether I should just call this function inside my function or create a new format and converter.
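To make the first option concrete, here is a minimal sketch of calling the converter inside the function. Everything in it is an assumption for illustration: the post does not name the existing sequence format, so FASTA text is used as a stand-in, and `fasta_to_seq_array`, `count_positions`, and `count_positions_from_fasta` are hypothetical names, not part of QIIME 2 or of the library described above.

```python
import io

import pandas as pd


def fasta_to_seq_array(handle):
    """Hypothetical cheap converter: FASTA text -> seq array."""
    records = {}
    seq_id = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            # Header line: the first whitespace-delimited token is the ID.
            seq_id = line[1:].split()[0]
            records[seq_id] = []
        elif seq_id is not None:
            # Sequence line: append one character per position.
            records[seq_id].extend(line)
    # Rows are sequence IDs, columns are positions; shorter sequences
    # are padded with missing values, as in the example above.
    return pd.DataFrame(data=list(records.values()),
                        index=list(records.keys()))


def count_positions(seq_array):
    """Hypothetical downstream function that expects a seq array."""
    return seq_array.notna().sum(axis=1)


def count_positions_from_fasta(handle):
    """Option 1: do the conversion inside the function itself."""
    return count_positions(fasta_to_seq_array(handle))


fasta = io.StringIO(">1\nCAT\n>2\nWANT\n>3\nCAN\n")
lengths = count_positions_from_fasta(fasta)
print(lengths)
```

The trade-off in this sketch is that every function accepting the original format has to repeat the conversion call. The second option would move `fasta_to_seq_array` out of the function and let the framework apply it, so the function itself only ever sees a seq array.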