I am currently working on integrating
qiime2 has limited options for shotgun preprocessing, so I'd like to load sequences as the semantic type FeatureData[Sequences] in order to run taxonomic classification on them and provide a FeatureTable[Frequency|RelativeFrequency] for downstream analysis.
q2-shogun did a similar thing (qiime2/q2-shogun on GitHub, a QIIME 2 plugin wrapper for the SHOGUN shallow shotgun sequencing taxonomy profiler), but they don't describe how they imported data into qiime2, which is the crucial step for me at this point.
I think the types you want will be of the SampleData variety rather than the FeatureData variety.
We kind of model the difference between the two as a "feature selection" step, where you decide which of these raw sequences are actually features, and then you create a corresponding FASTA file, FeatureData[Sequences], which represents what's in the FeatureTable[...]. This seems like precisely what mOTU-tool is doing (at least in the "profile" bit of their diagram).
Kind of the most basic feature-selector would be this action:
This is essentially the "no-op" selector, where anything unique is a feature. It's intended to be used by further clustering workflows, but I think it's a good example of the kind of thing you are making.
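To make the "anything unique is a feature" idea concrete, here is a minimal pure-Python sketch of that kind of selector. It is not the QIIME 2 action itself, just an illustration: the function name `dereplicate`, the dict-based inputs and outputs, and the choice of a SHA-1 hash as feature ID are all assumptions made for the example.

```python
import hashlib
from collections import Counter


def dereplicate(samples):
    """Treat every unique sequence as a feature (hypothetical helper).

    samples: dict mapping sample ID -> iterable of sequence strings
             (a stand-in for SampleData-style per-sample reads).
    Returns (table, rep_seqs):
      table[sample_id][feature_id] = count   (FeatureTable-like)
      rep_seqs[feature_id] = sequence        (FeatureData-like)
    """
    table = {}
    rep_seqs = {}
    for sample_id, seqs in samples.items():
        counts = Counter()
        for seq in seqs:
            seq = seq.upper()
            # Assumption: use the SHA-1 of the sequence as its feature ID.
            feature_id = hashlib.sha1(seq.encode("ascii")).hexdigest()
            rep_seqs[feature_id] = seq
            counts[feature_id] += 1
        table[sample_id] = dict(counts)
    return table, rep_seqs


if __name__ == "__main__":
    reads = {"s1": ["ACGT", "acgt", "TTGA"], "s2": ["TTGA"]}
    table, rep_seqs = dereplicate(reads)
    for sample_id, counts in table.items():
        print(sample_id, counts)
```

The point is the shape of the result: per-sample reads go in, and out come a table of counts plus one representative sequence per feature ID, with the feature IDs tying the two together. A profiler like mOTU-tool would replace the "unique sequence" rule with its own notion of what a feature is.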
In terms of importing that data, it should look identical to importing amplicon sequences. We're currently not making any strong distinction between the two.
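For what it's worth, a manifest-based import is one common way to do this. The sketch below only builds a single-end manifest file from a directory of reads; the function name `write_manifest`, the `.fastq.gz` suffix, and the "file name minus suffix is the sample ID" rule are assumptions for illustration, and the exact type and format you need (single-end vs. paired-end, for instance) depend on your data.

```python
import csv
from pathlib import Path


def write_manifest(fastq_dir, manifest_path, suffix=".fastq.gz"):
    """Write a tab-separated single-end manifest (hypothetical helper).

    Assumes one file per sample, named <sample-id><suffix>.
    Returns the (sample_id, absolute_path) rows that were written.
    """
    fastq_dir = Path(fastq_dir)
    rows = []
    for path in sorted(fastq_dir.glob(f"*{suffix}")):
        sample_id = path.name[: -len(suffix)]
        rows.append((sample_id, str(path.resolve())))
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["sample-id", "absolute-filepath"])
        writer.writerows(rows)
    return rows


# With QIIME 2 installed, the manifest could then be imported roughly like
# this (type and format chosen here as an assumption, not from the thread):
#
#   from qiime2 import Artifact
#   demux = Artifact.import_data(
#       "SampleData[SequencesWithQuality]",
#       "manifest.tsv",
#       view_type="SingleEndFastqManifestPhred33V2",
#   )
#   demux.save("demux.qza")
```

Nothing in the manifest says whether the reads are amplicon or shotgun, which is why the import looks the same for both.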
@misialq (who is working on some shotgun stuff as well), do you have any other pointers?
Not much to add here. I agree with @ebolyen: I would start by importing the data as a SampleData type and continue from there to using the mOTU-tool. Good luck!