How to load preprocessed single-end or paired-end data into QIIME2?

Hello,

I am currently working on integrating mOTU-tool into qiime2. However, qiime2 has limited options for shotgun preprocessing, so I'd like to load already preprocessed sequences as the semantic types FeatureData[PairedEndSequences] and FeatureData[Sequences], run taxonomic classification on them, and provide a FeatureTable[Frequency|RelativeFrequency] for downstream analysis.
I see that q2-shogun did a similar thing (GitHub - qiime2/q2-shogun: A QIIME 2 plugin wrapper for the SHOGUN shallow shotgun sequencing taxonomy profiler), but they don't describe how they imported the data into qiime2, which is the crucial step for me.

Cheers
Valentyn

Hey @crusher083!

I think the types you want will be of the SampleData variety rather than the FeatureData variety.

We kind of model the difference between the two as a "feature selection" step, where you decide which of these raw sequences are actually features, and then you create a corresponding fasta file, FeatureData[Sequences], which represents what's in the FeatureTable[...]. This seems like precisely what mOTU-tool is doing (at least in the "profile" bit of their diagram).

Kind of the most basic feature-selector would be this action:

https://docs.qiime2.org/2022.8/plugins/available/vsearch/dereplicate-sequences/

This is essentially the "no op" selector, where anything unique is a feature. It's intended to be used by further clustering workflows, but I think it's a good example of the kind of thing you are making.
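To make the "anything unique is a feature" idea concrete, here is a minimal plain-Python sketch of that kind of selector. It is not how vsearch or q2-vsearch implements dereplication; the function name `dereplicate`, the toy reads, and the choice of SHA-1 hashes as feature IDs are all assumptions made for illustration.

```python
import hashlib
from collections import Counter, defaultdict


def dereplicate(sample_reads):
    """Treat every unique sequence as a feature (hypothetical helper).

    `sample_reads` maps a sample ID to a list of read sequences.
    Returns (table, feature_sequences):
      table             -> {sample ID: {feature ID: count}}, like FeatureTable[Frequency]
      feature_sequences -> {feature ID: sequence}, like FeatureData[Sequence]
    """
    table = defaultdict(Counter)
    feature_sequences = {}
    for sample_id, reads in sample_reads.items():
        for seq in reads:
            # Assumption: use the SHA-1 of the sequence as a stable feature ID.
            feature_id = hashlib.sha1(seq.encode("ascii")).hexdigest()
            feature_sequences[feature_id] = seq
            table[sample_id][feature_id] += 1
    return {s: dict(counts) for s, counts in table.items()}, feature_sequences


# Toy input: per-sample reads, i.e. the SampleData side of the split.
reads = {"s1": ["ACGT", "ACGT", "TTGA"], "s2": ["TTGA"]}
table, feature_sequences = dereplicate(reads)
```

The input plays the role of the SampleData artifact and the two return values play the role of the FeatureTable and FeatureData artifacts; a profiler would replace the "every unique read" rule with its own selection logic.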

In terms of importing that data, it should look identical to importing amplicon sequences. We're currently not making any strong distinction between the two.
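As a rough sketch of what that import could look like from Python, assuming demultiplexed paired-end FASTQ files with Phred33 quality scores and a QIIME 2 environment: the helper names, sample IDs, and file paths below are hypothetical. For single-end reads the manifest would instead have a single `absolute-filepath` column, with type `SampleData[SequencesWithQuality]` and format `SingleEndFastqManifestPhred33V2`.

```python
import csv
from pathlib import Path


def write_paired_end_manifest(samples, manifest_path):
    """Write a tab-separated paired-end manifest (hypothetical helper).

    `samples` maps a sample ID to a (forward FASTQ, reverse FASTQ) pair of paths.
    """
    manifest_path = Path(manifest_path)
    with manifest_path.open("w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(
            ["sample-id", "forward-absolute-filepath", "reverse-absolute-filepath"]
        )
        for sample_id, (fwd, rev) in sorted(samples.items()):
            # The manifest expects absolute paths.
            writer.writerow(
                [sample_id, str(Path(fwd).resolve()), str(Path(rev).resolve())]
            )
    return manifest_path


def import_paired_end(manifest_path, output_path):
    """Import the manifest as a SampleData artifact and save it as a .qza."""
    # Imported here so the manifest helper also works without QIIME 2 installed.
    from qiime2 import Artifact

    artifact = Artifact.import_data(
        "SampleData[PairedEndSequencesWithQuality]",
        str(manifest_path),
        view_type="PairedEndFastqManifestPhred33V2",
    )
    artifact.save(str(output_path))
    return artifact
```

Calling `write_paired_end_manifest({"sample-1": ("s1_R1.fastq.gz", "s1_R2.fastq.gz")}, "manifest.tsv")` followed by `import_paired_end("manifest.tsv", "reads.qza")` should then give an artifact that a plugin action can accept as input.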

@misialq (who is working on some shotgun stuff as well), do you have any other pointers?

Hey both,

Not much to add here - I agree with @ebolyen. I would start by importing the data as a SampleData type and go on from there to running the mOTU-tool. Good luck!

Best,
Michal
