qiime moshpit bin-contigs-metabat: FeatureData[MAG] nowhere to be found

Dear @misialq and @Nicholas_Bokulich lab,

When binning contigs using qiime moshpit bin-contigs-metabat, the output produced for --o-mags is of type SampleData[MAGs]. However, dereplicating MAGs and classifying MAGs using 'qiime moshpit classify-kraken2' both require a FeatureData[MAG] type instead. Browsing through your tutorials and help files, I am unable to work out how to find or create the FeatureData[MAG] artifact. What I did see in my temp dir was a hash directory for every MAG created, and the data folder contains a FASTA-format file. I suspect this is the data I am looking for. I am wondering why this is still in temp and not output as FeatureData[MAG]. Are they unfinished somehow?

Happy to hear from you on how you advise me to proceed.

Cheers,
Pieter

Hey @pietervanveelen,

Great question, thanks! The idea is that SampleData[MAGs] represents non-dereplicated MAGs (straight from binning), while FeatureData[MAG] corresponds to the dereplicated ones. The dereplication action currently implemented in q2-moshpit accepts any DistanceMatrix describing the similarity between all the MAGs in the SampleData artifact; we then use that similarity to find the non-redundant set of MAGs. This is not really documented anywhere yet, but one way to obtain such a matrix is with sourmash, through its QIIME 2 plugin. You can then use the matrix as input to the dereplicate action. If you want to see the steps involved in this process, you can check out our semi-official tutorial here.
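
To make the idea of "finding the non-redundant set from a distance matrix" more concrete, here is a minimal conceptual sketch in plain Python. It is not q2-moshpit's actual implementation: the function name `dereplicate`, the MAG IDs, the 0.99 similarity threshold, the assumption that distance = 1 - similarity, and the "first MAG seen becomes the representative" rule are all hypothetical choices made for illustration. The plugin may pick representatives differently.

```python
def dereplicate(ids, dist, threshold=0.99):
    """Greedy dereplication sketch (hypothetical, not the plugin's code).

    ids:       list of MAG identifiers
    dist:      square matrix (list of lists), dist[i][j] = 1 - similarity
    threshold: similarity at or above which two MAGs count as redundant
    """
    max_dist = 1.0 - threshold
    representatives = []   # indices of MAGs kept as non-redundant
    assignment = {}        # every MAG ID -> ID of its representative

    for i, mag in enumerate(ids):
        for r in representatives:
            if dist[i][r] <= max_dist:
                # Close enough to an already kept MAG: treat as a duplicate.
                assignment[mag] = ids[r]
                break
        else:
            # No kept MAG is similar enough, so this one is a new representative.
            representatives.append(i)
            assignment[mag] = mag

    return [ids[r] for r in representatives], assignment


# Made-up example: two near-identical bins from different samples, one distinct bin.
ids = ["sample1_bin1", "sample2_bin3", "sample1_bin2"]
dist = [
    [0.0,   0.002, 0.80],
    [0.002, 0.0,   0.79],
    [0.80,  0.79,  0.0],
]
kept, mapping = dereplicate(ids, dist)
print(kept)
print(mapping)
```

In this toy input, the two near-identical bins collapse onto one representative and the distinct bin is kept on its own, which is the SampleData-to-FeatureData step in miniature.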

Let me know if you need more information :slight_smile:

Cheers,
Michal

Hello @misialq,

Thanks, that is very helpful. I'll get onto trying that soon.
I'm gonna bug you about another moshpit request, but I'll create another topic for it.

Best,
Pieter

If sourmash is the recommended way to go, it would be great to include it in future core distributions of qiime2-metagenome...

Hi @Mechah,

Jumping in for @misialq here! This is currently in the works, and we are hoping to have q2-sourmash included in the metagenome distribution for the 2024.10 release. :slightly_smiling_face:

Cheers :lizard: