When binning contigs using qiime moshpit bin-contigs-metabat, the output produced for --o-mags is of type SampleData[MAGs]. However, both dereplicating MAGs and classifying MAGs with 'qiime moshpit classify-kraken2' require a FeatureData[MAG] type instead. Browsing through your tutorials and help files, I have been unable to find out how to locate or create the FeatureData[MAG] artifact. What I did see in my temp dir was a hash directory for every MAG created, and the data folder in each contains a FASTA-format file. I suspect this is the data I am looking for. I am wondering why it is still in temp and not output as FeatureData[MAG]. Are the MAGs unfinished somehow?
I would be happy to hear how you advise me to proceed.
Great question, thanks! The idea is that SampleData[MAGs] represents non-dereplicated MAGs (straight from binning), while FeatureData[MAG] corresponds to the dereplicated ones. The dereplication action currently implemented in q2-moshpit accepts any DistanceMatrix representing similarity between all the MAGs in the SampleData[MAGs] artifact; we then use that similarity to find the non-redundant set of MAGs. This is not really documented anywhere yet, but one way to obtain such a matrix is to use sourmash through its QIIME 2 plugin; you can then pass the result as input to the dereplicate action. If you want to see the steps involved in this process, you can check out our semi-official tutorial here.
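To make the idea of "finding the non-redundant set" a bit more concrete, here is a minimal, purely conceptual Python sketch. It is not the q2-moshpit implementation: the greedy strategy, the function name `dereplicate`, the example MAG IDs, and the 0.99 threshold are all assumptions made for illustration. It treats the matrix values as pairwise similarities and keeps a MAG only if it is not already represented by a sufficiently similar one.

```python
from typing import Dict, List, Sequence, Tuple


def dereplicate(
    mag_ids: Sequence[str],
    similarity: Sequence[Sequence[float]],
    threshold: float = 0.99,
) -> Tuple[List[str], Dict[str, str]]:
    """Hypothetical greedy dereplication over a pairwise similarity matrix.

    Returns the kept (representative) MAG IDs and a mapping from every
    MAG ID to the representative it was assigned to.
    """
    representatives: List[int] = []  # indices of MAGs kept so far
    assignment: Dict[str, str] = {}
    for i, mag_id in enumerate(mag_ids):
        # First already-kept MAG that is at least `threshold` similar, if any.
        match = next(
            (r for r in representatives if similarity[i][r] >= threshold),
            None,
        )
        if match is None:
            # Nothing similar enough has been kept: this MAG is non-redundant.
            representatives.append(i)
            assignment[mag_id] = mag_id
        else:
            # Redundant: fold it into the existing representative.
            assignment[mag_id] = mag_ids[match]
    return [mag_ids[r] for r in representatives], assignment


# Toy input (made-up values): mag_a and mag_b are near-identical.
mag_ids = ["mag_a", "mag_b", "mag_c"]
similarity = [
    [1.0, 0.995, 0.20],
    [0.995, 1.0, 0.21],
    [0.20, 0.21, 1.0],
]
kept, assignment = dereplicate(mag_ids, similarity, threshold=0.99)
print(kept)        # ['mag_a', 'mag_c']
print(assignment)  # {'mag_a': 'mag_a', 'mag_b': 'mag_a', 'mag_c': 'mag_c'}
```

In this toy example the two near-identical MAGs collapse into one representative, which is roughly the relationship between the per-sample bins and the dereplicated set. The actual action may choose representatives differently, so treat this only as a mental model.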
Jumping in for @misialq here! This is currently in the works, and we are hoping to have q2-sourmash included in the metagenome distribution for the 2024.10 release.