Hey @pietervanveelen,
great question, thanks! The idea is that SampleData[MAGs] represents non-dereplicated MAGs (straight from binning), while FeatureData[MAG] corresponds to the dereplicated ones. The dereplication action we have implemented in q2-moshpit so far accepts any DistanceMatrix describing pairwise similarity between all the MAGs in the SampleData artifact; we then use it to find the non-redundant set of MAGs.

This is not really documented anywhere yet, but one way to obtain such a matrix is to run sourmash through its QIIME 2 plugin and pass the resulting matrix as input to the dereplicate action. If you want to see the steps involved in this process, you can check out our semi-official tutorial here.
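To give a rough feel for what "using a distance matrix to find the non-redundant set" can mean, here is a minimal Python sketch. It is not the q2-moshpit implementation: the function name `dereplicate`, the `threshold` value, the toy MAG IDs and lengths, and the rule of keeping the longest MAG per group are all assumptions made purely for illustration.

```python
from collections import defaultdict


def dereplicate(ids, dist, lengths, threshold=0.01):
    """Group MAGs whose pairwise distance is <= threshold and keep one per group.

    ids       -- list of MAG identifiers, in the same order as the matrix rows
    dist      -- square, symmetric matrix (list of lists) of pairwise distances
    lengths   -- dict mapping MAG ID to genome length (hypothetical tie-breaker)
    threshold -- hypothetical cut-off below which two MAGs count as redundant
    """
    # Union-find structure: every MAG starts in its own group.
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the group root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the groups of any two MAGs that are closer than the threshold.
    for a_idx, a in enumerate(ids):
        for b_idx in range(a_idx + 1, len(ids)):
            if dist[a_idx][b_idx] <= threshold:
                parent[find(a)] = find(ids[b_idx])

    # Collect the members of each group.
    clusters = defaultdict(list)
    for i in ids:
        clusters[find(i)].append(i)

    # Assumption: the longest MAG in each group is kept as its representative.
    return {
        max(members, key=lambda m: lengths[m]): sorted(members)
        for members in clusters.values()
    }


# Toy data: mag_a/mag_b are near-identical, as are mag_c/mag_d.
ids = ["mag_a", "mag_b", "mag_c", "mag_d"]
dist = [
    [0.0, 0.005, 0.6, 0.7],
    [0.005, 0.0, 0.62, 0.71],
    [0.6, 0.62, 0.0, 0.002],
    [0.7, 0.71, 0.002, 0.0],
]
lengths = {"mag_a": 2_100_000, "mag_b": 2_400_000,
           "mag_c": 3_000_000, "mag_d": 2_900_000}

result = dereplicate(ids, dist, lengths)
print(result)
```

With this toy input, the four MAGs collapse into two groups, each represented by its longest member. The actual action works on QIIME 2 artifacts rather than plain lists, so treat this only as a picture of the general idea.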
Let me know if you need more information.
Cheers,
Michal