Gemelli "transform" functions

jwdebelius · September 4, 2024, 2:56pm

Hi All (but probably mostly @cmartino),

I'm working with a set of data where I have some technical effects across some subsets. I'd like to do some cross validation where I run rPCA on the first subset, and then see if I get similar results when I run it on a different subset. It looks like the transform and rpca-transform functions are what I need... except the documentation indicates they only work on the first PC? I think?

Could you clarify whether this is a reasonable use case, and whether the functions really are limited to PC 1?

Thanks,
Justine

cmartino · September 9, 2024, 7:11pm

Hi @jwdebelius,

Good question! The documentation could be clearer here, I will put in an issue for that.

The transformation (like sklearn's PCA transform of new data) will work for all PCs in the input ordination.

All you need is the transform function/command, which by default assumes the data are counts and rclr-transforms them for you. If you have already transformed the tables with rclr or some other transformation, use rclr_transform=False and the tables will be used as is.
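For anyone wondering what the rclr step does to a count table, here is a minimal NumPy sketch of the robust centered log-ratio idea: take the log of the nonzero counts and center each sample by the mean of its observed logs, leaving zeros as missing values. The helper name `rclr_sketch` and the samples-as-rows layout are assumptions made for illustration; this is not gemelli's own code, and the library's implementation may handle orientation and edge cases differently.

```python
import numpy as np


def rclr_sketch(counts):
    """Hypothetical helper, not part of gemelli.

    Log-transform the nonzero counts and center each sample (row) by the
    mean of its observed logs. Zeros are treated as missing (NaN).
    """
    counts = np.asarray(counts, dtype=float)
    logged = np.full(counts.shape, np.nan)
    observed = counts > 0
    logged[observed] = np.log(counts[observed])
    # Mean over observed entries only; assumes every sample has at least
    # one nonzero count.
    row_means = np.nanmean(logged, axis=1, keepdims=True)
    return logged - row_means


# Toy table: 2 samples (rows) x 3 features (columns).
table = np.array([[10, 0, 5],
                  [3, 3, 0]])
result = rclr_sketch(table)
print(result)
```

Each sample's observed values sum to zero after centering, and the zero counts stay missing rather than being replaced with a pseudocount.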

The output ordination will contain all the samples from the input ordination (training) and unseen samples (test) found in the input tables for PC1-PCN (where N is the rank used in the RPCA for the training data).
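To make the PCA analogy concrete, here is a small NumPy sketch of projecting unseen samples onto every axis learned from a training set, then stacking training and test scores into one table. It uses ordinary PCA via SVD on random toy data, not RPCA or gemelli itself, and all variable names are illustrative assumptions. The point is only that the projection yields one column per retained component, not just the first.

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(size=(20, 6))  # 20 training samples, 6 features
test = rng.normal(size=(5, 6))    # 5 unseen samples, same 6 features

n_components = 3  # plays the role of the rank N chosen for training

# Fit on the training data only: center it, then take the top right
# singular vectors as the principal axes.
train_mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - train_mean, full_matrices=False)
components = vt[:n_components]

# Project both sets with the training mean and training axes.
train_scores = (train - train_mean) @ components.T
test_scores = (test - train_mean) @ components.T

# One output table holding training and unseen samples for PC1..PCN.
combined = np.vstack([train_scores, test_scores])
print(combined.shape)  # (25, 3)
```

The test samples are centered with the training mean rather than their own, which is what keeps the two sets in the same coordinate system.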

Let me know if anything is unclear.

Thanks for using RPCA!

Cameron

jwdebelius · September 9, 2024, 7:23pm

Thanks so much @cmartino!

It would be super helpful if you could update the docs!

Best,
Justine

cmartino · September 10, 2024, 4:24pm

Will do, issue put in here, set for the next version. Thanks!