Dear all,

Thanks again to Sigurd for today. I personally wasn’t expecting an RKHS/functional-analysis perspective on the GPLVM, but it makes sense to want to look at it that way first, so thank you twice.

A) For those not that familiar with the RKHS view (I think the reading group went over it a bit last year), the following resources will be useful:

The main book in this area, again by Schölkopf and Smola (Sigurd, I think your worldview matches nicely with Schölkopf's, as he is one of the main ML theoreticians in this space): [I have a copy if anyone wants to borrow it; the library should have more]

The J. S-T and N. C. book: [chapter 3 has most of the basic intro and theory for RKHS, Mercer’s theorem, etc.]

B) Theory and GPs in terms of compactness, universal approximation properties, and proofs of denseness.

That's one of the main ones in this area (in ML-land; surely there is much more in Stats/Math-land, disguised under the Brownian motion view or Kriging estimators).

A lot of it is a bit beyond me atm, but maybe someone can guide us through it one day. Relevant follow-ups: [section 3.1 is relevant] [on kernel embeddings of measures and MMD; quite heavy on theory]


Now of course this is for the standard setting, and I am not sure how much of it would carry through given the random-variable (RV) inputs of the GPLVM/Deep GP construction. I would guess a lot, since it's a composition of GPs, but it sounds messy to show and I haven’t seen any theory like that developed in that space. There might be a nice theoretical paper in this. I don’t think Damianou shows anything like that in his PhD thesis or papers.
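
To make "a composition of GPs" concrete, here is a minimal NumPy sketch of drawing one prior sample from a two-layer construction, where the output of the first GP becomes the (random) input of the second. This is only an illustration under my own assumptions: an RBF kernel, a one-dimensional hidden layer, and the helper names rbf_kernel and sample_gp_layer are all made up for this example and are not taken from any of the papers mentioned. It only samples from the prior; it does no inference over the latent inputs.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of a (n, d) and b (m, d)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return variance * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)

def sample_gp_layer(inputs, rng, jitter=1e-6):
    """Draw one zero-mean GP sample evaluated at the rows of inputs."""
    n = len(inputs)
    # Jitter on the diagonal keeps the Cholesky factorisation stable.
    K = rbf_kernel(inputs, inputs) + jitter * np.eye(n)
    L = np.linalg.cholesky(K)
    return L @ rng.standard_normal((n, 1))

rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 50)[:, None]  # deterministic inputs, shape (50, 1)
h = sample_gp_layer(x, rng)              # first layer:  h = f1(x)
y = sample_gp_layer(h, rng)              # second layer: y = f2(h), inputs are now random
```

The point of the sketch is the last line: the second kernel matrix is built on h, which is itself a random draw, and that is exactly where the standard-setting results stop applying directly.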

Finally, for the DGP construction, a somewhat relevant paper (although not on the RKHS view) on better understanding what that composition entails for effective depth, ergodicity, sampling, etc. is by some suspects well known to us:

Best, Theo