I want to compare the two different smoother matrices used for smoothing splines and kernel smoothers, respectively. For unexplained notation, please refer to the book.
In the case of smoothing splines, the smoother matrix $S_\lambda$ is symmetric and positive semidefinite, and it can be expressed in Reinsch form $S_\lambda = (I + \lambda K)^{-1}$ because the $\lambda$-dependence is based on a very simple operation of perturbation type. This has two consequences. First, the eigenvalues are all positive, and they may decay to zero quickly except for a few larger ones; hence $\operatorname{trace}(S_\lambda)$, which is the sum of these eigenvalues, roughly gives the dimension of the subspace associated with the larger eigenvalues, justifying that this trace has something to do with degrees of freedom. Second, the Reinsch form makes it possible to understand how $\operatorname{trace}(S_\lambda)$ depends on $\lambda$, and therefore one can calculate meaningful values of $\lambda$ for a given number of degrees of freedom.
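To make both consequences concrete, here is a minimal numerical sketch of the Reinsch form in Python; the second-difference penalty matrix below is just a convenient stand-in for the actual spline penalty matrix $K$, not the exact one:

```python
import numpy as np

# Reinsch form S_lambda = (I + lambda*K)^{-1}; the second-difference
# penalty below is a simple stand-in for the true spline penalty K.
N = 20
D = np.diff(np.eye(N), n=2, axis=0)            # (N-2) x N second-difference operator
K = D.T @ D                                    # symmetric positive semidefinite

for lam in [0.1, 1.0, 10.0]:
    S = np.linalg.inv(np.eye(N) + lam * K)
    eig = np.sort(np.linalg.eigvalsh(S))[::-1] # real, since S is symmetric
    # eigenvalues are 1/(1 + lambda*d_k): a few near 1, the rest decaying to 0
    print(f"lambda={lam:5.1f}  top eigenvalues={eig[:4].round(3)}  "
          f"trace(S)={S.trace():.2f}")
```

Since each eigenvalue $1/(1 + \lambda d_k)$ is non-increasing in $\lambda$, the trace decreases monotonically, which is exactly what makes solving $\operatorname{trace}(S_\lambda) = \mathrm{df}$ for $\lambda$ well posed.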
I now think that both consequences break down in the case of kernel smoothers, in the following sense: one would have to do some extra work to find out whether the situation is APPROXIMATELY the same as in the case of smoothing splines.
Let's have a look at the smoother matrix itself, whose $i$th row reads
$$\frac{1}{\sum_{k=1}^N K_\lambda(x_i, x_k)} \bigl( K_\lambda(x_i, x_1), \dots, K_\lambda(x_i, x_N) \bigr),$$
where $x_1, \dots, x_N$ are the training inputs. It has to be given row by row and is NOT just a product of matrices as in the case of smoothing splines. Furthermore, the $\lambda$-dependence is much more complex and non-linear, and there won't be a Reinsch form in general. Worse, such smoother matrices are usually neither symmetric nor positive semidefinite.
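For concreteness, here is a small sketch that builds such a matrix row by row, assuming a Nadaraya-Watson smoother with a Gaussian kernel (my choice for illustration, not necessarily the book's example):

```python
import numpy as np

# Nadaraya-Watson smoother matrix built row by row:
# (S_lambda)_ij = K_lambda(x_i, x_j) / sum_k K_lambda(x_i, x_k).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 15))     # irregular training inputs
lam = 0.1                                  # lambda enters inside the exponential,
                                           # i.e. non-linearly -- no Reinsch form
W = np.exp(-(x[:, None] - x[None, :])**2 / (2.0 * lam**2))
S = W / W.sum(axis=1, keepdims=True)       # each row normalized by its own sum

print("symmetric:", np.allclose(S, S.T))   # generally False: different rows are
                                           # scaled by different constants
```

One caveat, though: as long as the kernel itself is symmetric, $S_\lambda = D^{-1}W$ with $W$ symmetric and $D$ the diagonal matrix of row sums, and this is similar to the symmetric matrix $D^{-1/2} W D^{-1/2}$; the eigenvalues therefore stay real (and positive for a strictly positive definite kernel such as the Gaussian) even though $S_\lambda$ itself is not symmetric.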
To demonstrate this, I have calculated two examples (see the attached pdf). The first kernel is linear, and the corresponding smoother matrix is almost symmetric, with visible symmetry in the rows. This can easily be broken by making the kernel somewhat non-linear, as in my second kernel. Still, one sees some symmetries in the smoother matrix, and as a consequence the eigenvalues are still real. I think this is due to the symmetry and homogeneity of my training inputs $x_i$. I am pretty sure I can construct smoother matrices with complex eigenvalues using less homogeneous training input.
But what, then, is the connection between the trace and the decay of the eigenvalues? I think one should rather look at the sum of the absolute values of the eigenvalues. But then the problem would be how to choose the equivalent kernel, as this choice would be based on possibly complex eigenvectors...
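To probe both points numerically, the following sketch uses a per-point (adaptive) bandwidth, which makes the kernel genuinely asymmetric; this is my own construction for illustration. It checks for complex eigenvalues and compares the trace with the sum of the absolute values of the eigenvalues:

```python
import numpy as np

# Asymmetric smoother: each row i uses its own bandwidth h_i, so
# K(x_i, x_j) != K(x_j, x_i) and S is no longer similar to a symmetric
# matrix -- complex eigenvalues are no longer ruled out.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 25))
h = 0.05 + 0.3 * x                         # per-point (adaptive) bandwidths

W = np.exp(-(x[:, None] - x[None, :])**2 / (2.0 * h[:, None]**2))
S = W / W.sum(axis=1, keepdims=True)

eig = np.linalg.eigvals(S)
print(f"max |imaginary part|: {np.abs(eig.imag).max():.3e}")
print(f"trace(S)            : {S.trace():.3f}")
print(f"sum of |eigenvalues|: {np.abs(eig).sum():.3f}")
```

Whether complex eigenvalues actually show up depends on the inputs and bandwidths; the point is only that nothing forces them to be real once the similarity to a symmetric matrix is gone, and that $\operatorname{trace}(S_\lambda) = \sum_i \lambda_i$ (a real number, since complex eigenvalues come in conjugate pairs) can then differ noticeably from $\sum_i |\lambda_i|$.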
All in all, I think that the MEANINGFUL use of $\operatorname{trace}(S_\lambda)$ as a kind of number of degrees of freedom would pretty much depend on the structure of the training data. But even if the data is nice and the interpretation justified: since there is no Reinsch form, it is not clear that the trace depends on $\lambda$ in a monotone way, and hence one would have to discuss all values of $\lambda$ when working backward from a given trace (I would not know whether standard software produces all possible values of $\lambda$ or only returns a convenient one).
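As a minimal safeguard, one can at least scan $\operatorname{trace}(S_\lambda)$ over a grid of $\lambda$ values before inverting it for a target degrees-of-freedom value. For the Gaussian Nadaraya-Watson case above, the trace happens to be monotone (every row sum grows with $\lambda$), but that is a property of this particular kernel, not a general guarantee:

```python
import numpy as np

# Scan trace(S_lambda) over a lambda grid to check monotonicity before
# working backward from a target degrees-of-freedom value.
rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0.0, 1.0, 25))

def trace_S(lam):
    W = np.exp(-(x[:, None] - x[None, :])**2 / (2.0 * lam**2))
    return (W / W.sum(axis=1, keepdims=True)).trace()

lams = np.logspace(-2, 0, 30)
traces = np.array([trace_S(lam) for lam in lams])
print("monotone decreasing:", bool(np.all(np.diff(traces) <= 0)))
```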