Representer Theorem
Formal Statement of the Representer Theorem
Suppose you have an RKHS, denoted by $H$, with reproducing kernel $k: X \times X \to \mathbb{R}$, and you are given a dataset of points $x_1, \dots, x_n$ from $X$. Consider a regularized empirical risk minimization problem of the form:

$$\min_{f \in H} \left\{ L\big(f(x_1), \dots, f(x_n)\big) + \lambda \, \|f\|_H^2 \right\}$$

where:
- $L$ is any loss function that depends only on the values of $f$ at the data points;
- $\lambda > 0$ is a regularization parameter;
- $\|f\|_H$ is the norm of $f$ in the RKHS.
Representer Theorem:
Any minimizer $f^*$ of this problem admits the representation:

$$f^*(\cdot) = \sum_{i=1}^{n} \alpha_i \, k(\cdot, x_i)$$

for some coefficients $\alpha_1, \dots, \alpha_n \in \mathbb{R}$.
Proof Outline: Why Minimizers are Finite Kernel Expansions
The key idea behind the representer theorem is that, for objectives depending only on the values of $f$ at the sample points and on the RKHS norm, any function in $H$ can be decomposed into two parts: one lying in the span of the kernel functions $k(\cdot, x_i)$ and one orthogonal to this span. The orthogonal component does not affect the empirical loss, yet any nonzero orthogonal component strictly increases the RKHS norm, making it suboptimal under regularization. Therefore, the optimal solution must lie entirely in the finite-dimensional span of the kernel functions at the data points.
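Written out, the argument is short. The following sketch uses the reproducing property $f(x_i) = \langle f, k(\cdot, x_i)\rangle_H$ and introduces the symbol $V = \operatorname{span}\{k(\cdot, x_1), \dots, k(\cdot, x_n)\}$ purely for convenience:

```latex
% Sketch of the orthogonality argument, with V = span{k(., x_1), ..., k(., x_n)}.
\begin{align*}
  f &= f_{\parallel} + f_{\perp},
      \qquad f_{\parallel} \in V,\quad
      \langle f_{\perp}, k(\cdot, x_i) \rangle_H = 0 \ \text{for all } i, \\
  f(x_i) &= \langle f, k(\cdot, x_i) \rangle_H
          = \langle f_{\parallel}, k(\cdot, x_i) \rangle_H
          = f_{\parallel}(x_i)
      && \text{(reproducing property)} \\
  \|f\|_H^2 &= \|f_{\parallel}\|_H^2 + \|f_{\perp}\|_H^2
      \;\ge\; \|f_{\parallel}\|_H^2
      && \text{(Pythagoras)}
\end{align*}
% The loss sees only f(x_i), which is unchanged by dropping f_perp, while the
% penalty can only decrease; hence any minimizer has f_perp = 0 and lies in V.
```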
Consequences: Tractable Optimization in Infinite-Dimensional Spaces
The practical impact of the representer theorem is profound. Although the RKHS may be infinite-dimensional, the solution to the regularized risk minimization problem always lives in an n-dimensional subspace determined by the data. This reduces the original infinite-dimensional optimization problem to a finite-dimensional one, which can be solved efficiently using linear algebra techniques. This property underpins the computational feasibility of many kernel methods in machine learning, such as support vector machines and kernel ridge regression.
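As a concrete illustration, here is a minimal kernel ridge regression sketch in plain NumPy. The Gaussian kernel, the bandwidth `gamma`, the toy sine data, and the function names are assumptions made for this example; the point is that fitting reduces to solving an $n \times n$ linear system for the coefficients $\alpha$, and predictions are exactly the finite kernel expansion the theorem guarantees.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-gamma * sq_dists)

def fit_kernel_ridge(X, y, lam=0.1, gamma=1.0):
    """Solve (K + lam*I) alpha = y; the minimizer is f*(.) = sum_i alpha_i k(., x_i)."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict(X_train, alpha, X_new, gamma=1.0):
    """Evaluate the finite kernel expansion f*(x) = sum_i alpha_i k(x, x_i)."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha

# Toy 1D regression problem (illustrative data).
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)

alpha = fit_kernel_ridge(X, y, lam=0.1, gamma=0.5)
print(predict(X, alpha, np.array([[0.0], [1.5]]), gamma=0.5))
```

Solving the $n \times n$ system directly costs $O(n^3)$, which is the price of working with the $n$-dimensional representation rather than the full RKHS.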
Geometric Intuition: The Projection Interpretation
Geometrically, the representer theorem can be understood as a projection. The set of all functions in the RKHS that agree with a given vector of values at the training points forms an affine subspace. Among these, the unique function closest to the origin in the RKHS norm is the orthogonal projection onto the finite-dimensional span of the $k(\cdot, x_i)$; since the regularized risk rewards exactly this minimum-norm choice, the minimizer must be of this form. The projection view explains why the solution is both data-adaptive and regularized for smoothness or complexity control.
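To make the projection picture concrete, consider the pure interpolation case (hard constraints $f(x_i) = y_i$ and no further loss term). The following sketch of the standard computation uses the Gram matrix $K_{ij} = k(x_i, x_j)$ and assumes $K$ is invertible:

```latex
% Minimum-norm interpolation: minimize ||f||_H subject to f(x_i) = y_i for all i.
% By the theorem it suffices to search over f = sum_j alpha_j k(., x_j):
\begin{align*}
  f(x_i) &= \sum_{j=1}^{n} \alpha_j \, k(x_i, x_j) = (K\alpha)_i = y_i
      \quad\Longrightarrow\quad \alpha = K^{-1} y, \\
  \|f\|_H^2 &= \Big\langle \sum_i \alpha_i k(\cdot, x_i),\ \sum_j \alpha_j k(\cdot, x_j) \Big\rangle_H
           = \alpha^{\top} K \alpha.
\end{align*}
% Any other interpolant differs from this f only by a component orthogonal to the
% span, so this f is exactly the orthogonal projection described above.
```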
Regularization in RKHS
Regularization in the RKHS context refers to the addition of a penalty term, typically the squared RKHS norm $\|f\|_H^2$, to the empirical loss. This term discourages overly complex or "rough" functions by penalizing high-norm solutions, thereby promoting smoothness and improving generalization in learning tasks.
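A small, self-contained numerical check of this trade-off (the Gaussian kernel, bandwidth, and noisy sine data are illustrative assumptions, not part of the text above): as $\lambda$ grows, the training fit loosens while the penalty term $\|f\|_H^2 = \alpha^{\top} K \alpha$ of the fitted function shrinks.

```python
import numpy as np

# Noisy sine samples (illustrative assumption).
rng = np.random.default_rng(1)
X = rng.uniform(-3.0, 3.0, size=(40, 1))
y = np.sin(X[:, 0]) + 0.2 * rng.standard_normal(40)

# Gaussian kernel Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2).
gamma = 0.5
sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2.0 * X @ X.T
K = np.exp(-gamma * sq)

for lam in (1e-3, 1e-1, 1e1):
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)  # kernel ridge coefficients
    train_mse = np.mean((K @ alpha - y) ** 2)             # empirical loss term
    norm_sq = alpha @ K @ alpha                           # penalty term ||f||_H^2
    print(f"lambda={lam:7.0e}   train MSE={train_mse:.4f}   ||f||_H^2={norm_sq:.3f}")
```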