Learn Moore–Aronszajn Theorem | Kernels as Inner Products


The Moore–Aronszajn theorem is a foundational result in the theory of reproducing kernel Hilbert spaces (RKHS). It establishes a one-to-one correspondence between positive definite kernels and Hilbert spaces of functions. Every such kernel determines a unique Hilbert space of functions in which evaluating a function at a point $x$ is the same as taking its inner product with the kernel function $K(\cdot, x)$.

Let $X$ be a nonempty set, and let $K : X \times X \to \mathbb{R}$ be a symmetric, positive definite kernel. The theorem states:

Moore–Aronszajn Theorem:
For every positive definite kernel $K$ on $X$, there exists a unique Hilbert space $\mathcal{H}$ of functions $f : X \to \mathbb{R}$ such that:

For every $x \in X$, the function $K(\cdot, x)$ belongs to $\mathcal{H}$;
For every $f \in \mathcal{H}$ and every $x \in X$, the reproducing property holds: $f(x) = \langle f, K(\cdot, x) \rangle_{\mathcal{H}}$.

This space $\mathcal{H}$ is called the reproducing kernel Hilbert space associated with $K$, and $K$ is its reproducing kernel.

To understand why this correspondence exists and is unique, consider the following proof sketch. The proof has two main parts: existence and uniqueness.

Existence:
Given a positive definite kernel $K$, you can construct a vector space of finite linear combinations of the form

$$f = \sum_{i=1}^n \alpha_i K(\cdot, x_i)$$

where $x_i \in X$ and $\alpha_i \in \mathbb{R}$. Define an inner product on this space by

$$\left\langle \sum_{i=1}^n \alpha_i K(\cdot, x_i), \sum_{j=1}^m \beta_j K(\cdot, y_j) \right\rangle = \sum_{i=1}^n \sum_{j=1}^m \alpha_i \beta_j K(x_i, y_j)$$
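To make the construction concrete, here is a minimal NumPy sketch of this inner product on finite combinations of sections. It assumes a Gaussian kernel $K(x, y) = \exp(-\gamma (x - y)^2)$ on $X = \mathbb{R}$ with $\gamma = 1$; the kernel choice, the coefficients, and the helper names `gaussian_kernel` and `rkhs_inner` are illustrative, not part of the theory. The double sum is computed as $\alpha^\top G \beta$, where $G_{ij} = K(x_i, y_j)$ is the cross-Gram matrix.

```python
import numpy as np

def gaussian_kernel(x, y, gamma=1.0):
    # Assumed example kernel on X = R: K(x, y) = exp(-gamma * (x - y)^2)
    return np.exp(-gamma * (x - y) ** 2)

def rkhs_inner(alpha, xs, beta, ys, kernel=gaussian_kernel):
    """Inner product of f = sum_i alpha_i K(., x_i) and g = sum_j beta_j K(., y_j).

    Implements sum_i sum_j alpha_i beta_j K(x_i, y_j) as alpha^T G beta.
    """
    xs, ys = np.asarray(xs, dtype=float), np.asarray(ys, dtype=float)
    # Cross-Gram matrix G[i, j] = K(x_i, y_j), built by broadcasting
    G = kernel(xs[:, None], ys[None, :])
    return float(np.asarray(alpha, dtype=float) @ G @ np.asarray(beta, dtype=float))

# Hypothetical example: f = K(., 0) - 0.5 K(., 1), g = 2 K(., 0.5) + 0.3 K(., 1.5) - K(., -1)
alpha, xs = [1.0, -0.5], [0.0, 1.0]
beta, ys = [2.0, 0.3, -1.0], [0.5, 1.5, -1.0]

print(rkhs_inner(alpha, xs, beta, ys))   # <f, g>
print(rkhs_inner(beta, ys, alpha, xs))   # <g, f>, equal because K is symmetric
print(rkhs_inner(alpha, xs, alpha, xs))  # ||f||^2, nonnegative
```

Building the whole cross-Gram matrix at once keeps the code close to the matrix form $\alpha^\top G \beta$ of the double sum.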

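This inner product is well-defined: for $f = \sum_i \alpha_i K(\cdot, x_i)$ and $g = \sum_j \beta_j K(\cdot, y_j)$, symmetry of $K$ lets you rewrite the double sum as $\sum_j \beta_j f(y_j)$ or as $\sum_i \alpha_i g(x_i)$, so its value depends only on the functions $f$ and $g$, not on the particular expansions chosen. It satisfies $\langle f, f \rangle = \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j) \ge 0$ because every Gram matrix $[K(x_i, x_j)]$ is positive semidefinite. Taking $g = K(\cdot, x)$ gives the reproducing property on this space, $\langle f, K(\cdot, x) \rangle = \sum_i \alpha_i K(x_i, x) = f(x)$, and the Cauchy–Schwarz inequality then yields $|f(x)| \le \|f\| \sqrt{K(x, x)}$, where $\|f\| = \sqrt{\langle f, f \rangle}$. Hence $\langle f, f \rangle = 0$ forces $f = 0$, so the inner product is positive definite. The same bound shows that a Cauchy sequence in this norm converges pointwise, which lets you identify the completion of the space with respect to the induced norm with a Hilbert space $\mathcal{H}$ of functions on $X$. By construction, $K(\cdot, x) \in \mathcal{H}$ for all $x$, and the reproducing property extends to the completion by continuity: for any $f \in \mathcal{H}$ and $x \in X$, $f(x) = \langle f, K(\cdot, x) \rangle_{\mathcal{H}}$.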
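The reproducing property can be checked numerically on a finite combination, together with the Cauchy–Schwarz bound $|f(x)| \le \|f\| \sqrt{K(x, x)}$ that follows from it. This sketch again assumes a Gaussian kernel with $\gamma = 1$ and uses hypothetical coefficients $\alpha$ and centers $x_i$. It evaluates $f$ pointwise, computes $\langle f, K(\cdot, x) \rangle$ from the inner-product formula with $m = 1$, $\beta_1 = 1$, $y_1 = x$, and compares $|f(x)|$ with the bound.

```python
import numpy as np

def gaussian_kernel(x, y, gamma=1.0):
    # Assumed example kernel on X = R: K(x, y) = exp(-gamma * (x - y)^2)
    return np.exp(-gamma * (x - y) ** 2)

def rkhs_inner(alpha, xs, beta, ys):
    # sum_i sum_j alpha_i beta_j K(x_i, y_j) = alpha^T G beta
    G = gaussian_kernel(np.asarray(xs, dtype=float)[:, None], np.asarray(ys, dtype=float)[None, :])
    return float(np.asarray(alpha, dtype=float) @ G @ np.asarray(beta, dtype=float))

# Hypothetical finite combination f = sum_i alpha_i K(., x_i)
alpha = np.array([0.7, -1.2, 0.4])
xs = np.array([-1.0, 0.2, 1.5])

def f(x):
    # Pointwise evaluation of f as a function on X: sum_i alpha_i K(x, x_i)
    return float(np.sum(alpha * gaussian_kernel(x, xs)))

norm_f = np.sqrt(rkhs_inner(alpha, xs, alpha, xs))  # ||f|| = sqrt(<f, f>)

for x in [-2.0, 0.0, 0.9, 3.0]:
    via_inner = rkhs_inner(alpha, xs, [1.0], [x])     # <f, K(., x)>
    bound = norm_f * np.sqrt(gaussian_kernel(x, x))   # ||f|| * sqrt(K(x, x))
    print(f"x={x:+.1f}  f(x)={f(x):+.6f}  <f,K(.,x)>={via_inner:+.6f}  bound={bound:.6f}")
```

The two middle columns should agree up to rounding, and $|f(x)|$ should never exceed the last column.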

Uniqueness:
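Suppose $\mathcal{H}_1$ and $\mathcal{H}_2$ are Hilbert spaces of functions on $X$, both with reproducing kernel $K$. Both contain every section $K(\cdot, x)$, and in both the reproducing property forces $\langle K(\cdot, x), K(\cdot, y) \rangle = K(x, y)$, so the inner product on finite linear combinations of sections equals the double sum above in either space. These combinations are dense in each space: if $f$ is orthogonal to every $K(\cdot, x)$, then $f(x) = \langle f, K(\cdot, x) \rangle = 0$ for all $x$, so $f = 0$. Finally, norm convergence implies pointwise convergence, because $|f(x)| \le \|f\| \sqrt{K(x, x)}$. A sequence of combinations that is Cauchy in one space is therefore Cauchy in the other and has the same pointwise limit there, so both spaces contain exactly the same functions with the same norms. Hence $\mathcal{H}_1 = \mathcal{H}_2$ as Hilbert spaces, and the RKHS associated with $K$ is unique.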

The consequences of the Moore–Aronszajn theorem are far-reaching. It provides the mathematical justification for using kernels in functional analysis, as it guarantees that every positive definite kernel gives rise to a unique Hilbert space of functions in which point evaluation is continuous and is computed as an inner product with $K(\cdot, x)$. In machine learning, this underpins kernel methods such as support vector machines, kernel ridge regression, and Gaussian processes: any algorithm that relies on a positive definite kernel can be interpreted as operating in an implicit Hilbert space of functions, even when that space is infinite-dimensional. This insight enables you to design algorithms that handle nonlinear relationships and complex data structures using only kernel evaluations.
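Kernel ridge regression illustrates this last point. The sketch below assumes a Gaussian kernel with $\gamma = 1$, a regularization weight $\lambda = 10^{-2}$, and synthetic data drawn from $\sin$ plus noise; all of these, and the helper names, are illustrative choices. It uses the standard closed form: the minimizer of $\sum_i (f(x_i) - y_i)^2 + \lambda \|f\|_{\mathcal{H}}^2$ over the RKHS is $f = \sum_i \alpha_i K(\cdot, x_i)$ with $\alpha = (G + \lambda I)^{-1} y$, where $G$ is the Gram matrix of the training inputs. Training and prediction touch the data only through kernel evaluations.

```python
import numpy as np

def rbf_gram(A, B, gamma=1.0):
    """Gram matrix [K(a_i, b_j)] for the (assumed) Gaussian kernel on rows of A and B."""
    # Squared distances ||a_i - b_j||^2 via the expansion |a|^2 + |b|^2 - 2 a.b
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative rounding errors

def fit_kernel_ridge(X, y, lam=1e-2, gamma=1.0):
    """Coefficients alpha of f = sum_i alpha_i K(., x_i); alpha = (G + lam I)^{-1} y."""
    G = rbf_gram(X, X, gamma)
    return np.linalg.solve(G + lam * np.eye(len(X)), y)

def predict(alpha, X_train, X_new, gamma=1.0):
    # f(x) = sum_i alpha_i K(x, x_i): only kernel evaluations, no explicit features
    return rbf_gram(X_new, X_train, gamma) @ alpha

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(40, 1))           # synthetic inputs
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)  # noisy synthetic targets
alpha = fit_kernel_ridge(X, y)

X_test = np.linspace(-3.0, 3.0, 7)[:, None]
print(np.round(predict(alpha, X, X_test), 3))  # fitted values
print(np.round(np.sin(X_test[:, 0]), 3))       # noise-free targets, for comparison
```

Solving the linear system with `np.linalg.solve` avoids forming the matrix inverse explicitly; the fitted function lives in the span of the sections $K(\cdot, x_i)$ even though the RKHS of the Gaussian kernel is infinite-dimensional.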

Definition:

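Positive definite kernel: A function $K : X \times X \to \mathbb{R}$ that is symmetric ($K(x, y) = K(y, x)$) and positive definite, meaning that for any finite set $\{x_1, \ldots, x_n\} \subset X$ the Gram matrix $[K(x_i, x_j)]_{i,j=1}^n$ is positive semidefinite, i.e. $\sum_{i,j} c_i c_j K(x_i, x_j) \ge 0$ for all $c \in \mathbb{R}^n$ (calling such a kernel "positive definite" is the standard convention in kernel methods);
Section: For fixed $x \in X$, the function $K(\cdot, x) : X \to \mathbb{R}$ is called the section of $K$ at $x$;
Reproducing property: For all $f$ in the RKHS $\mathcal{H}$ and all $x \in X$, $f(x) = \langle f, K(\cdot, x) \rangle_{\mathcal{H}}$.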
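The Gram-matrix condition in this definition can be probed numerically: build $[K(x_i, x_j)]$ on a finite set of points and inspect its smallest eigenvalue. The helper `looks_positive_definite` and the tolerance below are hypothetical, and a check on one finite set can only expose a failure, never prove that a kernel is positive definite. The example kernels (Gaussian, linear, and $-|x - y|$) are illustrative; the last is symmetric but not positive definite, since its Gram matrix has zero trace and nonzero off-diagonal entries, which forces a negative eigenvalue.

```python
import numpy as np

def gram(kernel, xs):
    # Gram matrix [K(x_i, x_j)] for a finite set of points
    return np.array([[kernel(a, b) for b in xs] for a in xs])

def looks_positive_definite(kernel, xs, tol=1e-10):
    """Hypothetical check: symmetric Gram matrix with smallest eigenvalue >= -tol."""
    G = gram(kernel, xs)
    if not np.allclose(G, G.T):
        return False
    # eigvalsh is for symmetric matrices and returns eigenvalues in ascending order
    return bool(np.linalg.eigvalsh(G)[0] >= -tol)

def gaussian(x, y):
    return np.exp(-(x - y) ** 2)

def linear(x, y):
    return x * y

def neg_dist(x, y):
    # Symmetric, but NOT a positive definite kernel
    return -abs(x - y)

xs = np.linspace(-2.0, 2.0, 6)
for name, k in [("gaussian", gaussian), ("linear", linear), ("neg_dist", neg_dist)]:
    print(name, looks_positive_definite(k, xs))
```

The tolerance absorbs rounding error: the linear kernel's Gram matrix has rank one, so its zero eigenvalues may come out as tiny negative numbers.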

From a geometric perspective, the Moore–Aronszajn theorem reveals that positive definite kernels act like inner products in a (possibly infinite-dimensional) Hilbert space of functions. Each point $x \in X$ is associated with the section $K(\cdot, x)$, which can be viewed as a feature vector in the RKHS. The kernel $K(x, y)$ computes the inner product between the feature vectors corresponding to $x$ and $y$: applying the reproducing property to $f = K(\cdot, y)$ gives $K(x, y) = \langle K(\cdot, x), K(\cdot, y) \rangle_{\mathcal{H}}$. This viewpoint lets you interpret kernel methods as linear operations in a high-dimensional feature space without ever constructing that space explicitly. The theorem thus bridges abstract functional analysis and practical computation, making the power of Hilbert space geometry available for analyzing and modeling complex data.
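For some kernels the feature space can be written down explicitly, which makes the "kernel equals inner product of features" idea easy to verify. As an illustrative case, the homogeneous quadratic kernel $K(x, y) = (x^\top y)^2$ on $\mathbb{R}^2$ equals an ordinary dot product of the explicit features $\phi(x) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)$, since $(x_1 y_1 + x_2 y_2)^2 = x_1^2 y_1^2 + 2 x_1 x_2 y_1 y_2 + x_2^2 y_2^2$. The function names are hypothetical.

```python
import numpy as np

def poly2_kernel(x, y):
    """Homogeneous quadratic kernel K(x, y) = (x . y)^2 on R^2 (illustrative choice)."""
    return float(np.dot(x, y)) ** 2

def poly2_features(x):
    """Explicit feature map phi: R^2 -> R^3 with phi(x) . phi(y) = (x . y)^2."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2.0) * x1 * x2, x2**2])

rng = np.random.default_rng(1)
for _ in range(3):
    x, y = rng.standard_normal(2), rng.standard_normal(2)
    # Kernel evaluation vs. dot product of explicit features: equal up to rounding
    print(poly2_kernel(x, y), float(poly2_features(x) @ poly2_features(y)))
```

The RKHS section map $x \mapsto K(\cdot, x)$ plays the same role as $\phi$ without requiring a closed-form feature map, which is what makes infinite-dimensional cases such as the Gaussian kernel usable in practice.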


Section 1. Chapter 3
