# Mercer’s Theorem and SVMs Patterns of Ideas

In a funny coincidence, this post has the same basic structure as my previous one: proving some technical result, and then looking at an application to machine learning. This time it’s Mercer’s theorem from functional analysis, and the kernel trick for SVMs. The proof of Mercer’s theorem mostly follows Lax’s Functional Analysis.

### 1. Mercer’s Theorem

Consider a real-valued function \$latex {K(s,t)}&fg=000000\$, and the corresponding integral operator \$latex {mathbf{K}: L^2[0,1]rightarrow L^2[0,1]}&fg=000000\$ given by

\$latex displaystyle (mathbf{K} u)(s)=int_0^1 K(s,t) u(t), dt.&fg=000000\$

We begin with two facts connecting the properties of \$latex {K}&fg=000000\$ to the properties of \$latex {mathbf{K} }&fg=000000\$.

Proposition 1. If \$latex {K}&fg=000000\$ is continuous, then \$latex {mathbf{K} }&fg=000000\$ is compact.

Proof: Consider a bounded sequence \$latex {{f_n}_{n=1}^{infty} subset L^2[0,1]}&fg=000000\$. We wish to show that the image of this sequence, \$latex {{mathbf{K} f_n}_{n=1}^{infty}}&fg=000000\$, has a convergent subsequence. We show that \$latex {{mathbf{K} f_n}}&fg=000000\$ is equicontinuous, and Arzela-Ascoli then gives a…

View original post 848 more words