kernel methods:
- Map the data into a higher-dimensional space, in the hope that the data becomes more easily separable or better structured there.
- The mapping function itself never needs to be computed, thanks to the kernel trick.
- The kernel trick can be applied to any algorithm that depends solely on dot products: wherever a dot product is used, it is replaced by a kernel function.
- Kernel functions must be continuous, symmetric, and should have a positive (semi-)definite Gram matrix. Kernels satisfying these conditions of Mercer's theorem are called PSD. The PSD property ensures that the optimization problem is convex and its solution is unique.
- Non-PSD kernels sometimes work better in practice, e.g. the sigmoid kernel.
- The choice of kernel can be guided by intuition about what kind of structure we expect to extract from the data.
- Linear kernel: $ k(x, y) = x^T y + c $.
- Polynomial kernel: $ k(x, y) = (\alpha x^T y + c)^d $.
- Gaussian kernel: $ k(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right) $; the bandwidth parameter $ \sigma $ must be tuned carefully.
- Exponential kernel: $ k(x, y) = \exp\left(-\frac{\|x - y\|}{2\sigma^2}\right) $, closely related to the Gaussian kernel but using the norm instead of the squared norm.
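The points above can be sketched in a few lines of NumPy: build the Gram matrix for each kernel in the list and check the PSD property by inspecting its eigenvalues. This is a minimal illustration, not a production implementation; the function names and parameter defaults (`c`, `alpha`, `d`, `sigma`) are my own choices.

```python
import numpy as np

# Kernel functions from the list above; c, alpha, d, sigma are free parameters.
def linear(x, y, c=1.0):
    return x @ y + c

def polynomial(x, y, alpha=1.0, c=1.0, d=3):
    return (alpha * (x @ y) + c) ** d

def gaussian(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def exponential(x, y, sigma=1.0):
    return np.exp(-np.linalg.norm(x - y) / (2 * sigma ** 2))

def gram_matrix(kernel, X):
    """Pairwise kernel evaluations: K[i, j] = k(X[i], X[j])."""
    n = len(X)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(X[i], X[j])
    return K

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))

# Mercer/PSD check: every eigenvalue of the Gram matrix should be
# non-negative (up to floating-point error).
for k in (linear, polynomial, gaussian, exponential):
    eigvals = np.linalg.eigvalsh(gram_matrix(k, X))
    print(k.__name__, "min eigenvalue:", eigvals.min())
```

In a kernelized algorithm (e.g. an SVM), this Gram matrix is exactly what replaces the matrix of dot products $ X X^T $.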