VC theory, Support vectors and Hedged prediction technology
VC theory, Support vectors and Hedged prediction technology
Overfitting in classification
Assume a family C of classifiers of points infeature space F. A family of classifiers is a map from CF to {0,1} (Negative and positive class).
For each subset X of F and each c in C, c(X) defines a partitioning of X into two classes.
C shatters X if every partitioning of X is accomplished by some c in C
If every point set X of size d is shattered by C, then the VC dimension is at least d.
If a point set of d+1 elements cannot be shattered by C, then the VC-dimension is at most d.
VC-dimension of hyperplanes
The set of points on the line shatters any two points, but not three
The set of lines in the plane shatters any three non-collinear points, but no four points.
Any d+2 points in E^d can be partitioned into two blocks whose convex hulls intersect.
VC-dimension of hyperplanes in E^d is thus d+1.
Why VC-dimension?
Elegant and pedagogical, not very useful.
Bounds future error of classifier, PAC-learning.
Exchangeable distribution of (xi, yi).
For first N points, training error for c isobserved error rate for c.
Goodness of selecting from C a classifier with best performance on training set depends on VC-dimension h:
Why VC-dimension?
Classify with hyperplanes
Frank Rosenblatt (1928 – 1971)
Pioneering work in classifying byhyperplanes in high-dimensional spaces.
Criticized by Minsky-Papert, sincereal classes are not normallylinearly separable.
ANN research taken up again in1980:s, with non-linear mappingsto get improved separation.Predecessor to SVM/kernel methods
Find parallel hyperplanes
Separate examples by wide margin hyperplanes (classifications).
Enclose examples between hyperplanes (regression).
If necessary, non-linearly map examples to high-dimensional space where they are better separated.
Find parallel hyperplanes
Classification
Red true separatingplane.
Blue: wide marginseparation in sample
Classify by planebetween blue planes
Find parallel hyperplanes
Regression
Red: true central plane.
Blue: narrowest margin enclosing sample
New xk : predict ykso (xk, yk) lies on mid-plane (dotted).
From vector to scalar product
Soft Margins
Soft Margins
Quadratic programming goes through also with soft margins.
Specification of softness constant C is part of most packages.
However, no prior rule for setting C is established, and experimentation is necessary for each application.
Choice is between narrowing margin, allowing more outliers, and using a more liberal kernel (to be described).
SVM packages
Inputs xi, yi, and KERNEL and SOFTNESS information
Only output is , non-zero coefficients indicatesupport vectors.
Hyperplane obtained by
Comments