Overview of multilayer neural networks: Chapter 6 in Duda et al.
“There is nothing particularly magical about multilayer neural networks; they implement linear discriminants, but in a space where the inputs have been mapped nonlinearly” (Duda, Hart, Stork)
Multilayer neural networks
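In general, a neural network (NN) implements a non-linear mapping $\mathbf{x} \mapsto \mathbf{g}(\mathbf{x})$ from $\mathbb{R}^d$ to $\mathbb{R}^c$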
For classification
Input is the d-dimensional feature vector x
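Output is the c discriminant functions $g_1(\mathbf{x}), \dots, g_c(\mathbf{x})$
We strive to obtain discriminant functions that approximate the Bayes discriminants, $g_k(\mathbf{x}) \approx P(\omega_k \mid \mathbf{x})$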
Example: 3-d feature vectors, two-category case, neural network with 5 hidden units
Terminology of neural networks
Terms labelled in the figure: input layer, hidden layer, output layer, a hidden unit, net activation, non-linearity (activation function), weights (synapses), bias weights, target vector
Structure of a neural network
We will study fully-connected, three-layer networks with a fixed non-linearity
We train the NN by optimizing the weights according to some criterion
Generalizations:
Different non-linearities in each node.
Other network topologies: not fully connected, feedback paths
Sloppy notation!
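Only the weight indices distinguish between layers: $w_{ji}$ connects input unit $i$ to hidden unit $j$, and $w_{kj}$ connects hidden unit $j$ to output unit $k$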
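A minimal sketch of the forward pass in this notation, assuming tanh as the fixed non-linearity in both layers and storing each unit's bias weight as the first entry of its weight row (the function name `forward` and this storage layout are illustrative assumptions, not taken from the text):

```python
import math

def forward(x, w_hidden, w_output, f=math.tanh):
    """Forward pass of a fully connected three-layer network (illustrative sketch).

    x        : list of d input features x_i
    w_hidden : n_H rows; row j is [w_j0, w_j1, ..., w_jd]   (bias weight first, assumed layout)
    w_output : c rows;   row k is [w_k0, w_k1, ..., w_knH]  (bias weight first, assumed layout)
    Returns (y, z): hidden-unit outputs y_j and network outputs z_k.
    """
    # Hidden layer: net_j = sum_i w_ji * x_i + w_j0, then y_j = f(net_j)
    y = [f(row[0] + sum(w * xi for w, xi in zip(row[1:], x))) for row in w_hidden]
    # Output layer: net_k = sum_j w_kj * y_j + w_k0, then z_k = f(net_k)
    z = [f(row[0] + sum(w * yj for w, yj in zip(row[1:], y))) for row in w_output]
    return y, z
```

With d = 3 and five rows of length 4 in `w_hidden`, this has the shape of the 5-hidden-unit example network; the sloppy notation shows up here as two weight tables that differ only in which layer they connect.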
Sigmoid non-linearities
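“Hard limiter” or step function
Sigmoids are non-decreasing, scalar functions that saturate: $f(a)$ approaches a finite lower limit as $a \to -\infty$ and a finite upper limit as $a \to +\infty$ (for example $0$ and $1$, or $-1$ and $+1$)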
Examples
For training it is beneficial (if not crucial) that the sigmoid is differentiable
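A few common examples, as a sketch: the hard limiter (here with ±1 outputs, one of two usual conventions), the logistic sigmoid, and tanh, together with the derivatives of the two differentiable ones. Both derivatives can be written in terms of the function value itself, which is convenient during training.

```python
import math

def step(a):
    """Hard limiter (step function), here with outputs +1 / -1; not differentiable at 0."""
    return 1.0 if a >= 0 else -1.0

def logistic(a):
    """Logistic sigmoid 1 / (1 + exp(-a)), range (0, 1); branches avoid exp overflow."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def logistic_deriv(a):
    """f'(a) = f(a) * (1 - f(a)) for the logistic sigmoid."""
    s = logistic(a)
    return s * (1.0 - s)

def tanh_deriv(a):
    """f'(a) = 1 - tanh(a)**2 for f = math.tanh, whose range is (-1, 1)."""
    return 1.0 - math.tanh(a) ** 2
```

The step function has zero derivative everywhere except at 0, where it is undefined, so it gives no useful gradient; this is why a differentiable sigmoid is preferred for gradient-based training.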
Expressive power of neural networks
Neural networks can implement (or approximate arbitrarily well) any continuous multidimensional mapping
Kolmogorov (1957): finite number of hidden units but unknown and arbitrarily complex scalar non-linearities
Hornik (Neural Networks, vol. 4, 1991) and many others: fixed scalar non-linearities (continuous, bounded, non-constant) but arbitrarily many hidden units
This situation is closer to practice, where we typically use differentiable sigmoids and vary the number of hidden units until performance is satisfactory.
In practice, engineering skill matters more:
Application-specific knowledge that guides the choice of network topology:
Number of hidden layers
Number of units in each hidden layer
Feedback networks
Pruning techniques
Backpropagation training of neural networks
Supervised learning: training data from all categories
For each feature vector there is an associated target vector
A gradient descent algorithm that iteratively modifies the weights to minimize the mean squared error (MSE)
Often a stochastic gradient descent algorithm is used
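For each input vector $\mathbf{x}$, with target vector $\mathbf{t}$ and network output $\mathbf{z}$, we consider the error
$J(\mathbf{w}) = \tfrac{1}{2} \sum_{k=1}^{c} (t_k - z_k)^2 = \tfrac{1}{2} \lVert \mathbf{t} - \mathbf{z} \rVert^2$
Calculate the stochastic gradient with respect to all weights and update by
$\mathbf{w} \leftarrow \mathbf{w} - \eta \, \dfrac{\partial J}{\partial \mathbf{w}}$, where $\eta$ is the learning rate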
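A minimal sketch of one stochastic update for the three-layer network, assuming tanh units in both layers, bias weights stored first in each weight row, and the criterion $J = \tfrac{1}{2}\sum_k (t_k - z_k)^2$. The sensitivities `delta_out` and `delta_hid` follow the usual backpropagation chain rule; the function name `sgd_step` is hypothetical.

```python
import math

def sgd_step(x, t, w_hidden, w_output, eta=0.1):
    """One stochastic-gradient backpropagation update for a single pattern (x, t).

    Assumes tanh units and J = 0.5 * sum_k (t_k - z_k)**2.
    Weight rows store the bias weight first: [w_j0, w_j1, ..., w_jd].
    Updates w_hidden and w_output in place; returns J before the update.
    """
    # Forward pass (tanh in both layers)
    y = [math.tanh(r[0] + sum(w * xi for w, xi in zip(r[1:], x))) for r in w_hidden]
    z = [math.tanh(r[0] + sum(w * yj for w, yj in zip(r[1:], y))) for r in w_output]
    J = 0.5 * sum((tk - zk) ** 2 for tk, zk in zip(t, z))

    # Output sensitivities: delta_k = (t_k - z_k) * f'(net_k), with f'(net) = 1 - tanh(net)**2
    delta_out = [(tk - zk) * (1.0 - zk ** 2) for tk, zk in zip(t, z)]
    # Hidden sensitivities: delta_j = f'(net_j) * sum_k w_kj * delta_k (weights before update)
    delta_hid = [(1.0 - yj ** 2) * sum(w_output[k][j + 1] * delta_out[k]
                                       for k in range(len(w_output)))
                 for j, yj in enumerate(y)]

    # w <- w - eta * dJ/dw, where -dJ/dw = delta * (input feeding that weight)
    for k, dk in enumerate(delta_out):
        w_output[k][0] += eta * dk              # bias input is 1
        for j, yj in enumerate(y):
            w_output[k][j + 1] += eta * dk * yj
    for j, dj in enumerate(delta_hid):
        w_hidden[j][0] += eta * dj
        for i, xi in enumerate(x):
            w_hidden[j][i + 1] += eta * dj * xi
    return J
```

Calling `sgd_step` once per training pattern, visiting the patterns in random order, gives the stochastic algorithm. The hidden sensitivities are computed before any weight changes, so both layers are updated with the gradient evaluated at the same point.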
How should we choose the target vectors?
We do not know the posterior probabilities!
Somehow the target vector should indicate the category…
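One common choice, shown here as a sketch, is a “1-of-c” target vector: $t_k = 1$ for the true category and $0$ otherwise (or $-1$ when the output units are tanh, whose range is $(-1, 1)$). The helper name `target_vector` is hypothetical.

```python
def target_vector(label, c, off_value=0.0):
    """1-of-c target: t_k = 1 for the true category k == label, off_value otherwise.

    label is a 0-based category index; pass off_value=-1.0 for tanh output units.
    """
    if not 0 <= label < c:
        raise ValueError(f"label must be in [0, {c}), got {label}")
    return [1.0 if k == label else off_value for k in range(c)]
```

The 0/1 choice connects back to the posterior probabilities: for a fixed $\mathbf{x}$, with $P = P(\omega_k \mid \mathbf{x})$ and $t_k \in \{0, 1\}$, we have $E[(g_k - t_k)^2 \mid \mathbf{x}] = (g_k - P)^2 + P(1 - P)$, which is minimized by $g_k(\mathbf{x}) = P(\omega_k \mid \mathbf{x})$.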
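For the batch version, the error $J_p$ of each of the $n$ training patterns is summed, and the weights are updated once per pass through the training set:
$J = \sum_{p=1}^{n} J_p, \qquad \mathbf{w} \leftarrow \mathbf{w} - \eta \sum_{p=1}^{n} \dfrac{\partial J_p}{\partial \mathbf{w}}$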
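A sketch of one batch step, assuming the weights are kept in a flat list and that some routine (for instance backpropagation) supplies the per-pattern gradient $\partial J_p / \partial \mathbf{w}$. The names `batch_update`, `grad_fn` and `linear_grad` are hypothetical; the demo uses a single linear unit $z = w_0 + w_1 x$ so that the gradient is easy to write by hand.

```python
def batch_update(weights, patterns, grad_fn, eta=0.1):
    """One batch gradient-descent step: w <- w - eta * sum_p dJ_p/dw.

    weights  : flat list of weights
    patterns : list of (x, t) training pairs
    grad_fn  : hypothetical callable grad_fn(weights, x, t) returning dJ_p/dw
               as a flat list of the same length as weights
    """
    total = [0.0] * len(weights)
    for x, t in patterns:
        # Every per-pattern gradient is evaluated at the same (old) weights.
        for i, g in enumerate(grad_fn(weights, x, t)):
            total[i] += g
    return [w - eta * g for w, g in zip(weights, total)]


# Demo: single linear unit z = w0 + w1 * x with J_p = 0.5 * (t - z)**2.
def linear_grad(w, x, t):
    err = t - (w[0] + w[1] * x)
    return [-err, -err * x]  # [dJ_p/dw0, dJ_p/dw1]

data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # noise-free samples of t = 1 + 2x
w = [0.0, 0.0]
for _ in range(500):
    w = batch_update(w, data, linear_grad, eta=0.1)
print(w)  # close to [1.0, 2.0]
```

Unlike the stochastic version, no weight changes until every pattern has contributed its gradient, so one update costs a full pass over the training set.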
What else for session 6?
Read Sections 6.1–6.6
Derivation of the backpropagation algorithm
Convergence of gradient algorithms
Interpretations of neural networks
Mapping of feature vectors to a space where they can be linearly separated
MSE approximation of the Bayes discriminant functions
Gives one idea of how to specify the target vector
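A small illustration of the first interpretation, using the XOR problem with inputs coded as ±1 and hand-picked (illustrative, not trained) weights with hard-limiter units. XOR is not linearly separable in the input space, but in the hidden-unit space $(y_1, y_2)$ the output unit separates the classes with a single linear discriminant.

```python
def sgn(a):
    """Hard limiter with +/-1 outputs (taking sgn(0) = +1)."""
    return 1 if a >= 0 else -1

def hidden_map(x1, x2):
    # Two hidden units, each a linear threshold in the input space
    # (weights chosen by hand for illustration).
    y1 = sgn(x1 + x2 + 0.5)
    y2 = sgn(x1 + x2 - 1.5)
    return y1, y2

def xor_net(x1, x2):
    # The output unit is one linear discriminant in the (y1, y2) space.
    y1, y2 = hidden_map(x1, x2)
    return sgn(0.7 * y1 - 0.4 * y2 - 1.0)

for x1 in (-1, 1):
    for x2 in (-1, 1):
        print((x1, x2), hidden_map(x1, x2), xor_net(x1, x2))
# (-1, -1) (-1, -1) -1
# (-1, 1) (1, -1) 1
# (1, -1) (1, -1) 1
# (1, 1) (1, 1) -1
```

Both mixed inputs are mapped to the same hidden point $(1, -1)$, which lies on the positive side of the line $0.7 y_1 - 0.4 y_2 - 1 = 0$, while $(-1, -1)$ and $(1, 1)$ lie on the negative side.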
Comments