Intro to Neural Networks with Keras

The perceptron

The perceptron is one of the simplest ANN architectures. It is based on an artificial neuron called a threshold logic unit (TLU). The inputs are numbers, and each input connection is associated with a weight. The TLU computes the weighted sum of its inputs, applies a step function to that sum, and outputs the result. The most common step function used in perceptrons is the Heaviside step function.
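The following is a minimal sketch (not from the original text) of a single TLU in NumPy: it computes the weighted sum of the inputs and applies the Heaviside step function. The weights and bias are arbitrary values chosen purely for illustration.

```python
import numpy as np

def heaviside(z):
    # Heaviside step function: 0 if z < 0, else 1
    return (z >= 0).astype(int)

def tlu(x, weights, bias):
    # weighted sum of the inputs plus a bias, followed by the step function
    return heaviside(x @ weights + bias)

x = np.array([[2.5, 1.0]])        # one instance with two input features
weights = np.array([0.4, -0.3])   # illustrative weights
bias = -0.5
print(tlu(x, weights, bias))      # [1] -> the weighted sum exceeded the threshold
```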

A single TLU can be used for simple linear binary classification: it computes a linear combination of the inputs, and if the result exceeds a threshold, it outputs the positive class. For example, you could use a single TLU to classify iris flowers based on petal length and width. Training a TLU in this case means finding the right values for w0, w1, and w2.

A perceptron is basically a single layer of TLUs, with each TLU connected to all the inputs. When all the neurons in a layer are connected to every neuron in the previous layer, it is called a fully connected layer or dense layer.

The decision boundary is linear, so just like logistic regression, perceptrons are unable to learn complex patterns. However, if the training instances are linearly separable, the algorithm will converge to a solution. This is called the Perceptron Convergence Theorem.

Scikit-Learn provides a Perceptron class that implements a single-TLU network.
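As a quick sketch of that class (reusing the iris example above), the following trains Scikit-Learn's Perceptron to detect Iris setosa from petal length and width:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris()
X = iris.data[:, (2, 3)]                 # petal length, petal width
y = (iris.target == 0).astype(int)       # 1 if Iris setosa, else 0

per_clf = Perceptron(random_state=42)
per_clf.fit(X, y)

print(per_clf.predict([[2.0, 0.5]]))     # hard 0/1 prediction for a new flower
```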

The Perceptron learning algorithm strongly resembles Stochastic Gradient Descent.


NOTE:

Perceptrons do not output class probabilities the way logistic regression does; instead, they make predictions based on a hard threshold. This can be a key factor when choosing between a Perceptron and Logistic Regression.
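A small illustration of this note, assuming the same iris setup as above: the Perceptron only returns hard class labels, while LogisticRegression can also return estimated class probabilities via predict_proba.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression, Perceptron

iris = load_iris()
X = iris.data[:, (2, 3)]
y = (iris.target == 0).astype(int)

per_clf = Perceptron(random_state=42).fit(X, y)
log_clf = LogisticRegression().fit(X, y)

print(per_clf.predict([[2.0, 0.5]]))        # hard class label only
print(log_clf.predict_proba([[2.0, 0.5]]))  # estimated class probabilities as well
```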


Perceptrons have a number of weaknesses, the most notable being their inability to solve some trivial problems (most famously the XOR classification problem), which is true of any other linear classification model as well. In time, it was found that some of the limitations of perceptrons can be overcome by stacking multiple perceptrons. The resulting ANN is called a Multi-Layer Perceptron (MLP).
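As a concrete sketch of this idea, the XOR problem, which no single TLU can solve, can be solved by stacking TLUs. The weights below are hand-picked for illustration, not learned:

```python
import numpy as np

def step(z):
    return (z >= 0).astype(int)

def xor_mlp(x1, x2):
    # hidden layer: one TLU acting like OR, one acting like AND
    h_or = step(x1 + x2 - 0.5)
    h_and = step(x1 + x2 - 1.5)
    # output layer: fires only when OR is on and AND is off -> XOR
    return step(h_or - h_and - 0.5)

x1 = np.array([0, 0, 1, 1])
x2 = np.array([0, 1, 0, 1])
print(xor_mlp(x1, x2))   # [0 1 1 0]
```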

Multi-Layer Perceptron and Backpropagation

An MLP is composed of one (passthrough) input layer, one or more layers of TLUs called hidden layers, and one final layer of TLUs called the output layer. The layers close to the input layer are usually called the lower layers, and the ones close to the outputs are usually called the upper layers. Every layer except the output layer includes a bias neuron and is fully connected to the next layer.

When an ANN contains a deep stack of hidden layers, it is called a deep neural network (DNN).
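A minimal sketch of such an MLP built with the Keras Sequential API (assuming TensorFlow's Keras; the layer sizes and input dimension are arbitrary):

```python
from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.Dense(30, activation="relu", input_shape=(4,)),  # first hidden layer (4 input features)
    keras.layers.Dense(30, activation="relu"),                    # second hidden layer
    keras.layers.Dense(1, activation="sigmoid"),                  # output layer for binary classification
])
model.summary()   # prints the layers and their parameter counts
```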

Backpropagation is, in short, Gradient Descent combined with an efficient technique for computing the gradients automatically: in just two passes through the network (one forward, one backward), the algorithm is able to compute the gradient of the network's error with regard to every single model parameter.

In more detail, the algorithm handles one mini-batch at a time: it makes a forward pass to compute the output of every neuron, measures the network's error with a loss function, then works backward through the layers measuring how much each connection contributed to that error (using the chain rule), and finally performs a Gradient Descent step to tweak all the connection weights using the gradients it just computed. For this to work, the step function has to be replaced with activation functions that have a well-defined, nonzero derivative.
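A hedged sketch of what this looks like in Keras: compile() picks the loss function and a (stochastic) gradient descent optimizer, and each training step inside fit() performs one forward pass and one backpropagation pass on a mini-batch. The data here is random, standing in for a real training set.

```python
import numpy as np
from tensorflow import keras

X_train = np.random.rand(100, 4).astype("float32")      # dummy features
y_train = np.random.randint(0, 2, size=(100,))          # dummy binary labels

model = keras.models.Sequential([
    keras.layers.Dense(30, activation="relu", input_shape=(4,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy",
              optimizer=keras.optimizers.SGD(learning_rate=0.01),
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=5, batch_size=32)     # forward + backward pass per mini-batch
```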

Activation functions for backpropagation

  1. The logistic function

    This is a continuous and differentiable function, shaped like an S, with outputs ranging from 0 to 1: sigmoid(z) = 1 / (1 + exp(-z)).

  2. The hyperbolic tangent function

    This is also an S-shaped curve, just like the logistic function, but its output ranges from -1 to 1. That tends to make each layer's output more or less centered around 0 at the beginning of training, which often helps the network converge faster.

  3. The rectified linear unit function

    This is continuous but not differentiable at z = 0 (ReLU(z) = max(0, z)). In practice it works very well and has the advantage of being fast to compute. A short sketch of all three activations follows this list.
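A quick NumPy sketch of the three activation functions discussed above:

```python
import numpy as np

def sigmoid(z):
    # logistic function: S-shaped, outputs in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # hyperbolic tangent: S-shaped, outputs in (-1, 1)
    return np.tanh(z)

def relu(z):
    # rectified linear unit: max(0, z)
    return np.maximum(0.0, z)

z = np.linspace(-5, 5, 11)
print(sigmoid(z))
print(tanh(z))
print(relu(z))
```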