These are notes on neural networks in machine learning, based on this course on Coursera.

An overview of the main types of neural network architecture

1. Feed-forward neural networks

  • The commonest type of neural network in practical applications
  • If there is more than one hidden layer, we call them “deep” neural networks
  • The activities of the neurons in each layer are a non-linear function of the activities in the layer below
    ![](Feed-forward NN.png)
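
The layer-by-layer computation described above can be sketched in a few lines of plain Python. This is only an illustration: the logistic non-linearity, the layer sizes, and all weight values are assumptions chosen for the example, not taken from the course.

```python
import math

def layer_forward(inputs, weights, biases):
    """Activities of one layer: a non-linear function of the layer below."""
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        total_input = bias + sum(w * x for w, x in zip(unit_weights, inputs))
        # Logistic non-linearity (an assumed choice for this sketch)
        outputs.append(1.0 / (1.0 + math.exp(-total_input)))
    return outputs

def feed_forward(inputs, layers):
    """Pass the input up through every layer in turn; there are no cycles."""
    activities = inputs
    for weights, biases in layers:
        activities = layer_forward(activities, weights, biases)
    return activities

# Hypothetical network: 2 inputs, two hidden layers of 2 units (so "deep"), 1 output.
layers = [
    ([[0.5, -0.3], [0.8, 0.2]], [0.0, 0.1]),
    ([[1.0, -1.0], [0.3, 0.7]], [0.0, 0.0]),
    ([[0.6, -0.4]], [0.05]),
]
output = feed_forward([1.0, 0.0], layers)
print(output)
```

Each entry in `layers` holds one weight row per unit plus that unit's bias, so adding another hidden layer only means appending another entry.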

2. Recurrent networks

  • Have directed cycles in their connection graph
  • Powerful and biologically realistic, but very difficult to train
  • Recurrent nets with multiple hidden layers are just a special case of the general recurrent network in which some of the hidden->hidden connections are missing
    ![](Recurrent networks.png)
  • A very natural way to model sequential data
  • Have the ability to remember information for quite a long time
    ![](Sequential framework.png)
  • An exciting example: Ilya Sutskever (2011) trained a special recurrent NN to predict the next character in a sequence. After training for a long time on a string of half a billion characters from English Wikipedia, he made it generate new text by predicting the probability distribution for the next character and then sampling a character from this distribution. The generated text is quite reasonable, with largely correct syntax. Demo.
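
The generate-by-sampling loop from the example above can be sketched as follows. This is not Sutskever's model: it is a plain, untrained recurrent net with random weights, a hypothetical four-character alphabet `VOCAB`, and an assumed hidden size, so the output is gibberish. It only shows the mechanics: update the hidden state, predict a distribution over the next character, sample from it, and feed the sample back in.

```python
import math
import random

VOCAB = "abc "  # hypothetical tiny alphabet, for illustration only

def rnn_step(hidden, char_index, w_xh, w_hh, w_hy):
    """One time step: new hidden state plus a distribution over the next character."""
    n_hidden = len(hidden)
    new_hidden = []
    for j in range(n_hidden):
        # Input-to-hidden contribution of the current character plus the
        # hidden->hidden recurrence that carries information over time.
        total = w_xh[char_index][j] + sum(hidden[i] * w_hh[i][j] for i in range(n_hidden))
        new_hidden.append(math.tanh(total))
    logits = [sum(new_hidden[j] * w_hy[j][k] for j in range(n_hidden))
              for k in range(len(VOCAB))]
    biggest = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(v - biggest) for v in logits]
    normaliser = sum(exps)
    return new_hidden, [e / normaliser for e in exps]

def generate(length, n_hidden=5, seed=0):
    """Sample `length` characters from an untrained net with random weights."""
    rng = random.Random(seed)

    def rand_matrix(rows, cols):
        return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

    w_xh = rand_matrix(len(VOCAB), n_hidden)
    w_hh = rand_matrix(n_hidden, n_hidden)
    w_hy = rand_matrix(n_hidden, len(VOCAB))
    hidden = [0.0] * n_hidden
    char_index = 0  # arbitrary start character
    text = []
    for _ in range(length):
        hidden, probs = rnn_step(hidden, char_index, w_xh, w_hh, w_hy)
        # Sample the next character instead of always taking the most likely one.
        char_index = rng.choices(range(len(VOCAB)), weights=probs)[0]
        text.append(VOCAB[char_index])
    return "".join(text)

sample = generate(20)
print(repr(sample))
```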

Perceptrons

Standard Perceptron architecture

![](Perceptron architecture.png)

  • So the perceptron only learns the weights on the connections from the last layer of features to the decision unit. The learned part is linear, and the perceptron cannot learn the features themselves.
  • A perceptron uses a binary threshold neuron as its output unit.
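
A minimal sketch of this architecture, assuming a hypothetical hand-designed feature set (the two raw inputs plus their product) and hand-picked weights. Only `weights` and `threshold` would be adjusted by learning; `hand_coded_features` stays fixed.

```python
def hand_coded_features(raw_inputs):
    """Fixed, hand-designed features (a hypothetical choice): x1, x2 and x1*x2."""
    x1, x2 = raw_inputs
    return [x1, x2, x1 * x2]

def binary_threshold_unit(features, weights, threshold):
    """Output 1 if the weighted sum of the features reaches the threshold, else 0."""
    total_input = sum(w * f for w, f in zip(weights, features))
    return 1 if total_input >= threshold else 0

# Illustrative values: only these numbers are what a perceptron would learn.
weights, threshold = [1.0, 1.0, -2.0], 0.5
raw_cases = [(0, 0), (0, 1), (1, 0), (1, 1)]
decisions = [
    binary_threshold_unit(hand_coded_features(raw), weights, threshold)
    for raw in raw_cases
]
print(decisions)
```

With this particular product feature the decision unit outputs 1 exactly when the two inputs differ, which shows how much of the work is done by the choice of features rather than by the learned weights.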

Learning procedure

  • Pick training cases using any policy that ensures that every training case will keep getting picked.
  • If the output unit is correct, leave its weights alone.
  • If the output unit incorrectly outputs a zero, add the input vector to the weight vector.
  • If the output unit incorrectly outputs a 1, subtract the input vector from the weight vector.
    ![](Perceptron cone.png)
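
The procedure above can be sketched directly. The assumptions here are: the threshold is handled as a bias weight on an extra input that is always 1 (so the unit compares its total input with 0), cases are picked by cycling through the training set, and the AND task is just a hypothetical linearly separable example.

```python
def predict(weights, inputs):
    """Binary threshold unit with the threshold folded into a bias weight."""
    total_input = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total_input >= 0 else 0

def train_perceptron(cases, n_inputs, max_sweeps=100):
    """Return (weights, converged) after applying the perceptron learning procedure."""
    weights = [0.0] * (n_inputs + 1)  # last weight is the bias
    for _ in range(max_sweeps):
        mistakes = 0
        # Cycling through all cases ensures every case keeps getting picked.
        for inputs, target in cases:
            x = list(inputs) + [1.0]  # extra input of 1 for the bias
            output = predict(weights, x)
            if output == target:
                continue  # correct: leave the weights alone
            mistakes += 1
            if target == 1:
                # Incorrectly output 0: add the input vector to the weights.
                weights = [w + xi for w, xi in zip(weights, x)]
            else:
                # Incorrectly output 1: subtract the input vector from the weights.
                weights = [w - xi for w, xi in zip(weights, x)]
        if mistakes == 0:
            return weights, True
    return weights, False

# Hypothetical separable task: logical AND of two bits.
and_cases = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
and_weights, and_converged = train_perceptron(and_cases, n_inputs=2)
print(and_weights, and_converged)
```

The `max_sweeps` cap is only a safeguard for this sketch; on a task with no separating weight vector the loop would otherwise never stop.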

What perceptrons can’t do

First example: equality of two bits (the complement of XOR)

Positive cases (same): (1,1) → 1; (0,0) → 1
Negative cases (different): (1,0) → 0; (0,1) → 0
![](Perceptron XOR.png)
The same result can also be shown geometrically:
![](Perceptron XOR geo.png)
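
The four cases require w1 + w2 ≥ θ and 0 ≥ θ for the positive cases, and w1 < θ and w2 < θ for the negative ones. Adding the first pair gives w1 + w2 ≥ 2θ, adding the second pair gives w1 + w2 < 2θ, so no weights and threshold can satisfy all four. The sketch below is only a sanity check of that argument, not a proof: it searches an assumed grid of weights and thresholds and, for contrast, runs the same search on a hypothetical AND task.

```python
import itertools

same_cases = [((1, 1), 1), ((0, 0), 1), ((1, 0), 0), ((0, 1), 0)]
and_cases = [((1, 1), 1), ((0, 0), 0), ((1, 0), 0), ((0, 1), 0)]

def solves_all(cases, w1, w2, theta):
    """True if a binary threshold unit with these parameters gets every case right."""
    for (x1, x2), target in cases:
        output = 1 if w1 * x1 + w2 * x2 >= theta else 0
        if output != target:
            return False
    return True

# Assumed search grid: -5.0 to 5.0 in steps of 0.25 for w1, w2 and theta.
grid = [i / 4 for i in range(-20, 21)]

same_solutions = [c for c in itertools.product(grid, repeat=3)
                  if solves_all(same_cases, *c)]
and_solutions = [c for c in itertools.product(grid, repeat=3)
                 if solves_all(and_cases, *c)]

print(len(same_solutions), len(and_solutions))
```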

Second example: Discriminating patterns

For two patterns like:
![](Perceptron patterns.png)
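If we treat every individual pixel as an input, then patterns A and B, which have the same number of "on" pixels, give the output unit the same total input once we sum over all their translations (with wrap-around). A binary threshold unit therefore cannot tell them apart.
The reason is that, unless we preprocess the input into suitable features, the perceptron cannot cope with patterns that may be translated.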
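
The counting argument can be checked numerically. The sketch below uses two hypothetical one-dimensional patterns with four "on" pixels each and arbitrary integer weights (both are assumptions made for illustration, not the patterns in the figure). Summed over all wrap-around translations, every pixel position is "on" exactly four times, so both patterns deliver the same total input whatever the weights are.

```python
import random

def translations(pattern):
    """All cyclic (wrap-around) shifts of a 1-D pattern."""
    return [pattern[s:] + pattern[:s] for s in range(len(pattern))]

def summed_input(pattern, weights):
    """Total input to the output unit, summed over every translation of the pattern."""
    return sum(
        sum(w * p for w, p in zip(weights, shifted))
        for shifted in translations(pattern)
    )

# Hypothetical patterns: same number of on-pixels, different shapes.
pattern_a = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
pattern_b = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]

rng = random.Random(0)
weights = [rng.randint(-5, 5) for _ in range(len(pattern_a))]  # arbitrary weights

total_a = summed_input(pattern_a, weights)
total_b = summed_input(pattern_b, weights)
print(total_a, total_b)
```

If every translation of A had to exceed the threshold and every translation of B had to fall below it, the first sum would have to be strictly larger than the second, which contradicts their being equal.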

So we must make the NN learn the internal features itself, which amounts to adding hidden units, but then we need an efficient way of adapting all the weights.
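
As a small illustration of what hidden units buy us, the sketch below solves the two-bit equality task with two hidden binary threshold units. All weights and thresholds are set by hand as an assumption for this example; finding them automatically is exactly the learning problem raised above.

```python
def threshold_unit(inputs, weights, threshold):
    """Binary threshold neuron: 1 if the weighted sum reaches the threshold."""
    total_input = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total_input >= threshold else 0

def same_bits(x1, x2):
    """Hand-wired network with one hidden layer that outputs 1 when x1 == x2."""
    both_on = threshold_unit([x1, x2], [1.0, 1.0], 1.5)      # hidden unit: AND
    both_off = threshold_unit([x1, x2], [-1.0, -1.0], -0.5)  # hidden unit: NOR
    # Output unit fires if either hidden feature is active.
    return threshold_unit([both_on, both_off], [1.0, 1.0], 0.5)

results = [same_bits(x1, x2) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(results)
```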