Artificial Neural Network
An artificial neural network (ANN) is a computational model based on the structure and functions of biological neural networks. Information that flows through the network affects its structure, because a neural network changes - or learns, in a sense - based on its inputs and outputs. ANNs are considered nonlinear statistical data modeling tools, used to model complex relationships between inputs and outputs or to find patterns in data. An ANN is also known simply as a neural network. A basic ANN has three interconnected layers. The first layer consists of input neurons. Those neurons send data on to the second layer, which in turn sends data on to the third layer of output neurons. Training an artificial neural network involves choosing among allowed models, for which there are several associated algorithms.
Artificial Neural Networks are multi-layer fully-connected neural nets. They consist of an input layer, multiple hidden layers, and an output layer. Every node in one layer is connected to every node in the next layer. We make the network deeper by increasing the number of hidden layers.
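As a minimal sketch of this fully-connected architecture, here is a forward pass in plain NumPy. The layer sizes are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical layer sizes: 4 inputs, two hidden layers of 5 nodes, 3 outputs.
layer_sizes = [4, 5, 5, 3]

rng = np.random.default_rng(0)
# One weight matrix and bias vector per pair of adjacent layers.
weights = [rng.standard_normal((m, n)) for m, n in zip(layer_sizes, layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    """Propagate an input left to right through every layer."""
    a = x
    for W, b in zip(weights, biases):
        # Fully connected: every node feeds every node in the next layer.
        a = sigmoid(a @ W + b)
    return a

x = rng.standard_normal(4)
print(forward(x).shape)  # (3,)
```

Making the network deeper is just a matter of appending more entries to `layer_sizes`.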
The training procedure works as follows:
1. Randomly initialize the weights for all the nodes. There are smart initialization methods which we will explore in another article.
2. For every training example, perform a forward pass using the current weights, and calculate the output of each node going from left to right. The final output is the value of the last node.
3. Compare the final output with the actual target in the training data, and measure the error using a loss function.
4. Perform a backward pass from right to left and propagate the error to every individual node using backpropagation. Calculate each weight's contribution to the error, and adjust the weights accordingly using gradient descent. Propagate the error gradients back starting from the last layer.
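The steps above can be sketched end to end on a toy problem. This is a minimal illustration, assuming one hidden layer, sigmoid activations, mean squared error as the loss, and XOR as the training data; none of these choices come from the article itself:

```python
import numpy as np

# Toy dataset: XOR.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Step 1: randomly initialize the weights.
rng = np.random.default_rng(42)
W1, b1 = rng.standard_normal((2, 8)), np.zeros(8)
W2, b2 = rng.standard_normal((8, 1)), np.zeros(1)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 1.0

for epoch in range(10000):
    # Step 2: forward pass, left to right.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Step 3: measure the error with the loss function (MSE here).
    loss = np.mean((out - y) ** 2)

    # Step 4: backward pass, right to left (backpropagation).
    d_out = 2 * (out - y) / len(X) * out * (1 - out)
    dW2, db2 = h.T @ d_out, d_out.sum(axis=0)
    d_h = d_out @ W2.T * h * (1 - h)
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0)

    # Gradient descent: adjust each weight by its contribution to the error.
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print(loss)  # should be close to 0 after training
```

In practice a framework computes these gradients automatically, but the flow of the four steps is exactly the same.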
Backpropagation with gradient descent is literally the "magic" behind deep learning models. It's a rather long topic that involves some calculus, so we won't go into the specifics in this applied deep learning series. For a detailed explanation of gradient descent refer here. A basic overview of backpropagation is available here. For a detailed mathematical treatment refer here and here. And for more advanced optimization algorithms refer here. In the standard ML world this feed-forward architecture is known as the multilayer perceptron. The difference between the ANN and the perceptron is that the ANN uses a non-linear activation function such as the sigmoid, whereas the perceptron uses the step function. And that non-linearity gives the ANN its great power.
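The contrast between the two activation functions is easy to see in code. The step function is a hard threshold, while the sigmoid is smooth and differentiable everywhere, which is what lets gradients flow during backpropagation:

```python
import numpy as np

def step(z):
    # Perceptron activation: a hard threshold, not differentiable at 0.
    return np.where(z >= 0, 1.0, 0.0)

def sigmoid(z):
    # ANN activation: smooth and differentiable everywhere,
    # so error gradients can flow through it during backpropagation.
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 2.0])
print(step(z))     # [0. 1. 1.]
print(sigmoid(z))  # roughly [0.119, 0.5, 0.881]
```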
Table

| ANN Model | Test Accuracy (%) | Test Loss | F1-Score (%) |
| --- | --- | --- | --- |
| 32-nodes-0-dense | 76 | 213.54 | 76 |
| 64-nodes-0-dense | 77 | 229.56 | 77 |
| 128-nodes-0-dense | 80 | 181.14 | 80 |
| 32-nodes-1-dense | 24 | 1.38 | 10 |
| 64-nodes-1-dense | 25 | 1.38 | 13 |
| 128-nodes-1-dense | 25 | 1.39 | 10 |
| 32-nodes-2-dense | 26 | 1.33 | 10 |
| 64-nodes-2-dense | 24 | 1.30 | 10 |
| 128-nodes-2-dense | 25 | 1.38 | 10 |
Among the nine models trained on the training data and validated on the test data, we can see that the ANN model consisting of 128 nodes and 0 dense layers yields the highest accuracy.
This was our best ANN model: it consists of 128 nodes and 0 dense layers. The input layer was of size (150, 150, 3). The model was trained for 20 epochs, using the 'sparse_categorical_crossentropy' loss function and the ReLU activation function after each layer. A batch size of 32 and the Adam optimizer were used. To prevent overfitting, we introduced a slight dropout of 33% after a few layers. A learning rate of 0.001 was combined with Early-Stopping and loss-monitoring callbacks passed to the fit function, and the trained model was saved as an 'hdf5' file. Below are the graphs for a better understanding of how the training process for the model went.
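The 33% dropout mentioned above can be illustrated with a minimal NumPy sketch of "inverted" dropout; this is a generic illustration of the technique, not the author's actual (framework-provided) layer:

```python
import numpy as np

def dropout(activations, rate, training, rng):
    """Inverted dropout: during training, zero a fraction `rate` of the
    activations at random and rescale the survivors by 1/(1-rate), so
    that no change is needed at inference time."""
    if not training:
        return activations
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)

rng = np.random.default_rng(0)
a = np.ones((4, 8))
print(dropout(a, 0.33, training=True, rng=rng))   # about 33% of entries zeroed
print(dropout(a, 0.33, training=False, rng=rng))  # unchanged at inference
```

Randomly dropping units forces the network not to rely on any single activation, which is why it helps against overfitting.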