# Artificial Neural Network Learning

Artificial Neural Network Learning - processes, mathematics and intuition

## 1. Overview

**Artificial neural networks (ANNs)** are a powerful class of models used for **nonlinear regression and classification** tasks,
inspired by **biological neural** computation.
The aim of these notes is to give clarity and completeness on the **neural network learning**
process and the **mathematics** behind it!

## 2. Neural Network Learning

A neural network consists of **input, hidden, and output layers**. There is exactly **one input layer** and **one output layer**,
but the number of **hidden layers is unlimited**. Neural networks of this kind are `feed-forward` because
nodes within a particular layer are connected only to nodes in the immediately `downstream`
layer: nodes in the input layer activate only nodes in the first hidden
layer, which in turn activate only nodes in the next hidden layer, and so on, until the nodes
of the final layer, the output layer, are activated. This arrangement is illustrated in the three-layered network
with one hidden layer in figure 1 below.

As a warm-up, the two-sided difference formula gives a numerical approximation to a derivative.

Let \( J(\theta) = 3\theta^2 + 2\)

Let \(\theta = 1\) and \(\epsilon = 0.01\)

Use the formula to numerically compute an approximation to the derivative of \(J(\theta)\) at \(\theta = 1\):

\[ \frac{J(\theta + \epsilon) - J(\theta - \epsilon)}{2\epsilon} = \frac{(3(1.01)^2 + 2) - (3(0.99)^2 + 2)}{0.02} = 6 \]

which matches the analytic derivative \(J'(\theta) = 6\theta\) at \(\theta = 1\).
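This numerical check is easy to reproduce in code. A minimal sketch, using the \(J(\theta)\) defined above:

```python
def J(theta):
    """The example cost function from the text: J(theta) = 3*theta^2 + 2."""
    return 3 * theta ** 2 + 2

theta, eps = 1.0, 0.01

# Two-sided (symmetric) difference approximation to dJ/dtheta at theta = 1.
approx = (J(theta + eps) - J(theta - eps)) / (2 * eps)

# The analytic derivative is dJ/dtheta = 6*theta, i.e. 6 at theta = 1;
# for a quadratic, the symmetric difference recovers it exactly.
print(approx)
```

For a quadratic the two-sided difference is exact up to floating-point error, which is why this formula is the preferred way to sanity-check gradient code.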

Neural network learning is divided into two phases:

a. **Forward propagation**

b. **Backpropagation**

## 3. Forward Propagation
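A forward pass simply activates each layer from its upstream layer. Below is a minimal sketch for a hypothetical 2-3-1 network with sigmoid activations; the weights, biases, and layer sizes are arbitrary illustrations, not values from the text:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(weights, biases, inputs):
    """Activate one layer: row j of `weights` holds the incoming
    weights of unit j in the downstream layer."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Hypothetical 2-3-1 network; weights chosen arbitrarily for illustration.
W1 = [[0.1, 0.4], [0.8, 0.6], [0.3, 0.9]]   # input -> hidden
b1 = [0.0, 0.0, 0.0]
W2 = [[0.3, 0.5, 0.9]]                      # hidden -> output
b2 = [0.0]

x = [1.0, 0.5]
a_hidden = layer_forward(W1, b1, x)         # hidden activations a^(2)
a_output = layer_forward(W2, b2, a_hidden)  # network output h(x)
```

Note how each layer's output feeds only the next layer, which is exactly the feed-forward structure described above.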

## 4. What is Backpropagation?

Backpropagation is a method used in artificial neural networks to calculate the gradient of the loss function, which the gradient descent
optimization algorithm then uses to adjust the weights of the neurons. The technique is also called **backward propagation of
errors**, because the error is calculated at the output and distributed backward through the network's layers.

## 5. Recap

The **cost function** for neural networks is a generalization of the **logistic regression** cost function, as below:

\[ J(\Theta) = - \frac{1}{m} \sum_{i=1}^m \sum_{k=1}^K \left[y^{(i)}_k \log ((h_\Theta (x^{(i)}))_k) + (1 - y^{(i)}_k)\log (1 - (h_\Theta(x^{(i)}))_k)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} ( \Theta_{j,i}^{(l)})^2 \tag{1}\]

where

- \(L\) = total number of layers in the network
- \(s_l\) = number of units (not counting the bias unit) in layer \(l\)
- \(K\) = number of output units/classes
- \(m\) = number of training examples; \(i\) indexes training examples
- \( (h_\Theta(x))_k \) = hypothesis that results in the \( k^{th} \) output

- The double sum adds up the logistic regression costs calculated for each unit in the output layer, over all training examples
- The triple sum adds up the squares of all the individual \(\Theta\)s in the entire network
- The \(i\) in the regularization triple sum does **not** refer to training example \(i\)
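Equation (1) translates directly into code. A minimal plain-Python sketch with hypothetical argument shapes; `thetas` excludes the bias weights, matching the note above that regularization skips the bias:

```python
import math

def nn_cost(h, y, thetas, lam, m):
    """Regularized cross-entropy cost of eq. (1).

    h:      per-example lists of K output activations (h_Theta(x^(i)))_k
    y:      matching per-example one-hot label vectors y^(i)
    thetas: weight matrices Theta^(l), bias weights excluded
    lam:    regularization strength lambda; m: number of examples
    """
    # Double sum over examples i and output units k.
    data_term = -sum(yk * math.log(hk) + (1 - yk) * math.log(1 - hk)
                     for h_i, y_i in zip(h, y)
                     for hk, yk in zip(h_i, y_i)) / m
    # Triple sum over all individual weights in every layer.
    reg_term = (lam / (2 * m)) * sum(t ** 2
                                     for Th in thetas
                                     for row in Th
                                     for t in row)
    return data_term + reg_term
```

For a single example with one output activation of 0.5 and label 1, the unregularized cost is \(-\log(0.5) \approx 0.693\).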

## 6. Backpropagation error \(\delta^{(l)}\)

We want to show that the error \(\delta^{(l)}\) of layer \(l\) is:

\[\delta^{(l)} = ((\Theta^{(l)})^T \delta^{(l+1)})\ .*\ a^{(l)}\ .*\ (1 - a^{(l)}) \tag{2}\]

where \(a^{(l)}\ .*\ (1 - a^{(l)})\) is the element-wise derivative \(g'(z^{(l)})\) of the sigmoid activation.
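Element-wise, equation (2) can be sketched in plain Python as follows. Here `Theta_l` is assumed to be indexed `[j][i]` (from unit \(i\) of layer \(l\) to unit \(j\) of layer \(l+1\)) with the bias column already dropped:

```python
def layer_delta(Theta_l, delta_next, a_l):
    """delta^(l) = ((Theta^(l))^T delta^(l+1)) .* a^(l) .* (1 - a^(l))."""
    # (Theta^(l))^T delta^(l+1): propagate the error back through the weights.
    back = [sum(Theta_l[j][i] * delta_next[j] for j in range(len(delta_next)))
            for i in range(len(a_l))]
    # Element-wise product with g'(z^(l)) = a^(l) .* (1 - a^(l)) for a sigmoid.
    return [b * a * (1 - a) for b, a in zip(back, a_l)]
```

For example, with a single downstream unit whose error is 0.5, weights `[[1.0, 2.0]]`, and activations `[0.5, 0.5]`, this yields `[0.125, 0.25]`.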

## 7. The Delta Rule

The delta rule, also known as the Least Mean Squares (LMS) method, is a gradient-descent learning rule for a single-layer network:
each weight is adjusted in proportion to the error between the target and the actual output, scaled by the corresponding input.
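One update step of the delta rule for a single linear unit can be sketched as follows (a minimal illustration; `eta` is an assumed learning-rate hyperparameter):

```python
def delta_rule_step(w, x, target, eta):
    """One LMS / delta-rule update: w_i += eta * (target - output) * x_i."""
    output = sum(wi * xi for wi, xi in zip(w, x))  # linear unit's output
    error = target - output                        # error at the output
    return [wi + eta * error * xi for wi, xi in zip(w, x)]
```

Backpropagation generalizes this rule to multi-layer networks by using equation (2) to assign each hidden unit its share of the output error.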
