6.4.3.5. One Layer in a Neural Network
In the figure below one layer in a fully connected neural network is
sketched. Not only the arrows and formulas for the forward pass are
given, but also the arrows and formulas for the backward pass.
The first sketch is for one training example, with input \(\v x\) and output
\(\hat{\v y}\).
Fig. 6.4.19 One layer in a fully connected neural network.
The formulas for the forward pass are obtained by concatenating the actions of the
three blocks in one layer (see previous sections). In the formulas
below we write \(W\) instead of \(W\ls i\) and \(\v b\) instead of \(\v b\ls
i\).
\[\begin{split}\v u &= W \v x\\
\v v &= \v u + \v b &= W\v x + \v b\\
\hat{\v y} &= g\aew(\v v) &= g\aew(W\v x + \v b)\end{split}\]
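The three steps above can be sketched in a few lines of NumPy. The layer sizes, the choice of \(\tanh\) as activation, and all variable names are illustrative assumptions, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative layer sizes (an assumption, not from the text)
n_in, n_out = 4, 3
W = rng.standard_normal((n_out, n_in))
b = rng.standard_normal(n_out)
x = rng.standard_normal(n_in)

g = np.tanh  # one possible element-wise activation function

u = W @ x     # linear map:        u = W x
v = u + b     # bias added:        v = u + b
y_hat = g(v)  # activation applied element-wise: y_hat = g(v)
```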
For the backpropagation of the ‘error’ we have:
\[\begin{split}\frac{\prt\ell}{\prt \v v} &= g'\aew(\v v) \odot \frac{\prt\ell}{\prt \hat{\v y}} &\\
\frac{\prt\ell}{\prt \v u} &= \frac{\prt\ell}{\prt \v v} &= g'\aew(\v v) \odot \frac{\prt\ell}{\prt \hat{\v y}}\\
\frac{\prt\ell}{\prt \v x} &= W\T \frac{\prt\ell}{\prt \v u} &=
W\T \left(g'\aew(\v v) \odot \frac{\prt\ell}{\prt \hat{\v y}}\right)\end{split}\]
and the gradients of the error with respect to the parameters \(W\) and
\(\v b\) are given by:
\[\begin{split}\frac{\prt\ell}{\prt \v b} &= \frac{\prt\ell}{\prt \v v} = g'\aew(\v v) \odot \frac{\prt\ell}{\prt \hat{\v y}} \\
\frac{\prt\ell}{\prt W} &= \frac{\prt\ell}{\prt \v u} \v x\T =
\left(g'\aew(\v v) \odot \frac{\prt\ell}{\prt \hat{\v y}}\right) \v x\T\end{split}\]
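The backward pass for one example can be sketched and checked numerically. As an illustrative assumption we take the loss \(\ell = \tfrac{1}{2}\|\hat{\v y}\|^2\), so that \(\prt\ell/\prt\hat{\v y} = \hat{\v y}\); the sizes and names are again hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_out = 4, 3                       # illustrative sizes
W = rng.standard_normal((n_out, n_in))
b = rng.standard_normal(n_out)
x = rng.standard_normal(n_in)

g = np.tanh
g_prime = lambda v: 1.0 - np.tanh(v) ** 2  # derivative of tanh

# Forward pass; as an example take ell = 0.5 * ||y_hat||^2,
# so that dl/dy_hat = y_hat (an assumption for this sketch).
v = W @ x + b
y_hat = g(v)
dl_dy = y_hat

# Backward pass, following the formulas above
dl_dv = g_prime(v) * dl_dy     # Hadamard product (the odot)
dl_dx = W.T @ dl_dv            # W^T dl/du
dl_db = dl_dv                  # dl/db = dl/dv
dl_dW = np.outer(dl_dv, x)     # (dl/du) x^T

# Finite-difference check of one entry of dl/dW
def loss(W_):
    return 0.5 * np.sum(g(W_ @ x + b) ** 2)

eps = 1e-6
E = np.zeros_like(W)
E[0, 0] = 1.0
numeric = (loss(W + eps * E) - loss(W - eps * E)) / (2 * eps)
```

The finite-difference value `numeric` should agree with `dl_dW[0, 0]` up to the step size, which is a quick sanity check on the derivation.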
Now let us consider an input batch \(X\) and the corresponding output batch
\(\hat{Y}\). In this case the graph looks like:
Fig. 6.4.20 One layer in a fully connected neural network (for a batch of input and output data)
The forward pass is described with:
\[\begin{split}U &= X W\T \\
V &= U + \v 1 \v b\T &= X W\T + \v 1 \v b\T\\
\hat{Y} &= g\aew(V) &= g\aew(X W\T + \v 1 \v b\T)\end{split}\]
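In NumPy the batch forward pass needs no explicit \(\v 1 \v b\T\) term: broadcasting adds \(\v b\) to every row of \(U\). A minimal sketch, with illustrative sizes and \(\tanh\) as an assumed activation:

```python
import numpy as np

rng = np.random.default_rng(2)
n_batch, n_in, n_out = 5, 4, 3            # illustrative sizes
W = rng.standard_normal((n_out, n_in))
b = rng.standard_normal(n_out)
X = rng.standard_normal((n_batch, n_in))  # one example per row

g = np.tanh

U = X @ W.T   # U = X W^T
V = U + b     # broadcasting adds b to every row, i.e. V = U + 1 b^T
Y_hat = g(V)  # activation applied element-wise
```

Each row of `Y_hat` equals the single-example forward pass applied to the corresponding row of `X`.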
For the backpropagation of the ‘error’ we have:
\[\begin{split}\frac{\prt\ell}{\prt V} &= g'\aew(V) \odot \frac{\prt\ell}{\prt \hat{Y}} &\\
\frac{\prt\ell}{\prt U} &= \frac{\prt\ell}{\prt V} &= g'\aew(V) \odot \frac{\prt\ell}{\prt \hat{Y}}\\
\frac{\prt\ell}{\prt X} &= \frac{\prt\ell}{\prt U} W
&= \left(g'\aew(V) \odot \frac{\prt\ell}{\prt \hat{Y}}\right) W\end{split}\]
and the gradients of the error with respect to the parameters \(W\) and
\(\v b\) are given by:
\[\begin{split}\frac{\prt\ell}{\prt \v b} &= \frac{\prt\ell}{\prt V}\T \v 1 =
\left(g'\aew(V) \odot \frac{\prt\ell}{\prt \hat{Y}}\right)\T \v 1 \\
\frac{\prt\ell}{\prt W} &= \frac{\prt\ell}{\prt U}\T X =
\left(g'\aew(V) \odot \frac{\prt\ell}{\prt \hat{Y}}\right)\T X\end{split}\]
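The batch backward pass can be sketched the same way. As before the loss is an illustrative assumption, here \(\ell = \tfrac{1}{2}\|\hat{Y}\|_F^2\) so that \(\prt\ell/\prt\hat{Y} = \hat{Y}\); note that \((\prt\ell/\prt V)\T \v 1\) is simply a column sum over the batch:

```python
import numpy as np

rng = np.random.default_rng(3)
n_batch, n_in, n_out = 5, 4, 3            # illustrative sizes
W = rng.standard_normal((n_out, n_in))
b = rng.standard_normal(n_out)
X = rng.standard_normal((n_batch, n_in))

g = np.tanh
g_prime = lambda V: 1.0 - np.tanh(V) ** 2

# Forward pass; as an example take ell = 0.5 * ||Y_hat||_F^2,
# so that dl/dY_hat = Y_hat (an assumption for this sketch).
V = X @ W.T + b
Y_hat = g(V)
dl_dY = Y_hat

# Backward pass, following the formulas above
dl_dV = g_prime(V) * dl_dY        # Hadamard product
dl_dX = dl_dV @ W                 # (dl/dU) W
dl_db = dl_dV.T @ np.ones(n_batch)  # (dl/dV)^T 1: sums the batch dimension
dl_dW = dl_dV.T @ X               # (dl/dU)^T X
```

The gradient of each parameter accumulates the contributions of all examples in the batch, which is exactly what the multiplications by \(\v 1\) and \(X\) achieve.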