Architecture of Artificial Neural Networks

1. Introduction

Artificial Neural Networks are probably the best known of all biologically inspired computing technologies in that most people have a rough understanding, or at least a suspicion, of what they do. Sadly though, in my own experience most programmers are afraid of sitting down and getting a simple neural network working. Whenever I've given talks on the subject people have been interested and often quite impressed with the capabilities of neural networks to solve real-world problems. However, when I bump into them later and ask them if they've followed up on what I said and spent a few hours working on a neural network of their own, they invariably say something like "oh no, that's too hard".

Programming and using artificial neural networks is not hard! There is a little bit of maths involved and it's true that working with some of the traditional learning techniques can be a bit of a headache at times, but at the end of the day, neural networks are not complicated. Honestly. I think that the problem lies with the way in which many academics write about neural networks. Most web resources on the subject present the technology as a spaghetti of impenetrable equations, hide it behind reams of compiler-specific nonsense or just avoid the issue and give references.

So, in this article, I'll attempt to clearly outline some of the computational aspects of artificial neural networks, without getting too bogged down in implementation-specific detail. The article will, hopefully, outline the components that make up a neural network and the way in which they are linked together and interact. Before reading any further, you may wish to have a quick look at the article Neural Networks in Nature for a brief introduction to the biological inspiration for neural networks.

I'm going to try very hard to avoid mentioning any specific programming language or development environment in this article because, as I mentioned above, the ideas and principles behind artificial neural networks (or any technology for that matter) are all too often presented behind a layer of implementation-specific waffle. Where code examples are given I'll give them in a syntactically neutral pseudocode which will be meaningless to any compiler but which any half-decent programmer should easily understand.

2. Neural Networks in the Real World

Before we get going with the technical side of this article, it's worth having a quick look at the behaviour and benefits of Neural Networks as a technology.

Neural nets can be trained to solve numerical problems. That doesn't limit them to the world of maths though, since just about every problem can be expressed as a set of numbers. For example, we might feed numbers representing the behaviour of the stock market over the last few days into our neural network, then read a number from its output corresponding to the market's predicted behaviour for tomorrow. Alternatively, we might input an encoded image from a camera and use the outputs to control servos, move a robot arm and catch a ball.

Both of those applications are examples of the areas where neural networks are particularly useful: areas where a traditional computational approach may be very complicated, or where the only information we have available for decision making is historical data recorded from our system.

A traditional approach to the stock market prediction problem might be to come up with a set of rules like these...

if airline shares are low and oil price is low
  airline shares will go up
if airline shares are higher than average
  airbus and boeing shares will go up

It doesn't take a genius to work out that these rules are never going to be complete. The prices of shares on the stock market are governed by countless factors, many of which we don't know about, yet humans can predict the stock market to a reasonable degree of accuracy. Neural networks allow us to mimic this human ability to deal with a problem on a sub-symbolic level, to "get a feel" for something, rather than analysing it using logic alone.

Generalisation is another key strength of neural networks. A finite set of training data can be used to train a neural net, after which it will be able to accurately deal with situations which weren't covered in the training data. For example, we might teach the ball catching robot where to move under a set of given conditions, but there's no way we'd have time to teach it what to do in every situation. Generalisation will allow the robot to decide what to do based on similar, but not identical, experiences.

This article doesn't cover training, just architecture, so there won't be any more talk of training data or generalisation, but if you skip to the summary you'll find information on where to find out about training, evaluating and using neural networks.

3. Architecture of a Neural Network

Artificial neural networks are composed of a set of neurons, joined together by synapses. Neurons perform a simple computational task, generally a basic yes/no decision. Synapses link neurons together by connecting the output of one neuron to the input of another.

In programming terms, a synapse is an object which links one neuron connected to its input to another connected to its output. A neuron is a slightly more complex object which can be connected to one or more input synapses and one or more output synapses. The structure of any neural network is therefore defined by the way in which various neurons and synapses are linked together.

Figure 1: Neural network architecture is defined by the way in which neurons (circles) are connected together by synapses (lines)

Now that we have a basic understanding of how a neural network's structure is defined, we can start to think about how such a network can be used to perform computation or, in the case of a natural neural network, to think. Natural neural networks are constructed from neurons: cells which connect together and transmit electrical impulses to and fro. The electrical activity of a neuron is dependent on the electrical activity of the neurons which connect to it. In artificial neural networks we use real numbers instead of electrical activity. Each neuron outputs a value, which is passed to other neurons via synaptic connections and, together with values from any number of other neurons, determines the output of those neurons. Synaptic connections are one-way, so values generally propagate through the network in one direction.

Nerve cells connected to sensory organs such as the eyes or taste buds form the inputs to natural neural networks, and similar nervous connections to muscles and suchlike form the outputs. In an artificial neural network we connect the inputs and outputs to some form of interface software. This software may in turn connect our neural network to sensors, motors or servos, or it may link it to a database of stock-market or weather data. Regardless of what data we're using or what problem we're trying to solve, our neural network will accept a set of real numbers as input and return a set of real numbers as output.

4. Inputs and Outputs

Special input neurons (the yellow blobs in figure 1) are used as placeholders in the network; input values from our software interface are inserted through the input neurons, which are connected to other neurons in the network via synapses. Neurons in the network are generally arranged in layers - groups of neurons which are linked only to similar groups on their left and right - and the input neurons form the first such layer. Values from the input layer are fed from left to right through various hidden layers. The last layer of neurons in the network is known as the output layer; the output values of neurons in the output layer are the outputs of the network. This layered architecture is inspired directly by the design of the brain, in which layers of neuron cells are arranged in an onion-skin pattern.
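To make the left-to-right flow concrete, here is a minimal Python sketch of a layered network (the article itself sticks to pseudocode; this is just one possible rendering). It assumes every neuron in one layer is connected to every neuron in the next, which is a common arrangement but not the only one, uses sigmoid neurons throughout, leaves out bias neurons, and treats the last layer's values as the network outputs rather than using separate placeholder output neurons. The `feed_forward` function and its nested-list weight layout are my own illustrative choices, not a standard API.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def feed_forward(inputs: list[float], layers: list[list[list[float]]]) -> list[float]:
    """Propagate values left to right through fully connected layers.

    Hypothetical layout: layers[k][j] holds the weights of the synapses
    feeding neuron j of the next layer, one weight per neuron in the
    previous layer.
    """
    values = inputs  # the input layer simply holds the values we were given
    for layer in layers:
        # Each neuron sums its weighted inputs, then applies the sigmoid.
        values = [sigmoid(sum(w * v for w, v in zip(weights, values)))
                  for weights in layer]
    return values  # outputs of the last (output) layer
```

Calling `feed_forward([1.0, 0.0], [[[0.5, -0.5], [-1.0, 1.0]], [[1.0, 1.0]]])` pushes two input values through a two-neuron hidden layer into a single output neuron and returns a one-element list. In this scheme a bias could be added by appending a constant 1.0 to each layer's values and giving every neuron one extra weight.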

Different people deal with network outputs in different ways. Some read output values directly from the neurons in the output layer while others (me included) use placeholder output neurons. It doesn't really matter which scheme is used, which is exactly the reason that more than one scheme exists! The blue circle in figure 1 represents an output neuron.

5. Computation

Both neurons and synapses output a value which is dependent on the values presented to them at their input (or inputs). Synapses mimic the tiny gap between neuron cells in the brain, across which the magnitude of electrical activity can be altered. Artificial synapses output a value equal to their input multiplied by some weight value.

Various different types of neuron exist, each of which generates its output based on a different process. Some common neuron types are listed here:
  • Input neurons: These provide a place where input values can be presented to the network. The output of an input neuron is equal to the input value assigned to it.
  • Hidden (sigmoid) neurons: These provide the network with the ability to perform calculations. The output of a sigmoid neuron is obtained by summing its inputs and passing the sum through a sigmoid activation function.
  • Output neurons: Output neurons provide a place from which network output values can be read.
  • Bias neurons: Bias neurons output a steady value of 1. Bias neurons can be linked to other neurons to provide a constant bias.
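Each of these neuron types, and the synapse rule given at the start of this section, boils down to a very small rule. The Python functions below sketch those rules in isolation; the function names are hypothetical, and the sketch deliberately ignores how the pieces are wired together.

```python
import math


def input_neuron_output(assigned_value: float) -> float:
    # An input neuron simply passes on the value the host application assigned to it.
    return assigned_value


def bias_neuron_output() -> float:
    # A bias neuron always outputs 1; the weight of its outgoing synapse scales it.
    return 1.0


def synapse_output(weight: float, upstream_output: float) -> float:
    # A synapse multiplies the value arriving at its input by its weight.
    return weight * upstream_output


def sigmoid_neuron_output(synapse_outputs: list[float]) -> float:
    # A hidden neuron sums its (already weighted) inputs, then applies the sigmoid.
    total = sum(synapse_outputs)
    return 1.0 / (1.0 + math.exp(-total))


def output_neuron_output(synapse_output_value: float) -> float:
    # A placeholder output neuron just exposes the value from its input synapse.
    return synapse_output_value
```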

The most common neuron type found in neural networks has a sigmoid activation function. Shown below, the function is named "sigmoid" because it's shaped a bit like an S.

Figure 2: The sigmoid activation function

The neuron's inputs are summed to give a "Weighted Sum of Inputs" (WSI) value. Weighting is the multiplication by synaptic weights, which is looked after by the synapse objects, so we could just call it the "Sum of Inputs". I'm using the traditional name here though. You can use the graph above to work out the output value of the sigmoid function given the WSI value. As WSI increases (left to right along the horizontal axis), the output changes (on the vertical axis).
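As a worked example with made-up numbers, suppose a sigmoid neuron has two input synapses: one carries an upstream output of 0.8 through a weight of 2.0, the other is connected to a bias neuron (output 1) through a weight of -1.0. Writing $x_i$ for the upstream outputs and $w_i$ for the synapse weights, and using the standard sigmoid $1/(1 + e^{-\mathrm{WSI}})$:

$$
\mathrm{WSI} = \sum_i w_i x_i = (2.0)(0.8) + (-1.0)(1) = 0.6
$$

$$
\text{output} = \frac{1}{1 + e^{-0.6}} \approx \frac{1}{1 + 0.549} \approx 0.646
$$

Reading the graph at a WSI of 0.6 should give roughly the same value: a little above the halfway point of 0.5.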

The sigmoid is almost like a boolean. It is "off" until a given WSI value is reached, then it quickly (but not instantly) switches to "on". This models the behaviour of a natural neuron, which is generally either "firing" or "inactive".

For those who like such things, the equation for the sigmoid function looks like this:

$$
\text{output} = \frac{1}{1 + e^{-\mathrm{WSI}}}
$$

So, negative input values lower the WSI and make the neuron inactive, while positive input values increase the WSI and cause the neuron to fire. By altering the weights connecting neurons we can change the relationships between neurons and thus the behaviour of the network.
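A quick way to see the effect of the weights is to hold a neuron's inputs fixed and flip the sign of its synapse weights. In this minimal Python sketch (the helper names and numbers are purely illustrative), the same upstream outputs produce either a firing neuron or an inactive one depending only on the weights.

```python
import math


def sigmoid(wsi: float) -> float:
    """Standard sigmoid activation applied to a weighted sum of inputs."""
    return 1.0 / (1.0 + math.exp(-wsi))


def neuron_output(upstream_outputs: list[float], weights: list[float]) -> float:
    """Output of a sigmoid neuron given upstream neuron outputs and synapse weights."""
    wsi = sum(w * x for w, x in zip(weights, upstream_outputs))
    return sigmoid(wsi)


upstream = [1.0, 0.5]
firing = neuron_output(upstream, [3.0, 2.0])      # WSI = 4.0, output close to 1
inactive = neuron_output(upstream, [-3.0, -2.0])  # WSI = -4.0, output close to 0
```

Because the sigmoid is symmetric about a WSI of zero, the two outputs add up to 1: negating every weight mirrors the neuron's response.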

6. Neural Network Objects

Neural networks lend themselves well to an object oriented coding style. We can very quickly build a simple framework for neural nets using a basic set of classes. Here's some pseudocode for the basic requirements.

The TNeuralNetworkComponent class is the basis for everything in our neural network.

class TNeuralNetworkComponent
    function getOutput() : Float;  virtual; abstract;

Synapses are the simplest component of a neural network. They have an input neuron and an output neuron, and their output is equal to their input multiplied by a weight value.

class TSynapse(TNeuralNetworkComponent)
    Weight: Float;
    InputNeuron: TNeuron;
    OutputNeuron: TNeuron;
    function getOutput() : Float;  override;

TSynapse.getOutput() : Float
  Result := Weight * InputNeuron.getOutput()

The TNeuron class has an array of input synapses and overrides the getOutput method as follows. The "Activation" method is overridden by child classes to implement specific functionality.

class TNeuron(TNeuralNetworkComponent)
    function Activation(X : Float) : Float; virtual; abstract;
    InputSynapses : array of TSynapse;
    function getOutput() : Float;  override;

TNeuron.getOutput() : Float
  SumOfInputs := 0
  for each InputSynapse in InputSynapses
    SumOfInputs += InputSynapse.getOutput()

  Result := Activation(SumOfInputs)

The TSigmoidNeuron class is an example of how we can derive classes from TNeuron to implement different functionalities.

class TSigmoidNeuron(TNeuron)
    function Activation(X : Float) : Float; override;
    Gradient : Float;

TSigmoidNeuron.Activation(X : Float) : Float
  Result := 1 / (1 + exp(-Gradient * X))

I'm not going to go into any more detail here about the specifics of object oriented neural networks. The simple pseudocode classes should be enough to give you an idea of what is needed and any more complexity would hide the key features that I'm trying to illustrate.

7. A Brief Example

The basic components of a neural network, outlined above, can be combined to create networks of almost any configuration to perform just about any task. In this section we'll look at how they work together.

The host application places the appropriate values into the input neurons of the network, then it reads the output values from the output neurons. When the application requests the value of a network output it starts a chain of events which propagates through the entire network...

Figure 3: A very simple neural network. See text for description

  • The output neuron, O, in order to calculate the network output, must request the output of its single input synapse, S0
  • Synapse S0, in order to return its output, must get the value of its input and multiply it by its weight. To do this it must request the output of the neuron to which its input is connected: H0
  • Hidden neurons calculate their output by summing the values of all their input synapses and applying an activation function. Therefore H0 must request the output of three synapses: S1, S2, S3
  • These synapses, being exactly the same as the one connecting the output to the rest of the network, must each request the output value of the neuron connected to their input (B, H1 or H2)
  • Eventually a bias or input neuron will be reached (B or I in this case). Since these neuron types already know their output (a constant 1 for a bias neuron, the assigned input value for an input neuron), they can return it without any further calculation
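To watch this chain of requests unfold, the sketch below wires up a network shaped like Figure 3 and records the order in which outputs are requested. The figure's weights aren't given, and it isn't stated exactly which neurons feed H1 and H2, so the weights and the assumption that H1 and H2 are each fed by I through a single synapse are invented for illustration. To keep it short it uses plain dictionaries and one recursive function rather than the class hierarchy.

```python
import math

# Hypothetical wiring for Figure 3: neuron -> list of (source neuron, synapse weight).
# The weights, and I feeding H1 and H2 directly, are assumptions for illustration.
CONNECTIONS = {
    "O": [("H0", 1.0)],                              # synapse S0
    "H0": [("B", -0.5), ("H1", 1.5), ("H2", -2.0)],  # synapses S1, S2, S3
    "H1": [("I", 2.0)],
    "H2": [("I", -1.0)],
}
KIND = {"I": "input", "B": "bias", "H0": "sigmoid",
        "H1": "sigmoid", "H2": "sigmoid", "O": "output"}


def output(node: str, inputs: dict[str, float], trace: list[str]) -> float:
    """Recursively compute a neuron's output, recording each request in trace."""
    trace.append(node)
    kind = KIND[node]
    if kind == "input":
        return inputs[node]  # value placed here by the host application
    if kind == "bias":
        return 1.0
    # Each synapse returns weight * output of the neuron at its input.
    total = sum(w * output(src, inputs, trace) for src, w in CONNECTIONS[node])
    if kind == "sigmoid":
        return 1.0 / (1.0 + math.exp(-total))
    return total  # placeholder output neuron passes its input straight through
```

Calling `output("O", {"I": 0.5}, trace)` with an empty `trace` list leaves the requests in the order `O, H0, B, H1, I, H2, I`. Note that I is asked for its output twice, once for H1 and once for H2: the plain recursive scheme recomputes shared neurons, which is harmless for a network this size, though a larger implementation may want to cache values.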

This recursive approach to the calculation of network outputs has a number of advantages. The most notable of these is that additional neuron classes can be added without having to change the inner workings of the network.

8. Summary

In this article we've seen how the basic components of an artificial neural network operate and how they can be combined to perform basic computational tasks. However, we have not yet examined how we can make neural networks learn, which is their main advantage over traditional computing techniques.

At this point, you may wish to go on to read the articles Training Neural Networks with Back Propagation and/or Training Neural Networks with Evolutionary Algorithms, each of which explains how we can alter the weight values of synapses within a neural network to make it learn to perform a given task.