Can a neural network learn additionally

Predictions with neural networks

Frank M. Thiesing, Department of Mathematics / Computer Science, University of Osnabrück

Artificial neural networks have found a number of successful applications in industry. The purpose of this article is to present their use for forecasting time series. Using sales figures as an example, it is shown how neural networks can predict what the future will bring.


Neural networks are information processing systems that consist of a large number of simple units. These so-called neurons are linked to one another by directed connections to form networks. The neurons use these connections to send information in the form of their activation.

Neural networks have a rough analogy to the brain, in which information processing takes place through a large number of nerve cells, which are very simple compared to the overall system and which transmit the degree of their excitation to other nerve cells via nerve fibers.

Besides this motivation from their partial similarity to successful biological systems, artificial neural networks also draw much of their appeal from the fact that they are massively parallel, adaptive systems that are interesting in their own right. They can be used in many areas of application in the form of programs, network simulators or special neural hardware [1,2].

The essential element of these artificial neural networks is their ability to learn. This ability to learn a classification problem independently from training examples, without having to be explicitly programmed for it, makes neural networks universally applicable.

The application of neural networks is simple; nevertheless, due to their non-linear structure, they are able to recognize even complex, hidden structures.

Basics of neural networks

Since the first artificial neural networks were designed in the 1940s, the number of models examined has increased from year to year. Different network models are used for different applications: there is no such thing as "the" neural network. The models differ considerably from one another, both in the functionality of the individual neurons and in their connection structure [3,4]. For the basics of neural networks, the reader is referred to the article by V. Sperschneider [5], which provides an overview.

In the following, only networks of the successful feedforward multilayer perceptron type are used. These are adaptive classifiers that can learn the functional relationships between input and output patterns from the presentation of training data, without the unknown analytical dependencies being supplied to them in any way.

Neurons in the multilayer perceptron

A neural network consists of neurons and weighted edges that connect them. In the Feedforward Multilayer Perceptron (MLP), the neurons are structured in layers, and each neuron of a layer is connected to all neurons of the two neighboring layers, with two exceptions: the input layer has no predecessors and the output layer has no successors. The layers between the input and output layers are called hidden layers. The number of hidden layers varies, but is rarely greater than two (see Figure 1).

Figure 1: Feedforward Multilayer Perceptron

How a neuron works

A neuron is activated by its predecessors and passes its output on to all of its successors. This results in a flow of information from the input layer to the output layer. With the exception of the input layer, all neurons are active. The neurons of the input layer only pass on the input value applied to them.

Each active neuron in the multilayer perceptron calculates the weighted sum of the outputs of its predecessors. The activation of a neuron j also includes the addition of its own threshold value beta_j. A transfer function f converts this activation into the output of the neuron. With four predecessors (see Figure 2), whose outputs are written here as o_1 to o_4, the output is f(w_j1 o_1 + w_j2 o_2 + w_j3 o_3 + w_j4 o_4 + beta_j).

Figure 2: How a neuron works in the perceptron

A sigmoid is often used as the transfer function; it limits the value range to the open interval ]0, 1[.

In a network, the weights w_ji and the threshold values beta_j can be changed, so that different functions can be constructed.
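The computation of a single neuron can be sketched in a few lines of Python. This is only an illustration: the logistic function is assumed as the sigmoid (the article does not name a specific one), and the function names and all numbers are made up.

```python
import math

def sigmoid(x):
    """Logistic sigmoid (an assumed choice); its values lie strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron_output(inputs, weights, threshold):
    """Output of one active neuron j: f(sum_i w_ji * o_i + beta_j)."""
    activation = sum(w * o for w, o in zip(weights, inputs)) + threshold
    return sigmoid(activation)

# Four predecessors, as in Figure 2; the values are invented for illustration.
predecessor_outputs = [0.2, 0.9, 0.5, 0.1]
weights = [0.4, -0.3, 0.8, 0.1]
threshold = -0.2

out = neuron_output(predecessor_outputs, weights, threshold)
print(out)
```

Changing the weights or the threshold changes the function the neuron computes, which is exactly what training exploits.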

Application of an MLP

The typical application of a multilayer perceptron is the approximation of an unknown and analytically elusive mapping rule f. Some arguments x and the associated function values y of this multidimensional mapping are known: f(x) = y. The aim is for the neural network to learn the mapping rule from these example values and thereby generalize the course of the function, so that an untrained x is mapped onto a meaningful y.

For this purpose, the MLP can be viewed as a black box that is trained by presenting the arguments at the input layer and the associated values at the output layer.

The back-propagation algorithm

The learning algorithm for MLP networks was presented by Rumelhart et al. and has become a standard [6]. During training, the backpropagation algorithm automatically adjusts the network weights over repeated presentations of all training pairs so that the mapping error is minimized.

Initially, the threshold values of the neurons and the weights are assigned random values. When an input vector x is applied, an output vector y* is determined in the so-called forward phase. In general, this output differs from the specified target value y.

So that the MLP can learn the mapping rule f(x) = y, a y* with an error delta = y - y* is determined for each training pair (x, y) in the forward phase. In the backward phase of training, this error is then propagated back through the network, from the output layer towards the input layer (see Figure 3).

Figure 3: Forward and backward phases of the backpropagation algorithm

In the hidden layers, errors are in turn calculated and passed on towards the input layer. These errors are used to correct the network weights and threshold values.

The error is minimized by a gradient descent method that minimizes the mean squared error. The speed of the process is set by a parameter referred to as the learning rate.

By presenting all training pairs several times, the mapping error is reduced to a local minimum. By using a momentum term, which deflects the gradient step in the direction of the previous step, such local minima can be left in the hope of finding a better minimum with a smaller error.
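The forward and backward phases described above can be sketched as follows. This is a minimal illustration in pure Python, not the implementation used for the forecasts in this article: it assumes one hidden layer, the logistic sigmoid, a weight update after every training pair, and made-up values for the learning rate, the momentum factor and the training data.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_layer(n_in, n_out, rng):
    # One row per neuron: n_in weights followed by the neuron's threshold.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]

def forward(layer, inputs):
    # Weighted sum of the predecessor outputs plus threshold, passed through f.
    return [sigmoid(sum(w * o for w, o in zip(row, inputs)) + row[-1]) for row in layer]

def update(layer, prev_steps, deltas, inputs, rate, momentum):
    # Gradient step plus momentum term; the threshold acts like a weight on input 1.
    for j, d in enumerate(deltas):
        for i, o in enumerate(list(inputs) + [1.0]):
            step = rate * d * o + momentum * prev_steps[j][i]
            layer[j][i] += step
            prev_steps[j][i] = step

def train(pairs, n_hidden, epochs=1000, rate=0.5, momentum=0.5, seed=0):
    rng = random.Random(seed)
    n_in, n_out = len(pairs[0][0]), len(pairs[0][1])
    hidden = init_layer(n_in, n_hidden, rng)
    output = init_layer(n_hidden, n_out, rng)
    prev_h = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
    prev_o = [[0.0] * (n_hidden + 1) for _ in range(n_out)]
    errors = []
    for _ in range(epochs):
        total = 0.0
        for x, y in pairs:
            # Forward phase: compute y* for the input x.
            h = forward(hidden, x)
            y_star = forward(output, h)
            # Backward phase: error signals at the output layer, then the hidden layer.
            delta_o = [(t - o) * o * (1 - o) for t, o in zip(y, y_star)]
            delta_h = [hj * (1 - hj) * sum(delta_o[k] * output[k][j] for k in range(n_out))
                       for j, hj in enumerate(h)]
            update(output, prev_o, delta_o, h, rate, momentum)
            update(hidden, prev_h, delta_h, x, rate, momentum)
            total += sum((t - o) ** 2 for t, o in zip(y, y_star))
        errors.append(total / len(pairs))  # mean squared error of this epoch
    return hidden, output, errors

# Made-up training pairs: the logical OR, with targets kept inside [0.1, 0.9].
pairs = [([0, 0], [0.1]), ([0, 1], [0.9]), ([1, 0], [0.9]), ([1, 1], [0.9])]
hidden, output, errors = train(pairs, n_hidden=3)
print(errors[0], errors[-1])
```

The list `errors` records the mean squared error per pass over the training pairs, which makes it easy to watch the descent and to compare different learning rates and momentum factors.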

What does the future hold?

The forecast of time series plays an important role, especially in the economic sector. For share and stock exchange prices as well as for the consumption of electricity and district heating, analysts try to predict future developments from the data that is regularly collected in order to be able to make well-founded decisions. Recently, neural networks have been increasingly used for this purpose [7,8].

For example, we have developed a system that makes it possible to forecast the weekly sales of items in a supermarket in order to support scheduling and minimize storage costs. For this purpose, neural networks are used to learn the apparently chaotic buyer behavior based on the development of the past. Many factors influence buyer behavior. Therefore, the selection of the right influencing factors and their modeling is crucial. The real data comes from the scanner tills of a supermarket.

For the time series forecast, the MLP networks are trained to approximate the course of the time series and to generalize it for the forecast. The advantage of using neural networks lies in their ability to adapt quickly to changing framework conditions and to independently improve the forecast quality through additional learning.

Time series forecast with MLP

To forecast a time series with an MLP, a sliding time window with n past values is placed over the time series. The training task consists of inferring the next value from the n values in the input layer. The training takes place on the basis of the known values. The trained network is used for forecasting by applying the last n known values to the input layer, which the network maps to the forecast value (see Figure 4).

Figure 4: Time series forecast with MLP
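As a small illustration of the sliding time window, the following sketch turns a time series into training pairs. The function name and the example values are hypothetical.

```python
def sliding_windows(series, n):
    """Return training pairs (n past values, next value) for a time series."""
    return [(series[t - n:t], series[t]) for t in range(n, len(series))]

series = [12, 15, 11, 30, 14, 13, 16]  # made-up weekly values
n = 3

pairs = sliding_windows(series, n)
for window, target in pairs:
    print(window, "->", target)

# For the actual forecast, the last n known values are applied to the input layer.
forecast_input = series[-n:]
```

A series with m values yields m - n training pairs, so a larger window leaves fewer examples to learn from.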

This procedure only makes it possible to predict the future of this one time series from its own temporal development, from within itself, so to speak. Often, however, external influences determine the course of a time series. These influencing factors must also be available as time series and are fed into the input layer of the MLP via additional neurons. The explanatory influences include, in particular, those that are already known for the point in time t to be forecast.

In the application presented, we want to forecast the sale of items in a supermarket. The data on the number of items sold and their prices are available on a weekly basis. The sale of the products under consideration is subject to strong fluctuations, which can be traced back to partly known and partly unknown influencing factors. Clearly, promotions, price changes and public holidays have a decisive impact on buyer behavior. Therefore, the data on the type and duration of advertising campaigns and price changes are also included in the forecast. The number of opening days in a week is also taken into account.

The weekly sales figures and the price of an example item are shown in Figure 5, along with the promotions and public holidays.

Figure 5: Sale, price, promotions of item 229104, holidays

The strong increase in sales that accompanies a special price promotion is easy to see. The aim of the forecast is, in particular, to anticipate such peaks in sales during planning so that sufficient stock is available. For this purpose, the neural network is trained on precisely these relationships between price, promotions and sales.

The decisive factor here is that the information on price, promotions and public holidays is already available for the near future and can be incorporated into the forecast.

Data modeling

Efficient preprocessing of the collected data is necessary before they can be entered into the MLP. In particular, because of the chosen sigmoid function, the different input values should be scaled to the interval ]0, 1[. This is also necessary in order to balance the qualitative and quantitative influencing factors.

In addition to the necessary normalization, other techniques are used for preprocessing time series data. For example, it can make sense to take the logarithm of the values beforehand. Particularly for time series that show steady growth, differencing the values is necessary to eliminate this trend. The derived time series then consists of the differences between successive values.
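The two preprocessing steps mentioned above, taking logarithms and differencing, can be sketched as follows. The function names and the example series are made up; note that the logarithm requires strictly positive values.

```python
import math

def log_transform(series):
    """Take the natural logarithm of every value (values must be positive)."""
    return [math.log(v) for v in series]

def difference(series):
    """Differences between successive values; the result is one element shorter."""
    return [b - a for a, b in zip(series, series[1:])]

def undo_difference(first_value, diffs):
    """Rebuild the original series from its first value and the differences."""
    values = [first_value]
    for d in diffs:
        values.append(values[-1] + d)
    return values

growing = [100, 110, 121, 133, 146]  # made-up series with a steady upward trend
diffs = difference(growing)
print(diffs)                         # [10, 11, 12, 13]
restored = undo_difference(growing[0], diffs)
```

A network trained on the differenced series forecasts a difference, so the forecast has to be added to the last known value to obtain the forecast in the original units.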

In the example of the sales forecast, the relevant sales information is given as weekly data. Sales are known only up to and including week T-1, whereas the other time series are also known beyond that. The raw data used for an item in a week t are:

  • SAL_t: sales in units,
  • ADV_t: number of promotion days,
  • PRI_t: price,
  • HOL_t: number of public holidays.

The normalization maps onto the interval [0.1, 0.9], since the sigmoid never attains the values 0 and 1.

The sale in week t, SAL_t, is normalized linearly with the help of its maximum.

The number of promotion days and the number of public holidays, each relative to the six days of the week, are likewise normalized linearly. The price change is only recorded qualitatively and not quantitatively. The input neuron for the price in week t, pri_t, has the value

  • 0.1 if the price goes down,
  • 0.5 if the price stays the same,
  • 0.9 if the price goes up.
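The normalization described above could look like the following sketch. The article does not spell out the exact formula, so the linear mapping of 0 to 0.1 and of the maximum to 0.9 is an assumption, as is detecting a price change by comparison with the previous week. All function names are hypothetical.

```python
def scale_to_interval(value, maximum, low=0.1, high=0.9):
    """Assumed linear mapping of [0, maximum] onto [low, high]."""
    return low + (high - low) * value / maximum

def normalize_sale(sale, max_sale):
    """sal_t: sales normalized with the help of the maximum sale."""
    return scale_to_interval(sale, max_sale)

def normalize_days(days, days_per_week=6):
    """adv_t or hol_t: promotion days or holidays relative to six days of the week."""
    return scale_to_interval(days, days_per_week)

def encode_price_change(previous_price, price):
    """pri_t: qualitative coding of the price change (comparison rule is assumed)."""
    if price < previous_price:
        return 0.1
    if price > previous_price:
        return 0.9
    return 0.5

print(normalize_sale(50, max_sale=200))   # a quarter of the maximum
print(normalize_days(3))                  # three promotion days in a week
print(encode_price_change(1.99, 1.49))    # price went down
```

Keeping all inputs in the same interval prevents a quantity with large raw values, such as the sales figures, from dominating the three-valued price input.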

For each past and therefore known week t, a vector of the values of the normalized time series is used:

vec_t := (hol_t, adv_t, pri_t, sal_t)

For a week t "in the future", for which the sale is to be trained or predicted, only the influencing factors are included:

\hat{vec}_t := (hol_t, adv_t, pri_t)

The input consists of the known data from a time window of n weeks, vec_{t-n} to vec_{t-1}, together with the influencing factors for weeks t and t+1, \hat{vec}_t and \hat{vec}_{t+1}. The MLP is trained to map this input to the normalized value of the sale, sal_{t+1}. Figure 6 shows the input and output schematically.

Figure 6: Input for MLP
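Assembling the input for one training pair can be sketched as follows. The order of the values within the input layer is an assumption, since only Figure 6 defines it. The function name and the example data are made up.

```python
def build_input(hol, adv, pri, sal, t, n):
    """Input for the target sal[t + 1]: n known weeks, then the factors for t and t + 1."""
    x = []
    for k in range(t - n, t):      # past weeks with known sale: four values each
        x += [hol[k], adv[k], pri[k], sal[k]]
    for k in (t, t + 1):           # weeks without known sale: three values each
        x += [hol[k], adv[k], pri[k]]
    return x

# Made-up normalized series for eight weeks; the sale of the last two is unknown.
hol = [0.1] * 8
adv = [0.1, 0.1, 0.5, 0.1, 0.1, 0.1, 0.9, 0.1]
pri = [0.5] * 8
sal = [0.3, 0.2, 0.8, 0.3, 0.2, 0.3, None, None]

x = build_input(hol, adv, pri, sal, t=6, n=4)
print(len(x))
```

With n = 4 this yields 4 * 4 + 2 * 3 = 22 input values, and the unknown sales of weeks t and t + 1 never enter the input.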

For practical reasons, the forecast for week T+1 is determined in week T with the known sales up to and including week T-1, so that there is enough time for scheduling.

For this purpose, the trained network is queried in week T with the data of the past n weeks and the influencing factors of the current and coming week. The expected output value is sal_{T+1}, which must be transformed back to obtain the forecast sale of the item in week T+1.
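The back-transformation of the network output is simply the inverse of the normalization. The sketch below assumes a linear mapping of [0, maximum] onto [0.1, 0.9], which the article does not state explicitly. The function names and numbers are hypothetical.

```python
def normalize_sale(sale, max_sale, low=0.1, high=0.9):
    """Assumed linear normalization of a sale onto [low, high]."""
    return low + (high - low) * sale / max_sale

def denormalize_sale(sal_norm, max_sale, low=0.1, high=0.9):
    """Inverse of normalize_sale: network output back to units sold."""
    return (sal_norm - low) / (high - low) * max_sale

network_output = 0.5   # made-up output sal_{T+1} of the trained network
forecast_units = denormalize_sale(network_output, max_sale=200)
print(forecast_units)
```

Because the sigmoid can produce values outside [0.1, 0.9], the back-transformed forecast may lie slightly below zero or above the maximum seen so far.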

A forecast example

The challenge when using neural networks is to set the correct parameters: the number of hidden layers, the number of neurons in them and the learning rate are just a few of the variables that the developer must set in such a way that satisfactory forecast results are achieved. Special attention must be paid to the depth of the past and the selection of influencing factors, as well as their modeling, before the neural network can predict what the future will bring. Optimizing all of these parameters involves a lot of test runs.

Using the example of item 229104, which has 3 promotions in the period under review (see Figure 5), the result of the forecast is compared with the real data.

The selected MLP network has two active layers in addition to an input layer. The hidden layer has 15 neurons and the output layer consists of one neuron. The 22 input neurons result as follows: with a past depth of n = 4 weeks, there are 6 neurons each for price changes, promotional information and holidays. These data are known up to and including the forecast week.

In contrast, the sales in the last two of these weeks, the current week and the forecast week, are of course unknown. Therefore only 4 neurons are required for this time series.
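Written out, the count of input neurons for a past depth of n weeks is as follows; the grouping into two terms is only a restatement of the figures given above.

```latex
\[
\underbrace{3\,(n+2)}_{\text{price, promotion, holiday}}
\;+\;
\underbrace{n}_{\text{sales}}
\;=\; 3 \cdot 6 + 4 \;=\; 22
\qquad \text{for } n = 4 .
\]
```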

Figure 5 shows the sales information of article 229104. Figure 7 shows a comparison of the actual sales and the successive forecasts made with the neural network from week 17/95.

Figure 7: Sales and forecast for item 229104

Particularly noticeable is the ability of the neural network to predict the sharp increase in sales for both promotions. Buyer behavior during a promotion was learned from the previous promotions. Even though sales increase to different degrees during a promotion, the network predicts the correct trend qualitatively.


Neural networks are adaptive classifiers which, especially in non-linear, chaotic systems, automatically learn the analytically unknown relationships between input and output patterns, so that they can be queried with unknown data after training. The trained and the queried patterns do not have to be identical; it is sufficient that the current data are similar to the training data. This ability of neural networks to generalize makes them so interesting for use in the classification and forecasting of fuzzy and noisy data.


  1. A. Zell: Simulation of neural networks. Addison-Wesley, 1994.
  2. E. Schöneburg: Industrial application of neural networks. Addison-Wesley, 1993.
  3. R. Rojas: Neural Network Theory. Springer-Verlag, 1993.
  4. N. Hoffmann: Small handbook of neural networks. Vieweg, 1993.
  5. V. Sperschneider: "Ingredients and recipes from the neuro kitchen". unix/mail 13, 1995, 4.
  6. D.E. Rumelhart, G.E. Hinton, R.J. Williams: "Learning Internal Representations by Error Propagation". In: Rumelhart, McClelland (Eds.): Parallel distributed processing 1, MIT Press, Cambridge, 318-362, 1986.
  7. V.R. Vemuri, R.D. Rogers: Artificial Neural Networks - Forecasting Time Series. IEEE, 1994.
  8. A.S. Weigend, N.A. Gershenfeld: Time Series Prediction. Addison-Wesley, 1994.