
## Chances and challenges for machine learning in highly automated driving, part 2: Theoretical background

Release Date: 2018-08-20

Machine learning can be defined as a set of algorithms that facilitate predictions based on past learning.

In a machine learning algorithm, the input data is organized as data points. Each data point consists of features that describe the represented data. For example, size and speed are features that can differentiate a car from a bicycle on the street. Both the size and the speed of a car are usually higher than those of a bicycle. The goal of the machine learning methodology is to convert the input data into a meaningful output, such as classifying the input data into car and non-car data points or objects. The input is usually written as a vector `x`, composed of the feature values of one or more data points. The output is written as `y`.
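As a minimal sketch, the data points above could be represented in code as feature vectors; all feature values here are invented for illustration:

```python
# Toy data points, each described by two features: size (m) and speed (km/h).
# The feature values are invented for illustration only.
data_points = [
    {"size": 4.5, "speed": 60.0, "label": "car"},
    {"size": 4.2, "speed": 50.0, "label": "car"},
    {"size": 1.8, "speed": 15.0, "label": "bicycle"},
    {"size": 1.7, "speed": 20.0, "label": "bicycle"},
]

# The input vector x for one data point is simply its list of feature values.
x = [data_points[0]["size"], data_points[0]["speed"]]
print(x)  # → [4.5, 60.0]
```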

Two- or three-dimensional input data can be illustrated and viewed in a so-called feature space, where each data point in `x` is plotted with respect to its features. Figure 8 (a) shows a simplified example of a two-dimensional feature space that describes the car and non-car objects.

Figure 8. Classification of car and non-car objects based on their size and speed: feature space (a) and the corresponding separation between the two classes (b).

A so-called learned mapping function, or model, `h_θ(x)`, maps the feature vectors to outputs (e.g., the classification into car and non-car data points). The structure of the model ranges from a simple linear function, such as the line dividing car and non-car objects in Figure 8 (a), to a complex non-linear neural network. The goal of the learning methodology is to determine the values of the `θ` coefficients, which represent the parameters of the model, from the available input data. The output of the mapping function is the algorithm’s prediction of what the input data describes.
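A linear mapping function of this kind can be sketched in a few lines of Python. The `theta` values below are hand-picked for illustration, not learned:

```python
# Minimal linear model h_theta(x) = theta0 + theta1*size + theta2*speed.
# The theta coefficients are hand-picked for illustration, not learned.
theta = [-8.0, 1.0, 0.1]  # [bias, weight for size, weight for speed]

def h(x):
    """Linear mapping function; a positive output means 'car'."""
    return theta[0] + theta[1] * x[0] + theta[2] * x[1]

def predict(x):
    return "car" if h(x) > 0 else "non-car"

print(predict([4.5, 60.0]))  # size 4.5 m, speed 60 km/h → car
print(predict([1.8, 15.0]))  # size 1.8 m, speed 15 km/h → non-car
```

The line `h_θ(x) = 0` corresponds to the separating line between the two classes in Figure 8 (a).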

Machine learning methods can be classified according to how the mapping function is learned (Figure 9). There are three possibilities:

• Supervised learning – The mapping function is calculated from training data pairs where the output, `y`, known in advance, is given to the learning algorithm separately during the training phase. The model can be deployed into the target application once its parameters have been computed. Its output – when it receives an unknown data point – will be the predicted value of `y`.

• Unsupervised learning – In this case there are no feature-label pairs available during the training phase, in contrast to supervised learning. The input to the learning algorithm consists only of unlabeled data points. The goal of this machine learning methodology is to deduce labels for the input features, `x`, directly from their distribution in the feature space.

• Reinforcement learning – The training data has no labels in this case either, but the model is constructed to interact with its environment through a set of actions. The mapping function maps the state of the environment, which is given by the input data, to actions. A reward signal indicates how well an action performed in a certain state of the environment. The learning algorithm reinforces an action when the signal indicates a positive influence, and discourages the specific action or state of the environment when a negative influence is recognized.

Figure 9. Classification of machine learning algorithms based on their training methodology.
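As a rough sketch of the supervised case, a perceptron-style update rule can learn the `θ` coefficients from labelled training pairs. The data, feature scaling, and learning rate below are invented for illustration:

```python
# Perceptron-style supervised learning sketch: the theta coefficients are
# learned from labelled (x, y) training pairs. The data is invented; speed is
# given in tens of km/h so that this simple update rule converges quickly.
training_data = [
    ([4.5, 6.0], 1),  # car: 4.5 m long, 60 km/h
    ([4.2, 5.0], 1),  # car
    ([1.8, 1.5], 0),  # bicycle (non-car)
    ([1.7, 2.0], 0),  # bicycle (non-car)
]

theta = [0.0, 0.0, 0.0]  # [bias, weight for size, weight for speed]
lr = 0.1                 # learning rate

def predict(x):
    s = theta[0] + theta[1] * x[0] + theta[2] * x[1]
    return 1 if s > 0 else 0

# Training phase: the labels y are known in advance (supervised learning).
for _ in range(50):
    for x, y in training_data:
        error = y - predict(x)
        theta[0] += lr * error
        theta[1] += lr * error * x[0]
        theta[2] += lr * error * x[1]

print([predict(x) for x, _ in training_data])  # → [1, 1, 0, 0]
```

Once the parameters have been computed, the trained `predict` function can be deployed and applied to unknown data points.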

# The deep learning revolution

The so-called deep learning paradigm has revolutionized the machine learning field in recent years. Deep learning made a huge impact on the machine learning community by solving challenges that previously could not be tackled with traditional pattern recognition approaches (LeCun et al. 2015). The introduction of deep learning has dramatically improved the precision of systems designed for visual recognition, object detection, speech recognition, anomaly detection, or genomics. The key aspect of deep learning is that the features used to interpret the data are learned automatically from the training data instead of being manually crafted by an engineer.

Figure 10. Deep convolutional neural network trained to recognize cars in images.

Previously, the main challenge in constructing a good pattern recognition algorithm was the manual engineering of a hand-crafted feature vector for classification, such as the local binary patterns used in an earlier version of the traffic sign recognition system described in part 1. The emergence of deep learning has replaced this manual engineering with learning algorithms that can discover significant features in the raw input data automatically.

Architecturally, a deep learning system is made up of several layers of non-linear units, which transform the raw input data into higher levels of abstraction. Each layer maps the output of the previous layer into a more complex representation that is suitable for regression or classification tasks. This learning is usually performed on a deep neural network that is trained using a back-propagation algorithm. This algorithm iteratively adapts the parameters, or weights, of the network so that its outputs fit the training data. By the end of training, the network has thus learned a complex non-linear mapping function of the input data points.
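A pure-Python sketch of such a network and its back-propagation updates might look as follows. The tiny 2-2-1 architecture, the XOR-style data, and the initial weights are all invented for illustration; real systems use far larger networks and dedicated frameworks:

```python
import math

# Sketch of a tiny feed-forward network (2 inputs, 2 hidden sigmoid units,
# 1 output) trained with back-propagation on invented XOR-style data.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

# Fixed, asymmetric initial weights keep the example deterministic.
w1 = [[0.5, -0.4], [0.3, 0.8]]   # input -> hidden weights
b1 = [0.1, -0.1]                 # hidden biases
w2 = [0.6, -0.7]                 # hidden -> output weights
b2 = 0.05                        # output bias
lr = 0.5                         # learning rate

def forward(x):
    h = [sigmoid(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j]) for j in range(2)]
    o = sigmoid(w2[0] * h[0] + w2[1] * h[1] + b2)
    return h, o

def loss():
    return sum((forward(x)[1] - y) ** 2 for x, y in data)

initial_loss = loss()
for _ in range(2000):  # each pass adapts the weights a little
    for x, y in data:
        h, o = forward(x)
        delta_o = (o - y) * o * (1 - o)          # output-layer error signal
        for j in range(2):
            # Propagate the error backwards through hidden unit j.
            delta_h = delta_o * w2[j] * h[j] * (1 - h[j])
            w2[j] -= lr * delta_o * h[j]
            w1[j][0] -= lr * delta_h * x[0]
            w1[j][1] -= lr * delta_h * x[1]
            b1[j] -= lr * delta_h
        b2 -= lr * delta_o

print(initial_loss, loss())  # the loss shrinks as training progresses
```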

Figure 10 shows a symbolic representation of a deep neural network that is trained to recognize cars in images. The input layer represents the raw input pixels. Hidden layer 1 usually mimics the presence or absence of edges in certain locations and orientations of the image. The second hidden layer models object parts using the edges calculated in the previous layer. The third hidden layer builds an abstract representation of the modeled objects, which, in our case, is the way a car appears in an image. The output layer calculates the probability that a given image contains a car, based on the high-level features of the third hidden layer.

Different network architectures result from the way that the units and layers of a neural network are distributed. The so-called perceptron is the simplest, consisting of a single output neuron. A large number of neural network flavors can be obtained by building on the perceptron, each better suited to certain applications than others. Figure 11 shows three of the most common neural network architectures out of the many that have been created in recent years.

The deep feed-forward neural network (Figure 11a) is a structure in which the neurons between two neighboring layers are fully interconnected and the information flows in one direction only, from the input to the output of the system. These networks are useful as general-purpose classifiers and serve as the basis for all other types of deep neural systems.

Figure 11. Deep neural network architectures (Source: www.asimovinstitute.org)

The deep convolutional neural network (Figure 11b) changed the way that visual perception methods are developed. Such networks are composed of alternate convolutional and pooling layers that learn object features automatically by generalization from the input data. These learned features are passed on to a fully interconnected feed-forward network for classification. This type of convolutional network is the basis of the car detection architecture shown in Figure 10 and the use cases described in part 1.

While deep convolutional networks are crucial to visual recognition, deep recurrent neural networks (Figure 11c) are essential for natural language processing. The information in such architectures is time-dependent due to the self-recursive connections between the neurons in the hidden layers. The output of the network can therefore vary depending on the order in which data is fed into the network: feeding in the word "cat" before the word "mouse" produces one output, and reversing the order may produce a different one.

# Types of machine learning algorithms

Although deep neural networks are among the most often used solutions in complex machine learning challenges, there are various other types of machine learning algorithms available. Table 1 classifies them according to their nature (continuous or discrete) and training type (supervised or unsupervised).

Table 1. Types of machine learning algorithms

Machine learning estimators can be classified roughly according to their output value or training methodology. An algorithm is classed as a regression estimator if it estimates a continuous-valued function, `y ∈ R` (i.e., a continuous output). It is called a classifier when its output is a discrete variable, `y ∈ {0, 1, …, q}`. The traffic sign detection and recognition system described in part 1 is an implementation of this type of algorithm.

Anomaly detection is one special application of unsupervised learning. The goal here is to identify outliers or anomalies in the data set. The outliers are defined as feature vectors that have different properties compared to the feature vectors commonly encountered in the application. In other words, they occupy a different position in the feature space.
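A minimal sketch of this idea, with invented data points and an invented distance threshold, flags as outliers the points that lie unusually far from the mean of the data in the feature space:

```python
import math

# Unsupervised anomaly detection sketch: flag feature vectors that lie far
# from the bulk of the data. The points and the "twice the average distance"
# threshold are invented for illustration.
points = [[4.5, 60.0], [4.2, 50.0], [4.4, 55.0], [4.3, 52.0], [12.0, 90.0]]

n = len(points)
mean = [sum(p[i] for p in points) / n for i in range(2)]

def distance(p):
    """Euclidean distance from a point to the mean of the data set."""
    return math.sqrt(sum((p[i] - mean[i]) ** 2 for i in range(2)))

dists = [distance(p) for p in points]
avg_dist = sum(dists) / n

# Mark as an outlier any point much farther from the mean than average.
outliers = [p for p, d in zip(points, dists) if d > 2 * avg_dist]
print(outliers)  # → [[12.0, 90.0]]
```

Note that no labels were needed: the last point is detected purely because it occupies a different position in the feature space.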

Table 1 also lists some popular machine learning algorithms. These are briefly explained below.

• Linear regression is a regression method used to fit a line, a plane, or a hyperplane to a dataset. The fitted model is a linear function that can be used to make predictions on the real value function `y`.

• Logistic regression is the discrete counterpart of the linear regression method, in which the predicted real value given by the mapping function is converted to a probability that the input data point belongs to a certain class.

• Naïve Bayes classifiers are a set of machine learning methods built on the basis of Bayes' theorem, which makes the assumption that each feature is independent of the other features.

• Support vector machines (SVM) are designed to calculate the separation between classes using so-called margins. The margins are computed to be as wide as possible in order to separate the classes as clearly as possible.

• Ensemble methods, such as decision trees, random forests, or AdaBoost, combine a set of base classifiers, sometimes called “weak” learners, with the purpose of obtaining a “strong” classifier.

• Neural networks are machine learning algorithms in which the regression or classification problem is solved by a set of interconnected units called neurons. In essence, a neural network tries to mimic the function of the human brain.

• k-means clustering is a method used for grouping together features that have common properties, i.e., that are close to each other in the feature space. k-means iteratively groups the data into a given number of spherical clusters.

• Mean-shift is also a data clustering technique, which is more general and robust with respect to outliers. As opposed to k-means, mean-shift requires only one tuning parameter (the search window size) and does not assume a spherical prior shape for the data clusters.

• Principal components analysis (PCA) is a data dimensionality reduction technique that transforms a set of possibly correlated features into a set of linearly uncorrelated variables named principal components. The principal components are arranged in order of variance: the first component has the highest variance, the second the next highest, and so on.
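The clustering idea from the list above can be sketched with a minimal k-means implementation. The points, the choice of k = 2, and the deterministic initialisation are invented for illustration:

```python
# Minimal k-means sketch (k = 2, pure Python) on invented 2-D points.
points = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
          [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]]
centres = [points[0], points[3]]  # deterministic initialisation

def nearest(p):
    """Index of the cluster centre closest to point p."""
    d = [sum((p[i] - c[i]) ** 2 for i in range(2)) for c in centres]
    return d.index(min(d))

# Iteratively alternate between assigning points to their nearest centre
# and moving each centre to the mean of its assigned points.
for _ in range(10):
    groups = [[], []]
    for p in points:
        groups[nearest(p)].append(p)
    centres = [[sum(p[i] for p in g) / len(g) for i in range(2)]
               for g in groups]

labels = [nearest(p) for p in points]
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

On this toy data the algorithm converges after a single iteration; the two groups of nearby points end up in separate spherical clusters.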

Part three evaluates these machine learning algorithms in the context of functional safety requirements.
