1. Introduction
Predicting stock market movements has always been a subject of great interest for investors.
Predictability and profitability are closely linked, and thus enhancing predictive capability
creates an advantage that can be exploited by adjusting investment strategies while minimising
risk. Recent advances in the field of machine learning, with its ability to approximate
challenging tasks, continue to question the unbeatability of the market, and provide a range of
innovative solutions for improving the prediction of excess returns.
The analysis of stock prediction, nevertheless, can only be undertaken in conjunction with
time series analysis. A time series is a sequence of observations ordered in time. The nature
of the sequence depends on whether we are dealing with a continuous time series, i.e. one whose
values change continuously over a time interval, or a discrete time series, i.e. one that takes
values from a finite set. In machine learning terms, classification problems involve discrete
values, while regression problems involve a continuous variable.
The behaviour of a series can further be characterised by possible seasonal variations and by
stationary or non-stationary trends. In a stationary series, mean and variance do not change
over time, whereas a non-stationary series (in our case the stock market) has mean and variance
that are not constant. More broadly, a process can be regarded as non-constant in the presence
of trends, which are essential when dealing with business, industrial or general economic
phenomena.
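To make this distinction concrete, the following sketch (a minimal illustration on simulated data, not drawn from the literature reviewed here) compares a stationary white-noise series with a non-stationary random walk by computing their means and variances over successive windows:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Stationary series: white noise with constant mean and variance
white_noise = rng.normal(loc=0.0, scale=1.0, size=n)

# Non-stationary series: a random walk (cumulative sum of the noise),
# whose mean and variance drift over time
random_walk = np.cumsum(white_noise)

def window_stats(x, window=200):
    """Mean and variance over non-overlapping windows."""
    blocks = x[: len(x) // window * window].reshape(-1, window)
    return blocks.mean(axis=1), blocks.var(axis=1)

for name, series in [("white noise", white_noise), ("random walk", random_walk)]:
    means, variances = window_stats(series)
    print(f"{name}: window means {np.round(means, 2)}, "
          f"window variances {np.round(variances, 2)}")
```

The white-noise statistics remain stable across windows, while the random walk's drift away, which is the practical signature of non-stationarity described above.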
Moreover, it is crucial to take into account all the "noisy processes", i.e. unexpected news
arising from events such as technological innovations, political changes or natural disasters,
which occur from one moment to the next and alter the potential performance of a stock. For
this very reason, one of the objectives of this work is to understand how machine learning
systems can deal with a series of predictable and unpredictable events, and to what extent they
are able to explain these market movements in advance.
The paper will therefore review the state of the art of the techniques employed in the field of
machine learning, the predictive performance empirically demonstrated in the recent literature,
and future applications.
To explain more precisely what machine learning is, it can be defined as the ability of a
system to learn through experience. A series of data is fed into the system, from which the
algorithm must map behaviours that can later be applied to a new series of "unseen data".
Predictive ability is then evaluated by comparing the values returned by the algorithm with the
actual values expected by the researcher: the smaller the prediction error, the better the
approximation and the reliability of the system. The term machine learning used in this thesis
refers to regression algorithms, and covers a series of predictive statistical models which,
combined with "regularization" techniques, are useful for reducing overfitting and for
extracting, from a complex set of data, patterns and insights useful to the researcher. Their
main characteristic, which distinguishes them from classical econometric methods, is their
flexibility and their ability to handle large amounts of data in order to obtain a useful
description that would be impossible to achieve by the human eye.
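As a concrete illustration of this train-then-evaluate workflow, the sketch below (a minimal example using scikit-learn on simulated data, not real market data) fits a penalized linear regression on a training set and measures its prediction error on held-out, unseen data:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Simulated data: 500 observations, 20 noisy predictors, linear signal
X = rng.normal(size=(500, 20))
true_coefs = rng.normal(size=20)
y = X @ true_coefs + rng.normal(scale=2.0, size=500)

# Hold out "unseen data" for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Penalized (L2-regularized) linear model to limit overfitting
model = Ridge(alpha=1.0)
model.fit(X_train, y_train)

# The smaller the out-of-sample error, the more reliable the system
mse = mean_squared_error(y_test, model.predict(X_test))
print(f"Out-of-sample MSE: {mse:.3f}")
```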
The asset risk premium is in practice a measure of the expected future excess return, and is
therefore well suited to investigation by machine learning tools. As will be shown in the
following chapters, the variables to be handled in an asset return problem number in the
hundreds, and they are often highly correlated and overlapping. The stock return is composed of
numerous unpredictable components, so even the best forecasting method will explain only a
small part of stock return behaviour. The analytical effort is therefore immense, and difficult
even for classical linear statistical methods, given the non-linearity of the data. Machine
learning systems, by contrast, using various optimization techniques as well as reducing the
size and complexity of the data, are well suited to solving this challenge.
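One common way to tame hundreds of correlated, overlapping predictors is dimensionality reduction; the sketch below (an illustrative example on simulated data, not a method prescribed by the works cited here) uses principal component analysis to compress a correlated feature set into a few components:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)

# 100 predictors driven by only 5 underlying factors, so the
# observed columns are highly correlated with one another
factors = rng.normal(size=(1000, 5))
loadings = rng.normal(size=(5, 100))
X = factors @ loadings + 0.1 * rng.normal(size=(1000, 100))

# PCA recovers a low-dimensional description of the predictor set
pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X)

# The first few components capture nearly all the variation
print("Explained variance by component:",
      np.round(pca.explained_variance_ratio_[:5], 3))
```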
On the other hand, the improvements and greater precision of non-linear machine learning
systems have a drawback. Although they are better at discovering the complex relationships that
can exist between variables, being non-parametric statistical models they have the disadvantage
of not being able to provide interpretations of the nature of the results obtained from the
model. Since neural networks cannot explain the nature of the relationship, they also encounter
limitations and problems during the model preparation and selection phase, when it is decisive
to settle on a series of optimizations and parameters to be applied.
Precisely for this reason, as will be shown later, implementing a machine learning system does
not stop at choosing an arbitrary method. The possible models, besides being numerous, each
have different specifications to choose from. Apart from deciding which machine learning
system to use, between linear methods such as Ordinary Least Squares, Generalized Linear
Models and Penalized Linear Models, and non-linear methods such as Decision Trees, Random
Forests, Support Vector Machines and Neural Networks, it is fundamental to optimize the
individual parameters, the depth of the network, the mathematical functions through which the
calculations will be carried out, and the methods for minimizing calculation errors.
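The sketch below illustrates what this specification choice looks like in practice: a small cross-validated grid search over two hyperparameters of a random forest (a minimal example on simulated data; the models and grids used in the literature reviewed later are far larger):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 10))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.5, size=400)

# Each combination of tree depth and forest size is a different
# model specification; cross-validation picks among them
param_grid = {
    "max_depth": [2, 4, 8],
    "n_estimators": [50, 200],
}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)

print("Best specification:", search.best_params_)
print(f"Best cross-validated MSE: {-search.best_score_:.3f}")
```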
For this very reason, the literature has not reached a consensus over time on the actual power
of these systems: since there are hundreds, perhaps thousands, of parameters to choose from, it
sometimes becomes complicated to compare different specifications.
Over the years, most empirical research on asset pricing has focused on two main issues:
understanding the differences in expected returns between assets, with much research on
"Empirical Cross-Sectional Asset Pricing" by Goyal (2011), Nagel (2012) and Chordia (2015),
and measuring the aggregate market risk premium.
Since the objective of asset pricing is to explain the behaviour of risk premia, even in the
presence of perfectly observed returns it would still be necessary to explain the reasons for
which those returns occurred. This thesis, therefore, focuses on the methods used to increase
the predictability of future excess returns, leaving to future research the analysis of the
relationships between the variables that would serve to explain the risk premia in their
entirety.
This analysis will be carried out in the following sequence. In the first part, the machine
learning techniques will be described in detail, with their various facets, their properties,
and their usefulness in describing the phenomenon under study. Greatest importance will be
given to the numerous types of neural networks, considered the most suitable and
best-performing machine learning systems for describing these phenomena, but alternative
methods such as support vector machines and decision trees will also be examined.
In the second part, instead, ample space will be given to the most relevant literature to date.
In particular, the work carried out in June 2018 by Gu, Kelly and Xiu, from the Universities of
Chicago and Yale, entitled "Empirical Asset Pricing via Machine Learning" (SSRN Electronic
Journal, June 2018, doi:10.2139/ssrn.3159577), will be examined in depth. This work has shown
with particular precision the predictive validity of non-linear machine learning models, and of
Neural Networks in particular. Their work has not stopped at showing the validity of these
systems, but has also investigated in depth the strength of the predictive variables, building
on earlier works by Goyal & Welch (2008) and Green et al. (2013). They find that recent price
trends are the most important variables (through short-term reversal, stock momentum, momentum
change and industry momentum), followed by liquidity variables (turnover, dollar volume,
bid-ask spread), risk measures (beta, beta squared and return volatility), and finally
valuation ratios and fundamental signals, such as earnings to price, sales to price and asset
growth.
The results achieved in terms of predictive performance are then used by Gu, Kelly and Xiu to
create portfolios optimized to obtain extra returns relative to those of a buy & hold investor,
demonstrating that, with the use of neural networks, it is possible to obtain an extra yield
(measured in terms of the Sharpe ratio) with respect to the market.
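For reference, the Sharpe ratio used in such comparisons is the ratio of mean excess return to return volatility; the sketch below (an illustrative computation on simulated monthly returns, not the paper's actual data) shows an annualized version:

```python
import numpy as np

def annualized_sharpe(excess_returns, periods_per_year=12):
    """Mean excess return over its standard deviation, annualized."""
    mean = np.mean(excess_returns)
    std = np.std(excess_returns, ddof=1)
    return mean / std * np.sqrt(periods_per_year)

rng = np.random.default_rng(3)

# Simulated monthly excess returns for a market benchmark and a
# hypothetical machine-learning-driven portfolio (20 years of data)
market = rng.normal(loc=0.005, scale=0.04, size=240)
ml_portfolio = rng.normal(loc=0.012, scale=0.05, size=240)

print(f"Market Sharpe:       {annualized_sharpe(market):.2f}")
print(f"ML portfolio Sharpe: {annualized_sharpe(ml_portfolio):.2f}")
```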
Finally, the last part will address the future implications and improvements that can be
achieved in the field of machine learning, and how they could be exploited by the literature to
better explain the mechanisms that lead one performance to differ from another.
2. Artificial Neural Networks
Machine learning systems can be split into two different techniques of analysis: supervised
learning and unsupervised learning (Brownlee, J., "A Tour of Machine Learning Algorithms",
Machine Learning Mastery, 2017). In supervised learning, the training data are a series of
labelled examples, where each example is a collection of features paired with the correct
output for that feature set; both features and outputs are provided (the training data), and
the algorithm then applies what it has learnt from the analysis of the training data to another
dataset called the test data. Unsupervised learning, on the other hand, consists of
observations where the feature set is unlabelled, and the algorithm tries to assign the data to
distinct groups.
Supervised learning can be further split into two different methods of analysis: regression
and classification. Regression is typically the field of Artificial Neural Networks, whilst in
classification the most important algorithm is the Support Vector Machine (SVM).
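This contrast can be made concrete with a small sketch (an illustrative example using scikit-learn on toy data): a supervised classifier learns from labelled examples, while an unsupervised algorithm groups unlabelled points on its own:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)

# Two clouds of points; the labels mark which cloud each point is from
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Supervised: features AND labels are provided during training
clf = LogisticRegression().fit(X, y)
print("Supervised prediction for (4, 4):", clf.predict([[4.0, 4.0]]))

# Unsupervised: only the unlabelled features are provided, and the
# algorithm tries to assign the data to distinct groups
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster assignments (first 5 points):", km.labels_[:5])
```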
Artificial Neural Networks (ANN) «are computing systems made up of a number of simple,
highly interconnected processing elements, which process information by their dynamic state
response to external inputs» (Caudill, Maureen, "Neural networks primer, part I", AI Expert
2.12, 1987). They are inspired by the information processing model of the human brain, and
modelled after the neuronal structure of the mammalian cortex, but on a much smaller scale. At
the biological level, each neuron receives input through its dendrites; the input is processed
in the cell body and, through the axon terminals, results in an output (a behaviour). Neurons
ultimately emit electrical signals, and the measure of their activity is given by the frequency
with which these signals pass from the cell body to the axon terminals (synapses). The same
structure is the basis of the imitation performed by machine learning. Each biological neuron
is connected with thousands of other neurons, and through the synapses the signals propagate
across the network of neurons. A large Artificial Neural Network may have thousands of
processing units, whereas the human brain has approximately 86 billion neurons, connected by
synapses.
Figure 1: Visualization of a human biological neuron. (Image source: "A Gentle Introduction to
Neural Networks Series - Part 1." Towards Data Science, 4 Aug. 2017,
towardsdatascience.com/a-gentle-introduction-to-neural-networks-series-part-1-2b90b87795bc.
Accessed 22 Aug. 2018.)
ANN can be used for recognition, classification, clustering, association, regression and
optimization but, most importantly for our analysis, for prediction. The smallest computational
unit of an ANN is the neuron (node, unit): each of these interconnected processing elements
receives input from other nodes, processes the information, and finally produces one or more
outputs. The main characteristic of these systems is that they are designed to be trained
through a learning process, thereby acquiring a high predictive power.
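A single artificial neuron can be written down in a few lines: a weighted sum of its inputs plus a bias, passed through an activation function (a minimal sketch using a sigmoid activation; the weights here are arbitrary illustrative values, not trained ones):

```python
import numpy as np

def sigmoid(z):
    """Squashes the weighted sum into the (0, 1) range."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights, bias):
    """One processing element: weighted sum of inputs, then activation."""
    return sigmoid(np.dot(inputs, weights) + bias)

# Three input signals arriving from other nodes ("dendrites")
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.4, 0.1, -0.6])   # connection strengths ("synapses")
b = 0.2

print(f"Neuron output: {neuron(x, w, b):.4f}")
```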
The main distinction between neural networks is the direction of information flow: Feedforward
Neural Networks and Recurrent Neural Networks.
2.1. The Architecture of Neural Networks
The main task in the machine learning field is to find a predictor Ŷ(X) of an output Y given
one or more inputs X. Machine learning is thus defined as an input-output mapping Y = F(X),
with an input space that is high-dimensional, X = (X₁, …, X_P), and the predictor denoted by
Ŷ(X).
Depending on the problem of the analysis (regression, classification, mixed), the output can
be continuous or discrete. The output of the machine learning model is obtained by passing
learned attributes of the data through different layers. In the case of deep learning, the
input data enter the model as the first operation, and the final output is then produced
through multiple levels of computation, each of which transforms the data (Heaton, J. B.,
N. G. Polson, and J. H. Witte, "Deep learning in finance", 2016, pp. 1-4).
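This layered transformation can be sketched as a composition of functions, Ŷ(X) = f_L(… f_2(f_1(X)) …); below is a minimal feedforward pass in NumPy with random, untrained weights, purely to illustrate how data flow through the layers (the layer sizes are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)

def layer(x, W, b):
    """One level of computation: affine transform plus tanh activation."""
    return np.tanh(W @ x + b)

# Input X with P = 8 features, passed through two hidden layers
# to a single continuous output (random weights, untrained)
x = rng.normal(size=8)
W1, b1 = rng.normal(size=(16, 8)), np.zeros(16)
W2, b2 = rng.normal(size=(4, 16)), np.zeros(4)
W3, b3 = rng.normal(size=(1, 4)), np.zeros(1)

h1 = layer(x, W1, b1)        # first transformation of the data
h2 = layer(h1, W2, b2)       # second transformation
y_hat = W3 @ h2 + b3         # final output, left linear for regression

print(f"Predictor output: {y_hat[0]:.4f}")
```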