Matteo Fasano
Generalized thermodynamic description of complex biological systems
INTRODUCTION
Computational Systems Biology is evolving rapidly, and no single group of investigators has yet
developed a complete system that integrates data generation and data analysis in a way that
allows full and accurate modeling of selected biological agents.
Each new method or database implemented represents one or more steps on the path to a complete
description of biological systems. How these tools will evolve and how they will be ultimately
integrated is an area of intense research and interest.
In this thesis, in particular, a thermodynamic approach to the description of complex biological
systems is attempted. Non-equilibrium thermodynamics and its application to biotechnology can
be considered an innovative modeling framework for a relatively recent scientific field: Systems Biology.
Systems Biology can be defined as the quantitative study of biological systems, supported by
technological progress: in other words, the data-centric quantitative modeling of biological processes
and systems.
Systems Biology is related to three main aspects: it is experimentally driven, computationally driven,
and knowledge driven. It is experimentally driven because the complexity of biological systems is
difficult to penetrate without large-scale coverage of the molecular underpinnings. It is
computationally driven because the data obtained from experimental investigations of complex
systems need extensive quantitative analysis to be informative. Finally, it is knowledge driven
because it is not computationally feasible to analyze the data without incorporating all that is already
known about the Biology in question. Furthermore, the use of data, computation and knowledge
must be concurrent.
Researchers have traditionally considered the study of biological systems rather resistant to
quantitative approaches. Two events have occurred to bring the field of computational Systems
Biology to the forefront. One is the advent of high-throughput methods that have generated large
amounts of information about particular systems in the form of genetic studies, gene and protein
expression analyses and metabolomics. The other event is the growth of computational processing
power and tools. Methods used to analyze such large data sets are often computationally
demanding and, as in other areas, the field has benefited from continuing improvements
in computational hardware and methods.
For the purposes of this thesis, Systems Biology represents the promise of analyzing Biology on a larger
and more quantitatively rigorous scale, thanks to a cross-fertilization of knowledge. In fact, this research is
centered on the advantage that biological systems modeling can draw from decades of systematic
model reduction research in non-equilibrium thermodynamics.
In this thesis, the mathematical notion of slow invariant manifold (SIM) and its convenient
approximation (the Quasi Equilibrium Manifold, QEM) have been exploited in order to study a series
of complex biological systems.
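As a minimal sketch of this idea (with generic symbols, not tied to any specific system treated in the thesis), a quasi-equilibrium manifold can be written as a constrained entropy maximization:

\[
  \mathbf{c}^{\mathrm{qe}}(\boldsymbol{\xi}) \;=\; \arg\max_{\mathbf{c}} \left\{ S(\mathbf{c}) \;:\; \mathbf{M}\,\mathbf{c} = \boldsymbol{\xi} \right\},
\]

where S is the entropy (or minus a suitable Lyapunov function) of the detailed kinetic description, M is the matrix selecting the slow macroscopic variables, and ξ denotes their prescribed values, with the conservation laws of the system included among the constraints. Letting ξ vary spans the low-dimensional manifold on which the reduced dynamics is defined.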
During the last decades, several promising methods for reducing the description of systems with a
large number of degrees of freedom have been developed in the context of physical and chemical
kinetics. For instance, an intensive effort has been spent in devising such techniques for combustion
mechanisms, where agents are represented by chemical species linked through highly nonlinear
interactions and where issues similar to those of biological systems have been encountered (i.e. a tremendously
large number of agents evolves in time with disparate time-scales [see, e.g., D. Goussis et al., in
Turbulent Combustion Modeling, ed. T. Echekki and E. Mastorakos, Springer, 2011]).
Some modern and systematic approaches to complexity reduction often exploit a sophisticated
concept of time-scale separation, and are implemented by seeking a low-dimensional manifold
(the slow invariant manifold) in phase space [Gorban and Karlin, Invariant Manifolds for Physical
and Chemical Kinetics, Springer, 2005]. The SIM de facto establishes a link between the "micro-world" of a
detailed (but often too complicated) description and a handier "macro-world".
Inspired by the above model reduction techniques devised for chemical kinetics, here we will work
out a new paradigm for handling complex bio-chemical networks.
More importantly, unlike traditional approaches, here we aim at reversing the above process. Namely,
we intend to devise a procedure suited to directly linking the geometric features of the SIM to the interactions
among agents. To the best of our knowledge, this approach has never been attempted before, though
such a tool is particularly desirable for investigations of biological systems, where many interactions
are often unknown. In fact, such a reversed mapping from the macroscopic description to the micro-
world of a complex phenomenon is what is needed to enhance our understanding of complex
biological systems at a fundamental level, with the support of typical experiments conducted by
varying only a few (but dominant) variables.
The thesis preparation has involved Dr. Eliodoro Chiavazzo and Prof. Pietro Asinari (Dept. of
Energetics at the Politecnico di Torino), whose main expertise is model reduction based on the SIM,
and it takes advantage of an ongoing collaboration with Dr. Paolo Provero, a biology researcher at the
Molecular Biotechnology Center in Turin.
This thesis is divided into four chapters.
In the opening chapter the model construction process will be discussed. After presenting the
fundamental aspects of the modeling process, the analysis will focus on the issue of model
classification. Lastly, a systematic approach to the model construction process will be proposed, in
order to address the subsequent problems through a rational procedure.
The second chapter provides a theoretical preface dealing with the issues that will afterwards be
treated in practice. In particular, three main disciplines will be touched upon throughout the work:
Physics, Mathematics and Biology.
Physics is the theoretical basis of the quasi-equilibrium models. Concepts such as the thermodynamic
approach, the slow invariant manifold and the quasi-equilibrium manifold will be discussed extensively, for a
complete understanding of the dynamic models based on the quasi-equilibrium approximation. Quasi-
equilibrium dynamic models are based on knowledge of the steady states of the analyzed
system; therefore a detailed analysis of the characteristics of the equilibrium states, in terms of the
conserved quantities of the system, will be carried out. Moreover, principal component analysis,
which helps to determine the number of conserved quantities in a system, and the direct or
Monte Carlo-like analysis, used for the exact identification of the conservation laws, will be explained.
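A minimal, hypothetical sketch of these two steps is given below (in Python for illustration, whereas the thesis relies on Matlab tools; the toy reaction A <-> B, the rate constants and the threshold are invented for the example): the number of conservation laws is estimated by counting the significant principal directions of a centered cloud of steady states, while the conservation laws themselves are identified from the left null space of the stoichiometric matrix.

    import numpy as np
    from scipy.linalg import null_space

    rng = np.random.default_rng(0)
    k1, k2 = 2.0, 1.0                           # toy reversible reaction A <-> B

    # Cloud of steady states reached from random initial conditions:
    # at steady state A* = k2/(k1+k2)*(A0+B0) and B* = k1/(k1+k2)*(A0+B0).
    totals = rng.uniform(1.0, 10.0, size=200)   # random values of A0 + B0
    states = np.column_stack((k2 * totals, k1 * totals)) / (k1 + k2)

    # PCA-like step: singular values of the centered cloud count its
    # independent directions, i.e. the number of conservation laws.
    sv = np.linalg.svd(states - states.mean(axis=0), compute_uv=False)
    print("number of conservation laws:", int(np.sum(sv > 1e-8 * sv[0])))   # 1

    # Direct identification: conservation laws span the left null space of
    # the stoichiometric matrix (a single reaction A -> B, column [-1, 1]).
    S = np.array([[-1.0], [1.0]])
    print("conservation law:", null_space(S.T).ravel())   # ~ [0.707, 0.707], i.e. A + B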
Mathematics provides the tools to optimize and “tune” the models. In detail, the mathematical
tools explained will be: two optimization algorithms (a genetic algorithm and constrained nonlinear
optimization), which will be used to fit the models to the studied system, and the constrained
Jacobian matrix, which plays a fundamental role in the network construction process.
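As an illustrative sketch of the fitting step (written with SciPy's general-purpose routines standing in for the Matlab tools used in the thesis; the exponential test model and all numerical values are invented for the example), a global evolutionary search followed by constrained nonlinear optimization might look as follows:

    import numpy as np
    from scipy.optimize import differential_evolution, minimize

    t = np.linspace(0.0, 5.0, 25)
    data = 2.0 * np.exp(-0.8 * t)                 # synthetic "measurements"

    def misfit(p):                                # p = (amplitude, rate)
        return np.sum((p[0] * np.exp(-p[1] * t) - data) ** 2)

    bounds = [(0.0, 10.0), (0.0, 5.0)]            # admissible parameter ranges

    # Evolutionary global search (playing the role of the genetic algorithm) ...
    p_coarse = differential_evolution(misfit, bounds, seed=0).x
    # ... refined by constrained (bounded) nonlinear optimization.
    p_fit = minimize(misfit, p_coarse, bounds=bounds).x
    print(p_fit)                                  # close to (2.0, 0.8)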
Finally, a close examination of the Biology of the studied systems will make it possible to
identify the utility and the potential fields of application of the introduced models.
In the third chapter, the quasi-equilibrium model will be tested through several well-known
biological systems (already modeled thanks to a detailed kinetic approach), following an increasing
order of complexity.
In particular, the model will be used for predicting the dynamics of a Michaelis-Menten enzymatic
network, a simplified gene regulation system, the MAPK cascade process and the Calvin cycle. The
best algorithms and Matlab functions for our tasks will be identified, in order to obtain an optimized
QE model before its application to an experimental case. Moreover, a comparison between one-
dimensional manifold models and two-dimensional ones will be attempted. Lastly, constrained
Jacobian matrices will be utilized for the reaction network construction, and the space of equilibrium
states of the above systems (and additionally IκB metabolism and purine metabolism) will be
explored, in order to obtain the number of conserved quantities for the analyzed systems.
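For context, the detailed kinetic description of the simplest of these test cases, the Michaelis-Menten mechanism E + S <-> ES -> E + P, amounts to a small ODE system such as the following sketch (in Python; rate constants and initial conditions are arbitrary illustrative values, not those used in the thesis):

    import numpy as np
    from scipy.integrate import solve_ivp

    k1, km1, k2 = 1.0, 0.5, 0.3                  # illustrative rate constants

    def mm_rhs(t, y):
        E, S, ES, P = y
        v1 = k1 * E * S - km1 * ES               # binding/unbinding: E + S <-> ES
        v2 = k2 * ES                             # catalysis: ES -> E + P
        return [-v1 + v2, -v1, v1 - v2, v2]      # dE/dt, dS/dt, dES/dt, dP/dt

    # Conserved quantities: total enzyme E + ES and total material S + ES + P;
    # a QE reduction exploits the fast equilibration of the binding step.
    sol = solve_ivp(mm_rhs, (0.0, 50.0), [1.0, 10.0, 0.0, 0.0], max_step=0.1)
    print(sol.y[:, -1])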
In the last chapter the QE model will be applied to some experimental data obtained thanks to a
collaboration with the MBC (Molecular Biotechnology Center) in Turin: the transcriptional
regulatory networks in embryonic stem cells will be studied. More precisely, thanks to the principal
component analysis of a cloud of equilibrium states, it will be possible to deduce the number of
conservation laws, while a direct or Monte Carlo-like analysis will allow their precise identification.
Then, species dynamics will be fitted using QE models, and the Jacobian-based network construction
process will be attempted.
Finally, a comparison between the results obtained by the QE models and a model proposed in the
literature [Schmidt, Lipson, Distilling Free-Form Natural Laws from Experimental Data, Science,
Vol. 324, 2009] will be conducted in Appendix C.
1. MODELING
In this opening Chapter, we discuss the model construction process. After presenting the fundamental
aspects of the modeling process, we focus on the issue of model classification. Lastly, a systematic
approach to the model construction process will be proposed, in order to address the subsequent problems
through a rational procedure.
1.1. MODELING
Starting from the dictionary definition, a model is anything used in any way to represent
anything else. Models are used to help people know and understand the subject matter they
represent.
As the British statistician George E.P. Box said: “All models are wrong but some models are useful”.
Models cannot be perfect, but they are surely the only way to better understand the complexity of
reality.
In scientific and technical fields, a model is a representation of an object or a phenomenon that
reproduces some of its fundamental features or behaviors. In addition to these characteristics, a
model has to be created on the basis of precise experimental evidence, and it also has to be formulated
through a clear and verifiable method.
In fact, another important feature is the possibility of studying and explaining the model anywhere in the
world, even if the modeled object is not available.
In some cases, the constitution of a scientific or technical model stems from a conceptual or
theoretical construction. Generally speaking, a model has to be the result of a rigorous process of
experimentation.
In particular, validation tests of the model have to be designed so as not to be influenced by the
expectations or the subjective interpretation of the observer.
Having highlighted the main features that a model should have, a useful guideline for
model construction will now be presented.
Occam's razor is a principle that generally recommends, when faced with competing hypotheses
equal in other respects, selecting the hypothesis making as few new assumptions as possible. It is
often expressed in Latin as lex parsimoniae, translated as the law of parsimony, the law of economy or the law
of succinctness.
In science, Occam’s razor is usually used as a heuristic to guide scientists in the development of
theoretical models [1]. The principles of this theory, reworked for the model construction process,
are:
“Entia non sunt multiplicanda praeter necessitatem.”
Do not multiply the elements (of a model) more than necessary.
“Pluralitas non est ponenda sine necessitate.”
Do not consider the plurality if it is not necessary.
“Frustra fit per plura quod fieri potest per pauciora.”
It is useless to “do with more” what it is possible to “do with less”.
Hence, the razor is a principle suggesting that it is better to tend towards simpler theories until some
simplicity can be traded for increased explanatory power. On the other hand, contrary to
the previous rationalization, the simplest available theory is sometimes a less accurate explanation.
Moreover, philosophers add that the exact meaning of "simplest" is itself open to interpretation
[2].
In this context, Einstein himself expressed a certain caution when he formulated Einstein's
Constraint: "Everything should be kept as simple as possible, but no simpler". In fact, science has
shown repeatedly that future data often support more complex theories than existing data. In this
sense, according to Occam’s razor, science tends to prefer the simplest explanation that is consistent
with the data available at a given time. In any case, the general scientific principle is that theories (or
models) of natural laws must be consistent with repeatable experimental observations. This ultimate
arbiter (selection criterion) rests upon the axioms mentioned above [3].
To sum up, the correct approach to a new model consists of a balance between excessive complexity
(implying a more difficult elaboration but higher accuracy) and exaggerated simplicity (implying a
faster elaboration but lower accuracy), according to experimental observations.
This characteristic gives rise to a sort of intrinsic contradiction in the use of models, which is usually
called the modeling paradox.
In fact, a model differs from reality by its very nature.
Einstein underlined the need to maintain a distance between mathematical models and modeled
reality: "As far as the laws of mathematics refer to reality, they are not certain; and as far as they are
certain, they do not refer to reality".
The modeling paradox has a fundamental implication, concerning the truth content of the
models themselves. The fact that a model is similar to reality only in some respects, while
differing from it in infinitely many others, leads to two important considerations.
The first is the possibility of modeling the same phenomenon in infinitely many ways, according to the
properties and relations considered relevant from time to time.
The second consideration is much harder to accept: a model cannot be evaluated according to a
truth criterion, namely it is impossible to find a perfect adherence between a model and the modeled
reality.
These two statements raise the problem of evaluating the adequacy and reliability of a model. The
design of a series of experiments is the only sound way to verify the correctness of the initial
hypotheses and simplifications. The model is confirmed only if its predictions are matched by
experimental results, within a certain statistical tolerance; otherwise, it will be necessary to revise
the initial hypotheses.
Among all the predictive models in the scientific field, mathematical models are surely the most
significant (besides being the kind of model that will be used in the following Chapters), and it
is important to focus particular attention on them.
Mathematical models abstractly represent reality through a set of equations that links the physical
quantities involved in the problem.
The economist Malinvaud [4] provides the following definition: “A mathematical model is the formal
representation of ideas or knowledge about a phenomenon”.
This definition involves the three main features of a mathematical model, more precisely:
a mathematical model is a representation of a phenomenon where the logic of the process
is analyzed and described in an analytical way;
a mathematical model represents a phenomenon not in a conversational and qualitative
way, but through a mathematical language which makes it possible to capture the quantitative aspects;
a direct link between reality and mathematics is not obvious. Before the mathematical modeling, it is
always necessary to structure the ideas and knowledge concerning the phenomenon,
retaining as relevant for mathematization only those that are really significant for the model
construction process.
Indeed, the difficulty of creating a direct and unambiguous link between mathematics and reality is probably the
trickiest aspect of mathematical model construction, for many reasons.
First, reality is made of an inextricable and complex tangle of phenomena, which hinders a
relatively simple and schematic description such as the mathematical one. A mathematical model
perfectly adherent to reality would not only be too complicated, but also unnecessary, which is
exactly the opposite of the required formal representation. Hence it is necessary, relying heavily on the
model creator’s experience and insight, to choose which aspects are fundamental to the description
of a phenomenon and which are not. In fact, the first step in the construction of a
mathematical model is to choose the phenomenon to describe and isolate it from all the others.
During this process, the importance and relevance of each phenomenon has to be correctly
evaluated, in order to avoid confusing primary aspects with secondary ones, and vice versa. The
subsequent construction of the mathematical model will be dramatically influenced by this preliminary
step because, for example, the voluntary or involuntary omission of important aspects of the
modeled phenomenon (e.g. an undetected chemical reaction, a biological species assumed not to
influence the analyzed system, neglected environmental conditions and so on) could lead to
wrong forecasts by the model itself.
Secondly, the previous isolation process is not sufficient: in fact, the chosen phenomena do not
automatically contain the mathematical laws themselves. It is necessary to rely on experimental
observation of the analyzed phenomena, in order to gain insight into the mathematical laws
that should govern them. Possible problems associated with experiments are that they
might be too expensive or too dangerous, or that the system needed for the experiment might not yet
exist.
For example, these situations occur when the time scale of the system dynamics is not compatible with
that of the experimenter (e.g. it takes millions of years to observe small changes in the
development of the universe).
Then, thanks to this mix between theoretical hypotheses and experimental validations, it is finally
possible to represent phenomena by equations and formulas, obtaining the mathematical model.
However, the definition of the analytical and mathematical structure of a model is not the last
analysis step. Particularly for engineering problems, it is often relatively easy to model a physical
problem mathematically: this leads to exact solutions without the use of computers only in the case of
simple systems. For many real problems, instead, the analyzed systems are just too complex to be solved
with mathematics alone. Each small part can be solved simply, but a computer is needed to tie it all
together with a computational solution. In other words, in many cases the mathematical model of a
phenomenon can be very simple and adherent to reality. On the contrary, if its application to
complex cases involves an insurmountable computational burden or an unacceptable time to solution, the
approach to the problem (and the mathematical model) has to be changed completely, even if this
entails less adherence to reality.
1.2. MODEL CLASSIFICATION
After the initial global description of the modeling issue, in this section we address the problem of
model classification. It is possible to identify three classification criteria: the field of application, the
means of representation and the aim of modeling.
First, models can be classified depending on the related field. This is surely the simplest but least
significant way of classification; for the record, the main areas of applicability are (in
chronological order of appearance):
Mathematics (e.g. from the Egyptian numeral system to the theorem of Pythagoras, from the Fibonacci
series to Nash game theory);
Philosophy (e.g. from the atomism of Democritus to the rational methods of Descartes, from
Enlightenment rationalism to Wittgenstein's analytic philosophy);
Mathematical logic (e.g. from Aristotle's syllogisms to Bacon's induction, from Boolean logic
to Gödel's incompleteness theorems);
Physics (e.g. from Kepler's astronomical laws to Newton's laws of motion, from Einstein's
relativity to quasi-equilibrium thermodynamics);
Chemistry (e.g. from Boyle's perfect gas law to Lavoisier's mass conservation law, from
Mendeleev and Meyer's table of elements to Fermi's studies on nuclear chemistry);
Politics (e.g. various models of the organization of society, from Machiavelli to Hegel, from Weber
to Bobbio);
Biology (e.g. from the Linnaean classification to Darwin's theory of evolution, from Watson and Crick's
DNA structure to modern biotechnological models);
Economics (e.g. from Smith's liberal model to Marx's communist model, from Pareto's law to Keynes's
theories);
Social sciences (e.g. studies and modeling of human behaviors and interactions, from
Freud's psychology to Chomsky's linguistics, from Rousseau's sociology to Montessori's pedagogy);
Software engineering (e.g. from the Turing test to Feynman's quantum computer, from Google's
algorithms even to the algorithms implemented in the following Chapters).
However, the list only covers the most important fields of application: models are everywhere
and they are part of everybody’s life (e.g. a soccer scheme, the recipe for a cheesecake, the weather
forecast and so on). This explains the importance of a good knowledge of, and familiarity with, the
process of model construction and analysis.
The second way of model classification, which will now be treated, is more significant. Models, in fact, can
be classified according to the means of representation.
Models always originate in the mind of somebody who is trying to model a phenomenon or a
thought, but they can take different shapes. More precisely:
Mental models, which remain in the mind of the designer, helping him/her to manage
reactions and thoughts. For instance, an opinion like "a person is reliable" helps us to answer
questions about that person's behavior in various situations.
Verbal models, which are expressed in words. For instance, the sentence "More accidents
will occur if the speed limit is increased" is a verbal model.
Physical models. These are physical objects that mimic some properties of a real system, to
help us to answer questions about that system. For example, prototypes of buildings or
airplanes and so on.
Mathematical models, which describe a system by expressing the relationships between its
variables in mathematical form.
Graphic models, which are 2D drawings or 3D digital representations of reality.
In scientific and technical fields, the most useful models are the mathematical, graphic and physical
ones, because they are able to give a more quantitative description of phenomena.
Among these models, an important aspect is whether the model incorporates dynamic, time-
dependent properties or is static. A static model can be defined without involving time.
Just to give some examples, the CAD simulation of the functioning of an engine is a graphic dynamic
model; the architectural prototype of a skyscraper is a static physical model; Newton's laws of
motion are a mathematical dynamic model.
After the introduction of informatics and of algorithms for the solution of equations, mathematical
models can be further classified as analytical or numerical. Analytical models are mathematical models
that have a closed-form solution. Numerical models, instead, use numerical schemes to obtain an
approximation of the model's behavior over time; in this case, the mathematical solution is
represented by a generated table and/or a graph.
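A toy illustration of the distinction (model and numbers invented for the example): the decay model dx/dt = -kx has the closed-form, analytical solution x(t) = x0 exp(-kt), while a numerical model tabulates an approximation of it, e.g. through a forward Euler scheme:

    import numpy as np

    k, x0, dt, T = 0.5, 1.0, 0.1, 5.0
    t = np.arange(0.0, T + dt, dt)

    x_analytical = x0 * np.exp(-k * t)           # closed-form solution

    x_numerical = np.empty_like(t)               # tabulated approximation
    x_numerical[0] = x0
    for i in range(1, t.size):
        x_numerical[i] = x_numerical[i - 1] + dt * (-k * x_numerical[i - 1])

    print(np.max(np.abs(x_analytical - x_numerical)))   # discretization error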
Finally, some phenomena in nature are conveniently described by stochastic processes and
probability distributions (e.g. noisy radio transmissions or atomic-level quantum physics). Such
models might be labeled stochastic or probability-based models, where the behavior can be
represented only in a statistical sense, whereas deterministic models are able to represent the
behavior without uncertainty. It is also important to notice that stochastic phenomena can be
modeled in a deterministic way as well.
In Figure 1.1, a schematic synthesis of the classification of models according to the means of representation is shown.
Figure 1.1: classification of models according to the means of representation
The last way of model classification considers the aim of modeling. According to it, models are
divided into:
Descriptive models, which reproduce reality after a simplification process. These types of
models simply fit the experimental data of a phenomenon, without attempting to explain
the mechanism on which the observed phenomenon is based. This kind of approach is
usually called “Black Box”, because of the lack of interest in the internal laws and structures
of the considered phenomenon.
Interpretative models, instead, are used to explain the behavior of the phenomenon under
analysis. In order to achieve this target, interpretative models are based on theoretical
hypotheses and general laws advanced by a scientist. Then, these kinds of models find the
internal structures justifying the external behavior of the phenomenon: it is somehow like
opening the “Black Box” to understand what its internal functioning is.
Predictive models, finally, aim at forecasting the future behavior of a phenomenon. They
refer to a certain time horizon, allowing the possibility of making choices and variations.
According to the previous metaphors, predictive models know what the structures inside the
“Black Box” are, using this knowledge not only to interpret already existing phenomena, but
also to foresee others.