A B S T R A C T
Contact tracing has been proposed as one of the techniques that tries to
solve the problem of containing the spread of a pandemic by identifying
at risk individuals tracing their contacts and promptly suggesting them
to test for positivity. However, during the recent breakout of Covid-19 this
methodology has proven itself to be inconclusive, due to the vast majority
of individuals being asymptomatic. Most of the times individuals have
no way of knowing whether they are infectious if they do not show
symptoms related to the disease, thus increasing the chance of them
spreading the disease to their contacts. In this work, first we study a
framework that models social interactions between individuals using
temporal graphs and finally develop an algorithm that computes the
infectious paths on a given temporal graph, finding the h-hop risky
contacts of the individuals. Using this information we can ultimately
classify each node as sick or not sick depending on the amount of risky
direct and indirect contacts they had in the last 15 days. Lastly we
show how our methodology based on multihop contact tracing is able
to outperform other classification approaches by testing our algorithm
on synthetic and real temporal graphs of various size. In addition, we
provide an approximation algorithm in order to be able to compute the
h-hop risky contacts also on real temporal graphs, by exploiting sets
approximation and proving the capability of the approximated version
to outperform our exact version by maintaining the same classification
accuracy.
1
I N T R O D U C T I O N
Many ideas have been developed in order to contain the risk of propa-
gation of a pandemic, especially during the recent breakout of Covid-19.
Alongside practical solutions, i.e. social distancing, improved hygiene,
wearing masks and prevent people from having mass gatherings, there
have been some ideas to trace contacts between people, in order to
promptly quarantine people who have been in contact with infected
individuals. Traditional ideas for contact tracing during a pandemic have
their focus only on direct contacts between individuals. Intuitively, when
an infected individual enters in contact with another person, it is highly
advisable for this person to test for positivity, since he/she can become
infectious in turn. In theory, this process can surely improve the capability
of a Government to protect citizens, allowing quicker and more precise
identification of possible infectious individuals. On the other hand, the
fact that a significant portion of the infectious individuals appears to
remain asymptomatic while still being highly contagious, drastically in-
validates this idea of contact tracing considering only direct contacts. This
is due to the fact that if persona meets personb while being infectious
and asymptomatic, b is in danger and can easily contract the disease
without even realizing it. It is reasonable to think that the probability of
getting infected is highly correlated with the number of infected individ-
uals met. The main problem lies within the fact that there is no reliable
way of perfectly knowing how big this set of individuals could be in
every moment.
In this regard, multi-hop could definitely improve the capability of
identifying potential infectious individuals. For each day, we will assume
that two sets of positive and negative individuals in a given population
are given, and we will call these two sets respectively P
d
as positive
individuals at dayd andN
d
as negative individuals at dayd. These two
sets can be identified respectively as the positive tests and negative tests
13
14 introduction
over a given percentage of the population within a given dayd. We can
assume this since this is usually the way in which the Government tries to
capture reality and slow the spread of a disease, trying to identify infec-
tious individuals and quarantine them. It is mainly due to the following
two factors, the overall partial information regarding the spreading of the
disease within the population and the possibility of infected individuals
to be asymptomatic, that the direct contacts approach is still considered
to this day ineffective.
Suppose the following scenario: let a, b and c be three individuals
such that a ∈ P
d
. It is important to keep in mind that a can be either
symptomatic or asymptomatic. At timet on dayd
1
,a enters in contact
with b, which contracts the disease. During day d
2
, at a generic time
t
′
> t, b enters in contact with c. If b is asymptomatic, c has no way
of knowing that he is currently in danger and should test for positivity
as soon as possible. Instead, by exploring the multi-hop contacts of c,
we can correctly identify that the risk of being infected is also highly
correlated not only to the direct contacts, but also to the indirect contacts.
This way, we can state thatc is at risk becauseb was in contact witha at
timet, which occurred before the contact betweenb andc. It is important
to note that for this to work it is strictly required that b does not test
negative between the two contacts. In case that happens, it means thatb
will not convey any virus toc during their contact at timet
′
, invalidating
the path.
The idea here is to put some enhanced focus on the indirect contacts
for each specific individual in the last15 days, in order to estimate the
risk of being infected, even taking asymptomaticity into account. We will
do so by identifying potential initiators of what we will call infectious
paths that could create potential spread of the disease to other individuals.
Intuitively speaking, infectious paths are sequence of contacts starting
with an individual inP
d
. Since these paths effectively spread the disease
through the population, keeping track of the indirect contacts makes
sense and should be beneficial. Our model is based on the possibility
of identifying infectious paths, which are dangerous path that carries
the disease starting from a single infected node. On these paths, the
disease is spread through different intermediate nodes that most likely
spread the disease even before developing any symptoms, since usually
individuals do not show any evidence of the disease until many days
after the infection. All of this has the aim of improving the capability
of correctly identifying infectious individuals in order to isolate them
introduction 15
and stop them from spreading the disease in advance compared to only
isolate symptomatic individuals.
Formally, we define this concept of tracing the indirect contacts of
individuals as h-hop or multi-hop contact tracing, where h is the length
of the infectious path from the source individual to the subject of inter-
est. Hence, throughout this work, we ultimately focus on the following
question:
GivenP
d
andN
d
for eachd and givenh, how many
have been my h-hop risky contacts in the last 15 days?
(1.1)
There are different interpretations for differenth values:
• h=0 corresponds to a set of0-hop risky contacts which reflects the
result of a given person’s test: as we will see, this set is empty if a
person did not test positive in a specific day.
• h = 1 is interpreted using the standard contact tracing based on
direct contacts, where the set of1-hop risky contacts is formed by
direct contacts that tested positive in a specific day.
• h⩾2 is what we call h-hop contact tracing, which means that we add
to the already considered direct contacts further indirect contacts
occurred before the considered one. In this work we will show
that measuring these sets can drastically improve the capability
of the system of classifying individuals as infected even when
asymptomatic.
In order to answer question (1.1) we created a model that allows us to
simulate social interactions between individuals in the real world using
temporal graphs, where we see individuals as nodes and interactions
between individuals as temporal edges. This temporal graph is such that
individuals are connected by temporal edges which are labeled with the
time in which they meet, i.e. the time in which the contact takes place.
Based on this model, in this work we present an algorithm that computes
the infectious paths within a given temporal graphG. Infectious paths
correspond to peculiar paths on temporal graphs and they must satisfy
the following properties:
• The source of the path has been tested positive at a certain dayd.
• The time labels with form the path must be increasing, with the
label of the first edge being a time instant belonging to a day d
′
such thatd⩽d
′
.
16 introduction
• Between a contact and the subsequent, no more than15 days can
pass, otherwise the intermediate node conveying the disease has
enough time to heal, thus interrupting the path.
• Along the infectious path, no individual has been tested negative
between a contact and the next one.
By adopting this formalism, we call sources of infectious paths of length
h arriving in nodeu as h-hop risky contacts.
Exploiting the infectious paths we can build sets that will be called
infectious bubbles for each node inG, which correspond to a specific node’s
sets of h-hop risky contacts. Using these sets we will ultimately be able
to identify with high confidence which individual is sick at the end of a
predetermined analysed period of time. We will do so with the aid of two
different algorithm versions, an exact and an approximated one. The two
versions of our algorithm that will be presented compute the infectious
bubbles for each node in G in two different ways, while proving two
different aspects of this work:
• The exact version will prove the utility of this method, showing that
we can achieve better infectious individual discovery rate compared
to other approaches like contact tracing that takes into consideration
only direct contacts. This version will adopt dynamic programming
in order to dynamically grow the sets of the h-hop risky contacts at
each timet using the sets of h-hop risky contacts at timet−1 and
the new information that arrives at timet.
• The approximated version will prove the capability of extending this
approach to a massive population, showing that in theory we could
adopt this approach with a real temporal graph that approximates
the real world interaction between people. This version will be
based on probabilistic counting techniques in order to obtain a
tangible speed-up of the exact version, while maintaining the same
approach.
In this thesis, first we explore all the theoretical aspects required to
understand the concepts that we adopt and exploit during this work.
Within this chapter, the reader can find all the information that is needed
in order to understand what we present. Following, we fix all the new
ideas, concepts, definitions and utilities that we created to create our
model and algorithm. We then present the algorithm itself, explaining
inputs, outputs, structure and complexity, while showing the pseudo
introduction 17
code of both versions. After this we illustrate how we produced and eval-
uated the results required to measure how well the algorithm effectively
performed compared to other similar approaches. We then show some
of the improvements that we think might be helpful in order to enhance
this work. Lastly we draw our conclusions about this work.