Chapter 1
Introduction
Since the early years of computing, the keyboard and the mouse have been the
most popular means of human-computer interaction; in recent years, however,
a new trend has emerged in the search for more natural and immersive kinds
of interfaces, which let users interact with their whole body or simply by
touching or moving input devices.
Touchscreen displays are a well-known technology: the first examples were
introduced on the market in the second half of the 1960s [1], but their
massive consumer adoption started with the introduction of the iPhone by
Apple Inc. in 2007 [2]. After this event, touchscreen devices became more
and more popular, and other interesting applications were launched on the
market, such as Microsoft Surface from Microsoft Corporation, an interactive
table developed as a combination of software and hardware that allows
multiple users to manipulate digital content through gestures performed by
touching the table, and that can interface with physical devices resting on
its surface. A revolution in the Human-Computer Interaction (HCI) field was
introduced by Nintendo Co., Ltd., which launched the Wii in late 2006 [3],
a latest-generation gaming console whose set of controller devices lets the
user interact with the system through body movements, thanks to motion
sensors and infrared transmitters and receivers that allow the system to
estimate the 3-D position of the controller.

[1] http://en.wikipedia.org/wiki/Touchscreen#History
[2] The iPhone is a new-generation smartphone whose main means of interaction is a multitouch interface. More information at http://en.wikipedia.org/wiki/IPhone
[3] http://en.wikipedia.org/wiki/Wii
A further step in this direction was announced by Microsoft, which in June
2009 revealed its Project Natal, whose aim is to enable users to interact
with the system in a natural way, without any kind of physical device or
colored marker, using only gestures performed with the whole body and vocal
commands [4].

[4] http://en.wikipedia.org/wiki/Project_Natal
In this context, the analysis of video streams to infer information about
the captured scene has become an attractive option, as video capture devices
and related technologies, like storage devices and internet access, have
become cheaper and cheaper in recent years; the falling prices enable the
development of advanced interaction systems on low-cost hardware. In the HCI
field, for example, the position and configuration of the user's hands can
be a highly informative piece of knowledge, making the system able to
respond to a specific configuration or to a certain gesture trajectory.
However, implementing such an interaction method is more challenging than
implementing interaction through an ad-hoc input device: in the latter case
the information is provided directly by the device, while in the former case
no information about hand position or configuration is available. To
estimate this knowledge, a huge quantity of noisy data from image
observations needs to be analyzed, filtered and interpreted; in addition,
device-dependent limits, like sensor noise or poor image quality, can
degrade the usability of such a system, which does not require the user to
wear any colored marker such as gloves.
The general problem addressed in this thesis is the discovery of data
patterns embedded in a larger set of data; more specifically, two kinds of
patterns are searched for:

- specific trajectories performed by the user with his hands;
- specific hand positions, or configurations.

Both problems require first localizing the user's hands, so the overall task
involves the analysis of low-level features as well as the interpretation of
higher-level information. The goal of this thesis work is to implement a
human-computer interface based on hand gestures, using algorithms that
represent the state of the art and proposing new ones where needed, and
finally to test the developed implementation; hands must be located by
analyzing images taken from a webcam, without requiring the user to wear any
kind of device or colored marker.
The overall framework is composed of the following modules:
- hand configuration model learning: the system needs to be trained on the
configurations it must recognize;
- trajectory model learning: as for configurations, the system must learn
the trajectories it is requested to spot;
- hand detection: the system must detect the location of all hand instances
that appear in the processed images;
- hand tracking: in case of multiple detections, each hand occurrence
located in a frame needs to be correctly associated with one detected in the
next frame;
- fingertip detection: given a hand image, this module locates the
fingertips;
- hand feature extraction: given a fingertip map, this module builds the
features needed in the configuration matching process;
- hand configuration matching: detected hand configurations are evaluated
against the stored configuration models in order to find the best match;
- trajectory feature extraction: hand locations are used to build the
features needed by the matching and spotting modules;
- on-line trajectory matching (with pruning): the observed hand trajectory
is compared with those previously learned, in order to find the most similar
model;
- trajectory spotting: this module states whether a known trajectory,
reported by the matching module, actually occurred.
Figure 1.1: System activity diagram
Figure 1.1 shows the activity diagram of the system. The system takes as
input a video stream that is processed according to the stages listed above.
If a particular hand configuration is recognized or a trajectory is spotted,
the system can raise an event that will be processed at a later stage. The
precise tracking of hands makes it possible to attach "analog" commands,
like moving the mouse or dragging elements in a virtual environment, in
addition to "spotted" commands, like a mouse click.
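As a concrete illustration of this flow, the skeleton below organizes the stages into a single processing loop. This is only a minimal Python sketch: every class, method and parameter name is a hypothetical placeholder for the modules described above, not the interface of the actual implementation.

```python
class GesturePipeline:
    """Skeleton of the activity flow of figure 1.1.

    Every stage below is a stub standing in for the corresponding
    module described above; the names are hypothetical.
    """

    def __init__(self, config_models, trajectory_models):
        self.config_models = config_models          # learned off-line
        self.trajectory_models = trajectory_models  # learned off-line
        self.tracks = []  # hand instances followed across frames

    # --- stage stubs -----------------------------------------------
    def detect_hands(self, prev, curr):
        return []      # moving skin-colored blobs in `curr`

    def track_hands(self, detections):
        return []      # detections coupled with existing tracks

    def match_configuration(self, track):
        return None    # configuration label, or None if no match

    def spot_trajectory(self, track):
        return None    # spotted gesture, or None

    def run(self, capture):
        """Process a stream; `capture` needs a read() -> (ok, frame)
        method, e.g. a cv2.VideoCapture."""
        prev = None
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            if prev is not None:
                detections = self.detect_hands(prev, frame)
                self.tracks = self.track_hands(detections)
                for track in self.tracks:
                    label = self.match_configuration(track)
                    gesture = self.spot_trajectory(track)
                    # a recognized configuration or a spotted gesture
                    # would raise an application event here
            prev = frame
```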
In order to recognize the hand configuration, the system matches features
extracted from the on-line video stream against models learned during an
off-line phase. To learn the models, the system is fed with hand images in
which the hands assume the desired configuration and the fingertips are
marked manually.
In a similar way, trajectories are matched against models learned off-line:
here the models are learned from recorded video streams in which the user
performs a gesture trajectory wearing a colored glove against a neutral
background (neutral relative to the color of the glove), enabling an easy
detection of the gesturing hand.
While processing the on-line input stream, the user is not required to wear
any kind of marker to track hands; the hand detection module searches for
"skin that is moving", while the hand tracking module couples the detections
of one frame with those of the next. Once a hand is segmented, the fingertip
detection module returns the possible locations of the fingertips, which are
used as features to assign a configuration label.
Hand locations are known frame by frame, so it is possible to extract
position and motion information and start the trajectory matching process;
during this phase, unlikely matching hypotheses are rejected by pruning
classifiers learned during the off-line training; thanks to pruning,
performance is enhanced both in accuracy and in speed. Finally, the spotting
module states whether a gesture has actually been performed by the user and,
if so, the gesture class label, the matching cost, the start frame and the
end frame are returned.
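The spotting result described here can be pictured as a small record; a minimal sketch with illustrative field names, mirroring the four values listed above:

```python
from dataclasses import dataclass

@dataclass
class SpottedGesture:
    """Result reported when the spotting module confirms a gesture."""
    label: str            # gesture class label of the best-matching model
    matching_cost: float  # dissimilarity between observation and model
    start_frame: int      # frame where the gesture was judged to begin
    end_frame: int        # frame where the gesture was judged to end
```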
Once the gesture labels are known (for both trajectory gestures and hand
configuration gestures), it is straightforward to attach system directives
to them, or to pack the information into an event in order to build complex
client-server architectures.
As an application example, gestures were used to control a multimedia
player: the user must be able to choose the next or previous media file from
a playlist, increase or decrease the sound volume, seek within the file,
play or stop it, and close the application. The multimedia player used is
VLC (VideoLAN Client) [5].

[5] http://www.videolan.org/vlc/
Chapter 2
Hands tracking
As an introduction to this chapter, a general, high-level formulation of the
object tracking problem is provided:

Definition 1. Given an ordered input stream of images (frames) and a target
class of objects, locate each instance of such an object frame by frame and,
for each detection of the target class in a given frame, couple that
occurrence with a corresponding one in the following frame, once evidence
has been acquired that they represent the same object instance.
From this definition it is possible to distinguish two subproblems, with a
problem dimension for each of them:
- detection of objects in a single frame (spatial dimension);
- tracking of a single object along the entire input stream (temporal
dimension).
Processing an image in order to solve the spatial problem requires defining
some features that characterize the object class to be searched for; typical
features used in such a task are color distribution, shape, edges, motion
information and so on. These are low-level features, easier to encode in a
computer than higher-level information, such as the context in which the
object can be found or its interactions with the rest of the world.
The objects tracked in this thesis work are those that can be classified as
"human hands".
The human hand is a highly articulated object with many degrees of freedom
that can assume a large number of shape configurations: some works [17],
[24], [5] model the hand in a 3-D Euclidean space and then estimate its
posture by synthesizing the model and varying its parameters until the 2-D
projection and the real hand image appear similar enough. However, these
approaches work when the hand has already been segmented, and due to the
non-rigid nature of the human hand it is not possible to rely on shape
features to efficiently locate it in a complex scene. Another feature
commonly used in the literature ([4], [26]) is the skin color distribution,
which is also adopted in this work. However, skin color detection methods
are not precise enough to guarantee that only real skin regions will be
located: these methods are very sensitive to ambient light variations, and
it is quite common that wooden doors, pink flowers and other similarly
skin-colored objects are included as well.
There is another cue that helps to distinguish inanimate skin-colored
objects from human hands (and faces): doors and flowers usually appear
still, while hands quickly change their appearance. It follows that if only
moving skin-colored objects are searched for, the detector becomes more
robust. In this way the face will typically be detected as well (even though
hands usually move faster than the head) but, if necessary, its location can
be rejected during further processing.
In order to locate objects that quickly change their appearance (like
hands), [26] introduces the idea of the residue image. Such an image is
computed by partitioning the current gray-level image into blocks and
assigning to each block a scalar value proportional to how different its
area appears in the current frame compared to the appearance of the most
similar area in the next frame.
Once hands have been located in a single frame, the temporal problem still
remains: it is necessary to track all the movements of a single hand
instance along the whole video sequence, but more than one skin-colored
moving object may be detected in a single frame. Several methods are
available in the literature to properly couple the detections of one frame
with those of the following one. In [26] the authors use a probabilistic
method to find the best current match by analyzing the history of hand
locations, while in [4] the authors rely on a sufficiently high frame rate
to assume that consecutive hand locations will be very close to each other,
so that the best match is the nearest detection.
This thesis implements the approach given in [4], whose limit is that it
detects hands effectively only if very good skin detection is guaranteed;
this means that if other skin-colored objects are visible in the background,
the detection has a high probability of producing unsatisfactory results.
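A minimal sketch of this nearest-detection coupling, assuming each detection has already been reduced to its centroid; the greedy pairing and the max_dist cutoff are an illustrative reading of the idea in [4], not that paper's exact procedure:

```python
import numpy as np

def couple_detections(prev_centroids, curr_centroids, max_dist=50.0):
    """Greedily pair each detection of the previous frame with the
    nearest unused detection of the current frame.

    prev_centroids: (N, 2) array of (x, y) hand locations at frame t.
    curr_centroids: (M, 2) array of (x, y) hand locations at frame t+1.
    max_dist: rejects implausible jumps; it relies on a frame rate high
    enough that a hand moves little between consecutive frames.
    Returns a list of (prev_index, curr_index) pairs.
    """
    prev_centroids = np.asarray(prev_centroids, dtype=float)
    curr_centroids = np.asarray(curr_centroids, dtype=float)
    pairs, used = [], set()
    for i, p in enumerate(prev_centroids):
        dists = np.linalg.norm(curr_centroids - p, axis=1)
        for j in np.argsort(dists):          # nearest first
            if j not in used and dists[j] <= max_dist:
                pairs.append((i, int(j)))
                used.add(int(j))
                break
    return pairs
```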
The contribution of this thesis work is to extend the detection method
described in [4] by removing from the scene those areas that appear very
similar from one frame to the next, using the information given by residue
images, and then applying the skin detection algorithm only to this filtered
result. In this way, the detection of the areas where hands (and faces) are
located is more robust even in the presence of many skin-colored objects in
the background. A problem introduced by this modification is analyzed in
section 2.4, and a way to handle it is proposed in the same section.
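A minimal sketch of this filtering step, assuming a per-pixel residue map and a binary skin mask are already available; the threshold value and function names are illustrative:

```python
import numpy as np

def detect_moving_skin(skin_mask, residue_map, residue_thresh=10.0):
    """Keep only pixels that are both skin-colored and inside blocks
    whose appearance changed between frames, so that static
    skin-colored background objects (doors, flowers, ...) are dropped.

    skin_mask: boolean (H, W) output of the skin detector.
    residue_map: (H, W) residue values, i.e. the per-block residues
    upsampled to image resolution.
    residue_thresh: illustrative cutoff separating still areas from
    moving ones.
    """
    moving = residue_map > residue_thresh
    return np.logical_and(skin_mask, moving)
```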
2.1 Residue image
As already mentioned, the residue image, first described in [26], reports
information about how much an object changes its appearance as time flows.
To infer this kind of knowledge it is necessary to compare at least two
consecutive frames, since appearance changes happen along the time axis. The
residue image is based on the key idea that, due to the non-rigid nature of
the hand, its appearance changes more frequently in time compared to that of
other objects; it is therefore possible to exploit this property and search
for regions in a frame that do not have good matches in the next one. To
find the best matches among the block partitions of two sequential frames,
the block matching method (also applied to compute the optical flow) can be
used: for each pair of consecutive images, the first one is partitioned into
several blocks and, for each block, the best match is searched in the next
frame by translating the current block within a search area and selecting
the region that minimizes the appearance difference according to a distance
measure. The algorithm returns a matrix containing, for each block, its
motion vector.
ow" information is obtained, the residue R
B
is computed for each block B
of dimension m n and its match M
B
of the same size as
R
B
=jB M
B
j (2.1)
whereB andM
B
are the average values of pixels inB andM
B
, respectively;
in other words this is the absolute di erence between the average value of
gray level pixels of current block and the average value of pixels of the area
that as been matched in the next frame. Because of the non-rigidity of hands,
residues tends to have higher values in hand regions.
The residue image is a good choice for finding non-rigid moving objects,
because it returns a filled area that can easily be turned into a blob,
whereas typical motion estimation methods, like frame differencing or
optical flow computation, tend to have their highest values distributed
along edges. An example of a residue image can be found in figure 2.1.
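The sketch below is a direct, unoptimized reading of equation (2.1): exhaustive block matching under a mean-absolute-difference cost, with illustrative block size and search radius (real implementations typically tune both):

```python
import numpy as np

def residue_image(curr, nxt, block=16, search=8):
    """Compute per-block residues R_B = |mean(B) - mean(M_B)| between
    two consecutive gray-level frames (equation 2.1).

    curr, nxt: 2-D arrays of equal shape (gray-level frames).
    Returns a (rows, cols) array with one residue per block.
    """
    h, w = curr.shape
    rows, cols = h // block, w // block
    res = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            y, x = r * block, c * block
            B = curr[y:y + block, x:x + block].astype(float)
            best_cost, best_mean = None, 0.0
            # translate the block within the search window of the next
            # frame and keep the most similar region M_B
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy <= h - block and 0 <= xx <= w - block:
                        M = nxt[yy:yy + block, xx:xx + block].astype(float)
                        cost = np.abs(B - M).mean()  # appearance distance
                        if best_cost is None or cost < best_cost:
                            best_cost, best_mean = cost, M.mean()
            # residue: difference of average gray levels (eq. 2.1)
            res[r, c] = abs(B.mean() - best_mean)
    return res
```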
2.2 Skin detection
The problem of skin detection is very challenging; to identify skin, the
most natural feature that can be exploited is color, which varies over a
wide range of values and has the big disadvantage of being very sensitive to
light variations. In other words, the appearance of skin is not the same
under different illuminants.
In the literature it is possible to distinguish between two different
approaches to this problem:
- proposing color value thresholds;
- analyzing the skin color distribution.
For example, some works suggest possible bounds, depending on the adopted
color space, within which the colors assumed by skin are constrained; in
[9], genetic algorithms are used to estimate bounds for seven different
color spaces. Starting from the bounding models proposed in the literature,
the authors re-estimate the thresholds in order to pursue a precision,
recall or trade-off strategy using

$$\mathrm{fitness} = \frac{\mathrm{recall} \cdot \mathrm{precision}}{\mathrm{recall} + \mathrm{precision}} \qquad (2.2)$$
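As an illustration of the threshold-based approach, the sketch below applies one static RGB rule widely quoted in the skin detection literature and evaluates it with the fitness of equation (2.2); the specific bounds are an example, not those estimated in [9]:

```python
import numpy as np

def skin_mask_rgb(img):
    """One classic static RGB threshold rule for skin (illustrative).

    img: (H, W, 3) uint8 array in RGB order; returns a boolean mask.
    """
    r = img[..., 0].astype(int)
    g = img[..., 1].astype(int)
    b = img[..., 2].astype(int)
    spread = img.max(axis=2).astype(int) - img.min(axis=2).astype(int)
    return ((r > 95) & (g > 40) & (b > 20) & (spread > 15)
            & (np.abs(r - g) > 15) & (r > g) & (r > b))

def fitness(pred, truth):
    """Fitness of equation (2.2), combining precision and recall."""
    tp = np.logical_and(pred, truth).sum()
    precision = tp / max(pred.sum(), 1)   # correct among detected
    recall = tp / max(truth.sum(), 1)     # detected among true skin
    if precision + recall == 0:
        return 0.0
    return recall * precision / (recall + precision)
```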