model and segment these subunits, and then learn the informative combinations
of subunits/features using a boosting framework. Our results achieved a recognition
rate above 90% using very few training samples.
Chapter 1
Introduction
1.1 Introduction
In daily life, human beings communicate with each other and interact with computers
using gestures. As a kind of gesture, sign language (SL) is the primary communi-
cation medium for deaf people. Every day, millions of deaf people all over the world
use SL to obtain useful information and exchange ideas. Therefore, in recent
years, SL recognition has gained a lot of attention and a variety of solutions have
been proposed. A sign gesture can be treated as a composition of hand shape, mo-
tion, position, and facial expression; thus, SL recognition requires knowledge of all
of these. Generally, an SL recognition system should contain three major modules:
skin segmentation and tracking (SST), feature extraction, and recognition. The first
module acquires and locates the hands and face across the video frames, the second
prepares useful features for classification, and the third recognizes the signs from
those features.
Fig. 1.1 shows a general architecture overview of an SLR system.
Based on the segmented hands and face, we can extract hand shape, orientation, and
facial expression features. By analyzing the tracked skin objects, we obtain the
hand motion trajectories, hand position, and lip movement. Finally, classifiers are
trained to recognize the signs.
Figure 1.1: System architecture
1.2 Device-based vs. vision-based approaches to SLR
According to the means of capturing features, SL recognition techniques can be clas-
sified into two groups: glove-based and vision-based. The former requires users to
wear data gloves or coloured gloves, which enables the system to avoid or simplify
the segmentation and tracking task. However, its disadvantages are apparent. On
the one hand, users have to wear a hardware device, which is uncomfortable, and
sometimes they cannot perform accurate gestures with the gloves on. On the other
hand, glove-based methods may lose the facial expression information, which is also
very important for SL recognition.
In comparison, vision-based methods rely on computer vision techniques without
needing any gloves, which is more natural for users. However, one difficulty is how
to accurately segment and track the hands and face. SST plays an important role in
vision-based SL recognition: only after the skin objects have been acquired can use-
ful descriptions such as hand shape, motion, and facial expression be extracted and
recognition performed. In other words, SST is the cornerstone of SL recognition. To
produce high-quality SST, two techniques must be developed: a powerful skin colour
model and a robust tracker. The skin colour model offers an effective way to detect
and segment skin pixels; it should be able to handle illumination and human skin
variations. The tracker is responsible for locating the skin objects. For SL recognition,
it should be capable of predicting the occlusions that frequently happen in real-world SL
conversations. The purpose of occlusion detection is to keep track of the status of
the occluded parts, which helps to reduce the search space in the recognition phase.
1.3 Overview of the proposed SLR system
This work aims to provide an SST framework for SL recognition; given that the
required features can then be acquired, we propose a novel solution for SLR based
on boosting SL subunits. To achieve precise skin segmentation, we introduce a novel
skin colour model that integrates SVM active learning with region segmentation.
The model consists of two stages: a training stage and a segmentation stage. In the
training stage, a generic skin colour model is first applied to the first few frames of
the given gesture video to obtain the initial skin areas. Afterwards, a binary
classifier based on SVM active learning is trained using these initial skin areas
as the training set. In the segmentation stage, the SVM classifier is combined
with region information to yield the final skin colour pixels. The contribution
that distinguishes the proposed model from existing skin colour algorithms is
twofold. First, the SVM classifier is trained on data automatically collected from
the first several video frames, so no human labour is needed to construct the
training set. More importantly, the training is performed for every video sequence,
which makes the model adaptive to different human skin colours and lighting condi-
tions; the skin colour model can also be updated with the help of tracking to deal
with illumination variation. Second, region information is adopted to reduce the
effects of noise and illumination variation. Moreover, active learning is employed to
select the most informative training subset for the SVM, which leads to fast
convergence and better performance.
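To make the two-stage pipeline concrete, the following Python sketch (using NumPy,
SciPy, and scikit-learn) bootstraps labels with a generic rule-based skin model,
grows the SVM training set by margin-based active learning, and filters the
per-pixel predictions with simple region information. The generic rule, the
sampling sizes, and the connected-component filter are illustrative stand-ins, not
the exact components developed in this thesis.

    import numpy as np
    from scipy.ndimage import label
    from sklearn.svm import SVC

    def generic_skin_mask(frame):
        # Generic rule-based skin model (the widely used Peer et al. RGB rule);
        # frame is an (H, W, 3) uint8 RGB image.
        r, g, b = (frame[..., i].astype(int) for i in range(3))
        return ((r > 95) & (g > 40) & (b > 20) & (r > g) & (r > b) &
                (np.abs(r - g) > 15) &
                (frame.max(axis=-1).astype(int) - frame.min(axis=-1) > 15))

    def train_adaptive_svm(first_frames, rounds=3, batch=200, seed=0):
        # Training stage: pixels from the first few frames, labelled by the
        # generic model, form the pool; active learning repeatedly adds the
        # pixels closest to the current SVM decision boundary.
        X = np.vstack([f.reshape(-1, 3) for f in first_frames]) / 255.0
        y = np.concatenate([generic_skin_mask(f).ravel() for f in first_frames])
        rng = np.random.default_rng(seed)
        idx = rng.choice(len(X), size=batch, replace=False)   # seed subset
        svm = SVC(kernel="rbf", gamma="scale")
        for _ in range(rounds):
            svm.fit(X[idx], y[idx])          # assumes both classes are present
            margin = np.abs(svm.decision_function(X))
            idx = np.union1d(idx, np.argsort(margin)[:batch])
        return svm

    def segment_frame(svm, frame, min_region=50):
        # Segmentation stage: per-pixel SVM decision, then region information
        # (dropping tiny connected components) to suppress noise.
        mask = svm.predict(frame.reshape(-1, 3) / 255.0).reshape(frame.shape[:2])
        regions, _ = label(mask)
        sizes = np.bincount(regions.ravel())
        return mask & (sizes[regions] >= min_region)

In the full system, the SVM would additionally be retrained or updated as tracking
provides fresh skin samples under changing illumination.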
As for the tracker, we extend the previous work of our group in three ways. First,
the previous work used a colour glove to avoid the segmentation issue, whereas in
this work we are more interested in improving SL recognition in natural conversation:
three features, skin colour, motion, and position, are fused to perform accurate skin
object segmentation. Additionally, the previous work tracked only the two gloved
hands, while the proposed work can segment and track the two hands and the face;
the obtained face information can clearly facilitate recognition. Second, we apply a
Kalman filter (KF) to predict occlusions in the same way as the previous work, but
our KF is based on skin colour instead of a colour glove. Third, in the proposed
work, tracking and segmentation are approached as one unified problem in which
tracking helps to reduce the search space used in segmentation, and good segmenta-
tion in turn enhances tracking performance.
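A minimal sketch of the occlusion-prediction idea follows, assuming a constant-
velocity Kalman filter over each skin object's centroid; the noise covariances and
the distance-based overlap test are illustrative choices, not the tuned parameters
of the proposed tracker.

    import numpy as np

    class SkinObjectKF:
        # Constant-velocity Kalman filter on a skin object's centroid;
        # state s = [x, y, vx, vy].
        def __init__(self, x, y, dt=1.0):
            self.s = np.array([x, y, 0.0, 0.0])
            self.P = np.eye(4) * 10.0                       # state covariance
            self.F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
                               [0, 0, 1, 0], [0, 0, 0, 1]], float)
            self.H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)
            self.Q = np.eye(4) * 0.1                        # process noise
            self.R = np.eye(2) * 2.0                        # measurement noise

        def predict(self):
            # Project the state one frame ahead; returns the predicted centroid.
            self.s = self.F @ self.s
            self.P = self.F @ self.P @ self.F.T + self.Q
            return self.s[:2]

        def update(self, z):
            # Correct the prediction with the measured centroid z = (x, y).
            innov = np.asarray(z, float) - self.H @ self.s
            S = self.H @ self.P @ self.H.T + self.R
            K = self.P @ self.H.T @ np.linalg.inv(S)
            self.s = self.s + K @ innov
            self.P = (np.eye(4) - K @ self.H) @ self.P

    def occlusion_expected(pred_a, pred_b, radius=30.0):
        # Flag an imminent occlusion when two predicted centroids come closer
        # than twice an (assumed) object radius, so the tracker can maintain
        # the status of the occluded parts instead of losing them.
        return np.linalg.norm(np.asarray(pred_a) - np.asarray(pred_b)) < 2 * radius

Per frame, each tracker's predict() runs first; if occlusion_expected fires for a
pair of objects (e.g. a hand and the face), the merged blob can be handled jointly,
and update() is called with the measured centroids otherwise.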
Despite the great deal of effort in SLR so far, most existing systems achieve good
performance only with small vocabularies or gesture datasets. Increasing the vocab-
ulary inevitably incurs many difficulties for training and recognition, such as the
large required training set and signer variation. To reduce these problems, some
researchers have proposed decomposing signs into subunits. In contrast with tradi-
tional systems, this idea has the following advantages. First, the number of subunits
is much smaller than the number of signs, which leads to a small sample size for
training and a small search space for recognition. Second, subunits build a bridge
between low-level hand motion and high-level semantic SL understanding. In lin-
guistics, a subunit is generally considered to be the smallest contrastive unit in a
language, and a number of researchers have provided evidence that signs can be
broken down into such elementary units. However, in the computer vision field
there is no generally accepted conclusion yet about how to model and segment
subunits.
This work investigates the detection of subunits from the viewpoint of human motion
characteristics. We model a subunit as a continuous hand action in time and space:
a motion pattern that covers a sequence of consecutive frames with interrelated
spatio-temporal features. Based on this model, we integrate hand speed and trajec-
tory to locate subunit boundaries. The contribution of our work lies in three points.
First, our algorithm is effective without needing any prior knowledge, such as the
number of subunits within one sign or the type of sign. Second, the trajectory of
hand motion is incorporated so that the algorithm does not rely on clear pauses,
as some previous related work does. Finally, because of the use of an adaptive
threshold in motion discontinuity detection and refinement by temporal clustering,
our method is more robust to noise and signer variation.
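The boundary-detection idea can be sketched as follows in Python, assuming the
trajectory is an (N, 2) array of per-frame hand centroids; the fraction used for the
adaptive speed threshold, the turning-angle threshold, and the clustering gap are
illustrative values, not the settings derived later in this thesis.

    import numpy as np

    def subunit_boundaries(traj, speed_frac=0.4, angle_thresh=np.pi / 3, min_gap=5):
        # Candidate boundaries are frames where speed drops below an adaptive
        # threshold (a fraction of the sequence's mean speed) or where the
        # motion direction turns sharply. Returned indices refer to the
        # velocity sequence (index i = motion between frames i and i + 1).
        v = np.diff(traj, axis=0)                      # per-frame displacement
        speed = np.linalg.norm(v, axis=1)
        slow = speed < speed_frac * speed.mean()       # adaptive speed threshold

        ang = np.arctan2(v[:, 1], v[:, 0])
        turn = np.abs(np.angle(np.exp(1j * np.diff(ang))))  # wrapped angle change
        sharp = np.concatenate([[False], turn > angle_thresh])

        candidates = np.flatnonzero(slow | sharp)
        # Temporal clustering: merge candidates closer than min_gap frames,
        # keeping one boundary (the slowest frame) per cluster.
        boundaries = []
        cluster = [candidates[0]] if len(candidates) else []
        for f in candidates[1:]:
            if f - cluster[-1] <= min_gap:
                cluster.append(f)
            else:
                boundaries.append(min(cluster, key=lambda i: speed[i]))
                cluster = [f]
        if cluster:
            boundaries.append(min(cluster, key=lambda i: speed[i]))
        return boundaries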
After segmenting the SL subunits, we attempt to develop an effective SLR system
using the AdaBoost algorithm, which learns the informative subunit and feature
combinations needed to achieve good classification performance. To the best of our
knowledge, very little work has been done using AdaBoost in SLR. We present two
variations for learning boosted subunits: in the first, the sign classes are trained
independently; in the second, they are trained jointly, which permits the classes to
share weak classifiers and increases the overall performance.
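A minimal sketch of the independently trained (one-vs-all) variant follows, in which
discrete AdaBoost selects from a pool of subunit-level weak classifiers. Here each
weak classifier is simply a function mapping a sign sample to +1/-1 (in the thesis
the weak learners are subunit classifiers, e.g. HMM-based), and the round count is
arbitrary; the joint-training variant, which lets several sign classes share weak
classifiers, is omitted for brevity.

    import numpy as np

    def adaboost(weak_pool, X, y, rounds=50):
        # X: list of sign samples; y: array of +1/-1 labels for one sign class.
        y = np.asarray(y)
        preds = np.array([[h(x) for x in X] for h in weak_pool])  # pool x samples
        n = len(X)
        w = np.full(n, 1.0 / n)                     # sample weights
        ensemble = []
        for _ in range(rounds):
            errs = (w * (preds != y)).sum(axis=1)   # weighted error per learner
            best = int(np.argmin(errs))
            err = errs[best]
            if err >= 0.5:                          # no remaining learner helps
                break
            alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))  # learner weight
            w = w * np.exp(-alpha * y * preds[best])           # emphasize mistakes
            w = w / w.sum()
            ensemble.append((alpha, weak_pool[best]))
        return ensemble

    def classify(ensemble, x):
        # Sign of the weighted vote of the selected subunit classifiers.
        return np.sign(sum(alpha * h(x) for alpha, h in ensemble))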
The presented work enables us to efficiently recognize SL with a large vocabulary
using a small training dataset. One important advantage of our algorithm is that
it is inspired by human signing behaviour and recognition ability, so it works in
a manner analogous to human recognition. Experiments on real-world signing videos
and a comparison with classical HMM-based weak classifiers demonstrate the
superiority of the proposed work.
In this thesis, we aim to provide new techniques that can be applied in SLR
applications. Our goal is to contribute to research in skin segmentation, hand and
face tracking, and efficient sign modelling and recognition based on informative
subunits of the signs, inspired by how humans perform and recognize signs.
1.4 Overview of the Thesis
The next chapter reviews the literature on the different SLR systems proposed by
various research groups.
Chapter 3 gives a review of current skin segmentation techniques and discusses
our proposed skin segmentation algorithm with various evaluation results.
Chapter 4 introduces our proposed SST system and provides some experimental
results for skin segmentation and tracking.
Chapter 5 introduces the subunit modelling and segmentation algorithm and ends
with some evaluation experiments.
Chapter 6 introduces our SLR system based on learning boosted subunits and
presents the experimental results of the classification.
Chapter 7 concludes with a summary and gives some directions for future work.
Chapter 2
Sign Language Recognition: Literature Review
2.1 Introduction
In taxonomies of communicative hand/arm gestures, Sign Language (SL) is often
considered the most structured form of gesture, while gestures that accompany
verbal discourse are described as the least standardized. SL communication also in-
volves non-manual signals (NMS) through facial expressions, head movements, body
postures, and torso movements [Ong and Ranganath 05].
SLR therefore requires observing these features simultaneously, together with their
synchronization and information integration. As a result, SLR is a complex task,
and understanding it involves great effort in collaborative research on machine
analysis and understanding of human action and behaviour; for example, face and
facial expression recognition [Kong et al. 04, Pantic and Rothkrantz 00], tracking
and human motion analysis [Gavrila 99, Wang et al. 03], and gesture recognition
[Pavlovic et al. 97].
As non-SL gestures often consist of small, limited vocabularies, they are not a useful
benchmark for evaluating gesture recognition systems. SL, on the other hand, offers
a good benchmark for evaluating different gesture recognition systems because it
consists of large and well-defined vocabularies that can be hard for different systems
to disambiguate.
In real life, we can imagine many different useful applications for SLR, such as:

• sign-to-text/speech translation or dialogue systems for use in specific public
domains such as airports, post offices, or hospitals
[McGuire et al. 04, Akyol and Canzler 02];

• video communication between deaf people, where, instead of sending live video,
SLR can translate the video into notations that are transmitted and then
animated at the other end to save bandwidth [Kennaway 03];

• annotating sign videos [Koizumi et al. 02] for linguistic analysis, saving a lot
of the human labour of manually ground-truthing the videos.
SL gesture data is mainly acquired using cameras (vision-based) or sensor devices
(glove-based) [Sturman and Zeltzer 94]. We are interested here in the vision-based
approach, as the glove-based approach has the limitation of being an unnatural way
of performing signs; although a glove can greatly simplify the tasks of segmentation
(especially in the presence of occlusions) and tracking, it ignores the fact that facial
expression is needed as an important feature. In the next sections, we summarize
the related work done by different research groups in SLR, covering the three main
tasks of hand detection and tracking, feature extraction, and classification.
2.2 Hand detection and tracking
In almost all SLR systems, the hand(s) must be detected in the image sequence,
usually based on features like colour, motion, and/or edges. The colour cue is
exploited through skin colour detection or colour gloves, as in [Sweeney and Downton 96,
Sutherland 96, Bauer and Kraiss 02, Assan and Grobel 97, Bauer and Kraiss 01].
When skin colour is used, the user is usually required to wear long sleeves so that
the skin of the arm is not detected. Skin colour was combined with a motion cue in
[Akyol and Alvarado 01, Imagawa and Igi 98, Yang et al. 02] and with edge infor-
mation in [Terrillon et al. 02]. Different assumptions were used to distinguish the
hands from the face, such as the head being relatively static compared to the hands
[Akyol and Alvarado 01, Imagawa and Igi 98] or the head being bigger than the
hands [Yang et al. 02].
A common requirement for the motion cue is that the hand must be continuously
moving, as in [Huang and Jeng 01], where the hand was detected by logically AND-
ing difference images with edge maps and skin-colour regions (a simple version of
this combination is sketched after this paragraph). In [Cui and Weng 00,
Cui and Weng 99], a hierarchical nearest neighbour decision rule was used to map
partial views of the hand to previously learned hand contours to obtain an outline
of the hand.
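As an illustration of this cue combination, the following OpenCV sketch ANDs a
thresholded difference image, a dilated edge map, and a skin-colour mask; the
thresholds and the YCrCb skin range are common textbook values, not those of
[Huang and Jeng 01].

    import cv2
    import numpy as np

    def moving_hand_mask(prev_bgr, curr_bgr):
        # Motion cue: thresholded difference of consecutive grey-level frames.
        diff = cv2.absdiff(cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY),
                           cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2GRAY))
        _, motion = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)

        # Edge cue: Canny edges, dilated so thin contours survive the AND.
        edges = cv2.dilate(cv2.Canny(curr_bgr, 100, 200),
                           np.ones((5, 5), np.uint8))

        # Colour cue: a fixed skin range in YCrCb space.
        ycrcb = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2YCrCb)
        skin = cv2.inRange(ycrcb, np.array([0, 133, 77], np.uint8),
                           np.array([255, 173, 127], np.uint8))

        # Hand candidates: pixels supported by all three cues.
        return cv2.bitwise_and(cv2.bitwise_and(motion, edges), skin)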
In [Huang and Huang 98], the hands were detected under the assumptions that the
hand is the only object moving against a stationary background and that the head
is relatively stationary. In [Ong and Bowden 04], a boosted cascade of classifiers
was used to detect hand shapes, with dark backgrounds and signers asked to wear
long-sleeved dark clothing. Other related work also tried to localize body parts such
as the torso [Bauer and Kraiss 02, Assan and Grobel 97] or the elbows and shoulders
[Hienz et al. 96], along with the hands and face, based on body geometry and colour
cues. This helps to reference the position and movement of the hands to the signer's
body.
Hand tracking can be done either in 2D or in 3D. In 2D, tracking approaches can be
classified into boundary-based [Huang and Huang 98, Cui and Weng 00], view-based
[Huang and Jeng 01], blob-based [Tanibata et al. 02, Imagawa and Igi 98], and
matching of motion regions [Yang et al. 02]. One of the hard problems in tracking
is occlusion; generally speaking, in most systems based on skin colour, occlusion
handling is poor and unsatisfactory. Some systems try to predict the hand location
from the model dynamics and previous frame positions under the assumption of
small, constant hand motion [Starner et al. 98, Imagawa and Igi 98].
In [Starner et al. 98], the face region was subtracted from the merged face/hand
blob, but unfortunately this method can only handle small overlaps. In [Imagawa 00],
a sliding observation window was applied over the merged face/hand blob, and the
likelihood of the window subimage was calculated to classify it into one of the
possible hand shape classes. The overlapping hands and face were distinguished in
[Tanibata et al. 02] by using hand and face texture templates; this method is not
robust to changes in hand shape, face orientation, or large changes in facial
expression.
Another interesting approach does not track the hands and face separately
[Zieren et al. 02, Sherrah 00], but rather applies probabilistic reasoning (such as
heuristic rules [Zieren et al. 02] or Bayesian networks [Sherrah 00]) to simultane-
ously assign labels to the possible hand/face regions, assuming that skin blobs can
only be assigned to the hands and thus not allowing for other skin regions in the
background. This allows more robust tracking that can deal with heavy overlap,
fast hand movement, and complex hand interactions. Multiple features were used,
such as motion, colour, orientation, size and shape of blobs, distance relative to
other body parts, and Kalman filter prediction.
In [Assan and Grobel 97, Bauer and Kraiss 02, Huang and Huang 98], uniform back-
grounds were used to simplify the problem. However, a few systems, such as
[Chen et al. 03], allow a complex, cluttered background that includes moving objects
and apply background subtraction to extract the foreground under the assumption
that the hand is constantly moving. In contrast to the above approaches, some
systems use 3D models [Vogler and Metaxas 97, Downton and Drouet 92] with
multiple cameras to estimate the body parts and/or avoid occlusions, but of course
at a great computational cost.
As skin segmentation is one of the main research areas that we will address in this
work, the next section reviews the major techniques used to detect skin pixels in
images or videos.
2.2.1 Skin segmentation review
In general, skin detection methods [Vezhnevets et al. 03] can be classified into two
groups. Pixel-based methods that classify each pixel as skin or non-skin indepen-