Moreover, there is a higher statistical correlation between the data, and we can use many different techniques to achieve our results.
1.1 BACKGROUND AND RELATED WORK
To obtain a general model for a face several approaches have been attempted. The first attempt was made over a hundred years ago by the British photographer Galton (1878, 1879). He combined different faces photographically using multiple exposures of film, manipulating (as we will see later, we can say he warped the images) the orientation and size of the component photographs so that the centres of the left and right eyes were constant across the set of faces. Galton's technique has been replicated digitally (Langlois and Roggman 1990; Burson 1992); these digital versions align the component images on only two image points (the eye centres). Galton tried to show that a set of faces of criminals shared the same characteristics and that a set of faces of respectable citizens had common features that differentiated them from the criminals. Obviously he could not prove his point, as physiognomy has proved to be unreliable, but his working methodology was really clever and his approach has been used broadly since.
Bledsoe in 1966 made the first attempt to model a face space, with a system that places landmarks at crucial points of the face and then tries to extract the model by looking at the normalised distances and ratios between those points. This is quite a primitive approach and it does not give a model that can cope with the complexity of a face.
Carey & Diamond in 1977 tried to decompose the face into its characteristic features using a semi-automated technique where the operator is asked to mark the relevant points of the faces and the computer then defines a model by analysing the relationships among those features. This proved to be a fragile system that is strongly affected by the operator.
After these early approaches a lot of work has been done to create semi-automated systems based on a few landmarks, where the main differences lie in the functions and algorithms used to create a model from those few points and from the histograms of the faces.
There are few attempts to produce a fully automated system; this is probably due to the intrinsic complexity of the task. The main one is from Kanade in 1973, who uses multiple statistical analyses to get all the parameters he needs from a single face and then compares them with other faces to obtain a general model. This approach is interesting in its effort to remove any human and subjective interference (the landmarking of the images) and replace it with several statistical functions. Unfortunately the system appears not to be really robust and is highly subject to noise and any sort of variation in lighting conditions, pose, gender etc., so it is not really useful in real-life applications.
The eigen technique for face analysis (1) has been broadly used since its publication; the approach is robust, quite simple and makes it possible to achieve many goals with the same piece of code. But there are some limitations due to the assumption that all the images are perfectly aligned. This problem has been addressed in various ways, the most obvious being an accurate positioning of the images (1).
In (4) they use a single flexible model (a non-linear point distribution model) for the eigen faces to analyse the shape, and a different model (with a different associated analysis) that isolates the lighting changes.
In (5) there is an approach that is quite similar to the one we are using for this project, as they use a warping technique, but it is executed after the generation of a mean shape model for all the faces: they deform all the faces, with the warping technique, towards the mean shape face they generated.
1.2 OVERVIEW OF THIS WORK
The aim of this work is to create a robust and flexible model of face images. The approach is based on a statistical analysis of the face space known as the eigen face technique. This technique finds and codes the main sources of variation through the face space.
We want to improve the performance of the eigen faces by exploring the benefits of introducing warping as a pre-processing step; therefore we apply the Douglas-Smythe (1990) algorithm to warp a training set of twenty-one colour images before the eigen-faces analysis.
This approach is motivated by the intrinsic nature of the eigen-faces, where all the important features of the faces are supposed to be aligned. In real life this is unlikely to happen and the result is blurred. However, if we apply a warping technique to all the images in our training set and warp them to an arbitrary image within this set (before performing the eigen faces), the images will have exactly the same features at the same spatial coordinates.
This method can be divided into four logical steps:
1) We deform all the images
To do that we choose a training set of twenty-one images and record the geometrical locations of the main features in each face by positioning a set of landmarks on each image. Then we apply the warping technique to align all the marked features at the same geometrical locations.
2) We calculate and code the main variances of our training set by performing a Principal Component Analysis and the eigen face technique.
3) We reconstruct one of our starting images using the eigen faces.
4) We invert the geometrical deformation that the warping introduced before the eigen faces.
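The four steps above can be sketched in outline as follows. This is a minimal sketch, not the project's code: the function names are illustrative, the two warping steps are stubbed out as the identity, and random arrays stand in for real face images; a real implementation would use the landmark-driven mesh warp described in section 2.

```python
import numpy as np

def warp_to_reference(images):
    # Step 1: align all faces (stub: identity warp in this sketch).
    return images

def eigenfaces(images, n_components):
    # Step 2: PCA on the flattened, mean-subtracted images.
    X = np.array([im.ravel() for im in images], dtype=float)
    mean = X.mean(axis=0)
    Phi = X - mean
    L = Phi @ Phi.T                      # small J x J matrix
    vals, vecs = np.linalg.eigh(L)
    order = np.argsort(vals)[::-1][:n_components]
    U = Phi.T @ vecs[:, order]           # back-project to image space
    U /= np.linalg.norm(U, axis=0)       # orthonormal eigen faces as columns
    return mean, U

def reconstruct(image, mean, U):
    # Step 3: project onto the eigen faces and rebuild.
    w = U.T @ (image.ravel() - mean)
    return (mean + U @ w).reshape(image.shape)

def unwarp(image):
    # Step 4: invert the geometric deformation (stub: identity).
    return image

rng = np.random.default_rng(0)
faces = [rng.random((8, 8)) for _ in range(5)]
mean, U = eigenfaces(warp_to_reference(faces), n_components=4)
out = unwarp(reconstruct(faces[0], mean, U))
```

With J = 5 training images, J − 1 = 4 eigen faces are enough to reconstruct any member of the training set exactly, matching the observation in section 3 that only J − 1 eigenvectors are significant.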
2. WARPING
The term image warping means a generic geometric transformation of a source image into a new one. This
transformation can be achieved by redefining the geometric relationships between points of the image.
At the present time we are used to associating this word with high distortions in the source image, but a warp
can be something as simple as a rotation or a translation.
The source image consists of a set of known pixels, and the final image will be the result of applying a mapping function to this starting set. We can write the mapping function in the source (input) coordinate system or in the final (output) coordinates.
Suppose a generic point in the input image has coordinates [u,v] and the corresponding point in the output image has coordinates [x,y]. If we have X, Y, U, V arbitrary mapping functions that specify our spatial transformation then we obtain:
(1) [x,y] = [X(u,v), Y(u,v)]
(2) [u,v] = [U(x,y), V(x,y)]
(1) is expressed as a function of the input coordinates and, as X, Y map the input onto the output, it is referred to as forward mapping.
(2) is expressed as a function of the output coordinates and, as U, V map the output onto the input, it is referred to as inverse mapping.
(Fig 2.1a: forward mapping, input pixels A, B, C, D pushed onto the output. Fig 2.1b: inverse mapping, output pixels A, B, C, D sampled from the input.)
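The practical difference between the two mappings can be illustrated with a toy warp. This is a sketch assuming a plain translation as the mapping (real warps use more elaborate U, V functions): iterating over output pixels and sampling the source through the inverse mapping guarantees that every output pixel receives a value, which is why resampling algorithms usually prefer it over forward mapping, which can leave holes.

```python
import numpy as np

def inverse_map_translate(src, dx=2, dy=1):
    # For every output pixel [x, y], evaluate U(x, y), V(x, y) to find the
    # source pixel [u, v] and copy it. Here U(x,y) = x - dx, V(x,y) = y - dy.
    h, w = src.shape
    out = np.zeros_like(src)
    for y in range(h):
        for x in range(w):
            u, v = x - dx, y - dy          # inverse mapping into the source
            if 0 <= u < w and 0 <= v < h:  # outside the source: leave black
                out[y, x] = src[v, u]
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
warped = inverse_map_translate(img, dx=1, dy=0)
```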
For our purpose we need a warping technique that allows us to perform a deformation of the input image into an output image using a mapping function and two sets of points (landmarks). The landmarks are determined by a human operator who marks the input image and the destination image.
A landmark is a point having not only a Cartesian location in the picture plane but also a homologous (biologically corresponding) point in every other picture of the data set.
It is important to notice that the number of points must be the same in both images and that after the warping all the landmark points of the output image will be at the coordinates indicated by the second set; the mapping function resamples all the pixels of the input image according to the geometric transformation defined by those two sets of points. The two sets could be selected from the same image in order to create a special effect, but they are more likely to come from two different images that might not even contain similar subjects (the first digital warping was performed between a goat image and an ostrich one).
2.1 WARPING TECHNIQUES
We can identify four different families of warping techniques:
1) The Radial basis warping ( N. Arad )
2) The thin plate splines warping ( F. Bookstein )
3) The 2 pass mesh warping ( Douglas - Smythe 1990 )
4) The field warping ( Beier - Neely )
The first three techniques use a set of points while the fourth one uses a set of lines.
The first two present more than one similarity: they use different mapping techniques and different resampling algorithms but rely on similar principles.
Here is a brief description of the algorithms. I will not go into detail, as that would be outside the scope of this report; full references can be found at the end:
1) The radial basis warping uses the idea that, given a univariate function g: R+ → R, we can interpolate the d-dimensional data (x_i, F_i), where x_i ∈ R^d and F_i ∈ R, i = 1...N, by a radial function S(x) = Σ_{i=1}^{N} a_i g(||x − x_i||), where ||·|| is the Euclidean norm on R^d. The interpolation is possible if the system of linear equations Ga = F, with G_ij = g(||x_i − x_j||), a = (a_1, a_2, ..., a_N)^T and F = (F_1, F_2, ..., F_N)^T, has a unique solution.
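As a concrete illustration of this scheme (a sketch: the Gaussian kernel and all names here are example choices, not part of any particular warping system), we can fit the coefficients a by solving Ga = F and then evaluate S(x):

```python
import numpy as np

def g(r, sigma=1.0):
    # Example radial kernel; any suitable univariate function g works.
    return np.exp(-(r / sigma) ** 2)

def fit_rbf(points, values):
    # points: (N, d) landmark coordinates, values: (N,) target scalars.
    # Build G_ij = g(||x_i - x_j||) and solve G a = F.
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    G = g(dists)
    return np.linalg.solve(G, values)    # unique solution if G is non-singular

def evaluate_rbf(x, points, a):
    # S(x) = sum_i a_i * g(||x - x_i||)
    r = np.linalg.norm(points - x, axis=-1)
    return float(a @ g(r))

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
F = np.array([0.0, 1.0, 2.0, 3.0])
a = fit_rbf(pts, F)
```

In a radial basis warp, one such interpolant is fitted per output coordinate, with F holding the landmark displacements; by construction S reproduces F exactly at every landmark.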
2) The thin plate spline warping uses an elegant algebraic approach for the definition of the deformation, using the following function: Z(x,y) = −U(r) = −r² log r², where r = √(x² + y²) is the distance from the Cartesian origin.
3) The 2-pass mesh warping uses a two-pass algorithm, a cubic spline and Fant's resampling algorithm (I will give a more detailed description later on, as this is the algorithm I used).
4) Field warping is the newest technique and is mainly used in computer graphics. It uses a set of lines instead of a set of points; this means that the transformation is achieved by rotating, scaling and translating the pixels congruently with the lines. The resampling can then be calculated using any resampling algorithm. It offers a more expressive warp, as we have total control of the warping zones, but it is slower and definitely not suitable for our task as it tends to create unpredictable ghost artefacts.
3. EIGEN-FACES
We have already said that a face is a complex object to describe, but we can also notice that there are many similarities between different faces; for example the nose is usually in the middle of the face, and it is unlikely that we will find the eyes under the mouth.
We can try to find an average face and a set of weights that represent the differences between each face and the average face. To achieve that we have two different problems: the first is how to find the average face, and the second is how to code the differences between the images in order to reconstruct our original images.
The first question might look trivial, as the natural answer would be just to take the mean of the set. But how can we be sure that this mean is representative of all the faces? As a matter of fact we cannot. We can show (and we will later in this paper) that a mean face of about twenty different individuals can be a good representation of a high number of faces, but it is possible to find a face that we cannot code efficiently using the mean face and a linear combination of its variations. We must be ready to amend the mean face when such a case arises, reshaping our model to include this new face (and all its linear combinations).
To code the variations we need to find the principal components of the distribution of faces or, in other words, we look for the set of features that have the highest variance within the set of all possible features. We can extract these features by finding the eigen vectors of the covariance matrix of the set of face images, where every image of our training set is a vector in a highly dimensional space.
Each eigen vector stores information about the variations of some features through our set of images, and it defines what we call an eigen face. If we compute a linear combination of the first n eigen faces we can reconstruct the original image or, if we wish, create a completely new one just by changing the weights of the linear combination.
Our aim is to use the smallest possible set of eigen faces to obtain the best possible result. Each eigen face has exactly the dimensions of an image, so if we use all the eigen faces we can find (the number of eigen faces must be less than the number of images in the training set) we do not achieve anything: we are merely computing the linear combination of all the components of the image we wish to reconstruct, which means we just decompose it and recompose it without performing any interesting analysis or saving disk space. That is why we are interested in experiments on the reconstruction of faces that use a minimal set of eigen faces.
The basic algorithm for the eigen faces is:
1. Acquire a set of face images (the training set).
2. Calculate the eigen faces from the training set, keeping only the M that correspond to the highest eigenvalues. This set of M faces is our new face space, and it can be modified when new faces are added, allowing the program to learn and recognise new faces.
3. Calculate the corresponding distribution in an M dimensional weight space for each known individual by
projecting their face images onto the face space.
After we have performed these steps we have our set of eigen faces; at this point we can, if we wish, create new faces using all the possible linear combinations of the existing eigen faces. If we want to perform face recognition we need to add the following steps for every image we want to recognise:
1) Calculate a set of weights based on the input image and the M eigenfaces by projecting the input image onto each of the eigen faces.
(Figure: eigen faces and PCA, where we perform our statistical analysis of the face space. Each m*n image, one square per pixel, is flattened into a vector of dimension m*n*1; the vector is vertical but drawn horizontally to save space, so if the image is X the vector shown is X^T. The matrix containing all the faces of the training set has dimensions (m*n, number of images).)
2) Determine if the image is a face by checking if it is close enough to the face space.
3) If it is a face, check the weight patterns and decide if it is known or unknown.
(Fig 1.1d: reconstruction of a face as the product of the eigen face matrix and the weights W.)
4) If it is a face but it is unknown, update the eigenfaces and/or the weight pattern.
The main problem with this approach is the high computational cost required to find the eigen faces, but as they are evaluated just once it is a small fee to pay for a robust algorithm that offers a good model.
The mathematics involved follows straightforwardly from the definition of P.C.A. and the eigen matrix, with a simple algebraic trick. Suppose a face image is of size N*M with N=200 and M=225, and suppose it is a colour image so each pixel is coded with 3 bytes. We can either consider it as a vector of dimension 200*225*3 = 135,000 or, equivalently, a point in a 135,000-dimensional space. If we consider a training set of twenty-one images we get a huge matrix of dimensions 135,000*21 = 2,835,000 (these numbers are not random but are taken from my training set).
Our aim is to find a lower dimensional subspace of this high dimensional matrix that contains the relevant data: we know that faces are not randomly distributed, therefore we can find the vectors that best account for the distribution of face images inside our face space.
We can easily extract the average vector of the matrix simply by adding all the rows together and then dividing by the number of rows.
Suppose we have J images Γ_1, Γ_2, ..., Γ_J. The average will be: Ψ = (1/J) Σ_{n=1}^{J} Γ_n
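The average Ψ can be computed directly (a toy sketch with made-up dimensions; each row of Gamma stands for one flattened face Γ_n):

```python
import numpy as np

J, npix = 21, 12                                      # toy dimensions
Gamma = np.random.default_rng(2).random((J, npix))    # rows: Gamma_1 ... Gamma_J
Psi = Gamma.sum(axis=0) / J                           # Psi = (1/J) * sum Gamma_n
```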
Every face differs from the average vector by a vector Φ_i = Γ_i − Ψ. We obtain a set of J vectors with the same size as the original ones. Now we use principal component analysis and try to find a set of J orthonormal vectors u_n: the k-th vector u_k is chosen so that λ_k = (1/J) Σ_{n=1}^{J} (u_k^T Φ_n)² is a maximum, subject to u_l^T u_k = 1 if l = k and 0 otherwise.
The vectors u_k are the eigenvectors and the scalars λ_k are the eigenvalues of the covariance matrix C = (1/J) Σ_{n=1}^{J} Φ_n Φ_n^T = A A^T, where A is the matrix [Φ_1, Φ_2, ..., Φ_J]. This is obviously an impossible task in a normal image space such as the one we are using for this work. We need a method that is computationally reasonable.
We can find a more feasible solution to our problem by observing that if the number of data points in the image space is less than the dimension of the space (J < M*N) there will be only J−1 rather than M*N significant eigenvectors, the remaining eigenvectors all having eigenvalue 0.
We can obtain the M*N dimensional eigenvectors by first solving a J*J (21*21) eigenproblem rather than an (M*N)*(M*N) (135,000*135,000) one, and then taking linear combinations of the difference images Φ_i. Consider the eigenvectors v_i of A^T A, such that A^T A v_i = µ_i v_i. Premultiplying both sides by A we have A A^T (A v_i) = µ_i (A v_i), from which we easily see that the A v_i are the eigenvectors of C = A A^T. We therefore construct the J*J matrix L = A^T A, where L_mn = Φ_m^T Φ_n, and find the J eigenvectors v_i of L. These vectors determine the linear combinations of the J training set face images that form the eigenfaces: u_i = Σ_{k=1}^{J} v_{ik} Φ_k, with i = 1...J.
With this system the complexity is reduced from the order of (N*M)² (135,000²) to the order of the square of the number of images in the training set (21*21 = 441), where the calculation is still heavy but manageable.
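The trick can be checked numerically on toy dimensions (a sketch: a 50-pixel "image" space and 6 "faces" stand in for 135,000 and 21):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((50, 6))                  # columns play the role of the Phi_i

L = A.T @ A                              # small 6 x 6 matrix instead of 50 x 50
mu, v = np.linalg.eigh(L)                # eigenvalues mu_i, eigenvectors v_i
u = A @ v                                # candidate eigenvectors of C = A A^T

C = A @ A.T
# Verify C (A v_i) = mu_i (A v_i) for every i, as derived above.
ok = all(np.allclose(C @ u[:, i], mu[i] * u[:, i]) for i in range(6))
```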
The experiments with the given data set show that the whole procedure of loading the images, calculating the average face, calculating the differences, performing the PCA, calculating the first nine eigenfaces, normalising the values of the eigenface matrix, calculating the weights of all the faces and reconstructing them to files requires around 120 seconds on a Silicon Graphics O2.
3.1 IMPROVING THE EIGEN FACES WITH WARPING
The eigenface technique relies on the fact that all the faces are perfectly aligned. This is a weak assumption. It is easy to align all the training set on a horizontal line and a vertical one that might pass through an important feature of the face: in figures 3.1 and 3.2 below, for example, we have used the eyes and the middle of the nose.
Figure 3.1 Figure 3.2
This system allows us to position all the images, by scaling and translating, in such a way that they all have the eyes at the same height and the nose at the same width. This is not enough to get the best performance from our PCA analysis. What we really need is to maximise the correlation between all the faces, and this is possible only if we have all the features of the faces at exactly the same geometric coordinates.
If you look at figures 3.1 and 3.2 you can notice that the eyes and the middle of the nose might have the same spatial coordinates but the edges of the nose or the mouth do not. This is quite obvious, as all the faces have a different size and shape. Moreover, even if there is some correlation between the dimensions of the single features inside the face, there is a certain range of variation in dimension and shape.
As we are in a 2-D world we cannot use more than a two-axis constraint without introducing some deformation in the faces. This deformation must be somehow controlled and guided, otherwise we will analyse data that make no sense. We introduce a warping algorithm as a pre-processing technique for the eigen faces. This approach guarantees a perfect geometric correspondence of all the key features of the images.
A similar technique has been exploited with some success in (5).
Warping the training set (where we align all the images): start from the original training set of images; take one image and mark its main features; mark all the other images in correspondence with the main features of this prototype; then warp all the images to the prototype.
Inverse warping (where we remove the geometrical deformation introduced with the warping): the set of all the reconstructed faces is inverse-warped back into the set of all the faces.
4. WARPING THE TRAINING SET
We have chosen the Douglas-Smythe algorithm because it is well known (it has been around since 1990) and it has proved to be quite robust and effective.
The code comes from Wolberg's book, but I had to make some corrections and changes to make it work according to the requirements of this project. Before discussing its performance for our modelling task it is better to give some more details on how it works.
4.1 THE 2 PASS MESH WARPING ALGORITHM
The algorithm requires as input a source image and two matrices of coordinates. The first matrix gives the positions of the landmarks in the source image; the second one specifies their corresponding positions in the output image. Both matrices must have the same m*n dimensions in order to establish a one-to-one correspondence. With this system each landmark can be referenced by integer indices, therefore we can fit any bivariate function in order to produce a continuous mapping from the discrete set of landmarks given in the matrices.
The algorithm assumes grey-level images, but it works fine on R.G.B. images provided we split the channels and run it three times, once for each separate channel.
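Per-channel processing can be sketched as follows (assuming NumPy arrays; `warp_channel` is a placeholder for the real single-channel mesh warp, here just a horizontal flip so the example stays self-contained):

```python
import numpy as np

def warp_channel(channel):
    # Placeholder for the 2-pass mesh warp on one grey-level channel:
    # here simply a horizontal flip.
    return channel[:, ::-1]

def warp_rgb(image):
    # Split the (h, w, 3) image into its R, G, B channels, warp each one
    # with the same mapping, then restack them into a colour image.
    r, g, b = image[..., 0], image[..., 1], image[..., 2]
    return np.stack([warp_channel(r), warp_channel(g), warp_channel(b)], axis=-1)

rgb = np.arange(24, dtype=float).reshape(2, 4, 3)
out = warp_rgb(rgb)
```

Because the same geometric mapping is applied to every channel, the recombined image is warped consistently with no colour fringing.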
Fig 4.1: an example of a mesh
The two matrices that contain the landmarks form two rectangular (or square) meshes, and from now on we will use this notation rather than the matrix one. The two meshes have the single constraint of being topologically equivalent, which means not only being of the same size but also that we do not allow any foldovers or discontinuities.
We allow the coordinates in the source mesh to differ from the coordinates of the destination mesh as much as we want, provided that there is no self-intersection.