Chapter 1: Introduction
quickly retrieve the images that are similar to a given image, or to a user-created image,
on the basis of these features. The user may be looking for an image he or she has seen
before, or for another image of the same scene. Such queries would be enhanced if the
user could access the content of the images: the patterns they contain, the colours,
texture or shape of image objects, and the related layout and position information. To
support content-based techniques, the video data must be processed in a reasonable
time; histogram analysis methods are mainly used because they offer good speed
performance.
Prior to storage within the database, the system must first identify the desired
objects, and then calculate a descriptive representation of these objects in a feature
extraction phase. Scene Segmentation is the method used to obtain these 'object images'
that represent the story of a long video footage. Previously, alphanumeric databases have
ignored this problem, as data for insertion in the database was supplied as simple
entities such as the title of a book, or the address of an employee. Visual information
systems operate on a different type of information: images. Therefore this problem
becomes the extraction of relevant images in order to obtain a group of images that
can represent the ’image index’ or ’image abstract’ of the video stream. The goal of
Scene Segmentation is to identify the scene changes in order to collect one image per
scene and represent the story of the video by this sequence of collected images. It is
assumed that a scene can be represented by a single image. This is generally valid
for news or serial videos; however, videos with long scene shots, e.g. home videos,
require more images per scene to be collected. Our work concentrates on news
and serial videos. The project has been realised to obtain an image extraction method
for a BBC news stream. BBC news is a good source because it provides a well defined
spatial and temporal structure that can be identified with image processing techniques.
To identify a scene change we have to recognise when there is a change in the
content of the image, that is, when a large change in the grey level distribution
occurs. The grey level distribution is represented by the histogram vector
h(A) = {h0, h1, ..., hn-1} in an n-dimensional space, where hj represents the number
of pixels of grey level j in the image A. A distance operator d(A,B) between the
histogram vectors can be defined to represent the 'distance between the images'.
To detect a scene change we have to analyse a sequence of images and detect when
the distance between the current and the previous image is large. This indicates that a
scene change has occurred. This project investigates the choice of the distance
operator and of a thresholding method to establish when the distance has 'a large
value'. Fig.1.1 shows a diagram of the distance between consecutive images; the scene
changes are shown in a different colour.
fig.1.1 Distance diagram for scene segmentation (images A-I; a bracket marks a group of images belonging to the same scene B)
The distance measure has to detect grey level distribution changes due to scene
changes and not due to other effects such as intensity shift within the same scene
shot. A good operator should also limit the effects of the high frequency components
of the grey level distribution. The tolerance of several distance metrics to intensity
shift has been investigated. This report introduces a novel distance metric based on
histogram shape variations which is shown to be tolerant to intensity shift. The new
method also possesses other advantages in terms of noise limitation and speed of the
algorithm.
The Scene Segmentation method is a real time application, therefore all operations
have to be executed as quickly as possible. Note that an ideal scene
segmentation analysis method would analyse all 50 frames per second that form the
television signal. In this way consecutive images within the same scene have a small
distance, because they are very similar, and large distance values clearly mark the
scene changes. Analysing all the frames in the video stream is not practical because
the histogram analysis operation requires a finite amount of time for reading and
processing the data. Generally, this processing time depends largely on the
performance of the available hardware. However, it can be reduced by reducing
the time needed to read the data, i.e. simply by reducing the quantity of data read. A study
has been made to find the minimum amount of data that must be read to obtain acceptable
speed and segmentation results.
Another relevant problem is how to choose a suitable thresholding method. It is not
possible to find a constant threshold T valid for all kinds of videos, because the
mean amplitude of the distance diagram strongly depends on the particular video
stream considered. A threshold that is a good scene detector in one case may
not be so good in another. A method has been developed to find an adaptive
threshold that obtains good results in a variety of different situations.
To study real time scene segmentation it is helpful to build a graphic interface
between the researcher and the application, in order to have a direct interaction. Using
this interface the images collected by the program can be immediately visualised on
the screen, one beside the other, and the researcher can easily and quickly judge the
results and adjust the parameters for the next analysis. In fact, research in this field
is partly empirical, and it is therefore fundamental to build a powerful
and flexible tool that allows the researcher to try different solutions in different
situations. A software implementation of all the algorithms has been developed. It is
an application for the X Window System and it has been realised using the C programming
language, Motif to build the graphic interface and an ITEX100 frame grabber to interface
to the video signal.
CHAPTER 2: IMAGE PROCESSING
Images are extremely important and widely used carriers of information, not only
in everyday life, but also in medicine, remote sensing, industry and in many fields of
scientific research. Digital image processing is concerned with the manipulation of
images using computers. Generally, image processing tasks include the following
operations on the image:
• to capture it from a source: television, video camera, tape, disk ...
• to transform it in order to obtain it in a convenient form
• to elaborate it by changing: brightness, contrast, colour, geometric characteristics ...
• to analyse it to obtain information about objects or features that are shown in it
and their shape, dimension and position
Image processing involves a large amount of data, therefore a convenient way to
perform all these operations is to use a digital computer, due to its capability to:
• process information in a simple and quick way
• store and retrieve information
• visualise the information using screens, printers ...
2.1 Image digitisation
An image to be processed by computer should be represented using an appropriate
discrete data structure, for example, a matrix. An image captured by a sensor is
expressed as a continuous function f(x,y) of two co-ordinates in the plane.
Image digitisation means that the function f(x,y) is sampled into a matrix with M
rows and N columns. Image quantization then assigns an integer value to each
continuous sample: the continuous range of the image function f(x,y) is split into n
intervals. The finer the sampling (i.e. the larger M and N) and the quantization (the
larger n), the better the approximation of the continuous image function f(x,y).[4][2]
2.1.1 Sampling
A continuous image function f(x,y) can be sampled using a discrete grid of
sampling points in the plane. Grids used in practice are mainly square (fig.2.1).
One infinitely small sampling point in the grid corresponds to one picture element
in the digital image called a pixel; the set of pixels together covers the entire image.
The image is sampled at points:

x = j∆x,  j = 1..M
y = k∆y,  k = 1..N
Two neighbouring sampling points are separated by distance ∆ x along the x axis
and ∆ y along the y axis. Distances ∆ x and ∆ y are called sampling intervals , and the
matrix of samples constitutes the discrete image. The ideal sampling s(x,y) in the
regular grid can be represented using a collection of Dirac distributions:
s(x,y) = Σ(j=1..M) Σ(k=1..N) δ(x − j∆x, y − k∆y)
The sampled image A(x,y) is the product of the continuous image f(x,y) and the
sampling function s(x,y):
A(x,y) = f(x,y) Σ(j=1..M) Σ(k=1..N) δ(x − j∆x, y − k∆y)
It is interesting to note that the effect of sampling frequency reduction is
immediately obvious. Figure 2.1a shows a monochromatic image (of 256 grey levels)
with 128x128 pixels; Figure 2.1b shows the same scene digitised into a reduced grid
of 64x64 pixels; Figure 2.1c into 32x32. Deterioration in image quality is clear from
2.1a to 2.1c.
fig.2.1 Different sampling: (a)128x128, (b) 64x64, (c) 32x32
If quality comparable to an ordinary television image is required, sampling into a
512x512 grid is used; this is the reason why most image frame grabbers use this high
resolution.
2.1.2 Quantization
The magnitude of a sampled image A(j∆ x,k∆ y) is expressed as a digital value in
image processing. The transition between continuous values of the image function and
its digital equivalent is called quantization. The number of quantization levels should
be high enough for human perception of fine details in the image.
Most digital image processing devices use quantization into n equal intervals. If b
bits are used to express the values of pixel brightness, then the number of intensity
levels is:

n = 2^b

Eight bits per pixel are commonly used; therefore, in terms of the C programming
language, a pixel is represented by an unsigned char. Figure 2.2 shows the decline of
the quality of the image using different numbers of quantization levels.
fig.2.2 Different quantization: (a) 256gl, (b) 16gl, (c) 2gl
2.2 Overview on image processing techniques
The goal to be achieved by processing images usually falls into one or more of the
following categories[1]:
• digital image coding for the efficient and robust transmission or storage of
image sequences by using data compression or data reduction techniques
• digital image restoration and enhancement to remove or at least reduce the
effects of distortion and noise which may corrupt the image (restoration) or to
amplify specific features of the image (enhancement)
• digital image analysis to extract information from the image in the form of
measurements concerning the imaged phenomena, perhaps leading to a
classification of objects (pattern recognition), a description or even a complete
interpretation of the image scene (image understanding)
In the first two categories every kind of transformation can be described as an
image-to-image transformation. In the third case, the result is no longer an image, but
data extracted from the image by an analysis process, for example: edges, positions,
models of objects. The last category can be described by these main operations:
• image segmentation to partition the image into the various objects and
background regions according to some common feature or property
• edge detection to extract the edges of an object to give it a geometric
representation
• image analysis, which may include various types of measurements on the imaged
phenomena, eventually leading to object classification or scene interpretation
2.3 Basic operations of digital image processing
The term "basic operation" represents all point-to-point operations which can be
performed on a digital image.[2]
An image operator transforms an input image A(x,y) into an output image B(x,y) in
accordance with a law f:

B(x,y) = f [A(x,y)]

Each pixel of the output image B(.,.) is a function of another pixel or of a group of
pixels of the input image A(.,.).
For a point operator, each output value B(i,j) is some function f of the
corresponding input pixel A(i,j) in the input image (or image domain):
f: A(i,j) → B(i,j)
An example of a point operator is the threshold operator:

B(i,j) = 0   if A(i,j) < T
B(i,j) = 1   if A(i,j) ≥ T
For a local or neighbourhood operator the output value B(i,j) is some function of
the pixel values in some neighbourhood around the input pixel A(i,j).
Chapter 2: Image Processing
___________________________________________________________________________________________
12
f: A(i,j), A(i+1,j+2), ...... → B(i,j)
An example of a local operator is the average operator:

B(i,j) = (1 / (M·N)) Σ(i=0..M) Σ(j=0..N) A(i,j)
2.4 Histogram of an image
It is assumed that the image has been digitised and is sampled into M x N pixels,
each of which has been quantized into n intensity grey levels in the range 0, 1, 2, ...,
n-1. The histogram is an n-dimensional vector, where the generic element hj is:

hj = Yj   for j = 0, ..., n-1

being Yj the number of pixels which have quantization level j. In simple terms, an
image histogram is a measure of the tonal value spectrum within an image. Therefore,
the histogram of an image A is generally represented by the n-dimensional vector:

h(A) = { h0, h1, h2, ..., hn-1 }

The histogram is more commonly displayed in graphical form as the bar diagram of
fig.2.3a.[5]
2.4.1 The analogy with calculating probabilities
The histogram of an image shows the frequency of occurrence (or probability) of
each of the n different grey levels. Considering a normalised histogram, the integral
of the histogram curve is equal to 1:

h(A)normalised = h(A)original / S

where S = N x M is the integral area of the original histogram.
fig.2.3 Histogram of an image
fig.2.4 The same histogram normalised
2.4.2 Other kind of histograms
It is also possible to consider a cumulative histogram:
hk = Σ(i=0..k) Yi
its graphical representation, however, is not as significant as the previous one in
terms of describing the grey level distribution.
The concept of a histogram can be generalised to second or higher orders[21]. In a
given image A(x,y) the grey level intensities of every pair of pixels adjacent in one
direction can be used. For instance, the second order histogram h(j,k) shows the joint
probability of the first pixel having intensity j and the second pixel (adjacent in one
direction, for example x) having intensity k.
fig.2.5 The cumulative histogram
2.4.3 Uses of histograms
There are many uses, but the most important are:
• improving digitisation quality (by histogram equalisation)
• choosing a threshold for segmentation techniques
Histogram equalisation
The structure of the discrete histogram makes it possible to judge whether an image
has been correctly digitised from the point of view of matching the dynamic range of
the image to the dynamic range of the quantized levels (0 to n-1).
The histogram shown in fig.2.6(a) shows that the grey levels are concentrated
toward the dark end of the grey level range (level 0). Thus this histogram
corresponds to an image with overall dark characteristics. The histogram in fig.2.6(b)
shows that the grey levels are concentrated toward the brighter end of the range (level
n-1), thus the image has overall bright characteristics.
A histogram with a narrow dynamic range means that the original image has a low
contrast; this could be the result of a poor digitisation or, simply, of an image taken in
a too dark or too bright scene (e.g. the result of a bad camera exposure time).
All these problems can be solved by the enhancement technique of histogram
equalisation; this technique aims to spread the histogram values uniformly over the whole