and a method to find edges in noisy pictures. The chapter concludes with the description of a new
technique for automatic discrimination of text images.
Publications
Part of the work presented in this thesis has been published in international journals or
conference proceedings. More precisely:
• The LAZA algorithm presented in Section 2.3 was presented at the Signal Processing
and Communications Conference SPC2000 [11], Marbella (Spain); a more advanced
version of LAZA was published in the Elsevier Image and Vision Computing
journal [10] in 2002;
• The algorithms proposed in Chapter 2 and the experimental results presented in Chapter 3 will
be presented at the SPIE Electronic Imaging: Sensors, Cameras, and Applications for
Digital Photography conference [9] in January 2003, San Jose (CA, USA);
• The re-indexing algorithm presented in Section 4.1 was presented at the IEEE Spring
Conference on Computer Graphics SCCG2001 [7], Bratislava (Slovak Republic); a
more detailed version has been submitted to IEEE Transactions on Image Processing
[8];
• The edge finding algorithm [20] was presented at the Spring Conference on
Computer Graphics SCCG2000, Bratislava (Slovak Republic);
• The automatic discrimination methods for text images [5] will be presented at the SPIE
Electronic Imaging: Sensors, Cameras, and Applications for Digital Photography
conference in January 2003, San Jose (CA, USA).
Chapter 1: Image Acquisition Devices
1.1 Introduction
Before focusing, in the next chapter, on zooming algorithms, it is useful to review the main
hardware and technological details of today’s image acquisition devices. This review helps to
assess the relevance of the problems that have prompted some parts of the research reported in
this dissertation and shows the usefulness of some of the proposed approaches.
All the algorithms studied in this thesis are applied to digital images. It is hence important
to briefly discuss how this kind of image is created. There are two commonly
available methods for creating a digital image:
• Take a picture using a film emulsion, process it chemically, print it onto photographic
paper and then use a digital scanner to sample the print.
• Use a device that samples the light bounced off the subject to create a
digital image directly (digital camera, mobile phone, …).
The falling price and increasing quality of digital cameras have
increased the popularity of the second method. Some market predictions suggest that digital
cameras will become as popular as film-based cameras by 2005.
The main difference between a digital camera and a film-based camera is that the digital
camera has no film. Instead, it has a sensor that converts light into electrical charges. The image
sensor employed by the largest share of digital cameras is a charge-coupled device (CCD).
Some low-end cameras (like the popular webcams) use complementary metal oxide
semiconductor (CMOS) technology. The differences between these sensors are discussed in
section 1.4.
The output of a digital camera is stored in a removable device (floppy disk, flash memory
card, etc.). As with a film camera, it is possible to replace the storage device when it is full and
continue to store pictures on another one. The difference is that digital pictures do not need to
be developed: they can be downloaded directly to a computer, where they are immediately ready
to be used.
With many cameras, it is possible to review the images stored in memory on an LCD (Liquid
Crystal Display) built into the camera. The same LCD is often used as a viewfinder.
Most of today’s cameras store their images in JPEG format (JPEG, the Joint Photographic
Experts Group standard, is a lossy compression method standardised by ISO); it is usually
possible to select between a “fine detail” mode and a “normal” mode. Higher-end cameras may also support the
TIFF (Tagged Image File Format) format. While JPEG compresses the image, TIFF does not,
so TIFF images take a lot of memory space. The advantage of TIFF storage is that no data is
lost to the compression process (lossless image compression).
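To give a rough idea of the gap, the uncompressed size of an image can be computed directly from its resolution; the figures below are illustrative assumptions (including the 10:1 JPEG ratio), not measurements from any specific camera.

```python
# Rough, illustrative estimate (assumed figures, not measurements from any
# specific camera) of uncompressed versus JPEG storage requirements.
width, height = 1600, 1200        # image resolution in pixels
bytes_per_pixel = 3               # 24-bit color: 8 bits per R, G, B plane

uncompressed_bytes = width * height * bytes_per_pixel   # TIFF-like, no compression
assumed_jpeg_ratio = 10                                  # assumed ~10:1 lossy compression
jpeg_bytes = uncompressed_bytes / assumed_jpeg_ratio

print(f"Uncompressed: {uncompressed_bytes / 2**20:.1f} MB")   # about 5.5 MB
print(f"JPEG (assumed 10:1): {jpeg_bytes / 2**20:.2f} MB")    # about 0.55 MB
```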
1.2 The structure of a digital still camera
A common Digital Still Camera has a lens system through which the image’s light and the
ambient light pass. This light is directed to an eyepiece by a mirror and a prism (see Figure 1-1).
Figure 1-1: A digital still camera structure.
When a picture is being taken, the mirror is pivoted up so as to allow light to strike a
recording medium. Optical signals passing through the lens are transformed into electric signals
after crossing, in order, an optical low-pass filter (LPF), a color filter array (CFA), and a charge-coupled
device (CCD). The sensor (CCD) outputs an analog electric signal, which
passes through the correlated double-sampling (CDS) stage for reducing thermal noise, and its gain
is adjusted by the automatic gain control (AGC). The output of the AGC is γ-compensated and
then converted into digital signals by the analog-to-digital converter (ADC). The luminance
(Y) and chrominance (C) components coming from the digital signal are produced by the
digital camera signal processor (DCP). These signals are employed to generate the JPEG or
TIFF output file. If a TV signal is needed, the digital Y and C signals are
transformed into the corresponding analog signals by the digital-to-analog converter (DAC),
and then mixed to form composite TV signals [65], [66].
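As a purely illustrative recap of the ordering of the stages described above, the chain can be sketched as follows; the operations are drastic simplifications done in software, and the gain and gamma values are arbitrary assumptions, not the behaviour of any real DCP.

```python
import numpy as np

# Toy sketch of the stage ordering CDS -> AGC -> gamma compensation -> ADC.
# Each function is a placeholder for processing that is really done in hardware.
def cds(signal):
    return signal                                    # correlated double sampling (noise reduction omitted)

def agc(signal, gain=1.5):                           # automatic gain control; fixed gain is an assumption
    return signal * gain

def gamma_compensation(signal, gamma=1 / 2.2):       # gamma value is an assumption
    return np.clip(signal, 0.0, 1.0) ** gamma

def adc(signal, bits=10):                            # analog-to-digital conversion to 2**bits levels
    return np.round(signal * (2 ** bits - 1)).astype(np.uint16)

analog_readout = np.random.rand(4, 4) * 0.5          # fake analog CCD output in [0, 1]
digital_samples = adc(gamma_compensation(agc(cds(analog_readout))))
print(digital_samples)
```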
Figure 1-2: Digital still camera pipeline.
The image quality of a single-CCD color camera or camcorder is mainly determined by the
characteristics of the DCP. The main signal processing in the DCP is divided into three parts. The
detection module (DM) performs auto-exposure (AE), auto-focus (AF), auto-white balance
(AWB), CDS, AGC, etc. The CDS and the AGC are performed digitally in some camcorders
and in the analog domain in others. After the DM, the signal is converted into RGB and Y
components in the CPM. Finally, the encoding module (EM) produces the standard Y signal and the
digitally modulated C signal.
Figure 1-3: The DCP structure.
1.3 Characteristic parameters
During the pre-capture phase the sensor is read continuously and the output is analyzed in
order to set three parameters that determine the quality of the final picture [12]:
• Auto-white Balancing (AWB) automatically compensates the dominant “color”
of the scene. The human eye is able to compensate colors automatically through a
characteristic known as Color Constancy, by which the color white is always perceived as
white independently of the spectral characteristic of the light source illuminating the
scene. When a scene is captured in a picture, the illuminating context is lost, color
constancy does not hold anymore, and white balancing is required to compensate colors.
AWB relies on the analysis of the picture in order to match the white with a reference
white point. White balance adjustment attempts to reproduce colors naturally so images
are not affected by surrounding light. To do that, classical techniques either use a simple
global measure of the energy of the scene, analyzing the relative distribution of the various
chromatic channels, or try to adapt the white point to the particular light condition (sunset,
cloudy, …). Auto-white-balancing is sufficient for most conditions, but if there is no nearly
white color in the picture, colors that are not originally white may be rendered as white in the
image and the white balance of the image may not be correct. Also, auto-white-balancing
may not have the expected effect when shooting under white fluorescent or other
fluorescent lights. In such cases, some cameras allow the user to take a quick reference white
balance from a white surface, or to use a preset white balance that selects a color temperature
for the incident light. Alternatively, preset white balancing can be used to reproduce more red
in a picture of a sunset, or to capture a warmer artistic effect under artificial lighting [59].
A minimal sketch of the simple gray-world balancing approach is given after this list.
• Auto Exposure determines the amount of light hitting the sensor; unlike
traditional cameras, the sensor itself is used for light metering. The
exposure, the amount of light that reaches the image sensor, determines how light or
dark the resulting photograph will be. When the shutter opens, light strikes the image
sensor inside the camera. If too much light strikes it, the photograph will be overexposed:
washed out and faded. Too little light produces an underexposed photograph, dark and
lacking in detail, especially in shadow areas. To measure the light reflected from the
scene, a camera uses a built-in light meter. The part of the scene that is measured makes a
great difference. Most cameras read the entire image area but give more emphasis to the
bottom part of the scene, because this reduces the possibility that a bright sky will cause
the picture to be underexposed. They also emphasize the center of the image area, based
on the assumption that the main subject is placed there. This is called a center-weighted
system. Some systems allow the user to select a small area of the scene and meter it
directly using a spot meter. In this mode, only the part of the scene in the center of the
viewfinder is metered [4].
• Auto-Focus techniques are more proprietary and vary from one manufacturer to
another. The Auto-Focus algorithm directly affects picture sharpness. Essentially, it
consists of extracting a measure of the high frequency content of the picture and
changing the focus setting until this measure reaches a maximum [54].
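The "simple global measure of the energy of the scene" mentioned in the white-balance item above can be illustrated with the classical gray-world assumption: the averages of the three chromatic channels are forced to coincide. The following is a minimal sketch of that generic idea, not the AWB algorithm of any particular camera.

```python
import numpy as np

def gray_world_awb(rgb):
    """Scale the R and B channels so that all channel means match the G mean.

    rgb: float array of shape (H, W, 3) with values in [0, 1].
    This is the textbook gray-world assumption, a simplification of real AWB.
    """
    means = rgb.reshape(-1, 3).mean(axis=0)      # per-channel average energy
    gains = means[1] / means                     # take the green channel as reference
    balanced = rgb * gains                       # apply the per-channel gains
    return np.clip(balanced, 0.0, 1.0)

# Toy example: an image with a bluish color cast.
img = np.random.rand(8, 8, 3) * np.array([0.6, 0.7, 0.9])
print(gray_world_awb(img).reshape(-1, 3).mean(axis=0))   # channel means are now roughly equal
```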
Once the picture is taken a number of different techniques such as Defect Correction,
Noise Reduction and Color Correction are applied to compensate/enhance the sensor output
data [12].
• Defect Correction manages pixel defects related to the sensor and/or to the
memory storing the picture. When system-on-a-chip solutions for DSCs are considered,
both sensor and memory can be part of a more complex device. Exploiting the
redundancy of image data, these defects can be corrected in a completely transparent way
for the DSC manufacturer.
• Noise Reduction is performed to limit the visible effects of an electronic error
(or interference) in the final image from a digital camera. The amount of noise depends on
how prone the sensor (CCD/CMOS) is to these errors and on how well the digital signal
processing systems inside the camera can cope with them or remove them.
• Color Correction simply adjusts the RGB components of a color separation by
mathematical operations and creates a new RGB output based on the relative values of
the input components. It is also called color matrixing or color mixing; a minimal
sketch of this operation is given below.
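Color matrixing amounts to multiplying each RGB triplet by a 3x3 matrix. The coefficients used in this sketch are arbitrary illustrative values; a real matrix is calibrated for the specific sensor.

```python
import numpy as np

# Illustrative 3x3 color correction matrix: the values are arbitrary, chosen
# only to show the structure of the operation.
CCM = np.array([[ 1.6, -0.4, -0.2],
                [-0.3,  1.5, -0.2],
                [-0.1, -0.5,  1.6]])

def color_correct(rgb, matrix=CCM):
    """Apply color correction (matrixing) to an (H, W, 3) RGB image."""
    corrected = rgb @ matrix.T            # each output channel is a mix of the input channels
    return np.clip(corrected, 0.0, 1.0)

img = np.random.rand(4, 4, 3)
print(color_correct(img).shape)           # (4, 4, 3)
```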
1.4 Difference between CCD and CMOS
Both CCD and CMOS image sensors have to convert light into electrons at the photosites.
A simplified way to think about the sensor used in a digital camera is to think of it as having a
2-D array of thousands or millions of tiny solar cells, each of which transforms the light from
one small portion of the image into electrons. Both CCD and CMOS devices perform this task
using a variety of technologies [67].
The next step is to read the value (accumulated charge) of each cell in the image. In a
CCD device, the charge is actually transported across the chip and read at one corner of the
array. An analog-to-digital converter turns each pixel’s value into a digital value. In most
CMOS devices, there are several transistors at each pixel, which amplify and move the charge
using more traditional wires. The CMOS approach is more flexible than CCD because each
pixel can be read individually. CCDs use a special manufacturing process to create the ability to
transport charges across the chip without distortion. This process leads to very high-quality
sensors in terms of fidelity and light sensitivity. CMOS chips, on the other hand, use a
standard manufacturing process to create the chip. Because of the manufacturing
differences, there are several noticeable differences between CCD and CMOS sensors:
• CCD sensors create high-quality, low-noise images. CMOS sensors, traditionally, are
more susceptible to noise;
• Because each pixel on a CMOS sensor has several transistors located next to it, the
light sensitivity of a CMOS chip is lower. Many of the photons hitting the chip hit the
transistors instead of the photodiode;
• CMOS sensors traditionally consume very little power. Implementing a sensor in
CMOS yields a low-power sensor. CCDs, on the other hand, use a special process that
consumes lots of power. CCDs consume as much as 100 times more power than an
equivalent CMOS sensor;
• CMOS chips can be built on just about any standard silicon production line, so they
tend to be extremely inexpensive compared to CCD sensors;
• CCD sensors have been mass-produced for a longer period of time, so they are more
mature. They tend to have higher quality pixels, and more of them.
Based on these differences, it can be seen that CCDs tend to be used in cameras that aim to
produce high-quality images with lots of pixels and excellent light sensitivity. CMOS
sensors usually have lower quality, lower resolution and lower sensitivity. However, CMOS
cameras are much less expensive and have longer battery life. Over time, CMOS sensors are
expected to improve to the point where they reach near parity with CCD devices in most
applications, although they will probably not reach the same quality for several years [12].
1.5 Resolution
Resolution is perhaps a confusing term in describing the characteristics of a visual image
since it has a large number of competing terms and definitions. Researchers in optics define
resolution in terms of the modulation transfer function (MTF), computed as the modulus or
magnitude of the optical transfer function (OTF). The MTF is used not only to give a resolution
limit at a single point, but also to characterize the response of the optical system to an arbitrary
input. On the other hand, researchers in digital image processing and computer vision use the
term resolution in three other ways [24]:
• Spatial resolution refers to the spacing of pixels in an image and is measured in pixels
per inch (ppi). The higher the spatial resolution, the greater the number of pixels in
the image and, correspondingly, the smaller the size of individual pixels will be. This
allows for more detailed and subtle color transitions in an image.
• Brightness resolution refers to the number of brightness levels that can be recorded at
any given pixel; a more appropriate term for this process is quantization of the light
energy collected in a photo-receptor element. The brightness resolution for monochrome
images is usually 256 levels, i.e., each pixel value is represented by 8 bits. For full color
images, at least 24 bits are used per pixel, i.e., 8 bits per color plane (red, green, blue).
• Temporal resolution refers to the number of frames captured per second and is also
commonly known as the frame rate. It is related to the amount of perceptible motion
between the frames. A higher frame rate results in less smearing due to movements in
the scene. The lower limit on the temporal resolution is directly proportional to the
expected motion between two subsequent frames. The typical frame rate suitable for a
pleasing view is about 25 frames per second or above.
In this thesis the term resolution always refers to spatial resolution. In reality, the
number of pixels and the maximum available resolution are different. For example, a camera
may claim to be a 2.1 megapixel camera while producing images with a resolution of
1600x1200 (i.e. 1,920,000 pixels). This is not an error; there is a real discrepancy between
these two numbers. If a camera nominally has 2.1 megapixels, this means that there are
approximately 2,100,000 photosites on the CCD. What happens is that some of the photosites
are not used for imaging, because the CCD is an analog device: it is necessary to provide some
circuitry to the photosites so that the ADC can measure the amount of charge. This circuitry is
dyed black so that it does not absorb any light and distort the image [67].
1.6 Color filter array
Due to cost and packaging considerations, in most DSCs a single electronic
sensor for each pixel is used to capture a color image, instead of three sensors capturing the three
primary colors. This is usually achieved by covering the surface of the CCD with a filter mosaic
called color filter array (CFA). Each filter in the CFA covers a single pixel in the sensor plane
and passes only a specific spectral band, in order to capture a specific color component at that
pixel location. A typical, widely used CFA pattern, proposed by Bryce Bayer, is known as the
Bayer pattern [16]. Row 1 starts with G and alternates with R; row 2 starts with B and
alternates with G; the subsequent rows repeat this arrangement. It is possible to notice
that the number of G elements is equal to the sum of the number of R and B elements: half
of all pixels are green, versus a quarter of blue and a quarter of red pixels. This particular
arrangement relies on the higher sensitivity of our eyes to the green color.
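A minimal sketch that builds such a mosaic is given below; the phase chosen here, with G in the top-left corner, is just one common convention, and real sensors may start the pattern differently.

```python
import numpy as np

def bayer_mask(height, width):
    """Return an array of 'R', 'G', 'B' labels laid out as a Bayer pattern.

    Even rows alternate G, R, ...; odd rows alternate B, G, ... (one common
    convention; real sensors may use a different phase of the pattern).
    """
    mask = np.empty((height, width), dtype='<U1')
    mask[0::2, 0::2] = 'G'    # even rows, even columns
    mask[0::2, 1::2] = 'R'    # even rows, odd columns
    mask[1::2, 0::2] = 'B'    # odd rows, even columns
    mask[1::2, 1::2] = 'G'    # odd rows, odd columns
    return mask

print(bayer_mask(4, 4))
# Half of the sites are G, a quarter R and a quarter B, as noted above.
```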
Figure 1-4: Bayer Pattern.
Figure 1-5: Stripes Pattern.
Another example of a CFA pattern is known as a “stripes” pattern [1]. Column 1 is all G;
column 2 is all G; column 2 is all R; and column 3 is all B. The columns then always repeat G, R, and B.
Some cameras allow exporting data in RAW format. In this case, the data is formatted in
proprietary ways and describes the picture in the Bayer checkerboard pattern
mentioned above. Such a feature can be used by a professional photographer who works on the
original input data in order to apply his own enhancement techniques.
If the DSC pipeline is not interrupted to obtain a RAW image, the two missing colors at each
pixel location are recovered in the CPM, as mentioned above. Usually, they are estimated
using the color information of the neighboring pixels. The methodology used to recover these
missing colors at every pixel location from the sub-sampled image is popularly known as “color
interpolation”. A good color interpolation algorithm improves the quality of the final
image without a high associated computational complexity.
In the following paragraphs two classical color interpolation algorithms are reported:
replication and bilinear.
Figure 1-6: A simplified Digital Still Camera pipeline with reference to CFA.
1.6.1 Nearest neighbor interpolation - replication
Each interpolated output pixel is assigned the value of the nearest pixel in the input
image. The nearest neighbor can be one of the upper, lower, left or right pixels. An example is
illustrated below.
Figure 1-7: Replication.
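A minimal sketch of replication demosaicing follows, assuming the G R / B G layout used above. Which neighbor is copied is an implementation choice; this variant simply reuses the samples of each 2x2 block (ignoring its second green sample).

```python
import numpy as np

def replication_demosaic(cfa):
    """Nearest-neighbor (replication) demosaicing of a Bayer image.

    cfa: (H, W) array sampled with the pattern  G R   (H and W assumed even).
                                                B G
    Each 2x2 block reuses its own R, B and top-left G samples for all four
    pixels; this is one simple replication scheme among several possible variants.
    """
    h, w = cfa.shape
    rgb = np.zeros((h, w, 3), dtype=float)
    for y in range(0, h, 2):
        for x in range(0, w, 2):
            g, r = cfa[y, x], cfa[y, x + 1]
            b = cfa[y + 1, x]
            rgb[y:y + 2, x:x + 2, 0] = r   # red copied to the whole block
            rgb[y:y + 2, x:x + 2, 1] = g   # green copied to the whole block
            rgb[y:y + 2, x:x + 2, 2] = b   # blue copied to the whole block
    return rgb

cfa = np.arange(16, dtype=float).reshape(4, 4)   # toy Bayer-sampled data
print(replication_demosaic(cfa)[0, 0])            # -> [1. 0. 4.] (R, G, B of the first block)
```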
1.6.2 Bilinear interpolation
Interpolation of a green pixel at a red/blue position: the average of the upper, lower, left and
right pixel values is assigned as the G value of the interpolated pixel. For example:
G8=(G3+G7+G9+G13)/4.
Figure 1-8: Bayer Pattern.
Interpolation of red/blue pixels at a green position: the average of two adjacent pixel
values in corresponding color is assigned to the interpolated pixel. For example:
B7=(B6+B8)/2; R7=(R2+R12)/2.
Interpolation of a red/blue pixel at a blue/red position: the average of four adjacent
diagonal pixel values is assigned to the interpolated pixel. For example:
R8=(R2+R4+R12+R14)/4; B12=(B6+B8+B16+B18)/4.
More generally, at a green position the green value is present and the red and blue values must be
calculated; at a red position the green and blue values must be computed; a blue position needs the
green and red values. The corresponding interpolation masks are shown in Figure 1-9.
Figure 1-9: Bilinear color interpolation (red, green and blue positions).
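The rules above can be condensed into a small sketch, again assuming the G R / B G layout; border pixels are handled by edge replication, which is an implementation choice rather than part of the method.

```python
import numpy as np

def _interp(plane, mask, kernel):
    """Normalized convolution: average of the available neighbors under `kernel`."""
    pad = kernel.shape[0] // 2
    p = np.pad(plane, pad, mode='edge')
    m = np.pad(mask.astype(float), pad, mode='edge')
    h, w = plane.shape
    num = np.zeros((h, w))
    den = np.zeros((h, w))
    for dy in range(kernel.shape[0]):
        for dx in range(kernel.shape[1]):
            if kernel[dy, dx]:
                num += kernel[dy, dx] * p[dy:dy + h, dx:dx + w] * m[dy:dy + h, dx:dx + w]
                den += kernel[dy, dx] * m[dy:dy + h, dx:dx + w]
    return np.where(mask, plane, num / np.maximum(den, 1e-12))

def bilinear_demosaic(cfa):
    """Bilinear demosaicing of a Bayer image with the G R / B G layout."""
    h, w = cfa.shape
    y, x = np.mgrid[0:h, 0:w]
    g_mask = (y % 2) == (x % 2)                 # G on (even, even) and (odd, odd) sites
    r_mask = ((y % 2) == 0) & ((x % 2) == 1)    # R on even rows, odd columns
    b_mask = ((y % 2) == 1) & ((x % 2) == 0)    # B on odd rows, even columns

    cross = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])     # up/down/left/right neighbors
    full  = np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]])     # adjacent and diagonal neighbors

    g = _interp(cfa * g_mask, g_mask, cross)    # e.g. G8 = (G3+G7+G9+G13)/4
    r = _interp(cfa * r_mask, r_mask, full)     # 2 or 4 same-color neighbors, depending on position
    b = _interp(cfa * b_mask, b_mask, full)
    return np.dstack([r, g, b])

cfa = np.random.rand(6, 6)
print(bilinear_demosaic(cfa).shape)             # (6, 6, 3)
```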
1.6.3 Some considerations
Replication and bilinear methods are the simplest color interpolation techniques. Despite
the simplicity of the idea, their performance is not the best. Replication gives images with
“stairs”; bilinear returns pictures with strong smoothing effects. Commercial devices use
more sophisticated techniques: these color interpolation algorithms are edge sensing and
perform color correction and error reduction ([1], [23], [26], [52]).
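As an illustration of the edge-sensing idea (a generic textbook formulation, not the specific algorithm of the cited works), the missing green value at a red or blue site can be interpolated along the direction with the smaller gradient instead of blindly averaging all four neighbors.

```python
import numpy as np

def edge_sensing_green(cfa, y, x):
    """Estimate the missing G value at a non-green site (y, x) of a Bayer image.

    Generic edge-sensing rule: compare the horizontal and vertical gradients of
    the green neighbors and interpolate along the direction that crosses fewer
    edges. This is only a textbook illustration of the principle.
    """
    left, right = cfa[y, x - 1], cfa[y, x + 1]   # horizontal green neighbors
    up, down = cfa[y - 1, x], cfa[y + 1, x]      # vertical green neighbors
    dh, dv = abs(left - right), abs(up - down)   # local gradient estimates
    if dh < dv:
        return (left + right) / 2                # interpolate across the weaker horizontal variation
    elif dv < dh:
        return (up + down) / 2                   # interpolate across the weaker vertical variation
    return (left + right + up + down) / 4        # flat area: fall back to bilinear averaging

cfa = np.random.rand(5, 5)
print(edge_sensing_green(cfa, 2, 1))             # G estimate at a red/blue site
```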
Figure 1-10 reports the results obtained using different methods. The best visual
results are obtained with the more sophisticated algorithms, which preserve the high frequencies
related to the edges and reduce the interpolation error.
Figure 1-10: (a) ideal image; (b) Bayer pattern; (c) replication; (d) bilinear; (e) edge sensing
interpolation; (f) interpolation with color correction.
1.7 Conclusions
In this chapter the main features of today’s digital acquisition devices have been
reviewed. The acquisition format for digital images and color interpolation have been discussed
in great detail because these elements are relevant for the results proposed in the next chapters.