Thesis defended on 2nd March 2015
in front of a Board of Examiners composed of:
Prof. Giovanni Busatto - Università di Cassino
Prof. Gian Carlo Cardarilli - Università di Roma Tor Vergata
Prof. Luigi Zeni - Seconda Università degli Studi di Napoli
Design techniques for secure cryptographic circuits in deep submicron technologies
Ph.D. thesis. Sapienza – University of Rome
ISBN: 000000000-0
© 2015 Simone Bongiovanni. All rights reserved
Chapter 1
Physical security in submicron
technologies
1.1 Introduction
Following the considerations made in the introduction to this thesis, the most
important question in physically observable cryptography is how the security level
of a cryptographic device maps onto modern technologies. The trend towards reducing
the size and power consumption of circuits leads to the need to re-evaluate, and
possibly update, the level of security of the entire system. Indeed, the physical
leakage of an implementation designed some years ago can be dramatically different
from the leakage of the same implementation designed in a more modern technology.
The reason is that devices are optimized from the perspective of improving area
and power performance, but these improvements do not take security into account.
In this chapter we recall some introductory concepts about the digital design flow
and cryptography, with the purpose of highlighting the main issues of hardware
security for practical applications. From this perspective, we first provide a brief
introduction to digital VLSI design; more specifically, we focus on ASIC design,
which is probably the most useful design methodology to describe in order to
understand and address the most relevant issues in hardware security. After
describing the standard design flow for digital circuits, we introduce the topic
of physical attacks against cryptographic circuits, with particular emphasis on Side
Channel Attacks (SCAs) [61]. Among SCAs, we focus on Power Analysis Attacks
(PAAs) [62], which are the most common and popular examples of SCAs and have
been chosen as the guiding theme of this thesis. We then recall some well-known
symmetric-key encryption schemes as case studies, together with the hardware
design strategies used to build secure digital implementations. In this context, we
provide a brief discussion of the most common countermeasures against PAAs,
with a particular focus on circuit-level countermeasures, which are based on the
adoption of a specific circuit architecture at the cell level. For this purpose, we show
that in order to obtain a secure cryptographic circuit, the standard digital design
flow needs to be rearranged by introducing some additional steps that enhance the
level of hardware security of the chip, leading to the so-called secure digital
design flow.
1.2 Foundations on ASIC design
1.2.1 Digital design flow strategies for VLSI circuits
The exponential growth of the scale of integration, exemplified by Moore's
law [84] (i.e., the number of transistors per unit area, or more generally the
performance of the devices, doubling roughly every 18 months), has led to a radical
innovation in the design flow of digital systems. While in the early 1970s a project
was carried out by hand, drawing the layout on large sheets of paper in accordance
with the geometrical representation of the circuit on silicon, today this is no longer
feasible due to the huge number of active devices in integrated circuits [101],
a scale of integration denoted by the acronym VLSI (Very Large Scale Integration).
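As a toy illustration of this doubling rule (a sketch of ours, not part of the thesis; the function name and starting figures are arbitrary), the projected transistor count can be computed as:

```python
def transistors(n0: float, years: float, doubling_months: float = 18.0) -> float:
    """Project an initial transistor count n0 forward in time, assuming
    a doubling every `doubling_months` months (Moore's law)."""
    return n0 * 2.0 ** (years * 12.0 / doubling_months)

# Starting from 1 million transistors, 6 years = 4 doubling periods:
print(transistors(1e6, 6.0))  # 16000000.0
```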
VLSI design combines two types of skills, architectural and circuit-level:
1. The architectural design skills correspond to the design of block diagrams
modeled at various abstraction levels; these can comprise parts of software to
be executed by dedicated microprocessors, or blocks described in a hardware
description language (HDL). In this case, only the logical functionality is
considered.
2. The circuit design skills consist in the connection of many transistors or logic
cells, which are implemented at layout level through CAD tools. In this case,
the electrical behavior is taken into account.
In general, there are two typical approaches for the realization of digital circuits:
full-custom and semi-custom.
In the full-custom (or simply custom) design style the chip is implemented from
scratch, so that the individual blocks composing the system are designed down to
the transistor level. More specifically, in this design strategy no third-party modules
are used, and all the entities are modeled and designed for that specific application.
Specific software tools guide the designer along the custom project, allowing the
layout of the chip to be semi-automated and the design time to be reduced. Custom
design has the advantage of optimizing the design in terms of power and performance,
at the expense of time and money. For this reason this approach is adopted mostly
for the design of a few critical functional blocks (e.g. floating-point units and
memory cells) or, in general, for large production volumes.
The semi-custom design style instead makes use of logic circuits that have already
been implemented, usually by third parties. They can be divided into two categories:
cell-based (pre-fabricated) units and array-based (pre-designed) elements. FPGAs,
Sea of Gates, PLDs, and CPLDs are all examples of array-based elements. Among
the cell-based units, three different design strategies can be distinguished:
standard-cell based, macro-cell based, and compiled-cell based.
In standard-cell based design, each cell corresponds to an elementary component,
such as a logic gate or a flip-flop, or to slightly more complex blocks such as
multiplexers or small arithmetic circuits (e.g. a full adder).

Figure 1.1. A schematic representation of the different possible methodologies to design
digital VLSI circuits.

For every technological process there is a cell library composed of a number of
logic gates characterized up
to the layout level. In macro-cell based design, each macro-cell corresponds to a
complex block such as an Arithmetic Logic Unit (ALU) or a register-file, but also
small CPUs or memory banks. Finally, compiled-cell based design is characterized
by the use of a particular class of standard cells and macro cells, which are
automatically generated by suitable design tools in accordance with the
technological process that the designer wants to use, but are not described at
layout level. The design strategies discussed above are represented in the diagram
of Fig. 1.1.
1.2.2 ASIC vs FPGA
Cryptographic primitives, like any other digital circuit, can be implemented in both
software and hardware, according to the target application. In this paragraph a
brief overview of the possible cryptographic implementations is given [63].
Software implementations are designed and coded in programming languages,
such as C, C++, Java, and assembly language, to be executed, among others,
on general-purpose microprocessors, digital signal processors, and smart-cards.
Implementing a cryptographic algorithm in software was typical in the first era of
cryptographic smart-cards, given that the flexibility of microcontrollers is well
suited to portable applications.
Technology scaling, as well as new issues in the field of hardware security, has
facilitated a new hardware-oriented approach, with the development of new optimized
algorithms for ultra-constrained devices (e.g. RFID tags). Hardware implementations
are in general designed and coded in hardware description languages, such as VHDL
and Verilog HDL, and are intended to be realized as ASICs or on FPGAs.
ASICs are designed all the way from the behavioral description down to the
physical layout and then sent for fabrication in a semiconductor foundry. FPGAs,
in contrast, can be bought off the shelf and reconfigured by the designers themselves.
An FPGA consists of thousands of universal reconfigurable logic blocks, connected
using reconfigurable interconnects and switches. Additionally, modern FPGAs contain
embedded higher-level components, such as memory blocks, multipliers,
multiply–accumulate units, and even microprocessor cores, while reconfigurable
input/output blocks provide a flexible interface with the outside world.
Reconfiguration, which typically takes only a fraction of a second, can change the
function of each building block and of the interconnects among them, so that the
same device can implement a completely different digital circuit.
As anticipated in the introduction, in this thesis work we focus on the design of
cryptographic ASICs. There are basically two strategies to design an ASIC: a
full-custom design flow or a semi-custom design flow. According to the discussion
in the previous paragraph, in the first case the circuit is implemented by designing
each functional unit down to the transistor level, whereas in the second case the
standard cells available in a specific technology library are used.
In the following section we describe in more detail the design of an ASIC
following a semi-custom design approach.
1.2.3 Description of the semi-custom design flow for ASIC
An ASIC is a digital system conceived to perform a specific task. In general,
it differs from a SoC (System on Chip) in that, while in an ASIC the designer tends
to design the largest part of the system "from scratch", in a SoC he/she tries to
reuse macro-blocks that are available from foundry handbooks. Furthermore, a SoC
can be used to carry out multiple functions, whereas an ASIC is used to execute a
single activity (such as numerical signal processing, fingerprint detection, etc.).
Today, thanks to technological improvements, the boundaries between these two
device categories have largely vanished: many modern SoCs integrate functional
blocks conceived for specific task execution, and at the same time most ASICs
contain CPUs and other general-purpose parts, so a modern ASIC is also a SoC and
vice versa.
The design flow of a digital ASIC consists of the following key points [101]:
1. Architectural and electrical specifications: represented by an algorithm
describing what the system must do. At this level ("system level"), the aim is
to identify the functional blocks which will make up the device and what they
must do, but there is no information on how they will do it. At this level,
information relating to delays, power consumption, cost, and clock frequency
is also specified.
2. RTL coding in HDL: the RTL (Register Transfer Level), also called the
architecture level, lies one step below the system level. After defining the
system architecture, the designer specifies in more detail how each module must
carry out its own operations. The RTL specification defines, for each block, the
processing performed on data and the data transfers between memory elements,
with clock-cycle accuracy. Today, RTL specifications are written using a
Hardware Description Language (HDL).
3. Architecture dynamic simulation: allows the designer to evaluate the
functionality of the specifications described in HDL. In particular, an
appropriately designed testbench provides stimuli to the block to be simulated
and shows whether the data obtained are consistent with those expected.
Dynamic simulation is performed with specific software, such as NCSim or ModelSim.
4. Design constraints and synthesis with standard cells: the design constraints,
such as clock frequency, maximum fan-out of the gates, delays, maximum area,
etc., are defined. Then each sub-block composing the system is synthesized
using a specific tool such as Design Compiler; the constraints are used to guide
the tool during synthesis.
5. Static Timing Analysis (STA) on each block: this is the heart of digital
integrated circuit design. Timing analysis makes it possible to identify
critical paths and improve them, and to estimate the maximum clock frequency
achievable by the chip, thus identifying the "bottlenecks" of the architecture
in order to eliminate them. Specific programs exist for this step as well.
6. Pre-layout static timing analysis: the timing analysis is performed on the
whole project.
7. Initial floorplan through guided cell placement: STA generates a report
file with all the constraints necessary to meet the specifications; this
file helps the layout tool to place and route the cells.
8. Insertion of the clock tree.
9. Extraction of delays from the layout after the global routing, and consequent
STA.
10. Detailed routing of the cells.
11. Extraction of the actual delays using specific software, such as NCSim and
PrimeTime.
12. Post-layout STA.
All these steps must be performed iteratively: if, for example, the architectural
simulation (step 3) is not satisfactory, the designer must return to the RTL coding
(step 2) and reiterate the flow. In general, chip design is an iterative process,
and a single step may have to be repeated several times until all constraints are
satisfied and everything works properly. The block diagram of the digital flow is
shown in Fig. 1.2.
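The iterative character of the flow can be sketched in Python (a deliberately simplified model of ours: the step names and the restart-from-the-beginning policy are assumptions, since in practice the designer returns only to the step at fault):

```python
def run_flow(steps, max_iterations=10):
    """steps: list of (name, check) pairs, in flow order; check() returns
    True when that step meets its constraints. On any failure, the whole
    flow is re-run, until everything passes or we give up."""
    for _ in range(max_iterations):
        if all(check() for _name, check in steps):
            return True   # all constraints satisfied: ready for sign-off
    return False          # constraints could not be met

# A step that fails on the first pass (e.g. a timing violation found by
# STA) and succeeds after one rework iteration:
state = {"runs": 0}
def sta_check():
    state["runs"] += 1
    return state["runs"] >= 2

print(run_flow([("RTL simulation", lambda: True),
                ("post-layout STA", sta_check)]))  # True
```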
1.2.4 Main issues in the design of submicron integrated circuits
The use of nanometer CMOS technologies has encountered several problems since
active devices began to be used to create complex digital systems such as ASICs and
SoCs in processes below 90 nm. At the time of writing, for the most advanced
devices the scaling of transistors has gone beyond the threshold of 20 nm. The
reduction of the size of active devices, combined with the increase in the number
of metal levels used for the interconnections, has led to a higher density in the
active area of integrated circuits and to a decrease in the switching delays of the
logic elements. For this reason, it is now possible to integrate multiple functions
on the same chip and to achieve higher frequencies.
Figure 1.2. Block diagram of a standard digital design flow [119] (inputs: HDL,
constraints (SDC), technology, symbol and DesignWare IP libraries; stages: HDL
Compiler and Design Compiler with timing, area, power, datapath and test
optimization, place & route, timing & power analysis, formal verification, and
back-annotation via SDF/PDEF toward timing closure).

However, designing digital circuits in deep submicron technologies has revealed
some significant drawbacks which must be adequately identified and discussed. The
main disadvantage of scaling is that the supply voltage of the digital system must
unavoidably decrease for physical reasons, and this has a negative impact on the
switching delays, as it takes longer to charge and discharge the load capacitances.
This increase in delay, however, is partially mitigated by the decrease of the
threshold voltages. Another problem related to the high scale of integration is the
increase of process variability, which can impact the functionality of the test-chip
and must be properly addressed. In the following paragraphs we briefly describe
crosstalk, loss of synchronization, leakage currents, and voltage drop, which are
the most common issues in deep submicron design and are described in every book on
digital VLSI design [101].
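The interplay between supply and threshold voltage mentioned above can be illustrated with the classic alpha-power-law delay model (a first-order sketch; the constant k and the exponent alpha are illustrative values, not taken from the thesis):

```python
def gate_delay(c_load: float, vdd: float, vth: float,
               k: float = 1.0, alpha: float = 1.3) -> float:
    """Alpha-power-law model: delay ~ k * C * Vdd / (Vdd - Vth)^alpha.
    Lowering Vdd increases the delay; lowering Vth mitigates it."""
    return k * c_load * vdd / (vdd - vth) ** alpha

slow = gate_delay(1.0, 1.0, 0.3)       # scaled supply, unchanged threshold
fast = gate_delay(1.0, 1.2, 0.3)       # original supply
recovered = gate_delay(1.0, 1.0, 0.2)  # scaled supply, lowered threshold
print(slow > fast and recovered < slow)  # True
```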
Crosstalk on the local interconnections
Crosstalk is the interference on the signal propagating on a wire due to the
coupling with an adjacent wire. It is a critical issue in the design of nanoscaled
circuits, because with the technology scaling the metal wires are designed with
increasingly reduced distances, and therefore this issue must be adequately described
and assessed. Crosstalk has a strong impact on the local interconnection wires of
the standard cells, where the most important problem is provided by the capacitive
coupling between the wires. Local interconnect wires are typically short and numer-
ous per area unit, leading to a great density; therefore, the amount of cross-coupled
capacitance on each wire can be noticeable.
The physical reason for crosstalk is the presence of the dielectric (SiO₂) which
separates the different metal layers and causes parasitic capacitances to appear;
when a signal propagates along a wire, the cross-coupled capacitance between that
wire and a nearby wire is charged, and this may create a charge flow in the nearby
wire which represents noise for the signal propagating there. This phenomenon can
be critical in particular for dynamic circuits, where wires are left floating
during a certain period of time. The "victim" wire may even assume another logic
value, according to the amount of charge injected by the "aggressor" wire, leading
to functional errors in the circuit.
The problem of crosstalk must be addressed early in the design phase of dynamic
circuits; for this purpose, design tools are provided with specific analysis
algorithms. A possible technological solution to reduce the effect of crosstalk is
to use new dielectric materials with a reduced dielectric constant, called low-k
materials for this reason. Moreover, a good design strategy is to reduce the number
of floating dynamic wires in high-density regions of the chip, and possibly to
reduce the cross-coupling by separating them by a sufficient distance, even if this
unavoidably increases the area; the width of the metal wires also affects the
amount of cross-coupled capacitance.
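For a floating victim wire, the induced disturbance can be estimated with a simple capacitive-divider (charge-sharing) model — a back-of-the-envelope sketch; the capacitance values in the example are arbitrary:

```python
def crosstalk_bump(vdd: float, c_coupling: float, c_ground: float) -> float:
    """Voltage bump induced on a floating victim wire when the aggressor
    swings by vdd: charge sharing gives dV = vdd * Cc / (Cc + Cgnd)."""
    return vdd * c_coupling / (c_coupling + c_ground)

# 20 fF of coupling against 80 fF to ground, with a 1.0 V aggressor swing:
print(crosstalk_bump(1.0, 20e-15, 80e-15))  # ~0.2 V bump on the victim
```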
Delay and loss of synchronization on the global interconnections
Another problem related to technology scaling is the increased probability of a
loss of synchronization among signals, which is further emphasized by the
fabrication defects of the devices. In general, the problem of synchronization is
critical on the global routing interconnect wires, which are long and are
characterized by a large amount of overall parasitic capacitance. Crosstalk, as
well as parasitic capacitances to other metal layers, is a cause of this problem.
Furthermore, differential dynamic circuits are more critical, because the relative
delay between two differential signals may impact the functionality of the circuit,
but can also reduce the level of security of a circuit, as will become clear in the
next section.
In order to avoid this problem, logic gates must be designed so as to balance the
amount of input capacitance presented to the driving gate, and to prevent a signal
from being loaded with a capacitance significantly different from that of its dual.
Furthermore, long interconnections must in general be avoided; from this
perspective, a compact design allows a better balance of the propagation times of
the signals to be obtained. Finally, process variations must be adequately studied
and analyzed during the design steps through statistical simulations (e.g. Monte
Carlo analysis, statistical delay modeling, etc.).
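A Monte Carlo check of the skew between the two wires of a differential pair can be sketched as follows (an illustrative model of ours: Gaussian variation on each delay, with arbitrary nominal and sigma values):

```python
import random

def worst_differential_skew(n_trials: int, nominal: float, sigma: float,
                            seed: int = 1) -> float:
    """Estimate the worst-case |delay(true) - delay(complement)| over
    n_trials process corners, with Gaussian variation on each wire."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_trials):
        d_true = rng.gauss(nominal, sigma)
        d_comp = rng.gauss(nominal, sigma)
        worst = max(worst, abs(d_true - d_comp))
    return worst

# 100 ps nominal delay with 5 ps of sigma per wire:
skew = worst_differential_skew(1000, 100.0, 5.0)
print(0.0 < skew < 40.0)  # True
```

With a fixed seed the result is reproducible, which is useful when comparing two candidate routings under the same "process corners".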
Leakage currents
Prior to the submicron era, the overall power consumption was dominated by the
dynamic power consumption, given that, to a first approximation, the load of a
CMOS logic cell is the input capacitance of another CMOS logic cell, which is an
infinite impedance at low frequencies and does not draw current. Moreover, there is
no conductive path between VDD and GND. The overall effect is that, ideally, there
is no static power consumption in CMOS.
However, the reduction of the dimensions of the transistors has led to a decrease
in the thickness of the gate oxide, so that a certain amount of charge can tunnel
through the gate region, creating a gate leakage current which is directly
proportional to the width W of the device and grows steeply as the oxide thickness
is reduced. As a first approximation, a minimum gate oxide thickness of about 30 Å
has been estimated.
A possible solution to reduce this phenomenon is to substitute the silicon oxide
with novel dielectric oxides, named high-k materials, having a higher dielectric
constant. Nevertheless, static power consumption is also caused by the sub-threshold
currents and by the reverse-bias currents of the pn junctions; as will be shown in
the next sections, the overall amount of leakage in submicron devices is inversely
proportional to the channel length L, and can therefore reach noticeable values. In
general, the higher the number of transistors and the smaller the technology node,
the higher the static power consumption of a circuit.
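The qualitative dependencies above can be captured in a simplified sub-threshold current model (a sketch; I0, the slope factor n and the bias values are placeholders, not measured data):

```python
import math

def subthreshold_current(i0: float, w: float, l: float,
                         vgs: float, vth: float,
                         n: float = 1.5, v_t: float = 0.026) -> float:
    """Simplified model: I_sub ~ I0 * (W/L) * exp((Vgs - Vth) / (n * V_T)).
    Leakage grows as the channel length L shrinks and as Vth is lowered."""
    return i0 * (w / l) * math.exp((vgs - vth) / (n * v_t))

# Halving L (0.2 -> 0.1, illustrative units) doubles the leakage:
i_long  = subthreshold_current(1e-9, 1.0, 0.2, 0.0, 0.4)
i_short = subthreshold_current(1e-9, 1.0, 0.1, 0.0, 0.4)
print(i_short / i_long)  # 2.0
```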
Voltage drop
Voltage drop is the percentage variation of the supply voltage from its nominal
value, due to the high currents drawn by the devices: the larger the number of
devices, the larger the drop. The reduction of the supply voltage of digital
circuits is a direct consequence of technology scaling. To get a perspective on how
much supply voltages have decreased in the last ten years, consider that a 0.6 μm
technology was supplied with 5 V, passing through 1.2 V for the 0.13 μm technology,
down to less than 1 V for modern 18 nm technologies.
Furthermore, power consumption has increased considerably, because the supply
voltage decreases less than the absorbed current increases, given that the number
of transistors in a chip continues to grow noticeably with the reduction of the
device dimensions. The adoption of higher working frequencies is another important
factor.
For all these reasons, voltage drop has become an important issue in submicron
circuits. Increasing the width of the power supply wires, the number of metal
layers devoted to the power supply, and the thickness of the power supply nets are
possible solutions to the problem of voltage drop.
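A first-order estimate of the static IR drop (a sketch with arbitrary current and grid-resistance values, not figures from the thesis) illustrates why wider or thicker supply wires help:

```python
def ir_drop_percent(i_total: float, r_grid: float, vdd: float) -> float:
    """Static IR drop V = I * R, expressed as a percentage of the nominal
    supply voltage vdd."""
    return 100.0 * i_total * r_grid / vdd

# 2 A drawn through a 5 mOhm supply grid at 1.0 V nominal:
print(ir_drop_percent(2.0, 0.005, 1.0))   # 1.0 (% of VDD)
# Halving the grid resistance (e.g. wider supply wires) halves the drop:
print(ir_drop_percent(2.0, 0.0025, 1.0))  # 0.5
```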
1.3 Physical security of cryptographic circuits: a review of Side-Channel Attacks (SCAs)
Having introduced the topic of hardware design in deep submicron technologies, in
this section we discuss the problem of the physical security of cryptographic
devices.
A cryptographic primitive can be considered from two points of view: on one side,
it can be seen as an abstract mathematical object or black box, which processes a
specific amount of data through a secret key; on the other side, this primitive is
implemented on a real device, with a specific circuit architecture on which a set
of instructions is executed. Classical cryptanalysis has the purpose of breaking an
algorithm (i.e., recovering the key of the algorithm with reasonable computational
effort) by studying its mathematical properties. Many efforts have been made with
the aim of implementing more robust algorithms able to resist all the best-known
cryptanalytic attacks, up to the point that very strong schemes have