Sommario
[2], dall’intelligenza artificiale [3] [4], alla progettazione hardware [5] [6] fino
all’automazione e verifica di progettazione di circuiti elettronici [7] [8].
Sulla base di cio` molti solver sono stati sviluppati negli ultimi anni allo
scopo di allargare il portafoglio dei problemi risolvibili. Di questi ne sono stati
esaminati alcuni tra quelli allo stato dell’arte quali zChaff, MiniSat, Siege,
RSat, BerkMin e MiniMarch.
L’obiettivo di questo lavoro e` lo sviluppo e l’analisi sperimentale di alcune
tecniche innovative per il solving. Per questo scopo e` stato progettato un
solver, chiamato StaGe, che e` stato quindi implementato da zero e non come
estensione di un altro preesistente. Questa scelta e` dovuta principalmente
al fatto che nessun solver allo stato dell’arte presentava una struttura dati
idonea allo sviluppo delle diverse tecniche. Per garantire la massima efficien-
za possibile del codice e` stata utilizzato il linguaggio di programmazione C++.
In StaGe sono state implementate diverse tecniche gia` esistenti in solver
allo stato dell’arte, come ad esempio la Score Initialization [9] delle euristiche
e il Progress Saving [10], ed altre nuove, come la propagazione dei letterali
puri (Mlf, Monotone Literal Fixing) [11] e il metodo degli Implicanti.
La tecnica Mlf consiste nel semplificare la formula assegnando un letterale a
“vero” qualora il suo negato non occorra all’interno della stessa.
Il metodo degli Implicanti ha lo scopo di assegnare il maggior numero possibile
di variabili per ogni assegnamento determinato dall’euristica. In questo modo
e` possibile aumentare le probabilita` di trovare un conflitto nelle clausole
binarie e di conseguenza ridurre il numero di operazioni di Bcp nelle clausole
n-arie.
Per comparare l’efficacia di queste tecniche sono state utilizzate classi
di istanze provenienti dalle ultime Sat-Competition e Sat-Race. Queste
ultime, di rilevanza a livello internazionale come strumento di valutazione
dei Sat-solver, hanno permesso di osservare le variazioni delle performance
con classi di problemi provenienti da diversi domini. Una stima del tem-
iii
Sommario
po di macchina utilizzato per questo scopo conta approssimativamente 100 ore.
Nell’ultima parte sono analizzate le sopracitate tecniche preesistenti in
combinazione a quelle degli Implicanti e dell’Mlf.
Dai risultati si evince che non tutte le tecniche implementate riescono a
migliorare le prestazione del nostro solver. I migliori si hanno con la tecnica
del Progress Saving in combinazione con la Score Initialization in quanto si
riscontra un aumento del numero di problemi risolti unitamente alla diminu-
zione del tempo totale di esecuzione. Per contro, le tecniche degli Implicanti
e Mlf non hanno dato i risultati sperati non portando a miglioramenti sulla
maggior parte delle formule considerate.
iv
Chapter 1
Introduction
The Boolean Satisfiability problem (Sat) is a decision problem of the com-
plexity theory, which instance is a Boolean expression written using only And,
Or, Not, variables and parentheses. The question is: given an expression, is
there some assignment of False or True values to the variables that can
make the entire expression true? Equally important is to determine whether
no such assignments exist, which imply that the function expressed by the
formula is identically False for all possible variable assignments. In this
latter case, the function is unsatisfiable.
A formula of propositional logic is said to be satisfiable if logical values
can be assigned to its variables in a way that makes the formula True. The
propositional satisfiability problem (PSat, [12]), which decides whether a
given propositional formula is satisfiable, is of central importance in various
areas of computer science, including theoretical computer science, artificial
intelligence [2] [1] [3] [4], hardware design, electronic design automation and
verification [5] [7] [6] [8]. These are a small pool of hypothetical applications
of this kind of problems. From complexity theory we know that the boolean
satisfiability problem is Np-complete [12] [13] [14]. If P is a proper subset of
NP , there exists no algorithm g such that for some polynomial function, f ,
g decides any given Sat instance in time bounded above by f applied to the
instance length.
The problem can be significantly restricted while still remaining Np-
1
Introduction
complete. By applying De Morgan’s laws, we can assume that Not operators
are only applied directly to literals and not to expressions; we refer to either
a variable or its negation as a literal. For example, both x1 and ¬x2 are
literals, the first is a positive literal and the second a negative literal. If we
Or together a group of literals, we get a clause, such as {x1 ∨ ¬x2}. Finally,
let us consider formulas that are a conjunction (And) of clauses. This is the
conjunctive normal form (Cnf).
By sufficiently restricting an intractable problem, one can always obtain a
tractable problem. Some Sat instance classes are solvable in polynomial time.
For example, instances containing no clause with more than two different
literals can be solved in linear time [15]. Instances containing no clause with
more than one positive literal are solvable in linear time [15]. Algorithms
that take exponential time for some classes may take polynomial time for
others. We are interested to test different new techniques in order to increase
the efficiency to solve many instances generated through the application of
Sat or practical problems arising in contemporary industry.
1.1 Objectives
As above described, the problem of boolean satisfiability has central impor-
tance in various areas of computer science, including algorithm languages,
artificial intelligence, and hardware design. Efficient and scalable algorithms
for Sat that were developed over the last decade have contributed to dramatic
advances in our ability to automatically solve problem instances involving tens
of thousands of variables and millions of clauses. Examples of such problems
in Electronic Design Automation (Eda) include combinational equivalence
checking, model checking, formal verification of pipe-lined microprocessors,
automatic test pattern generation, routing of Fpga (field-programmable gate
array), and so on. In fact, a Sat-solving engine is now considered to be an
essential component in the Eda toolbox and all Eda vendors provide such a
capability. In recent years industries such as IBM, Intel and Microsoft have
begun to use with great success the Formal Verification Methods to validate
combinatoric and sequential electronic circuits [5] [7] [6] [8], and software
2
Introduction
systems [2] [1] [3] [4].
For these reasons, different works have been developed in the last years.
The results are, sometimes, not very good. The development of a new Sat-
solver refers to the latest and best techniques in literature in this field of
research. The differences between the solvers do not exclusively consist in the
different introduced features but also in how these features are implemented.
In the next chapter are described and analyzed several modern state-of-art
solvers used in our experimental analysis as basis for comparison. Based on
these considerations, we have developed a new Sat-solver, called StaGe. In
this solver, we have introduced new techniques and inserted the best features
extracted from the state-of-art Sat-solvers. Our work has been focused on
the experimental analysis of the different techniques. With these analysis it
is possible to individuate which combinations of features can obtain the best
results.
Our solver StaGe has been developed from scratch using the base tech-
niques from MiniSat [16] and RSat [17], the two Sat-solvers with the best
results in last Sat-Competition [18] and Sat-Race [19]. Other solvers have
been used as basis for development. One of these ones is Siege[20], a solver
developed taking into account every possible aspect of the use of memory.
To this purpose, in StaGe binary clauses are stored in Database similarly to
Siege, in order to improve performance during data access.
1.2 Thesis Structure
In Chapter 2 are explained the modern Sat-solvers as MiniSat, RSat, Siege,
zChaff [21], and MiniMarch[22]. It is described the implementation of the
different features used as Progress Saving [10], Expensive conflict clause mini-
mization [16], and Look-ahead technique [22]. In particular are analyzed the
performances of these solvers and the relative techniques with the benchmark
instances used in the Sat-Race 2006 1.
1Sat-Race 2006 is a competitive event for solvers of the Boolean Satisfiability (Sat)
problem. For further information see http://fmv.jku.at/sat-race-2006/
3
Introduction
In Chapter 3, it is described our solver StaGe, that has been developed
to test the innovative techniques implemented. The core part of StaGe is the
Database structure, which includes every information necessarily for every
part of the solver as the formula and the learned clauses.
The Parser section describes the adopted solution to read the input formula
from the file and to store these information into the Database. After this
section, Chapter 3 reports the Heuristics part. As in the Database and Parser
part, the solution is a specific adjustment of the version used in MiniSat, and
RSat.
For the Boolean Constraint Propagation (Bcp), it is traced the concept
behind the two literal watching (Tlw) scheme introduced in Chaff [21]. The
Bcp operations is computed in two different strategies: one with a breadth
search in each clause of the formula and the other one with two separated
breadth searches. The first search visits the binary clauses and the second
one the N -ary clauses. The second strategy tries to find the conflict in the
binary clauses in order to reduce the size of the learned conflict clauses.
The Backtracking procedure is developed with the implementation of the
so-called First Unique Implication Point (First-Uip or 1-Uip) introduced in
[23] from the authors of Chaff. At the moment, this is the best method to
implement the backtracking considering all the most important aspects: size
of conflict clauses, memory occupation, and execution time [23].
Chapter 4 describes the new techniques developed and their combined use.
At first, we analyze the Implicate technique. This feature allows to assign as
many variables as possible for each assignment suggested by the heuristic. In
this way, during the Bcp procedure, it is easier to find a conflict in a binary
clause and then reduce the size of the conflict clause.
The Monotone Literal Fixing finds the pure literals in the formula [11]. In
our solver, this feature is introduced with a specific implementation derived
from the work in [11].
Score Initialization is derived from that one introduced in MiniMarch and
Actin[9]. With the initialization of variables (or literals) it is possible to make
more efficient the search.
4
Introduction
The expensive conflict clause minimization is based on the work in MiniSat
and in RSat [16]. The other considered solvers do not include this characteristic.
The advantage to reduce the size of the learned clauses may negatively affect
the computational time.
The most innovative feature introduced in RSat is Progress Saving (Ps)
described in [10]. For each variable, it is stored the value of its last assignment.
When the heuristic selects a variable, it uses this information to choose its
sign and then is used to reduce the amount of work repetition that is inherent
in backtracking Sat-solvers.
Another feature describes in Chapter 4 is the Flip Last Decision (Fld)
technique that consists in a modified version of Look-ahead procedure intro-
duced in MiniMarch. In our solver, StaGe, it is possible to flip (i.e. change
the sign of the variable) the assignment of the last variable if its value does
not affect the implication graph.
In Chapter 5 the results of our experiments and techniques are summa-
rized. The conclusions about our work report that in this area there is not a
perfect solution. The continuous improvements are possible with the use of
new features or with a more efficient logic implementation. Our work opens
more question about the use or not of some characteristics.
The last section of our thesis is dedicated to the future work. For example,
it could be possible to integrate other important techniques used in RSat as
the heuristic for ordering the implications. Even more interesting, it could
be the introduction of the Reassignments feature that could improve the
performances of a Sat-Solver.
5
Chapter 2
State of the art
Boolean satisfiability is well studied, and Sat-solvers have a long history.
The Davis Putnam (Dp) algorithm [24], published in 1960, is typically known
as the first entry in the solver time-line. It is very inefficient in finding
a satisfying assignment to solve real world instances and, moreover, the
memory consumption is explosive in practice. Because of that, the new
Davis-Logemann-Loveland (Dll) algorithm [25]) was developed.
Over the past decade, several powerful Dll solvers has been introduced,
including Posit [26], Satz [27], and Kcnfs [28]. These solvers are different
from the original Dll algorithm in the fact that they have better decision
heuristics. Such solvers are still at the state of art for proving random
instances unsatisfiable (incomplete local-search programs, such as UnitWalk
[29], are more successful on satisfiable random instances).
Conflict-driven clause learning was first used in the solvers Grasp [30] and
RelSat [31]. By appending implicates to the formula and using these implicates
to perform non-chronological backtracking, these solvers outperform the best
Dll solvers on non-random instances. These techniques pose the basis of all
the most successful modern solvers for instances coming from industry.
The solver Chaff made Grasp and RelSat obsolete. The most important
innovation of Chaff was a decision strategy that allows it taking better
advantage in conflict-driven clause learning. Chaff also introduced a variation
of two-pointer Bcp algorithm published several years earlier as part of Sato
6
State of the art
solver [32]. Because of Chaff was designed to be applied in industrial setting,
heavy emphasis was placed on an efficient implementation.
2.1 The DP and DL implementation
Most competitive modern Sat solvers, are, in most cases, extensions of Davis-
Logemann-Loveland (Dll) algorithm [25]. The Dll algorithm is a variation
of the Davis-Putnam (Dp) algorithm [24].
2.1.1 The Davis Putnam Algorithm
The input formula is F .
1. If the input contains an empty clause, return Unsatisfiable.
2. Remove from F all clauses that contain at once both literals x and ¬x
of some variable x. If F is now empty, return Satisfiable.
3. If F contains both the clauses [x] and [¬x], return Unsatisfiable.
4. If there is a unit clause [x] in F , delete from F all clauses containing x,
and remove the negation of x from all clauses where it appears. If F is
now empty, return Satisfiable.
5. If F contains a unit clause go back to 4.
6. While there is a pure literal in F , delete all clauses where it appears. If
F is now empty, return Satisfiable.
7. Select a variable, x, that has a literal in one of the shortest clauses in
F . Replace F with F ∧ [x]. Go back to step 2.
Step 2 removes the clauses that are trivially satisfied. Steps 3 and 4 constitute
the boolean constraint propagation (Bcp). In step 3, the formula is checked for
contradictory unit clauses. The presence of both clauses x and ¬x makes the
formula trivially unsatisfiable. In step 4 all instances of x are removed through
unit-subsumption. If x is False then the unit clause [x] is unsatisfiable. Thus
7
State of the art
x being True is a necessary consequence of any truth assignment that satisfies
all clauses in F . The clauses containing x are satisfiable and it is possible to
remove them. If the negation of x appears, it cannot contribute to satisfying
a clause where it occurs. Therefore it may be removed from such clauses
that will necessarily be satisfied through other assignments. Step 6 purges
every pure literal from the formula. A literal is pure in F if its negation does
not occurs in the formula. It is safe to make these ones True because the
assignment with False cannot contribute to satisfying any clause. Most Sat
solvers do not actually implement this procedure, because it is expensive to
monitor if a literal occurs in a satisfied clause. This rule is developed in the
Monotone Literal Fixing (Mlf) strategy [11]. Whenever the execution reaches
the step 7, all previous strategies can not go on simplifying the formula and
then a variable has to be selected. The policy of selection, or decision strategy,
plays an important role in determining the overall performance of the search.
2.1.2 The Davis-Logemann-Loveland Algorithm
The Davis-Putnam algorithm may make the formula growing up exponentially.
Since the addition space used by Dll is only linear in the number of variables,
it is preferred to the Dp algorithm.
The input formula is F .
1. Let [x] be a unit clause in F . Remove from F all clauses that contain
the literal x. Remove the negation of x from all clauses where it appears.
2. If F contains a unit clause go back to 1.
3. If F contains an empty clause, return Unsatisfiable.
4. While there is a pure literal in F , delete all clauses where a pure literal
occurs. If F is now empty, return Satisfiable.
5. Select variable x that occurs in F .
6. Call the formula F ∧ [x]. If the call returns Satisfiable then returns
Satisfiable.
8
State of the art
7. Call the formula F ∧ [¬x]. If the call returns Satisfiable then returns
Satisfiable, else returns Unsatisfiable.
Steps 1 and 2 implement boolean constraint propagation. In step 5 a decision
strategy is employed to choose x. As for the Dp algorithm, the policy of
choosing variables for removal can have a great impact on overall performance.
In step 6, F ∧ [x] forces x to be True and in step 7, F ∧ [¬x] forces x to
be False. F is satisfiable if and only if either it has a satisfying assignment
under which x is True, or it has one under which x is False.
2.2 Benchmarks
The benchmarks used to test the performances of various state-of-art solvers
and in experimental analysis have been selected by Sat-Race 2006 and
SatLib.
As mentioned in Chapter 1, Sat-Race is a competitive event for solvers
of the Boolean Satisfiability (Sat) problem. It contains benchmarks from the
industry and it is strongly updated.
SatLib is a collection of benchmark problems, solvers, and tools regard-
ing the Boolean Satisfiability (Sat) problem. The main objective of SatLib
is to provide a uniform test-bed for Sat-solvers as well as a site for collecting
Sat problem instances, algorithms, and empirical characterisations of the
algorithms’ performance 1. This collection is not really up to date, and many
benchmarks, are solvable from every modern Sat-solvers in just a few seconds.
However, there are still some benchmarks that are not solved by any state of
art Sat-solvers.
In our tests, for every solver was given 15 minutes to decide each instance
on an 1.73 GHz Pentium Mobile M with 1 gigabyte of RAM. The industrial
benchmarks from pipe-lined machine verification (manol-pipe, velev) derive
1For further information see http://www.satlib.org/
9
State of the art
Inst. family(#,Sat,Unsat) V C CV Bin LitsN-Clauses
aloul (1, 0, 1) 260.00 1586.00 6.10 98.36% 10.00
een (3, 0, 3) 83468.00 251647.00 3.01 59.98% 3.05
goldb (4, 0, 4) 15271.50 95211.00 6.23 22.98% 2.84
grieu (3, 3, 0) 643.33 80498.67 125.13 19.64% 3.02
hoons (4, 0, 4) 18287.00 54661.50 2.99 66.65% 3.00
manol-pipe (35, 0, 35) 93858.91 278301.31 2.97 66.67% 3.00
mizh (3, 2, 0) 60395.00 221424.33 3.67 44.25% 3.62
narain (3, 1, 2) 1741952.00 6808991.33 3.91 76.19% 4.04
schup (3, 0, 3) 338220.00 1069494.00 3.16 48.00% 2.07
simon (4, 3, 1) 9113.50 32073.50 3.52 48.42% 4.37
stric-bmc (1, 1, 0) 39598.00 194641.00 4.92 73.92% 4.51
vange (1, 1, 0) 117426.00 432787.00 3.69 8.04% 3.14
velev (34, 18, 16) 112890.65 3753762.56 33.25 80.79% 9.71
Table 2.1: Industrial benchmarks selected from former Sat-Race 2006. Each
row represents a different istance family, with the following information:
“V”, average number of variables; “C”, average number of clauses; “C / V”,
average ratio between number of clauses and number of variables; and “Lits /
N -Clauses”, average number of literals for nary clauses.
from the encoding of electronic design automation problems.
The benchmarks selected from SatLib are shown in Table 2.2. Are in-
cluded the following instance families: BF (circuit fault analysis: Bridge
Fault), blocksworld and logistics (planning), BMC (Sat-encoded bounded
model checking), flat (“Flat” Graph Colouring), GCP (large Sat-encoded
Graph Colouring Problems), hanoi (Sat-encoding of Towers of Hanoi), ind-
inference (from a problem in inductive inference), parity (from problem in
learning the parity function), phole (Pigeon hole problem), pret (encoded
2-colouring), QG (Sat-encoded Quasigroup), SSA (Circuit fault analysis:
Single-Stuck-At fault), and SW100 (“Morphed” Graph Colouring).
2.3 zChaff
zChaff is a Sat-solver that targets the industrial category and hopes to
be reasonably successful in the handmade category. It implements the well
known Chaff algorithm [33] which includes the Vsids decision strategy.
The principal feature introduced in zChaff are:
10
State of the art
Inst. family(#,Sat,Unsat) V C CV Bin LitsN-Clauses
ais (4, 4, 0) 155.00 2729.50 17.61 49.13% 3.16
BF (4, 0, 4) 1697.50 5037.25 2.97 57.16% 3.20
blocksworld (7, 7, 0) 1644.29 29877.86 18.17 75.10% 3.05
BMC (13, 13, 0) 30077.46 160334.38 5.33 78.60% 5.25
flat (100, 100, 0) 90.00 300.00 3.33 90.00% 3.00
GCP (4, 4, 0) 3843.75 206255.50 53.66 99.91% 19.75
hanoi (2, 2, 0) 1324.50 9693.00 7.32 29.18% 3.06
ind-inference (41, 41, 0) 655.90 7994.73 12.19 92.65% 18.32
logistics (4, 4, 0) 1881.25 11682.25 6.21 70.19% 3.78
parity (30, 20, 0) 1044.47 3542.40 3.39 22.79% 3.00
phole (5, 0, 5) 74.00 322.00 4.35 97.20% 8.00
pret (8, 0, 8) 105.00 281.00 2.68 0.00% 3.00
QG (22, 10, 12) 1043.18 29078.36 27.87 42.24% 3.38
SSA (8, 4, 4) 2351.00 6674.38 2.84 60.53% 3.18
SW100 (100, 100, 0) 500.00 3100.00 6.20 96.77% 5.00
Table 2.2: Benchmarks selected from SatLib.
Decision Strategy - Vsids and BerkMin Type
Boolean Constraint Propagation - Two literal watching
Non-chronological Backtracking - Learning the First-Uip Conflict Clause
Conflict Clause Based Assignment Stack Shrinking
Decision Strategy - Vsids and BerkMin Type
The use of a heuristic called Variable State Independent Decaying Sum is the
most important contribution of Chaff. It is a literal count heuristic. It allows
Chaff (and zChaff ) to solve difficult industrial Sat problems much faster,
with far fewer decisions than solvers like RelSat, Grasp, and Sato.
Vsids is realized in zChaff as follows. Each literal l has a score s(l), and an
occurrence count r(l). When a decision is necessary, a free literal with the
highest score is set True. Initially for each literal, l, s(l) = r(l) = 0. Before
search begins, s(l) is incremented for each occurrence of a literal, l, in the
input formula. When a clause c is learned during search, r(l) is incremented
for each literal l ∈ c. Every 255 decisions, the scores are updated: for each
literal l s(l) becomes r(l) + s(l)/2, and r(l) becomes zero. zChaff uses this
decision strategy with another one: BerkMin Type Decision Strategy. This
11
State of the art
latter decision strategy is used with the most recent unsatisfied conflict clauses.
The main ideas of this approach are described by the author of BerkMin [34].
Boolean Constraint Propagation - Two literal watching
zChaff uses the two literal watching scheme [33] for Bcp as proposed by Chaff
[33]. A key benefit of the two literal watching scheme is that at the time of
backtracking, there is no need to modify the watched literals in the clause
database. This reduces the total number of memory access. This scheme is
discussed in Chapter 3.
Non-chronological Backtracking - Learning the First-UIP Conflict
Clause
Conflict driven clause learning along with non-chronological backtracking were
first incorporated into a Sat-solver in Grasp and RelSat. These techniques
are essential for efficient solving of structured problems. This technique is
described in Chapter 3.
Conflict Clause Based Assignment Stack Shrinking
This is related to one of the techniques used by the JeruSat solver [35]. When
new learned First-Uip clause exceeds a certain length L, it participates in
the decision strategy.
2.3.1 Benchmarking
The results of zChaff, compared to the other solvers (MiniSat, RSat, BerkMin,
MiniMarch and StaGe) are not so relevant. It is normal that the performances
are not comparable to other solvers because MiniSat, RSat, BerkMin, and
MiniMarch are more recent.
In Table 2.3 are shown the results of zChaff on industrial benchmarks selected
from Sat-Race 2006. The best performances are with the manol instance
12