2015-08-14

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512

*by Javantea**Aug 14, 2015*

I solved my first SAT problem and in doing so, I had to write a valuable improvement to satispy, a simple and powerful Python frontend to minisat and lingeling. I'm making a pull request to netom, the author of satispy and I am publishing my code on Github since the original was published on Github. Allow me to show you the problem I was working on and the possibly elegant solution.

```
def f1(x):
v = x ^ 32
w = v & 99
return w ^ x
#end def f1(x)
```

The function f1(x) is a trivially simple function that behaves so simply that a person could easily guess the approximate output. However, it's complicated enough that if I asked you whether 0x3333 could be produced as an output, you'd have to test a few thousand runs, right? This is a problem that brute force shouldn't have to solve. Either it's satisfiable or it's not. If it's satisfiable, we should be able to present a solution in a reasonable amount of time. Note that 0x3333 is 16 bits long, which means that brute force could be very expensive. Using algebra to limit the range that you have to brute force works on this problem, but what if I made the problem 64-bit (instead of Python's multi precision integer implementation) and made the constants 64-bit variables. Would you be able to solve this in a reasonable amount of time using algebra and brute force? I don't know, but let's find out if a SAT solver can solve this first.

Writing DIMACS CNF (or even CNF) by hand is a terrible experience. Even when you're working on a handful of variables it gets messy. A 16-bit problem like f1 ends up with 64 variables as 272 clauses which are not easy to compute in the slightest. Enter **satispy**. Use similar algebraic operations as the original code and now we have a pretty amazing SAT system. First of all, remember that SAT is about boolean, so when you have a 16-bit variable, you have to deal with 16 variables. But with Python, that's just a for loop. I could have created a class, but I'm not doing that just yet. I want a solution, not a framework to solve anything in n-bits right now.

```
import satispy
bits = 16
vars_x = [satispy.Variable('x%i' % i) for i in range(bits)]
#For this to work... we need to set v and w's constants.
k = 32
vars_k = [satispy.Variable('k%i' % i) for i in range(bits)]
exp_k = satispy.Cnf()
for i in range(bits):
if k & (1<<i):
exp_k &= vars_k[i]
else:
exp_k &= -vars_k[i]
#end if
#next i
m = 99
vars_m = [satispy.Variable('m%i' % i) for i in range(bits)]
exp_m = satispy.Cnf()
for i in range(bits):
if m & (1<<i):
exp_m &= vars_m[i]
else:
exp_m &= -vars_m[i]
#end if
#next i
expected_output = 0x3334
if len(sys.argv) > 1: expected_output = int(sys.argv[1])
vars_cmp = [satispy.Variable('cmp%i' % i) for i in range(bits)]
exp_cmp = satispy.Cnf()
for i in range(bits):
if expected_output & (1<<i):
exp_cmp &= vars_cmp[i]
else:
exp_cmp &= -vars_cmp[i]
#end if
#next i
exp_eq = satispy.Cnf()
for i in range(bits):
#v_exp = vars_x[i] ^ vars_k[i]
#w_exp = v_exp & vars_m[i]
#out_exp = w_exp ^ vars_x[i]
# This line eats all memory. When I don't negate, it doesn't.
#exp_eq &= -(out_exp ^ vars_cmp[i])
exp_eq &= -((((vars_x[i] ^ vars_k[i]) & vars_m[i]) ^ vars_x[i]) ^ vars_cmp[i])
#print(len(exp_eq.dis))
#next i
solution_cnf = exp_k & exp_m & exp_eq & exp_cmp
print(len(solution_cnf.dis))
io = satispy.io.DimacsCnf()
open('easy_problem1a.cnf', 'w').write(io.tostring(solution_cnf))
solver = satispy.solver.Minisat()
res = solver.solve(solution_cnf)
if res.success:
print('\n'.join(['%s %r' % (v.name, res[v]) for v in vars_x]))
x = int(''.join([str(int(res[v])) for v in vars_x][::-1]), 2)
print('x =', x)
elif res.error:
print(res.error)
else:
print("Not satisfiable.")
#end if
```

The first thing to note is that the constants k and m are variables and they have trivial expressions that are appended to the solution CNF. What we're doing is we're setting those constants such that everything is a variable. In the future we could use the fact that we know the value of these to solve equations in the CNF before it gets to the solver, but that's probably not going to be much faster than the solver (which is written in C and has been optimized).

The logic is done on the line below.

` exp_eq &= -((((vars_x[i] ^ vars_k[i]) & vars_m[i]) ^ vars_x[i]) ^ vars_cmp[i])`

As you can see I've taken the equation

` 0x3333 == ((x ^ 32) & 99) ^ x`

and I've made it into a true/false and when it's true, we get x (the input).

` not ((((x ^ 32) & 99) ^ x) ^ 0x3333)`

And done? No. At this point, I had an almost working piece of code. If you run the code with satispy from netom, you will quickly see why I made a few comments to the effect that SAT is difficult and takes hours upon hours to debug. The CNF **blows up** when you negate `((((a ^ b) & c) ^ a) ^ e)`

. **Blows up** is a technical term for when a piece of code tries to eat gigabytes, terabytes, or exabytes of memory. So now we look at what's going on in the negate function in `satispy.Cnf`

.

```
class Cnf(object):
def __init__(self):
self.dis = []
@classmethod
def create_from(cls, x):
if isinstance(x, Variable):
cnf = Cnf()
cnf.dis = [frozenset([x])]
return cnf
elif isinstance(x, cls):
return x
else:
raise Exception("Could not create a Cnf object from %s" % str(type(x)))
def __and__(self, other):
other = Cnf.create_from(other)
result = Cnf()
result.dis = self.dis + other.dis
return result
def __or__(self, other):
other = Cnf.create_from(other)
if len(self.dis) > 0 and len(other.dis) > 0:
new_dis = []
for d1, d2 in [(d1,d2) for d1 in self.dis for d2 in other.dis]:
d3 = d1 | d2
new_dis.append(d3)
elif len(self.dis) == 0:
new_dis = other.dis
else:
new_dis = self.dis
c = Cnf()
c.dis = new_dis
return c
def __xor__(self, other):
return (self | other) & (-self | -other)
def __neg__(self):
cnfs = []
for d in self.dis:
c = Cnf()
for v in d:
c.dis.append(frozenset([-v]))
cnfs.append(c)
ret = cnfs.pop()
for cnf in cnfs:
ret |= cnf
return ret
def __rshift__(self, other): # implies
return -self | other
def __str__(self):
ret = []
for d in self.dis:
ret.append(" | ".join(map(str,d)))
return "(" + ") & (".join(ret) + ")"
def __eq__(self, other):
return self.dis == other.dis
def __hash__(self):
return hash(self.dis)
```

The `__neg__`

function loops over the clauses and then over each variable in the clause creating a new `Cnf`

object with the inverted variable as a clause. It puts those `Cnf`

objects into a list. Then it ors them together. Surprisingly, the for double loop was not the issue. It easily gets past that. The problem comes in the or function. Let's look at `__or__`

now.

```
for d1, d2 in [(d1,d2) for d1 in self.dis for d2 in other.dis]:
d3 = d1 | d2
```

That's the problem. How do we go about solving such a difficult problem though? It seems almost unsolvable from the perspective of how do I reduce these functions. The answer is to look at the logic. Take a look at this.

```
import satispy
try:
satispy.cnf.cnfClass = satispy.NaiveCnf
except:
pass
#end try
a = satispy.Variable('a')
b = satispy.Variable('b')
c = satispy.Variable('c')
d = (((a ^ b) & c) ^ a)
print(str(d))
```

Output:

` (a | b) & (-a | a | -b) & (a | c) & (-a | a | -c) & (-a | b | -c) & (-a | a | -c | -b) & (-a | b | -c | -b)`

As you can see `((a ^ b) & c) ^ a)`

produces a pretty small output. But look at the clauses for a moment.

```
(a | b) & # Fine
(-a | a | -b) & # Not fine
(a | c) & # Fine
(-a | a | -c) & # Not fine
(-a | b | -c) & # Fine
(-a | a | -c | -b) & # Not fine
(-a | b | -c | -b) # Not fine
```

Look closely at `(-a | a | -b)`

. Do you see any logical flaw here? `-a | a`

is a worthless statement. You can just replace it with `True`

. Thus we get `(True | -b)`

. But that is a worthless statement as well because `True | x`

is *always* `True`

. Allow me to illustrate for the non-programmers. Either I am a person or I am not a person or you are an automobile. It doesn't matter whether you are an automobile or not, because the statement has already said "I am a person" or "I am not a person" which is certain to be true. Therefore the entire statement can be removed. The way that CNF works is that you have clauses anded together. Each clause is a set of variables or negated variables or'ed together. This very strict regular format means that all logic has to be converted into this format. It also means that when you naively convert (like satispy
does) you end up with a massive collection of unnecessary statements. I wrote a very small function called `reduceCnf`

that filters out any clause that has `(v | -v)`

in it. For example, running `reduceCnf(d)`

on our d above gives us the following:

```
str(satispy.reduceCnf(d))
'(a | b) & (a | c) & (-a | b | -c)'
```

As you can see a 7 clause CNF now becomes a 3 clause CNF. Using a modified negate and or that uses the reduceCnf function liberally, we are able to run the easy_problem1a.py very quickly. It actually makes sense that instead of writing negate and or naively, we should instead write them in such a way to never allow redundancy because of the memory problem. Anyone using satispy on a serious SAT problem has probably run into this (I can't imagine someone not running into this).

One might suggest that what I've done is very inefficient and should only be considered a workaround. You can see on this problem is wastes a tiny bit of time, but that could add up. But if you look at what I've done, I've actually only run this function where it absolutely has to be. There are improvements that could be made, but until someone writes them, this is the correct solution. I recommend that anyone trying to get more performance out of satispy take a look at improving the or and negate functions more than I have.

SAT is not nearly as difficult as it appears. I recommend spending a full day working with the tools and getting a virtual environment set up to safely run any SAT solver that you may need. When you have the environment, write a simple logical function in Python and see if you can solve it with the code.

Future work will of course involve writing an integer format and a constant format for satispy which will allow a user to write the following code:

```
bits = 64
var_x = Int64Variable('x')
k = Constant64(32)
m = Constant64(99)
expected_output = Constant64(0x3334)
exp_eq = -((((var_x ^ k) & m) ^ var_x) ^ expected_output)
solution_cnf = k.exp & m.exp & exp_eq & expected_output.exp
print(len(solution_cnf.dis))
io = satispy.io.DimacsCnf()
open('easy_problem1a.cnf', 'w').write(io.tostring(solution_cnf))
solver = satispy.solver.Minisat()
res = solver.solve(solution_cnf)
if res.success:
print('\n'.join(['%s %r' % (v.name, res[v]) for v in vars_x]))
x = int(''.join([str(int(res[v])) for v in vars_x][::-1]), 2)
print('x =', x)
elif res.error:
print(res.error)
else:
print("Not satisfiable.")
#end if
```

I am very interested to hear what people have to say about my research. Feel free to e-mail me (please use PGP/GPG), tweet, or phone me. I am currently looking for an job or internship at a Nanotechnology research lab. Let me know if you can introduce me to someone in the field.

Off topic: I wrote this in Markdown and used pandoc to convert it to html. It looks nice, doesn't it? It also supports a lot more formats than HTML5. The markdown version is here.

Javantea out.

-----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBCgAGBQJVzl6zAAoJEDxoyNvLp4PvHoEP/0FaXu+SBuHeufxkOpElI//l DSGsPckbhlJfYDH5thT+4QEgpufWwUoJeWb7Oo4ULL3FnsQ3h7F6Bz7LQ4QMEgno 0HBSchgttP3Qf2caYhf0ZPSoVUu58k500tRWloLydWueggIJ82e4OZHCMMg1ij1M M+nKPqpEzSf/9mMziHuFACeCaJn34HZmr+IXz8PVoRnDpoY28xOCTrMO6qaLIN8P 1DoqHKHLs2sv/QMcUaAorcM/jvAQZBZdyWY0BgDmoPz7+1tqKdBC1WTKRss6OTZc fQi8k0FVu/UPbJ21gPpo1s0UMTK70Mg+dRkyVLZZL0bHjcqZICdiATgKMvOO12vT EnwoUP//b5qsJiwoGy2FbrDCMiEFwVlZ2tKqbQCrLe8N9xPHMAjIwufW/UH3j62c vhtHrjfYv237SSN7y19jhwHvfsDZp5AyQ9VX2Wj7kEEZAUVCAK566kHbcVDYvqqG bsNIgSukakRa7eMHsOtDcB1hcEOZqbd8GPKjre575ro8HhpzxgJ4wRA557dVg6N3 w53aNH/aISf8ZavMqbBEzYTPErSOlt3meBm6bRTX3gA//k9jtI/94QGO0lPey6pq 3UqH2Fwgjosp20LqF9vq6HovvTbSYhQyVibH0wQk1c6jdZIR9iGqwFjKO3ir0qO7 UMhG8RgsvwHeiDFBK8NC =vUfH -----END PGP SIGNATURE-----