"Is progress in SAT solving the sole result of hardware advancement? Time Leap C...

xavxav · on May 26, 2023

A fun exercise is to write a series of sat solvers: brute force, DPLL and CDCL you really get an immediate feel for the strength of the algorithms and none of these implementations need to go much beyond 50-100 lines in a modern language.

You can then of course spend months tuning and optimizing the hell out of your code to squeeze a further 10x performance.

If you're really crazy like Sarek, you can also formally verify these implementations: https://sarsko.github.io/_pages/SarekSkot%C3%A5m_thesis.pdf

philzook · on May 26, 2023

Excellent link. I would love to see a CDCL implementation in a 100 (understandable) lines. From what I see in the repo, creusat is probably more like 1000?

GregarianChild · on May 26, 2023

I found the description of CDCL as an abstract rewrite system illuminating. It's much shorter than an implementation. See e.g. [1]. There is/was a more readable version online, but I can't find it now.

[1] https://easychair.org/publications/open/b7Cr

aaw · on May 26, 2023

This one's 250 understandable lines: https://github.com/marijnheule/microsat. You could probably get it to below 200 lines if you removed the restart logic and learned clause compaction, but it's actually a surprisingly competitive solver as is.

xavxav · on May 26, 2023

Indeed, but CreuSAT is actually a high-performance implementation (ie: worst of the best), I should probably just write up a Rust gist of CDCL.

GregarianChild · on May 26, 2023

It's worth mentioning Mathias Fleury work who first verified a CDCL implementation [1]. He has written follow-up stuff.

[1] https://fmv.jku.at/fleury/papers/Fleury-thesis.pdf

ykonstant · on May 26, 2023

What a lovely and well-written thesis; I am reading it with great interest.

sanxiyn · on May 26, 2023

You may want to read "A Time Leap Challenge for SAT Solving". It's just a really fun paper: https://arxiv.org/abs/2008.02215

bmc7505 · on May 26, 2023

And yet, despite tremendous progress accelerating continuous optimization workloads, there are still no competitive GPU-based SAT solvers. I wonder why?

porcoda · on May 26, 2023

Super short answer: SAT doesn’t exhibit the nice properties other optimization problems often have where you can tell when you’re near a correct answer. So it doesn’t resemble most continuous optimization problems. Regarding parallelism, there has been research into parallel SAT, but it’s hard in part due to the problem of sharing information between threads/tasks in conflict clause learning algorithms. I don’t remember specific papers but a quick search on google scholar or looking at past SAT competitions would turn up material if you’re interested in the topic.

bmc7505 · on May 26, 2023

> SAT doesn’t exhibit the nice properties other optimization problems often have where you can tell when you’re near a correct answer.

You mean like a metric? Not all SAT instances have a natural metric, maybe a large fraction could be relaxed to MAXSAT, but it seems parallel solvers are not very popular in that setting either, i.e., there is no parallel track and parallel solvers are disqualified [1] from participating.

I'm not sure I follow how metrics are related to parallelism though. A naive strategy would be to try every assignment in parallel on 2^n processors. There are plenty of better strategies for search space splitting [2] and stochastic local search [3] algorithms that seem amenable to parallelization.

[1]: https://maxsat-evaluations.github.io/2023/rules.html

[2]: http://sat.inesc-id.pt/~ruben/papers/martins-ictai10-talk.pd...

[3]: https://link.springer.com/article/10.1023/A:1006350622830

calf · on May 26, 2023

n is like a million variables so you would need 2^n gazillion cores.

I'm no expert but my thesis advisor is cited in the OP article, so I'm just guessing but it's an interesting question. It's not for the lack of trying in the part of SAT researchers; after the hardware advances with SAT solvers implementation in the early 2000s, they would've looked at parallelism and concurrency as well. But with continuous optimization (like with training machine learning), there's gradient descent which tells you where to guess next. But with Boolean logic there's nothing like that, the needle is either in the next bale of hay, or it isn't.

bmc7505 · on May 26, 2023

The 2^n idea is just a thought experiment, but you should be able to realize linear or close-to-linear speedup using a communication-free algorithm that shards the search space evenly across available cores, no? I don't see why this strategy isn't more widely used.

I can see how continuity accelerates the search for solutions, but I see that more as an orthogonal property, not a limitation of parallelism. Even if your domain is continuous, accelerating gradient descent is nontrivial, and many discrete optimizations problems benefit from parallelism.

klyrs · on May 26, 2023

It is obviously true that the worst-case runtime of a problem on n variables and m clauses is O(m2^n), and that the brute-force algorithm can be sharded out to 2k cores and run in time O(m2^(n-k)). If your approach is brute-force, yes, you might as well use this technique.

But, notice that a shard whose k preset bits are in direct conflict should take zero time -- it's smarter to detect a conflict and terminate than carry on computing all 2^(n-k) invalid assignments. Each such conflicting shard decreases the benefit of your parallelization -- a depth-first solver wouldn't spend almost any time on that particular branch, but you've wasted a whole core on it.

The real question is, what if your algorithm is smarter than brute force? Such algorithms may spend considerable time preprocessing, which is not trivially parallelized, resulting in many cores duplicating tons of work with very little effective speedup.

bmc7505 · on May 26, 2023

It depends on the sharding granularity. If you choose a low sharding ratio, then you waste a whole core, but that's assuming 1:1 shards to cores. If you shatter the search space using a large multiple of the number of cores, e.g. 1000:1, then shuffle the shards and have each core pick a shard off the queue, if any core ever gets stuck with a bad shard (i.e., conflict), it discards and picks a new one off the queue. This requires no communication and ensures all cores stay hot. I guess you might also need some preprocessing to remove trivially-conflicting shards.

klyrs · on May 27, 2023

You're not wrong, but it still only works for the most naive algorithm.

calf · on May 26, 2023

So I went down the rabbit hole a little and it turns out they did try all the stuff, it just hasn't panned out. Yet. It's one of those questions, theoretically it should work, but empirically it doesn't seem to work that compellingly. The PPfolio solver was to answer the question of why not just independently parallelize across cores, and so now it's a benchmark for parallelization efforts. So people are out there keeping tabs on this problem.

Now apparently neither of us read the article, because indeed there are a couple sections where it discusses parallelization. You might want to look at those remarks. For example, there's current research using GPUs and tensor processors to parallelize SAT.

But going back further I think there's a historical context, and the article alludes to why it was not in fashion the preceding decade. Parallel SAT was looked at for a long time, but that really depended on progress in parallel hardware (the transition from Moore's law having to end i.e. unicore to multicore). So in the 2000s when consumer-level multicore CPUs arose, one would imagine that would've been a turning point for running parallel SAT algorithms. But historically, it was in 2001 that sequential solvers also had their implementation breakthrough, i.e. the Chaff solvers were a 10-100x sequential improvement, and experimentally beating the parallel implementations of the time. It was a big deal. So in terms of fashion there was a lot of interest specifically in sequential solvers at the time. Hence the ebb and flow of research focus.

And it makes sense as a justification. From a bird's eye view, if, n >> k, then the subproblems are still large NP-complete problems. Which is a reasonable rationale for firstly investigating the best theory/algorithms for a sequential solver in the first place, one that would be run at the core of every parallel hardware node. As the article points out, Knuth says to have a good algorithm first, before you resort to throwing ever more hardware at the problem. (In some papers, SAT algorithms that are complete as well as deterministic are also offered as theoretical justifications, and the article also touches on why requiring these properties makes parallelization more complex.)

What the new GPUs/Tensors are doing are massively splitting up the space, perhaps so that n !>> k, and this alternative approach is again mentioned in the concluding paragraph. Clearly that seems like a promising area of future research to the authors. And who knows, maybe one day, there's some new breakthrough, that for all intents and purposes makes P = NP.

bmc7505 · on May 27, 2023

Thank you for the detailed reply, this is the first cogent answer I have read that explains why GPU solvers are not as widely used in SAT. The argument that we should focus on getting sequential solvers to run fast, then parallelizing that in hopes that it can be reused on each core does make sense, although I can't help but wonder if there isn't a smarter way to leverage parallelism than simply running CDCL in parallel on n-independent cores.

I guess if I were to summarize your argument, (current) solvers do not use GPUs because there are not many algorithms that can leverage the additional compute (and maybe there is some theoretical barrier?). On the other hand, the continuous optimization folks had neural networks for more than half a century before they figured out how to make them run fast, and were also ridiculed for pursing what was widely believed to be a dead-end. So I would be careful about falling for the "smart people tried really hard for a while" argument.

calf · on May 27, 2023

Yeah if there indeed is some kind of theoretical reason that makes NP-complete problems hard to parallelize, that would be interesting to know about and understand. Maybe somebody has studied this problem.

I think that unlike the story with neural nets, there wasn't some faction trying to say that parallelizing SAT is a bad idea. In fact the state-of-the-art sequential solvers use techniques to exploit properties of CPU caches to get their speedup breakthrough; that in itself was an important lesson within the research community to pay close attention to hardware capabilities. My impression is that ever since that success within their discipline, SAT specialists became interested not just in algorithms/formal theory but also hardware and implementation, and in today's case that would include having a close look at GPUs/TPUs for the possibility of new algorithms.

tgflynn · on May 26, 2023

> there is no parallel track and parallel solvers are disqualified [1] from participating.

Maybe that's as much a cause as an effect. What's the incentive to create and improve parallel solvers if there's no place that evaluates and rewards your work ?

zero_k · on May 26, 2023

No, parallel SAT solvers are NOT disqualified, in fact there is a parallel track. Seriously? Every year there is a winner for the parallel track. Here's last year's:

https://satcompetition.github.io/2022/results.html

bmc7505 · on May 26, 2023

That’s SATComp, I was talking about the MAXSAT competition.

andrewprock · on May 26, 2023

Unless I'm missing something, space splitting should be trivial. If you have two cores and three propositional variables x1, x2, x3, you could simply set x1 to tier on one core and false on the other, and then perform two parallel SAT searches on a space with one less variable.

thesz · on May 27, 2023

What you describe looks like cube-and-conquer technique [1].

[1] https://www.cs.utexas.edu/~marijn/publications/cube.pdf

The thing is much more deeper than what you described. You need to choose which variables you will assign, that's first. Then you have to assign them, and it is important to do as carefully because assignment may produce inbalanced search space.

Splitting on one variable when there are hundredths of thousands of them is very, very inefficient.

c-cube · on May 26, 2023

The difficulty is which variables to pick so that all subspaces are of comparable difficulty. Cube and conquer is a nice paper on that problem.

bmc7505 · on May 26, 2023

Using a naive splitting strategy is trivial, but ensuring the distribution is well-balanced across the subspaces is not. You need to preprocess the formula by setting the variables, then doing some propagation, then restarting, otherwise one core will get stuck doing a bunch of useless work.

thesz · on May 26, 2023

GPU's are not friends with pointer chasing, which is the basis of contemporary CDCL solvers, because internal lazily-updateable structures are multi-linked lists.

They are not friends with bottlenecks like clause queues used in contemporary stochastic SAT solvers.

They can be good at something like survey and/or belief propagation [1].

[1] https://simons.berkeley.edu/talks/belief-survey-propagation

The problem is that survey/belief propagatin is effective in the area of random CNF problems and it is not a complete solution technique. It may compute a solution but it cannot provide you with the evidence that there is no soolution.

bmc7505 · on May 26, 2023

My hunch is that CDCL is probably the wrong approach to take for parallelization due to synchronization issues. You want an algorithm that is communication-free or makes belief propagation wall-clock competitive for large set of instances. It should be possible to modify BP to be asymptotically complete by using a special-purpose PRNG that ensures samples are drawn uniformly without replacement from the search space, but I'm not sure how to make it more sample-efficient.

sanxiyn · on May 26, 2023

It's not just GPU. There are not even good multicore SAT solvers. State of the art SAT algorithms are serial and they don't parallelize.

thesz · on May 26, 2023

ManySAT: http://www.cril.univ-artois.fr/~jabbour/manysat.htm

It shares short conflict clauses between parallel solvers and achieves superlinear speedup in some cases, e.g., 4 parallel solvers solve faster than one forth of the single solver soolution time.

Short conflict clauses are rare so there is little communication between solvers required.

CryptoMiniSAT: https://github.com/msoos/cryptominisat

Author's goal to have solver that is good in computing range from single CPU up to cluster. Judging from CryptoMiniSAT successes, he has mostly reached the goal.

bmc7505 · on May 26, 2023

I was thinking there might be some theoretical barrier to parallelization, e.g., maybe naturally-arising SAT instances have poor empirical scaling constants [1], so sequential solvers with good branch heuristics are pretty close to optimal. There are some portfolio solvers (e.g., [2]) that have shown modest, but not dramatic speedups. Or maybe we just haven't tried hard enough.

[1]: https://en.wikipedia.org/wiki/NC_(complexity)

[2]: https://baldur.iti.kit.edu/hordesat/files/horde.pdf

js8 · on May 26, 2023

You could reduce to 2XSAT and presolve the linear portion. That would paralelize. I think people are not doing it because they are not aware of the reduction.

adrianN · on May 27, 2023

You'd think that the handful of people who devote a significant fraction of their time to writing SAT solver would be aware of such simple tricks and there might be other reasons why your idea isn't being applied.

js8 · on May 27, 2023

I would think so too, but I couldn't find this particular trick anywhere.

Anyway, I believe it's similar to LP - we still use simplex method in practice, despite the fact that LP has a polynomial algorithm.

What likely happens in SAT is even though there actually is (as I believe) a nice polynomial algorithm on the order of O(n^4) or so (which involves repeated solving of linear systems), the CDCL still wins in practical problems simply because in most cases, the upfront cost of solving the equations is higher than trying to find a first solution.

However, not all hope is lost. You could actually presolve a more general problem, and then just rely on propagation to deal with a more specific instance. For instance, for integer factorization, you could presolve a general system for integers of certain bit size, and then if you wanted to factorize a specific integer, you would just plugin constants into the presolved instance. That would eliminate most of the upfront cost and bring it on par with CDCL.

Unfortunately, DIMACS is not all that good format to represent SAT, because it is not composable like this - it doesn't allow for representation of presolved instances. But composability might be a win for a different types of algorithms, which do presolving.

adrianN · on May 28, 2023

If you know a O(n^4) algorithm for SAT, I'm sure a couple of people would be interested in your paper.

js8 · on May 28, 2023

I don't really know the exact algorithm yet, but I have a strong feeling it is possible based on what I see. There are like 3 major steps in the strategy, first one is 2XSAT reduction, second is a proof of refutation completeness (in particular, a form of deduction theorem) of a certain logic that models 2XSAT class, and the last one is polynomial-sized representation of all true statements in that logic (which uses n sets of linear equations, and from that follows the complexity of O(n^4)).

The 2XSAT reduction is 100% correct (I have been using it to solve small integer factoring problems), the deduction theorem proof still has a small flaw which I think is fixable, and the third part is an ongoing research which is trucking along. I will get there eventually, but if more people would be looking at this approach, then we (humanity) could arrive there faster, that's all.

bmc7505 · on May 31, 2023

Just gotta keep on trucking. Either way, I would be interested in seeing how the 2XSAT reduction works. Don't get discouraged!

zero_k · on May 26, 2023

There is a solver that can make effective use of GPUs (shameless self-promotion): https://github.com/nicolasprevot/GpuShareSat

It is a VERY fun work. Code entirely written by Nicolas Prevot, a magician of CUDA. Paper link here: https://comp.nus.edu.sg/~meel/Papers/sat21-psm.pdf

bmc7505 · on May 26, 2023

Oh you're one of the GPUShareSat guys, I forgot about that one. Would you say the main obstacle to scaling to a million processors is the synchronization overhead or are there certain instances from the SATComp benchmark that exhibited poor scaling with increased parallelism? Do you think sharing is essential or do you have any ideas how to improve communication-free strategies, i.e., parallelism without clause sharing using divide-and-conquer or randomized search? Six years ago, your co-author Mate wrote a very pessimistic remark on GPU-based SAT solvers here [1], I'm curious whether he still believes this or if his option was changed during the course of the project and what changed his mind? Thanks!

[1]: https://news.ycombinator.com/item?id=13667565

zero_k · on May 27, 2023

Hahha, I am Mate :) I still think that GPGPU can't help in the core of the algorithm. However, I was pleasantly surprised with Nicolas' work. He is a proper magician, and he had a great idea and made it work. Notice that he didn't make the GPU do propagation/conflict generation. Instead, he used it to better distribute clauses between the threads that do all of that. In a way, he improved clause cleaning. When I saw his work, I was very-very happy.

I still hold that GPGPUs can't do CDCL. However, they may be helpful in some of its sub-parts, or may be able to run another algorithm for solving SAT that we haven't invented yet.

Just my 2 cents,

Mate

GregarianChild · on May 26, 2023

Because all known SAT algorithms rely heavily on data-dependent branching based on non-local data. That makes the existing algorithms slow on GPUs. An algorithmic breakthrough is needed to change this.

cwzwarich · on May 26, 2023

Unit propagation (a major component of CDCL-based SAT solvers) is P-complete, and thus has no effective parallel algorithm.

bmc7505 · on May 26, 2023

Interesting. Do you have a link where I can read more about that? How does this relate to k-SAT or CNF complexity, isn't UP is complete w.r.t. these problems?

geysersam · on May 26, 2023

Parallelizing SAT solvers seems like a strikingly important avenue for future research.

hgsgm · on May 26, 2023

Similar to faster-than-light travel.

geysersam · on May 26, 2023

Really? Does parallel algorithms for SAT contradict any known laws of physics?

fooker · on May 26, 2023

It contradicts mathematics. :)

Unit propagation is P complete.

dgacmu · on May 26, 2023

That's not really a proof that we can't get a nice thing, though. SAT is NP-complete, but here we are talking about it. There may be ways to use parallelism at a higher level. And even if there isn't an alternative to unit propagation, a purely constant factor speedup with parallel processors isn't out of bounds and would still be fantastic.

fooker · on May 26, 2023

>a purely constant factor speedup with parallel processors isn't out of bounds and would still be fantastic.

The state of the art with this is about 2x the speedup with an unbounded number of threads.

That's not bad, but is not very useful because in most use cases of SAT you have multiple distinct problems solvable in parallel anyway.

bmc7505 · on May 26, 2023

Isn’t UP complete with respect to CNF satisfiability though? How does P completeness relate to parallelizability of SAT?

fooker · on May 26, 2023

It's related in the sense that we can guarantee that all threads have to communicate at every step of the algorithm.

It's not a deal breaker, but we really need some unprecedented advance in mathematics to negate this or to solve SAT without this communication.

bmc7505 · on May 26, 2023

I remain skeptical. NC=P? is still an open conjecture, so there could be problems which are decidable in P, but only when using a polynomial number of processors or superpolynomial on a single processor, but polynomial on a polynomial number of processors. Furthermore, the amortized complexity of common instances could turn out to be tractable on average (e.g., random 3-SAT instances are tractable unless they fall in a very narrow region where the ratio of clauses to variables is ~4.25, but NP-complete in the worst case).

fooker · on May 26, 2023

You're not wrong, but people have been trying to do this for a while now without much success.

So much that you can probably get a turing award by making a SAT solver that can be 100-1000x faster on a GPU than on a CPU.

Also that would lead to solving a bunch of NP complete problems 1000x faster, so you can see why that would be a big deal.