"Is progress in SAT solving the sole result of hardware advancement? Time Leap Challenge compared 20-year-old SAT solvers on new computer hardware and modern SAT solvers on 20-year-old computer hardware. Although hardware improvements make old solvers faster, algorithmic progress dominates and drives today's SAT solving."*
Pretty cool given computing progress over the last 20 years:
1. CPUs sped up 40-60x
2. GPU FLOPS/$ increased ~10,000x
A fun exercise is to write a series of sat solvers: brute force, DPLL and CDCL you really get an immediate feel for the strength of the algorithms and none of these implementations need to go much beyond 50-100 lines in a modern language.
You can then of course spend months tuning and optimizing the hell out of your code to squeeze a further 10x performance.
Excellent link. I would love to see a CDCL implementation in a 100 (understandable) lines. From what I see in the repo, creusat is probably more like 1000?
I found the description of CDCL as an abstract rewrite system illuminating. It's much shorter than an implementation. See e.g. [1]. There is/was a more readable version online, but I can't find it now.
This one's 250 understandable lines: https://github.com/marijnheule/microsat. You could probably get it to below 200 lines if you removed the restart logic and learned clause compaction, but it's actually a surprisingly competitive solver as is.
And yet, despite tremendous progress accelerating continuous optimization workloads, there are still no competitive GPU-based SAT solvers. I wonder why?
Super short answer: SAT doesn’t exhibit the nice properties other optimization problems often have where you can tell when you’re near a correct answer. So it doesn’t resemble most continuous optimization problems. Regarding parallelism, there has been research into parallel SAT, but it’s hard in part due to the problem of sharing information between threads/tasks in conflict clause learning algorithms. I don’t remember specific papers but a quick search on google scholar or looking at past SAT competitions would turn up material if you’re interested in the topic.
> SAT doesn’t exhibit the nice properties other optimization problems often have where you can tell when you’re near a correct answer.
You mean like a metric? Not all SAT instances have a natural metric, maybe a large fraction could be relaxed to MAXSAT, but it seems parallel solvers are not very popular in that setting either, i.e., there is no parallel track and parallel solvers are disqualified [1] from participating.
I'm not sure I follow how metrics are related to parallelism though. A naive strategy would be to try every assignment in parallel on 2^n processors. There are plenty of better strategies for search space splitting [2] and stochastic local search [3] algorithms that seem amenable to parallelization.
n is like a million variables so you would need 2^n gazillion cores.
I'm no expert but my thesis advisor is cited in the OP article, so I'm just guessing but it's an interesting question. It's not for the lack of trying in the part of SAT researchers; after the hardware advances with SAT solvers implementation in the early 2000s, they would've looked at parallelism and concurrency as well. But with continuous optimization (like with training machine learning), there's gradient descent which tells you where to guess next. But with Boolean logic there's nothing like that, the needle is either in the next bale of hay, or it isn't.
The 2^n idea is just a thought experiment, but you should be able to realize linear or close-to-linear speedup using a communication-free algorithm that shards the search space evenly across available cores, no? I don't see why this strategy isn't more widely used.
I can see how continuity accelerates the search for solutions, but I see that more as an orthogonal property, not a limitation of parallelism. Even if your domain is continuous, accelerating gradient descent is nontrivial, and many discrete optimizations problems benefit from parallelism.
It is obviously true that the worst-case runtime of a problem on n variables and m clauses is O(m2^n), and that the brute-force algorithm can be sharded out to 2k cores and run in time O(m2^(n-k)). If your approach is brute-force, yes, you might as well use this technique.
But, notice that a shard whose k preset bits are in direct conflict should take zero time -- it's smarter to detect a conflict and terminate than carry on computing all 2^(n-k) invalid assignments. Each such conflicting shard decreases the benefit of your parallelization -- a depth-first solver wouldn't spend almost any time on that particular branch, but you've wasted a whole core on it.
The real question is, what if your algorithm is smarter than brute force? Such algorithms may spend considerable time preprocessing, which is not trivially parallelized, resulting in many cores duplicating tons of work with very little effective speedup.
It depends on the sharding granularity. If you choose a low sharding ratio, then you waste a whole core, but that's assuming 1:1 shards to cores. If you shatter the search space using a large multiple of the number of cores, e.g. 1000:1, then shuffle the shards and have each core pick a shard off the queue, if any core ever gets stuck with a bad shard (i.e., conflict), it discards and picks a new one off the queue. This requires no communication and ensures all cores stay hot. I guess you might also need some preprocessing to remove trivially-conflicting shards.
So I went down the rabbit hole a little and it turns out they did try all the stuff, it just hasn't panned out. Yet. It's one of those questions, theoretically it should work, but empirically it doesn't seem to work that compellingly. The PPfolio solver was to answer the question of why not just independently parallelize across cores, and so now it's a benchmark for parallelization efforts. So people are out there keeping tabs on this problem.
Now apparently neither of us read the article, because indeed there are a couple sections where it discusses parallelization. You might want to look at those remarks. For example, there's current research using GPUs and tensor processors to parallelize SAT.
But going back further I think there's a historical context, and the article alludes to why it was not in fashion the preceding decade. Parallel SAT was looked at for a long time, but that really depended on progress in parallel hardware (the transition from Moore's law having to end i.e. unicore to multicore). So in the 2000s when consumer-level multicore CPUs arose, one would imagine that would've been a turning point for running parallel SAT algorithms. But historically, it was in 2001 that sequential solvers also had their implementation breakthrough, i.e. the Chaff solvers were a 10-100x sequential improvement, and experimentally beating the parallel implementations of the time. It was a big deal. So in terms of fashion there was a lot of interest specifically in sequential solvers at the time. Hence the ebb and flow of research focus.
And it makes sense as a justification. From a bird's eye view, if, n >> k, then the subproblems are still large NP-complete problems. Which is a reasonable rationale for firstly investigating the best theory/algorithms for a sequential solver in the first place, one that would be run at the core of every parallel hardware node. As the article points out, Knuth says to have a good algorithm first, before you resort to throwing ever more hardware at the problem. (In some papers, SAT algorithms that are complete as well as deterministic are also offered as theoretical justifications, and the article also touches on why requiring these properties makes parallelization more complex.)
What the new GPUs/Tensors are doing are massively splitting up the space, perhaps so that n !>> k, and this alternative approach is again mentioned in the concluding paragraph. Clearly that seems like a promising area of future research to the authors. And who knows, maybe one day, there's some new breakthrough, that for all intents and purposes makes P = NP.
Thank you for the detailed reply, this is the first cogent answer I have read that explains why GPU solvers are not as widely used in SAT. The argument that we should focus on getting sequential solvers to run fast, then parallelizing that in hopes that it can be reused on each core does make sense, although I can't help but wonder if there isn't a smarter way to leverage parallelism than simply running CDCL in parallel on n-independent cores.
I guess if I were to summarize your argument, (current) solvers do not use GPUs because there are not many algorithms that can leverage the additional compute (and maybe there is some theoretical barrier?). On the other hand, the continuous optimization folks had neural networks for more than half a century before they figured out how to make them run fast, and were also ridiculed for pursing what was widely believed to be a dead-end. So I would be careful about falling for the "smart people tried really hard for a while" argument.
Yeah if there indeed is some kind of theoretical reason that makes NP-complete problems hard to parallelize, that would be interesting to know about and understand. Maybe somebody has studied this problem.
I think that unlike the story with neural nets, there wasn't some faction trying to say that parallelizing SAT is a bad idea. In fact the state-of-the-art sequential solvers use techniques to exploit properties of CPU caches to get their speedup breakthrough; that in itself was an important lesson within the research community to pay close attention to hardware capabilities. My impression is that ever since that success within their discipline, SAT specialists became interested not just in algorithms/formal theory but also hardware and implementation, and in today's case that would include having a close look at GPUs/TPUs for the possibility of new algorithms.
> there is no parallel track and parallel solvers are disqualified [1] from participating.
Maybe that's as much a cause as an effect. What's the incentive to create and improve parallel solvers if there's no place that evaluates and rewards your work ?
No, parallel SAT solvers are NOT disqualified, in fact there is a parallel track. Seriously? Every year there is a winner for the parallel track. Here's last year's:
Unless I'm missing something, space splitting should be trivial. If you have two cores and three propositional variables x1, x2, x3, you could simply set x1 to tier on one core and false on the other, and then perform two parallel SAT searches on a space with one less variable.
The thing is much more deeper than what you described. You need to choose which variables you will assign, that's first. Then you have to assign them, and it is important to do as carefully because assignment may produce inbalanced search space.
Splitting on one variable when there are hundredths of thousands of them is very, very inefficient.
Using a naive splitting strategy is trivial, but ensuring the distribution is well-balanced across the subspaces is not. You need to preprocess the formula by setting the variables, then doing some propagation, then restarting, otherwise one core will get stuck doing a bunch of useless work.
GPU's are not friends with pointer chasing, which is the basis of contemporary CDCL solvers, because internal lazily-updateable structures are multi-linked lists.
They are not friends with bottlenecks like clause queues used in contemporary stochastic SAT solvers.
They can be good at something like survey and/or belief propagation [1].
The problem is that survey/belief propagatin is effective in the area of random CNF problems and it is not a complete solution technique. It may compute a solution but it cannot provide you with the evidence that there is no soolution.
My hunch is that CDCL is probably the wrong approach to take for parallelization due to synchronization issues. You want an algorithm that is communication-free or makes belief propagation wall-clock competitive for large set of instances. It should be possible to modify BP to be asymptotically complete by using a special-purpose PRNG that ensures samples are drawn uniformly without replacement from the search space, but I'm not sure how to make it more sample-efficient.
It shares short conflict clauses between parallel solvers and achieves superlinear speedup in some cases, e.g., 4 parallel solvers solve faster than one forth of the single solver soolution time.
Short conflict clauses are rare so there is little communication between solvers required.
Author's goal to have solver that is good in computing range from single CPU up to cluster. Judging from CryptoMiniSAT successes, he has mostly reached the goal.
I was thinking there might be some theoretical barrier to parallelization, e.g., maybe naturally-arising SAT instances have poor empirical scaling constants [1], so sequential solvers with good branch heuristics are pretty close to optimal. There are some portfolio solvers (e.g., [2]) that have shown modest, but not dramatic speedups. Or maybe we just haven't tried hard enough.
You could reduce to 2XSAT and presolve the linear portion. That would paralelize. I think people are not doing it because they are not aware of the reduction.
You'd think that the handful of people who devote a significant fraction of their time to writing SAT solver would be aware of such simple tricks and there might be other reasons why your idea isn't being applied.
I would think so too, but I couldn't find this particular trick anywhere.
Anyway, I believe it's similar to LP - we still use simplex method in practice, despite the fact that LP has a polynomial algorithm.
What likely happens in SAT is even though there actually is (as I believe) a nice polynomial algorithm on the order of O(n^4) or so (which involves repeated solving of linear systems), the CDCL still wins in practical problems simply because in most cases, the upfront cost of solving the equations is higher than trying to find a first solution.
However, not all hope is lost. You could actually presolve a more general problem, and then just rely on propagation to deal with a more specific instance. For instance, for integer factorization, you could presolve a general system for integers of certain bit size, and then if you wanted to factorize a specific integer, you would just plugin constants into the presolved instance. That would eliminate most of the upfront cost and bring it on par with CDCL.
Unfortunately, DIMACS is not all that good format to represent SAT, because it is not composable like this - it doesn't allow for representation of presolved instances. But composability might be a win for a different types of algorithms, which do presolving.
I don't really know the exact algorithm yet, but I have a strong feeling it is possible based on what I see. There are like 3 major steps in the strategy, first one is 2XSAT reduction, second is a proof of refutation completeness (in particular, a form of deduction theorem) of a certain logic that models 2XSAT class, and the last one is polynomial-sized representation of all true statements in that logic (which uses n sets of linear equations, and from that follows the complexity of O(n^4)).
The 2XSAT reduction is 100% correct (I have been using it to solve small integer factoring problems), the deduction theorem proof still has a small flaw which I think is fixable, and the third part is an ongoing research which is trucking along. I will get there eventually, but if more people would be looking at this approach, then we (humanity) could arrive there faster, that's all.
Oh you're one of the GPUShareSat guys, I forgot about that one. Would you say the main obstacle to scaling to a million processors is the synchronization overhead or are there certain instances from the SATComp benchmark that exhibited poor scaling with increased parallelism? Do you think sharing is essential or do you have any ideas how to improve communication-free strategies, i.e., parallelism without clause sharing using divide-and-conquer or randomized search? Six years ago, your co-author Mate wrote a very pessimistic remark on GPU-based SAT solvers here [1], I'm curious whether he still believes this or if his option was changed during the course of the project and what changed his mind? Thanks!
Hahha, I am Mate :) I still think that GPGPU can't help in the core of the algorithm. However, I was pleasantly surprised with Nicolas' work. He is a proper magician, and he had a great idea and made it work. Notice that he didn't make the GPU do propagation/conflict generation. Instead, he used it to better distribute clauses between the threads that do all of that. In a way, he improved clause cleaning. When I saw his work, I was very-very happy.
I still hold that GPGPUs can't do CDCL. However, they may be helpful in some of its sub-parts, or may be able to run another algorithm for solving SAT that we haven't invented yet.
Because all known SAT algorithms rely heavily on data-dependent branching based on non-local data. That makes the existing algorithms slow on GPUs. An algorithmic breakthrough is needed to change this.
Interesting. Do you have a link where I can read more about that? How does this relate to k-SAT or CNF complexity, isn't UP is complete w.r.t. these problems?
That's not really a proof that we can't get a nice thing, though. SAT is NP-complete, but here we are talking about it. There may be ways to use parallelism at a higher level. And even if there isn't an alternative to unit propagation, a purely constant factor speedup with parallel processors isn't out of bounds and would still be fantastic.
I remain skeptical. NC=P? is still an open conjecture, so there could be problems which are decidable in P, but only when using a polynomial number of processors or superpolynomial on a single processor, but polynomial on a polynomial number of processors. Furthermore, the amortized complexity of common instances could turn out to be tractable on average (e.g., random 3-SAT instances are tractable unless they fall in a very narrow region where the ratio of clauses to variables is ~4.25, but NP-complete in the worst case).
Pretty cool given computing progress over the last 20 years:
Sources: *Quote shortened for brevity