Z3 seems like a good dependency for an optimizing compiler, even though it has a pretty large fixed cost for small amounts of work - asking it to do nothing from C++ seems to cost about 4ms, which isn't cheap, but you can batch these queries.
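For what it's worth, the fixed cost is easy to amortize by keeping one context and solver alive and scoping each query with push/pop rather than paying startup per query. A rough sketch of how I'd measure it (hypothetical constraints, Z3's C++ API):

    #include <z3++.h>
    #include <chrono>
    #include <iostream>

    int main() {
        z3::context c;
        z3::solver s(c);
        z3::expr x = c.int_const("x");
        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < 100; ++i) {
            s.push();                   // scope the query
            s.add(x > i && x < i + 2);  // one "batched" query
            s.check();
            s.pop();                    // cheap reset, no fresh solver
        }
        auto t1 = std::chrono::steady_clock::now();
        std::cout << std::chrono::duration<double, std::milli>(t1 - t0).count() / 100
                  << " ms per query\n";
    }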
I'm not sure exactly what data structures LLVM uses to track knowledge about the program it's optimising, but with SMT you can ask the solver very non-trivial questions about the code and either eliminate subtle dead code or generate exploitable invariants (for "free")
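To make the dead-code case concrete, a sketch (hypothetical branch, Z3's C++ API): encode the path condition guarding a branch and ask whether any input satisfies it - unsat means the branch can never execute:

    #include <z3++.h>
    #include <iostream>

    int main() {
        z3::context c;
        z3::expr x = c.int_const("x");
        z3::solver s(c);
        // path condition for a suspicious branch: if (x > 10 && x < 5) { ... }
        s.add(x > 10 && x < 5);
        if (s.check() == z3::unsat)
            std::cout << "branch is unreachable - safe to delete\n";
    }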
Well, it optimizes in the sense that Rust does - by guiding the developer toward writing code that can be optimized. Comparing them is comparing apples to oranges, however; there are many ways to get there.

With Z3 you can do symbolic testing, and it's very much about checking whether the expected output of the constructed IR is the same as the real output. When it comes to the IR, I like to think of Z3 as almost an abstract assembler. Usually a separate language altogether is assembled via the IR that's produced based on the rules given to the SMT solver. LLVM optimization occurs after it's been "assembled", but it's the rules given to the SMT solver that guide the developer to write code which LLVM can optimize better. Otherwise you could just write anything and hope that LLVM optimizes it.

Personally, I wouldn't want an extra application that tries to silently fix things beyond what LLVM does with the IR, because it would feel compulsory for everything. There should be an end goal for what it's optimizing and why, in the form of a proof or lemma, for it to make sense in the context of Z3. So yes, definitely possible, but probably more useful if you're building a new compiler for a new platform rather than trying to do better than an existing one.
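Concretely, the "expected output equals real output" check is just an unsatisfiability query. A toy translation-validation sketch (hypothetical strength-reduction rewrite, C++ API): ask for any input on which the two sides differ:

    #include <z3++.h>
    #include <iostream>

    int main() {
        z3::context c;
        z3::expr x = c.bv_const("x", 32);
        z3::expr before = x * 2;                        // IR before the rewrite
        z3::expr after  = z3::shl(x, c.bv_val(1, 32));  // IR after: x << 1
        z3::solver s(c);
        s.add(before != after);                         // search for a counterexample
        // unsat: no input distinguishes the two, so the rewrite is sound
        std::cout << (s.check() == z3::unsat ? "equivalent" : "not equivalent") << "\n";
    }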
The aim here is really to get the optimisations for free rather than building an ad hoc system that does the same thing. There is a large gap in the compiler literature for optimisations like this - the field needs a new 1000-pager at the moment, given that all the standard books are older than modern microarchitectures now, let alone the more theoretical things mentioned previously.
You can already get some serious performance increases by giving the compiler information - if I assert that two arrays have equal length in D, I can then add an inline statement equivalent to GCC's __builtin_unreachable, and the compiler does a surprisingly good job of eliding length checks it normally couldn't assume away.
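For anyone who wants the C++ spelling of that trick, a sketch (whether the check actually folds away depends on the compiler and flags - treat it as illustrative, not guaranteed):

    #include <vector>
    #include <cstddef>

    // if p is false this is UB, so the optimizer may treat p as always true
    inline void assume(bool p) { if (!p) __builtin_unreachable(); }

    int dot(const std::vector<int>& a, const std::vector<int>& b) {
        assume(a.size() == b.size());  // the fact the compiler can't infer
        int s = 0;
        for (std::size_t i = 0; i < a.size(); ++i)
            s += a[i] * b.at(i);       // at()'s range check is now provably dead
        return s;
    }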
It sounds nice for optimization in theory, but in practice production compilers already have well-oiled optimization pipelines with mostly "flat" performance characteristics (i.e. all the low-hanging fruit is gone), and the algorithms are all very well understood because most of the fundamentals really haven't changed in ~20 years or more. The 80/20 rule mostly still applies.
TBF this isn't always true. Unison[1] characterizes register allocation as geometric packing, a solution which is very robust on irregular architectures like Hexagon. GPUs have problems like extremely large register files that the literature doesn't treat very well which some solvers might help with, etc. And fully automated solvers are useful as complementary tools for any compiler, like Souper or Alive2. They're certainly good for specialized tools, especially validators.
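To make the allocation-as-constraint-solving idea concrete, here's a toy sketch (nothing like Unison's actual model, which handles coalescing, spilling, bundling, etc.) - a k-coloring of an interference graph posed directly to Z3:

    #include <z3++.h>
    #include <iostream>

    int main() {
        const int k = 3;  // registers available
        z3::context c;
        z3::expr a = c.int_const("a"), b = c.int_const("b"),
                 d = c.int_const("d"), e = c.int_const("e");
        z3::solver s(c);
        for (const auto& v : {a, b, d, e})
            s.add(v >= 0 && v < k);  // each variable gets one of k registers
        // interference edges: live at the same time, must not share a register
        s.add(a != b); s.add(b != d); s.add(d != e); s.add(a != d);
        if (s.check() == z3::sat) {
            z3::model m = s.get_model();
            std::cout << "a=" << m.eval(a) << " b=" << m.eval(b)
                      << " d=" << m.eval(d) << " e=" << m.eval(e) << "\n";
        } else {
            std::cout << "no " << k << "-coloring - spill something\n";
        }
    }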
Personally, I think that to prototype and construct a general program analysis quickly, you're better off with a tool like a Datalog variant - Souffle is pretty good (see the sketch after this paragraph). If you want to build generalized optimizers cheaper and faster, especially ones that can explore emergent optimizations that heuristics may not discover, approaches like equality saturation might be a better starting point[2] (it's kind of like a generalized version of a global cost-based SQL optimizer - the engine implicitly combines small rules into more powerful ones). Specialized solvers for particular logics or domains are still going to be valuable beyond that, of course. If you're writing a production compiler, your work is still cut out for you no matter what.
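For anyone who hasn't used one: a Datalog engine is "just" rules run to a fixpoint, which is why it's so quick for prototyping analyses. The rule reach(X,Z) :- edge(X,Y), reach(Y,Z) is, semantically, nothing more than this naive loop (Souffle compiles it to far better code than this sketch):

    #include <set>
    #include <utility>
    #include <cstdio>

    int main() {
        std::set<std::pair<int,int>> edge = {{1,2},{2,3},{3,4}};  // toy facts
        std::set<std::pair<int,int>> reach(edge.begin(), edge.end());  // reach(X,Y) :- edge(X,Y).
        bool changed = true;
        while (changed) {  // iterate to fixpoint
            changed = false;
            for (auto [x, y] : edge)          // reach(X,Z) :- edge(X,Y), reach(Y,Z).
                for (auto [y2, z] : reach)
                    if (y == y2 && reach.insert({x, z}).second)
                        changed = true;
        }
        for (auto [x, y] : reach) std::printf("reach(%d, %d)\n", x, y);
    }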
If someone smart out there could "just" write an SMT solver and a robust Datalog that are efficient and as easy to integrate as Lua ("drop in the C files and go"), I'd fund that work.
My scheme would really be a cheap way of getting some symbolic optimisations for free during the early days of a compiler - I don't think it's a realistic strategy for all compilation.
Lean 4 is a functional programming language and theorem prover that compiles to C. A lot of research went into making Lean 4 blazingly fast https://leanprover.github.io/publications/
Galois, Inc. reproduced benchmarks from the Lean 4 developers showing that Lean 4 regularly outperforms the C++ stdlib.
This is an awesome mix of raw speed, functional programming, and formal software verification.
First time I've seen Z3 in the wild. Bonus points for the CLI-friendly model checker, but lisp syntax... is less legible than TLA+ imho. Gotta give it a try someday to see for myself if the experience of creating the spec is better.
> but lisp syntax... is less legible than TLA+ imho.
I've developed theorem provers with more natural syntax, and in retrospect simpler grammars are better. The stone-cold truth is that most people aren't interested in using theorem provers directly; instead, they're programmatically generating queries. Simpler grammar = easier to generate, and that outweighs direct usability.
In my next system I'll be supporting both types of grammars in the parser.
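And the lisp syntax is exactly what makes the generation side painless. A hypothetical sketch of a program emitting SMT-LIB rather than a human writing it (pipe the output into `z3 -in`):

    #include <string>
    #include <iostream>

    // build an s-expression: app(">", "x", "0") => "(> x 0)"
    std::string app(const std::string& op, const std::string& a, const std::string& b) {
        return "(" + op + " " + a + " " + b + ")";
    }

    int main() {
        std::cout << "(declare-const x Int)\n"
                  << "(assert " << app("and", app(">", "x", "0"), app("<", "x", "10")) << ")\n"
                  << "(check-sat)\n(get-model)\n";
        // feed this to `z3 -in` and it prints sat plus a model for x
    }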
Z3 is intended as more of a backend for higher-level languages, like TLA+, to use in model checking. I would be surprised if anyone were writing specs in the input language directly.
I’m just starting to investigate using formal analysis in practice, and I think a much more practical topic is the symbolic execution frameworks, which can analyze programs and feed the resulting constraints to solvers like Z3.
The most useful and accessible symbolic execution package I’ve found so far is KLEE https://klee.github.io/
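For anyone starting out, the basic KLEE workflow is tiny (this mirrors the first KLEE tutorial; the branch condition is just a made-up example):

    /* harness.c - roughly: clang -I <klee include dir> -g -emit-llvm -c harness.c
       then: klee harness.bc */
    #include <klee/klee.h>
    #include <assert.h>

    int check(int x) {
        if (x > 100 && x % 7 == 3)  /* hypothetical "interesting" path */
            return 0;
        return 1;
    }

    int main(void) {
        int x;
        klee_make_symbolic(&x, sizeof(x), "x");  /* x ranges over all ints */
        assert(check(x) == 1);  /* KLEE emits a concrete x that trips this */
        return 0;
    }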
If anyone else has recommendations for tools or beginner material I’m all ears!
Nielson & Nielson have the standard textbook on static program analysis, the one used everywhere. But it's quite unfriendly, as it's written using abstract algebra. They've recently released two textbooks that are much gentler - actually, I'd say they're easy-going and fun but still retain all the mathematical rigor.
They use program graphs, which are a bit less general but a lot easier to digest. They cover all the major techniques, including theorem proving, static analysis, model checking, abstract interpretation, type and effect systems, etc.
There's also a companion website with some F# code. The second book, which still seems unfinished, discusses how to implement program analyses using Datalog. That speeds up development quite a lot; otherwise, developing your own static analyzer is a lot of work.
My dream is to implement some kind of framework that enables quick DSL creation along with lightweight formal methods support to verify programs written in each DSL. I think restricted semantics is the key to make formal methods practical. Quoting Alan Perlis, "Beware of the Turing tar-pit in which everything is possible but nothing of interest is easy."
>My dream is to implement some kind of framework that enables quick DSL creation along with lightweight formal methods support to verify programs written in each DSL.
There's work in this area using monads. Specifically, Darais (from Galois) et al. show in "Abstracting Definitional Interpreters" how, given a definitional interpreter, you can easily create all sorts of abstractions using a stack of monad transformers. The best part is that your particular chosen stack remains valid when moved between interpreters for different languages.
Your dream of varied static analysis can be achieved using monad transformers, definitional interpreters written in the required style, and Racket's DSL-creation system.
Another option: the "reversing" challenges in infosec Capture The Flag competitions regularly require you to work out which constraints a program implements and then plug them into Z3 or Angr to find a solution - that might be a fun way to learn.
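A toy version of that workflow, assuming a hypothetical crackme that xors each flag byte with 0x42 and compares against a baked-in table: express the check as bitvector constraints and read the flag out of the model.

    #include <z3++.h>
    #include <iostream>
    #include <string>

    int main() {
        const unsigned char target[] = {0x24, 0x2e, 0x23, 0x25};  // dumped from the binary
        z3::context c;
        z3::solver s(c);
        z3::expr_vector flag(c);
        for (int i = 0; i < 4; ++i) {
            z3::expr b = c.bv_const(("b" + std::to_string(i)).c_str(), 8);
            flag.push_back(b);
            // the check the binary performs: input[i] ^ 0x42 == target[i]
            s.add((b ^ c.bv_val(0x42, 8)) == c.bv_val(target[i], 8));
        }
        if (s.check() == z3::sat) {
            z3::model m = s.get_model();
            for (unsigned i = 0; i < flag.size(); ++i)
                std::cout << (char)m.eval(flag[i]).get_numeral_int();
            std::cout << "\n";  // prints "flag"
        }
    }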
That was interesting - I'd never seen Z3 applied to the verification of networking and cryptographic algorithms.
My team similarly used Z3 to reverse-generate data from SQL queries so that erroneous mutations of the queries get detected early on: http://www.cse.iitb.ac.in/infolab/xdata/
The age-old problem of verification vs. validation. Verification is about whether or not you built the thing right; validation is about whether or not you built the right thing. This article is about using Z3 to pursue the former; the latter is an entirely different issue.
This is, basically, an unsolvable philosophical problem. You are the only one who can bridge the infinite lacuna between the idea in your head and its properties encoded in an actual model. There are all sorts of validity tests you can add to ensure it matches your vision, of course, but any attempt at solving this problem would just look like an even higher-level language.
This is called validation, and it's much harder epistemically than verification.
Is it possible to check at runtime that the model is accurate?
Do you have a sensible thing to do if the model isn't accurate (e.g., fallback into "failsafe" mode, let an on-call engineer know that an assumption was violated, etc.)?
Which, in turn, requires trusting or proving the soundness of your program generator and only proves that your incorrect original program is exactly as incorrect as the model that verifies it.
Automating the stronger proof (that the model is exactly as correct as the original program, and that the model is correct) hasn't been solved in the general case, to the best of my knowledge.
Evaluating the accuracy of a model is an unsolved problem with both formal and informal approaches.
Formally: we could ask multiple separate model/proof assistants to generate separate models from the same underlying specification, and then attempt to find discrepancies between their predicted results. This really just punts the responsibility: now we're relying on the accuracy of the abstract specification, rather than the model(s) automatically or manually produced from it. It's also not sound; it just allows us to feel more confident.
Informally: we can have a lot of different people look at the model very closely, and produce test vectors for the model based on human predictions of behavior.
You could generate a set of random vectors that span the entire input space, exercise the system with those vectors, and publish some sort of "accuracy" (e.g. generate random vectors through i.i.d. uniform R.V.s over the input space, evaluate f(input), and model the successes with a hierarchical binomial distribution).

Remember, though, that most of the time we build a model precisely to capture _edge cases_; the "happy path" of the program is simple to test, and edge cases are, by their nature, rare occurrences. As a trivial example, consider the boolean function f(x, y) = x & y. f evaluates to 0 for every value of (x, y) except (1, 1). If we were to model it as f_model = 0, the model would appear to agree with f 75% of the time. With a sufficiently large input state space, it would be quite feasible to hide essential edge cases behind agreement rates above 99.5%.
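A dozen lines make the point; here's a sketch of the sampling estimate for the toy example above (the agreement rate converges to ~75% while the model misses the only input that matters):

    #include <random>
    #include <iostream>

    int main() {
        std::mt19937 rng(42);
        std::bernoulli_distribution bit(0.5);  // i.i.d. uniform input bits
        int agree = 0;
        const int n = 100000;
        for (int i = 0; i < n; ++i) {
            bool x = bit(rng), y = bit(rng);
            bool f = x && y;        // the real program: f(x, y) = x & y
            bool f_model = false;   // the (wrong) model: f_model = 0
            if (f == f_model) ++agree;
        }
        // prints ~75% "accuracy" despite the model being useless at (1, 1)
        std::cout << "agreement: " << 100.0 * agree / n << "%\n";
    }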