Z3 seems like a good dependency for an optimizing compiler, even though it has a pretty large fixed cost for small amounts of work - asking it to do nothing from C++ seems to cost about 4ms, which isn't cheap, but you can batch these queries.
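For what it's worth, the fixed cost is easy to amortize by keeping one context and solver alive and scoping each query with push/pop rather than paying startup per query. A rough sketch of how I'd measure it (hypothetical constraints, Z3's C++ API):

    #include <z3++.h>
    #include <chrono>
    #include <iostream>

    int main() {
        z3::context c;
        z3::solver s(c);
        z3::expr x = c.int_const("x");
        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < 100; ++i) {
            s.push();                   // scope the query
            s.add(x > i && x < i + 2);  // one "batched" query
            s.check();
            s.pop();                    // cheap reset, no fresh solver
        }
        auto t1 = std::chrono::steady_clock::now();
        std::cout << std::chrono::duration<double, std::milli>(t1 - t0).count() / 100
                  << " ms per query\n";
    }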
I'm not sure exactly what data structures LLVM uses to track knowledge about the program it's optimising, but with SMT you can ask the solver very non-trivial questions about the code and either eliminate subtle dead code or generate exploitable invariants (for "free")
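To make the dead-code case concrete, a sketch (hypothetical branch, Z3's C++ API): encode the path condition guarding a branch and ask whether any input satisfies it - unsat means the branch can never execute:

    #include <z3++.h>
    #include <iostream>

    int main() {
        z3::context c;
        z3::expr x = c.int_const("x");
        z3::solver s(c);
        // path condition for a suspicious branch: if (x > 10 && x < 5) { ... }
        s.add(x > 10 && x < 5);
        if (s.check() == z3::unsat)
            std::cout << "branch is unreachable - safe to delete\n";
    }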
Well, it optimizes in the sense that Rust does - by guiding the developer toward writing code that can be optimized. Comparing them is comparing apples to oranges, however; there are many ways to get there.

With Z3 you can do symbolic testing, and it's very much about checking whether the expected output of the constructed IR is the same as the real output. When it comes to the IR, I like to think of Z3 as almost an abstract assembler. Usually a separate language altogether is assembled via the IR that's produced based on the rules given to the SMT solver. LLVM optimization occurs after it's been "assembled", but it's the rules given to the SMT solver that guide the developer to write code which LLVM can optimize better. Otherwise you could just write anything and hope that LLVM optimizes it.

Personally, I wouldn't want an extra application that tries to silently fix things beyond what LLVM does with the IR, because it would feel compulsory for everything. There should be an end goal for what it's optimizing and why, in the form of a proof or lemma, for it to make sense in the context of Z3. So yes, definitely possible, but probably more useful if you're building a new compiler for a new platform rather than trying to do better than an existing one.
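Concretely, the "expected output equals real output" check is just an unsatisfiability query. A toy translation-validation sketch (hypothetical strength-reduction rewrite, C++ API): ask for any input on which the two sides differ:

    #include <z3++.h>
    #include <iostream>

    int main() {
        z3::context c;
        z3::expr x = c.bv_const("x", 32);
        z3::expr before = x * 2;                        // IR before the rewrite
        z3::expr after  = z3::shl(x, c.bv_val(1, 32));  // IR after: x << 1
        z3::solver s(c);
        s.add(before != after);                         // search for a counterexample
        // unsat: no input distinguishes the two, so the rewrite is sound
        std::cout << (s.check() == z3::unsat ? "equivalent" : "not equivalent") << "\n";
    }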
The aim here is really to get the optimisations for free rather than building an ad hoc system that does the same thing. There is a large gap in the compiler literature for optimisations like this - the field needs a new 1000-pager at the moment, given that all the standard books are older than modern microarchitectures now, let alone the more theoretical things mentioned previously.
You can already get some serious performance increases by giving the compiler information - if I assert that two arrays have equal length in D, I can then add an inline statement equivalent to GCC's __builtin_unreachable, and the compiler does a surprisingly good job of eliding length checks it normally couldn't assume away.
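For anyone who wants the C++ spelling of that trick, a sketch (whether the check actually folds away depends on the compiler and flags - treat it as illustrative, not guaranteed):

    #include <vector>
    #include <cstddef>

    // if p is false this is UB, so the optimizer may treat p as always true
    inline void assume(bool p) { if (!p) __builtin_unreachable(); }

    int dot(const std::vector<int>& a, const std::vector<int>& b) {
        assume(a.size() == b.size());  // the fact the compiler can't infer
        int s = 0;
        for (std::size_t i = 0; i < a.size(); ++i)
            s += a[i] * b.at(i);       // at()'s range check is now provably dead
        return s;
    }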
It sounds nice for optimization in theory, but in practice production compilers already have well-oiled optimization pipelines with mostly "flat" performance characteristics (i.e. all the low-hanging fruit is gone), and the algorithms are all very well understood because most of the fundamentals really haven't changed in ~20 years or more. The 80/20 rule mostly still applies.
TBF this isn't always true. Unison[1] characterizes register allocation as geometric packing, a solution which is very robust on irregular architectures like Hexagon. GPUs have problems like extremely large register files that the literature doesn't treat very well which some solvers might help with, etc. And fully automated solvers are useful as complementary tools for any compiler, like Souper or Alive2. They're certainly good for specialized tools, especially validators.
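To make the allocation-as-constraint-solving idea concrete, here's a toy sketch (nothing like Unison's actual model, which handles coalescing, spilling, bundling, etc.) - a k-coloring of an interference graph posed directly to Z3:

    #include <z3++.h>
    #include <iostream>

    int main() {
        const int k = 3;  // registers available
        z3::context c;
        z3::expr a = c.int_const("a"), b = c.int_const("b"),
                 d = c.int_const("d"), e = c.int_const("e");
        z3::solver s(c);
        for (const auto& v : {a, b, d, e})
            s.add(v >= 0 && v < k);  // each variable gets one of k registers
        // interference edges: live at the same time, must not share a register
        s.add(a != b); s.add(b != d); s.add(d != e); s.add(a != d);
        if (s.check() == z3::sat) {
            z3::model m = s.get_model();
            std::cout << "a=" << m.eval(a) << " b=" << m.eval(b)
                      << " d=" << m.eval(d) << " e=" << m.eval(e) << "\n";
        } else {
            std::cout << "no " << k << "-coloring - spill something\n";
        }
    }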
Personally, I think that to prototype and construct a general program analysis quickly, you're better off with a tool like a Datalog variant - Souffle is pretty good (see the sketch after this paragraph). If you want to build generalized optimizers cheaper and faster, especially ones that can explore emergent optimizations that heuristics may not discover, approaches like equality saturation might be a better starting point[2] (it's kind of like a generalized version of a global cost-based SQL optimizer - the engine implicitly combines small rules into more powerful ones). Specialized solvers for particular logics or domains are still going to be valuable beyond that, of course. If you're writing a production compiler, your work is still cut out for you no matter what.
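For anyone who hasn't used one: a Datalog engine is "just" rules run to a fixpoint, which is why it's so quick for prototyping analyses. The rule reach(X,Z) :- edge(X,Y), reach(Y,Z) is, semantically, nothing more than this naive loop (Souffle compiles it to far better code than this sketch):

    #include <set>
    #include <utility>
    #include <cstdio>

    int main() {
        std::set<std::pair<int,int>> edge = {{1,2},{2,3},{3,4}};  // toy facts
        std::set<std::pair<int,int>> reach(edge.begin(), edge.end());  // reach(X,Y) :- edge(X,Y).
        bool changed = true;
        while (changed) {  // iterate to fixpoint
            changed = false;
            for (auto [x, y] : edge)          // reach(X,Z) :- edge(X,Y), reach(Y,Z).
                for (auto [y2, z] : reach)
                    if (y == y2 && reach.insert({x, z}).second)
                        changed = true;
        }
        for (auto [x, y] : reach) std::printf("reach(%d, %d)\n", x, y);
    }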
If someone smart out there could "just" write an SMT solver and a robust Datalog that are efficient and as easy to integrate as Lua ("drop in the C files and go"), I'd fund that work.
My scheme would really be a cheap way of getting some symbolic optimisations for free during the early days of a compiler - I don't think it's a realistic strategy for all compilation.
Lean 4 is a functional programming language and theorem prover that compiles to C. A lot of research went into making Lean 4 blazingly fast https://leanprover.github.io/publications/
Galois, Inc. reproduced benchmarks from the Lean 4 developers showing that Lean 4 regularly outperforms the C++ stdlib.
This is an awesome mix of raw speed, functional programming, and formal software verification.
First time I've seen Z3 in the wild. Bonus points for the CLI-friendly model checker, but lisp syntax... is less legible than TLA+ imho. Gotta give it a try someday to see for myself if the experience of creating the spec is better.
> but lisp syntax... is less legible than TLA+ imho.
I've developed theorem provers with more natural syntax, and in retrospect simpler grammars are better. The stone-cold truth is that most people aren't interested in using theorem provers directly; instead, they're programmatically generating queries. Simpler grammar = easier to generate, and that outweighs direct usability.
In my next system I'll be supporting both types of grammars in the parser.
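And the lisp syntax is exactly what makes the generation side painless. A hypothetical sketch of a program emitting SMT-LIB rather than a human writing it (pipe the output into `z3 -in`):

    #include <string>
    #include <iostream>

    // build an s-expression: app(">", "x", "0") => "(> x 0)"
    std::string app(const std::string& op, const std::string& a, const std::string& b) {
        return "(" + op + " " + a + " " + b + ")";
    }

    int main() {
        std::cout << "(declare-const x Int)\n"
                  << "(assert " << app("and", app(">", "x", "0"), app("<", "x", "10")) << ")\n"
                  << "(check-sat)\n(get-model)\n";
        // feed this to `z3 -in` and it prints sat plus a model for x
    }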
Z3 is intended as more of a backend for higher-level languages, like TLA+, to use in model checking. I would be surprised if anyone were writing specs in the input language directly.
I’m just starting to investigate using formal analysis in practice, and I think a much more practical topic is the symbolic execution frameworks, which can analyze programs and feed the resulting constraints to solvers like Z3.
The most useful and accessible symbolic execution package I’ve found so far is KLEE https://klee.github.io/
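For anyone starting out, the basic KLEE workflow is tiny (this mirrors the first KLEE tutorial; the branch condition is just a made-up example):

    /* harness.c - roughly: clang -I <klee include dir> -g -emit-llvm -c harness.c
       then: klee harness.bc */
    #include <klee/klee.h>
    #include <assert.h>

    int check(int x) {
        if (x > 100 && x % 7 == 3)  /* hypothetical "interesting" path */
            return 0;
        return 1;
    }

    int main(void) {
        int x;
        klee_make_symbolic(&x, sizeof(x), "x");  /* x ranges over all ints */
        assert(check(x) == 1);  /* KLEE emits a concrete x that trips this */
        return 0;
    }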
If anyone else has recommendations for tools or beginner material I’m all ears!
Nielson & Nielson have the standard textbook on static program analysis, the one used everywhere. But it's quite unfriendly, as it's written using abstract algebra. They've recently released two textbooks that are much gentler - actually, I'd say they're easy-going and fun but still retain all the mathematical rigor.
They use program graphs, which are a bit less general but a lot easier to digest. They cover all the major techniques, including theorem proving, static analysis, model checking, abstract interpretation, type and effect systems, etc.
There's also a companion website with some F# code. The second book, which still seems unfinished, discusses how to implement program analyses using Datalog. That speeds up development quite a lot; otherwise, developing your own static analyzer is a lot of work.
My dream is to implement some kind of framework that enables quick DSL creation along with lightweight formal methods support to verify programs written in each DSL. I think restricted semantics is the key to make formal methods practical. Quoting Alan Perlis, "Beware of the Turing tar-pit in which everything is possible but nothing of interest is easy."
>My dream is to implement some kind of framework that enables quick DSL creation along with lightweight formal methods support to verify programs written in each DSL.
There's work in this area using monads. Specifically, Darais (from Galois) et al. show in "Abstracting Definitional Interpreters" how, given a definitional interpreter, you can easily create all sorts of abstractions using a stack of monad transformers. The best part is that your particular chosen stack remains valid when moved between interpreters for different languages.
Your dream of varied static analysis can be achieved using monad transformers, definitional interpreters written in the required style, and Racket's DSL-creation system.
Another option: the "reversing" challenges in infosec Capture The Flag competitions regularly require you to work out which constraints a program implements and then plug them into Z3 or Angr to find a solution - that might be a fun way to learn.
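A toy version of that workflow, assuming a hypothetical crackme that xors each flag byte with 0x42 and compares against a baked-in table: express the check as bitvector constraints and read the flag out of the model.

    #include <z3++.h>
    #include <iostream>
    #include <string>

    int main() {
        const unsigned char target[] = {0x24, 0x2e, 0x23, 0x25};  // dumped from the binary
        z3::context c;
        z3::solver s(c);
        z3::expr_vector flag(c);
        for (int i = 0; i < 4; ++i) {
            z3::expr b = c.bv_const(("b" + std::to_string(i)).c_str(), 8);
            flag.push_back(b);
            // the check the binary performs: input[i] ^ 0x42 == target[i]
            s.add((b ^ c.bv_val(0x42, 8)) == c.bv_val(target[i], 8));
        }
        if (s.check() == z3::sat) {
            z3::model m = s.get_model();
            for (unsigned i = 0; i < flag.size(); ++i)
                std::cout << (char)m.eval(flag[i]).get_numeral_int();
            std::cout << "\n";  // prints "flag"
        }
    }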
That was interesting - I'd never seen Z3 applied to the verification of networking and cryptographic algorithms.
My team similarly used Z3 to reverse-generate data from SQL queries so that erroneous mutations of the queries get detected early on: http://www.cse.iitb.ac.in/infolab/xdata/
The age-old problem of verification vs. validation. Verification is about whether or not you built the thing right; validation is about whether or not you built the right thing. This article is about using Z3 to pursue the former; the latter is an entirely different issue.
This is, basically, an unsolvable philosophical problem. You are the only one who can bridge the infinite lacuna between the idea in your head and its properties encoded in an actual model. There are all sorts of validity tests you can add to ensure it matches your vision, of course, but any attempt at solving this problem would just look like an even higher-level language.
This is called validation, and it's much harder epistemically than verification.
Is it possible to check at runtime that the model is accurate?
Do you have a sensible thing to do if the model isn't accurate (e.g., fallback into "failsafe" mode, let an on-call engineer know that an assumption was violated, etc.)?
Which, in turn, requires trusting or proving the soundness of your program generator and only proves that your incorrect original program is exactly as incorrect as the model that verifies it.
Automating the stronger proof (that the model is exactly as correct as the original program, and that the model is correct) hasn't been solved in the general case, to the best of my knowledge.
Evaluating the accuracy of a model is an unsolved problem with both formal and informal approaches.
Formally: we could ask multiple separate model/proof assistants to generate separate models from the same underlying specification, and then attempt to find discrepancies between their predicted results. This really just punts the responsibility: now we're relying on the accuracy of the abstract specification, rather than the model(s) automatically or manually produced from it. It's also not sound; it just allows us to feel more confident.
Informally: we can have a lot of different people look at the model very closely, and produce test vectors for the model based on human predictions of behavior.
You could generate a set of random vectors that span the entire input space, exercise the system with those vectors, and publish some sort of "accuracy" (e.g. generate random vectors through i.i.d. uniform R.V.s over the input space, evaluate f(input), and model the successes with a hierarchical binomial distribution).

Remember, though, that most of the time we build a model precisely to capture _edge cases_; the "happy path" of the program is simple to test, and edge cases are, by their nature, rare occurrences. As a trivial example, consider the boolean function f(x, y) = x & y. f evaluates to 0 for every value of (x, y) except (1, 1). If we were to model it as f_model = 0, the model would appear to agree with f 75% of the time. With a sufficiently large input state space, it would be quite feasible to hide essential edge cases behind agreement rates above 99.5%.
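A dozen lines make the point; here's a sketch of the sampling estimate for the toy example above (the agreement rate converges to ~75% while the model misses the only input that matters):

    #include <random>
    #include <iostream>

    int main() {
        std::mt19937 rng(42);
        std::bernoulli_distribution bit(0.5);  // i.i.d. uniform input bits
        int agree = 0;
        const int n = 100000;
        for (int i = 0; i < n; ++i) {
            bool x = bit(rng), y = bit(rng);
            bool f = x && y;        // the real program: f(x, y) = x & y
            bool f_model = false;   // the (wrong) model: f_model = 0
            if (f == f_model) ++agree;
        }
        // prints ~75% "accuracy" despite the model being useless at (1, 1)
        std::cout << "agreement: " << 100.0 * agree / n << "%\n";
    }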