Published: 14th November 2023
DOI: 10.4204/EPTCS.393
ISSN: 2075-2180

EPTCS 393

Proceedings of the 18th International Workshop on the
ACL2 Theorem Prover and Its Applications
Austin, TX, USA and online, November 13-14, 2023

Edited by: Alessandro Coglio and Sol Swords

Preface
Alessandro Coglio and Sol Swords
Extended Abstract: Classical LU Decomposition in ACL2
Carl Kwan
1
Extended Abstract: CHERI Concentrate in ACL2
Carl Kwan, Yutong Xin and William D. Young
6
Extended Abstract: A Bound-Finding Tool for Arithmetic Terms
Sol Swords
11
A Formalization of Finite Group Theory: Part II
David M. Russinoff
16
A Formalization of Finite Group Theory: Part III
David M. Russinoff
33
A Case Study in Analytic Protocol Analysis in ACL2
Max von Hippel, Panagiotis Manolios, Kenneth L. McMillan, Cristina Nita-Rotaru and Lenore Zuck
50
Advances in ACL2 Proof Debugging Tools
Matt Kaufmann and J Strother Moore
67
Using Counterexample Generation and Theory Exploration to Suggest Missing Hypotheses
Ruben Gamboa, Panagiotis Manolios, Eric Smith and Kyle Thompson
82
Formal Verification of Zero-Knowledge Circuits
Alessandro Coglio, Eric McCarthy and Eric W. Smith
94
Verification of GossipSub in ACL2s
Ankit Kumar, Max von Hippel, Panagiotis Manolios and Cristina Nita-Rotaru
113
Proving Calculational Proofs Correct
Andrew T. Walter, Ankit Kumar and Panagiotis Manolios
133
ACL2 Proofs of Nonlinear Inequalities with Imandra
Grant Passmore
151
Verification of a Rust Implementation of Knuth's Dancing Links using ACL2
David S. Hardin
161

Preface

This volume contains the proceedings of the Eighteenth International Workshop on the ACL2 Theorem Prover and Its Applications (ACL2-2023), a two-day workshop held at the University of Texas at Austin and online, on November 13-14. ACL2 workshops occur at approximately 18-month intervals, and they provide a major technical forum for users of the ACL2 theorem proving system to present research related to the ACL2 theorem prover and its applications.

ACL2 is an industrial-strength automated reasoning system, the latest in the Boyer-Moore family of theorem provers. The 2005 ACM Software System Award was awarded to Boyer, Kaufmann, and Moore for their work in ACL2 and the other theorem provers in the Boyer-Moore family.

The proceedings of ACL2-2023 include ten long papers and three extended abstracts. Each submission received three reviews. The workshop also included several "rump session" talks — short unpublished presentations that discussed ongoing research — as well as two invited talks from Jim Grundy of Amazon Web Services and Eric Smith of Kestrel Institute.

As program co-chairs we are grateful for the other leaders of this workshop: the organizing chairs, Matt Kaufmann and Mayank Manjrekar; the arrangements chair, Rob Sumners; and the registration chair, David Rager. We appreciate the great community of researchers who continue to develop exciting projects and tools using ACL2, as well as contributing papers to the workshop. We also wish to thank the program committee for helping the editorial process proceed very smoothly.

This workshop would not have been possible without the support of a large number of people. We thank those who authored, submitted, and presented papers. We also wish to thank the Program Committee for their diligence in reviewing the papers in a timely manner and for further discussions after the reviews. We are very grateful to the invited speakers for agreeing to provide their unique and extensive perspectives to the workshop participants.

Alessandro Coglio and Sol Swords, program chairs
November 2023

ACL2 2023 Program Committee:


Classical LU Decomposition in ACL2

Carl Kwan (University of Texas at Austin)

Abstract

We present an ACL2 formalization and verification of a classical LU decomposition algorithm. LU decomposition plays a crucial role in numerical computations, linear algebra algorithms, and applications that involve matrix operations or solving systems of linear equations. However, theorem proving with matrices can be challenging because typical implementations of matrix algorithms often involve the heavy use of indexing, among other issues. Our approach to formalizing LU decomposition in ACL2 adopts a methodical and constructive derivation process that is more amenable to formal verification. We also provide an ACL2 implementation for Gaussian elimination as one of several interesting consequences of our formalization.

The broad application of LU decomposition to linear algebra, and thus science as a whole, motivates our formalization in ACL2. We are interested in building reliable numerical linear algebra software for use in the design of critical systems. To our knowledge, this is the first formalization of an LU decomposition algorithm.

Linear algebra algorithms are used widely in scientific computing, data analysis, policy and decision making, the design of reliable infrastructure, etc. Their broad applications to critical systems make linear algebra methods a prime target for formal verification. However, computational linear algebra algorithms have seen few theorem prover formalizations. On one hand, these algorithms are often expressed in terms of operations on the indexed elements of a matrix. The use of indexing can obscure the design goals of the matrix algorithm, reduce human readability, and make reasoning for theorem provers difficult, among other drawbacks. On the other hand, many theorem provers tend to have limited support for the efficient computation of operations involving numeric values. This limits the utility of formalizing particularly computational linear algebra methods in such theorem provers.

Targeting the need for reliable but efficient linear algebra computation, we present a formalization of an LU decomposition algorithm and verify it in ACL2. LU decomposition, also known as LU factorization, is a fundamental linear algebra algorithm that decomposes a given matrix into upper and lower triangular factors. It is a crucial step in methods that aim to efficiently solve systems of linear equations, invert matrices, and compute matrix determinants.

As part of our formalization, we also verify the derivation of the LU decomposition algorithm. The algorithm we verify is sometimes known as "classical" or "right-looking", and many mathematical texts introduce it as a sequence of Gauss transformation multiplications [1, 2, 8, 9]. Since we are using ACL2, we need to translate into recursion and induction. One typical derivation partitions a given matrix into a form that suggests recursing along the principal submatrices. However, algorithms following this derivation tend to remain heavily reliant on loops or indexing to express matrix operations. We take the more ACL2 natural approach.

An LU decomposition of a matrix $A$ are lower and upper triangular matrices $L$ and $U$, respectively, such that $A=LU$. We also make the requirement that $L$ is unit lower triangular, i.e. $L$ has $1$s on the diagonal. In the interest of memory optimization, this extra requirement makes it possible to overwrite the upper part of $A$ with the upper part of $U$ and the strictly lower part of $A$ with the strictly lower part of $L$ during the algorithm. Partition $A=LU$ as follows:

$ \begin{equation} \left( \begin{array}{c|c} \alpha_{11} & a_{12}^T\\ \hline a_{21} & A_{22} \end{array} \right) = A = LU = \left( \begin{array}{c|c} 1 & \\ \hline \ell_{21} & L_{22} \end{array} \right) \left( \begin{array}{c|c} \upsilon_{11} & u_{12}^T \\ \hline & U_{22} \end{array} \right)\ . \label{eq:lu} \end{equation} $ (1)

Regarding notation: lower-case Greek letters are field scalars; lower-case Latin letters are vectors; upper-case Latin letters are matrices; and assume that any posed variables are "conformal", e.g. if $A$ is $m\times n$, then $a_{21}$ is $(m-1)\times 1$ and $a_{12}^T$ is $1\times (n-1)$. Moving forward, we will drop the "bars" in partitions for simplicity. Instead of looping through the rows or columns of $A$, the partition suggests we recurse along the principal submatrices. We want (1) to hold after performing the algorithm, i.e.

$\begin{align*} \alpha_{11}=\upsilon_{11}\ , && a_{21} = \upsilon_{11}\ell_{21}\ , && a_{12}^T = u_{12}^T\ , && A_{22} = \ell_{21}u_{21}^T + L_{22} U_{22}\ . \end{align*} $

Since $A$ is given, $u_{12}^T$ and $\upsilon_{11}$ are obvious. Solving for the remaining components of $L$ and $U$ forces

$ \ell_{21} = a_{21} \alpha_{11}^{-1}\ , \label{eq:base-case-1} $ (2) $ L_{22}U_{22} = A_{22} - a_{21}\alpha_{11}^{-1}a_{12}^T\ . $ (3)

This suggests an algorithm which requires merely updating $a_{21}$ and $A_{22}$. Indeed, consider Algorithm 1. Since the algorithm is recursive, we need to handle a few base cases. If the matrix is empty, then we just return an empty matrix. In the case that the matrix is a column vector, then Equation (2) indicates we should return $a_{21} \alpha_{11}^{-1}$. Similarly, if we are given a row vector, then no updates are necessary. Otherwise, we can update the components of $A$ suggested by Equations (2) and (3) and proceed with the recursion.

The ACL2 implementation of Algorithm 1 is Program 1. Descriptions of the ACL2 linear algebra functions used in this paper are in Table 1. We use existing ACL2 primitive matrix operations and functions in our formalization [3]. However, we also define both new and alternative operations that are necessary in order to implement and verify the LU decomposition algorithm. In cases where we define alternative operations, we prove they are equivalent to those already in ACL2. We defer the discussion of most of these new functions to the future, but those that appear in this paper are out-*, get-U, and get-L. It is also important to know that get-L fills the diagonal with $1$s.

Algorithm 1 Classical LU Decomposition

procedure lu $(A\in\mathbb R^{m\times n})$
    partition $A= \begin{pmatrix} \alpha_{11} & a_{12}^T\\ a_{21} & A_{22} \end{pmatrix} $
    if $m=0$ or $n=0$ then
        return $\begin{pmatrix} \end{pmatrix}$
    else if $n=1$ then
        return $\begin{pmatrix}\alpha_{11}\\ a_{21}\alpha_{11}^{-1}\end{pmatrix}$
    else if $m=1$ then
         return $A$
    else
         $a_{21} := a_{21}\alpha_{11}^{-1}$
         $A_{22} := A_{22} - a_{21}a_{12}^T$
         return $\begin{pmatrix}\alpha_{11} & a_{12}^T\\ a_{21} & \textrm{LU}(A_{22})\end{pmatrix}$
Program 1 ACL2 implementation of Algorithm 1
(define lu ((A matrixp)) ...
 (b* (((unless (matrixp A)) (m-empty))
      ((if (m-emptyp A)) A)
      (alph (car (col-car A)))
      ((if (zerop alph))
       (mzero (row-count A)
              (col-count A)))
      ((if (m-emptyp (col-cdr A)))
       (row-cons (list alph)
                 (sm* (/ alph)
                      (row-cdr A))))
      ((if (m-emptyp (row-cdr A))) A)
      (a21 (col-car (row-cdr A)))
      (a12 (row-car (col-cdr A)))
      (A22 (col-cdr (row-cdr A)))
      (a21 (sv* (/ alph) a21))
      (A22 (m+ A22 (sm* -1 (out-* a21 a12)))))
     (row-cons (row-car A)
               (col-cons a21 (lu A22)))))

With the exception of some extra edge cases, Program 1 directly implements Algorithm 1. In order for the decomposition to succeed, we require only that $\alpha_{11}\neq0$ and $L_{22}U_{22} = A_{22} - a_{21}\alpha_{11}^{-1}a_{12}^T$. The ACL2 theorem for this requirement is Program 2. This requirement is interesting because, given that $\alpha_{11}$ is nonsingular, it reduces the condition for a matrix to be LU decomposable into a condition about a smaller matrix, which is reminiscent of some "induction step". Indeed, the RHS of Equation (3) is the Schur complement of $\alpha_{11}$ in $A$ and is a key tool in statistics, probability, numerical analysis, and matrix analysis. While mathematical references commonly describe the conditions for the algorithm to succeed in terms of the leading principal submatrices, the proof that these conditions are sufficient reduces to an induction step that depends on Equation (3) [8].

In order to avoid writing an ACL2 statement about all the leading principal submatrices of a matrix, we recurse along the Schur complements, i.e. ensure $\alpha_{11}$ is nonzero and recurse on $S := A_{22} - a_{21}\alpha_{11}^{-1}a_{12}^T$. This recognizes matrices with nonsingular leading principal submatrices. Indeed, consider Program 3. The function nonsingular-leading-principal-submatrices-p only returns nil on a matrix if it encounters an $\alpha_{11}$ that is zero. One way to think about this condition is that if no zeros appear after $k$ recursive steps, then the $k$-th leading principal submatrix is nonsingular because its determinant is nonzero. Instead of reasoning with determinants, which are rarely useful in numerical algorithms [9], Schur complements provide a more concise ACL2 condition for success and presents opportunites for further rich ACL2 explorations in numerical linear algebra.

To prove the correctness of the classical LU decomposition for a square matrix, we want to assume that the matrix has nonsingular leading principal submatrices and induct along their Shur complements. In ACL2, these are both satisfied by assuming nonsingular-leading-principal-submatrices-p from Program 3 in the hypothesis. Indeed, as seen in Program 4, including such a hypothesis enables ACL2 to prove the desired theorem without any user-provided hints. ACL2 automatically inducts according to a scheme suggested by nonsingular-leading-principal-submatrices-p, the induction step for which enables the use of lu-correctness-induction-step from Program 2.

Table 1 ACL2 linear algebra functions
FunctionDescription
matrixp Recognizer for matrices
m-emptyp Recognizer for empty matrices
m-empty Returns an empty matrix
mzero Returns a zero matrix
row-car Returns first row of a matrix
col-car Returns first column of a matrix
row-cdr Remove a matrix's first row
col-cdr Remove a matrix's first column
row-cons Append a row to a matrix
col-cons Append a column to a matrix
m+ Matrix addition
m* Matrix multiplication
sm* Scalar-matrix multiplication
sv* Scalar-vector multiplication
out-* Outer product of vectors
get-L Get a matrix's lower triangular part
get-U Get a matrix's upper triangular part
Program 2 ACL2 LU "induction step" correctness
(defthm lu-correctness-induction-step
 (b* ((alph (car (col-car A)))
      (LU   (lu A))
      (L    (get-L LU))
      (U    (get-U LU))
      (a21  (col-car (row-cdr A)))
      (a12  (row-car (col-cdr A)))
      (A22  (col-cdr (row-cdr A)))
      (a21  (sv* (/ alph) a21))
      (A22  (m+ A22 (sm* -1 (out-* a21 a12)))))
     (implies (and (matrixp A)
                   (not (m-emptyp (row-cdr A)))
                   (equal (col-count A) 
                          (row-count A))
                   (not (zerop alph))
                   (equal (m* (get-L (lu A22)) 
                              (get-U (lu A22))) 
                          A22))
              (equal (m* L U) A))) ...)
      

Program 3 ACL2 recognizer for matrices with nonsingular leading principal submatrices
(define nonsingular-leading-principal-submatrices-p ((A matrixp))
 :measure (and (row-count A) (col-count A))
 (b* (((unless (matrixp A)) nil)
      ((if (m-emptyp A)) t)
      (alph (car (col-car A)))
      ((if (zerop alph)) nil)
      ((if (or (m-emptyp (row-cdr A))
               (m-emptyp (col-cdr A))))
       t)
      ;; Compute S = A22 - out-*(a21/alph,a12)
      (a21   (col-car (row-cdr A)))
      (a12   (row-car (col-cdr A)))
      (A22   (col-cdr (row-cdr A)))
      (a21/a (sv* (/ alph) a21))
      (S     (m+ A22 (sm* -1 (out-* a21/a a12)))))
     (nonsingular-leading-principal-submatrices-p S)) ...)

Program 4 ACL2 LU correctness
(defthm lu-correctness
 (b* ((LU (lu A))
      (L  (get-L LU))
      (U  (get-U LU)))
     (implies (and (equal (col-count A) (row-count A))
                   (nonsingular-leading-principal-submatrices-p A))
              (equal (m* L U) A))))

There are many applications to our formalization. One such application is that we can now calculate the determinant of an LU decomposable matrix $A$ since $\det(A)=\det(L)\det(U)=\det(U)$ is simply the product of the diagonal of $U$. Another consequence of Algorithm 1 is that the computed $U$ is the row echelon form of $A$. In particular, this means that (defun ge (A) (get-U (lu A))) is the ACL2 formalization of Gaussian elimination, which is one step away from an ACL2 formalization of Gauss-Jordan elimination. Gauss-Jordan can be used to find matrix inverses or even solve systems of linear equations such as $Ax=b$.

In practice, it tends to be better to use LU decomposition over forming an inverse. A linear system $Ax=b$ can be solved by forming $L$ and $U$, and then solving $Ly=b$ and $Ux = y$ via forwards and backwards substitution, respectively. This is faster than using Gauss-Jordan to form an inverse [8]. We have formalized backwards and forwards substitution in ACL2, the discussion for which we defer to the future; with LU decomposition, they form an ACL2 method for solving systems of linear equations.

Previously, we formalized the Cauchy-Schwarz inequality and other vector-related ideas from linear algebra and convex optimization [6, 5, 4, 7]. Here we show that our formalization of LU decomposition in ACL2 leads to a rich selection of research directions. We look forward to formalizing more methods for solving systems of linear equations. In addition, there are other decompositions (QR, rank-reducing, spectral, etc.), optimization methods (least-squares, gradient descent, etc.), and matrix analysis ideas (norms, tensors, etc.) to explore. These directions all serve as important methods in pure and applied mathematics, science, and engineering, and we expect to formalize verified and efficient computational linear algebra methods in the future.

References

  1. Robert van de Geijn & Margaret Myers (2023): Advanced Linear Algebra: Foundations to Frontiers. Available at https://www.cs.utexas.edu/users/flame/laff/alaff/frontmatter.html.
  2. Gene H. Golub & Charles F. Van Loan (2013): Matrix Computations - 4th Edition. Johns Hopkins University Press, Philadelphia, PA, doi:10.56021/9781421407944.
  3. Joe Hendrix (2003): Matrices in ACL2. Available at https://www.cs.utexas.edu/users/moore/acl2/workshop-2003/contrib/hendrix/hendrix.pdf.
  4. Carl Kwan (2019): Automated reasoning in first-order real vector spaces. University of British Columbia, doi:10.14288/1.0380600. Available at https://open.library.ubc.ca/collections/ubctheses/24/items/1.0380600.
  5. Carl Kwan & Mark R. Greenstreet (2018): Convex Functions in ACL2(r). In: Shilpi Goel & Matt Kaufmann: Proceedings of the 15th International Workshop on the ACL2 Theorem Prover and Its Applications, Austin, Texas, USA, November 5-6, 2018, Electronic Proceedings in Theoretical Computer Science 280. Open Publishing Association, pp. 128–142, doi:10.4204/EPTCS.280.10.
  6. Carl Kwan & Mark R. Greenstreet (2018): Real Vector Spaces and the Cauchy-Schwarz Inequality in ACL2(r). In: Shilpi Goel & Matt Kaufmann: Proceedings of the 15th International Workshop on the ACL2 Theorem Prover and Its Applications, Austin, Texas, USA, November 5-6, 2018, Electronic Proceedings in Theoretical Computer Science 280. Open Publishing Association, pp. 111–127, doi:10.4204/EPTCS.280.9.
  7. Carl Kwan, Yan Peng & Mark R. Greenstreet (2020): Cauchy-Schwarz in ACL2(r) Abstract Vector Spaces. In: Grant Passmore & Ruben Gamboa: Proceedings of the Sixteenth International Workshop on the ACL2 Theorem Prover and its Applications, Worldwide, Planet Earth, May 28-29, 2020, Electronic Proceedings in Theoretical Computer Science 327. Open Publishing Association, pp. 90–92, doi:10.4204/EPTCS.327.8.
  8. G. W. Stewart (1998): Matrix Algorithms. Society for Industrial and Applied Mathematics, doi:10.1137/1.9781611971408.
  9. Lloyd N. Trefethen & David Bau, III (1997): Numerical Linear Algebra. Society for Industrial and Applied Mathematics, Philadelphia, PA, doi:10.1137/1.9780898719574.

CHERI Concentrate in ACL2

Carl Kwan (University of Texas at Austin)
Yutong Xin (University of Texas at Austin)
William D. Young (University of Texas at Austin)

Abstract

We present an ACL2 formalization of CHERI Concentrate, a practical compression format for capabilities. Capabilities are unforgeable tokens of authority that grant and describe access to a region of memory; CHERI Concentrate enables the efficient storage and encoding / decoding of capabilities in real-life reliable hardware. As part of our formalization, we verify the correctness of the encoding and decoding functions, which specify the bounds in memory a program is permitted to access. These encode / decode functions are complicated and their correctness is not obvious to a human reader. This along with the use of capabilities in purportedly secure hardware is why formal verification is important. To our knowledge, this is the first executable formalization of CHERI Concentrate in ACL2.

We present a formalization of CHERI-style capabilities in ACL2. CHERI (Capability Hardware Enhanced RISC Instructions) is a set of architectural features that extends conventional hardware Instruction-Set Architectures (ISAs) with capabilities that enable fine-grained memory protection and highly scalable software compartmentalization [7]. Capabilities are hardware descriptions for permissions that can be used to access data, code, and objects in a controlled and protected manner. Existing ISAs extended with CHERI-style capabilities include 64-bit MIPS, 32-bit RISC-V, 64-bit RISC-V, and 64-bit Armv8-A [6]. More recently, Arm has shipped a CHERI-enabled Morello prototype processor, SoC, and board [1,2]. The work we present here is part of an ongoing effort to model CHERI capabilities in ACL2 for y86 and, eventually, x86 ISAs.

In CHERI systems, any accesses to memory are managed by capabilities. In particular, a capability will store in memory the upper and lower bounds of a region that a program is permitted to access. The capabilities that interest us are 128-bit data structures that can be stored in memory or registers. However, in a 64-bit system, this means that the bounds, which are two 64-bit addresses, are encoded in a 128-bit capability; this is in addition to other information that a capability must describe (e.g. permissions). Storing all pertinent information uncompressed can require 256 bits, which enacts a heavy toll on memory and quickly saturates a system's cache and data-paths [6].

CHERI Concentrate is a format that compresses capabilities to 128 bits on 64-bit architectures and reduces L2 cache misses by 50% to 70% compared to 256-bit capabilities [8]. This format compresses the bounds to 27 bits compared to 128 bits uncompressed. However, such a reduction in space requires a complicated encoding algorithm—sufficiently complicated that it is not clear from their definitions whether the encode / decode functions perform the desired tasks. In fact, initial iterations of the encoding algorithm were erroneous and required the use of HOL4, a higher-order logic proof assistant, to guide the development of the compression algorithm. The complexity of CHERI Concentrate warrants the use of formal methods, but HOL4 has limited support for execution.

In addition to formalizing CHERI capabilities, we verify the correctness of the CHERI Concentrate encoding and decoding algorithms in ACL2. By performing our formalization in ACL2, we can take advantage of ACL2's capacity for reasoning and execution efficiency. Moreover, given the size and complexity of the CHERI ISA in general, future CHERI models in ACL2 would rely on our formalization of CHERI Concentrate every time a capability is called. Indeed, we are currently building an ACL2 model of the y86-64 ISA with CHERI-style capabilities with the intent of verifying machine code programs involving capabilities. Since our y86-64 model symbolically simulates the execution of machine code, our formalization of CHERI Concentrate needs to be executable in order to be useful. Using the ACL2 y86-64 model extended with capabilities in CHERI Concentrate format, we can verify machine code programs involving capabilities, though this is beyond the scope of this paper and we intend on sharing these results at a later date. To our knowledge, this is the first executable formalization of CHERI Concentrate in ACL2.

Space limitations allow only a brief description here of 128-bit CHERI capabilities and the CHERI Concentrate format. For details, an interested reader may refer to the CHERI ISA [6]. One method of motivating the introduction of CHERI capabilities is to consider them as fat pointers, i.e. a pointer extended with additional metadata such as bounds and permissions that may restrict the pointer's behaviour. In principle, a pointer contains an address that can be dereferenced in order to access a region of memory indicated by the address. However, in the interest of memory protection, the metadata associated with a fat pointer may store permissions to restrict how memory may be accessed and bounds to restrict the region of memory that may be accessed. CHERI design principles further restrict how capabilities may be constructed and updated, requiring additional metadata not typically found in traditional fat pointers.

We make a distinction between architectural capabilities and capabilities in memory. Architectural capabilities contain software-accessible fields that are not reflected explicitly by, but can still be inferred from, a given capability's in-memory representation. In our model, architectural capabilities contain the following fields: 1-bit validity tag, 16-bit permissions (perms), 18-bit object type (otype), 64-bit offset, 64-bit base, and 64-bit length. The base is the lower bound of the memory region dereferenceable by a capability. Adding length to base gives the upper bound (a.k.a. top) of the memory region dereferenceable by a capability. Adding offset to base gives the address when the capability is used as a pointer.

System constraints necessitate compressing the information provided by an architectural capability into the format described by CHERI Concentrate, which is then stored in memory. As part of our formalization, we develop and verify ACL2 functions that allow a user to easily convert between architectural and memory-resident capabilities. The proofs of these conversion functions amount to verifying that converting from architectural capabilities to memory-resident capabilities and then back to architectural capabilities recovers the information that was originally in the architectural capabilities, and vice-versa.

A visual representation of memory-resident capabilities in CHERI Concentrate format can be seen in Figure 1. In the diagram, $a$ is a 64-bit address, $p$ represents 16 bits for the permissions, $otype$ represents 18 bits for the object type, and 27 bits are used to encode bounds relative to the address. The compression uses a floating-point representation to encode the bounds. Regarding notation for the bounds: $B$ and $T$ are 14-bit values that form the base and top of a capability's bounds relative to its address (the top two bits of $T$ are omited during encoding but deduced from $B$ during decoding); $I_E$ is an internal exponent bit that, when set, indicates the lower three bits of $B$ and $T$ ($B_E$ and $T_E$, respectively) represents an exponent at the expense of three bits of precision; $E$ is a 6-bit exponent that determines the position at which $B$ and $T$ are inserted into $a$.

The bulk of our verification efforts is devoted to proving the encoding and decoding functions correct. As part of the compression process, CHERI Concentrate specifies the algorithms by which the bounds are encoded into $B$ and $T$, and how $B$ and $T$ can be decoded. To illustrate the complexity of these algorithms, consider Listing 1, which is the ACL2 implementation of the decoding algorithm decode-compression. We omit the similarly complicated function encode-compression for brevity.

Figure 1: CHERI Concentrate format

$p$'16 otype'18 $I_E$ $T[11:3]$ $T_\textrm{E}$'3 $B[13:3]$ $B_\text{E}$'3
$a$'64

Listing 1: Decoding function in ACL2

(define decode-compression ((c compressionp) (addr (nbp addr *xlen*))) ...
  (b* ((IE (nb-ash (+ *mw* *tw*) 1 c))
       ((mv E TS BL Lcarry Lmsb)
        (if (= IE 0)
            (vars-small-seg c)
            (vars-large-seg c)))
       (BLH (+ (nb-ash *tw* (- *mw* *tw*) BL) Lcarry Lmsb))
       (TLH (nb-ash 0 (- *mw* *tw*) BLH)) 
       (TL (logior (ash TLH *tw*) TS))
       ((mv CT CB) (corrections addr TL BL E))
       (AT (ash addr (- (+ *mw* E))))
       (top (nb-ash 0 (1+ *xlen*) (ash (logior (ash (+ AT CT) *mw*) TL) E)))
       (base (nb-ash 0 *xlen* (ash (logior (ash (+ AT CB) *mw*) BL) E)))
       (top (if (and (< E 51)
                     (< 1 (- (nb-ash 63 2 top) (nb-ash 63 1 base))))
                (nb-ash 0 *xlen* top)
                top)))
      (bounds top base)))

The function encode-compression compresses a base $b_0$ and length $\ell_0$, that together form a top $t_0=b_0+\ell_0$, into a format reflected by the portion of Figure 1 that pertains to $I_E$, $T$, and $B$. The function decode-compression reconstructs bounds relative to an address from the compression to obtain a base $b_1$ and top $t_1$ that may have been subjected to some rounding. Again, the interested reader can find motivating details in the CHERI ISA and the original CHERI Concentrate paper [6,8]. The properties of these functions that we have verified are:

The last two properties state the conditions under which decoding a compression will recover the exact bounds that were originally encoded, which is important because compression may result in loss of precision. These last two properties contain stronger conclusions than the other properties and were not proven in the original CHERI Concentrate paper.

Our approach to proving these properties makes heavy use of the symbolic simulation framework GL with case-splitting [5,3]. While discharging these proofs by introducing traditional ACL2 rewrite rules would of course be feasible, we find that it takes ACL2 over 90 minutes to verify the returns specifier of a capability constructor without user-provided hints. This suggests that verification of the encode / decode properties would be better suited to more automatic methods. Moreover, since these are essentially bittwiddling proofs, bitblasting would be an amenable approach.

Listing 2: Theorem of encode / decode property 1 in ACL2

(def-gl-param-thm decode-encode-b-bound-len>2^12
    :hyp   (and (valid-addr-p addr base len)
                (valid-b-l-p base len)
                (<= (expt 2 *tw*) len))
    :concl (<= (bounds->base (decode-compression (encode-compression len base) addr))
               base)
    :param-bindings
    `((((low 12) (high 16)) ,(gl::auto-bindings (:mix (:nat base 65) (:nat len 65) (:nat addr 65))))
      (((low 16) (high 20)) ,(gl::auto-bindings (:mix (:nat base 65) (:nat len 65) (:nat addr 65))))
      (((low 20) (high 24)) ,(gl::auto-bindings (:mix (:nat base 65) (:nat len 65) (:nat addr 65))))
      (((low 24) (high 28)) ,(gl::auto-bindings (:mix (:nat base 65) (:nat len 65) (:nat addr 65))))
      (((low 28) (high 32)) ,(gl::auto-bindings (:mix (:nat base 65) (:nat len 65) (:nat addr 65))))
      (((low 32) (high 36)) ,(gl::auto-bindings (:mix (:nat base 65) (:nat len 65) (:nat addr 65))))
      (((low 36) (high 40)) ,(gl::auto-bindings (:mix (:nat base 65) (:nat len 65) (:nat addr 65))))
      (((low 40) (high 44)) ,(gl::auto-bindings (:mix (:nat base 65) (:nat len 65) (:nat addr 65))))
      (((low 44) (high 48)) ,(gl::auto-bindings (:mix (:nat base 65) (:nat len 65) (:nat addr 65))))
      (((low 48) (high 52)) ,(gl::auto-bindings (:mix (:nat base 65) (:nat len 65) (:nat addr 65))))
      (((low 52) (high 56)) ,(gl::auto-bindings (:mix (:nat base 65) (:nat len 65) (:nat addr 65))))
      (((low 56) (high 60)) ,(gl::auto-bindings (:mix (:nat base 65) (:nat len 65) (:nat addr 65))))
      (((low 60) (high 64)) ,(gl::auto-bindings (:mix (:nat base 65) (:nat len 65) (:nat addr 65))))
      (((low 64) (high 65)) ,(gl::auto-bindings (:mix (:nat base 65) (:nat len 65) (:nat addr 65)))))
    :param-hyp (and (<= (expt 2 low) len) (< len (expt 2 high)))
    :cov-bindings (gl::auto-bindings (:mix (:nat base 65) (:nat len 65) (:nat addr 65))))

GL supports model checking with both BDDs as ACL2 symbolic objects and external SAT solvers. We use BDDs as the back-end engine for GL. While case splits were necessary for some of the larger proofs, we found that GL BDDs were sufficient for our verification needs. As such, we did not utilize any external reasoning engine nor FGL, the successor to GL [4]. Moreover, GL's BDD proof procedure is verified in ACL2, alleviating any extra soundness concerns that may arise from calling external tools.

The ACL2 proofs of the encode / decode properties 5-6 are straightforward applications of GL's default symbolic simulation event def-gl-thm. In contrast, properties 1-4 all required parameterized case splitting. Listing 2 is the ACL2 theorem of the encode / decode property 1. As indicated by the :param-bindings and :param-hyp, we split the theorem into 4-bit "intervals", effectively performing individual symbolic simulations for when the length $\ell_0$ satisfies $2^{12}\leq \ell_0<2^{16}$ and $2^{16}\leq \ell_0<2^{20}$ and so on. Case splits for properties 2-4 are the same. We found that case splitting into smaller bit "intervals" would discharge the proofs faster at the expense of verbose user-provided hints. Case splitting being an effective approach suggests that the complexity of encoding and decoding does not scale with the length of the memory regions. Indeed, this corroborates with the design principles of CHERI Concentrate: "encoding efficiency, minimize delay of pointer arithmetic, and eliminate additional load-to-use delay" [8]. On the other hand, properties 5-6 required no case splitting largely because the encode / decode logic is greatly simplified when $\ell_0<2^{12}$, and alignment requirements reduce the number of variable bits.

Arm is deploying CHERI-enabled devices to industry security specialists at companies such as Google and Microsoft; they describe CHERI hardware capabilities as "a fundamentally more secure building block for software" [2]. It is critical that functions controlling memory accesses in these secure systems are correct; we have demonstrated that ACL2 is a excellent tool for modelling CHERI capabilities and verifying CHERI functions. While other formal methods have been part of the toolbox in developing CHERI-extended ARM, RISC, and MIPS ISAs, the application of CHERI to the Intel x86-64 ISA remains a sketch and has not yet been implemented. This presents an opportunity, which we intend to pursue, to perform ACL2 formal modelling and verification in the CHERI-x86-64 space.

Acknowledgements. This work was supported in part by SRI International and the Defense Advanced Research Projects Agency of the United States Department of Defense.

References

  1. Saar Amar (2022): An Armful of CHERIs. Available at https://msrc.microsoft.com/blog/2022/01/an_armful_of_cheris/.
  2. Richard Grisenthwaite (2022): Morello research program hits major milestone with hardware now available for testing. Available at https://www.arm.com/company/news/2022/01/morello-research-program-hits-major-milestone-with-hardware-now-available-for-testing.
  3. Sol Swords (2017): Term-Level Reasoning in Support of Bit-blasting. In: Anna Slobodova & Warren Hunt, Jr.: Proceedings 14th International Workshop on the ACL2 Theorem Prover and its Applications, Austin, Texas, USA, May 22-23, 2017, Electronic Proceedings in Theoretical Computer Science 249. Open Publishing Association, pp. 95–111, doi:10.4204/EPTCS.249.7.
  4. Sol Swords (2020): New Rewriter Features in FGL. In: Grant Passmore & Ruben Gamboa: Proceedings of the Sixteenth International Workshop on the ACL2 Theorem Prover and its Applications, Worldwide, Planet Earth, May 28-29, 2020, Electronic Proceedings in Theoretical Computer Science 327. Open Publishing Association, pp. 32–46, doi:10.4204/EPTCS.327.3.
  5. Sol Swords & Jared Davis (2011): Bit-Blasting ACL2 Theorems. In: David Hardin & Julien Schmaltz: Proceedings 10th International Workshop on the ACL2 Theorem Prover and its Applications, Austin, Texas, USA, November 3-4, 2011, Electronic Proceedings in Theoretical Computer Science 70. Open Publishing Association, pp. 84–102, doi:10.4204/EPTCS.70.7.
  6. Robert N. M. Watson, Peter G. Neumann, Jonathan Woodruff, Michael Roe, Hesham Almatary, Jonathan Anderson, John Baldwin, Graeme Barnes, David Chisnall, Jessica Clarke, Brooks Davis, Lee Eisen, Nathaniel Wesley Filardo, Richard Grisenthwaite, Alexandre Joannou, Ben Laurie, A. Theodore Markettos, Simon W. Moore, Steven J. Murdoch, Kyndylan Nienhuis, Robert Norton, Alexander Richardson, Peter Rugg, Peter Sewell, Stacey Son & Hongyan Xia (2020): Capability Hardware Enhanced RISC Instructions: CHERI Instruction-Set Architecture (Version 8). Technical Report 951. University of Cambridge Computer Laboratory. Available at https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-951.pdf.
  7. Robert N.M. Watson, Simon W. Moore, Peter Sewell & Peter G. Neumann (2019): An Introduction to CHERI. Technical Report 941. University of Cambridge Computer Laboratory. Available at https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-941.pdf.
  8. Jonathan Woodruff, Alexandre Joannou, Hongyan Xia, Anthony Fox, Robert Norton, David Chisnall, Brooks Davis, Khilan Gudka, Nathaniel Filard, A. Theodore Makettos, Michael Roe, Peter G. Neumann, Robert N.M. Watson & Simon W. Moore (2019): CHERI Concentrate: Practical Compressed Capabilities. IEEE Transactions on Computers 68(10), pp. 1455–1469, doi:10.1109/TC.2019.2914037.

A Bound-Finding Tool for Arithmetic Terms

Sol Swords (Intel Corp.)

Abstract

We describe a tool for establishing and proving upper and lower bounds for ACL2 arithmetic terms. The core algorithm used in this tool computes bounds of arithmetic terms using abstract interpretation. While this core algorithm is not new, we combine it with phased simplification using ACL2's rewriter, case splitting, and use of existing rules and user hints to provide bounds for subterms. Users may first expand and simplify a term as needed, split into cases if necessary, then find bounds for the simplified term using hypotheses, linear rules, type reasoning, and the core abstract interpretation alorithm. Finally, the same steps are retraced in order to prove the theorem that these bounds hold of the original term.

1. Introduction

Verification of arithmetic algorithms sometimes involves proving upper and lower bounds for intermediate values, and in many cases it isn't clear what those bounds need to be. Generally we'd prefer to prove the narrowest bounds possible in order to have the best chance of succeeding in the rest of the proof. For example, an iterative approximation algorithm for square root might converge as long as each partial remainder is smaller than some threshold, but the verifier (and, indeed, the designer) may not know exactly what that threshold is. With some study, it may be possible to determine the bound that is needed to allow the rest of the proof to go through. But it is often easier and more pragmatic to prove the best bound you can with some limited level of effort and move on to see whether this bound is sufficient for the rest of the proof.

To support this pragmatic approach, we provide a tool def-bounds that quickly determines and proves upper and lower bounds for an ACL2 arithmetic term. The tool allows rewriting the input term and optionally splitting into cases, then running a core algorithm that attempts to find bounds for the resulting term by a form of abstract interpretation very similar to that of Moore's Tau bounders and associated Ainni tool [4]. The core bounding algorithm is verified so that it can be used within an ACL2 metafunction. Initially, the def-bounds tool first emulates the given sequence of rewriting and case splitting steps that the prover will ultimately perform, then applies the bounding algorithm to find the bounds on the resulting rewritten term. It then proves these bounds by emitting a defthm form with a series of hints replicating the same rewriting and case splitting steps, followed by the application of this metafunction to complete the proof.

It might be logical to ask why we don't use ACL2's built-in nonlinear arithmetic procedure [1] to find bounds. This idea is appealing because this nonlinear arithmetic procedure is quite powerful, but unfortunately it has no built-in ability to find a bound that it can prove. We could search for such a bound by trial and error, but this is complicated by the fact that this nonlinear procedure can be quite slow when several multiplications are involved. We have encountered problems that nonlinear arithmetic fails to solve after 20 minutes on a modern CPU, even when our less-complete bounding algorithm can prove them. On the other hand, def-bounds simply produces the best bounds that it can prove with a linear pass over the input term; when needed, greater accuracy can sometimes be attained by case-splitting to consider sub-ranges of values for certain subterms.

2. Example

A simple example of the operation of def-bounds follows. We define a function foo and try to derive bounds for it given bounds for its input x:

(include-book "centaur/misc/def-bounds" :dir :system)

(defund foo (x)
  (- (* x x) (* 3 x)))

(def-bounds foo-bounds
  (foo x)
  :hyp (and (rationalp x)
            (<= 2 x)
            (<= x 4))
  :simp-hints ((:in-theory (enable foo))))

The above event first runs the bound-finding routine and derives bounds $[-8, 10]$ for (foo x), then emits a defthm event to prove them, producing a theorem called foo-bounds. These bounds aren't very precise (the actual range is $[-2, 4]$), because the two subterms (* x x) and (- (* 3 x)) are considered independently and their respective bounds are $[4, 16]$ and $[-12, -6]$. Somewhat better bounds can be obtained by instead bounding the factorization (* x (- x 3)), as follows:

(defthmd my-factor
  (equal (+ (- (* 3 x)) (* x x))
         (* x (- x 3))))

;; this rule would reverse the application of my-factor
(local (in-theory (disable distributivity)))

(def-bounds foo-better-bounds
  (foo x)
  :hyp (and (rationalp x)
            (<= 2 x)
            (<= x 4))
  :simp-hints ((:in-theory (enable foo))
               (:in-theory (enable my-factor))))

This now instead has two simplifcation phases, first allowing foo to open and subsequently to apply my-factor. Here the bounds for the two factors are $[2, 4]$ and $[-1, 1]$, so the computed bounds for the term are $[-4, 4]$. A further option for narrowing the computed range is discussed in Section 4.

3. Core Algorithm

The core algorithm used by def-bounds is very similar to Ainni [4] in that it recurs through the arithmetic operators of a term, gets the bounds for the subterms, if possible, and calls a specialized routine for each arithmetic operator to determine the bounds of the result given the bounds of the operands. A minor but useful refinement is that we add special treatment for square forms: we provide an automatic lower bound of 0 for terms matching (* x x), and additionally treat terms matching (* x x y) as (* (* x x) y) so that we get this benefit for the squared portion. On the other hand, we omit, for now, Ainni's support for bitwise operators and mod; we also treat all bounds as weak bounds rather than distinguishing between less than and less than or equal. Another difference is in the methods that we use to determine bounds in the base case where the term is not an arithmetic operator. Ainni is aimed at the particular problem of simplifying reads of writes in a byte-addressed memory, so it only supports calls of a function R (read bytes from a memory address) as its base case, both by looking for type-alist entries (assumptions) that establish known bounds for such terms and by using basic facts about the R function (i.e., a read of $n$ bytes is between 0 and $2^{8n}-1$). Our core algorithm uses linear rules, typeset reasoning, and user suggestions (checked by backchaining) to find candidate bounds for all subterms. We briefly describe each of these sources of bounds.

First, we consider user-provided suggestions. The user interface for this allows suggestions of the following forms:

(< a 3)                     ;; upper-bound the term A by 3
(>= (foo b) 5)              ;; lower-bound the term (FOO B) by 5
(:free (c) (<= (bar c) 10)) ;; upper-bound any call of BAR by 10

These are processed into a triple consisting of an LHS pattern (which may have free and/or fixed variables), an RHS term, and a direction, which together have the meaning ``for subterms matching LHS, try to prove them (less/greater) than RHS.'' In this case ``matching'' means that a term unifies with LHS. The RHS may then contain some or all of the variables of LHS, which are then replaced with their matches in the term. This can be used as a bound as long as the result of this substitution is a ground term evaluating to a rational number and we can verify its correctness. We use the mfc-relieve-hyp utility to check the validity of the suggested bound; this tries to prove this inequality by backchaining in ACL2's rewriter, as though it appeared in the hypothesis of a rewrite rule. The result of mfc-relieve-hyp can be trusted in the context of the the algorithm's proof of correctness due to ACL2's meta-extract feature [3][2].

Second, we call ACL2's typeset reasoning function mfc-ts to obtain a typeset for the term. This typeset is an integer in which each bit position signifies a certain type of object; an unset (0) bit in a given position signifies that the term cannot be of the corresponding type. Among these basic types corresponding to bits in the typeset are negative integer, 0, 1, integer greater than 1, negative ratio, and positive ratio. If the only possible types for the term are among these, then we can sometimes determine bounds on that basis; if e.g., the only bits set in our typeset are for 0 and negative-ratio, then this gives us an upper bound of 0. Again, the meta-extract feature allows us to assume that the typeset returned by mfc-ts is correct so that this determination can be trusted.

Third, we consider enabled linear arithmetic theorems where either side of the inequality matches our term. For each such theorem, we check that the other side of the inequality is a ground term that reduces to a rational number after applying the unifying substitution, that this number is a better bound than the current best one found, and that the hypotheses of the theorem can be relieved by backchaining, i.e. mfc-relieve-hyp. Since meta-extract allows us to assume the correctness of existing linear lemmas and of mfc-relieve-hyp, this is sufficient to allow this bound to be trusted once found.

Fourth, if the term is an application of any of the supported arithmetic operators, then we try to get bounds in the Ainni style, by recursively bounding the arguments and applying that operator's bounding routine to those inputs. The least upper bound and greatest lower bound produced by all four methods is then returned.

4. Preprocessing

While the core bound-finding algorithm described above is powerful, by itself it has a couple of deficiencies that the def-bounds tool addresses with extra features.

First, the bounding algorithm isn't very convenient to use by itself. A very basic issue is that while it will prove bounds of an arithmetic term, often we want to state a theorem about a function that expands to an arithmetic term instead of the term itself. Another issue is that a given arithmetic term might not be in the best form for finding bounds. As a simple example, suppose we want bounds for $ac + bc$ and while we have bounds for $a$, $b$, and $c$, we have better bounds for $d = a + b$ than our bounding algorithm can determine from the bounds for $a$ and $b$ separately. Then it is better to rewrite this expression to $dc$ before applying the bound-finding algorithm. The def-bounds tool supports this sort of need by allowing the user to provide a series of ACL2 hint objects, each giving instructions for a phased simplification step. For example, a def-bounds form might include the following argument, describing two simplification phases in sequence in which first my-function is expanded and then $ac+bc$ is rewritten to $dc$:

  :simp-hints
  ((:expand ((my-function x y)))
   (:in-theory (enable rewrite-ac-plus-bc-to-dc)))

A second deficiency is due to the bounding algorithm's separate consideration of correlated subterms. The example in Section 2 shows how this may make the computed bounds imprecise; in that case the subterms (* x x) and (- (* 3 x)) were considered separately with their ranges simply summed, producing bounds $[-8, 10]$ where the actual range is $[-2, 4]$. We saw that expressing the same function in a different form---in this case factored form rather than distributed---produced much better results, though still not perfectly precise. Another way to address this problem is to make the bounds more accurate by separately considering various ranges for the subexpressions. For example, if we split the range of $x$ into two segments $[2, 3]$ and $[3, 4]$, we get much better results on the original (distributed, not factored) formulation. For $x \in [2, 3]$, the algorithm concludes that the bounds for the two subexpressions $x^2$ and $3x$ are $[4, 9]$ and $[6, 9]$, giving $[-5, 3]$ for the full expression; and for $x \in [3, 4]$, the bounds for the two subexpressions are $[9, 16]$ and $[9, 12]$, giving $[-3, 7]$. Taking the union of those two ranges, this gives bounds $[-5, 7]$---still somewhat imprecise but greatly improved over the result without the case split. On the factored form, we get $[-3, 0]$ as the range when $x$ is in $[2, 3]$ and $[0, 4]$ when $x$ is in $[3, 4]$, so the bounds computed for the full domain are $[-3, 4]$; again somewhat more precise than the bounds computed for the factored form without a casesplit.

Given the speed of the core algorithm, it can be practical to split into many such cases if doing so increases the accuracy of the result. For example, the above problem may be split into 128 subranges with the def-bounds event still completing in a few seconds. For the non-factored form this results in computed bounds $[-131/64, 259/64]$, which is only $65/64$ the size of the actual range, and for the factored form the computed bounds are $[-129/64, 4]$, even slightly better. This can be done simply by adding the following argument to the def-bounds form:

  :cases ((:ranges-from-to-by x 2 4 1/64))

5. Conclusion

The def-bounds tool is in active use at Intel, especially in a successful verification of an RTL implementation of floating point square root. It is available under an MIT license from the public ACL2 books repository and its user-level documentation is available at the ACL2 documentation webpage [5].

References

  1. Warren A. Hunt, Robert Bellarmine Krug & J. Moore (2003): Linear and Nonlinear Arithmetic in ACL2. In: Daniel Geist & Enrico Tronci: Correct Hardware Design and Verification Methods. Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 319–333, doi:10.1007/978-3-540-39724-3_29.
  2. Matt Kaufmann (Accessed: 2023): ACL2 documentation: Meta-extract. Available at https://www.cs.utexas.edu/users/moore/acl2/manuals/current/manual/?topic=ACL2____META-EXTRACT.
  3. Matt Kaufmann & Sol Swords (2017): Meta-extract: Using Existing Facts in Meta-reasoning. Electronic Proceedings in Theoretical Computer Science 249, pp. 47-60, doi:10.4204/eptcs.249.4.
  4. J Strother Moore (2017): Computing Verified Machine Address Bounds During Symbolic Exploration of Code. In: Mike Hinchey, Jonathan P. Bowen & Ernst-Rüdiger Olderog: Provably Correct Systems. Springer International Publishing, Cham, pp. 151–172, doi:10.1007/978-3-319-48628-4_7.
  5. Sol Swords (Accessed: 2023): ACL2 documentation: Def-bounds. Available at https://www.cs.utexas.edu/users/moore/acl2/manuals/current/manual/?topic=ACL2____DEF-BOUNDS.