Maurer Computers for Pipelined Instruction Processing†

J.A. BERGSTRA$^{1,2}$ and C.A. MIDDELBURG$^{1,‡}$

1 Programming Research Group, University of Amsterdam, P.O. Box 41883, 1009 DB Amsterdam, the Netherlands; email: J.A.Bergstra@uva.nl, C.A.Middelburg@uva.nl.
2 Department of Philosophy, Utrecht University, P.O. Box 80126, 3508 TC Utrecht, the Netherlands.

Received 13 June 2006; Revised 27 June 2007

We model micro-architectures with non-pipelined instruction processing and pipelined instruction processing, using Maurer machines, basic thread algebra and program algebra. We show that stored programs are executed as intended with these micro-architectures. We believe that this work provides a new mathematical approach to model micro-architectures and to verify their correctness and anticipated speed-up results.

1. Introduction

Pipelined instruction processing is a basic technique used in the design of micro-architectures (see e.g. Hennessy and Patterson (2003) or Sima (2004)). In this paper, we investigate the issue of dealing with pipelined instruction processing when modelling micro-architectures in a mathematically precise way. We model micro-architectures with non-pipelined instruction processing and pipelined instruction processing, using Maurer machines, basic thread algebra and program algebra. Moreover, we show that stored programs are executed as intended with these micro-architectures.

Maurer machines are based on a model for computers proposed in Maurer (1966). Maurer’s model for computers is quite different from the well-known models such as register machines, multi-stack machines and Turing machines (see e.g. Hopcroft et al. (2001)). The strength of Maurer’s model is that it is close to real computers. The operations that can be performed on the state of a computer play a prominent part in the model. Basic thread algebra is a form of process algebra which is introduced in Bergstra and Loots (2002) under the name basic polarized process algebra. It is a form of process algebra which is tailored to the description of the behaviour of deterministic sequential programs under execution. The behaviours concerned are called threads. Basic thread algebra is used in this paper to direct a Maurer machine in performing operations on its state. Program algebra is introduced in Bergstra and Loots (2002) as well. Program algebra considers not the behaviour of deterministic sequential programs under execution, but rather the programs themselves. A program is viewed as an instruction sequence. The behaviour of a program is taken to be a thread of the kind considered in basic thread algebra. With regard to execution of stored programs on a Maurer machine, we take the line that the programs concerned are programs of the kind considered in program algebra.

† The work presented in this paper has been carried out as part of the GLANCE-project MICROGRIDS, which is funded by the Netherlands Organisation for Scientific Research (NWO).
‡ The work presented in this paper has been partly carried out while the second author was also at Eindhoven University of Technology, Department of Mathematics and Computer Science.

To make it possible for threads to direct a Maurer machine in performing operations on its state, basic thread algebra must be extended, for each Maurer machine, with an operator for applying a thread to the Maurer machine from one of its states. Applying a thread to a Maurer machine amounts to generating a sequence of state changes according to the operations that the Maurer machine associates with the basic actions performed by the thread. Because a program is viewed as an instruction sequence in the setting of program algebra, the representation of programs in the memory of a Maurer machine becomes trivial.

Why did we choose to use Maurer machines, basic thread algebra and program algebra to model micro-architectures? First of all, well-known models for computers, such as register machines, multi-stack machines and Turing machines, are too general for our purpose. Unlike Maurer’s model for computers, those models have little in common with real computers. For example, a real computer has a memory, and the contents of all memory elements make up the state of the computer. Moreover, a real computer processes instructions, and the processing of an instruction results in changes of the contents of certain memory elements. The design of micro-architectures must deal with these aspects of real computers. Secondly, general process algebras, such as ACP (Bergstra and Klop, 1984; Baeten and Weijland, 1990), CCS (Milner, 1980, 1989), and CSP (Brookes et al., 1984; Hoare, 1985), are too general for our purpose as well. Basic thread algebra has been designed as an algebra of deterministic sequential processes that interact with a machine. In Bergstra and Middelburg (2006b), we show that the processes considered in basic thread algebra can be viewed as processes that are definable over an extension of ACP with conditions introduced in Bergstra and Middelburg (2006a). However, it is quite awkward to describe and analyze processes of this kind using such a general process algebra. Thirdly, there are two reasons to use program algebra: (1) the view that programs are instruction sequences fits in well with real computers, and (2) program behaviours are taken for threads as considered in basic thread algebra.

In Bergstra and Middelburg (2007a), we have demonstrated the feasibility of the micro-architecture modelling approach taken in this paper. In this paper, we make use of the experience gained in that feasibility study to model more advanced micro-architectures. As mentioned above, Maurer’s model for computers is quite different from Turing’s model. The latter model is part of the foundations of theoretical computer science, whereas the model used in our approach to model micro-architectures is relatively unknown indeed. For that reason, we have investigated the connections between the two models in Bergstra and Middelburg (2007b).

We treat the instruction set architecture for which micro-architectures are modelled as a parameter that must fulfil a simple assumption: each instruction from the instruction set must be of a kind considered in program algebra. For example, program algebra considers test instructions and unconditional jump instructions, but it does not consider conditional jump instructions. Besides, program algebra considers forward jump instructions, but it does not consider backward jump instructions. The effect of a conditional jump instruction can be mimicked by a test instruction and an unconditional jump instruction; and the effect of a backward jump instruction can be mimicked by a forward jump instruction because programs may be infinite instruction sequences in program algebra.

In pipelined instruction processing, conditional jump instructions need a treatment different from that of unconditional jump instructions. Backward jump instructions, by contrast, do not need a treatment different from that of forward jump instructions. In order to demonstrate the generality of our approach, we also look in this paper at the influence of extending program algebra with conditional jump instructions on non-pipelined and pipelined instruction processing. We also pay some attention to backward jump instructions.

We do not make explicit the instruction set architecture for which micro-architectures are modelled. In our modelling of a micro-architecture, we start from an arbitrary Maurer machine and enhance it. That Maurer machine determines the instruction set architecture for which a micro-architecture is modelled. However, there are Maurer machines for which the enhancement is primarily intended. We describe in this paper those Maurer machines as well. They are called strict load/store Maurer instruction set architectures.

We regard the work presented in this paper as one of the preparatory steps in developing, as part of a project investigating micro-threading (Bolychevsky et al., 1996; Jesshope and Luo, 2000), a formal approach to design new micro-architectures. That approach should allow for the correctness of new micro-architectures and their anticipated speed-up results to be verified. The work presented in this paper, as well as the preceding work presented in Bergstra and Middelburg (2007a), has convinced us that a special notation for the description of micro-architectures is desirable. However, we found that fixing an appropriate notation still requires some significant design decisions. We come back to this issue in Section 13.

The structure of this paper is as follows. First, we review Maurer computers (Section 2) and basic thread algebra (Section 3). Next, we extend basic thread algebra, for each Maurer machine, with the operator for applying a thread to the Maurer machine from one of its states (Section 4). Following this, we review program algebra (Section 5) and describe the way in which programs are represented in the memory of Maurer machines (Section 6). Then, we model a micro-architecture with non-pipelined instruction processing (Section 7). After that, we model a variant of that micro-architecture with pipelined instruction processing (Sections 8 and 9). Following this, we look at the influence of the addition of conditional jump instructions (Section 10) and briefly discuss the addition of backward jump instructions (Section 11). Then, we describe strict load/store Maurer instruction set architectures (Section 12). Finally, we make some concluding remarks (Section 13).

2. Maurer Computers

In this section, we briefly review Maurer computers, i.e. computers as defined in Maurer (1966).

A Maurer computer $C$ consists of the following components:

— a non-empty set $M$;
— a set $B$ with $\text{card}(B) \geq 2$;
— a set $\mathcal{S}$ of functions $S : M \rightarrow B$;
— a set $\mathcal{O}$ of functions $O : \mathcal{S} \rightarrow \mathcal{S}$;

and satisfies the following conditions:

— if $S_1, S_2 \in \mathcal{S}$, $M' \subseteq M$ and $S_3 : M \rightarrow B$ is such that $S_3(x) = S_1(x)$ if $x \in M'$ and $S_3(x) = S_2(x)$ if $x \notin M'$, then $S_3 \in \mathcal{S}$;
— if $S_1, S_2 \in \mathcal{S}$, then the set $\{ x \in M \mid S_1(x) \neq S_2(x) \}$ is finite.

$M$ is called the memory, $B$ is called the base set, the members of $\mathcal{S}$ are called the states, and the members of $\mathcal{O}$ are called the operations. It is obvious that the first condition is satisfied if $C$ is complete, i.e. if $\mathcal{S}$ is the set of all functions $S : M \rightarrow B$, and that the second condition is satisfied if $C$ is finite, i.e. if $M$ and $B$ are finite sets.

In Maurer (1966), operations are called instructions. In the current paper, the term operation is used because of the confusion that would otherwise arise with the instructions of which program algebra programs are made up.

The memory of a Maurer computer consists of memory elements which have as content an element from the base set of the Maurer computer. The contents of all memory elements together make up a state of the Maurer computer. The operations of the Maurer computer transform states in certain ways and thus change the contents of certain memory elements. Thus, a Maurer computer has much in common with a real computer. The first condition on the states of a Maurer computer is a structural condition and the second one is a finite variability condition. We return to these conditions, which are met by any real computer, after the introduction of the input region and output region of an operation.

Let $(M, B, \mathcal{S}, \mathcal{O})$ be a Maurer computer, and let $O : \mathcal{S} \rightarrow \mathcal{S}$. Then the input region of $O$, written $\text{IR}(O)$, and the output region of $O$, written $\text{OR}(O)$, are the subsets of $M$ defined as follows:

$$\text{IR}(O) = \{ x \in M \mid \exists S_1, S_2 \in \mathcal{S} \cdot (\forall z \in M \setminus \{ x \} \cdot S_1(z) = S_2(z) \land \exists y \in \text{OR}(O) \cdot O(S_1)(y) \neq O(S_2)(y)) \},$$

$$\text{OR}(O) = \{ x \in M \mid \exists S \in \mathcal{S} \cdot S(x) \neq O(S)(x) \}.$$

$\text{OR}(O)$ is the set of all memory elements that are possibly affected by $O$; and $\text{IR}(O)$ is the set of all memory elements that possibly affect elements of $\text{OR}(O)$ under $O$.
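To make the definitions of $\text{OR}(O)$ and $\text{IR}(O)$ concrete, the following Python sketch computes both regions by brute force for a small, hypothetical complete Maurer computer with memory $\{a, b, c\}$ and base set $\{0, 1\}$. The operations `copy_a_to_b` and `and_a_c_to_b` are illustrative assumptions, not operations taken from this paper; enumerating all states is only feasible because the toy computer is finite.

```python
from itertools import product

# Hypothetical toy complete Maurer computer: memory M = {a, b, c},
# base set B = {0, 1}; every function M -> B is a state (a dict).
M = ("a", "b", "c")
B = (0, 1)
STATES = [dict(zip(M, values)) for values in product(B, repeat=len(M))]


def copy_a_to_b(s):
    """Illustrative operation b := a; other memory elements unchanged."""
    t = dict(s)
    t["b"] = s["a"]
    return t


def and_a_c_to_b(s):
    """Illustrative operation b := a AND c; other memory elements unchanged."""
    t = dict(s)
    t["b"] = s["a"] & s["c"]
    return t


def output_region(op):
    """OR(O): elements whose content op changes in at least one state."""
    return {x for x in M for s in STATES if s[x] != op(s)[x]}


def input_region(op):
    """IR(O): elements x such that two states agreeing everywhere except
    possibly at x can be mapped by op to states differing on OR(O)."""
    out = output_region(op)
    region = set()
    for x in M:
        for s1, s2 in product(STATES, repeat=2):
            agree_elsewhere = all(s1[z] == s2[z] for z in M if z != x)
            if agree_elsewhere and any(op(s1)[y] != op(s2)[y] for y in out):
                region.add(x)
                break
    return region


print(sorted(output_region(copy_a_to_b)), sorted(input_region(copy_a_to_b)))
# ['b'] ['a']
print(sorted(output_region(and_a_c_to_b)), sorted(input_region(and_a_c_to_b)))
# ['b'] ['a', 'c']
```

Note that $b$ is in the output region but not in the input region of either operation: its old content never influences the new content of any element of $\text{OR}(O)$.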

Let $(M, B, \mathcal{S}, \mathcal{O})$ be a Maurer computer, let $S_1, S_2 \in \mathcal{S}$, and let $O \in \mathcal{O}$. Then $S_1 \upharpoonright \text{IR}(O) = S_2 \upharpoonright \text{IR}(O)$ implies $O(S_1) \upharpoonright \text{OR}(O) = O(S_2) \upharpoonright \text{OR}(O)$. In other words, every operation transforms states that coincide on the input region of the operation to states that coincide on the output region of the operation. The second condition on the states of a Maurer computer is necessary for this fundamental property to hold. The first condition on the states of a Maurer computer could be relaxed somewhat.

Let $(M, B, \mathcal{S}, \mathcal{O})$ be a Maurer computer, let $O \in \mathcal{O}$, let $M' \subseteq \text{OR}(O)$, and let $M'' \subseteq \text{IR}(O)$. Then the region affecting $M'$ under $O$, written $\text{RA}(M', O)$, and the region affected by $M''$ under $O$, written $\text{AR}(M'', O)$, are the subsets of $M$ defined as follows:

$$\text{RA}(M', O) = \{ x \in \text{IR}(O) \mid \text{AR}(\{ x \}, O) \cap M' \neq \emptyset \},$$

$$\text{AR}(M'', O) = \{ x \in \text{OR}(O) \mid \exists S_1, S_2 \in \mathcal{S} \cdot (\forall z \in \text{IR}(O) \setminus M'' \cdot S_1(z) = S_2(z) \land O(S_1)(x) \neq O(S_2)(x)) \}.$$

$\text{AR}(M'', O)$ is the set of all elements of $\text{OR}(O)$ that are possibly affected by the elements of $M''$ under $O$; and $\text{RA}(M', O)$ is the set of all elements of $\text{IR}(O)$ that possibly affect elements of $M'$ under $O$.
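The regions $\text{AR}(M'', O)$ and $\text{RA}(M', O)$ can be explored by brute force as well. The Python sketch below assumes a hypothetical operation `two_copies` that sets $x := a$ and $y := b$ simultaneously on a complete Maurer computer with memory $\{a, b, x, y\}$ and base set $\{0, 1\}$; all names are illustrative. It computes $\text{RA}(M', O)$ as the set of elements $e$ of $\text{IR}(O)$ for which $\text{AR}(\{e\}, O)$ meets $M'$, in line with the informal reading of $\text{RA}(M', O)$ given above.

```python
from itertools import product

# Hypothetical toy complete Maurer computer: memory {a, b, x, y}, base set {0, 1}.
M = ("a", "b", "x", "y")
B = (0, 1)
STATES = [dict(zip(M, values)) for values in product(B, repeat=len(M))]
PAIRS = list(product(STATES, repeat=2))


def two_copies(s):
    """Illustrative operation: x := a and y := b simultaneously."""
    t = dict(s)
    t["x"], t["y"] = s["a"], s["b"]
    return t


def output_region(op):
    """OR(O): elements changed by op in at least one state."""
    return {m for m in M for s in STATES if s[m] != op(s)[m]}


def input_region(op):
    """IR(O): elements whose content can influence some element of OR(O)."""
    out = output_region(op)
    return {
        m for m in M
        if any(all(s1[z] == s2[z] for z in M if z != m)
               and any(op(s1)[w] != op(s2)[w] for w in out)
               for s1, s2 in PAIRS)
    }


def affected_region(m2, op):
    """AR(M'', O): elements of OR(O) whose new content can differ between
    two states that agree on IR(O) minus M''."""
    ir, out = input_region(op), output_region(op)
    return {
        w for w in out
        if any(all(s1[z] == s2[z] for z in ir - set(m2))
               and op(s1)[w] != op(s2)[w]
               for s1, s2 in PAIRS)
    }


def affecting_region(m1, op):
    """RA(M', O): elements e of IR(O) such that AR({e}, O) meets M'."""
    return {e for e in input_region(op) if affected_region({e}, op) & set(m1)}


print(sorted(affected_region({"a"}, two_copies)))   # ['x']
print(sorted(affecting_region({"x"}, two_copies)))  # ['a']
```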

In Maurer (1966), Maurer gives many results about the relation between the input region and output region of operations, the composition of operations, the decomposition of operations and the existence of operations with specified input, output and affected regions. In Bergstra and Middelburg (2007a), we summarize the main results. Recently, a revised and expanded version of Maurer (1966), which includes all the proofs, has appeared in Maurer (2006).

3. Basic Thread Algebra

In this section, we review BTA (Basic Thread Algebra), a form of process algebra which is tailored to the description of the behaviour of deterministic sequential programs under execution. The behaviours concerned are called threads.

In BTA, it is assumed that there is a fixed but arbitrary set of basic actions \( \mathcal{A} \) with \( \tau \not\in \mathcal{A} \). We write \( \mathcal{A}_\tau \) for \( \mathcal{A} \cup \{\tau\} \). BTA has the following constants and operators:

— the deadlock constant \( D \);
— the termination constant \( S \);
— for each \( a \in \mathcal{A}_\tau \), a binary postconditional composition operator \( \cdot \trianglelefteq a \trianglerighteq \cdot \).

We use infix notation for postconditional composition. We introduce action prefixing as an abbreviation: \( a \circ p \), where \( p \) is a term of BTA, abbreviates \( p \trianglelefteq a \trianglerighteq p \).

The intuition is that each basic action performed by a thread is taken as a command to be processed by the execution environment of the thread. The processing of a command may involve a change of state of the execution environment. At completion of the processing of the command, the execution environment produces a reply value. This reply is either \( T \) or \( F \) and is returned to the thread concerned. Let \( p \) and \( q \) be closed terms of BTA. Then \( p \trianglelefteq a \trianglerighteq q \) will perform action \( a \), and after that proceed as \( p \) if the processing of \( a \) leads to the reply \( T \) (called a positive reply) and proceed as \( q \) if the processing of \( a \) leads to the reply \( F \) (called a negative reply). The action \( \tau \) plays a special role. Its execution will never change any state and always produces a positive reply.

We use the notation \( f \upharpoonright D \), where \( f \) is a function and \( D \subseteq \text{dom}(f) \), for the function \( g \) with \( \text{dom}(g) = D \) such that for all \( d \in \text{dom}(g) \), \( g(d) = f(d) \).

BTA has only one axiom. This axiom is given in Table 1. Using the abbreviation introduced above, axiom T1 can be written as follows:

\[
x \trianglelefteq \tau \trianglerighteq y = \tau \circ x
\]
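As an informal illustration of postconditional composition and of axiom T1, the following Python sketch encodes finite BTA threads and runs them against an execution environment modelled as a reply function (`True` for \( T \), `False` for \( F \)). The class and function names (`Deadlock`, `Stop`, `Post`, `prefix`, `run`) are hypothetical, and guarded recursion is not covered. Because \( \tau \) always yields a positive reply without consulting the environment, \( x \trianglelefteq \tau \trianglerighteq y \) and \( \tau \circ x \) behave alike.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


@dataclass(frozen=True)
class Deadlock:
    """The deadlock constant D."""


@dataclass(frozen=True)
class Stop:
    """The termination constant S."""


@dataclass(frozen=True)
class Post:
    """Postconditional composition: left <| action |> right."""
    left: "Thread"
    action: str
    right: "Thread"


Thread = Union[Deadlock, Stop, Post]
D, S = Deadlock(), Stop()
TAU = "tau"


def prefix(a: str, p: Thread) -> Thread:
    """Action prefixing: a o p abbreviates p <| a |> p."""
    return Post(p, a, p)


def run(p: Thread, reply: Callable[[str], bool]) -> Tuple[List[str], str]:
    """Perform the actions of a finite thread; return (trace, "S" or "D").

    reply(a) models the execution environment (True = T, False = F);
    tau is never passed to it and always counts as a positive reply.
    """
    trace: List[str] = []
    while isinstance(p, Post):
        trace.append(p.action)
        positive = True if p.action == TAU else reply(p.action)
        p = p.left if positive else p.right
    return trace, ("S" if isinstance(p, Stop) else "D")


# Axiom T1: x <| tau |> y behaves as tau o x, whatever y is.
t = Post(prefix("b", S), TAU, D)
print(run(t, lambda a: False))                            # (['tau', 'b'], 'S')
print(run(prefix(TAU, prefix("b", S)), lambda a: False))  # (['tau', 'b'], 'S')
```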

Table 2. Axioms for guarded recursion

<table>
<thead>
<tr>
<th>Equation</th>
<th>Condition</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>\( \langle X \mid E \rangle = \langle t_X \mid E \rangle \)</td>
<td>\( X = t_X \in E \)</td>
<td>RDP</td>
</tr>
<tr>
<td>\( E \Rightarrow X = \langle X \mid E \rangle \)</td>
<td>\( X \in V(E) \)</td>
<td>RSP</td>
</tr>
</tbody>
</table>

A recursive specification over BTA is a set of equations

\[
E = \{ X = t_X \mid X \in V \}
\]

where \( V \) is a set of variables and each \( t_X \) is a term of BTA that contains only variables from \( V \).

We write \( V(E) \) for the set of all variables that occur on the left-hand side of an equation in \( E \). Let \( t \) be a term of BTA containing a variable \( X \). Then an occurrence of \( X \) in \( t \) is guarded if \( t \) has a subterm of the form \( t' \trianglelefteq a \trianglerighteq t'' \) containing this occurrence of \( X \). A recursive specification \( E \) is guarded if all occurrences of variables in the right-hand sides of its equations are guarded or it can be rewritten to such a recursive specification using the equations of \( E \). We are only interested in models of BTA in which guarded recursive specifications have unique solutions, such as the projective limit model of BTA presented in Bergstra and Bethke (2003). A thread that is the solution of a finite guarded recursive specification over BTA is called a finite-state thread.
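In BTA the only operator is postconditional composition, so a variable occurrence in a right-hand side \( t_X \) is unguarded exactly when \( t_X \) is that variable itself. Under this reading, a finite recursive specification can be rewritten into one with only guarded occurrences precisely when no chain of equations of the form \( X = Y \) is cyclic. The Python sketch below checks this condition; the term encoding (strings `"D"` and `"S"` for the constants, other strings for variables, tuples `("post", left, action, right)` for postconditional composition) and the name `is_guarded` are assumptions for illustration, so variable names must differ from `"D"` and `"S"`.

```python
from typing import Dict, Tuple, Union

# Hypothetical term encoding: "D", "S", a variable name, or
# ("post", left, action, right) for  left <| action |> right.
Term = Union[str, Tuple]
CONSTANTS = {"D", "S"}


def is_guarded(spec: Dict[str, Term]) -> bool:
    """Check whether a finite recursive specification over BTA is guarded.

    A right-hand side that is a bare variable is unguarded; such an
    equation can be eliminated by unfolding, unless following these
    bare-variable equations leads back to a variable already visited.
    """
    for start in spec:
        seen = {start}
        rhs = spec[start]
        while isinstance(rhs, str) and rhs not in CONSTANTS:
            if rhs in seen:
                return False          # cyclic chain X = Y, Y = ..., = X
            seen.add(rhs)
            rhs = spec[rhs]
    return True


print(is_guarded({"X": ("post", "X", "a", "X")}))   # True  (X = a o X)
print(is_guarded({"X": "Y", "Y": "X"}))              # False
```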

We extend BTA with guarded recursion by adding constants for solutions of guarded recursive specifications and axioms concerning these additional constants. For each guarded recursive specification \( E \) and each \( X \in V(E) \), we add a constant standing for the unique solution of \( E \) for \( X \) to the constants of BTA. The constant standing for the unique solution of \( E \) for \( X \) is denoted by \( \langle X \mid E \rangle \). Moreover, we use the following notation. Let \( t \) be a term of BTA and \( E \) be a guarded recursive specification. Then we write \( \langle t \mid E \rangle \) for \( t \) with, for all \( X \in V(E) \), all occurrences of \( X \) in \( t \) replaced by \( \langle X \mid E \rangle \). We add the axioms for guarded recursion given in Table 2 to the axioms of BTA. In this table, \( X \), \( t_X \) and \( E \) stand for an arbitrary variable, an arbitrary term of BTA and an arbitrary guarded recursive specification, respectively. Side conditions are added to restrict the variables, terms and guarded recursive specifications for which \( X \), \( t_X \) and \( E \) stand. The additional axioms for guarded recursion are known as the recursive definition principle (RDP) and the recursive specification principle (RSP). The equations \( \langle X \mid E \rangle = \langle t_X \mid E \rangle \) for a fixed \( E \) express that the constants \( \langle X \mid E \rangle \) make up a solution of \( E \). The conditional equations \( E \Rightarrow X = \langle X \mid E \rangle \) express that this solution is the only one.

We often write \( X \) for \( \langle X \mid E \rangle \) if \( E \) is clear from the context. It should be borne in mind that, in such cases, we use \( X \) as a constant.

The projective limit characterization of process equivalence on threads is based on the notion of a finite approximation of depth \( n \). When for all \( n \) these approximations are identical for two given threads, both threads are considered identical. This is expressed by the infinitary conditional equation AIP (Approximation Induction Principle) given in Table 3. Here, following Bergstra and Bethke (2003), approximation of depth \( n \) is phrased in terms of a unary projection operator \( \pi_n(\cdot) \). The projection operators are defined inductively by means of the axioms given in Table 4. In this table, \( a \) stands for an arbitrary member of \( \mathcal{A}_\tau \). It happens that RSP follows from AIP.

Table 3. Approximation induction principle
\[ \bigwedge_{n \geq 0} \pi_n(x) = \pi_n(y) \Rightarrow x = y \quad \text{AIP} \]

Table 4. Axioms for projection operators
\[
\begin{align*}
\pi_0(x) &= D & \text{P0} \\
\pi_{n+1}(S) &= S & \text{P1} \\
\pi_{n+1}(D) &= D & \text{P2} \\
\pi_{n+1}(x \trianglelefteq a \trianglerighteq y) &= \pi_n(x) \trianglelefteq a \trianglerighteq \pi_n(y) & \text{P3}
\end{align*}
\]
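The projection axioms P0 to P3 translate directly into a recursive function. The Python sketch below (hypothetical names `project`, `show` and `SPEC`, with an illustrative tuple encoding of terms) computes \( \pi_n \) of a variable of a finite guarded recursive specification, unfolding the variable by its defining equation whenever it is met. It also checks, for the first few depths only, that \( X = a \circ X \) and \( Y = a \circ (a \circ Y) \) have identical approximations; AIP identifies these two threads, but a finite check like this one is an illustration, not a proof.

```python
from typing import Dict, Tuple, Union

Term = Union[str, Tuple]   # "D", "S", a variable, or ("post", left, action, right)


def project(n: int, t: Term, spec: Dict[str, Term]) -> Term:
    """pi_n(t) following axioms P0-P3; variables are unfolded via spec.

    Assumes spec is guarded, so unfolding a variable reaches a
    postconditional composition or a constant after finitely many steps.
    """
    if n == 0:
        return "D"                                    # P0
    if t in ("S", "D"):
        return t                                      # P1, P2
    if isinstance(t, str):
        return project(n, spec[t], spec)              # <X|E> = <t_X|E>
    _, left, action, right = t
    return ("post", project(n - 1, left, spec), action,
            project(n - 1, right, spec))              # P3


def show(t: Term) -> str:
    """Render a finite term, writing action prefixing as 'a o p'."""
    if isinstance(t, str):
        return t
    _, left, action, right = t
    if left == right:
        return f"{action} o {show(left)}"
    return f"({show(left)} <| {action} |> {show(right)})"


SPEC = {
    "X": ("post", "X", "a", "X"),                     # X = a o X
    "Y": ("post", ("post", "Y", "a", "Y"), "a",       # Y = a o (a o Y)
          ("post", "Y", "a", "Y")),
}

print(show(project(3, "X", SPEC)))                    # a o a o a o D
print(all(project(n, "X", SPEC) == project(n, "Y", SPEC) for n in range(10)))  # True
```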

The structural operational semantics of BTA and its extensions with guarded recursion and projection can be found in Bergstra and Middelburg (2005) and Bergstra and Middelburg (2007a).

Henceforth, we write \( T_{\text{finrec}} \) for the set of all closed terms of BTA with guarded recursion in which no constants \( \langle X|E \rangle \) for infinite \( E \) occur. We write \( T_{\text{finrec}}(A) \), where \( A \subseteq \mathcal{A} \), for the set of all closed terms from \( T_{\text{finrec}} \) that contain only basic actions from \( A \).

4. Applying Threads to Maurer Machines

In this section, we introduce Maurer machines and add for each Maurer machine \( H \) a binary apply operator \( \_ \bullet_H \_ \) to BTA.

A Maurer machine is a tuple \( H = (M, B, \mathcal{S}, \mathcal{O}, A, [\_]) \), where \( (M, B, \mathcal{S}, \mathcal{O}) \) is a Maurer computer and:

- \( A \subseteq \mathcal{A} \);
- \( [\_] : A \rightarrow (\mathcal{O} \times M) \) is such that for all \( S \in \mathcal{S} \) and \( a \in A \), \( S(p_2([a])) \in \{T, F\} \).

The members of \( A \) are called the basic actions of \( H \), and \( [\_] \) is called the basic action interpretation function of \( H \). \( A \) and \( [\_] \) constitute the interface between the Maurer computer and its environment.

The apply operators associated with Maurer machines are related to the apply operators introduced in Bergstra and Ponse (2002). They allow for threads to transform states of the associated Maurer machine by means of its operations. Such state transformations produce either a state of the associated Maurer machine or the undefined state \( \uparrow \). It is assumed that \( \uparrow \) is not a state of any Maurer machine. We extend function restriction to \( \uparrow \) by stipulating that \( \uparrow \upharpoonright M = \uparrow \) for any set \( M \). The first operand of the apply operator \( \_ \bullet_H \_ \) associated with Maurer machine \( H = (M, B, \mathcal{S}, \mathcal{O}, A, [\_]) \) must be a term from \( T_{\text{finrec}}(A) \) and its second operand must be a state from \( \mathcal{S} \cup \{ \uparrow \} \).

Let \( A_1, \ldots, A_n \) be sets. Then the function from \( A_1 \times \cdots \times A_n \) to \( A_i \) (\( 1 \leq i \leq n \)) which maps each \( (a_1, \ldots, a_n) \in A_1 \times \cdots \times A_n \) to \( a_i \) is usually denoted by \( \pi_i \). We write \( p_i \) instead of \( \pi_i \) because of the confusion that would otherwise arise with the projection operator introduced in Section 3.

Let \( H = (M, B, \mathcal{S}, \mathcal{O}, A, [\_]) \) be a Maurer machine, let \( p \in T_{\text{finrec}}(A) \), and let \( S \in \mathcal{S} \). Then \( p \bullet_H S \) is the state that results if all basic actions performed by thread \( p \) are processed by the Maurer machine \( H \) from initial state \( S \). Moreover, let \( (O_a, m_a) = [a] \) for all \( a \in A \). Then the processing of a basic action \( a \) by \( H \) amounts to a state change according to the operation \( O_a \). In the resulting state, the reply produced by \( H \) is contained in memory element \( m_a \). If \( p \) is \( S \), then there will be no state change. If \( p \) is \( D \), then the result is \( \uparrow \).

Let \( H = (M, B, \mathcal{S}, \mathcal{O}, A, [\_]) \) be a Maurer machine, and let \( (O_a, m_a) = [a] \) for all \( a \in A \). Then the apply operator \( \_ \bullet_H \_ \) is defined by the equations given in Table 5 and the rule given in Table 6. In these tables, \( a \) stands for an arbitrary member of \( A \) and \( S \) stands for an arbitrary member of \( \mathcal{S} \).

We introduce some auxiliary notions, which are useful in proofs to come.

Let \( H = (M, B, \mathcal{S}, \mathcal{O}, A, [\_]) \) be a Maurer machine, and let \( (O_a, m_a) = [a] \) for all \( a \in A \). Then the step relation \( \vdash_H \subseteq (T_{\text{finrec}}(A) \times \mathcal{S}) \times (T_{\text{finrec}}(A) \times \mathcal{S}) \) is inductively defined as follows:
- if \( p = \tau \circ p' \), then \( (p, S) \vdash_H (p', S) \);
- if \( O_a(S)(m_a) = T \) and \( p = p' \trianglelefteq a \trianglerighteq p'' \), then \( (p, S) \vdash_H (p', O_a(S)) \);
- if \( O_a(S)(m_a) = F \) and \( p = p' \trianglelefteq a \trianglerighteq p'' \), then \( (p, S) \vdash_H (p'', O_a(S)) \).

Let \( H = (M, B, \mathcal{S}, \mathcal{O}, A, [\_]) \) be a Maurer machine. Then a full path in \( \vdash_H \) is one of the following:
- a finite path \( \langle (p_0, S_0), \ldots, (p_n, S_n) \rangle \) in \( \vdash_H \) such that there exists no \( (p_{n+1}, S_{n+1}) \in T_{\text{finrec}}(A) \times \mathcal{S} \) with \( (p_n, S_n) \vdash_H (p_{n+1}, S_{n+1}) \);
- an infinite path \( \langle (p_0, S_0), (p_1, S_1), \ldots \rangle \) in \( \vdash_H \).

Moreover, let \( p \in T_{\text{finrec}}(A) \), and let \( S \in \mathcal{S} \). Then the full path of \( (p, S) \) on \( H \) is the unique full path in \( \vdash_H \) from \( (p, S) \). If \( p \) converges from \( S \) on \( H \), then the full path of \( (p, S) \) on \( H \) is called the *computation* of \( (p, S) \) on \( H \) and we write \( \| (p, S) \|_H \) for the length of the computation of \( (p, S) \) on \( H \).

**Table 5. Defining equations for apply operator**

<table>
<thead>
<tr>
<th>Equation</th>
<th>Condition</th>
</tr>
</thead>
<tbody>
<tr>
<td>\( x \bullet_H \uparrow = \uparrow \)</td>
<td></td>
</tr>
<tr>
<td>\( S \bullet_H S = S \)</td>
<td></td>
</tr>
<tr>
<td>\( D \bullet_H S = \uparrow \)</td>
<td></td>
</tr>
<tr>
<td>\( (\tau \circ x) \bullet_H S = x \bullet_H S \)</td>
<td></td>
</tr>
<tr>
<td>\( (x \trianglelefteq a \trianglerighteq y) \bullet_H S = x \bullet_H O_a(S) \)</td>
<td>\( O_a(S)(m_a) = T \)</td>
</tr>
<tr>
<td>\( (x \trianglelefteq a \trianglerighteq y) \bullet_H S = y \bullet_H O_a(S) \)</td>
<td>\( O_a(S)(m_a) = F \)</td>
</tr>
</tbody>
</table>

**Table 6. Rule for divergence**

\[ \bigwedge_{n \geq 0} \pi_n(x) \bullet_H S = \uparrow \Rightarrow x \bullet_H S = \uparrow \]

It is easy to see that \((p_0, S_0) \vdash_H (p_1, S_1)\) only if \( p_0 \bullet_H S_0 = p_1 \bullet_H S_1 \) and that \(((p_0, S_0), \ldots, (p_n, S_n))\) is the computation of \((p_0, S_0)\) on \( H \) only if \( p_n = S \) and \( S_n = p_0 \bullet_H S_0 \). It is also easy to see that, if \( p_0 \) converges from \( S_0 \) on \( H \), \( \| (p_0, S_0) \|_H \) is the least \( n \in \mathbb{N} \) such that \( \pi_n(p_0) \bullet_H S_0 \neq \uparrow \).
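The step relation, full paths and the apply operator can be simulated for a toy machine. The Python sketch below assumes a hypothetical Maurer machine whose memory holds a counter `n` and a reply element `r`, with a single basic action `dec` interpreted as the pair (operation `dec`, memory element `r`); states are dicts and \( \uparrow \) is represented by `None`. Only finite threads (closed BTA terms without recursion) are handled, so every full path is finite and no divergence rule is needed. For a thread whose full path ends in \( S \), the returned path is the computation and its final state is \( p \bullet_H S \).

```python
from typing import Callable, Dict, List, Optional, Tuple, Union

State = Dict[str, object]
Operation = Callable[[State], State]
Term = Union[str, tuple]   # "S", "D" or ("post", left, action, right)


def dec(s: State) -> State:
    """Hypothetical operation O_dec: n := n - 1, then r := (n > 0)."""
    t = dict(s)
    t["n"] = s["n"] - 1
    t["r"] = t["n"] > 0
    return t


# Hypothetical basic action interpretation: [dec] = (O_dec, r).
INTERP: Dict[str, Tuple[Operation, str]] = {"dec": (dec, "r")}


def step(p: Term, s: State) -> Optional[Tuple[Term, State]]:
    """One step of the step relation, or None if (p, s) has no successor."""
    if isinstance(p, str):                # S and D have no successors
        return None
    _, left, a, right = p
    if a == "tau":                        # tau: no state change, reply T
        return left, s
    op, m = INTERP[a]
    s2 = op(s)
    return (left if s2[m] is True else right), s2


def apply_thread(p: Term, s: State) -> Tuple[Optional[State], List[Tuple[Term, State]]]:
    """Return (p applied to s, full path); None plays the role of the undefined state."""
    path = [(p, s)]
    nxt = step(p, s)
    while nxt is not None:
        path.append(nxt)
        nxt = step(*nxt)
    last_p, last_s = path[-1]
    return (last_s if last_p == "S" else None), path


p = ("post", ("post", "S", "dec", "S"), "dec", "D")   # (dec o S) <| dec |> D
final, path = apply_thread(p, {"n": 2, "r": False})
print(final, len(path))   # {'n': 0, 'r': False} 3
final, path = apply_thread(p, {"n": 1, "r": False})
print(final, len(path))   # None 2
```

Starting from \( n = 2 \), the first `dec` gives a positive reply and the thread terminates after a second `dec`; starting from \( n = 1 \), the negative reply leads to \( D \), so the result is \( \uparrow \).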

In the definition of a Maurer machine, we could have taken a function \([\_]\) that associates with each \( a \in A \) a triple \((n_a, O_a, m_a) \in M \times \mathcal{O} \times M\) such that \( S(n_a), S(m_a) \in \{T, F\} \) for all \( S \in \mathcal{S} \). In that case, \( S(n_a) \) would indicate whether basic action \( a \) is enabled in state \( S \), i.e. whether the processing of \( a \) is not blocked in state \( S \). In this paper, we consider only threads that are behaviours of deterministic sequential programs under execution. For such behaviours, it is not at all interesting to take into account the possibility that some basic actions are not always enabled. Therefore, it is assumed that all basic actions of a Maurer machine are enabled in all states. Under this assumption, it is sufficient that the function \([\_]\) associates with each \( a \in A \) a pair \((O_a, m_a) \in \mathcal{O} \times M\) as in the definition given at the beginning of this section.

5. Program Algebra

In this section, we review PGA (ProGram Algebra), an algebra of sequential programs based on the idea that sequential programs are in essence sequences of instructions. PGA provides a program notation for finite-state threads. A hierarchy of program notations that provide more and more sophisticated programming features is rooted in PGA (see Bergstra and Loots (2002)).

In PGA, it is assumed that there is a fixed but arbitrary set \( \mathfrak{A} \) of *basic instructions*. PGA has the following *primitive instructions*:

- for each \( a \in \mathfrak{A} \), a *void basic instruction* \( a \);
- for each \( a \in \mathfrak{A} \), a *positive test instruction* \(+a\);
- for each \( a \in \mathfrak{A} \), a *negative test instruction* \( -a \);
- for each \( k \in \mathbb{N} \), a *forward jump instruction* \( \#k \);
- a *termination instruction* !.

We write \( \mathfrak{I} \) for the set of all primitive instructions.

The intuition is that the execution of a basic instruction \( a \) may modify a state and produces \( T \) or \( F \) at its completion. In the case of a positive test instruction \( +a \), basic instruction \( a \) is executed and execution proceeds with the next primitive instruction if \( T \) is produced and otherwise the next primitive instruction is skipped and execution proceeds with the primitive instruction following the skipped one. In the case where \( T \) is produced and there is not at least one subsequent primitive instruction and in the case where \( F \) is produced and there are not at least two subsequent primitive instructions, deadlock occurs. In the case of a negative test instruction \( -a \), the role of the value produced is reversed. In the case of a void basic instruction \( a \), the value produced is disregarded: execution always proceeds as if \( T \) is produced. The effect of a forward jump instruction \( \#k \) is that execution proceeds with the \( k \)th next instruction of the program concerned. If \( k \) equals 0 or the \( k \)th next instruction does not exist, then \( \#k \) results in deadlock. The effect of the termination instruction ! is that execution terminates.

Table 7. Axioms of PGA

<table>
<thead>
<tr>
<th>Axiom</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>\( (X ; Y) ; Z = X ; (Y ; Z) \)</td>
<td>PGA1</td>
</tr>
<tr>
<td>\( (X^n)^\omega = X^\omega \)</td>
<td>PGA2</td>
</tr>
<tr>
<td>\( X^\omega ; Y = X^\omega \)</td>
<td>PGA3</td>
</tr>
<tr>
<td>\( (X ; Y)^\omega = X ; (Y ; X)^\omega \)</td>
<td>PGA4</td>
</tr>
</tbody>
</table>

Table 8. Defining equations for thread extraction operator

<table>
<thead>
<tr>
<th>Basic and test instructions</th>
<th>Termination and jump instructions</th>
</tr>
</thead>
<tbody>
<tr>
<td>\( |a| = a \circ D \)</td>
<td>\( |!| = S \)</td>
</tr>
<tr>
<td>\( |a ; X| = a \circ |X| \)</td>
<td>\( |! ; X| = S \)</td>
</tr>
<tr>
<td>\( |+a| = a \circ D \)</td>
<td>\( |\#k| = D \)</td>
</tr>
<tr>
<td>\( |+a ; X| = |X| \trianglelefteq a \trianglerighteq |\#2 ; X| \)</td>
<td>\( |\#0 ; X| = D \)</td>
</tr>
<tr>
<td>\( |-a| = a \circ D \)</td>
<td>\( |\#1 ; X| = |X| \)</td>
</tr>
<tr>
<td>\( |-a ; X| = |\#2 ; X| \trianglelefteq a \trianglerighteq |X| \)</td>
<td>\( |\#k+2 ; u| = D \)</td>
</tr>
<tr>
<td></td>
<td>\( |\#k+2 ; u ; X| = |\#k+1 ; X| \)</td>
</tr>
</tbody>
</table>
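The informal execution rules for primitive instructions can be mirrored by a program-counter interpreter for finite instruction sequences. In the Python sketch below, instructions are encoded as strings (`"a"`, `"+a"`, `"-a"`, `"#k"`, `"!"`) and the environment is a reply function; the encoding and the name `execute` are assumptions made for illustration. Running off the end of the sequence and `#0` both yield deadlock, as described above.

```python
from typing import Callable, List, Tuple


def execute(program: List[str], reply: Callable[[str], bool]) -> Tuple[List[str], str]:
    """Execute a finite instruction sequence; return (actions, outcome).

    The outcome is "S" (termination) or "D" (deadlock).
    """
    actions: List[str] = []
    pc = 0
    while 0 <= pc < len(program):
        ins = program[pc]
        if ins == "!":
            return actions, "S"
        if ins.startswith("#"):
            k = int(ins[1:])
            if k == 0:
                return actions, "D"          # #0 deadlocks
            pc += k                          # may jump past the end
            continue
        if ins[0] in "+-":
            a = ins[1:]
            actions.append(a)
            r = reply(a)
            proceed = r if ins[0] == "+" else not r
            pc += 1 if proceed else 2        # otherwise skip one instruction
        else:
            actions.append(ins)              # void basic instruction
            reply(ins)                       # reply is disregarded
            pc += 1
    return actions, "D"                      # no instruction to proceed with


print(execute(["+a", "b", "!"], lambda a: True))   # (['a', 'b'], 'S')
print(execute(["+a", "!"], lambda a: False))       # (['a'], 'D')
```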

The thread extraction operator introduced below, together with the apply operators introduced in Section 4, makes it possible to associate operations of Maurer machines with basic instructions, and consequently with primitive instructions of PGA.

PGA has the following constants and operators:

— for each \(u \in \mathfrak{I}\), an instruction constant \(u\);
— the binary concatenation operator \(\_ ; \_\);
— the unary repetition operator \(\_^\omega\).

Closed terms of PGA are considered to denote programs. The intuition is that a program is in essence a non-empty finite or infinite sequence of primitive instructions. These sequences are called single pass instruction sequences because PGA has been designed to enable single pass execution of instruction sequences; each instruction can be dropped after it has been executed. Programs are considered to be equal if they represent the same single pass instruction sequence. The axioms for instruction sequence equivalence are given in Table 7. In this table, \(n\) stands for an arbitrary natural number greater than 0. For each \(n > 0\), the term \(X^n\) is defined by induction on \(n\) as follows: \(X^1 = X\) and \(X^{n+1} = X ; X^n\). The unfolding equation \(X^\omega = X ; X^\omega\) is derivable. Each closed term of PGA is derivably equal to a term in canonical form, i.e. a term of the form \(P\) or \(P ; Q^\omega\), where \(P\) and \(Q\) are closed terms of PGA that do not contain the repetition operator.
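Canonical forms \( P ; Q^\omega \) can be read as lazily unfolded instruction sequences. The Python sketch below (hypothetical helper names `instructions` and `first`; instruction strings are plain placeholders) enumerates the sequence denoted by a canonical form and compares finite prefixes of both sides of PGA4 and PGA2 for sample instruction sequences. Comparing a finite prefix only illustrates the axioms; it does not prove them. The repeated part must be non-empty, otherwise the generator would loop forever without yielding.

```python
from itertools import islice
from typing import Iterator, List, Sequence


def instructions(prefix: Sequence[str], loop: Sequence[str]) -> Iterator[str]:
    """Lazily enumerate the instruction sequence denoted by P; Q^omega.

    prefix stands for P and loop for Q (non-empty); the repetition is
    unfolded on demand, as in X^omega = X; X^omega.
    """
    yield from prefix
    while True:
        yield from loop


def first(n: int, seq: Iterator[str]) -> List[str]:
    """The first n instructions of a (possibly infinite) sequence."""
    return list(islice(seq, n))


X, Y = ["+a", "#2"], ["b", "!"]   # sample instruction sequences
# PGA4: (X; Y)^omega = X; (Y; X)^omega, compared on a finite prefix only
print(first(20, instructions([], X + Y)) == first(20, instructions(X, Y + X)))  # True
# PGA2 with n = 3: (X^3)^omega = X^omega
print(first(12, instructions([], X * 3)) == first(12, instructions([], X)))      # True
```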

Each closed term of PGA is considered to denote a program of which the behaviour is a finite-state thread, taking the set \(\mathcal{A}\) of basic instructions as the set of actions. The thread extraction operator \(|\_|\) assigns a thread to each program. The thread extraction operator is defined by the equations given in Table 8 (for \(a \in \mathcal{A}\), \(k \in \mathbb{N}\) and \(u \in \mathcal{I}\)) and the rule given in Table 9. This rule is expressed in terms of the structural congruence predicate \(\_ \cong \_\), which is defined by the formulas given in Table 10 (for \(n, m, k \in \mathbb{N}\) and \(u_1, \ldots, u_n, v_1, \ldots, v_{m+1} \in \mathcal{I}\)).
The equations given in Table 8 do not cover the case where there is a cyclic chain of forward jumps. Programs are structurally congruent if they are the same after removing all chains of forward jumps in favour of direct jumps. Because a cyclic chain of forward jumps corresponds to #0, the rule from Table 9 can be read as follows: if \( X \) starts with a cyclic chain of forward jumps, then \( |X| \) equals \( D \). It is easy to see that the thread extraction operator assigns the same thread to structurally congruent programs. Therefore, the rule from Table 9 can be replaced by the following generalization:

\[
X \cong Y \Rightarrow |X| = |Y|
\]

Let \( E \) be a finite guarded recursive specification over BTA, and let \( P_X \) be a closed term of PGA for each \( X \in V(E) \). Let \( E' \) be the set of equations that results from replacing in \( E \) all occurrences of \( X \) by \( |P_X| \) for each \( X \in V(E) \). If \( E' \) can be obtained by applications of axioms PGA1–PGA4, the defining equations for the thread extraction operator, and the rule for cyclic jump chains, then \( |P_X| \) is the solution of \( E \) for \( X \). Such a finite guarded recursive specification can always be found. Thus, the behaviour of each closed PGA term is a thread that is definable by a finite guarded recursive specification over BTA. Moreover, each finite guarded recursive specification over BTA can be translated to a PGA program of which the behaviour is the solution of the finite guarded recursive specification concerned.

Closed terms of PGA are loosely called PGA programs. PGA programs in which the repetition operator does not occur are called finite PGA programs. Henceforth, we write \( \mathcal{P}_{\text{fin}} \) for the set of all finite PGA programs. We write \( \mathcal{P}_{\text{fin}}(A) \), where \( A \subseteq \mathcal{A} \), for the set of all closed terms from \( \mathcal{P}_{\text{fin}} \) that contain only basic instructions from \( A \).

In the remainder of this paper, with the exception of Section 11, we consider only finite PGA programs.

### 6. Stored Programs

In this short section, we make precise how to represent PGA programs in the memory of a Maurer machine.

It is assumed that a fixed but arbitrary finite set \( M_{\text{prog}} \) and a fixed but arbitrary bijection \( m_{\text{prog}} : [0, \text{card}(M_{\text{prog}}) - 1] \rightarrow M_{\text{prog}} \) have been given. \( M_{\text{prog}} \) is called the program memory. We write \( \text{size}(M_{\text{prog}}) \) for \( \text{card}(M_{\text{prog}}) \). Let \( n, n' \in [0, \text{size}(M_{\text{prog}}) - 1] \) be such that \( n \leq n' \). Then we write \( M_{\text{prog}}[n] \) for \( m_{\text{prog}}(n) \), and \( M_{\text{prog}}[n, n'] \) for \( \{ m_{\text{prog}}(k) \mid n \leq k \leq n' \} \).

The program memory is a memory whose elements can be addressed by means of members of \( [0, \text{size}(M_{\text{prog}}) - 1] \). We write \( \mathit{MA}_{\text{prog}} \) for \( [0, \text{size}(M_{\text{prog}}) - 1] \) and \( \mathit{MA}'_{\text{prog}} \) for \( [0, \text{size}(M_{\text{prog}})] \).

The program memory elements are meant to contain the primitive instructions that form part of a finite PGA program.

We write \( \mathcal{I}_{\text{prog}} \) for \( \mathcal{I} \setminus \{ \#k \mid k > \text{size}(M_{\text{prog}}) - 1 \} \). \( \mathcal{I}_{\text{prog}} \) is the program memory base set. We write \( \mathcal{S}_{\text{prog}} \) for the set of all functions \( S : M_{\text{prog}} \rightarrow \mathcal{I}_{\text{prog}} \).

Let \( P = u_1; \ldots; u_n \in \mathcal{P}_{\text{fin}} \) with \( n \leq \text{size}(M_{\text{prog}}) \). Then the stored representation of \( P \), written \( s_{\text{prog}}(P) \), is the unique function \( S : M_{\text{prog}}[0, n - 1] \rightarrow \mathcal{I}_{\text{prog}} \) such that \( S(M_{\text{prog}}[i]) = u_{i+1} \) for all \( i \in [0, n - 1] \). We call \( s_{\text{prog}}(P) \) a stored program.

Note that \( s_{\text{prog}}(u_1; \ldots; u_n) \) is not defined if \( n > \text{size}(M_{\text{prog}}) \): the size of the program memory bounds the length of the programs that can be stored.
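The following Python sketch computes a stored program. It assumes that the bijection \(m_{\text{prog}}\) is given as a list `m_prog` of hypothetical memory element names, so that `m_prog[i]` plays the role of \(M_{\text{prog}}[i]\), and it treats a program containing a jump outside \(\mathcal{I}_{\text{prog}}\) as not storable, since the stored representation maps into that set.

```python
from typing import Dict, Sequence


def in_base_set(u: str, size: int) -> bool:
    """Membership of I_prog: every primitive instruction except the
    jumps #k with k > size - 1."""
    return not (u.startswith("#") and int(u[1:]) > size - 1)


def store(prog: Sequence[str], m_prog: Sequence[str]) -> Dict[str, str]:
    """Stored representation s_prog(P) of P = prog[0]; ...; prog[n-1].

    m_prog lists the (hypothetical) program memory elements in the order
    given by the bijection m_prog.
    """
    size = len(m_prog)
    if len(prog) > size:
        raise ValueError("program does not fit: s_prog(P) is undefined")
    if not all(in_base_set(u, size) for u in prog):
        raise ValueError("jump outside the program memory base set")
    # memory element M_prog[i] holds instruction u_{i+1}
    return {m_prog[i]: u for i, u in enumerate(prog)}
```

Only the first \(n\) memory elements receive an instruction; the stored program says nothing about the remaining ones.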

### 7. Non-Pipelined Instruction Processing

In this section, we model a micro-architecture with non-pipelined instruction processing. We do not make explicit the instruction set architecture for which this micro-architecture is modelled. Instead, we start from an arbitrary Maurer machine and enhance it; that Maurer machine determines the instruction set architecture for which the micro-architecture is modelled. However, there are Maurer machines for which the enhancement is primarily intended. Those Maurer machines will be introduced in Section 12. Henceforth, we write “PGA instruction” for “primitive instruction of PGA”.

We enhance Maurer machines by extending the memory with a program memory \( (M_{\text{prog}}, \mathcal{I}_{\text{prog}}) \), a program counter upper bound register \( \text{pcbr} \), a program counter \( \text{pc} \), an instruction register \( \text{ir} \), a decoded instruction type register \( \text{ditr} \), a basic action register \( \text{bar} \), a displacement register \( \text{dr} \), an executed instruction type register \( \text{eitr} \), an instruction reply register \( \text{irr} \), a fetch reply register \( \text{rr}_{\text{fetch}} \), a pre-process reply register \( \text{rr}_{\text{prep}} \), an execute reply register \( \text{rr}_{\text{exec}} \) and a post-process reply register \( \text{rr}_{\text{postp}} \), and by extending the operation set with a fetch operation \( O_{\text{fetch}} \), a pre-process operation \( O_{\text{prep}} \), an execute operation \( O_{\text{exec}} \) and a post-process operation \( O_{\text{postp}} \). Moreover, we replace the basic actions of the original Maurer machine by the basic actions \( \text{fetch} \), \( \text{prep} \), \( \text{exec} \) and \( \text{postp} \), with which the operations \( O_{\text{fetch}} \), \( O_{\text{prep}} \), \( O_{\text{exec}} \) and \( O_{\text{postp}} \), respectively, are associated. The resulting Maurer machines are called SP-NPL-enhancements. SP stands for stored program and NPL stands for non-pipelined instruction processing. In SP-NPL-enhancements of Maurer machines, the five instruction types \( \text{bsc} \), \( \text{ptst} \), \( \text{ntst} \), \( \text{fjmp} \) and \( \text{term} \) are distinguished. These types correspond to the five kinds of PGA instructions introduced in Section 5.

Henceforth, we write \( \mathit{IT} \) for the set \( \{ \text{bsc}, \text{ptst}, \text{ntst}, \text{fjmp}, \text{term} \} \). The memory elements \( \text{pcbr}, \text{pc}, \text{ir}, \text{ditr}, \text{bar}, \text{dr}, \text{eitr} \) and \( \text{irr} \) are used to communicate information between the execution handling operations \( O_{\text{fetch}} \), \( O_{\text{prep}} \), \( O_{\text{exec}} \) and \( O_{\text{postp}} \). The memory elements \( \text{rr}_{\text{fetch}} \), \( \text{rr}_{\text{prep}} \), \( \text{rr}_{\text{exec}} \) and \( \text{rr}_{\text{postp}} \) are the reply registers of the execution handling operations \( O_{\text{fetch}} \), \( O_{\text{prep}} \), \( O_{\text{exec}} \) and \( O_{\text{postp}} \), respectively. It is assumed that \( \text{pcbr}, \text{pc}, \text{ir}, \text{ditr}, \text{bar}, \text{dr}, \text{eitr}, \text{irr}, \text{rr}_{\text{fetch}}, \text{rr}_{\text{prep}}, \text{rr}_{\text{exec}} \) and \( \text{rr}_{\text{postp}} \) are pairwise different memory elements. Henceforth, we write $M'_{\text{ip}}$ for $\{\text{pcbr}, \text{pc}, \text{ir}, \text{ditr}, \text{bar}, \text{dr}, \text{eitr}, \text{irr}\}$ and $M'_{\text{rr}}$ for $\{\text{rr}_{\text{fetch}}, \text{rr}_{\text{prep}}, \text{rr}_{\text{exec}}, \text{rr}_{\text{postp}}\}$. It is assumed that $M_{\text{prog}} \cap (M'_{\text{ip}} \cup M'_{\text{rr}}) = \emptyset$. Henceforth, we write $\mathbb{B}$ for the set $\{T, F\}$. After giving the precise definition of an SP-NPL-enhancement, we will further explain how an SP-NPL-enhancement operates.

Let $H = (M, B, \mathcal{S}, \mathcal{O}, A, [\_])$ be a Maurer machine such that $M \cap (M_{\text{prog}} \cup M'_{\text{ip}} \cup M'_{\text{rr}}) = \emptyset$ and $\text{fetch}, \text{prep}, \text{exec}, \text{postp} \notin A$, and let $(O_a, m_a) = [a]$ for all $a \in A$. Then the SP-NPL-enhancement of $H$ is the Maurer machine $H' = (M', B', \mathcal{S}', \mathcal{O}', A', [\_]')$ such that

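\[
\begin{align*}
M' &= M \cup M_{\text{prog}} \cup M'_{\text{ip}} \cup M'_{\text{rr}}, \\
B' &= B \cup \mathit{MA}'_{\text{prog}} \cup \mathcal{I}_{\text{prog}} \cup \mathit{IT} \cup A \cup \mathbb{B}, \\
\mathcal{S}' &= \{ S' : M' \rightarrow B' \mid {} \\
&\qquad S' \upharpoonright M \in \mathcal{S} \land S' \upharpoonright M_{\text{prog}} \in \mathcal{S}_{\text{prog}} \land S'(\text{pcbr}) \in \mathit{MA}_{\text{prog}} \land {} \\
&\qquad S'(\text{pc}) \in \mathit{MA}'_{\text{prog}} \land S'(\text{ir}) \in \mathcal{I}_{\text{prog}} \land {} \\
&\qquad S'(\text{ditr}) \in \mathit{IT} \land S'(\text{bar}) \in A \land S'(\text{dr}) \in \mathit{MA}_{\text{prog}} \land {} \\
&\qquad S'(\text{eitr}) \in \mathit{IT} \land S'(\text{irr}) \in \mathbb{B} \land {} \\
&\qquad S'(\text{rr}_{\text{fetch}}) \in \mathbb{B} \land S'(\text{rr}_{\text{prep}}) \in \mathbb{B} \land S'(\text{rr}_{\text{exec}}) \in \mathbb{B} \land S'(\text{rr}_{\text{postp}}) \in \mathbb{B} \}, \\
\mathcal{O}' &= \{ O' : \mathcal{S}' \rightarrow \mathcal{S}' \mid \exists O \in \mathcal{O} \,.\, \forall S' \in \mathcal{S}' \,.\, {} \\
&\qquad (O'(S') \upharpoonright M = O(S' \upharpoonright M) \land O'(S') \upharpoonright (M' \setminus M) = S' \upharpoonright (M' \setminus M)) \} \\
&\qquad \cup \{ O_{\text{fetch}}, O_{\text{prep}}, O_{\text{exec}}, O_{\text{postp}} \}, \\
A' &= \{ \text{fetch}, \text{prep}, \text{exec}, \text{postp} \}, \\
[a]' &= (O_a, \text{rr}_a) \quad \text{for all } a \in A'.
\end{align*}
\]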

$O_{fetch}$ is the unique function from $S'$ to $S'$ such that for all $S' \in S'$:

\[
\begin{align*}
O_{fetch}(S') \mid M &= S' \mid M, \\
O_{fetch}(S') \mid M_{prog} &= S' \mid M_{prog}, \\
O_{fetch}(S')(pcbr) &= S'(pcbr), \\
O_{fetch}(S')(pc) &= S'(pc) + 1 \quad \text{if } S'(pc) \leq S'(pcbr), \\
O_{fetch}(S')(pc) &= S'(pc) \quad \text{if } S'(pc) > S'(pcbr), \\
O_{fetch}(S')(ir) &= S'(M_{prog}[S'(pc)]) \quad \text{if } S'(pc) \leq S'(pcbr), \\
O_{fetch}(S')(ir) &= \#0 \quad \text{if } S'(pc) > S'(pcbr), \\
O_{fetch}(S') \mid \{\text{ditr, bar, dr}\} &= S' \mid \{\text{ditr, bar, dr}\}, \\
O_{fetch}(S') \mid \{\text{eitr, irr}\} &= S' \mid \{\text{eitr, irr}\}, \\
O_{fetch}(S')(rr_{fetch}) &= T \quad \text{if } S'(pc) \leq S'(pcbr), \\
O_{fetch}(S')(rr_{fetch}) &= F \quad \text{if } S'(pc) > S'(pcbr), \\
O_{fetch}(S') \mid (M'_{rr} \setminus \{rr_{fetch}\}) &= S' \mid (M'_{rr} \setminus \{rr_{fetch}\}).
\end{align*}
\]
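A minimal Python sketch of \(O_{\text{fetch}}\), assuming a dictionary-based state in which the hypothetical key `"prog"` holds the program memory as a list indexed by address. Following the description of the micro-architecture given below, the program counter is incremented whenever an instruction is fetched, that is, whenever \(S'(\text{pc}) \leq S'(\text{pcbr})\); all memory elements not mentioned are copied unchanged.

```python
from typing import Any, Dict

State = Dict[str, Any]


def o_fetch(s: State) -> State:
    """Sketch of O_fetch on a dict-based state; only pc, ir and rr_fetch change."""
    t = dict(s)                           # every other memory element is kept
    if s["pc"] <= s["pcbr"]:
        t["ir"] = s["prog"][s["pc"]]      # content of M_prog[pc]
        t["pc"] = s["pc"] + 1             # the program counter advances on a fetch
        t["rr_fetch"] = True
    else:
        t["ir"] = "#0"                    # nothing to fetch: pc is past pcbr
        t["rr_fetch"] = False
    return t
```

Fetching the single instruction `a` leaves pc at 1, beyond pcbr, so the next fetch replies F; under an execution handling thread that deadlocks on a negative fetch reply this matches \(|a| = a \circ D\).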
$O_{\text{exec}}$ is the unique function from $S'$ to $S'$ such that for all $S' \in S'$:

\[
\begin{align*}
O_{\text{exec}}(S') \upharpoonright M & = O_{S'(\text{bar})}(S' \upharpoonright M) \quad \text{if } \text{opc}(S') , \\
O_{\text{exec}}(S') \upharpoonright M & = S' \upharpoonright M \quad \text{if } \neg \text{opc}(S') , \\
O_{\text{exec}}(S') \upharpoonright M_{\text{prog}} & = S' \upharpoonright M_{\text{prog}} , \\
O_{\text{exec}}(S') \upharpoonright \{\text{pcbr}, \text{pc}, \text{ir}\} & = S' \upharpoonright \{\text{pcbr}, \text{pc}, \text{ir}\} , \\
O_{\text{exec}}(S') \upharpoonright \{\text{ditr}, \text{bar}, \text{dr}\} & = S' \upharpoonright \{\text{ditr}, \text{bar}, \text{dr}\} , \\
O_{\text{exec}}(S')(\text{eitr}) & = S'(\text{ditr}) , \\
O_{\text{exec}}(S')(\text{irr}) & = O_{S'(\text{bar})}(S' \upharpoonright M)(m_{S'(\text{bar})}) \quad \text{if } \text{opc}(S') , \\
O_{\text{exec}}(S')(\text{irr}) & = T \quad \text{if } \neg \text{opc}(S') , \\
O_{\text{exec}}(S')(\text{rr}_{\text{exec}}) & = T , \\
O_{\text{exec}}(S') \upharpoonright (M'_{\text{rr}} \setminus \{\text{rr}_{\text{exec}}\}) & = S' \upharpoonright (M'_{\text{rr}} \setminus \{\text{rr}_{\text{exec}}\}) ,
\end{align*}
\]

where $\text{opc} : S' \to \mathbb{B}$ is defined as follows:

\[
\text{opc}(S') = T \quad \text{iff } S'(\text{ditr}) \in \{\text{bsc, ptst, ntst}\} .
\]
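The sketch below illustrates \(\text{opc}\) and \(O_{\text{exec}}\) in Python. It assumes that \(O_{\text{exec}}\) copies ditr to eitr, puts the reply of the basic action in irr (and T when no basic action is involved) and sets \(\text{rr}_{\text{exec}}\) to T. The data memory \(M\) is a dictionary under the hypothetical key `"M"`, and `TOY_OPS` is a hypothetical one-action Maurer machine used only for illustration.

```python
from typing import Any, Callable, Dict, Tuple

Mem = Dict[str, Any]
Op = Tuple[Callable[[Mem], Mem], str]     # (O_a, m_a): operation and reply register


def opc(s: Dict[str, Any]) -> bool:
    """opc(S') = T iff the decoded instruction involves a basic action."""
    return s["ditr"] in {"bsc", "ptst", "ntst"}


def o_exec(s: Dict[str, Any], ops: Dict[str, Op]) -> Dict[str, Any]:
    """Sketch of O_exec under the assumptions stated in the lead-in."""
    t = dict(s)
    if opc(s):
        o_a, m_a = ops[s["bar"]]          # operation associated with the action in bar
        t["M"] = o_a(s["M"])              # apply O_a to the restriction to M
        t["irr"] = t["M"][m_a]            # reply found in the reply register m_a
    else:
        t["irr"] = True                   # jumps and termination leave M untouched
    t["eitr"] = s["ditr"]                 # hand the instruction type on to post-processing
    t["rr_exec"] = True
    return t


def _inc(m: Mem) -> Mem:
    """Hypothetical basic action: increment x and reply whether x < 3."""
    m = dict(m)
    m["x"] += 1
    m["r"] = m["x"] < 3
    return m


TOY_OPS: Dict[str, Op] = {"inc": (_inc, "r")}
```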
$O_{\text{postp}}$ is the unique function from $S'$ to $S'$ such that for all $S' \in S'$:

$$
\begin{align*}
O_{\text{postp}}(S') &\upharpoonright M = S' \upharpoonright M, \\
O_{\text{postp}}(S') &\upharpoonright M_{\text{prog}} = S' \upharpoonright M_{\text{prog}}, \\
O_{\text{postp}}(S')(\text{pcbr}) &= S'(\text{pcbr}), \\
O_{\text{postp}}(S')(\text{pc}) &= \text{pcu}(S'), \\
O_{\text{postp}}(S')(\text{ir}) &= S'(\text{ir}), \\
O_{\text{postp}}(S') \upharpoonright \{\text{ditr, bar, dr}\} &= S' \upharpoonright \{\text{ditr, bar, dr}\}, \\
O_{\text{postp}}(S') \upharpoonright \{\text{eitr, irr}\} &= S' \upharpoonright \{\text{eitr, irr}\}, \\
O_{\text{postp}}(S')(\text{rr}_{\text{postp}}) &= T \quad \text{if } S'(\text{eitr}) \neq \text{term}, \\
O_{\text{postp}}(S')(\text{rr}_{\text{postp}}) &= F \quad \text{if } S'(\text{eitr}) = \text{term}, \\
O_{\text{postp}}(S') \upharpoonright (M'_{\text{rr}} \setminus \{\text{rr}_{\text{postp}}\}) &= S' \upharpoonright (M'_{\text{rr}} \setminus \{\text{rr}_{\text{postp}}\}).
\end{align*}
$$

where $\text{pcu} : \mathcal{S}' \rightarrow \mathit{MA}'_{\text{prog}}$ is defined as follows:

$$
\begin{align*}
\text{pcu}(S') &= S'(\text{pc}) &&\text{if } S'(\text{eitr}) = \text{bsc} \lor (S'(\text{eitr}) = \text{ptst} \land S'(\text{irr}) = T) \lor {} \\
&&&\quad (S'(\text{eitr}) = \text{ntst} \land S'(\text{irr}) = F) \lor S'(\text{eitr}) = \text{term}, \\
\text{pcu}(S') &= S'(\text{pc}) + 1 &&\text{if } ((S'(\text{eitr}) = \text{ptst} \land S'(\text{irr}) = F) \lor (S'(\text{eitr}) = \text{ntst} \land S'(\text{irr}) = T)) \land {} \\
&&&\quad S'(\text{pc}) + 1 \leq S'(\text{pcbr}), \\
\text{pcu}(S') &= S'(\text{pc}) - 1 + S'(\text{dr}) &&\text{if } S'(\text{eitr}) = \text{fjmp} \land S'(\text{dr}) \neq 0 \land S'(\text{pc}) - 1 + S'(\text{dr}) \leq S'(\text{pcbr}), \\
\text{pcu}(S') &= S'(\text{pcbr}) + 1 &&\text{if } ((S'(\text{eitr}) = \text{ptst} \land S'(\text{irr}) = F) \lor (S'(\text{eitr}) = \text{ntst} \land S'(\text{irr}) = T)) \land {} \\
&&&\quad S'(\text{pc}) + 1 > S'(\text{pcbr}) \lor {} \\
&&&\quad S'(\text{eitr}) = \text{fjmp} \land (S'(\text{dr}) = 0 \lor S'(\text{pc}) - 1 + S'(\text{dr}) > S'(\text{pcbr})).
\end{align*}
$$
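The case analysis of pcu can be summarised as follows. Because \(O_{\text{fetch}}\) has already incremented pc, an instruction that does not skip needs no further update, a skip adds one, and a jump \(\#k\) located at address \(S'(\text{pc}) - 1\) leads to address \(S'(\text{pc}) - 1 + k\); the value \(S'(\text{pcbr}) + 1\) is the one from which the next fetch fails. A Python sketch, with instruction types encoded as strings (an assumption of the sketch):

```python
def pcu(pc: int, pcbr: int, eitr: str, irr: bool, dr: int) -> int:
    """Program counter update of O_postp in the SP-NPL-enhancement."""
    skip = (eitr == "ptst" and not irr) or (eitr == "ntst" and irr)
    if skip:
        return pc + 1 if pc + 1 <= pcbr else pcbr + 1
    if eitr == "fjmp":
        target = pc - 1 + dr              # the jump itself sits at address pc - 1
        return target if dr != 0 and target <= pcbr else pcbr + 1
    return pc                             # bsc, term, or a test that does not skip
```

For example, a jump `#2` at address 2 is post-processed with pc equal to 3 and yields address 4.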

Figure 1 shows the structure of an SP-NPL-enhancement. The program counter $\text{pc}$ contains the address of the program memory element from which a PGA instruction is fetched next, unless its content is greater than the highest program address (contained in $\text{pcbr}$). Fetched PGA instructions are stored in $\text{ir}$. The program counter is incremented at every fetch. Pre-processing amounts to decoding the PGA instruction stored in $\text{ir}$: the type of that PGA instruction is stored in $\text{ditr}$, and the basic action or the displacement involved, if any, is stored in $\text{bar}$ or $\text{dr}$, respectively. Thus, information is passed from $O_{\text{fetch}}$ to $O_{\text{prep}}$ via $\text{ir}$, from $O_{\text{prep}}$ to $O_{\text{exec}}$ via $\text{ditr}$ and $\text{bar}$, from $O_{\text{prep}}$ to $O_{\text{postp}}$ via $\text{dr}$, and from $O_{\text{exec}}$ to $O_{\text{postp}}$ via $\text{eitr}$ and $\text{irr}$. Moreover, each execution handling operation has its own reply register. All this fits in well with the pipelined variant of SP-NPL-enhancements that will be introduced in Section 8.

Because the memory is extended with only finitely many memory elements, it is easy to check, using Proposition IV in Maurer (1966), that the SP-NPL-enhancement of a Maurer machine is indeed a Maurer machine. The same holds for the SP-PL-enhancement of a Maurer machine introduced in Section 8.

Consider the guarded recursive specification over BTA that consists of the following equation:

$$CT = (\text{prep} \circ \text{exec} \circ (CT \triangleleft \text{postp} \triangleright S)) \triangleleft \text{fetch} \triangleright D.$$  
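The following Python sketch runs the execution handling thread \(CT\) on a simplified SP-NPL-enhancement, folding the four execution handling operations into one loop: a negative fetch reply leads to deadlock, a negative post-process reply (on a termination instruction) leads to termination. Since \(O_{\text{prep}}\) is not spelled out in this section, `decode` is a hypothetical stand-in that maps an instruction string to its type, basic action and displacement; the toy action `inc` and its reply register `r` are likewise made up for illustration.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Mem = Dict[str, Any]
Ops = Dict[str, Tuple[Callable[[Mem], Mem], str]]


def decode(u: str) -> Tuple[str, Optional[str], int]:
    """Hypothetical stand-in for O_prep: (instruction type, action, displacement)."""
    if u == "!":
        return "term", None, 0
    if u.startswith("#"):
        return "fjmp", None, int(u[1:])
    if u[0] in "+-":
        return ("ptst" if u[0] == "+" else "ntst"), u[1:], 0
    return "bsc", u, 0


def run_ct(prog: List[str], m: Mem, ops: Ops) -> Tuple[str, Mem]:
    """Follow CT from pc = 0 and pcbr = len(prog) - 1, as in Theorem 1.

    Returns "S" or "D" together with the final content of M.
    """
    pc, pcbr = 0, len(prog) - 1
    while True:
        if pc > pcbr:                         # fetch replies F: CT becomes D
            return "D", m
        ir, pc = prog[pc], pc + 1             # fetch
        ditr, bar, dr = decode(ir)            # prep
        irr = True                            # exec
        if ditr in ("bsc", "ptst", "ntst"):
            o_a, m_a = ops[bar]
            m = o_a(m)
            irr = m[m_a]
        eitr = ditr
        if eitr == "term":                    # postp replies F: CT becomes S
            return "S", m
        if (eitr == "ptst" and not irr) or (eitr == "ntst" and irr):
            pc = pc + 1 if pc + 1 <= pcbr else pcbr + 1
        elif eitr == "fjmp":
            target = pc - 1 + dr
            pc = target if dr != 0 and target <= pcbr else pcbr + 1


def _inc(m: Mem) -> Mem:
    """Hypothetical basic action: increment x and reply whether x < 3."""
    m = dict(m)
    m["x"] += 1
    m["r"] = m["x"] < 3
    return m


TOY_OPS: Ops = {"inc": (_inc, "r")}
```

On small examples the outcome agrees with the thread extraction equations: `["-inc", "#0", "!"]` terminates when the reply to `inc` is T and deadlocks when it is F, just as \(|{-}a; \#0; !|\) does.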

Let $P$ be a finite PGA program. Then applying thread $|P|$ to a state of Maurer machine $H$ has the same effect as applying the execution handling thread $CT$ to the corresponding state of the SP-NPL-enhancement of $H$ in which the program memory contains the stored representation of $P$. This is stated rigorously in the following theorem.

**Theorem 1 (SP-NPL-enhancement).** Let $H' = (M', B', \mathcal{S}', \mathcal{O}', A', [\_]')$ be the SP-NPL-enhancement of $H = (M, B, \mathcal{S}, \mathcal{O}, A, [\_])$, let $P = u_1; \ldots; u_n \in \mathcal{P}_{\text{fin}}(A)$ be such that $n \leq \text{size}(M_{\text{prog}})$, and let $S'_0 \in \mathcal{S}'$ be such that $S'_0 \upharpoonright M_{\text{prog}}[0,n-1] = s_{\text{prog}}(P)$, $S'_0(\text{pcbr}) = n-1$ and $S'_0(\text{pc}) = 0$. Then $|P| \bullet_{H} (S'_0 \upharpoonright M) = (CT \bullet_{H'} S'_0) \upharpoonright M$.

Proof. Let $(O_a,m_a) = [a]$ for all $a \in A$, and let $(O_a,rr_a) = [a]'$ for all $a \in A'$. Then it is easy to see that for all $S' \in S'$ and $a \in A$ such that $S'(\text{pc}) \leq S'(\text{pcbr})$ and $S'(M_{\text{prog}}[S'(\text{pc})]) \in \{a,+a,-a\}$:

$$O_{\text{postp}}(O_{\text{exec}}(O_{\text{prep}}(O_{\text{fetch}}(S')))) \upharpoonright M = O_a(S' \upharpoonright M) , \quad (1)$$

$$O_{\text{postp}}(O_{\text{exec}}(O_{\text{prep}}(O_{\text{fetch}}(S'))))(\text{irr}) = O_a(S' \upharpoonright M)(m_a) ; \quad (2)$$

and it is easy to see that for all $S' \in \mathcal{S}'$ such that $S'(\text{pc}) \leq S'(\text{pcbr})$ and $S'(M_{\text{prog}}[S'(\text{pc})]) \notin \{a, +a, -a \mid a \in A\}$:

$$O_{\text{postp}}(O_{\text{exec}}(O_{\text{prep}}(O_{\text{fetch}}(S')))) \upharpoonright M = S' \upharpoonright M . \quad (3)$$

Let \( (p_i', S'_i) \) be the \((i+1)\)st element in the full path of \((CT, S'_{0})\) on \(H'\). Then it is easy to prove by induction on \(i\) that

\[
\begin{align*}
    p_{4i+4}' &= CT & \text{if } S'_{4i+1}(\text{rr}_{\text{fetch}}) = T \land S'_{4i+4}(\text{rr}_{\text{postp}}) = T, \\
    p_{4i+4}' &= S & \text{if } S'_{4i+1}(\text{rr}_{\text{fetch}}) = T \land S'_{4i+4}(\text{rr}_{\text{postp}}) = F, \\
    p_{4i+1}' &= D & \text{if } S'_{4i+1}(\text{rr}_{\text{fetch}}) = F \qquad (4)
\end{align*}
\]

(if \(4i+4 < \|(CT, S'_{0})\|_{H'}\), in case \(CT\) converges from \(S'_{0}\) on \(H'\)). Let \((p_i, S_i)\) be the \((i+1)\)st element in the full path of \((|P|, S'_{0} \upharpoonright M)\) on \(H\), and let \((p_i', S'_i)\) be the \((i+1)\)st element in the full path of \((CT, S'_{0})\) on \(H'\) of which the first component equals \(CT\), \(S\) or \(D\) and the second component, say \(S'\), satisfies \(S'(M_{\text{prog}}[S'(\text{pc})]) \neq \#k\) for all \(k \in \mathit{MA}_{\text{prog}}\). Then, using (1), (2), (3) and (4), it is straightforward to prove by induction on \(i\) and case distinction on the structure of finite PGA programs that

\[
p_i = \bigl| s_{\text{prog}}(P)(M_{\text{prog}}[S'_{4i}(\text{pc})]) ; \ldots ; s_{\text{prog}}(P)(M_{\text{prog}}[n - 1]) \bigr| ,
\]

\[
S_i = S'_{4i} \upharpoonright M
\]

(if \( i < \|(|P|, S'_{0} \upharpoonright M)\|_H \), in case \( |P| \) converges from \( S'_{0} \upharpoonright M \) on \( H \)). From this, the theorem follows immediately.

Henceforth, execution handling threads, like \(CT\), are called \textit{power threads}.

### 8. Pipelined Instruction Processing

In this section, we model a micro-architecture with pipelined instruction processing which is a variant of the micro-architecture with non-pipelined instruction processing modelled in Section 7. In the latter micro-architecture, PGA instructions are processed one after another, whereas, in the micro-architecture modelled here, the processing of up to four PGA instructions can overlap. We again start from an arbitrary Maurer machine and enhance it.

We enhance Maurer machines by extending the memory as in the case of SP-NPL-enhancements and additionally with an \textit{instruction skip flag} (isf), a \textit{jump decoded flag} (jdf), a \textit{jump processed flag} (jpf), a \textit{pipeline status register} (plsr) and a \textit{reply register} (rr), and by extending the operation set with a \textit{step operation} (\(O_{\text{step}}\)), a \textit{pipeline control operation} (\(O_{\text{plc}}\)) and a \textit{halt operation} (\(O_{\text{halt}}\)). Moreover, we replace the basic actions of the original Maurer machine by the basic actions \textit{step}, \textit{plc} and \textit{halt}, with which the extra operations \(O_{\text{step}}\), \(O_{\text{plc}}\) and \(O_{\text{halt}}\), respectively, are associated. The resulting Maurer machines are called SP-PL-enhancements. SP again stands for stored program and PL stands for pipelined instruction processing. In SP-PL-enhancements of Maurer machines, the four \textit{pipeline stages} fetchst, prepst, execst and postpst are distinguished. Henceforth, we write \(PS\) for \(\{\text{fetchst}, \text{prepst}, \text{execst}, \text{postpst}\}\). The memory elements isf, jdf, jpf and plsr are used to control the pipelined processing of PGA instructions and to produce a reply in rr at the completion of each step of the pipelined instruction processing. It is assumed that isf, jdf, jpf, plsr and rr are pairwise different memory elements. Henceforth, we write \(M'_{\text{plc}}\) for \(\{\text{isf}, \text{jdf}, \text{jpf}, \text{plsr}, \text{rr}\}\). It is assumed that \((M_{\text{prog}} \cup M'_{\text{ip}} \cup M'_{\text{rr}}) \cap M'_{\text{plc}} = \emptyset\). After giving the precise definition of an SP-PL-enhancement, we will further explain how an SP-PL-enhancement operates.

Let \( H = (M, B, \mathcal{S}, \mathcal{O}, A, [\_]) \) be a Maurer machine such that \( M \cap (M_{\text{prog}} \cup M'_{\text{ip}} \cup M'_{\text{rr}} \cup M'_{\text{plc}}) = \emptyset \) and \( \text{step}, \text{plc}, \text{halt} \notin A \), and let \( (O_a, m_a) = [a] \) for all \( a \in A \). Then the SP-PL-enhancement of \( H \) is the Maurer machine \( H' = (M', B', \mathcal{S}', \mathcal{O}', A', [\_]') \) such that

\[
\begin{align*}
M' &= M \cup M_{\text{prog}} \cup M'_{\text{ip}} \cup M'_{\text{rr}} \cup M'_{\text{plc}}, \\
B' &= B \cup \mathit{MA}'_{\text{prog}} \cup \mathcal{I}_{\text{prog}} \cup \mathit{IT} \cup A \cup \mathbb{B} \cup \mathcal{P}(PS), \\
\mathcal{S}' &= \{ S' : M' \rightarrow B' \mid {} \\
&\qquad S' \upharpoonright M \in \mathcal{S} \land S' \upharpoonright M_{\text{prog}} \in \mathcal{S}_{\text{prog}} \land S'(\text{pcbr}) \in \mathit{MA}_{\text{prog}} \land S'(\text{pc}) \in \mathit{MA}'_{\text{prog}} \land {} \\
&\qquad S'(\text{ir}) \in \mathcal{I}_{\text{prog}} \land S'(\text{ditr}) \in \mathit{IT} \land S'(\text{bar}) \in A \land S'(\text{dr}) \in \mathit{MA}_{\text{prog}} \land {} \\
&\qquad S'(\text{eitr}) \in \mathit{IT} \land S'(\text{irr}) \in \mathbb{B} \land S'(\text{rr}_{\text{fetch}}) \in \mathbb{B} \land S'(\text{rr}_{\text{prep}}) \in \mathbb{B} \land {} \\
&\qquad S'(\text{rr}_{\text{exec}}) \in \mathbb{B} \land S'(\text{rr}_{\text{postp}}) \in \mathbb{B} \land S'(\text{isf}) \in \mathbb{B} \land S'(\text{jdf}) \in \mathbb{B} \land {} \\
&\qquad S'(\text{jpf}) \in \mathbb{B} \land S'(\text{plsr}) \in \mathcal{P}(PS) \land S'(\text{rr}) \in \mathbb{B} \}, \\
\mathcal{O}' &= \{ O' : \mathcal{S}' \rightarrow \mathcal{S}' \mid \exists O \in \mathcal{O} \,.\, \forall S' \in \mathcal{S}' \,.\, {} \\
&\qquad (O'(S') \upharpoonright M = O(S' \upharpoonright M) \land O'(S') \upharpoonright (M' \setminus M) = S' \upharpoonright (M' \setminus M)) \} \\
&\qquad \cup \{ O_{\text{step}}, O_{\text{plc}}, O_{\text{halt}} \}, \\
A' &= \{ \text{step}, \text{plc}, \text{halt} \}, \\
[a]' &= (O_a, \text{rr}) \quad \text{for all } a \in A' .
\end{align*}
\]

\( O_{\text{step}} \) is the unique function from \( S' \) to \( S' \) such that for all \( S' \in S' \):

\[
O_{\text{step}}(S') = O'_{\text{fetch}}(O'_{\text{prep}}(O'_{\text{exec}}(O'_{\text{postp}}(S')))) ,
\]

where \( O'_{\text{fetch}}, O'_{\text{prep}}, O'_{\text{exec}} \) and \( O'_{\text{postp}} \) are suboperations defined as follows:

\( O'_{\text{fetch}} \) is the unique function from \( S' \) to \( S' \) such that for all \( S' \in S' \):

\[
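O'_{\text{fetch}}(S') = S' \quad \text{if } \text{fetchst} \notin S'(\text{plsr}) , \\
O'_{\text{fetch}}(S') \upharpoonright (M' \setminus M'_{\text{plc}}) = O_{\text{fetch}}(S' \upharpoonright (M' \setminus M'_{\text{plc}})) \quad \text{if } \text{fetchst} \in S'(\text{plsr}) , \\
O'_{\text{fetch}}(S') \upharpoonright M'_{\text{plc}} = S' \upharpoonright M'_{\text{plc}} \quad \text{if } \text{fetchst} \in S'(\text{plsr}) ,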
\]

\( O'_{\text{prep}} \) is the unique function from \( S' \) to \( S' \) such that for all \( S' \in S' \):

\[
O'_{\text{prep}}(S') = S' \quad \text{if } \text{prepst} \notin S'(\text{plsr}) , \\
O'_{\text{prep}}(S') \upharpoonright (M' \setminus M'_{\text{plc}}) = O_{\text{prep}}(S' \upharpoonright (M' \setminus M'_{\text{plc}})) \quad \text{if } \text{prepst} \in S'(\text{plsr}) , \\
O'_{\text{prep}}(S')(\text{jdf}) = \text{jdc}(S') \quad \text{if } \text{prepst} \in S'(\text{plsr}) , \\
O'_{\text{prep}}(S') \upharpoonright (M'_{\text{plc}} \setminus \{\text{jdf}\}) = S' \upharpoonright (M'_{\text{plc}} \setminus \{\text{jdf}\}) \quad \text{if } \text{prepst} \in S'(\text{plsr}) ,
\]

where \( \text{jdc} : \mathcal{S}' \rightarrow \mathbb{B} \) is the unique function from \( \mathcal{S}' \) to \( \mathbb{B} \) such that for all \( S' \in \mathcal{S}' \):

\[
\text{jdc}(S') = T \quad \text{iff } O_{\text{prep}}(S' \upharpoonright (M' \setminus M'_{\text{plc}}))(\text{ditr}) \in \{ \text{fjmp}, \text{term} \} .
\]
$O'_{\text{exec}}$ is the unique function from $S'$ to $S'$ such that for all $S' \in S'$:

$$
O_{\text{exec}}'(S') = S' \quad \text{if } \text{execst} \not\in S'(\text{plsr}),
$$

$$
O_{\text{exec}}'(S') \upharpoonright (M' \setminus M_{\text{plc}}') = O_{\text{exec}}(S' \upharpoonright (M' \setminus M_{\text{plc}}')) \quad \text{if } \text{execst} \in S'(\text{plsr}),
$$

$$
O_{\text{exec}}'(S')(\text{isf}) = isc(S') \quad \text{if } \text{execst} \in S'(\text{plsr}),
$$

$$
O_{\text{exec}}'(S') \upharpoonright (M_{\text{plc}}' \setminus \{\text{isf}\}) = S' \upharpoonright (M_{\text{plc}}' \setminus \{\text{isf}\}) \quad \text{if } \text{execst} \in S'(\text{plsr}),
$$

where $isc : \mathcal{S}' \rightarrow \mathbb{B}$ is the unique function from $\mathcal{S}'$ to $\mathbb{B}$ such that for all $S' \in \mathcal{S}'$:

$$
isc(S') = T \text{ iff } S'(\text{ditr}) = \text{ptst} \land O_{\text{exec}}(S' \upharpoonright (M' \setminus M_{\text{plc}}'))(\text{irr}) = F \lor S'(\text{ditr}) = \text{ntst} \land O_{\text{exec}}(S' \upharpoonright (M' \setminus M_{\text{plc}}'))(\text{irr}) = T;
$$

$O'_{\text{postp}}$ is the unique function from $S'$ to $S'$ such that for all $S' \in S'$:

$$
O_{\text{postp}}'(S') = S' \quad \text{if } \text{postpst} \not\in S'(\text{plsr}),
$$

$$
O_{\text{postp}}'(S') \upharpoonright (M' \setminus M_{\text{plc}}') = O''_{\text{postp}}(S' \upharpoonright (M' \setminus M_{\text{plc}}')) \quad \text{if } \text{postpst} \in S'(\text{plsr}),
$$

$$
O_{\text{postp}}'(S')(\text{jpf}) = jpc(S') \quad \text{if } \text{postpst} \in S'(\text{plsr}),
$$

$$
O_{\text{postp}}'(S') \upharpoonright (M_{\text{plc}}' \setminus \{\text{jpf}\}) = S' \upharpoonright (M_{\text{plc}}' \setminus \{\text{jpf}\}) \quad \text{if } \text{postpst} \in S'(\text{plsr}),
$$

where $jpc : \mathcal{S}' \rightarrow \mathbb{B}$ is the unique function from $\mathcal{S}'$ to $\mathbb{B}$ such that for all $S' \in \mathcal{S}'$:

$$
jpc(S') = T \text{ iff } S'(\text{eitr}) = \text{fjmp},
$$

and $O''_{\text{postp}}$ is defined as $O_{\text{postp}}$ in the case of the SP-NPL-enhancement, except for the replacement of the auxiliary program counter update function $pcu$ by the function $pcu'$ defined as follows:

$$
pcu'(S') = S'(\text{pc}) \quad \text{if } S'(\text{eitr}) \neq \text{fjmp},
$$

$$
pcu'(S') = S'(\text{pc}) - 2 + S'(\text{dr}) \quad \text{if } S'(\text{eitr}) = \text{fjmp} \land S'(\text{dr}) \neq 0 \land S'(\text{pc}) - 2 + S'(\text{dr}) \leq S'(\text{pcbr}),
$$

$$
pcu'(S') = S'(\text{pcbr}) + 1 \quad \text{if } S'(\text{eitr}) = \text{fjmp} \land (S'(\text{dr}) = 0 \lor S'(\text{pc}) - 2 + S'(\text{dr}) > S'(\text{pcbr})).
$$
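The offset 2 in \(pcu'\) can be traced through the pipeline. In the usual case, a jump \(\#k\) at address \(p\) is fetched in some step (pc becomes \(p+1\)) and decoded in the next step while the instruction at \(p+1\) is fetched (pc becomes \(p+2\)); fetching is then stalled until the jump has been post-processed. At that point the target \(p+k\) equals \(S'(\text{pc}) - 2 + S'(\text{dr})\). A Python sketch of \(pcu'\) under this reading, with the hypothetical name `pcu_prime`:

```python
def pcu_prime(pc: int, pcbr: int, eitr: str, dr: int) -> int:
    """Program counter update pcu' of the SP-PL-enhancement.

    Skips are not handled here: in the pipeline they are accomplished
    through the instruction skip flag isf instead.
    """
    if eitr != "fjmp":
        return pc
    target = pc - 2 + dr                  # the jump itself sits at address pc - 2
    return target if dr != 0 and target <= pcbr else pcbr + 1
```

For instance, a jump `#3` at address 2 reaches post-processing with pc equal to 4 and yields address 5.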

$O_{\text{plc}}$ is the unique function from $\mathcal{S}'$ to $\mathcal{S}'$ such that for all $S' \in \mathcal{S}'$:

$$
O_{\text{plc}}(S') \upharpoonright (M' \setminus M_{\text{plc}}') = S' \upharpoonright (M' \setminus M_{\text{plc}}'),
$$

$$
O_{\text{plc}}(S')(\text{jdf}) = F,
$$

$$
O_{\text{plc}}(S')(\text{isf}) = F,
$$

$$
O_{\text{plc}}(S')(\text{jpf}) = F,
$$

$$
O_{\text{plc}}(S')(\text{plsr}) = \text{plsu}(S'),
$$

$$
O_{\text{plc}}(S')(\text{rr}) = \text{ru}(S'),
$$

where $\text{plsu} : \mathcal{S}' \rightarrow \mathcal{P}(PS)$ is the unique function from $\mathcal{S}'$ to $\mathcal{P}(PS)$ such that for all $S' \in \mathcal{S}'$:

\[
\begin{align*}
\text{fetchst} \in \text{plsu}(S') &\iff S'(\text{rr}_{\text{fetch}}) = T \land {} \\
&\qquad (\text{fetchst} \in S'(\text{plsr}) \land S'(\text{jdf}) = F \lor S'(\text{isf}) = T \lor S'(\text{jpf}) = T) , \\
\text{prepst} \in \text{plsu}(S') &\iff S'(\text{rr}_{\text{fetch}}) = T \land {} \\
&\qquad (\text{fetchst} \in S'(\text{plsr}) \land S'(\text{jdf}) = F \lor S'(\text{isf}) = T) , \\
\text{execst} \in \text{plsu}(S') &\iff \text{prepst} \in S'(\text{plsr}) \land S'(\text{isf}) = F , \\
\text{postpst} \in \text{plsu}(S') &\iff \text{execst} \in S'(\text{plsr}) ,
\end{align*}
\]

and \( \text{ru} : \mathcal{S}' \to \mathbb{B} \) is the unique function from \( \mathcal{S}' \) to \( \mathbb{B} \) such that for all \( S' \in \mathcal{S}' \):

\[
\text{ru}(S') = T \iff \text{plsu}(S') \neq \emptyset \land S'(\text{rr}_{\text{postp}}) = T .
\]

\( O_{\text{halt}} \) is the unique function from \( \mathcal{S}' \) to \( \mathcal{S}' \) such that for all \( S' \in \mathcal{S}' \):

\[
\begin{align*}
O_{\text{halt}}(S') \upharpoonright (M' \setminus \{\text{rr}\}) &= S' \upharpoonright (M' \setminus \{\text{rr}\}) , \\
O_{\text{halt}}(S')(\text{rr}) &= T \quad \text{if } S'(\text{rr}_{\text{postp}}) = F , \\
O_{\text{halt}}(S')(\text{rr}) &= F \quad \text{if } S'(\text{rr}_{\text{postp}}) = T .
\end{align*}
\]

Fig. 2. Structure of an SP-PL-enhancement
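The next-step stage set \(\text{plsu}\) and the reply \(\text{ru}\) can be prototyped directly from their definitions. In the Python sketch below, the stage names are taken from the text, while the argument order is an assumption of the sketch.

```python
from typing import AbstractSet, FrozenSet

FETCH, PREP, EXEC, POSTP = "fetchst", "prepst", "execst", "postpst"


def plsu(plsr: AbstractSet[str], rr_fetch: bool, jdf: bool, isf: bool, jpf: bool) -> FrozenSet[str]:
    """Stages enabled in the next step, as computed by O_plc."""
    # fetchst was enabled and no jump or termination instruction was decoded
    fetched = FETCH in plsr and not jdf
    nxt = set()
    if rr_fetch and (fetched or isf or jpf):
        nxt.add(FETCH)
    if rr_fetch and (fetched or isf):
        nxt.add(PREP)
    if PREP in plsr and not isf:          # a skipped instruction is not executed
        nxt.add(EXEC)
    if EXEC in plsr:
        nxt.add(POSTP)
    return frozenset(nxt)


def ru(plsr: AbstractSet[str], rr_fetch: bool, jdf: bool, isf: bool, jpf: bool,
       rr_postp: bool) -> bool:
    """Reply of O_plc: T iff some stage remains enabled and no termination was processed."""
    return bool(plsu(plsr, rr_fetch, jdf, isf, jpf)) and rr_postp
```

Starting from a full pipeline, decoding a jump (jdf = T) leaves only execst and postpst enabled, the next step leaves only postpst, and once the jump has been post-processed (jpf = T) fetching resumes: `plsu({POSTP}, True, False, False, True)` gives `frozenset({'fetchst'})`.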

Figure 2 shows the structure of an SP-PL-enhancement. The suboperations \( O'_{\text{fetch}} \), \( O'_{\text{prep}} \) and \( O'_{\text{exec}} \) of \( O_{\text{step}} \) either do not affect the memory elements of \( M' \setminus M'_{\text{plc}} \) or affect these memory elements exactly in the way in which the operations \( O_{\text{fetch}} \), \( O_{\text{prep}} \) and \( O_{\text{exec}} \) of the SP-NPL-enhancement of \( H \) would affect them. The suboperation $O'_{\text{postp}}$ of $O_{\text{step}}$ either does not affect the memory elements of $M' \setminus M'_{\text{plc}}$ or affects these memory elements in a way that is similar to the way in which the operation $O_{\text{postp}}$ of the SP-NPL-enhancement of $H$ would affect them. The difference with $O_{\text{postp}}$ is due to the different way in which skipping of a PGA instruction is accomplished in pipelined instruction processing.

The suboperations $O'_{\text{fetch}}$, $O'_{\text{prep}}$, $O'_{\text{exec}}$ and $O'_{\text{postp}}$ of $O_{\text{step}}$ correspond to the pipeline stages that a PGA instruction being processed passes through successively. When the suboperation corresponding to a stage other than the last one has handled a PGA instruction, the suboperation corresponding to the next stage is enabled to handle that PGA instruction in the next step, subject to the exceptions mentioned below. $O'_{\text{fetch}}$, the suboperation corresponding to the first stage, is always enabled to fetch a PGA instruction in the next step, subject to the exceptions mentioned below. The exceptions are the following:

- when $O'_{\text{prep}}$ has decoded a jump or termination instruction, pipelined instruction processing is stalled beginning with the PGA instruction fetched in the same step;
- when $O'_{\text{exec}}$ has executed either a positive test instruction with a negative reply as result or a negative test instruction with a positive reply as result, the PGA instruction fetched immediately after the test instruction is discarded and, if that instruction is a jump or termination instruction, pipelined instruction processing is started again with the next step;
- when $O'_{\text{postp}}$ has adjusted the program counter on a jump instruction, the last fetched PGA instruction is discarded and pipelined instruction processing is started again with the next step.

Thus, the suboperations $O'_{\text{fetch}}$, $O'_{\text{prep}}$, $O'_{\text{exec}}$ and $O'_{\text{postp}}$ are not all enabled to handle a PGA instruction in every step of the pipelined instruction processing. The content of the pipeline status register indicates which of the suboperations are enabled. Enabledness is controlled by the pipeline control operation $O_{\text{plc}}$. This operation is intended to be performed immediately after $O_{\text{step}}$. It uses part of the output of the suboperations of $O_{\text{step}}$ (the flags isf, jdf and jpf and the reply register $\text{rr}_{\text{fetch}}$) to determine the enabledness of these suboperations for the next step.

The idea is that in each step the suboperations $O'_{\text{fetch}}$, $O'_{\text{prep}}$, $O'_{\text{exec}}$ and $O'_{\text{postp}}$ are performed in parallel. To justify the use of the term pipeline here, we have to show that the suboperations can actually be performed in parallel. We come back to this issue in Section 9.

Consider the guarded recursive specification over BTA that consists of the following equation:

$$CT' = \text{step} \circ (CT' \unlhd O_{\text{plctr}} \unrhd (S \unlhd \text{halt} \unrhd D)).$$

Let $P$ be a finite PGA program. Then applying thread $|P|$ to a state of the Maurer machine $H$ has the same effect as applying power thread $CT'$ to the corresponding state of the SP-PL-enhancement of $H$ in which the program memory contains the stored representation of $P$. This is stated rigorously in the following theorem.

**Theorem 2 (SP-PL-enhancement).** Let $H' = (M', B', S', O', A', \llbracket . \rrbracket')$ be the SP-PL-enhancement of $H = (M, B, S, O, A, \llbracket . \rrbracket)$, let $P = u_1; \ldots; u_n \in P_{\text{proj}}(A)$ be such that $n \leq \text{size}(M_{\text{prog}})$, and let $S'_0 \in S'$ be such that $S'_0 \upharpoonright M_{\text{prog}}[0, n-1] = s_{\text{prog}}(P)$, $S'_0(\text{pcbr}) = n - 1$, $S'_0(\text{pc}) = 0$, $S'_0(\text{rr}_{\text{fetch}}) = T$, $S'_0(\text{jdf}) = S'_0(\text{isf}) = S'_0(\text{jpf}) = F$ and $S'_0(\text{plsr}) = \{\text{fetchst}\}$. Then $|P| \bullet_H (S'_0 \upharpoonright M) = (CT' \bullet_{H'} S'_0) \upharpoonright M$.

Proof. We prove that $(CT \bullet_{H''} (S'_0 \upharpoonright (M' \setminus M'_{\text{plc}}))) \upharpoonright M = (CT' \bullet_{H'} S'_0) \upharpoonright M$, where $H''$ is the SP-NPL-enhancement of $H$. The theorem then follows immediately from this and Theorem 1.

We use the following notation in the proof. For each $S' \in S'$ and each $n > 0$, $\text{cycle}^n(S')$ is defined by induction on $n$ as follows: $\text{cycle}^1(S') = O_{\text{plctr}}(O_{\text{step}}(S'))$ and $\text{cycle}^{n+1}(S') = O_{\text{plctr}}(O_{\text{step}}(\text{cycle}^n(S')))$. For each $S' \in S'$, $\text{tip}(S')$ holds iff $\text{fetchst} \in S'(\text{plsr}) \land \text{prepst} \in \text{cycle}^1(S')(\text{plsr}) \land \text{postpst} \in \text{cycle}^3(S')(\text{plsr})$. Thus, $\text{tip}(S')$ indicates that an instruction will be processed completely, starting from state $S'$.

Analysis of input and output regions yields three potential sources of interference between the suboperations of $O_{\text{step}}$: $\text{OR}(O'_{\text{postp}}) \cap \text{OR}(O'_{\text{fetch}}) = \{\text{pc}\}$, $\text{OR}(O'_{\text{postp}}) \cap \text{IR}(O'_{\text{fetch}}) = \{\text{pc}\}$ and $\text{IR}(O'_{\text{postp}}) \cap \text{OR}(O'_{\text{fetch}}) = \{\text{pc}\}$. It is easy to see that, because pipelined instruction processing is stalled when $O'_{\text{prep}}$ has decoded a jump instruction, no actual interference occurs: $O'_{\text{fetch}}$ does not change any memory element if $O'_{\text{postp}}$ has changed $pc$ in the same step, and $O'_{\text{postp}}$ does not change any memory element if $O'_{\text{fetch}}$ has changed $pc$ in the previous step. Because of this, it is not difficult to see that for all $S' \in S'$:

\begin{align}
\text{tip}(S') \Rightarrow {} & \text{cycle}^1(S') \upharpoonright M = \nonumber \\
& O_{\text{postp}}(O_{\text{exec}}(O_{\text{prep}}(O_{\text{fetch}}(S' \upharpoonright (M' \setminus M'_{\text{plc}}))))) \upharpoonright M.
\end{align}

Clearly, $\text{tip}(S'_0)$ holds. Moreover, $\text{tip}$ is preserved by the complete processing of an instruction, provided that there is a next instruction to be processed:

- if $S'(M_{\text{prog}}[S'(pc)]) = a$ and $S'(pc) + 1 \leq S'(pcbr)$, then $\text{tip}(S') \Rightarrow \text{tip}(\text{cycle}^1(S'))$;
- if $S'(M_{\text{prog}}[S'(pc)]) \in \{+a, -a\}$, $\text{cycle}^2(S')(\text{isf}) = F$ and $S'(pc) + 1 \leq S'(pcbr)$, then $\text{tip}(S') \Rightarrow \text{tip}(\text{cycle}^1(S'))$;
- if $S'(M_{\text{prog}}[S'(pc)]) \in \{+a, -a\}$, $\text{cycle}^3(S')(\text{isf}) = F$ and $S'(pc) + 2 \leq S'(pcbr)$, then $\text{tip}(S') \Rightarrow \text{tip}(\text{cycle}^2(S'))$;
- if $S'(M_{\text{prog}}[S'(pc)]) = \#k$ and $S'(pc) + k \leq S'(pcbr)$, then $\text{tip}(S') \Rightarrow \text{tip}(\text{cycle}^3(S'))$.

Let $(p_i, S_i)$ be the $(i+1)$st element in the full path of $(CT, S'_0 \upharpoonright (M' \setminus M'_{\text{plc}}))$ on $H''$. Then it is easy to prove by induction on $i$ that, for all $i$ with $4i + 4 < \lvert (CT, S'_0 \upharpoonright (M' \setminus M'_{\text{plc}})) \rvert_{H''}$:

\begin{align}
p_{4i+4} = CT & \quad \text{if } S_{4i+4}(\text{rr}_{\text{fetch}}) = T \land S_{4i+4}(\text{rr}_{\text{postp}}) = T, \\
p_{4i+4} = S & \quad \text{if } S_{4i+4}(\text{rr}_{\text{fetch}}) = T \land S_{4i+4}(\text{rr}_{\text{postp}}) = F, \\
p_{4i+4} = D & \quad \text{if } S_{4i+4}(\text{rr}_{\text{fetch}}) = F.
\end{align}

Table 11. Pipelined instruction processing of $a; +b; \#3; c; \#2; d; !$

<table>
<thead>
<tr>
<th></th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
<th>7</th>
<th>8</th>
<th>9</th>
<th>10</th>
<th>11</th>
<th>12</th>
</tr>
</thead>
<tbody>
<tr>
<td>$a$</td>
<td>fetch</td>
<td>prep</td>
<td>exec</td>
<td>postp</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>$+b$</td>
<td></td>
<td>fetch</td>
<td>prep</td>
<td>exec</td>
<td>postp</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>$\#3$</td>
<td></td>
<td></td>
<td>fetch</td>
<td>prep</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>$c$</td>
<td></td>
<td></td>
<td></td>
<td>fetch</td>
<td>prep</td>
<td>exec</td>
<td>postp</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>$\#2$</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>fetch</td>
<td>prep</td>
<td>exec</td>
<td>postp</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>$d$</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>fetch</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>$!$</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>fetch</td>
<td>prep</td>
<td>exec</td>
<td>postp</td>
</tr>
</tbody>
</table>

Let $(p'_i, S'_i)$ be the $(i+1)$st element in the full path of $(CT', S'_0)$ on $H'$. It is easy to prove by induction on $i$ that

$$p'_{i+1} = \begin{cases} CT' & \text{if } \text{tip}(S'_i) \land S'_{i+1}(\text{rr}_{\text{fetch}}) = T \land S'_{i+1}(\text{rr}_{\text{postp}}) = T, \\ S & \text{if } \text{tip}(S'_i) \land S'_{i+1}(\text{rr}_{\text{fetch}}) = T \land S'_{i+1}(\text{rr}_{\text{postp}}) = F, \\ D & \text{if } \text{tip}(S'_i) \land S'_{i+1}(\text{rr}_{\text{fetch}}) = F. \end{cases} \tag{7}$$

Then, using (5), (6), (7) and the preservation properties of $\text{tip}$, it is straightforward to prove by induction on $i$ and case distinction on the kinds of primitive instructions of PGA that

$$(p_i = CT \iff p_i' = CT') \land (p_i = S \iff p_i' = S) \land (p_i = D \iff p_i' = D),$$

and

$$S_i \upharpoonright (M' \setminus M'_{\text{plc}}) = S'_i \upharpoonright (M' \setminus M'_{\text{plc}})$$

for all $i$, where $i < \lvert (CT', S'_0) \rvert_{H'}$ in case $CT'$ converges from $S'_0$ on $H'$. From this, the theorem follows immediately.

Example (Pipelined instruction processing). Table 11 shows the pipelined instruction processing of the PGA program $a; +b; \#3; c; \#2; d; !$. It is assumed that the execution of $+b$ results in a negative reply. We see that the pipelined instruction processing of this PGA program is stalled three times: after the jump instruction $\#3$ has been decoded in step 4, after the jump instruction $\#2$ has been decoded in step 6, and after the termination instruction $!$ has been decoded in step 10. Because the execution of the positive test instruction $+b$ has produced a negative reply in step 4, the next instruction in the pipeline, i.e. the jump instruction $\#3$, is neither executed nor post-processed in later steps. Pipelined instruction processing is started again from step 5, because there is no longer a jump instruction in the pipeline. The jump instruction $\#2$ passes all four pipeline stages before pipelined instruction processing is started again from step 9. Moreover, because the jump is actually taken, the prematurely fetched instruction $d$ is discarded.
Table 12. Pipelined instruction processing of \( a; +b; c; \#3; d; e \)

<table>
<thead>
<tr>
<th></th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
<th>7</th>
<th>8</th>
</tr>
</thead>
<tbody>
<tr>
<td>( a )</td>
<td>fetch</td>
<td>prep</td>
<td>exec</td>
<td>postp</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>( + b )</td>
<td>fetch</td>
<td>prep</td>
<td>exec</td>
<td>postp</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>( c )</td>
<td>fetch</td>
<td>prep</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>( #3 )</td>
<td>fetch</td>
<td>prep</td>
<td>exec</td>
<td>postp</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>( d )</td>
<td>fetch</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>( e )</td>
<td>fetch</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Table 12 shows the pipelined instruction processing of the PGA program \( a; +b; c; \#3; d; e \). It is assumed that the execution of \( +b \) results in a negative reply. We see that the pipelined instruction processing of this PGA program is stalled once: after the jump instruction \( \#3 \) has been decoded in step 5. Because the execution of the positive test instruction \( +b \) has produced a negative reply in step 4, the next instruction in the pipeline, i.e. the void basic instruction \( c \), is neither executed nor post-processed in later steps. The jump instruction \( \#3 \) passes all four pipeline stages before pipelined instruction processing is started again from step 8. Moreover, because the jump is actually taken, the prematurely fetched instruction \( d \) is discarded when pipelined instruction processing is started again. The attempt to fetch another instruction in step 8 does not succeed because the jump instruction \( \#3 \) has brought the program counter beyond the last instruction of the PGA program. Instruction processing stops after step 8, because fetching fails in that step while there is no other instruction in the pipeline. This situation corresponds to a programming error, such as a jump out of the program, as a result of which further instruction processing is blocked.

With pipelined instruction processing, execution of the first example program takes 12 steps and execution of the second example program takes 8 steps. With non-pipelined instruction processing, in which each instruction passes through the four stages in four consecutive steps, these would take 20 steps (five instructions processed) and 13 steps (three instructions processed, followed by one failing fetch), respectively. However, there will be no real gain unless \( O'_{\text{fetch}}, O'_{\text{prep}}, O'_{\text{exec}} \) and \( O'_{\text{postp}} \) can be performed in parallel.
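The step counts above can be reproduced with a small simulation. The Python sketch below is not the formal SP-NPL- or SP-PL-enhancement; it is a simplified model built on our own assumptions: instructions are hypothetical strings ("a", "+b", "#3", "!"), replies to tests are fixed in a dictionary, a decoded jump or termination instruction disables fetching and holds the instruction fetched in the same step, a test that causes a skip discards the instruction in the prep stage (lifting the stall if that instruction was a jump or termination instruction), a post-processed jump discards the held instruction and restarts fetching, and a failing fetch ends processing only when the pipeline is empty. All function names are illustrative.

```python
from typing import Dict, List, Optional

# Hypothetical encoding of PGA instructions as strings (an assumption of this sketch):
# "a" basic, "+a"/"-a" tests, "#k" forward jump, "!" termination.

def kind(u: str) -> str:
    if u == "!":
        return "term"
    if u.startswith("#"):
        return "jump"
    return "test" if u[0] in "+-" else "basic"

def skips(u: str, replies: Dict[str, bool]) -> bool:
    # +a skips the next instruction on reply F, -a skips it on reply T.
    reply = replies[u[1:]]
    return reply if u[0] == "-" else not reply

def jump_target(i: int, u: str, n: int) -> int:
    # #k at position i leads to position i + k; #0 is treated as a jump out of the program.
    k = int(u[1:])
    return i + k if k > 0 else n

def non_pipelined_steps(prog: List[str], replies: Dict[str, bool]) -> int:
    pc, steps = 0, 0
    while pc < len(prog):
        u = prog[pc]
        steps += 4  # fetch, prep, exec and postp take one step each
        if kind(u) == "term":
            return steps
        if kind(u) == "jump":
            pc = jump_target(pc, u, len(prog))
        elif kind(u) == "test" and skips(u, replies):
            pc += 2
        else:
            pc += 1
    return steps + 1  # one failing fetch, after which processing is blocked

def pipelined_steps(prog: List[str], replies: Dict[str, bool]) -> int:
    n, pc, step = len(prog), 0, 0
    fetched: Optional[int] = None   # fetched, waiting for prep
    prepped: Optional[int] = None   # decoded, waiting for exec
    executed: Optional[int] = None  # executed, waiting for postp
    stalled = False                 # fetch and prep disabled, `fetched` is held
    while True:
        step += 1
        # All stages read the latches as they were at the start of the step.
        halt, target = False, None
        if executed is not None:  # postp stage
            halt = kind(prog[executed]) == "term"
            if kind(prog[executed]) == "jump":
                target = jump_target(executed, prog[executed], n)
        skip = (prepped is not None and kind(prog[prepped]) == "test"
                and skips(prog[prepped], replies))  # exec stage
        new_prepped, decode_stall = None, False
        if fetched is not None and not stalled:  # prep stage
            new_prepped = fetched
            decode_stall = kind(prog[fetched]) in ("jump", "term")
        new_fetched, fetch_failed = fetched, False
        if not stalled:  # fetch stage
            if pc < n:
                new_fetched, pc = pc, pc + 1
            else:
                new_fetched, fetch_failed = None, True
        # Pipeline control: prepare the latches and flags for the next step.
        if skip:  # discard the instruction that follows the test
            new_prepped, decode_stall = None, False
        executed, prepped, fetched = prepped, new_prepped, new_fetched
        stalled = stalled or decode_stall
        if target is not None:  # jump taken: drop the held instruction, restart
            pc, fetched, stalled = target, None, False
        if halt:
            return step
        if fetch_failed and fetched is None and prepped is None and executed is None:
            return step  # pipeline empty after a failing fetch: blocked

if __name__ == "__main__":
    replies = {"b": False}  # reply F to b, as assumed in both examples
    for prog in (["a", "+b", "#3", "c", "#2", "d", "!"], ["a", "+b", "c", "#3", "d", "e"]):
        print(pipelined_steps(prog, replies), non_pipelined_steps(prog, replies))
```

Under these assumptions the script prints `12 20` and then `8 13`, in line with the counts stated above. The sketch deliberately abstracts from the registers and flags of the formal model (plsr, jdf, isf, jpf) and only tracks which instruction occupies which stage.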

9. Parallel Composability

In this section, we justify the use of the term pipeline in Section 8 by showing that the suboperations \( O'_{\text{fetch}}, O'_{\text{prep}}, O'_{\text{exec}} \) and \( O'_{\text{postp}} \) of \( O_{\text{step}} \) can actually be performed in parallel.

In the case under consideration, performing a number of operations in parallel amounts to accomplishing the state transformations going with the different operations simultaneously. It should be borne in mind that accomplishing them simultaneously and accomplishing them in arbitrary order do not always yield the same result.

Let $(M, B, S, O)$ be a Maurer computer, let $O \in O$, and let $O_1, O_2 : S \to S$ be such that $O_2(O_1(S)) = O(S)$ for all $S \in S$. Then $O$ is parallel composable of $O_1$ and $O_2$ if the following conditions are fulfilled:

- $O_1$ is consistent with $O_2$: if $O_1$ and $O_2$ affect the same memory element, then they affect that memory element in the same way;
- $O_1$ is transparent to $O_2$: if $O_1$ affects a memory element, then that memory element does not affect any memory element under $O_2$.

More precisely, $O$ is parallel composable of $O_1$ and $O_2$ iff $O_1 \mathrel{\mathsf{con}} O_2 \land O_1 \mathrel{\mathsf{tra}} O_2$, where $\mathsf{con}$ and $\mathsf{tra}$ are defined as follows:

\[ O_1 \mathrel{\mathsf{con}} O_2 \iff \forall m \in OR(O_1) \cap OR(O_2), S \in S \cdot (O_1(S)(m) \neq S(m) \land O_2(S)(m) \neq S(m) \Rightarrow O_1(S)(m) = O_2(S)(m)), \]

\[ O_1 \mathrel{\mathsf{tra}} O_2 \iff \forall m \in OR(O_1) \cap IR(O_2), S \in S \cdot (O_1(S)(m) \neq S(m) \Rightarrow \neg (\exists S' \in S \cdot (\forall m' \in M \setminus \{m\} \cdot O_1(S)(m') = S'(m') \land \exists m'' \in OR(O_2) \cdot O_2(S)(m'') \neq O_2(S')(m'')))). \]

Sufficient conditions for $O_1 \mathrel{\mathsf{con}} O_2$ and $O_1 \mathrel{\mathsf{tra}} O_2$ to hold are $OR(O_1) \cap OR(O_2) = \emptyset$ and $OR(O_1) \cap IR(O_2) = \emptyset$, respectively.

Let $(M, B, S, O)$ be a Maurer computer, let $O \in O$, and let $O_1, O_2 : S \to S$ be such that $O_2(O_1(S)) = O(S)$ for all $S \in S$. Then $O_1$ and $O_2$ are commutative if $O_2(O_1(S)) = O_1(O_2(S))$ for all $S \in S$. For $O$ to be parallel composable of $O_1$ and $O_2$, $O_1$ and $O_2$ need not be commutative; conversely, $O_1$ and $O_2$ may be commutative without $O$ being parallel composable of $O_1$ and $O_2$. In other words, neither does parallel composability imply commutativity nor the other way round.
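The difference between simultaneous and sequential application can be explored on a toy state space. The Python sketch below is only an illustration under assumptions of our own: the memory consists of three hypothetical one-bit elements `a`, `b` and `c`, operations are functions on dictionaries, and the output region is computed by brute force as the set of elements the operation changes in some state, while the input region is the set of elements whose content, changed on its own, changes the result of the operation on its output region in some state. It checks the sufficient conditions mentioned above and compares simultaneous application (both operations read the original state and their changes are merged) with sequential application; it does not implement $\mathsf{con}$ and $\mathsf{tra}$ literally.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, List

State = Dict[str, int]
Op = Callable[[State], State]

M: List[str] = ["a", "b", "c"]  # hypothetical toy memory of one-bit elements
STATES: List[State] = [dict(zip(M, bits)) for bits in product((0, 1), repeat=len(M))]

def assign(dst: str, f: Callable[[State], int]) -> Op:
    """Operation that overwrites dst with f(S) and leaves all other elements unchanged."""
    def op(s: State) -> State:
        t = dict(s)
        t[dst] = f(s)
        return t
    return op

def output_region(op: Op) -> FrozenSet[str]:
    return frozenset(m for s in STATES for m in M if op(s)[m] != s[m])

def input_region(op: Op) -> FrozenSet[str]:
    out = output_region(op)

    def affects(m: str) -> bool:
        for s in STATES:
            s2 = dict(s)
            s2[m] = 1 - s[m]  # a state that differs from s only at m
            if any(op(s)[x] != op(s2)[x] for x in out):
                return True
        return False

    return frozenset(m for m in M if affects(m))

def sufficient(op1: Op, op2: Op) -> bool:
    # OR(O1) ∩ OR(O2) = ∅ and OR(O1) ∩ IR(O2) = ∅
    or1 = output_region(op1)
    return not (or1 & output_region(op2)) and not (or1 & input_region(op2))

def simultaneous(op1: Op, op2: Op) -> Op:
    def op(s: State) -> State:
        t = dict(s)
        for o in (op1, op2):  # both read the original state s; op2 wins on a clash
            r = o(s)
            t.update({m: r[m] for m in M if r[m] != s[m]})
        return t
    return op

def agrees_with_sequential(op1: Op, op2: Op) -> bool:
    return all(simultaneous(op1, op2)(s) == op2(op1(s)) for s in STATES)

def commutative(op1: Op, op2: Op) -> bool:
    return all(op2(op1(s)) == op1(op2(s)) for s in STATES)

copy_a_to_b = assign("b", lambda s: s["a"])
copy_a_to_c = assign("c", lambda s: s["a"])
copy_b_to_c = assign("c", lambda s: s["b"])
flip_a = assign("a", lambda s: 1 - s["a"])
```

For example, `copy_a_to_b` followed by `flip_a` satisfies the sufficient conditions and simultaneous and sequential application agree, yet the two operations do not commute. Conversely, `flip_a` commutes with itself, but applying it twice simultaneously flips `a` once, whereas applying it twice in sequence leaves `a` unchanged. Copying `a` to `b` and then `b` to `c` violates the second sufficient condition, and there simultaneous application indeed differs from sequential application.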

Parallel composability generalizes easily to $n$ operations (for $n \geq 2$).

Let $(M, B, S, O)$ be a Maurer computer, let $O \in O$, and let $O_1, \ldots, O_n : S \to S$ be such that $O_n(\ldots O_1(S) \ldots) = O(S)$ for all $S \in S$. Then $O$ is parallel composable of $O_1, \ldots, O_n$ iff

\[ \bigwedge_{1 \leq i < n} \bigwedge_{i < j \leq n} (O_i \mathrel{\mathsf{con}} O_j \land O_i \mathrel{\mathsf{tra}} O_j). \]

The suboperations $O'_{\text{fetch}}$, $O'_{\text{prep}}$, $O'_{\text{exec}}$ and $O'_{\text{postp}}$ of $O_{\text{step}}$ from Section 8 can be performed in parallel. This is stated rigorously in the following theorem.

**Theorem 3 (Parallel composability).** Take the SP-PL-enhancement of a Maurer machine $H$ as in Section 8. Then $O_{\text{step}}$ is parallel composable of $O'_{\text{postp}}$, $O'_{\text{exec}}$, $O'_{\text{prep}}$ and $O'_{\text{fetch}}$. 
Proof. It is easy to see that, if \( O'_{postp} \) changes \( pc \) in state \( S' \), then \( O'_{exec} \) has not set \( isf \) in the preceding step and \( O'_{prep} \) has set \( jdf \) two steps earlier. It is also easy to see that, as a consequence, \( O'_{fetch} \) does not change any memory element in states \( S' \) and \( O'_{prep}(S') \). Hence both the consistency condition and the transparency condition are trivially met.

The proof of Theorem 3 shows that stalling pipelined instruction processing when \( O'_{prep} \) has decoded a jump instruction is crucial for parallel composability. It is easy to see that \( O_{plctr} \circ O_{step} \) is not parallel composable of \( O'_{postp}, O'_{exec}, O'_{prep}, O'_{fetch} \) and \( O_{plctr} \). This is to be expected: for example, the flags \( jdf, isf \) and \( jpf \) are set by \( O'_{prep}, O'_{exec} \) and \( O'_{postp} \), respectively, to influence how \( plsr \) is updated by \( O_{plctr} \).

10. Conditional Jump Instructions

In this section, we extend PGA with conditional jump instructions and look at the effect of this on non-pipelined and pipelined instruction processing.

We add to PGA the following primitive instructions:

— for each \( a \in \mathbb{A} \) and \( k \in \mathbb{N} \), a positive conditional jump instruction \( +a\#k \);

— for each \( a \in \mathbb{A} \) and \( k \in \mathbb{N} \), a negative conditional jump instruction \( -a\#k \).

A positive conditional jump instruction \( +a\#k \) has the same effect as \( +a;\#k \), but counts as one instruction; and a negative conditional jump instruction \( -a\#k \) has the same effect as \( -a;\#k \), but counts as one instruction. In Bergstra and Loots (2002), PGA is extended with a unit instruction operator \( \mathbf{u} \) which turns PGA programs into single instructions. In that extension of PGA, called PGA\(_u\), \( +a\#k \) and \( -a\#k \) can be taken as abbreviations for \( \mathbf{u}(+a;\#k) \) and \( \mathbf{u}(-a;\#k) \), respectively. In Ponse (2002), thread extraction for PGA\(_u\) programs is described by means of a mapping from PGA\(_u\) programs to PGA programs.

The SP-NPL-enhancement of a Maurer machine changes only slightly when conditional jump instructions are added. Only the set \( IT \) and the auxiliary functions \( dec, opc \) and \( pcu \) used in the definition of the SP-NPL-enhancement of a Maurer machine from Section 7 have to be redefined. The set \( IT \) is redefined because the two kinds of conditional jump instructions give rise to two additional instruction types: \( pcfjmp \) and \( ncfjmp \). The function \( dec \) is redefined in order to deal with the decoding of conditional jump instructions. The function \( opc \) is redefined because conditional jump instructions cause an operation to be performed. The function \( pcu \) is redefined in order to deal with the adjustment of the program counter in the case of conditional jump instructions.

$IT$ is redefined to be the set \{bsc, ptst, ntst, fjmp, pcfjmp, ncfjmp, term\}.

The function $dec : S' \rightarrow IT \times A \times MA_{prog}$ is redefined as follows:

\begin{align*}
    dec(S') &= (bsc, a, S'(dr)) & \text{if } S'(ir) = a , \\
    dec(S') &= (ptst, a, S'(dr)) & \text{if } S'(ir) = +a , \\
    dec(S') &= (ntst, a, S'(dr)) & \text{if } S'(ir) = -a , \\
    dec(S') &= (fjmp, S'(bar), k) & \text{if } S'(ir) = \#k , \\
    dec(S') &= (pcfjmp, a, k) & \text{if } S'(ir) = +a\#k , \\
    dec(S') &= (ncfjmp, a, k) & \text{if } S'(ir) = -a\#k , \\
    dec(S') &= (term, S'(bar), S'(dr)) & \text{if } S'(ir) = ! .
\end{align*}

The function $opc : S' \rightarrow \mathbb{B}$ is redefined as follows:

\[
opc(S') = T \text{ iff } S'(ditr) \in \{bsc, ptst, ntst, pcfjmp, ncfjmp\} .
\]
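To make the case analysis of $dec$ and $opc$ concrete, here is a hedged Python sketch over a hypothetical textual syntax for the extended instruction set (basic instructions written as identifiers, e.g. `"+b#3"` for a positive conditional jump). It returns only the instruction type and the relevant parameters, using `None` where the definition above copies a register content such as $S'(bar)$ or $S'(dr)$, so it abstracts from the register-level details.

```python
import re
from typing import Optional, Tuple

ACTION = r"[A-Za-z_]\w*"  # hypothetical syntax for basic instructions

def dec(u: str) -> Tuple[str, Optional[str], Optional[int]]:
    """Return (instruction type, basic instruction, jump distance) for instruction text u."""
    if u == "!":
        return ("term", None, None)
    m = re.fullmatch(r"#(\d+)", u)
    if m:
        return ("fjmp", None, int(m.group(1)))
    m = re.fullmatch(rf"([+-])({ACTION})#(\d+)", u)
    if m:
        itype = "pcfjmp" if m.group(1) == "+" else "ncfjmp"
        return (itype, m.group(2), int(m.group(3)))
    m = re.fullmatch(rf"([+-])({ACTION})", u)
    if m:
        return ("ptst" if m.group(1) == "+" else "ntst", m.group(2), None)
    if re.fullmatch(ACTION, u):
        return ("bsc", u, None)
    raise ValueError(f"not a primitive instruction: {u!r}")

def opc(itype: str) -> bool:
    """True iff an instruction of this type causes an operation to be performed."""
    return itype in {"bsc", "ptst", "ntst", "pcfjmp", "ncfjmp"}
```

For instance, `dec("+b#3")` yields `("pcfjmp", "b", 3)`, and `opc` holds for that type because the conditional jump first performs the operation associated with `b`.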

The function $pcu : S' \rightarrow MA'_{prog}$ is redefined as follows:

\begin{align*}
    pcu(S') &= S'(pc) & \text{if } S'(eitr) = bsc \lor \\
    & & S'(eitr) = ptst \land S'(irr) = T \lor \\
    & & S'(eitr) = ntst \land S'(irr) = F \lor \\
    & & S'(eitr) = pcfjmp \land S'(irr) = F \lor \\
    & & S'(eitr) = ncfjmp \land S'(irr) = T \lor \\
    & & S'(eitr) = \text{term} , \\
    pcu(S') &= S'(pc) + 1 & \text{if } (S'(eitr) = ptst \land S'(irr) = F \lor \\
    & & S'(eitr) = ntst \land S'(irr) = T) \land \\
    & & S'(pc) + 1 \leq S'(pcbr) , \\
    pcu(S') &= S'(pc) - 1 + S'(dr) & \text{if } (S'(eitr) = fjmp \lor \\
    & & S'(eitr) = pcfjmp \land S'(irr) = T \lor \\
    & & S'(eitr) = ncfjmp \land S'(irr) = F) \land \\
    & & S'(dr) \neq 0 \land \\
    & & S'(pc) - 1 + S'(dr) \leq S'(pcbr) , \\
    pcu(S') &= S'(pcbr) + 1 & \text{if } (S'(eitr) = ptst \land S'(irr) = F \lor \\
    & & S'(eitr) = ntst \land S'(irr) = T) \land \\
    & & S'(pc) + 1 > S'(pcbr) \lor \\
    & & (S'(eitr) = fjmp \lor \\
    & & S'(eitr) = pcfjmp \land S'(irr) = T \lor \\
    & & S'(eitr) = ncfjmp \land S'(irr) = F) \land \\
    & & (S'(dr) = 0 \lor \\
    & & S'(pc) - 1 + S'(dr) > S'(pcbr)) .
\end{align*}
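The four cases of $pcu$ can be read as a single function. The Python sketch below is an illustration, not part of the formal definition: it passes the relevant register contents as plain arguments (`pc`, `pcbr`, the executed instruction type `eitr`, the reply `irr` and the jump distance `dr`), and it assumes, as the term $S'(pc) - 1 + S'(dr)$ suggests, that `pc` already points past the instruction being post-processed.

```python
def pcu(pc: int, pcbr: int, eitr: str, irr: bool, dr: int) -> int:
    """Program counter update for non-pipelined processing (sketch of the four cases)."""
    jump = eitr == "fjmp" or (eitr == "pcfjmp" and irr) or (eitr == "ncfjmp" and not irr)
    skip = (eitr == "ptst" and not irr) or (eitr == "ntst" and irr)
    if jump:
        target = pc - 1 + dr
        # #0 and jumps beyond the last instruction lead to pcbr + 1 (no instruction there)
        return target if dr != 0 and target <= pcbr else pcbr + 1
    if skip:
        return pc + 1 if pc + 1 <= pcbr else pcbr + 1
    # bsc, tests that do not skip, conditional jumps that are not taken, and term
    return pc
```

For example, with `pc = 5` and `pcbr = 6`, a taken conditional jump with distance 2 yields 6, whereas the same instruction with the opposite reply leaves the program counter at 5.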

Like the SP-NPL-enhancement of a Maurer machine, the SP-PL-enhancement of a Maurer machine changes only slightly when conditional jump instructions are added. The memory has to be extended with a conditional jump flag (cjf) which, like the other flags, contains a Boolean value. The set $M'_{plc}$, the auxiliary functions $jpc$ and $pcu'$, the suboperation $O'_{exec}$ and the operation $O_{plctr}$ used in the definition of the SP-PL-enhancement of a Maurer machine from Section 8 have to be redefined. The flag \(cjf\) is needed in order to control the pipelined processing of instructions in the presence of conditional jump instructions. The set \(M'_\text{plc}\) is redefined because of the addition of the flag \(cjf\). The function \(jpc\) is redefined because, after adjustment of the program counter on conditional jump instructions, pipelined instruction processing must be restarted as in the case of unconditional jump instructions. Just like \(pcu\) before, the function \(pcu'\) is redefined in order to deal with the adjustment of the program counter in the case of conditional jump instructions. The suboperation \(O'_{\text{exec}}\) is redefined in order to set the additional flag \(cjf\) when a conditional jump instruction is executed and the reply value produced means that the jump must actually be taken. The operation \(O_{\text{plctr}}\) is redefined in order to control the pipelined processing of instructions in the presence of conditional jump instructions.

\(M'_\text{plc}\) is redefined to be the set \(\{isf, jdf, jpf, cjf, plsr, rr\}\).

The function \(jpc : S' \rightarrow \mathbb{B}\) is redefined as follows:

\[
jpc(S') = T \text{ iff } S'(\text{eitr}) = \text{fjmp} \lor S'(\text{eitr}) = \text{pcfjmp} \land S'(\text{irr}) = T \lor S'(\text{eitr}) = \text{ncfjmp} \land S'(\text{irr}) = F.
\]

The function \(pcu' : S' \rightarrow MA'_{\text{prog}}\) is redefined as follows:

\begin{align*}
\text{pcu}'(S') &= S'(\text{pc}) \quad \text{if } S'(\text{eitr}) \in \{\text{bsc, ptst, ntst, term} \} \lor S'(\text{eitr}) = \text{pcfjmp} \land S'(\text{irr}) = F \lor S'(\text{eitr}) = \text{ncfjmp} \land S'(\text{irr}) = T, \\
\text{pcu}'(S') &= S'(\text{pc}) - 2 + S'(\text{dr}) \quad \text{if } S'(\text{eitr}) = \text{fjmp} \land S'(\text{dr}) \neq 0 \land S'(\text{pc}) - 2 + S'(\text{dr}) \leq S'(\text{pcbr}), \\
\text{pcu}'(S') &= S'(\text{pc}) - 3 + S'(\text{dr}) \quad \text{if } (S'(\text{eitr}) = \text{pcfjmp} \land S'(\text{irr}) = T \lor S'(\text{eitr}) = \text{ncfjmp} \land S'(\text{irr}) = F) \land S'(\text{dr}) \neq 0 \land S'(\text{pc}) - 3 + S'(\text{dr}) \leq S'(\text{pcbr}), \\
\text{pcu}'(S') &= S'(\text{pcbr}) + 1 \quad \text{if } S'(\text{eitr}) = \text{fjmp} \land (S'(\text{dr}) = 0 \lor S'(\text{pc}) - 2 + S'(\text{dr}) > S'(\text{pcbr})) \lor (S'(\text{eitr}) = \text{pcfjmp} \land S'(\text{irr}) = T \lor S'(\text{eitr}) = \text{ncfjmp} \land S'(\text{irr}) = F) \land (S'(\text{dr}) = 0 \lor S'(\text{pc}) - 3 + S'(\text{dr}) > S'(\text{pcbr})).
\end{align*}
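The offsets 2 and 3 in $pcu'$ are what one would expect if, when a taken jump is post-processed, the program counter has advanced two positions past an unconditional jump (the jump itself and the instruction fetched in the step in which it was decoded) and three positions past a taken conditional jump, which, on our reading of the flags $jdf$ and $cjf$, stops fetching only after its exec stage. The hedged Python sketch below encodes $pcu'$ under that reading; the helper `lands_on` is hypothetical and checks that both kinds of jump at position `j` end up at position `j + dr`.

```python
def pcu_pipelined(pc: int, pcbr: int, eitr: str, irr: bool, dr: int) -> int:
    """Sketch of pcu' (pipelined processing): only taken jumps change pc."""
    taken_conditional = (eitr == "pcfjmp" and irr) or (eitr == "ncfjmp" and not irr)
    if eitr == "fjmp":
        back = 2  # pc assumed to be two positions past the jump
    elif taken_conditional:
        back = 3  # pc assumed to be three positions past the jump
    else:
        return pc  # bsc, ptst, ntst, term and conditional jumps that are not taken
    target = pc - back + dr
    return target if dr != 0 and target <= pcbr else pcbr + 1

def lands_on(j: int, eitr: str, dr: int, pcbr: int) -> int:
    # Hypothetical helper: position pc as assumed above, with a reply that makes the jump taken.
    pc = j + (2 if eitr == "fjmp" else 3)
    return pcu_pipelined(pc, pcbr, eitr, eitr != "ncfjmp", dr)
```

With a jump at position 2, distance 3 and last position 6, `lands_on` gives 5 for `fjmp`, `pcfjmp` and `ncfjmp` alike, so the different offsets compensate for the different numbers of prematurely fetched instructions.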

The suboperation \(O'_{\text{exec}}\) is redefined as follows:

\begin{align*}
O'_{\text{exec}}(S') &= S' \quad \text{if } \text{execst} \notin S'(\text{plsr}), \\
O'_{\text{exec}}(S') \upharpoonright (M' \setminus M'_\text{plc}) &= O_{\text{exec}}(S' \upharpoonright (M' \setminus M'_\text{plc})) \quad \text{if } \text{execst} \in S'(\text{plsr}), \\
O'_{\text{exec}}(S')(\text{isf}) &= \text{isc}(S') \quad \text{if } \text{execst} \in S'(\text{plsr}), \\
O'_{\text{exec}}(S')(\text{cjf}) &= \text{cjc}(S') \quad \text{if } \text{execst} \in S'(\text{plsr}), \\
O'_{\text{exec}}(S') \upharpoonright (M'_\text{plc} \setminus \{\text{isf, cjf}\}) &= S' \upharpoonright (M'_\text{plc} \setminus \{\text{isf, cjf}\}) \quad \text{if } \text{execst} \in S'(\text{plsr}).
\end{align*}
where \( isc : S' \rightarrow \mathbb{B} \) is defined as in the case without conditional jump instructions and \( cjc : S' \rightarrow \mathbb{B} \) is the unique function from \( S' \) to \( \mathbb{B} \) such that for all \( S' \in S' \):

\[
cjc(S') = T \text{ iff } S'(\text{ditr}) = \text{pcfjmp} \land O_{\text{exec}}(S' \upharpoonright (M' \setminus M'_{\text{plc}}))(\text{irr}) = T \lor S'(\text{ditr}) = \text{ncfjmp} \land O_{\text{exec}}(S' \upharpoonright (M' \setminus M'_{\text{plc}}))(\text{irr}) = F .
\]

\( O_{\text{plctr}} \) is redefined as follows:

\begin{align*}
O_{\text{plctr}}(S') \upharpoonright (M' \setminus M'_{\text{plc}}) &= S' \upharpoonright (M' \setminus M'_{\text{plc}}) , \\
O_{\text{plctr}}(S')(jdf) &= S'(jdf) , \\
O_{\text{plctr}}(S')(isf) &= S'(isf) , \\
O_{\text{plctr}}(S')(jpf) &= S'(jpf) , \\
O_{\text{plctr}}(S')(cjf) &= S'(cjf) , \\
O_{\text{plctr}}(S')(plsr) &= p\text{lsu}(S') , \\
O_{\text{plctr}}(S')(rr) &= r\text{u}(S') ,
\end{align*}

where \( p\text{lsu} : S' \rightarrow \mathcal{P}(\mathcal{P}(S)) \) is the unique function from \( S' \) to \( \mathcal{P}(\mathcal{P}(S)) \) such that for all \( S' \in S' \):

\begin{align*}
\text{fetchst} &\in p\text{lsu}(S') \quad \text{iff} \quad (\text{fetchst} \in S'(\text{plsr}) \land S'(jdf) = F \land S'(cjf) = F \lor \\
& S'(isf) = T \lor S'(jpf) = T), \\
\text{prepst} &\in p\text{lsu}(S') \quad \text{iff} \quad (\text{prepst} \in S'(\text{plsr}) \land S'(isf) = F \land S'(cjf) = F \lor \\
& S'(jdf) = T), \\
\text{execst} &\in p\text{lsu}(S') \quad \text{iff} \quad (\text{execst} \in S'(\text{plsr}) \land S'(isf) = F \land S'(cjf) = F \lor \\
& S'(jpf) = T), \\
\text{postpst} &\in p\text{lsu}(S') \quad \text{iff} \quad (\text{postpst} \in S'(\text{plsr}) \land S'(isf) = F \land S'(cjf) = F \lor \\
& S'(jdf) = T).
\end{align*}

11. Backward Jump Instructions

In this short section, we discuss backward jump instructions and sketch the effect of their inclusion on non-pipelined and pipelined instruction processing.

In the preceding sections, we have considered only finite PGA programs, i.e. closed terms of PGA in which the repetition operator does not occur. This means that programs that are infinite sequences of primitive instructions are excluded. In other words, programs of which the execution goes on indefinitely are not covered. However, in a setting with backward jump instructions, there exists for each such program a behaviourally equivalent program that is a finite sequence of primitive instructions.

In a setting with backward jump instructions, there are, in addition to the primitive instructions of PGA introduced earlier, the following primitive instructions:

— for each \( k \in \mathbb{N} \), a backward jump instruction \( \backslash\#k \).

We write \( \mathcal{J}' \) for the set that consists of all primitive instructions of PGA and all backward jump instructions. A PGLB program is a closed term that can be built from:

— for each \( u \in \mathcal{J}' \), an instruction constant \( u \);
— the concatenation operator \( \_;\_ \).
In Bergstra and Loots (2002), the meaning of PGLB programs is described by means of a mapping from PGLB programs to PGA programs. For each PGA program, there exists a PGLB program that is mapped to a PGA program with the same behaviour. In other words, the expressiveness is not decreased by replacing the repetition operator by backward jump instructions.

The addition of backward jump instructions gives rise to trivial changes of the SP-NPL-enhancement and SP-PL-enhancement of Maurer machines: forward jump instructions and backward jump instructions can be treated in the same way.

Only the set $IT$ and the auxiliary functions $dec$ and $pcu$ used in the definition of the SP-NPL-enhancement of a Maurer machine from Section 7 and the auxiliary function $pcu'$ used in the definition of the SP-PL-enhancement of a Maurer machine from Section 8 have to be redefined. The set $IT$ must be redefined because the backward jump instructions give rise to an additional instruction type: $bjmp$. The function $dec$ must be redefined in order to deal with the decoding of backward jump instructions. The functions $pcu$ and $pcu'$ must be redefined in order to deal with the adjustment of the program counter in the case of backward jump instructions.

It is easy to see that with the correct redefinitions, Theorems 1 and 2 go through after the addition of backward jump instructions. Conditional backward jump instructions can be added in the same way as conditional forward jump instructions have been added in Section 10.

12. Instruction Set Architectures

In this section, we introduce the concept of a strict load/store Maurer instruction set architecture. This concept takes its name from the following: it is described in the setting of Maurer’s model for computers, it concerns only load/store architectures, and the load/store architectures concerned are strict in some respects that will be explained after its formalization.

The concept of a strict load/store Maurer instruction set architecture, or strict load/store Maurer ISA for short, is an approximation of the concept of a load/store instruction set architecture. It is focussed on instructions for data manipulation and data transfer. Instructions for transfer of program control are treated in a uniform way over different strict load/store Maurer ISAs. Instances of the concept of a strict load/store Maurer ISA are those Maurer machines for which SP-NPL-enhancement and SP-PL-enhancement are primarily intended. The SP-NPL-enhancement and SP-PL-enhancement of a strict load/store Maurer ISA can be viewed as implementations of that ISA.

Each Maurer machine has a number of basic actions with which an operation is associated. In this section, when speaking about Maurer machines that are strict load/store Maurer ISAs, such basic actions are loosely called basic instructions. The term basic action is uncommon where we are concerned with ISAs, and moreover basic instructions and basic actions are identified in the semantics of PGA.

The basic idea underlying the concept of a strict load/store Maurer ISA is that there is a main memory of which the elements contain data, an operating unit with a small internal memory by which data can be manipulated, and an interface between the main memory and the operating unit for data transfer between them. For the sake of simplicity, data is restricted to the natural numbers between 0 and some upper bound. Other types of data that could be supported can always be represented by the natural numbers provided. Moreover, the data manipulation instructions offered by a strict load/store Maurer ISA are not restricted and may include ones that are tailored to manipulation of representations of other types of data. Therefore, we believe that nothing essential is lost by the restriction to natural numbers.

The concept of a strict load/store Maurer ISA is parametrized by:
- an address width $k$;
- a word length $l$;
- a bit size $m$ of the operating unit;
- a number $u$ of pairs of address and data registers for load instructions;
- a number $v$ of pairs of address and data registers for store instructions;
- a set $A'$ of basic instructions for data manipulation.

It is assumed that a fixed but arbitrary set $M_{\text{data}}$ of cardinality $2^k$ and a fixed but arbitrary bijection $m_{\text{data}} : [0, 2^k - 1] \to M_{\text{data}}$ have been given. $M_{\text{data}}$ is called the data memory. The data memory is a memory of which the elements can be addressed by means of natural numbers in the interval $[0, 2^k - 1]$. The address width $k$ can be regarded as the number of bits used for the binary representation of addresses of data memory elements. We write $B_{\text{addr}}$ for $[0, 2^k - 1]$. The data memory elements are meant for containing data. They can contain natural numbers in the interval $[0, 2^l - 1]$. The word length $l$ can be regarded as the number of bits used to represent data in data memory elements. We write $B_{\text{data}}$ for $[0, 2^l - 1]$.

It is assumed that a fixed but arbitrary set $M_{\text{ou}}$ of cardinality $m$, called the operating unit memory, has been given. The operating unit memory is a memory of which the elements can contain natural numbers in the set $\{0, 1\}$, i.e. bits. We write $B_2$ for $\{0, 1\}$. The bit size $m$ can be regarded as the number of bits that the internal memory of the operating unit contains. Usually, a part of the operating unit memory is partitioned into groups to which data manipulation instructions can refer.

It is assumed that fixed but arbitrary sets $M_{\text{la}}$ and $M_{\text{ld}}$ of cardinality $u$ and fixed but arbitrary bijections $m_{\text{la}} : [0, u - 1] \to M_{\text{la}}$ and $m_{\text{ld}} : [0, u - 1] \to M_{\text{ld}}$ have been given. It is also assumed that fixed but arbitrary sets $M_{\text{sa}}$ and $M_{\text{sd}}$ of cardinality $v$ and fixed but arbitrary bijections $m_{\text{sa}} : [0, v - 1] \to M_{\text{sa}}$ and $m_{\text{sd}} : [0, v - 1] \to M_{\text{sd}}$ have been given. The members of $M_{\text{la}}$ and $M_{\text{ld}}$ are called load address registers and load data registers, respectively. The members of $M_{\text{sa}}$ and $M_{\text{sd}}$ are called store address registers and store data registers, respectively. The load and store registers are special memory elements meant for transferring data between the data memory and the operating unit memory. The members of $M_{\text{la}}$ and $M_{\text{sa}}$ can contain addresses, i.e. members of $B_{\text{addr}}$. The members of $M_{\text{ld}}$ and $M_{\text{sd}}$ can contain data, i.e. members of $B_{\text{data}}$. It is assumed that $M_{\text{data}}$, $M_{\text{ou}}$, $M_{\text{la}}$, $M_{\text{ld}}$, $M_{\text{sa}}$, $M_{\text{sd}}$ and $\{rr_a \mid a \in A\}$ are pairwise disjoint sets.

Let $n \in [0, 2^k - 1]$, $n' \in [0, u - 1]$ and $n'' \in [0, v - 1]$. Then we write $M_{\text{data}}[n]$ for $m_{\text{data}}(n)$, $M_{\text{la}}[n']$ for $m_{\text{la}}(n')$, $M_{\text{ld}}[n']$ for $m_{\text{ld}}(n')$, $M_{\text{sa}}[n'']$ for $m_{\text{sa}}(n'')$ and $M_{\text{sd}}[n'']$ for $m_{\text{sd}}(n'')$.
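To make the sizes and the disjointness requirement concrete, the following Python sketch encodes each group of memory elements as a list of tagged pairs, so that $M_{\text{data}}[n]$ is simply the pair `("data", n)`. The encoding, the function names `memory_elements` and `value_sets`, and the parameter values are hypothetical illustrations, not part of the model; the reply registers $rr_a$ are omitted.

```python
from itertools import chain


def memory_elements(k: int, m: int, u: int, v: int) -> dict:
    """Hypothetical encoding of the memory groups of a strict load/store ISA.

    Each memory element is a tagged pair, so the bijection
    m_data : [0, 2^k - 1] -> M_data is n -> ("data", n), and likewise
    for the load and store registers. Reply registers are left out.
    """
    return {
        "data": [("data", n) for n in range(2 ** k)],  # M_data, 2^k elements
        "ou": [("ou", i) for i in range(m)],           # M_ou, m bits
        "la": [("la", n) for n in range(u)],           # load address registers
        "ld": [("ld", n) for n in range(u)],           # load data registers
        "sa": [("sa", n) for n in range(v)],           # store address registers
        "sd": [("sd", n) for n in range(v)],           # store data registers
    }


def value_sets(k: int, l: int) -> tuple:
    """B_addr = [0, 2^k - 1] and B_data = [0, 2^l - 1] as Python ranges."""
    return range(2 ** k), range(2 ** l)


groups = memory_elements(k=4, m=16, u=2, v=3)
elements = list(chain.from_iterable(groups.values()))
b_addr, b_data = value_sets(k=4, l=8)
print(len(groups["data"]), len(b_addr), len(b_data))  # 16 16 256
assert len(elements) == len(set(elements))  # the groups are pairwise disjoint
```

Tagging every element with its group name makes pairwise disjointness hold by construction, which mirrors the assumption stated above.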

A strict load/store Maurer instruction set architecture with parameters $k$, $l$, $m$, $u$, $v$ and $A'$ is a Maurer machine $H = (M, B, \mathcal{S}, O, A, [\![\ ]\!])$ with

$$M = M_{\text{data}} \cup M_{\text{ou}} \cup M_{\text{la}} \cup M_{\text{ld}} \cup M_{\text{sa}} \cup M_{\text{sd}} \cup \{rr_a \mid a \in A\},$$

$$B = B_{\text{data}} \cup B_{\text{addr}} \cup B_2 \cup \{T, F\},$$

$$\mathcal{S} = \{S : M \rightarrow B \mid \forall m \in M_{\text{data}} \cup M_{\text{ld}} \cup M_{\text{sd}} \cdot S(m) \in B_{\text{data}} \land \forall m \in M_{\text{la}} \cup M_{\text{sa}} \cdot S(m) \in B_{\text{addr}} \land \forall m \in M_{\text{ou}} \cdot S(m) \in B_2 \land \forall a \in A \cdot S(rr_a) \in \{T, F\}\},$$

$$O = \{O_a \mid a \in A\},$$

$$A = \{\text{load}:n \mid n \in [0, u - 1]\} \cup \{\text{store}:n \mid n \in [0, v - 1]\} \cup A',$$

$$[\![a]\!] = (O_a, rr_a) \quad \text{for all } a \in A,$$

where, for all $n \in [0, u - 1]$, $O_{\text{load}:n}$ is the unique function from $\mathcal{S}$ to $\mathcal{S}$ such that for all $S \in \mathcal{S}$:

$$O_{\text{load}:n}(S) \upharpoonright (M \setminus \{M_{\text{ld}}[n], rr_{\text{load}:n}\}) = S \upharpoonright (M \setminus \{M_{\text{ld}}[n], rr_{\text{load}:n}\}),$$

$$O_{\text{load}:n}(S)(M_{\text{ld}}[n]) = S(M_{\text{data}}[S(M_{\text{la}}[n])]),$$

$$O_{\text{load}:n}(S)(rr_{\text{load}:n}) = T,$$

and, for all $n \in [0, v - 1]$, $O_{\text{store}:n}$ is the unique function from $\mathcal{S}$ to $\mathcal{S}$ such that for all $S \in \mathcal{S}$:

$$O_{\text{store}:n}(S) \upharpoonright (M \setminus \{M_{\text{data}}[S(M_{\text{sa}}[n])], rr_{\text{store}:n}\}) = S \upharpoonright (M \setminus \{M_{\text{data}}[S(M_{\text{sa}}[n])], rr_{\text{store}:n}\}),$$

$$O_{\text{store}:n}(S)(M_{\text{data}}[S(M_{\text{sa}}[n])]) = S(M_{\text{sd}}[n]),$$

$$O_{\text{store}:n}(S)(rr_{\text{store}:n}) = T,$$

and, for all $a \in A'$, $O_a$ is a function from $\mathcal{S}$ to $\mathcal{S}$ such that:

$$IR(O_a) \subseteq M_{\text{ou}} \cup M_{\text{ld}},$$

$$OR(O_a) \subseteq M_{\text{ou}} \cup M_{\text{la}} \cup M_{\text{sa}} \cup M_{\text{sd}} \cup \{rr_a\}.$$
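The load and store operations can be read as state transformers that may change only two memory elements and leave all others as they are. The Python sketch below is a hypothetical illustration, not the authors' formalization: it represents a state as a dictionary from tagged memory elements (e.g. `("la", 0)` for $M_{\text{la}}[0]$ and `("rr", "load:0")` for $rr_{\text{load}:0}$) to values, models $T$ and $F$ as `True` and `False`, and checks which elements each operation actually changes.

```python
from typing import Dict, Hashable

# Hypothetical encoding: a state S maps tagged memory elements to values.
State = Dict[Hashable, object]


def o_load(n: int, s: State) -> State:
    """O_load:n: M_ld[n] := M_data[S(M_la[n])] and rr_load:n := T."""
    t = dict(s)  # every other memory element keeps its value from S
    t[("ld", n)] = s[("data", s[("la", n)])]
    t[("rr", f"load:{n}")] = True
    return t


def o_store(n: int, s: State) -> State:
    """O_store:n: M_data[S(M_sa[n])] := S(M_sd[n]) and rr_store:n := T."""
    t = dict(s)  # every other memory element keeps its value from S
    t[("data", s[("sa", n)])] = s[("sd", n)]
    t[("rr", f"store:{n}")] = True
    return t


def changed(s: State, t: State) -> set:
    """Memory elements whose contents differ between S and O(S)."""
    return {e for e in s if s[e] != t[e]}


# Data memory with k = 2 (four elements) and one pair of load/store registers.
s = {("data", i): 10 * i for i in range(4)}
s.update({("la", 0): 3, ("ld", 0): 0, ("sa", 0): 1, ("sd", 0): 99,
          ("rr", "load:0"): False, ("rr", "store:0"): False})
s1 = o_load(0, s)    # M_ld[0] receives M_data[3] = 30
s2 = o_store(0, s1)  # M_data[1] receives M_sd[0] = 99
assert changed(s, s1) == {("ld", 0), ("rr", "load:0")}
assert changed(s1, s2) == {("data", 1), ("rr", "store:0")}
```

Copying the whole state and then overwriting two entries is the executable counterpart of the restriction equations, which state that the operation agrees with $S$ outside the two affected memory elements.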

On purpose, we consider only load/store architectures. We believe that load/store architectures give rise to a relatively simple interface between the data memory and the operating unit. For example, with an architecture other than a load/store architecture, it is more difficult to establish statically in which cases the operations associated with successive data manipulation and/or data transfer instructions can safely be performed in a different order or in parallel.

A strict load/store Maurer ISA is strict in the following respects:

— with data transfer between the data memory and the operating unit, a strict separation is made between memory elements used for loading data, loading addresses, storing data, and storing addresses;

— from these memory elements, only the memory elements used for loading data are allowed in the input regions of data manipulation operations;

— a data memory of which the size is less than the number of addresses determined by the address width is not allowed.

The first two ways in which a strict load/store Maurer ISA is strict concern the interface between the data memory and the operating unit. We believe that they yield the most conveniently arranged interface for theoretical work relevant to micro-architecture design. Less simple interfaces are found in many load/store architectures for which implementations exist. The third way in which a strict load/store Maurer ISA is strict removes the need to deal with addresses that do not address a memory element. Such addresses can be dealt with in many different ways, each of which complicates the architecture considerably. We consider their exclusion desirable in much theoretical work relevant to micro-architecture design.

An anonymous referee drew our attention to the fact that a strict separation between memory elements used for loading data, loading addresses, storing data, and storing addresses is also made in Cray and Thornton’s design of the CDC 6600 computer (Thornton, 1970), which is arguably the first implemented load/store architecture. However, in their design, the memory elements used for storing data are also allowed in the input regions of data manipulation operations.

13. Conclusions

We have modelled micro-architectures with non-pipelined instruction processing and pipelined instruction processing, using Maurer machines, basic thread algebra and program algebra. Because our descriptions of micro-architectures are more precise than those usually given, we have been able to verify that stored programs are executed as intended with these micro-architectures. A thorough understanding of the issues relevant to pipelined instruction processing can be acquired by modelling micro-architectures based on different pipeline organizations as well.

In this paper, pipelined instruction processing deals with control conflicts but not with data conflicts. Because memory access is not made explicit, data conflicts simply do not occur in the model presented in this paper. In models where memory access is made explicit, it may be placed in a separate pipeline stage, and as a result data conflicts may occur. Such models need additional assumptions about the instruction set architecture. These additional assumptions are incorporated in the concept of a strict load/store Maurer instruction set architecture introduced in this paper.

Several techniques for speeding up instruction processing involve multi-threading, a form of concurrency where some interleaving strategy determines how threads that exist concurrently are interleaved (see also Bergstra and Middelburg (2007c, 2005)). When modelling micro-architectures for those techniques, the enabledness of basic actions discussed in Section 4 is likely to be relevant. It certainly is relevant in the case of micro-threading (Bolychevsky et al., 1996; Jesshope and Luo, 2000).

There are many options for future work. We mention only the modelling of micro-architectures for different combinations of instruction set architecture and technique for speeding up instruction processing. By that, the work presented in this paper may grow into a theoretical basis for micro-architecture design.

The work presented in this paper, as well as the preceding work presented in Bergstra and Middelburg (2007a), has convinced us that a special notation for the description of micro-architectures is desirable. For example, it is annoying that every memory element that is not affected by an operation must be explicitly described as unaffected. However, we found that fixing an appropriate notation still requires some significant design decisions. We aim at a notation whose semantics can simply be given by a translation to logical formulas, much in the spirit of predicative methodology (Hehner et al., 1986). The following alternative description of the operation \( O_{\text{fetch}} \) from Section 7 shows what an appropriate notation could look like:

\[
O_{\text{fetch}} : \begin{cases} 
\text{if } pc + 1 \leq pcbr \text{ then } pc := pc + 1, \\
\text{if } pc \leq pcbr \text{ then } (ir := M_{\text{prog}}[pc]; rr := T) \text{ else } (ir := \#0; rr := F).
\end{cases}
\]
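The following Python sketch mimics this notation, assuming that every right-hand side refers to the state before the operation, so that `ir` is fetched from the old value of `pc`. It further assumes, hypothetically, that instructions are strings, that the program memory is a Python sequence indexed by `pc`, and that `#0` is represented by the string `"#0"`; the dataclass `FetchState` covers only the registers mentioned in the description. Here `dataclasses.replace` leaves all other fields unchanged implicitly, which is the kind of convenience asked of a notation above.

```python
from dataclasses import dataclass, replace
from typing import Sequence


@dataclass(frozen=True)
class FetchState:
    pc: int    # program counter
    pcbr: int  # upper bound for pc, as used in the guards of O_fetch
    ir: str    # instruction register
    rr: bool   # reply register: True models T, False models F


def o_fetch(s: FetchState, prog: Sequence[str]) -> FetchState:
    """Sketch of O_fetch; all right-hand sides read the pre-state s."""
    new_pc = s.pc + 1 if s.pc + 1 <= s.pcbr else s.pc
    if s.pc <= s.pcbr:
        return replace(s, pc=new_pc, ir=prog[s.pc], rr=True)
    return replace(s, pc=new_pc, ir="#0", rr=False)


prog = ["a", "b", "c"]
s1 = o_fetch(FetchState(pc=0, pcbr=2, ir="#0", rr=False), prog)
assert (s1.pc, s1.ir, s1.rr) == (1, "a", True)
s2 = o_fetch(FetchState(pc=5, pcbr=2, ir="x", rr=True), prog)
assert (s2.pc, s2.ir, s2.rr) == (5, "#0", False)
```

Because the dataclass is frozen, each call yields a fresh state, matching the view of an operation as a function from states to states.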

The work presented in Bergstra and Middelburg (2007a) and this paper has also convinced us that modularity is material to this work: it is about combining and extending models and about renaming and hiding names used in those models. All this has until now been done informally, but in the future there may arise a need to formalize it. We believe that module algebra (Bergstra et al., 1990) is a suitable formalism on which to base that formalization.

Parallel composability in connection with pipelined instruction processing is studied in a different setting in Hoe and Arvind (2004). Using algebraic techniques from Harman and Tucker (1996), three simple pipelined systems and a pipelined implementation of a micro-processor are modelled and verified in Fox and Harman (2003) and Fox (1998), respectively. The simple pipelined systems as well as the pipelined implementation of a micro-processor are modelled as iterated maps. By modelling a pipelined micro-processor as an iterated map, it is modelled at a level of abstraction that is higher than that at which micro-architecture design takes place. We focus our attention on modelling at the latter level of abstraction. A very extensive and up-to-date overview of interesting work on modelling and verifying pipelined micro-processors can also be found in Fox and Harman (2003).

Acknowledgements

We thank Bob Diertens from the University of Amsterdam, Programming Research Group, for carefully reading a draft of this paper, for contributing the pictures included in this paper, and for implementing a simulator for the micro-architectures modelled in this paper. We are grateful to an anonymous referee for his/her valuable comments both on technical points and matters of presentation.

References


