# Specification and Verification of Memory Consistency Models for Shared-Memory Multiprocessor Systems Shiro Takata,<sup>11,12</sup> Kenji Taguchi,<sup>13</sup> Kazuki Joe<sup>14</sup> and Akira Fukuda <sup>12</sup> In this paper we formally specify memory consistency models for shared-memory multiprocessor systems, focusing on causal memory consistency model, by use of a formal specification technique proposed by Taguchi and Araki. The formal specification technique includes a language, which is based on the combination of the Z notation and CCS (Calculus of Communicating Systems), and the state-based CCS semantics, which integrates Z and CCS semantics. Then, we verify that the specified causal memory meets the causal memory consistency condition using the extended state-based CCS semantics. #### 1. Introduction In this paper we propose a formal method to specify and verify memory consistency models for shared-memory multiprocessor systems using a formal specification technique proposed by Taguchi and Araki<sup>1)</sup>. The formal specification technique is based on a language that has combination characteristics of the Z notation and CCS (Calculus of Communicating Systems)<sup>2)</sup>. The Z notation is a model-based specification language based on set theory and first-order predicate logic. It has rich data structures and facilities to define various operations. Thus it is suited for modeling states and operations. But the Z notation does not have enough facilities to specify concurrency aspects. CCS is a process algebra that is suitable vehicle for modeling mathematical structure of concurrency aspects. However CCS has no explicit modeling facilities for states and operations. Therefore, the combination of the Z notation and CCS, which complements each other, would result in a versatile specification language<sup>1)</sup>. Taguchi and Araki also proposed the state-based CCS semantics that integrates Z and CCS semantics in order to verify given systems<sup>1)</sup>. Memory consistency models for shared-memory multiprocessor systems define the behavior of memory with respect to read and write operations. In this paper, we focus on causal memory consistency model proposed by Hutto<sup>3</sup>). Causal memory is an implementation of the memory mechanism which satisfies the causal memory consistency condition: any read operation to shared-memory obtains the value which is consistent with other causally related read and write operations. A formal definition, implementation and verification of the causal memory have already been presented by Ahamad and Hutto<sup>4</sup>). Regardless of their results, they are inefficient for us to formalize every memory consistency model since they just use algebra. In this paper, we give more formal and therefore sufficient specification of the process components (states and operations in Z) and the concurrency aspects of the causal memory by the combination use of Z and CCS. We also specify the causal memory by ory consistency model and verify that the specification of causal memory meets the causal memory consistency condition using the state-based CCS semantics and its extension to a sequence of actions in finite length. This paper is structured as follows. In section 2, weak vector clocks<sup>5),6)</sup> based on the causally-precedes relation defined by Lamport<sup>7)</sup> is described. In section 3, the state-based CCS semantics and its extension are explained. In section 4, definition of causal memory consistency model using the extended state-based CCS is given. In section 5, a description of causal memory is described by using the combination of Z and CCS. Verification of causal memory is presented in section 6. Finally, we conclude and indicate our future works in section 7. ## 2. Weak Vector Clocks Vector clocks<sup>8)</sup> are used in distributed systems to determine whether a pair of events $e_i$ , $e_j$ has causal relation denoted by $e_i \rightarrow e_j$ where $\rightarrow$ is the causally-precedes relation defined by Lamport<sup>7)</sup>. Using the vector clocks, a timestamp is recorded when any event is detected, and the causal relationship of pairs of events is determined by comparing the timestamps. The timestamp is an n-tuple of integers, where n is the number of processes. Given two events $e_i$ , $e_j$ and their associated vector timestamps $t(e_i)$ , $t(e_j)$ , the following relation holds: $$t(e_{i}) \prec t(e_{j}) \stackrel{\text{def}}{=} (\forall k : 1 \dots n \bullet t(e_{i})[k] \leq t(e_{j})[k])$$ $$\land (\exists l : 1 \dots n \bullet t(e_{i})[l] < t(e_{j})[l])$$ $$t(e_{i}) \preceq t(e_{j}) \stackrel{\text{def}}{=} (t(e_{i}) \prec t(e_{j})) \lor (t(e_{i}) = t(e_{j}))$$ $$t(e_{i}) \preceq t(e_{j}) \Leftrightarrow e_{i} \rightarrow e_{j}$$ With the traditional vector clocks, the local counter $t_i[i]$ of a process $P_i$ increases whenever the $P_i$ executes each event. In contrast, with weak vector clocks<sup>5</sup>, $t_i[i]$ increases only when $P_i$ executes an event that potentially leads to change the system property which is expressed by some state variables. In either case, $P_i$ sends a message that contains $P_i$ 's state change information with its vector timestamp $t_i$ to all other processes whenever its vector clock changes. When such a message is received, $P_j$ updates its vector timestamp $t_j$ as follows: $\forall k: 1 \dots n \bullet t_{j}[k] = \max(t_{j}[k], t_{j}[k])$ <sup>†1</sup> Nara Institute of Science and Technology <sup>†2</sup> Keihanna Interaction Plaza Inc. <sup>3</sup> Kyushu University <sup>†4</sup> Wakayama University Figure 1 illustrates a history of events in the Fig. 1 A history of events in a causal memory system causal memory system described in section 5 which adopts weak vector clocks. Note that $t_i[j]$ is the number of write operations by $P_j$ because ti is initialized to the 0 vector. #### The State-Based CCS Semantics and its Extension In <sup>2)</sup>, Milner provides the operational semantics of CCS in terms of the following labeled transition system: $\langle \mathcal{E}, Act, \{\stackrel{\sim}{\rightarrow} \mid \alpha \in Act \} \rangle$ which consists of the set $\mathcal{E}$ of agent expressions in CCS, the set Act of actions, and the transition relation $\overset{\alpha}{\to} \subseteq \mathcal{E} \times \mathcal{E}$ for each $\alpha \in Act$ . For example, a process E which evolves another process E' by an action $\alpha$ is denoted by the following transition In the same way, Taguchi and Araki regard operation schemas in Z as transitions from old states to new states so that they provide the operational semantics of Z in terms of the following labeled transition system<sup>1)</sup>: $\langle St, Op, \{\stackrel{\sim}{\rightarrow} \mid \alpha \in Op \} \rangle$ which consists of the set St of states in Z, the set Op of operation schemas, and the transition relation $\stackrel{\sim}{\rightarrow} \subseteq St \times St$ for each $\alpha \in Op$ . For example, a state s which evolves another state s' by an operation schema $\alpha$ is denoted by the following transition relation: In addition, Taguchi and Araki<sup>1)</sup> combine the Z notation with CCS to provide the operational semantics of this combined language, named the state-based CCS, in terms of the following labeled transition system. $(\alpha.E,s) \xrightarrow{\alpha} (E,s') \Leftrightarrow \alpha.E \xrightarrow{\alpha} E \wedge s \xrightarrow{\alpha} s'$ provided that $s,s' \models \llbracket \Theta \rrbracket$ , where $\Theta$ is the first-order representation of an operation schema $\alpha$ . In1), Taguchi and Araki also provide the following transition rules. Prefix operator (1) $$\frac{}{\langle \alpha.E, s \rangle \xrightarrow{\alpha} \langle E, s' \rangle} (\alpha \in Op, s \xrightarrow{\alpha} s')$$ Prefix operator (2) $$\frac{\langle \alpha, E, s \rangle \xrightarrow{\alpha} \langle E, s \rangle}{\langle \alpha, E, s \rangle} (\alpha \in Act)$$ Prefix operator (3) $$\frac{\langle \overline{\alpha}(x!).E, s \rangle \xrightarrow{\overline{\alpha}(c)} \langle E, s \rangle}{\langle \overline{\alpha}(x!).E, s \rangle \xrightarrow{\overline{\alpha}(c)} \langle E, s \rangle} (s' = s \{c/x?\})$$ Prefix operator (4) $$\frac{\langle \alpha(x?).E, s \rangle \xrightarrow{\alpha} \langle E, s' \rangle}{\langle \alpha(x?).E, s \rangle \xrightarrow{\alpha} \langle E, s' \rangle} (s' = s \{c/x?\})$$ Recursion $$\frac{\langle E, s \rangle \xrightarrow{\alpha} \langle F, s' \rangle}{\langle P, s \rangle \xrightarrow{\alpha} \langle F, s' \rangle} (P \stackrel{\text{def}}{=} E)$$ Sum(Non-deterministic Choice) $$\frac{\langle E_1, s \rangle \xrightarrow{\alpha} \langle F, s' \rangle}{\langle E_1 + E_2, s \rangle \xrightarrow{\alpha} \langle F, s' \rangle} (E_2, s) \xrightarrow{\alpha} \langle F, s' \rangle$$ Concurrent Composition (1) $$\frac{\langle E, s \rangle \xrightarrow{\alpha} \langle E', s' \rangle}{\langle E \mid F, s \rangle \xrightarrow{\alpha} \langle E \mid F', s' \rangle} (E \mid F, s) \xrightarrow{\alpha} \langle E \mid F', s' \rangle$$ Concurrent Composition (2) $$\frac{\langle E, s \rangle \xrightarrow{\alpha} \langle E' \mid F, s' \rangle}{\langle E \mid F, s \rangle \xrightarrow{\alpha} \langle E \mid F', s' \rangle} (E \mid F, s) \xrightarrow{\alpha} \langle E \mid F', s' \rangle$$ Restriction $$\frac{\langle E, s \rangle \xrightarrow{\alpha} \langle F, s' \rangle}{\langle E \mid K, s \rangle \xrightarrow{\alpha} \langle F, s' \rangle} (\alpha \notin L, \alpha \in Act)$$ Renaming $$\frac{\langle E, s \rangle \xrightarrow{\alpha} \langle F, s' \rangle}{\langle E \mid F, s \rangle \xrightarrow{\alpha} \langle F, s' \rangle} (\alpha \in Act, \alpha = f(\beta))$$ Stirling<sup>9</sup> defined a natural extension of a single transition relation $\alpha$ to a sequence of actions in $$\frac{\langle E, s \rangle \xrightarrow{\beta} \langle F, s' \rangle}{\langle E[f], s \rangle \xrightarrow{\alpha} \langle F[f], s' \rangle} \ (\alpha \in Act, \alpha = f(\beta))$$ transition relation $\stackrel{\alpha}{\rightarrow}$ to a sequence of actions in finite length, or $traces \alpha_1 \dots \alpha_n$ to provide the following transition rules: Let $\omega$ be such a sequence with $\varepsilon$ as an empty trace. The notation $E \xrightarrow{\omega} F$ represents E may per- form the trace $$\omega$$ and become $F$ . $$E \xrightarrow{\Sigma} E' \xrightarrow{E' \to F}$$ $$E \xrightarrow{\omega} F$$ We propose the following natural extension of transition relations and trace transition rules: Let $\omega$ be a sequence of actions including opera- tion schemas $\alpha_1 \dots \alpha_n$ . $(E, s_1) \xrightarrow{\omega} (F, s'_n) \Leftrightarrow E \xrightarrow{\omega} F \wedge s_1 \xrightarrow{\omega} s'_n$ provided that $\forall i : 1 \dots n \bullet s_i, s'_i \models \llbracket \Theta_i \rrbracket$ , where $s_i$ and $s_i'$ are regarded as old state and new state of a operation schema $\alpha_i$ as a transition, respectively, $\Theta_i$ is the first-order representation of $\alpha_i$ , and n is the number of operation schemas in $\omega$ . $$\frac{\overline{\langle E, s \rangle \xrightarrow{\leftarrow} \langle E, s \rangle}}{\langle E, s_1 \rangle \xrightarrow{\alpha} \langle E', s_1' \rangle} (E', s_1') \xrightarrow{\omega} \langle F, s_n' \rangle} (\alpha \in Op)$$ $$\frac{\langle E, s_1 \rangle \xrightarrow{\alpha} \langle E', s_1 \rangle}{\langle E, s_1 \rangle \xrightarrow{\alpha} \langle E', s_1 \rangle} (E', s_1') \xrightarrow{\omega} \langle F, s_n' \rangle} (\alpha \in Act)$$ ## 4. Definition of Causal Memory Consistency Model This section explains the causal memory consistency model proposed by Hutto and Ahamad in3),4), then defines the model in terms of the extended state-based CCS semantics. ### 4.1 Shared Memory Parallel Computer Model In4), Hutto and Ahamad define a shared mem- ory parallel computer model as follows: • It is a finite set $\mathcal{P}$ of processes $\{P_1, \ldots, P_n\}$ that interact by a series of read and write operations via a shared memory that consists of a finite set of locations. A write operation by a process $P_i$ , denoted by $w_i(x, v)$ here, stores the value v in location x. A read operation, denoted by $r_i(x, v)$ here, notifies Pi that v is stored in location x A local execution history $L_i$ of process $P_i$ is a sequence of read and write operations. An execution history $H = \langle L_1, L_2, \dots, L_n \rangle$ is a collection of local histories. Let A be a set of all operations in H and $A_{i+m}^H$ be a set of all operations by $P_i$ and all write operations in H. Two kinds of program orders, serialization and "respect" are defined as follows: $o_1 \xrightarrow{i} o_2$ , if operation $o_1$ precedes $o_2$ in $L_i$ . $o_1 \xrightarrow{i} o_2$ , if operation $o_1$ precedes $o_2$ in H. $S_i$ for $P_i$ is a serialization of A, if $S_i$ is a linear sequence containing exactly the operations in A such that each read operation from a location returns the value written by the most recent preceding write to the location. If a read operation has no preceding write, an initial value $\perp$ is assumed to be returned. Serialization $S_i$ of A respects order $\rightarrow$ , if, for Serialization S<sub>i</sub> of A respects order →, 11, 10r any operations o<sub>1</sub> and o<sub>2</sub> in A, o<sub>1</sub> → o<sub>2</sub> implies that o<sub>1</sub> precedes o<sub>2</sub> in S<sub>i</sub>. Let ω be a sequence A\* of operations in H with ε as an empty trace. Let per<sub>i</sub>(o) be an operation that P<sub>i</sub> "perceives" the operation o. For example, per<sub>i</sub>(r<sub>i</sub>(x, v)), per<sub>i</sub>(r<sub>j</sub>(x, v)), per<sub>i</sub>(w<sub>i</sub>(x, v)) and per<sub>i</sub>(w<sub>i</sub>(x, v)) are r<sub>i</sub>(x, v) by P<sub>i</sub>, r<sub>j</sub>(x, v) by P<sub>i</sub>, w<sub>i</sub>(x, v) by P<sub>i</sub> and Apply; such that P<sub>i</sub> applies $w_j(x,v)$ to its local memory, respectively. (See an operation schema $Apply_i$ that will be described in section 5.) Now we define the above definitions in terms of the CCS semantics as follows respectively: $\bullet \quad o_1, o_2 \in L_1, \exists \, \omega \in A^* \bullet$ $\begin{array}{l} \langle o_1, e_2 \rangle \cup \langle o_1, E_1, s_1 \rangle \xrightarrow{\omega} \langle o_2, E_2, s_2 \rangle \\ \langle o_1, o_2 \in \mathcal{H}, \exists \omega \in A^* \bullet \\ \langle o_1, E_1, s_1 \rangle \xrightarrow{\omega} \langle o_2, E_2, s_2 \rangle \\ \forall \sigma \in A \bullet \exists per_i(\sigma) \in S_i) \end{array}$ $\begin{array}{l} j,k:1\dots n,\forall \ per_i(r_k(x_i,v_1))\in S_i\bullet\\ ((\exists \ per_i(w_j(x_i,v_i))\in S_i,\\ per_i(w_j(x_i,v_m))\notin \omega\bullet\\ (per_i(w_j(x_i,v_m)),E_j,s_j) \end{array}$ $\stackrel{\text{per}_{\mathbf{i}}(\mathbf{w}_{\mathbf{j}}(\mathbf{x}_{\mathbf{i}},\mathbf{v}_{\mathbf{i}}))\omega}{\rightarrow} \langle per_{\mathbf{i}}(r_{\mathbf{k}}(x_{\mathbf{i}},v_{\mathbf{i}})).E_{\mathbf{i}},s_{\mathbf{i}}\rangle)$ $\begin{array}{l} (((\exists per_i(w_j(x_1, v_m)) \in S_i, \\ per_i(w_j(x_1, v_m)) \notin \omega \bullet \\ (start.E_0, s_0) \xrightarrow{\omega} (per_i(r_k(x_1, v_1)).E_i, s_i)) \\ \lor per_i(w_j(x_1, v_m)) \notin S_i \Rightarrow v_1 = \bot))) \\ \bullet \ \forall o_1, o_2 \in A, \exists \omega_1, \omega_2 \in A^* \bullet \end{array}$ $\langle o_1.E_1, s_1 \rangle \xrightarrow{\omega_1} \langle o_2.E_2, s_2 \rangle \Rightarrow$ $\langle per_i(o_1).E_1', s_1' \rangle \stackrel{\omega_2}{\longrightarrow} \langle per_i(o_2).E_2', s_2' \rangle$ ## 4.2 Definition of Causal Memory Consistency Model In4), Hutto and Ahamad define write-into order and causality order for the definition of the causal memory consistency model as follows: A write-into order $\mapsto$ on H is any relation with the following properties: if $o_1 \mapsto o_2$ , then x and v exist such that $o_1 =$ w(x, v) and $o_2 = r(x, v)$ ; for any operation $o_2$ , there is at most one $o_1$ such that $o_1 \mapsto o_2$ ; if $o_2 = r(x, v)$ for some x and there is no $o_1$ such that $o_1 \mapsto o_2$ , then $v = \perp$ ; that is, a read with no write must read the initial value. A causality order $o_1 \rightsquigarrow o_2$ on H if and only if one of the following cases holds: • $o_1 \xrightarrow{i} o_2$ for some $p_i$ ( $o_1$ precedes $o_2$ in $L_i$ ); $o_1 \mapsto o_2(o_2 \text{ reads the value written by } o_1)$ ; or there is some other operation o' such that $o_1 \sim o' \sim o_2$ (If the relation is cyclic, then it is not the causality order.) A history H is causal if it has the causality order such that: CM: for each process $P_i$ , there is a serialization $S_i$ of $A_{i+w}^H$ that respects $\sim$ . Now we define write-into order on H in terms of the CCS semantics. $$\begin{array}{l} i,j:1\dots n,\forall \ r_i(x_i,v_1)\in A \bullet \\ ((\exists \ w_j(x_i,v_1)\in A,\omega\in A^* \bullet \\ \langle w_j(x_i,v_1).E_j,s_j\rangle \overset{\leftarrow}{\to} \langle r_i(x_i,v_1).E_i,s_i\rangle) \\ \lor \\ (((\exists \ w_j(x_1,v_m)\in A,w_j(x_1,v_m)\notin \omega \bullet \\ \langle start.E_0,s_0\rangle \overset{\leftarrow}{\to} \langle r_i(x_i,v_1).E_i,s_i\rangle) \\ \lor \ w_j(x_1,v_m)\notin A \Rightarrow v_1=\bot))) \end{array}$$ By iteration of applying trace transition rule: $\exists o_1, o', o_2 \in A, \omega_1, \omega_2 \in A^* \bullet$ $\langle o_1.E_1, s_1 \rangle \xrightarrow{\omega_1} \langle o'.E', s' \rangle \quad \langle o'.E', s' \rangle \xrightarrow{\omega_2} \langle o_2.E_2, s_2' \rangle$ $\langle o_1.E, s_1 \rangle \xrightarrow{\omega_1 \omega_2} \langle o_2.E_2, s_2 \rangle$ Then, if there are a trace $\omega_1$ corresponding to $o_1 \rightarrow o'$ and a trace $\omega_2$ corresponding to $o' \rightarrow o_2$ , a trace $\omega_1 \omega_2$ corresponding to $o_1 \rightarrow o' \rightarrow o_2$ always Since the causally-precedes relation → which the extended state-based CCS semantics uses is a partial order, - on traces with vector clocks is acyclic. Now we can define the causal memory consistency condition in terms of the CCS semantics as follows: ``` A history H is causal ⇔ j:1\ldots n, \forall i:1\ldots n \bullet ((\forall o \in A_{i+w}^{H} \bullet \exists per_{i}(o) \in S_{i} of A_{i+w}^{H}) (\forall \ r_{\mathtt{i}}(x_{\mathtt{l}},v_{\mathtt{l}}) \in S_{\mathtt{i}} \ of \ A_{\mathtt{i}+\mathtt{w}}^{\mathtt{H}} \bullet ((\exists per_{i}(w_{j}(x_{1},v_{1})) \in S_{i} of A_{i+w}^{H}, \begin{array}{l} per_{\mathbf{i}}(w_{\mathbf{j}}(x_{1},v_{\mathbf{m}})) \notin \omega \bullet \\ \langle per_{\mathbf{i}}(w_{\mathbf{j}}(x_{1},v_{1})).E_{\mathbf{j}},s_{\mathbf{j}} \rangle \end{array} \vee (((\exists per_{i}(w_{j}(x_{1},v_{m})) \in S_{i} of A_{i+w}^{H}, per_{\mathbf{i}}(w_{\mathbf{j}}(x_{1}, v_{\mathbf{m}})) \notin \omega \bullet \\ \langle start. E_{0}, s_{0} \rangle \xrightarrow{\omega} \langle r_{\mathbf{i}}(x_{1}, v_{1}). E_{\mathbf{i}}, s_{\mathbf{i}} \rangle) \vee per_{i}(w_{j}(x_{1}, v_{m})) \notin S_{i} \text{ of } A_{i+w}^{H} \Rightarrow v_{1} = \bot)))) ((\forall o_1, o_2 \in L_j \text{ of } A_{i+w}^{\mathsf{H}} \bullet \exists \omega_1, \omega_2 \in A^* \bullet \langle o_1.E_1, s_1 \rangle \stackrel{\omega_1}{\rightarrow} \langle o_2.E_2, s_2 \rangle) \Rightarrow (per_i(o_1).E'_1,s'_1) \xrightarrow{\omega_2} (per_i(o_2).E'_2,s'_2)) (\forall r_{i}(x_{1}, v_{1}) \in S_{i} \text{ of } A_{i+w}^{H} \bullet \exists \omega_{1}, \omega_{2} \in A^{*} \bullet \langle w_{\mathtt{j}}(x_{\mathtt{l}}, v_{\mathtt{l}}). E_{\mathtt{j}}, s_{\mathtt{j}} \rangle \xrightarrow{\omega_{\mathtt{l}}} \langle r_{\mathtt{i}}(x_{\mathtt{l}}, v_{\mathtt{l}}). E_{\mathtt{i}}, s_{\mathtt{i}} \rangle ) \Rightarrow \langle per_{\mathbf{i}}(w_{\mathbf{j}}(x_{1},v_{1})).E'_{\mathbf{j}},s'_{\mathbf{j}}\rangle \stackrel{\omega_{2}}{\longrightarrow} \langle r_{\mathbf{i}}(x_{1},v_{1}).E'_{\mathbf{i}},s'_{\mathbf{i}}\rangle)\rangle) ``` ## 5. A Description of Causal Memory In this section, using the combination of Z and CCS, a formal specification of causal memory, which was proposed by Ahamad et al4), is described. First, the states and operation schemas of each process are specified in Z. Weak vector clocks are adopted here as logical time of distributed systems like Marzullo and Neiger did6). Second, the concurrency aspects of causal memory are specified in CCS. 5.1 Specifying the component of each process in Z Each process $P_i$ has a schema $s_i$ of the state in Z. Each schema $s_i$ consists of seven local data structures; a process identity number i, a local memory $M_i$ of the abstract shared causal memory $\mathcal{M}$ , a vector timestamp $t_i$ which is used for updating the local timestamp, two message queues dating the local timestamp, two message queues $OutQueue_i$ and $InQueue_i$ , a local execution history $L_i$ of A which is a set of read and write operations, and a serialization $S_i$ of $A_{i+w}^H$ which is a set of all operations by $P_i$ and all write operations in H. $OutQueue_i$ is a first-in-first-out queue and contains information about write operations to be I. contains information about write operations to local memory that have not been communicated to other processes yet. InQueue; is ordered by vector timestamps and contains information about remote write operations to its remote memory that have not been written to local memory yet. The schema $s_i$ of $P_i$ 's state is described using Z as follows: [M, A, Val] $write\_tuple == N_1 \times \mathcal{M} \times Val \times seq N$ Number Of Processes: N1 MaxOutQueue, MaxInQueue: N1 MaxSerial, MaxLocalHis: N1 $priority\_queue: (seq write\_tuple) \times write\_tuple$ + seq write\_tuple $i: \mathbb{N}_1$ $M_i: \mathcal{M} \rightarrow (Val \cup \{\bot\})$ $t_{\mathtt{i}}: \mathsf{seq}\, \mathbb{N}$ OutQueue: seq write\_tuple InQueue: seq write\_tuple $L_{\mathtt{i}}: \operatorname{seq} A$ $S_i$ : seq A $#t_i = NumberOfProcesses$ # $i_1 = Number Of Processes #OutQueue #OutQueue_1 \leq MaxOutQueue #InQueue #<math>L_1 \le MaxLocalHis #S_1 \le MaxSerial$ Pi has an initialization operation schema InitPi and five basic operation schemas; Readi, Writei, Sendi, Receivei, and Applyi. InitP<sub>i</sub> $pn_i$ ? : $N_1$ $i' = pn_i$ ? $M_i' = \lambda x : \mathcal{M} \bullet \perp$ $t_i' = \lambda n : 1 ... Number Of Processes \bullet 0$ $OutQueue'_i = <>$ $InQueue'_i = <>$ =<> $S_i' = <>$ A read operation schema Read; is executed when- ever a read operation to a location x? is invoked by $P_i$ . Then, the value v! stored in $M_i(x?)$ is sent to $P_i$ . A label $r_i(x?, v!)$ of the read operation is added to a local execution history $L_i$ and a serialization $S_i$ . ``` Readi \Delta s_i x?:\mathcal{M} v! : Value v! = M_i x? i'=i M_i' = M_i t_i' = t_i OutQueue'_i = OutQueue_i InQueue; = InQueue; L_i' = L_i \cap \langle r_i(x?, v!) \rangle S_i' = S_i \cap \langle r_i(x?, v!) \rangle ``` A write operation schema Write; is executed whenever a write operation of a value v? to a locawhenever a write operation of a value v: we a location x? is invoked by $P_i$ . $P_i$ increases t[i], writes v? to $M_i(x)$ , and appends the tuple (i, x), v, $t_i$ to $OutQueue_i$ . This tuple is called a write\_tuple which is a message to other processes. A label $w_i(x?, v?)$ of the write operation is appended to $L_i$ and $S_i$ . Write: $\Delta s_i \ x?: \mathcal{M}$ v? : Value i' = i $M_i' = M_i \oplus \{x? \mapsto v?\}$ $t_i'i=t_ii+1$ $k:1..\#t_i\mid k\neq i\bullet t_i'k=t_ik$ $OutQueue'_i = OutQueue_i \land < (i, x?, v?, t'_i) >$ $InQueue'_{i} = InQueue_{i}$ $L_{\mathtt{i}}' = L_{\mathtt{i}} \cap \langle w_{\mathtt{i}}(x?, v?) \rangle$ $S_i' = S_i \cap \langle w_i(x?, v?) \rangle$ The information about local write operations in OutQueue; must be notified to all other processes. A send operation schema Send; sends a nonempty prefix of OutQueue; to all other processes and re- prefix of $OutQueue_i$ to all other processes and removes it from $OutQueue_i$ . When a message is received by $P_i$ , a receive operation schema $Receive_i$ is executed. $Receive_i$ appends the message to $InQueue_i$ which is a priority queue sorted by vector timestamps. The $InQueue_i$ and an element with vector timestamps are inputted to a function $priority\_queue$ . The $priority\_queue$ attaches the element to $InQueue_i$ , and returns the new $InQueue_i$ . Although it is not and returns the new InQueue. Although it is not hard to specify the function priority\_queue in Z, the specification is not presented here because of lack of space. ``` Send_i \Delta s_{i} massage! : write_tuple \overline{Outqueue_i} \neq <> i' = i M_i' = M_i t_i' = t_i massage! = head OutQueue; OutQueue' = tail OutQueuei InQueue'_i = InQueue_i L'_{i} = L_{i} S'_{i} = S_{i} ``` ``` Receive: \Delta s_i massage?: write\_tuple i' = i M'_i = M_i t'_i = t_i OutQueue'_i = OutQueue_i InQueue'_i = priority\_queue(InQueue_i, massage?) L'_i = L_i S'_i = S_i ``` Apply; compares the local timestamp $t_i$ with a remote timestamp $t_j$ associated with the write operation which was executed by the remote process $P_j$ . A write operation can be applied to local memory only if all components of $t_i$ (other than the jth) are fewer than or equal to those of $t_i$ and if the jth component of $t_i$ is more than the jth component of $t_i$ exactly by one. When a write operation is applied, it is removed from $InQueue_i$ , the corresponding component of the local vector timestamp $t_i[j]$ is updated, and the new value $v_j$ is written to $M_i(x_j)$ . This means that such a write operation of $w_j(x_j, v_j)$ will be the most recent preceding write operation of the write-into order relation to the following read operation of $v_i(x_j, v_j)$ , $w_j(x_j, v_j) \mapsto r_i(x_j, v_j)$ where the vector timestamp of $w_j(x_j, v_j)$ is less than or equal to that of $r_i(x_j, v_j)$ . A label $w_j(x_j, v_j)$ of the write operation is appended to $S_i$ . ``` \begin{array}{l} Apply_{i} \\ \Delta s_{i} \\ (j,x_{j},v_{j},t_{j}): write\_tuple \\ \hline\\ InQueue_{i} \neq <> \\ i'=i \\ (j,x_{j},v_{j},t_{j}) = head InQueue_{i} \\ k:1..\#t_{i} \mid k \neq j \bullet t_{j} k \leq t_{i} k \\ \land t_{j}j = t_{i}j + 1 \\ M'_{i} = M_{i} \oplus \{x_{j} \mapsto v_{j}\} \\ t'_{i}j = t_{j}j \\ k:1..\#t_{i} \mid k \neq j \bullet t'_{i}k = t_{i}k \\ OutQueue'_{i} = OutQueue_{i} \\ InQueue'_{i} = tail InQueue_{i} \\ L'_{i} = L_{i} \\ S'_{i} = S_{i} \cap < w_{j}(x_{j},v_{j}) > \\ \end{array} ``` 5.2 Specifying the concurrency aspect of causal memory in CCS In this section, we specify the concurrency as- pects of causal memory in ČCS. We assume that causal memory has k processes. Each of the processes is connected with all other processes through input ports $pipe_1 \cdots pipe_k$ , and output ports $\overrightarrow{pipe_1} \cdots \overrightarrow{pipe_k}$ . Each process $P_i$ consists of six operation schemas specified above in Z and four basic input or output ports; $id_i$ , $loc_i$ , $val_i$ , and $pipe_i$ . An input port $id_i$ is initially executed by each process $P_i$ to obtain its process identity number. The other ports $loc_i$ , $val_i$ , and $pipe_i$ are used for input or output ports in the following abbreviated actions: $r_i(x?, v!)$ , $w_i(x?, v?)$ , and $broadcast_i(m!)$ . An action $r_i(x?, v!)$ is an abbreviated action of An action $r_i(x^2, v!)$ is an abbreviated action of blocked sequential actions: First, an input port $loc_i$ is executed whenever a location x? for a read action is received through the input port $loc_i$ . Second, a read operation schema $Read_i$ is executed by $P_i$ . Finally, the value v! stored in $M_i(x?)$ is sent to $P_i$ through an output port $\overline{val_i}$ . An action $w_i(x^2, v^2)$ is also an abbreviated action of blocked sequential actions: First, an input port $heap_i$ is executed whenever a value $v^2$ to a location $x^2$ for a write action is received through the input port $heap_i$ . Second, a write operation schema $Write_i$ is executed by $P_i$ . We assume that these sequential actions in $r_i(x?, v!)$ and $w_i(x?, v?)$ are blocked, which means that other processes which run concurrently are blocked during those executions. It is not hard to specify that kind of blocked operations in CCS, but makes the specification hard to be understood. Hence we simply assume that these actions are blocked here. $$r_i(x?, v!) \equiv loc_i(x?).Read_i.\overline{val_i}(v!)$$ $w_i(x?, v?) \equiv heap_i(x?, v?).Write_i$ A broadcast action $broadcast_i(m!)$ is an abbreviation action that sends a message m! to all other processes through all output ports $\overline{pipe}$ except $\overline{pipe}_i$ as follows: $broadcast_i(m!) \equiv Send_i$ . $$\frac{\overline{pipe_1}(m!).\cdots.\overline{pipe_{i-1}}(m!).}{\overline{pipe_{i+1}}(m!).\cdots.\overline{pipe_k}(m!).}$$ $\overline{pipe_{i+1}}(m!)....pipe_k(m!)$ We then specify the concurrency aspects of the processes $P_i$ in CCS. $$P_{i} \stackrel{\text{def}}{=} r_{i}(x?, v!).P_{i} + \\ w_{i}(x?, v?).P_{i} + \\ broadcast_{i}(message!).P_{i} + \\ pipe_{i}(message?).Receive_{i}.P_{i} + \\ Apply_{i}.P_{i}$$ Each process is connected with all other processes through input ports $pipe_1 \cdots pipe_k$ and output ports $pipe_1 \cdots pipe_k$ to communicate the information about local writes to local memory. Then we specify a causal memory $\mathcal{CM}$ in CCS as follows: $$K = \{pipe_1, \dots, pipe_k\}$$ $CM \equiv id(1).InitP_1.\dots.id(k).InitP_k.$ $(P_1 \mid \dots \mid P_k) \setminus K$ #### 6. Verification of Causal Memory In this section we should describe the verification that the specified causal memory described in section 5 meets the causal memory consistency condition described in section 4 using the extended state-based CCS semantics. But we present only the verification of the following theorem 3 here because of lack of space. In<sup>4</sup>), Ahamad and Hutto prove the following Lemma 1,2 and Theorem 3. **Lemma 1:** Let H be a history of the implementation, ts(o) be the timestamp of an operation o, and $o_1$ and $o_2$ be two operations such that $o_1 \rightarrow o_2$ . Then $ts(o_1) \leq ts(o_2)$ . Furthermore, if $o_2$ is a write operation by $P_i$ , $ts(o_1)[i] < ts(o_2)[i]$ , so $ts(o_1) \prec ts(o_2)$ . **Lemma 2:** Let H be a history of the implementation and suppose that $w_j(x, v)$ is a write operation of process $P_j$ . Then each process $P_i$ eventually applies $w_j(x, v)$ to its local memory. Theorem 3: Let H be a history of the implemen- Proof: An inspection of operations Readi, Writei and $Apply_i$ shows that the serialization $S_i$ for Fincludes all writes in H (by Lemma 2) and all read operation in $L_i$ . Thus, $\forall i: 1 \dots n \bullet \forall o \in A_{i+w}^H \bullet \exists per_i(o) \in S_i \text{ of } A_{i+w}^H$ An inspection of operation schemas $Read_i$ , $Write_i$ and $Apply_i$ shows that a local memory $M_i$ is updated such that $M_i' = M_i \oplus \{x? \mapsto v?\}$ by $P_i$ and the value v! such that $v! = M_i x?$ is reported to $P_i$ . Then each read operation schema notifies the value that most recently write operation schema has written. Thus, the following transitions exist: $j:1\ldots n, \forall i:1\ldots n \bullet$ $(\forall r_i(x_l, v_l) \in S_i \text{ of } A^H_{i+w} \bullet \\ ((\exists per_i(w_j(x_l, v_k)) \in S_i \text{ of } A^H_{i+w}, \\ per_i(w_j(x_l, v_m)) \notin \omega \bullet \\ \langle per_i(w_j(x_l, v_k)).E_j, s_j \rangle$ $\xrightarrow{per_i(w_j(x_l,v_k))\omega} \langle r_i(x_l,v_l).E_i,s_i \rangle)$ $(((\exists per_i(w_j(x_l, v_m)) \in S_i \text{ of } A_{i+w}^H, per_i(w_j(x_l, v_m)) \notin \omega \bullet \\ \langle start. E_0, s_0 \rangle \xrightarrow{\omega} \langle per_i(r_i(x_l, v_l)). E_i, s_i \rangle) \\ \vee per_i(w_j(x_l, v_m)) \notin S_i \text{ of } A_{i+w}^H \Rightarrow v_l = \bot))))$ The state $s_j'$ has a maple $x_l \mapsto v_k$ in the local memory $M_i$ . The maple $x_l \mapsto v_k$ is not updated by $P_i$ between the state $s_j'$ and the state $s_i$ , because $v_i(x_l, v_l) = f_i(x_l, f_i(x_$ $w_j(x_l, v_m) \notin \omega$ . Thus $v_l = v_k (= M_i x_l)$ . Let $o_1$ and $o_2$ be operations in $A^H_{i+w}$ such that $\sim$ . By Lemma 1, $ts(o_1) \leq ts(o_2)$ . One of the following case must holds. We assume that $j \neq i$ . $o_1 \xrightarrow{i} o_2$ by $p_i$ . An inspection of operation schemas Read; and Write; shows that these operations are concatenated to $L_i$ and $S_i$ in the same oder in which these operation are executed by $P_i$ . Then $o_1$ precedes $o_2$ in $S_i$ . • $o_1 \xrightarrow{\gamma} o_2$ by $p_j$ . This means that $o_1$ and $o_2$ are both writes. By Lemma 1, $ts(o_1) \prec ts(o_2)$ . An inspection of operation schema $Receive_i$ and $Apply_i$ shows that $o_1$ is applied by $P_i$ before $o_2$ . Then $o_1$ precedes $o_2$ in $S_i$ . $o_1 \mapsto o_2$ by $p_i$ . This means that $o_1$ is a write operation and $o_2$ is a read operation. By Lemma 1, $ts(o_1) \prec ts(o_2)$ . An inspection of operation schema $Receive_i$ , $Apply_i$ and $Read_i$ shows that $o_1$ is applied by $P_i$ before $o_2$ . Then $o_1$ precedes $o_2$ in $\hat{S}_i$ . $\begin{array}{lll} o_1 & o_1' & o_2 \\ o_1 & o_1' & o_2 \\ \exists o_1, o', o_2 \in A, \omega_1, \omega_2 \in A^* \bullet \\ \langle o_1.E_1, s_1 \rangle \xrightarrow{\omega_1} \langle o'.E', s' \rangle & \langle o'.E', s' \rangle \xrightarrow{\omega_2} \langle o_2.E_2, s'_2 \rangle \end{array}$ Thus, the following condition holds. $j:1\ldots n, \forall i:1\ldots n \bullet$ $((\forall o_1, o_2 \in L_j \text{ of } A_{i+w}^H \bullet \exists \omega_1, \omega_2 \in A^* \bullet$ $\begin{array}{c} \langle o_1.E_1, s_1 \rangle \stackrel{\omega_1}{\rightarrow} \langle o_2.E_2, s_2 \rangle) \Rightarrow \\ \langle per_i(o_1).E_1', s_1' \rangle \stackrel{\omega_2}{\rightarrow} \langle per_i(o_2).E_2', s_2' \rangle) \end{array}$ $(\forall r_i(x_l, v_l) \in S_i \text{ of } A_{i+w}^H \bullet \exists \omega_1, \omega_2 \in A^* \bullet$ $\langle w_j(x_l, v_l).E_j, s_j \rangle \xrightarrow{\omega_1} \langle r_i(x_l, v_l).E_i, s_i \rangle) \Rightarrow$ $\langle per_i(w_j(x_l,v_l)).E_j',s_j' \rangle \stackrel{\omega_3}{\longrightarrow} \langle r_i(x_l,v_l).E_i',s_i' \rangle))$ Thus, the proof is complete. #### 7. Conclusion In this paper we described causal memory using the formal specification technique proposed by Taguchi and Araki<sup>1)</sup>. We presented an extension of the state-based CCS semantics to a sequence of actions with finite length in order to deal with a sequence of actions and operations. Using this extended state-based CCS semantics, we described causal memory consistency model and causal memory, then verified that causal memory meets causal memory consistency model. We have much future work remain to be done. First, we should adopt this technique to other memory consistency models, especially release consistency models, and have to develop a new formal technique for specifying lock and release operations. Second, we will be able to propose a new memory consistency model and present a design of new parallel computer architectures with the new memory consistency model. Those models and architecture designs can be formally described and verified by our proposing formal techniques. ## Acnowledgements We would like to thank Keijiro Araki at Kyushu University. #### References 1) Kenji Taguchi and Keijiro Araki. The State-based CCS Semantics for Concurrent Z Specification. In Proc. of the 1st Int'l Conf. of Formal Engineering Methods, pages 283-292. IEEE, November Communication and Concurrency. R.Milner. Prentice Hall, 1989. 3) Phillip W. Hutto and Mustaque Ahamad. Slow memory: Weakening consistency to enhance concurrency in distributed shared memories. In Proc. of the 10th Int'l Conf. on Distributed Computing Systems, pages 302-311, May 1990. 4) Mustaque Ahamad, Gil Neiger, Prince Kohli, James E. Burns, and Phillip W. Hutto. Causal memory: Definitions, implementation and programming. Technical Report 93/55, College of Computing, Georgia Institute of Tecnology, September 1993. 5) Keith Marzullo and Laura Sabel. Using consistent subcuts for detecting stable properties. In Proc. of the 5th Workshop on Distributed Algo-rithms and Graphs, October 1991. 6) Keith Marzullo and Gil Neiger. Detection of global state predicates. Technical Report 91/39, College of Computing, Georgia Institute of Tecnology, 1991. 7) Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7):558-565, July 1978. 8) Friedmann Mattern. Virtual time and global states of distributed systems. In Michel Cosnard, Patrice Quinton, Yves Robert, and Michel Raynal, editors, Proc. of the 10th Int'l Workshop on Parallel and Distributed Algorithms, pages 215-226. North-Holland, October 1988. Colin Stirling. Modal and temporal logic for pro-cesses. In Logic for Concurrency, number 1043 in LNCS, pages 149-237. Springer-Verlag, 1996.