32 Mutable State

$\newcommand{\TirName}[1]{\text{#1}} \newcommand{\inferrule}[3][]{ \let\and\qquad \begin{array}{@{}l@{}} \TirName{#1} \\ \displaystyle \frac{#2}{#3} \end{array} } \newcommand{\infer}[3][]{\inferrule[#1]{#2}{#3}} $

Recall that the most characteristic feature of imperative computation is mutation—that is, executing assignment for its side effect that updates a memory (cf. Section 3.1).

32.1 JavaScripty: Mutable Variables

32.1.1 Syntax

To introduce mutable state, we introduce mutable variables declared as follows:

\[ \mathbf{var}\;x\;\texttt{=}\;e_1\texttt{;}\;e_2 \]

A $\mathbf{var}$ declaration allocates a new mutable variable $x$, assigns it an initial value from evaluating $e_1$, and whose scope is $e_2$.

We also introduce an assignment expression:

\[ e_1 \mathrel{\texttt{=}}e_2 \]

that writes the value of $e_2$ to a memory location named by expression $e_1$. Note that we use the C and JavaScript-style assignment operator $\texttt{=}$, which unfortunately looks like mathematical equality but is very different.

Let us consider JavaScripty with number literals and mutable variables:

\[ \begin{array}{rrrl} \text{expressions} & e& \mathrel{::=}& n \mid x \mid\mathbf{var}\;x\;\texttt{=}\;e_1\texttt{;}\;e_2 \mid e_1 \mathrel{\texttt{=}}e_2 \mid\mathord{\texttt{*}}e_1 \mid a \\ \text{variables} & x \\[1ex] \text{values} & v& \mathrel{::=}& n \\ \text{location values} & l& \mathrel{::=}& \mathord{\texttt{*}}a \\ \text{addresses} & a \end{array} \]

Figure 32.1: Abstract syntax of JavaScripty with number literals and mutable variables.

To focus on mutation, let us drop $\mathbf{const}\;x\;\texttt{=}\;e_1\texttt{;}\;e_2$ constant-variable declarations for the moment and have only $\mathbf{var}\;x\;\texttt{=}\;e_1\texttt{;}\;e_2$ mutable-variable declarations. The $\mathbf{var}\;x\;\texttt{=}\;e_1\texttt{;}\;e_2$ mutable-variable declaration allocates a new memory cell with fresh address $a$, evaluates $e_1$ to a value $v_1$, stores value $v_1$ in the new memory cell, and evaluates $e_2$ with $x$ in scope pointing to the new memory cell.

The assignment expression $e_1 \mathrel{\texttt{=}}e_2$ evaluates $e_1$ to a location value $l_1$, $e_2$ to a value $v_2$, updates the memory cell named by $l_1$ with value $v_2$, and returns $v_2$.

We introduce the unary, pointer-dereference expression $\mathord{\texttt{*}}e_1$ in the abstract syntax to use as an intermediate expression during reduction. It is not present in the concrete syntax of JavaScript, though it corresponds to the pointer-dereference expression in the C and C++ languages.

An address $a$ is a memory address for a memory cell that stores some content like values. A location value $l$ is a reduced expression that names a memory location.

Note that we do not consider an address $a$ to be a value here. In low-level languages like C and C++, an address is called a pointer and is a first-class value (i.e., an address is a $a$ is a value). A location value is also called an l-value for the value of an expression on the left-hand–side of an assignment, while a value is called a r-value for the value of an expression on the right-hand–side of an assignment.

trait Expr                                                     // e
case class N(n: Double) extends Expr                           // e ::= n
case class Var(x: String) extends Expr                         // e ::= x
case class VarDecl(x: String, e1: Expr, e2: Expr) extends Expr // e ::= var x = e1; e2
case class Assign(e1: Expr, e2: Expr) extends Expr             // e ::= e1 = e2
case class Deref(e1: Expr) extends Expr                        // e ::= *e1
case class A(a: Int) extends Expr                              // e ::= a

def isValue(e: Expr): Boolean = e match {
  case N(_) => true
  case _ => false
}

defined trait Expr
defined class N
defined class Var
defined class VarDecl
defined class Assign
defined class Deref
defined class A
defined function isValue

We represent an address $a$ in Scala as an A(a) for a positive integer a. We can think of the address A(1) corresponding to the hexadecimal 0x00000004 memory address on a 32-bit machine.

Let consider the example expression with assignment:

JavaScript

var i = 1;
i = 2

AST Representation in Scala

val e_assign =
  VarDecl("i", N(1),
    Assign(Var("i"), N(2))
  )

e_assign: VarDecl = VarDecl(
  x = "i",
  e1 = N(n = 1.0),
  e2 = Assign(e1 = Var(x = "i"), e2 = N(n = 2.0))
)

32.1.2 Small-Step Operational Semantics

32.1.2.1 Memories

A mutable variable is a variable that can be updated. We can think of a mutable variable as a box that can be filled with a value, and then the value can be updated by filling the box with a new value. A memory cell $[a \mapsto v]$ is such a box that has an address $a$ to reference that box. A memory $m$ is then a finite set of such memory cells:

\[ \begin{array}{rrrl} \text{memories} & m& \mathrel{::=}& \cdot\mid m[a \mapsto v] \end{array} \]

Figure 32.2: Memories for JavaScripty with mutable variables.

We also view a memory $m$ as a finite map from addresses $a$ to values $v$ and write $m(a)$ for looking up the value $v$ corresponding to address $a$ in memory $m$.

32.1.2.2 Location Values

We have noted above that a location value $l$ is a reduced expression that names a memory location. Like $e\;\mathsf{value}$ defining values, we can view this as unary judgment form $e\;\mathsf{location}$ on expressions:

$\fbox{$e\;\mathsf{location}$}$

$\inferrule[VarLocation]{ }{ \mathord{\texttt{*}}a \;\mathsf{location} }$

Figure 32.3: Location values for JavaScripty with mutable variables.

In particular, the location value for a variable is given by the expression $\mathord{\texttt{*}}a$ for some address $a$.

32.1.2.3 Judgment Form for Imperative Computation

To define a small-step operational semantics with mutable variables, we need to update our reduction-step judgment form to include memories:

\[ \langle e, m \rangle \longrightarrow \langle e', m' \rangle \]

that says, “Closed expression $e$ with memory $m$ reduces to closed expression $e'$ with updated memory $m'$.”

We can see the state of the machine as a pair of the expression $e$ and the memory $m$: \[ \begin{array}{rrrl} \text{states} & \sigma& \mathrel{::=}& \langle e, m \rangle \end{array} \]

and thus the small-step judgment form is $\sigma \longrightarrow\sigma'$. This machine state $\sigma$ with a program $e$ that we execute for its effects on an off-to-the-side memory $m$ is the characteristic feature of imperative computation.

In pure functional computation, the machine state $\sigma$ is just the program $e$ that we evaluate to a value (i.e., iterate $e \longrightarrow e'$).

32.1.2.4 Inference Rules for Mutation

The $\TirName{DoDeref}$ rule says that dereferencing an address $\mathord{\texttt{*}}a$ reduces to the value $v$ stored in the memory cell named by $a$:

\[ \inferrule[DoDeref]{ [a \mapsto v] \in m }{ \langle \mathord{\texttt{*}}a , m \rangle \longrightarrow \langle v, m \rangle } \]

Note that it would be equivalent to write $\TirName{DoDeref}$ as follows: \[ \inferrule[DoDeref]{ }{ \langle \mathord{\texttt{*}}a , m \rangle \longrightarrow \langle m(a) , m \rangle } \]

The $\TirName{DoAssignVar}$ rule says that assigning value $v$ to memory location $\mathord{\texttt{*}}a$ in memory $m$ updates memory $m$ with the cell $[a \mapsto v]$:

\[ \inferrule[DoAssignVar]{ a\in \operatorname{dom}(m) }{ \langle \mathord{\texttt{*}}a \mathrel{\texttt{=}} v , m \rangle \longrightarrow \langle v, m[a \mapsto v] \rangle } \]

The side-condition that the address $a$ is in the domain of memory $m$ (i.e., $a\in \operatorname{dom}(m)$) says that there is already an allocated memory cell $[a \mapsto v_0]$ in $m$ so that we are writing $m[a \mapsto v]$ to mean updating that cell from $v_0$ to $v$ in memory $m$.

We shall see that memory addresses $a$ are introduced by $\mathbf{var}$ allocation, so the alternative system that does not check this condition in the $\TirName{DoAssignVar}$ rule \[ \inferrule[DoAssignVar]{ }{ \langle \mathord{\texttt{*}}a \mathrel{\texttt{=}} v , m \rangle \longrightarrow \langle v, m[a \mapsto v] \rangle } \]

is essentially the same (technically, called being bisimilar).

The $\TirName{DoVarDecl}$ rules describes allocating a new memory cell for a mutable variable: \[ \inferrule[DoVarDecl]{ a\notin \operatorname{dom}(m) }{ \langle \mathbf{var}\;x\;\texttt{=}\;v_1\texttt{;}\;e_2 , m \rangle \longrightarrow \langle {}[ \mathord{\texttt{*}}a /x]e_2 , m[a \mapsto v_1] \rangle } \]

We choose a fresh address $a$, which we state with the side condition that $a\notin \operatorname{dom}(m)$ and thus the $[a \mapsto v_1]$ is a new cell in the updated memory $m[a \mapsto v_1]$. The reduced expression ${}[ \mathord{\texttt{*}}a /x]e_2$ is interesting. The scope of variable $x$ is the continuation expression $e_2$, so we must eliminate free-variable occurrences of $x$ in $e_2$. We do this by substituting the location value $\mathord{\texttt{*}}a$ corresponding to $x$ in $e_2$.

In the following, we repeat the above $\TirName{Do}$ rules along with the needed $\TirName{Search}$ for mutable variables:

$\fbox{$ \langle e, m \rangle \longrightarrow \langle e', m' \rangle $}$

$\inferrule[DoDeref]{ }{ \langle \mathord{\texttt{*}}a , m \rangle \longrightarrow \langle m(a) , m \rangle }$

$\inferrule[DoAssignVar]{ }{ \langle \mathord{\texttt{*}}a \mathrel{\texttt{=}} v , m \rangle \longrightarrow \langle v, m[a \mapsto v] \rangle }$

$\inferrule[DoVarDecl]{ a\notin \operatorname{dom}(m) }{ \langle \mathbf{var}\;x\;\texttt{=}\;v_1\texttt{;}\;e_2 , m \rangle \longrightarrow \langle {}[ \mathord{\texttt{*}}a /x]e_2 , m[a \mapsto v_1] \rangle }$

$\inferrule[SearchAssign1]{ \langle e_1, m \rangle \longrightarrow \langle e_1', m' \rangle \and e_1 \neq l_1 }{ \langle e_1 \mathrel{\texttt{=}} e_2 , m \rangle \longrightarrow \langle e_1' \mathrel{\texttt{=}} e_2 , m' \rangle }$

$\inferrule[SearchAssign2]{ \langle e_2, m \rangle \longrightarrow \langle e_2', m' \rangle }{ \langle l_1 \mathrel{\texttt{=}} e_2 , m \rangle \longrightarrow \langle l_1 \mathrel{\texttt{=}} e_2' , m' \rangle }$

$\inferrule[SearchVarDecl]{ \langle e_1, m \rangle \longrightarrow \langle e_1', m' \rangle }{ \langle \mathbf{var}\;x\;\texttt{=}\;e_1\texttt{;}\;e_2 , m \rangle \longrightarrow \langle \mathbf{var}\;x\;\texttt{=}\;e_1'\texttt{;}\;e_2 , m' \rangle }$

Figure 32.4: Small-step operational semantics with mutable variables.

The $\TirName{SearchAssign1}$ rule says that if $e_1$ is not a location value, then we need to reduce it to be able to do the assignment: \[ \inferrule[SearchAssign1]{ e_1 \neq l_1 \and \langle e_1, m \rangle \longrightarrow \langle e_1', m' \rangle }{ \langle e_1 \mathrel{\texttt{=}} e_2 , m \rangle \longrightarrow \langle e_1' \mathrel{\texttt{=}} e_2 , m' \rangle } \]

Note that $\TirName{SearchAssign1}$ rule is needed if addresses were first-class values (cf. pointers in C and C++). However, we see that it is not actually needed for this variant of JavaScripty where addresses are not first-class. In this case, we can also restrict the syntax of assignment to

\[ \begin{array}{rrrl} \text{expressions} & e& \mathrel{::=}& x \mathrel{\texttt{=}}e_1 \end{array} \]

Studying the $\TirName{DoVarDecl}$ rule, we see that assignment expressions $x \mathrel{\texttt{=}}e_1$ where $x$ is in scope would become $\mathord{\texttt{*}}a \mathrel{\texttt{=}}e_1$ on substitution where either $\TirName{SearchAssign2}$ or $\TirName{DoAssignVar}$ would apply.

Indeed, the actual concrete syntax of JavaScript is restricted such that only certain the expression forms like variables can be written on the left-hand–side of assignment.

32.1.3 Implementation

32.1.3.1 Memories

Let us define Mem as an abstract data type to represent a memory $m$ (defined in Figure 32.2) in terms of a Scala Map[A, Expr]:

sealed class Mem private (m: Map[A, Expr], nextAddr: Int) {
  def apply(a: A): Expr = m(a)
  override def toString: String = m.toString

  def +(av: (A, Expr)): Mem = {
    val (a, _) = av
    require(m.contains(a))
    new Mem(m + av, nextAddr)
  }

  def alloc(v: Expr): (A, Mem) = {
    val fresha = A(nextAddr)
    (fresha, new Mem(m + (fresha -> v), nextAddr + 1))
  }
}

object Mem {
  val empty: Mem = new Mem(Map.empty, 1)
}

defined class Mem
defined object Mem

The apply and toString methods simply delegate to the corresponding methods on the underlying m: Map[A, Expr].

In addition to underlying m: Map[A, Expr], the additional nextAddr: Int field represents the next available address. The + method for updating the memory checks that the given address to update a is already in the map. The alloc method implements allocating a fresh address fresha: A by using the next available address, extending the map with the new cell (fresha -> v) using the given initialization value v, and advances the next available address to nextAddr + 1.

The abstract data type Mem thus maintains a consistency invariant between the map m: Map[A, Expr] and the next available address nextAddr: Int. As a client, any address A that we obtained from alloc must have a corresponding mapping in m:

val (a, m) = Mem.empty.alloc(N(42))

a: A = A(a = 1)
m: Mem = Map(A(1) -> N(42.0))

One further step we could take is to make A an abstract data type where the only way for a client to get an address A is via alloc.

32.1.3.2 Location Values

We define isLValue defining the expression forms that are location values following the $e\;\mathsf{location}$ judgment form (see Figure 34.5):

def isLValue(e: Expr): Boolean = e match {
  case Deref(A(_)) => true
  case _ => false
}

defined function isLValue

32.1.3.3 Substitution

Comparing $\mathbf{var}\;x\;\texttt{=}\;e_1\texttt{;}\;e_2$ and $\mathbf{const}\;x\;\texttt{=}\;e_1\texttt{;}\;e_2$, they are the same with respect to binding a variable whose scope is $e_2$. Thus, substitution works the same for both.

We define substitute(with_e, x, in_e) to implement

$[$ with_e $/$ x $]$ in_e

assume that with_e and in_e have non-intersecting free variables (following Figure 21.1 in Section 21.9):

def substitute(with_e: Expr, x: String, in_e: Expr) = {
  // Assume that with_e and in_e have non-intersecting free variables.
  def $(in_e: Expr): Expr = in_e match {
    case N(_) | A(_) => in_e
    case Var(y) => if (x == y) with_e else in_e
    case VarDecl(y, e1, e2) => if (x == y) VarDecl(y, $(e1), e2) else VarDecl(y, $(e1), $(e2))
    case Assign(e1, e2) => Assign($(e1), $(e2))
    case Deref(e1) => Deref($(e1))
  }
  $(in_e)
}

defined function substitute

32.1.3.4 Step

We can now define a step function following the $ \langle e, m \rangle \longrightarrow \langle e', m' \rangle $ judgment form (see Figure 32.4).

32.1.3.4.1 Explicit State Passing

We first choose to define step with type (Expr, Mem) => (Expr, Mem):

def step(e: Expr, m: Mem): (Expr, Mem) = e match {
  // DoDeref
  case Deref(a @ A(_)) => (m(a), m)
  // DoAssignVar
  case Assign(Deref(a @ A(_)), v) if isValue(v) => (v, m + (a -> v))
  // DoVarDecl
  case VarDecl(x, v1, e2) if isValue(v1) => {
    val (a, m_) = m.alloc(v1)
    (substitute(Deref(a), x, e2), m_)
  }
  // SearchAssign2
  case Assign(l1, e2) if isLValue(l1) => {
    val (e2_, m_) = step(e2, m)
    (Assign(l1, e2_), m_)
  }
  // Skip SearchAssign1
  // SearchVarDecl
  case VarDecl(x, e1, e2) => {
    val (e1_, m_) = step(e1, m)
    (VarDecl(x, e1_, e2), m_)
  }
}

defined function step

Let us test step:

val (e_assign_, m_)   = step(e_assign, Mem.empty)
val (e_assign__, m__) = step(e_assign_, m_)

e_assign_: Expr = Assign(e1 = Deref(e1 = A(a = 1)), e2 = N(n = 2.0))
m_: Mem = Map(A(1) -> N(1.0))
e_assign__: Expr = N(n = 2.0)
m__: Mem = Map(A(1) -> N(2.0))

32.1.3.4.2 Encapsulated State

While the above implementation of step faithfully implements the small-step operational semantics judgment form $ \langle e, m \rangle \longrightarrow \langle e', m' \rangle $, we see it requires threading explicitly different versions of the memory state m: Mem (e.g., m_), which could be error prone.

Recall the idea of representing mutation effects by encapsulating a function of type S => (S, A) for a state type S and a main value type A into a data type DoWith[S, A] (see Section 30.5):

DoWith._

sealed class DoWith[S, A] private (doer: S => (S, A)) {
  def map[B](f: A => B): DoWith[S, B] = new DoWith[S, B]({ (s: S) => { val (s_, a) = doer(s); (s_, f(a)) } })
  def flatMap[B](f: A => DoWith[S, B]): DoWith[S, B] = new DoWith[S, B]({ (s: S) => { val (s_, a) = doer(s); f(a)(s_) } })
  def apply(s: S): (S, A) = doer(s)
}

object DoWith {
  def doget[S]: DoWith[S, S] = new DoWith[S, S]({ s => (s, s) })
  def doput[S](s: S): DoWith[S, Unit] = new DoWith[S, Unit]({ _ => (s, ()) })
  def doreturn[S, A](a: A): DoWith[S, A] = new DoWith[S, A]({ s => (s, a) })
  def domodify[S](f: S => S): DoWith[S, Unit] = new DoWith[S, Unit]({ s => (f(s), ()) })
}

import DoWith._

defined class DoWith
defined object DoWith
import DoWith._

Rearranging the type of step slightly, we see that we can implement the judgment form $ \langle e, m \rangle \longrightarrow \langle e', m' \rangle $ using a step function of type Expr => Mem => (Mem, Expr) or Expr => DoWith[Mem, Expr].

For convenience and to warm up, let us start by defining a helper function to the alloc method of Mem using a DoWith[Mem, A]:

def memalloc(v: Expr): DoWith[Mem, A] = doget flatMap { m =>
  val (a, m_) = m.alloc(v)
  doput(m_) map { _ => a }
}

defined function memalloc

We now translate the explicit state passing version of step from above (Section 32.1.3.4.1) to use an encapsulated DoWith[Mem, Expr] state as follows:

def step(e: Expr): DoWith[Mem, Expr] = e match {
  // DoDeref
  case Deref(a @ A(_)) =>
    doget map { m => m(a) }

  // DoAssignVar
  case Assign(Deref(a @ A(_)), v) if isValue(v) =>
    domodify[Mem](m => m + (a -> v)) map { _ => v }

  // DoVarDecl
  case VarDecl(x, v1, e2) if isValue(v1) =>
    memalloc(v1) map { a => substitute(Deref(a), x, e2) }

  // SearchAssign2
  case Assign(l1, e2) if isLValue(l1) =>
    step(e2) map { e2 => Assign(l1, e2) }

  // Skip SearchAssign1

  // SearchVarDecl
  case VarDecl(x, e1, e2) =>
    step(e1) map { e1 => VarDecl(x, e1, e2) }
}

defined function step

We see that the memory state fades into the background, except where it is explicitly needed. It is in the implementation of the $\TirName{Search}$ rules where this fading into the background is particularly salient—whatever effect on memory happens in the recursive call to step is just passed along.

To use step, we can still run each step explicitly:

val (m_, e_assign_)   = step(e_assign)(Mem.empty)
val (m__, e_assign__) = step(e_assign_)(m_)

m_: Mem = Map(A(1) -> N(1.0))
e_assign_: Expr = Assign(e1 = Deref(e1 = A(a = 1)), e2 = N(n = 2.0))
m__: Mem = Map(A(1) -> N(2.0))
e_assign__: Expr = N(n = 2.0)

Observe that we have explicitly threaded the memory state in these top-level calls to step to show the intermediate memory state and expressions. That is, we called step(e_assign) to get a DoWith[Mem, Expr] that we then called with Mem.empty to get (m_, e_assign_) and then called the DoWith[Mem, Expr] resulting from step(e_assign_) with the current memory m_.

But we do not have to get the intermediate memory state m_. We can get the DoWith[Mem, Expr] for the two steps and then run it:

val (m__, e_assign__) = (step(e_assign) flatMap step)(Mem.empty)

m__: Mem = Map(A(1) -> N(2.0))
e_assign__: Expr = N(n = 2.0)

Or, we can rewrite the above make it more visible that flatMap is a sequential composition operator:

val (m__, e_assign__) = (doreturn(e_assign) flatMap step flatMap step)(Mem.empty)

m__: Mem = Map(A(1) -> N(2.0))
e_assign__: Expr = N(n = 2.0)

If desired, we can also use the for-yield expression syntax in Scala:

def step(e: Expr): DoWith[Mem, Expr] = e match {
  // DoDeref
  case Deref(a @ A(_)) =>
    for { m <- doget } yield m(a)

  // DoAssignVar
  case Assign(Deref(a @ A(_)), v) if isValue(v) =>
    for { _ <- domodify[Mem](m => m + (a -> v)) } yield v

  // DoVarDecl
  case VarDecl(x, v1, e2) if isValue(v1) =>
    for { a <- memalloc(v1) } yield substitute(Deref(a), x, e2)

  // SearchAssign2
  case Assign(l1, e2) if isLValue(l1) =>
    for { e2 <- step(e2) } yield Assign(l1, e2)

  // Skip SearchAssign1

  // SearchVarDecl
  case VarDecl(x, e1, e2) =>
    for { e1 <- step(e1) } yield VarDecl(x, e1, e2)
}

defined function step

The two step calls here look somewhat like having a side-effect on a mutable memory state, but in actuality, immutable memory states are threaded in the background:

val (m__, e_assign__) = (for {

  e_assign_  <- step(e_assign)
  e_assign__ <- step(e_assign_)

} yield e_assign_)(Mem.empty)

m__: Mem = Map(A(1) -> N(2.0))
e_assign__: Expr = Assign(e1 = Deref(e1 = A(a = 1)), e2 = N(n = 2.0))

## TypeScripty - Formalize Type Checking

32.2 Other Effects

One might realize that before considering mutation in the above, we have considered another side-effecting JavaScripty expression in logging to the console:

console.log("Hello, World!")

The console.log( $e$ ) expression evaluates $e$ to a value, logs that value to the console as a side-effect, and evaluates to undefined. Its effect is external to its final value undefined.

We gave this rule for $\TirName{DoPrint}$:

\[ \inferrule[DoPrint]{ \text{$v_1$ printed} }{ \texttt{console}\texttt{.}\texttt{log}\texttt{(}v_1\texttt{)} \longrightarrow\mathbf{undefined} } \]

that states the logging effect informally with the “$v_1$ printed” condition.

If we want to describe explicitly that there is log of values (e.g., separated by linefeed characters $\text{\tiny{LF}}$), then we need to similarly reify a log state $\log$

\[ \begin{array}{rrrl} \text{logs} & \mathit{log}& \mathrel{::=}\cdot \mid\mathit{log}\mathbin{\text{\tiny{LF}}} v \end{array} \]

and extend our small-step judgment with a log state $ \langle e, \mathit{log} \rangle \longrightarrow \langle e', \mathit{log}' \rangle $:

$\inferrule[DoPrint]{ }{ \langle \texttt{console}\texttt{.}\texttt{log}\texttt{(}v_1\texttt{)}, \mathit{log} \rangle \longrightarrow \langle \mathbf{undefined}, \mathit{log}\mathbin{\text{\tiny{LF}}} v_1 \rangle }$