9 Inductive Data Types

\(\newcommand{\TirName}[1]{\text{#1}} \newcommand{\inferrule}[3][]{ \let\and\qquad \begin{array}{@{}l@{}} \TirName{#1} \\ \displaystyle \frac{#2}{#3} \end{array} } \newcommand{\infer}[3][]{\inferrule[#1]{#2}{#3}} \)

9.1 Lists

As we saw in Section 6.1.1, the List type constructor from the Scala library is defined with two constructors Nil and :: (pronounced “cons”). A list is a basic inductive data type:

object MyList {
  sealed trait List[A]
  case class Nil[A]() extends List[A]
  case class ::[A](head: A, tail: List[A]) extends List[A]
}

defined object MyList

The List type constructor from Scala library is very close to the above. Observe that List[A] is a recursive type with the tail field of ::.

Thus, the most direct way to implement functions on Lists is using recursion and pattern matching. For example, defining a function to compute the length of a list:

def length[A](l: List[A]): Int = l match {
  case Nil => 0
  case _ :: t => 1 + length(t)
}

defined function length

The definition above using pattern matching very directly follows the inductive structure of the type. Observe that in a definition of length using an if expression

def length[A](l: List[A]): Int =
  if (l == Nil) 0
  else 1 + length(l.tail)

defined function length

it takes, for example, a bit of extra thought to realize that l.tail will never throw an exception.

We can also see why a function append (i.e., the ::: method in the Scala library) that appends one list to another must necessarily be a linear-time operation over the left list xl:

def append[A](xl: List[A], yl: List[A]): List[A] = xl match {
  case Nil => yl
  case xh :: xt => xh :: append(xt, yl)
}
val xlyl_append = append(List(1, 2, 3), List(4, 5, 6))
val xlyl_::: = List(1, 2, 3) ::: List(4, 5, 6)
xlyl_append == xlyl_:::

defined function append
xlyl_append: List[Int] = List(1, 2, 3, 4, 5, 6)
xlyl_:::: List[Int] = List(1, 2, 3, 4, 5, 6)
res3_3: Boolean = true

Now, observing that append is not tail recursive, we might try to implement the following:

def buggyAppend[A](xl: List[A], yl: List[A]): List[A] = xl match {
  case Nil => yl
  case xh :: xt => buggyAppend(xt, xh :: yl)
}

defined function buggyAppend

This is is not quite append. What does buggyAppend do?

val xlyl_buggyAppend = buggyAppend(List(1, 2, 3), List(4, 5, 6))

xlyl_buggyAppend: List[Int] = List(3, 2, 1, 4, 5, 6)

It reverses the first list xl and appends the second list yl to it. While a somewhat strange operation, it is tail recursive and in the standard library:

val xlyl_reverse_::: = List(1, 2, 3) reverse_::: List(4, 5, 6)
xlyl_buggyAppend == xlyl_reverse_:::

xlyl_reverse_:::: List[Int] = List(3, 2, 1, 4, 5, 6)
res6_1: Boolean = true

To define the reverse of a list l, we can use append to take an element on the head and append it to the reverse of the tail:

def reverse[A](l: List[A]): List[A] = l match {
  case Nil => Nil
  case h :: t => append(reverse(t), h :: Nil)
}
reverse(List(1, 2, 3, 4, 5))

defined function reverse
res7_1: List[Int] = List(5, 4, 3, 2, 1)

But what is the complexity of this function? It is \(O(n^2)\) where \(n\) is the length of l!

Can we write a linear-time reverse? Looking at buggyAppend, we see how:

def reverse[A](l: List[A]): List[A] = {
  def rev(l: List[A], acc: List[A]): List[A] = l match {
    case Nil => acc
    case h :: t => rev(t, h :: acc)
  }
  rev(l, Nil)
}
reverse(List(1, 2, 3, 4, 5))

defined function reverse
res8_1: List[Int] = List(5, 4, 3, 2, 1)

Let’s instrument this linear-time reverse to see it in action:

def reverse[A](l: List[A]): List[A] = {
  println(s"reverse($l)")
  def rev(l: List[A], acc: List[A]): List[A] = {
    println(s"-->* loop($l, $acc)")
    l match {
      case Nil => acc
      case h :: t => rev(t, h :: acc)
    }
  }
  val r = rev(l, Nil)
  println(r)
  r
}
reverse(List(1, 2, 3, 4, 5))

reverse(List(1, 2, 3, 4, 5))
-->* loop(List(1, 2, 3, 4, 5), List())
-->* loop(List(2, 3, 4, 5), List(1))
-->* loop(List(3, 4, 5), List(2, 1))
-->* loop(List(4, 5), List(3, 2, 1))
-->* loop(List(5), List(4, 3, 2, 1))
-->* loop(List(), List(5, 4, 3, 2, 1))
List(5, 4, 3, 2, 1)

defined function reverse
res9_1: List[Int] = List(5, 4, 3, 2, 1)

Observe that rev is exactly buggyAppend. The specification of rev (or buggyAppend) is that it returns the reverse of the its first argument followed by its second argument appended.

Previously, our discussion about tail recursion (Section 8.4) was simply about efficiency because the operators we considered were commutative (e.g., + or * on Ints). Now, with a non-commutative operator like ::, we see that there is something more.

The intuition is that the accumulator parameter acc in rev enables us to “do something” as we “recurse down” the list. And the stack in a non-tail recursive function enables us to “do something” as we “return up”.

9.2 Persistent Data Structures

Lists are special case of trees with one recursive parameter, so we see that values of user-defined inductive data types are in general trees. For example, we can define a binary tree of Ints:

sealed trait BinaryTree
case object Empty extends BinaryTree
case class Node(l: BinaryTree, d: Int, r: BinaryTree) extends BinaryTree

Node(Node(Empty, 2, Empty), 10, Node(Empty, 14, Node(Empty, 17, Empty)))

defined trait BinaryTree
defined object Empty
defined class Node
res10_3: Node = Node(
  l = Node(l = Empty, d = 2, r = Empty),
  d = 10,
  r = Node(l = Empty, d = 14, r = Node(l = Empty, d = 17, r = Empty))
)

One key application of immutable trees are for representing maps and sets with logarithmic lookup, insertion, and deletion using balanced search trees. First, consider making the BinaryTree type generic:

sealed trait BinaryTree[K,V]
case class Empty[K,V]() extends BinaryTree[K,V]
case class Node[K,V](l: BinaryTree[K,V], kv: (K, V), r: BinaryTree[K,V]) extends BinaryTree[K,V]

Node(Node(Empty(), 2 -> List("two", "dos", "二"), Empty()), 10 -> List("ten", "diez", "十"), Node(Empty(), 14 -> List("fourteen", "catorce", "十四"), Node(Empty(), 17 -> List("seventeen", "diecisiete", "十七"), Empty())))

defined trait BinaryTree
defined class Empty
defined class Node
res11_3: Node[Int, List[String]] = Node(
  l = Node(l = Empty(), kv = (2, List("two", "dos", "\u4e8c")), r = Empty()),
  kv = (10, List("ten", "diez", "\u5341")),
  r = Node(
    l = Empty(),
    kv = (14, List("fourteen", "catorce", "\u5341\u56db")),
    r = Node(
      l = Empty(),
      kv = (17, List("seventeen", "diecisiete", "\u5341\u4e03")),
      r = Empty()
    )
  )
)

Now, we do not want to directly construct such trees. Instead, we design an API for lookup, insertion, and deletion to maintain search (i.e., ordering) and balance invariants. Lookup, insertion, and deletion are logarithmic when the search and balance invariants are maintained.

The Scala Map and Set libraries are such search tree data structures.

val m = Map(2 -> List("two", "dos", "二"), 10 -> List("ten", "diez", "十"))
val newm = m + (14 -> List("fourteen", "catorce", "十四"))

m: Map[Int, List[String]] = Map(
  2 -> List("two", "dos", "\u4e8c"),
  10 -> List("ten", "diez", "\u5341")
)
newm: Map[Int, List[String]] = Map(
  2 -> List("two", "dos", "\u4e8c"),
  10 -> List("ten", "diez", "\u5341"),
  14 -> List("fourteen", "catorce", "\u5341\u56db")
)

The map newm is the map m with an additional key-value pair 14 -> List("fourteen", "catorce", "十四") inserted. Note that both the old version m and the new version newm exist:

val mOf10 = m(10)
val newmOf10 = newm(10)

mOf10: List[String] = List("ten", "diez", "\u5341")
newmOf10: List[String] = List("ten", "diez", "\u5341")

By checking reference equality (i.e., using eq)

mOf10 eq newmOf10
mOf10 eq List("ten", "diez", "十")
mOf10 == List("ten", "diez", "十")

res14_0: Boolean = true
res14_1: Boolean = false
res14_2: Boolean = true

we see that the above is one tree with both two versions on top of each other, leveraging immutability. Such data structures are called persistent because multiple versions can persist at the same time. In contrast, imperative data structures are ephemeral because only one version can exist at a time.

9.3 Abstract Syntax Trees (ASTs)

9.3.1 Mini Programming Languages

It is difficult to build an interpreter for any substantial language all at once. In this book, we will make some simplifications. We consider small subsets that isolate the essence of a language feature and incrementally examine more and more complex subsets.

For concreteness, let us consider variants of JavaScript as our primary object language of study, and we affectionately call the language that we implement in this course JavaScripty. However, note that the various subsets we consider could mimic just about any other language. In fact, this course has used other object languages in the past (e.g., a mini-OCaml called Lettuce, a mini-Scala called Smalla).

Because we do not yet have the mathematical tools to specify the semantics of a language, let us define JavaScripty to be a proper subset of JavaScript. That is, we may choose to omit complex behavior in JavaScript, but we want any programs that we admit in JavaScripty to behave in the same way as in JavaScript.

For example, let us consider the JavaScripty expression with +:

3 + 7 + 4.2

that results in 14.2. That is,

When we have the tools to specify the semantics of a language, we may choose to make JavaScripty to have different semantics than JavaScript.

In actuality, there is not one language called JavaScript (officially, ECMAScript) but a set of closely related languages that may have slightly different semantics. In deciding how a JavaScripty program should behave, we consult a reference implementation (that we fix to be Google’s open source V8 JavaScript Engine). We can run V8 through various engine interfaces (e.g., node and deno), and thus, we can write little test JavaScript programs and run it through to the engine to see how the test should behave.

9.3.2 Representing Abstract Syntax

The first thing we have to consider is how to represent a JavaScripty program as data in Scala, that is, we need to be able to represent a program in our object/source language JavaScripty as data in our meta/implementation language Scala.

To a JavaScripty programmer, a JavaScripty program is a text file—a string of characters. Such a representation is quite cumbersome to work with as a language implementer. Instead, language implementations typically work with trees called abstract syntax trees (ASTs). What strings are considered JavaScripty programs is called the concrete syntax of JavaScripty, while the trees (or terms) that are JavaScripty programs is called the abstract syntax of JavaScripty. The process of converting a program in concrete syntax (i.e., as a string) to a program in abstract syntax (i.e., as a tree) is called parsing.

While parsing seems like the place to start an implementation, the theory and implementation of parsers are surprising subtle. Instead, we can directly start our study of programming languages from abstract syntax assuming the JavaScripty input programs of interest come directly as abstract syntax trees.

9.3.2.1 JavaScripty: Number Literals and Addition

We represent abstract syntax trees in our meta/implementation language Scala using inductive, algebraic data types. Let us consider representing the most tiny JavaScripty language with number literals and + expressions. Here’s one possible representation:

sealed trait Expr
case class N(n: Double) extends Expr
case class Plus(e1: Expr, e2: Expr) extends Expr

val three = N(3)
val seven = N(7)
val four_point_two = N(4.2)
val three_plus_seven = Plus(three, seven)
val three_plus_seven_plus_four_point_two = Plus(three_plus_seven, four_point_two)

defined trait Expr
defined class N
defined class Plus
three: N = N(n = 3.0)
seven: N = N(n = 7.0)
four_point_two: N = N(n = 4.2)
three_plus_seven: Plus = Plus(e1 = N(n = 3.0), e2 = N(n = 7.0))
three_plus_seven_plus_four_point_two: Plus = Plus(
  e1 = Plus(e1 = N(n = 3.0), e2 = N(n = 7.0)),
  e2 = N(n = 4.2)
)

Here, we let a Scala value of type Expr (i.e., the meta language) represent a JavaScripty expression (i.e., the object language). A parser implementation (e.g., parse: String => Expr function) would take as input a JavaScripty expression (i.e., the object language) in concrete syntax (i.e., as a string) and convert into a Scala value of type Expr (i.e., in the meta language) as a tree (i.e., abstract syntax).

An N node represents a number literal where we represent JavaScripty numbers \(n\) as a Scala Double, and Plus is an AST node representing the JavaScripty \(e_1 + e_2\).

Once we have a Scala value of type Expr, we can define functions that manipulate JavaScripty expressions. For example, we can define evaluation of JavaScripty expressions as an eval function in Scala:

def eval(e: Expr): Double = e match {
  case N(n) => n
  case Plus(e1, e2) => eval(e1) + eval(e2)
}

eval( N(1.66) )
eval( Plus(N(2.1), N(3.5)) )
eval( three_plus_seven_plus_four_point_two )

defined function eval
res16_1: Double = 1.66
res16_2: Double = 5.6
res16_3: Double = 14.2