object MyList {
sealed trait List[A]
case class Nil[A]() extends List[A]
case class ::[A](head: A, tail: List[A]) extends List[A]
}
defined object MyList
As we saw in Section 6.1.1, the List
type constructor from the Scala library is defined with two constructors Nil
and ::
(pronounced “cons”). A list is a basic inductive data type:
object MyList {
sealed trait List[A]
case class Nil[A]() extends List[A]
case class ::[A](head: A, tail: List[A]) extends List[A]
}
defined object MyList
The List
type constructor from Scala library is very close to the above. Observe that List[A]
is a recursive type with the tail
field of ::
.
Thus, the most direct way to implement functions on List
s is using recursion and pattern matching. For example, defining a function to compute the length of a list:
def length[A](l: List[A]): Int = l match {
case Nil => 0
case _ :: t => 1 + length(t)
}
defined function length
The definition above using pattern matching very directly follows the inductive structure of the type. Observe that in a definition of length
using an if
expression
def length[A](l: List[A]): Int =
if (l == Nil) 0
else 1 + length(l.tail)
defined function length
it takes, for example, a bit of extra thought to realize that l.tail
will never throw an exception.
We can also see why a function append
(i.e., the :::
method in the Scala library) that appends one list to another must necessarily be a linear-time operation over the left list xl
:
def append[A](xl: List[A], yl: List[A]): List[A] = xl match {
case Nil => yl
case xh :: xt => xh :: append(xt, yl)
}
val xlyl_append = append(List(1, 2, 3), List(4, 5, 6))
val xlyl_::: = List(1, 2, 3) ::: List(4, 5, 6)
== xlyl_::: xlyl_append
defined function append xlyl_append: List[Int] = List(1, 2, 3, 4, 5, 6) xlyl_:::: List[Int] = List(1, 2, 3, 4, 5, 6) res3_3: Boolean = true
Now, observing that append
is not tail recursive, we might try to implement the following:
def buggyAppend[A](xl: List[A], yl: List[A]): List[A] = xl match {
case Nil => yl
case xh :: xt => buggyAppend(xt, xh :: yl)
}
defined function buggyAppend
This is is not quite append
. What does buggyAppend
do?
val xlyl_buggyAppend = buggyAppend(List(1, 2, 3), List(4, 5, 6))
xlyl_buggyAppend: List[Int] = List(3, 2, 1, 4, 5, 6)
It reverses the first list xl
and appends the second list yl
to it. While a somewhat strange operation, it is tail recursive and in the standard library:
val xlyl_reverse_::: = List(1, 2, 3) reverse_::: List(4, 5, 6)
== xlyl_reverse_::: xlyl_buggyAppend
xlyl_reverse_:::: List[Int] = List(3, 2, 1, 4, 5, 6) res6_1: Boolean = true
To define the reverse
of a list l
, we can use append
to take an element on the head and append it to the reverse
of the tail:
def reverse[A](l: List[A]): List[A] = l match {
case Nil => Nil
case h :: t => append(reverse(t), h :: Nil)
}
reverse(List(1, 2, 3, 4, 5))
defined function reverse res7_1: List[Int] = List(5, 4, 3, 2, 1)
But what is the complexity of this function? It is \(O(n^2)\) where \(n\) is the length of l
!
Can we write a linear-time reverse
? Looking at buggyAppend
, we see how:
def reverse[A](l: List[A]): List[A] = {
def rev(l: List[A], acc: List[A]): List[A] = l match {
case Nil => acc
case h :: t => rev(t, h :: acc)
}
rev(l, Nil)
}
reverse(List(1, 2, 3, 4, 5))
defined function reverse res8_1: List[Int] = List(5, 4, 3, 2, 1)
Let’s instrument this linear-time reverse
to see it in action:
def reverse[A](l: List[A]): List[A] = {
println(s"reverse($l)")
def rev(l: List[A], acc: List[A]): List[A] = {
println(s"-->* loop($l, $acc)")
match {
l case Nil => acc
case h :: t => rev(t, h :: acc)
}
}
val r = rev(l, Nil)
println(r)
r}
reverse(List(1, 2, 3, 4, 5))
reverse(List(1, 2, 3, 4, 5))
-->* loop(List(1, 2, 3, 4, 5), List())
-->* loop(List(2, 3, 4, 5), List(1))
-->* loop(List(3, 4, 5), List(2, 1))
-->* loop(List(4, 5), List(3, 2, 1))
-->* loop(List(5), List(4, 3, 2, 1))
-->* loop(List(), List(5, 4, 3, 2, 1))
List(5, 4, 3, 2, 1)
defined function reverse res9_1: List[Int] = List(5, 4, 3, 2, 1)
Observe that rev
is exactly buggyAppend
. The specification of rev
(or buggyAppend
) is that it returns the reverse of the its first argument followed by its second argument appended.
Previously, our discussion about tail recursion (Section 8.4) was simply about efficiency because the operators we considered were commutative (e.g., +
or *
on Int
s). Now, with a non-commutative operator like ::, we see that there is something more.
The intuition is that the accumulator parameter acc
in rev
enables us to “do something” as we “recurse down” the list. And the stack in a non-tail recursive function enables us to “do something” as we “return up”.
Lists are special case of trees with one recursive parameter, so we see that values of user-defined inductive data types are in general trees. For example, we can define a binary tree of Int
s:
sealed trait BinaryTree
case object Empty extends BinaryTree
case class Node(l: BinaryTree, d: Int, r: BinaryTree) extends BinaryTree
Node(Node(Empty, 2, Empty), 10, Node(Empty, 14, Node(Empty, 17, Empty)))
defined trait BinaryTree defined object Empty defined class Node res10_3: Node = Node( l = Node(l = Empty, d = 2, r = Empty), d = 10, r = Node(l = Empty, d = 14, r = Node(l = Empty, d = 17, r = Empty)) )
One key application of immutable trees are for representing maps and sets with logarithmic lookup, insertion, and deletion using balanced search trees. First, consider making the BinaryTree
type generic:
sealed trait BinaryTree[K,V]
case class Empty[K,V]() extends BinaryTree[K,V]
case class Node[K,V](l: BinaryTree[K,V], kv: (K, V), r: BinaryTree[K,V]) extends BinaryTree[K,V]
Node(Node(Empty(), 2 -> List("two", "dos", "二"), Empty()), 10 -> List("ten", "diez", "十"), Node(Empty(), 14 -> List("fourteen", "catorce", "十四"), Node(Empty(), 17 -> List("seventeen", "diecisiete", "十七"), Empty())))
defined trait BinaryTree defined class Empty defined class Node res11_3: Node[Int, List[String]] = Node( l = Node(l = Empty(), kv = (2, List("two", "dos", "\u4e8c")), r = Empty()), kv = (10, List("ten", "diez", "\u5341")), r = Node( l = Empty(), kv = (14, List("fourteen", "catorce", "\u5341\u56db")), r = Node( l = Empty(), kv = (17, List("seventeen", "diecisiete", "\u5341\u4e03")), r = Empty() ) ) )
Now, we do not want to directly construct such trees. Instead, we design an API for lookup, insertion, and deletion to maintain search (i.e., ordering) and balance invariants. Lookup, insertion, and deletion are logarithmic when the search and balance invariants are maintained.
The Scala Map
and Set
libraries are such search tree data structures.
val m = Map(2 -> List("two", "dos", "二"), 10 -> List("ten", "diez", "十"))
val newm = m + (14 -> List("fourteen", "catorce", "十四"))
m: Map[Int, List[String]] = Map( 2 -> List("two", "dos", "\u4e8c"), 10 -> List("ten", "diez", "\u5341") ) newm: Map[Int, List[String]] = Map( 2 -> List("two", "dos", "\u4e8c"), 10 -> List("ten", "diez", "\u5341"), 14 -> List("fourteen", "catorce", "\u5341\u56db") )
The map newm
is the map m
with an additional key-value pair 14 -> List("fourteen", "catorce", "十四")
inserted. Note that both the old version m
and the new version newm
exist:
val mOf10 = m(10)
val newmOf10 = newm(10)
mOf10: List[String] = List("ten", "diez", "\u5341") newmOf10: List[String] = List("ten", "diez", "\u5341")
By checking reference equality (i.e., using eq
)
mOf10 eq newmOf10List("ten", "diez", "十")
mOf10 eq == List("ten", "diez", "十") mOf10
res14_0: Boolean = true res14_1: Boolean = false res14_2: Boolean = true
we see that the above is one tree with both two versions on top of each other, leveraging immutability. Such data structures are called persistent because multiple versions can persist at the same time. In contrast, imperative data structures are ephemeral because only one version can exist at a time.
It is difficult to build an interpreter for any substantial language all at once. In this book, we will make some simplifications. We consider small subsets that isolate the essence of a language feature and incrementally examine more and more complex subsets.
For concreteness, let us consider variants of JavaScript as our primary object language of study, and we affectionately call the language that we implement in this course JavaScripty. However, note that the various subsets we consider could mimic just about any other language. In fact, this course has used other object languages in the past (e.g., a mini-OCaml called Lettuce, a mini-Scala called Smalla).
Because we do not yet have the mathematical tools to specify the semantics of a language, let us define JavaScripty to be a proper subset of JavaScript. That is, we may choose to omit complex behavior in JavaScript, but we want any programs that we admit in JavaScripty to behave in the same way as in JavaScript.
For example, let us consider the JavaScripty expression with +
:
3 + 7 + 4.2
that results in 14.2
. That is,
When we have the tools to specify the semantics of a language, we may choose to make JavaScripty to have different semantics than JavaScript.
In actuality, there is not one language called JavaScript (officially, ECMAScript) but a set of closely related languages that may have slightly different semantics. In deciding how a JavaScripty program should behave, we consult a reference implementation (that we fix to be Google’s open source V8 JavaScript Engine). We can run V8 through various engine interfaces (e.g., node
and deno
), and thus, we can write little test JavaScript programs and run it through to the engine to see how the test should behave.
The first thing we have to consider is how to represent a JavaScripty program as data in Scala, that is, we need to be able to represent a program in our object/source language JavaScripty as data in our meta/implementation language Scala.
To a JavaScripty programmer, a JavaScripty program is a text file—a string of characters. Such a representation is quite cumbersome to work with as a language implementer. Instead, language implementations typically work with trees called abstract syntax trees (ASTs). What strings are considered JavaScripty programs is called the concrete syntax of JavaScripty, while the trees (or terms) that are JavaScripty programs is called the abstract syntax of JavaScripty. The process of converting a program in concrete syntax (i.e., as a string) to a program in abstract syntax (i.e., as a tree) is called parsing.
While parsing seems like the place to start an implementation, the theory and implementation of parsers are surprising subtle. Instead, we can directly start our study of programming languages from abstract syntax assuming the JavaScripty input programs of interest come directly as abstract syntax trees.
We represent abstract syntax trees in our meta/implementation language Scala using inductive, algebraic data types. Let us consider representing the most tiny JavaScripty language with number literals and +
expressions. Here’s one possible representation:
sealed trait Expr
case class N(n: Double) extends Expr
case class Plus(e1: Expr, e2: Expr) extends Expr
val three = N(3)
val seven = N(7)
val four_point_two = N(4.2)
val three_plus_seven = Plus(three, seven)
val three_plus_seven_plus_four_point_two = Plus(three_plus_seven, four_point_two)
defined trait Expr defined class N defined class Plus three: N = N(n = 3.0) seven: N = N(n = 7.0) four_point_two: N = N(n = 4.2) three_plus_seven: Plus = Plus(e1 = N(n = 3.0), e2 = N(n = 7.0)) three_plus_seven_plus_four_point_two: Plus = Plus( e1 = Plus(e1 = N(n = 3.0), e2 = N(n = 7.0)), e2 = N(n = 4.2) )
Here, we let a Scala value of type Expr
(i.e., the meta language) represent a JavaScripty expression (i.e., the object language). A parser implementation (e.g., parse: String => Expr
function) would take as input a JavaScripty expression (i.e., the object language) in concrete syntax (i.e., as a string) and convert into a Scala value of type Expr
(i.e., in the meta language) as a tree (i.e., abstract syntax).
An N
node represents a number literal where we represent JavaScripty numbers \(n\) as a Scala Double
, and Plus
is an AST node representing the JavaScripty \(e_1 + e_2\).
Once we have a Scala value of type Expr
, we can define functions that manipulate JavaScripty expressions. For example, we can define evaluation of JavaScripty expressions as an eval
function in Scala:
def eval(e: Expr): Double = e match {
case N(n) => n
case Plus(e1, e2) => eval(e1) + eval(e2)
}
eval( N(1.66) )
eval( Plus(N(2.1), N(3.5)) )
eval( three_plus_seven_plus_four_point_two )
defined function eval res16_1: Double = 1.66 res16_2: Double = 5.6 res16_3: Double = 14.2