6  Data Types

\(\newcommand{\TirName}[1]{\text{#1}} \newcommand{\inferrule}[3][]{ \let\and\qquad \begin{array}{@{}l@{}} \TirName{#1} \\ \displaystyle \frac{#2}{#3} \end{array} } \newcommand{\infer}[3][]{\inferrule[#1]{#2}{#3}} \)

6.1 Standard Collections

We have already seen one standard data type in tuples (see Section 4.4.3).

6.1.1 Lists

After tuples, the most commonly-used standard data type is probably List. A List is a sequential, functional data structure.

val numbers = List(1, 2, 3)
numbers.length
numbers: List[Int] = List(1, 2, 3)
res0_1: Int = 3

Like tuples, the List type constructor is parametrized by another type. In the above, we have that numbers is bound to a value of type List[Int], that is, a list of integers. A list of strings would have type List[String].

6.1.1.1 Indexing

While it is very uncommon to do so, it is possible to index into lists:

numbers(0)
numbers(1)
numbers(2)
res1_0: Int = 1
res1_1: Int = 2
res1_2: Int = 3
numbers(3)
java.lang.IndexOutOfBoundsException: 3
  scala.collection.LinearSeqOps.apply(LinearSeq.scala:117)
  scala.collection.LinearSeqOps.apply$(LinearSeq.scala:114)
  scala.collection.immutable.List.apply(List.scala:79)
  ammonite.$sess.cmd2$Helper.<init>(cmd2.sc:1)
  ammonite.$sess.cmd2$.<clinit>(cmd2.sc:7)

Another way for getting the first element is getting the head of the list:

numbers.head
res3: Int = 1

Scalaism: As all operators in Scala are methods, the expression numbers(0) is syntactic sugar for a method call to apply:

numbers.apply(0)
res4: Int = 1

6.1.1.2 Nil and Cons

An empty list can also be written as Nil

val empty: List[Int] = List()
val nil: List[Int] = Nil
empty: List[Int] = List()
nil: List[Int] = List()

We can then prepend to a list with :: as follows:

val numbers = List(1, 2, 3)
val consZero = 0 :: numbers
val consTen = 10 :: numbers
val numbersHead = numbers.head
val consZeroHead = consZero.head
val consTenHead = consTen.head
numbers: List[Int] = List(1, 2, 3)
consZero: List[Int] = List(0, 1, 2, 3)
consTen: List[Int] = List(10, 1, 2, 3)
numbersHead: Int = 1
consZeroHead: Int = 0
consTenHead: Int = 10

Note that there is no imperative update here. A List is an immutable, functional data structure. An immutable, functional data Prepending to numbers does not change it. Rather consZero and consTen are bound to new lists.

Recall from Section 4.3, the note about immutablility enabling efficient representations. Because Lists are immutable, prepending is still a constant-time operation (i.e., \(O(1)\)). The consZero and consTen can share the same tail (i.e., numbers), that is, only 5 nodes are needed in total to represent the lists numbers, consZero, and consTen:

val consZeroTail = consZero.tail
val consTenTail = consTen.tail
consZeroTail eq consTenTail
numbers eq consZeroTail
numbers eq List(1, 2, 3)
numbers == List(1, 2, 3)
consZeroTail: List[Int] = List(1, 2, 3)
consTenTail: List[Int] = List(1, 2, 3)
res7_2: Boolean = true
res7_3: Boolean = true
res7_4: Boolean = false
res7_5: Boolean = true

where in Scala, eq is the reference equality operator, while == is structural equality.

Note that the List(1, 2, 3) constructor is equivalent to the following:

val numbers = 1 :: 2 :: 3 :: Nil
numbers: List[Int] = List(1, 2, 3)

The :: operator is often read as “cons”, referencing historically the name cons in Lisp for the primitive that constructs a cell with two values. Note that :: is a right-associative binary operator, so it is parsed as like

val numbers = 1 :: (2 :: (3 :: Nil))
Nil.::(3).::(2).::(1)
numbers: List[Int] = List(1, 2, 3)
res9_1: List[Int] = List(1, 2, 3)

and as with other binary operators, it is just syntactic sugar for method call.

It is quite common to work with lists directly using pattern matching on Nil or :::

def isEmpty(l: List[Int]): Boolean = l match {
  case Nil => true
  case head :: tail => false
}
isEmpty(numbers)
defined function isEmpty
res10_1: Boolean = false

Note that the List API also defines a method ::: for appending two lists together:

val numbers = List(1, 2, 3)
numbers ::: List(4, 5, 6)
numbers: List[Int] = List(1, 2, 3)
res11_1: List[Int] = List(1, 2, 3, 4, 5, 6)

The append method ::: is a linear-time operation in the length of its left argument (i.e., \(O(\)numbers.length\()\) in this example). Why must this be the case?

6.1.1.3 Immutability

As noted above, a List is an immutable, functional data structure.

numbers(1) = 20
cmd12.sc:1: value update is not a member of List[Int]
did you mean updated?
val res12 = numbers(1) = 20
            ^Compilation Failed
: 
Compilation Failed

While it is not common to use this method, the closest analogue is return a new list that is the original list element at index 1 updated:

val numbers_ = numbers.updated(1, 20)
numbers_: List[Int] = List(1, 20, 3)

6.1.1.4 Iterators

While it is also not common to use, a for loop in Scala enables iteration through a List:

for (n <- numbers) println(n)
1
2
3

A for loop in Scala is actually syntactic sugar for a method call to a higher-order method foreach:

numbers foreach { n => println(n) }
1
2
3

We call foreach higher-order, as it takes a function as a parameter of type Int => Unit (e.g., { n => println(n) } in the above example). The function parameter, sometimes called a callback, describes what to do for each element of the list.

Note that println is a function that conforms to Int => Unit, so we could pass it directly:

numbers.foreach(println)
1
2
3

Higher-order methods are the most common way to use Lists. However, foreach is less used than others, as the callback of type Int => Unit must inherently be imperative to do anything interesting. Why? Consider the type of the function we get that awaits receiving the callback to foreach:

val awaitingCallback: (Int => Unit) => Unit = numbers.foreach(_)
awaitingCallback(println)
1
2
3
awaitingCallback: Int => Unit => Unit = ammonite.$sess.cmd16$Helper$$Lambda$2150/0x0000000800b46840@2223c63d

In the above, read numbers.foreach(_) more like numbers.foreach. The extra (_) is a low-level Scalaism to convert a method into a function that is only needed for this explanation and rarely needed in practice.

6.1.1.4.1 Functional Traversals

Or as another example, consider trying to write a function sum that sums up a list of integers List[Int] with a for loop or foreach:

def sum(l: List[Int]): Int = {
  var acc = 0
  for (n <- l) acc += n
  acc
}
sum(numbers)
defined function sum
res17_1: Int = 6

The only way to remember the accumulated sum so far is with a mutable variable var acc.

Instead, the idiomatic way to compute such a sum is to use another higher-order method that permits accumulation in a functional manner, such as reduce:

def sum(l: List[Int]): Int = l reduce { (acc, n) => acc + n }
sum(numbers)
defined function sum
res18_1: Int = 6

Not only does the sum definition become a one-liner, but it decouples the scheduling of work on each element of the list and lets the library implement that (e.g., sequentially left-to-right, concurrently, or even distributed!).

6.1.1.4.2 Placeholder Syntax for Function Literals

While not necessarily recommended, one might sometimes see an alternative Scala syntax for function literals using placeholders _:

def sum(l: List[Int]): Int = l.reduce(_ + _)
sum(numbers)
defined function sum
res19_1: Int = 6

Each placeholder _ corresponds to a formal parameter, so _ + _ is syntactic sugar for (x,y) => x + y. Like with any diet, take sugar in moderation.

6.1.1.4.3 Composing Higher-Order Methods

We will consider in subsequent chapters how to effectively use such higher-order methods. For the moment, simply recognize that such higher-order methods exist and is the idiomatic way to work with Lists. Furthermore, this API design is particularly powerful and becoming commonplace in almost all languages (even in Java!). For example, in big-data applications, this design enables streaming where the data can be consumed in an online manner as a stream. Consider the following for a taste:

val l = List(1, 2, 3, 4, 5, 6)
val sumEvens = l filter { i => i % 2 == 0 } reduce { (acc, n) => acc + n }
l: List[Int] = List(1, 2, 3, 4, 5, 6)
sumEvens: Int = 12

Or,

val sumEvens = l.filter(_ % 2 == 0).reduce(_ + _)
sumEvens: Int = 12
6.1.1.4.4 Object-Oriented Iterators

The foreach method is an abstraction of the object-oriented Iterator Pattern:

val it = numbers.iterator
while (it.hasNext) {
  val n = it.next()
  println(n)
}
1
2
3
it: Iterator[Int] = empty iterator

In essence, the foreach method uses the callback parameter to allow the client to specify the body of the while loop. A benefit is that common programming errors in using such an object-oriented API—like calling it.next() after the it has no more elements—cannot happen when using foreach.

it.next()
java.util.NoSuchElementException: head of empty list
  scala.collection.immutable.Nil$.head(List.scala:629)
  scala.collection.immutable.Nil$.head(List.scala:628)
  scala.collection.StrictOptimizedLinearSeqOps$$anon$1.next(LinearSeq.scala:253)
  ammonite.$sess.cmd23$Helper.<init>(cmd23.sc:1)
  ammonite.$sess.cmd23$.<clinit>(cmd23.sc:7)

6.1.1.5 API Documentation

As alluded to above, Scala has a rich API for Lists. Such libraries are designed to be extremely generic for many use cases, so they necessarily have a fair amount of complexity. Nonetheless, it is worthwhile getting used to reading such API documentation.

6.1.1.6 Arrays

A List is a immutable, functional sequential collection of elements of the same type (i.e., a singly-linked list), while an Array is a fixed-size, mutable indexable collection of elements of the same type. In this course, we have little need for Array, but it exists in Scala for particular use cases.

6.1.2 Options

Another commonly used built-in data type is Option. It is either a None for or a Some of some value:

val none: Option[Int] = None
val some: Option[Int] = Some(42)
none: Option[Int] = None
some: Option[Int] = Some(value = 42)

Or, using some API methods on Option:

val none: Option[Int] = Option.empty
val some: Option[Int] = Option(42)
none: Option[Int] = None
some: Option[Int] = Some(value = 42)

The Option type is useful for methods that may optionally return a value (i.e., would error in some cases).

For example, we may want to define a division method that returns None if the client attempts to divide by zero:

def div(n: Int, m: Int): Option[Int] = m match {
  case 0 => None
  case _ => Some(n / m)
}
defined function div

Or, as another example, the head method for Lists errors if the input list is empty:

val emptyList: List[Int] = Nil
emptyList: List[Int] = List()
val h: Int = emptyList.head
java.util.NoSuchElementException: head of empty list
  scala.collection.immutable.Nil$.head(List.scala:629)
  scala.collection.immutable.Nil$.head(List.scala:628)
  ammonite.$sess.cmd28$Helper.<init>(cmd28.sc:1)
  ammonite.$sess.cmd28$.<clinit>(cmd28.sc:7)

Instead, the headOption method returns an Option using None for an empty list and Some for a non-empty list:

val h: Option[Int] = emptyList.headOption
h: Option[Int] = None

We can then work with options also using pattern matching:

def head(l: List[Int]): Option[Int] = l match {
  case Nil => None
  case h :: _ => Some(h)
}
head(List(1, 2, 3))
defined function head
res30_1: Option[Int] = Some(value = 1)

We can think of an Option value as a 0-or-1 element list and thus all of the higher-order iteration methods are available:

some.foreach(println)
42

6.1.3 Maps

Maps are particularly useful data structures for storing associations between keys and values. For example, we describe value environments for a programming language as maps from variables to values in Section 4.1.1.

type Env = Map[String,Int]
val env: Env = Map("nOranges" -> 4, "nApples" -> 7, "nPears" -> 10)
defined type Env
env: Env = Map("nOranges" -> 4, "nApples" -> 7, "nPears" -> 10)

We can lookup in maps based on a key:

env("nApples")
env.apply("nApples")
res33_0: Int = 7
res33_1: Int = 7
env("nDogs")
java.util.NoSuchElementException: key not found: nDogs
  scala.collection.immutable.Map$Map3.apply(Map.scala:399)
  ammonite.$sess.cmd34$Helper.<init>(cmd34.sc:1)
  ammonite.$sess.cmd34$.<clinit>(cmd34.sc:7)

As we see above, since a given key may not exist in a map, there is a get method that instead returns an Option:

env contains "nApples"
env get "nApples"
env contains "nDogs"
env get "nDogs"
res35_0: Boolean = true
res35_1: Option[Int] = Some(value = 7)
res35_2: Boolean = false
res35_3: Option[Int] = None

Another commonly-used alterative to apply and get for lookup in a map is getOrElse that takes an extra parameter for what to return in the case that the key does not exist:

env getOrElse ("nDogs", 0)
res36: Int = 0

Finally, we often want to extend maps:

val env_ = env + ("nBananas" -> 17)
env_: Map[String, Int] = Map(
  "nOranges" -> 4,
  "nApples" -> 7,
  "nPears" -> 10,
  "nBananas" -> 17
)

Note that just like with List, the above is a “functional update” that returns a new Map where env still exists:

env
env_
res38_0: Env = Map("nOranges" -> 4, "nApples" -> 7, "nPears" -> 10)
res38_1: Map[String, Int] = Map(
  "nOranges" -> 4,
  "nApples" -> 7,
  "nPears" -> 10,
  "nBananas" -> 17
)

A functional update is sometimes where one might want to intentionally shadow (i.e., write val env = env + ("nBananas" -> 17) in the above) to prevent referencing the unextended env in a particular scope.

We use the -> operator with maps for visual clarity, but it is actually not special. It is just an alias for constructing pairs:

"nBananas" -> 17
res39: (String, Int) = ("nBananas", 17)
val env_ = env + (("nBananas", 17))
env_: Map[String, Int] = Map(
  "nOranges" -> 4,
  "nApples" -> 7,
  "nPears" -> 10,
  "nBananas" -> 17
)

As a collection in the standard library, Map also has the usual higher-order iteration methods, such as

env.foreach(println)
(nOranges,4)
(nApples,7)
(nPears,10)

6.1.4 Sets

The Scala standard library has many other core functional data structures. Another commonly used one is the Set data structure that keeps a single copy of each element while supporting fast membership testing, union, intersection, and iteration operations.

6.2 Classes

Scala is also an object-oriented language where code and data can be encapsulated together. A class declaration introduces a new type name, specifies data that it packages together, and defines methods that operate on that data:

class Dog(name: String, breed: String, age: Int) {
  override def toString =
    s"Woof! My name is $name, I am a $breed, and I am $age years old."
}
val samuel = new Dog("Samuel", "Alsatian", 11)
val bo = new Dog("Bo", "Portuguese Water Dog", 10)
samuel.toString
defined class Dog
samuel: Dog = Woof! My name is Samuel, I am a Alsatian, and I am 11 years old.
bo: Dog = Woof! My name is Bo, I am a Portuguese Water Dog, and I am 10 years old.
res42_3: String = "Woof! My name is Samuel, I am a Alsatian, and I am 11 years old."

In the above, we define a method toString that overrides the default toString method that is defined for all objects—that we can see is used as the printer above.

We can see that defining classes in Scala is quite convenient, eliminating a lot of the repetitive boilerplate code with constructors, field declarations, etc. seen in, for example, Java and C++.

Scala has a shorthand for defining a class with single instance (sometimes called a singleton) with the object keyword:

object Dog {
  def birth(name: String, breed: String): Dog = 
    new Dog(name, breed, 0)
}
val sadie = Dog.birth("Sadie", "Pointer")
sadie.toString
defined object Dog
sadie: Dog = Woof! My name is Sadie, I am a Pointer, and I am 0 years old.
res43_2: String = "Woof! My name is Sadie, I am a Pointer, and I am 0 years old."

A object is like a module with function definitions, type definitions, etc. If an object has the same name as a class in the same file, then it is the companion object for the class and has special accessibility to that class’s instances.

6.2.1 Data Classes

In relation to Java or C++, we can see the parameters to the class on line 1 as the parameter list to the constructor to create private fields with the same name that are accessible by methods. However, those fields are not accessible outside of the class’s methods:

samuel.name
cmd44.sc:1: value name is not a member of cmd44.this.cmd42.Dog
val res44 = samuel.name
                   ^Compilation Failed
: 
Compilation Failed

Often, we just want classes that store associated data together. If the fields are immutable, it is perfectly acceptable for them to be accessible. We can do so as follows to make public val fields:

class Dog(val name: String, val breed: String, val age: Int)
val samuel = new Dog("Samuel", "Alsatian", 11)
samuel.name
defined class Dog
samuel: Dog = ammonite.$sess.cmd44$Helper$Dog@15f44dcf
res44_2: String = "Samuel"

However, in this case, we would like to treat the Dog just like a specialized tuple and use things like pattern matching, but we can’t

val Dog(name, _, _) = samuel
cmd45.sc:1: object Dog is not a case class, nor does it have a valid unapply/unapplySeq member
val Dog(name, _, _) = samuel
    ^Compilation Failed
: 
Compilation Failed

We can do so by making Dog a case class:

case class Dog(name: String, breed: String, age: Int)
val samuel = Dog("Samuel", "Alsatian", 11)
samuel.name
val Dog(_, breed, _) = samuel
defined class Dog
samuel: Dog = Dog(name = "Samuel", breed = "Alsatian", age = 11)
res45_2: String = "Samuel"
breed: String = "Alsatian"

We can think of a case class as a “functional class” used for storing immutable data. We can define methods on such classes as well, but that is somewhat secondary.

6.3 Algebraic Data Types

In addition to a tuples grouping associated data together, we want to be able to define alternatives:

trait Pet
case class Dog(name: String, breed: String) extends Pet
case class Cat(name: String, breed: String) extends Pet

def greet(pet: Pet): String = pet match {
  case Dog(name, _) => s"Woof, $name!"
  case Cat(name, _) => s"Meow, $name!"
}

greet(Dog("Samuel", "Altsatian"))
greet(Cat("Jenkins", "Siamese"))
defined trait Pet
defined class Dog
defined class Cat
defined function greet
res46_4: String = "Woof, Samuel!"
res46_5: String = "Meow, Jenkins!"

A trait introduces a new type name and is roughly a class interface. In the above, Dog and Cat can be both be a Pet. The greet function takes as input a Pet and uses pattern matching to distinguish whether it is a Dog or a Cat.

Pet in the above example is a very simple algebraic data type. As an aside, an algebraic data type is named such because it combines products of data (i.e., tuples) and sums of data (i.e., cases).

6.3.1 Option

The built-in collection types Option and List described above are both algebraic data types. Consider the following Option-like definition:

sealed trait MyOption
case object MyNone extends MyOption
case class MySome(i: Int) extends MyOption

val none: MyOption = MyNone
val some: MyOption = MySome(42)

def getOrElse(o: MyOption, default: Int): Int = o match {
  case MyNone => default
  case MySome(i) => i
}

getOrElse(none, 0)
getOrElse(some, 0)
defined trait MyOption
defined object MyNone
defined class MySome
none: MyOption = MyNone
some: MyOption = MySome(i = 42)
defined function getOrElse
res47_6: Int = 0
res47_7: Int = 42

One new thing to note is the sealed qualifier, which says that all the classes that derive from MyOption (i.e., the alternatives) are defined here, so the programmer and compiler do not need to worry about other possible cases for MyOption.

Algebraic data types can also be recursive. Recursive data types is exemplified by Lists, which we revisit subsequently.

6.3.2 Parametric Polymorphism

One difference between the built-in Option and MyOption in the above is that Option is parametrized by a type of the value that may or may not exist. We can extend our definition of MyOption with a type parameter A as follows:

sealed trait MyOption[A]
case class MyNone[A]() extends MyOption[A]
case class MySome[A](v: A) extends MyOption[A]

val none: MyOption[Int] = MyNone()
val some: MyOption[Int] = MySome(42)

def getOrElse[A](o: MyOption[A], default: A): A = o match {
  case MyNone() => default
  case MySome(v) => v
}

getOrElse(none, 0)
getOrElse(some, 0)
defined trait MyOption
defined class MyNone
defined class MySome
none: MyOption[Int] = MyNone()
some: MyOption[Int] = MySome(v = 42)
defined function getOrElse
res48_6: Int = 0
res48_7: Int = 42

Observe that getOrElse (as well as the MyNone and MySome constructors) have a type parameter list (written brackets []) and a value parameter list (written in parentheses ()).

The getOrElse function is generic in the parametrized type A. Being generic, the getOrElse function is also called parametric polymorphic.

Note that MyOption is not quite the same definition as Option, but it is close.