val two = 2
val four = two + two
two: Int = 2 four: Int = 4
Thus far, our expressions consist only of operations on literals, which is certainly restricting! Like other languages, we would like to introduce names that are bound to other items, such as values.
To introduce a value binding in Scala, we use a val
declaration, such as the following:
val two = 2
val four = two + two
two: Int = 2 four: Int = 4
The first declaration binds the name two
to the value 2
, and the second declaration binds the name four
to the value of two + two
(i.e., 4
). The syntax of value bindings is as follows:
\[ \text{$\texttt{val}$ $x\texttt{:}$ $\tau$ $\texttt{=}$ $e$} \]
for a variable \(x\), type \(\tau\), and expression \(e\). For the value binding to be well typed, expression \(e\) must be of type \(\tau\). The type annotation \(\texttt{:}\;\tau\) is optional, which if elided is inferred from typing expression \(e\). At run-time, the name \(x\) is bound to value of expression \(e\) (i.e., the value obtained by evaluating \(e\) to a value). If \(e\) does not evaluate to a value, then no binding occurs.
A binding makes a new name available to an expression in its scope. For example, the name two
must be in scope for the expression two + two
to have meaning. Intuitively, to evaluate this expression, we need to know to what value the name two
is bound. We can view val
declarations as evaluating to a value environment. A value environment is a finite map from names to values, which we write as follows: \[[
x_1 \mapsto v_1, \ldots, x_n \mapsto v_n
]\] For example, the first binding in our example
val two = 2
two: Int = 2
yields the following environment: \[[ \texttt{two} \mapsto\texttt{2} ]\]
Intuitively, to evaluate the expression \(\texttt{two + two}\), we replace or substitute the value of the binding for the name \(\texttt{two}\)
+ two two
res2: Int = 4
and then reduce as before: \[[ \texttt{two} \mapsto\texttt{2}](\texttt{two + two}) \quad = \quad \texttt{2 + 2} \quad \longrightarrow\quad \texttt{4} \;. \] In the above, we are conceptually “applying” the environment as a substitution to the expression \(\texttt{two + two}\) to get the expression \(\texttt{2 + 2}\), which reduces to the value \(\texttt{4}\).
For type checking, we need a similar type environment that maps names to types. For example, the type environment \([\texttt{two} \mapsto\texttt{Int}]\) may be used to type check the expression \(\texttt{two + two}\).
Declarations may be sequenced as seen in the example above where the binding of the name \(\texttt{two}\) is then used in the binding of the name \(\texttt{four}\).
val two = 2
val four = two + two
two: Int = 2 four: Int = 4
We can see that what is printed above are the value and type environments: \[ [ \texttt{two} \mapsto\texttt{2}, \texttt{four} \mapsto\texttt{4} ] \qquad [ \texttt{two} \mapsto\texttt{Int}, \texttt{four} \mapsto\texttt{Int} ] \;. \]
Sometimes, we use notation for the mappings that suggest value or type environments, respectively: \[ [ \texttt{two} \Downarrow\texttt{2}, \texttt{four} \Downarrow\texttt{4} ] \qquad [ \texttt{two} : \texttt{Int}, \texttt{four} : \texttt{Int} ] \;. \]
Another kind of binding is for types where we can bind one type name to another creating a type alias, such as
type Str = String
"Hello": Str
"Hello"
defined type Str res4_1: Str = "Hello" res4_2: String = "Hello"
Type binding is not so useful in our current Scala subset, but such bindings become particularly relevant later onin ?sec-something.
At this point, all our bindings are placed into the global scope. A scope is simply a window of the program where a name applies. We can limit the scope of a variable by using blocks {
\(\ldots\)}
:
{
{ val three = 3 }
three}
cmd5.sc:2: not found: value three
val res5_1 = three
^Compilation Failed
:
Compilation Failed
A block introduces a new nested scope where the name in an inner scope may shadow one in an outer scope:
In the above Figure 4.1, there are two scopes, and there is a binding to a name a
in each. The use of a
on line 5 refers to the inner binding on line 4, while the use of a
on line 6 refers to the outer binding on line 1. Also note that the use of b
on refers to the binding of b
in the outer scope, as b
is not bound in the inner scope. The name c
ends up being bound to the value 6
. In particular, after applying the environments, we end up evaluating the expression { 3 + 2 } + 1
. Note that value binding is not assignment. After the inner binding of name a
on line 4, the outer binding of a
still exists but is hidden—called shadowed—within the inner scope.
We can rename the two variables named a
in the expression in Figure 4.1 to a semantically equivalent expression that eliminates the shadowing:
Observe that in Figure 4.2, the name a_inner
is regardless unavailable in the outer, global scope.
Also observe that in Scala, blocks {
\(\ldots\)}
are also expressions—like (
\(\ldots\))
expressions that introduce a new scope:
val a = { 3 + 2 } + 1
val b = ( 3 + 2 ) + 1
a: Int = 6 b: Int = 6
Scala uses static scoping (or also called lexical scoping), which means that the binding that applies to the use of any name can be determined by examining the program text. Specifically, the binding that applies is not dependent on evaluation order. For Scala, the rule is that for any use of a variable \(x\), the innermost scope that (a) contains the use of \(x\) and (b) has a binding for \(x\) is the one that applies. Note that there are only two scopes in the above example in Figure 4.1 (e.g., not one for each declaration). Thus, the following example has a compile-time error:
cmd8.sc:3: forward reference extends over definition of value c
val c = a
^Compilation Failed
:
Compilation Failed
In particular, the use of variable a
at line 3 refers to the binding at line 4, and the use comes before the binding.
Consider again the nested scopes and shadowing example in Figure 4.1. How do we describe the evaluation of this expression? The substitution-based evaluation rule for names described previously in Section 4.1 needs to be more nuanced. In particular, eliminating the binding of the name in the outer scope should replace the use of name a
on line 6 but not the use of name a
on 5. In particular, applying the environment \([\texttt{a} \mapsto\texttt{1}, \texttt{b} \mapsto\texttt{2}]\) to lines 3 to 4 yields the following:
val c = {
val a = 3
+ 2
a } + 1
c: Int = 6
This notion of substitution is directly linked to the terms free and bound variables. In any given expression \(e\), a bound variable is one whose binding location is in \(e\), while a free variable is one whose binding location is not in \(e\). For example, in the expression { val x = 3; x + y }
, variable x
is bound, while variable y
is free:
{
{ val x = 3; x + y }
}
cmd9.sc:1: not found: value y
val res9 = { val x = 3; x + y }
^Compilation Failed
:
Compilation Failed
Note that as an aside about Scala syntax, the ;
sequences expressions, which are inferred when using newlines:
{
{ val x = 3
+ y }
x }
cmd9.sc:2: not found: value y
x + y }
^Compilation Failed
:
Compilation Failed
We can see free variables as inputs to an expression: we do not know how to evaluate it without an environment giving bindings.
Consider again the example of renaming to eliminate shadowing in Figure 4.2. A key property to observe is that when looking at a sub-expression, we can rename the bound variables consistently to a semantically equivalent one, but we cannot rename the free variables.
A closed expression is one that has no free variables, which is then one that can be evaluated with an empty environment:
{
{ val y = 4; { val x = 3; x + y } }
}
res9: Int = 7
An open expression is one that has at least one free variable (and thus cannot be evaluated with an empty environment).
In the above, we are using the term variable in the same sense as in mathematical logic where variables are placeholder names and immutable. However, in the context of imperative programming, the notion of a program variable is often, by default, considered a name for a mutable memory cell.
Just like many other languages (including JavaScript, Java, C), Scala has both immutable and mutable variables.
Language | Immutable Variable Declaration | Mutable Variable Declaration |
---|---|---|
Scala | val one = 1 |
var one = 1 |
JavaScript | const one = 1 |
var one = 1 |
Java | final int one = 1 |
int one = 1 |
C | const int one = 1 |
int one = 1 |
A mutable variable allocates a memory cell that can be assigned:
var greeting = "Hello"
println(greeting)
= "Hi"
greeting println(greeting)
= "Hola"
greeting println(greeting)
Hello
Hi
Hola
greeting: String = "Hola"
where the value stored in the cell depends on when it is read.
There are many reasons why we might prefer val
. One reason is understandability of the source code. As we see from the above, using var
breaks referential transparency of variable use (e.g., the value that the expression greeting
evaluates to depends on when it runs).
Another reason is getting more efficient executable code from the compiler. A mutable requires a new memory allocation for each time it executes, whereas a reference to an immutable can be shared aggressively.
The most basic and perhaps most important form of abstraction in programming languages is defining functions. Here’s an example Scala function:
def square(x: Int): Int = x * x
defined function square
where x
is a formal parameter of type for the function that returns a value of type Int
. Schematically, a function definition has the following form:
\[ \text{$\texttt{def}$ $x\texttt{(}x_1\texttt{:}$ $\tau_1\texttt{,}$ $\ldots\texttt{,}$ $x_n\texttt{:}$ $\tau_n\texttt{):}$ $\tau$ $\texttt{=}$ $e$} \]
where the formal parameter types \(\tau_1, \ldots, \tau_n\) are always required and the return type \(\tau\) is sometimes required.1 However, we adopt the convention of always giving the return type. This convention is good practice in documenting function interfaces, and it saves us from worrying about when Scala actually requires or does not require it.
Note that braces {}
are not part of the syntax of a def
. For example, the following code is valid:
def max(x: Int, y: Int): Int =
if (x > y)
xelse
y
defined function max
As a convention, we will not use {}
unless we need to introduce bindings.
Functions are values in Scala. That is, expressions can evaluate to values that are functions. Functions are values is sometimes stated as, “Functions are first-class in Scala.”
A function literal defines an anonymous function:
(x: Int) => x * x
res13: Int => Int = ammonite.$sess.cmd13$Helper$$Lambda$1997/0x0000000800acf040@1f40f380
We can see that a function taking a formal parameter of type Int
and returning a value of type Int
is written Int => Int
.
For historical reasons referencing the Lambda Calculus, function values are sometimes called lambdas.
As values, we can bind functions to variables:
val square = (x: Int) => x * x
square: Int => Int = ammonite.$sess.cmd14$Helper$$Lambda$2006/0x0000000800ad4840@69cd8b89
If the type of the formal parameters are clear from context, they can be inferred:
val square: Int => Int = x => x * x
square: Int => Int = ammonite.$sess.cmd15$Helper$$Lambda$2010/0x0000000800ad7040@46593b21
Note that it is a common style to put braces {}
around function literals to make them stand out more visually, even if there’s no need to introduces bindings:
val square: Int => Int = { x => x * x }
square: Int => Int = ammonite.$sess.cmd16$Helper$$Lambda$2014/0x0000000800ad9840@11ee4343
An expression defining a function can refer to variables bound in an outer scope:
val four = 4
def plusFour(x: Int): Int = x + four
plusFour(4)
four: Int = 4 defined function plusFour res17_2: Int = 8
Referencing a detail needed to properly implement static scoping with such functions that can refer to variables bound in an outer scope, first-class functions are also sometimes called closures.
NB The return
keyword does exist in Scala but is rarely used and generally considered bad practice. For the purposes of this course, we should also avoid using return
, as it can lead to unexpected results without a deeper knowledge about Scala internals.
A tuple is a simple data structure that combines a fixed number of values. It is a value that is a pair, triple, quadruple, etc. of values:
val oneT = (1, true)
oneT: (Int, Boolean) = (1, true)
That is, a pair is a 2-tuple, a triple is a 3-tuple, a quadruple is a 4-tuple and so forth.
A \(n\)-tuple expression annotated with a \(n\)-tuple type is written as follows: \[ \texttt{(}e_1\texttt{,}\;\ldots\texttt{,}\;e_n\texttt{):}\;\; \texttt{(}\tau_1\texttt{,}\;\ldots\texttt{,}\;\tau_n\texttt{)} \;.\]
Tuples are often used with functions to return multiple values or to pass around a small number of associated values together. It is generally used when defining a custom data type for a single use does not really make sense, and it is generally advisable not to use tuples larger than 4- or 5-tuples.
As example of a function returning multiple associated values, we can write a function that takes two integers and and returns a pair of their quotient and their remainder:
def divRem(x: Int, y: Int): (Int, Int) = (x / y, x % y)
defined function divRem
The \(i^{\text{\textrm{th}}}\) component of a tuple \(e\) can be obtained using the expression \(e\texttt{.\_}i\) (invoking the \(i^{\text{\textrm{th}}}\) projection method). For example,
val divRemSevenThree: (Int, Int) = divRem(7, 3)
val div: Int = divRemSevenThree._1
val rem: Int = divRemSevenThree._2
divRemSevenThree: (Int, Int) = (2, 1) div: Int = 2 rem: Int = 1
However, it much more common and almost always clearer to get the components of a tuple using pattern matching:
val (div, rem) = divRem(7, 3)
div: Int = 2 rem: Int = 1
Note that the bottom line is a binding of two names div
and rem
, which are bound to the first and second components of the tuple returned by evaluating divRem(7,3)
, respectively. The parentheses \(\texttt{()}\) are necessary in the code above.
If we do not need one part of the pair, we can use the _
pattern:
val (div, _) = divRem(7, 3)
div: Int = 2
There is no 1-tuple type, but there is a 0-tuple type that is called Unit
(see Section 3.2.3). There is only one value of type (also typically called the unit value). The unit value is written using the expression ()
(i.e., open-close parentheses), as it is the 0-tuple.
As noted in Section 3.2.3, a good indication of imperative programming are when expressions return Unit
. Conceptually, the unit value represents “nothing interesting returned.” When we introduce side-effects, a function with return type is a good indication that its only purpose is to be executed for side effects because “nothing interesting” is returned. A block that does not have a final expression (e.g., only has declarations) returns the unit value:
val u: Unit = { }
Scala has an alternative syntax for functions that have a Unit
return type:
def doNothing() { }
defined function doNothing
Specifically, the =
is dropped and no type annotation is needed for the return type since it is fixed to be Unit
. This syntax makes imperative Scala code look a bit more like C or Java code.
Another workhorse of defining functions in a functional programming language is pattern matching. We have seen pattern matching to deconstruct tuples (Section 4.4.3.2), such as
def fst(pair: (Int, Boolean)): Int = {
val (i, _) = pair
i}
fst(3, true)
defined function fst res25_1: Int = 3
Patterns can match deeply, which is particularly powerful.
val (_, (i, _)) = (3.14, (42, true))
i: Int = 42
For tuples, we can pattern match with val
because the shape of tuples is homogenous. Pattern matching can also be used when there are multiple possible cases, such as
def isZero(n: Int): Boolean = n match {
case 0 => true
case _ => false
}
isZero(10)
defined function isZero res27_1: Boolean = false
Observe that one possible pattern is a value: 0
in this example.
Cases are attempted from top-to-bottom, so it is important to order more specific patterns before less specific ones:
def isZero(n: Int): Boolean = n match {
case _ => false
case 0 => true
}
isZero(0)
defined function isZero res28_1: Boolean = false
And it is possible that a run-time error results when no cases match:
def isZero(n: Int): Boolean = n match {
case 0 => true
}
isZero(10)
scala.MatchError: 10 (of class java.lang.Integer) ammonite.$sess.cmd29$Helper.isZero(cmd29.sc:1) ammonite.$sess.cmd29$Helper.<init>(cmd29.sc:4) ammonite.$sess.cmd29$.<clinit>(cmd29.sc:7)
While this is not a core language feature, Scala has convenient string construction facilities for constructing strings that can be useful for writing debugging logs.
There are C printf
-style format strings:
def helloWorld(greeting: String): String = "%s, World!" format greeting
helloWorld("Hello")
helloWorld("Hi")
defined function helloWorld res31_1: String = "Hello, World!" res31_2: String = "Hi, World!"
But even more convenient, there is a macro s
for string interpolation:
def helloWorld(greeting: String): String = s"$greeting, World!"
helloWorld("Hello")
helloWorld("Hi")
defined function helloWorld res32_1: String = "Hello, World!" res32_2: String = "Hi, World!"
That is, the $
in the string template informs Scala to evaluate the expression greeting
. For a more complex expression than a variable use, use braces ${}
:
def helloWorld(greeting: String): String = s"${greeting.toLowerCase}, World!"
helloWorld("Hello")
helloWorld("Hi")
defined function helloWorld res33_1: String = "hello, World!" res33_2: String = "hi, World!"
We can update the example of inspecting how Scala evaluates from Section 3.3 with a bit more information:
def printeval(indent: String, e: String, v: Int): Int =
{ println(s"${indent}eval($e) = $v"); v }
printeval("", "(1 + 2) + (3 + 4)", {
printeval("- if ", "1 + 2", {
printeval(" - if ", "1", 1) +
printeval(" - if ", "2", 2)
}) +
printeval("- if ", "3 + 4", {
printeval(" - if ", "3", 3) +
printeval(" - if ", "4", 4)
})
})
- if eval(1) = 1
- if eval(2) = 2
- if eval(1 + 2) = 3
- if eval(3) = 3
- if eval(4) = 4
- if eval(3 + 4) = 7
eval((1 + 2) + (3 + 4)) = 10
defined function printeval res34_1: Int = 10
To print the evaluation of the expression with its resulting value, we now print out the post-order traversal of the evaluation tree versus the pre-order traversal in Section 3.3.
Note that we need to take some care in preserving the structure of the expression (1 + 2) + (3 + 4)
to test the evaluation order. The expression { val r = 3 + 4; (1 + 2) + r }
is semantically equivalent to the first one with respect to their values but not evaluation order:
val r = {
printeval("- if ", "3 + 4", {
printeval(" - if ", "3", 3) +
printeval(" - if ", "4", 4)
})
}
printeval("", "(1 + 2) + (3 + 4)", {
printeval("- if ", "1 + 2", {
printeval(" - if ", "1", 1) +
printeval(" - if ", "2", 2)
}) + r
})
- if eval(3) = 3
- if eval(4) = 4
- if eval(3 + 4) = 7
- if eval(1) = 1
- if eval(2) = 2
- if eval(1 + 2) = 3
eval((1 + 2) + (3 + 4)) = 10
r: Int = 7 res35_1: Int = 10
Scala is also an object-oriented language, and this syntax actually introduces a method. Fortunately, in most situations, methods can be treated like functions in Scala.↩︎