JavaScript
true + 2
In the previous part, we began the discussion of language specification and the importance specifying languages clearly, crisply, and precisely. Grammars is the main tool by which the syntax of a language, that is, the programs that we can write are specified. In this section, we introduce a tool for defining the semantics of a language, that is, the meaning of programs.
There are several ways to think about the meaning of programs. One natural way is to think about how programs evaluate. An operational semantics is a way to describe how programs evaluate in terms of the language itself (rather than by compilation to a machine model). One way to see an operational semantics is as describing an interpreter for the language of interest.
We might guess the semantics of particular expressions based on common conventions. For example, we might guess that expression \[ e_1 \mathbin{\texttt{+}} e_2 \] adds two numbers that result from evaluating \(e_1\) and \(e_2\). But note that this statement is something about the semantics of \(e_1 \mathbin{\texttt{+}} e_2\), which has yet to be specified.
As we have seen in the previous lab (Section 16.2.3), one aspect that makes the JavaScript specification “interesting” is the presence of implicit conversions (e.g., boolean values may be implicitly converted to numeric values depending on the context in which values are used). For example,
JavaScript
true + 2
evaluates to 3
.
Then, the +
operator in JavaScript is overloaded for strings and numbers with +
on strings meaning string concatenation:
JavaScript
"Hello, " + "World!"
So \(e_1 \mathbin{\texttt{+}} e_2\) may not be just adding two numbers!
You might guess that defining coercions between value types can lead to some interesting semantics. It is because of these coercions that we have the meme that “JavaScript is bananas.”
JavaScript
"b" + "a" + "n" + - "a" + "a" + "s"
How can we describe how to implement a interpreter for all programs?
It is possible to specify the semantics of a programming language using natural language prose. However, just like with specifying syntax using natural language prose, it is very easy to leave ambiguity in the description. Furthermore, trying to minimize ambiguity can create very verbose descriptions. The JavaScript specification, specifically ECMA-262 standard, is actually rather precise specification based on natural language prose, but the descriptions are quite verbose.
In this section, we introduce some mathematical notation that enables us to specify semantics with less ambiguity in a very compact form. Like any mathematical notation, its precise and compact nature makes it easier, for example, to spot errors or inconsistencies in specification. However, there will necessarily be a learning curve to reading the notation.
We want to write out as unambiguously as possible how a program should evaluate independent of an implementation (e.g., a compiler and machine architecture). We use a methodology for semantics specification known as an operational semantics. An operational semantics can be thought as describing an interpreter for the language of interest with relations between syntactic objects.
We have already used a notation for describing an evaluation relation: \[ e \Downarrow v \] This notation is a judgment form stating informally, “Expression \(e\) evaluates to value \(v\).” Defining this judgment describes how to evaluate expressions to values and thus corresponds closely to writing a recursive interpreter of the abstract syntax trees representing expressions. A set of inference rules defining such an evaluation judgment form is called a big-step operational semantics for expressions \(e\) because it describes evaluation from expressions in one “big step” to values. Another term for a big-step operational semantics is a natural semantics.
Let us first consider an object language with only one type of values—numbers. In particular, we consider just numbers \(n\) and one arithmetic operator \(\texttt{+}\):
\[ \begin{array}{rrrl} \text{expressions} & e& \mathrel{::=}& v\mid e_1 \mathbin{\mathit{bop}} e_2 \\ \text{values} & v& \mathrel{::=}& n \\ \text{binary operators} & \mathit{bop}& \mathrel{::=}& \texttt{+} \\ \text{numbers} & n \end{array} \]
trait Expr // e
trait Bop // bop
case class Binary(bop: Bop, e1: Expr, e2: Expr) extends Expr // e ::= e1 bop e2
case class N(n: Double) extends Expr // e ::= n
case object Plus extends Bop // bop ::= +
def isValue(e: Expr): Boolean = e match {
case N(_) => true
case _ => false
}
val e_oneplustwo = Binary(Plus, N(1), N(2))
defined trait Expr defined trait Bop defined class Binary defined class N defined object Plus defined function isValue e_oneplustwo: Binary = Binary(bop = Plus, e1 = N(n = 1.0), e2 = N(n = 2.0))
Back to the original example in this section, we are trying to specify how the expression \(e_1 \mathbin{\texttt{+}} e_2\) evaluates. Thinking operationally, we want to say something like: evaluate \(e_1\) to a number, evaluate \(e_2\) to a number, and then return the number that is the addition of those numbers.
Consider the following rules defining the \(e \Downarrow v\) judgment form:
\(\inferrule[EvalNum]{ \phantom{ \Downarrow} }{ n \Downarrow n }\)
\(\inferrule[EvalPlus]{ e_1 \Downarrow n_1 \and e_2 \Downarrow n_2 \and n= n_1 + n_2 }{ e_1 \mathbin{\texttt{+}} e_2 \Downarrow n }\)
The \(\TirName{EvalNum}\) rule is an axiom that states that an expression \(n\) evaluates to itself (as it is already a value).
The \(\TirName{EvalPlus}\) rule specifies how the expression \(e_1 \mathbin{\texttt{+}} e_2\) evaluates following our intuition above. Reading top-down, this rule says if we know that expression \(e_1\) evaluates to a number \(n_1\) and \(e_2\) evaluates to \(n_2\), then expression \(e_1 \mathbin{\texttt{+}} e_2\) evaluates to \(n\) where \(n\) is the addition of the \(n_1\) and \(n_2\).
Any evaluation rule can also be read bottom-up, which matches more closely to an implementation. For example, the above \(\TirName{EvalPlus}\) rule says, “To evaluate \(e_1 \mathbin{\texttt{+}} e_2\), evaluate \(e_1\) to get a number \(n_1\), evaluate \(e_2\) to get a number \(n_2\), and return the addition of those two numbers \(n= n_1 + n_2\).”
Note that the \(+\) in the premise is “plus” in the meta language (i.e., the implementation language) in contrast to the \(\texttt{+}\) in the conclusion that is the syntactic symbol in the object language (i.e., the source language). Here, we have distinguished the meta-language “plus” for clarity, but often, the reader is asked to determine this distinction based on context. To be completely explicit, let us use an alternative notation for the abstract syntax:
\(\inferrule[EvalPlus]{ e_1 \Downarrow \texttt{N($n_1$)} \and e_1 \Downarrow \texttt{N($n_2$)} \and n= n_1 + n_2 }{ \texttt{Binary(Plus, $e_1$, $e_2$)} \Downarrow \texttt{N($n$)} }\)
We see that these inference rules could translate to following eval
implementation:
def eval(e: Expr): Expr = e match {
// EvalNum
case n @ N(_) => n
// EvalPlus
case Binary(Plus, e1, e2) => {
val N(n1) = eval(e1)
val N(n2) = eval(e2)
val n = n1 + n2
N(n)
}
}
e_oneplustwoval v_oneplustwo = eval(e_oneplustwo)
assert(v_oneplustwo == N(3))
defined function eval res1_1: Binary = Binary(bop = Plus, e1 = N(n = 1.0), e2 = N(n = 2.0)) v_oneplustwo: Expr = N(n = 3.0)
As there could be slightly different code implementations that behave the same, the same is true for inference rules. For example, the following version of \(\TirName{EvalPlus}\) says the same thing without making an explicit “binding” of \(n\):
\(\inferrule[EvalPlus]{ e_1 \Downarrow n_1 \and e_2 \Downarrow n_2 }{ e_1 \mathbin{\texttt{+}} e_2 \Downarrow n_1 + n_2 }\)
While we want a set of inference rules to define a semantics unambiguously, there are implementation choices. For example, defining eval
as follows:
def eval(e: Expr): Double = e match {
// EvalNum
case N(n) => n
// EvalPlus
case Binary(Plus, e1, e2) => eval(e1) + eval(e2)
}
e_oneplustwoval v_oneplustwo = eval(e_oneplustwo)
assert(v_oneplustwo == 3)
defined function eval res2_1: Binary = Binary(bop = Plus, e1 = N(n = 1.0), e2 = N(n = 2.0)) v_oneplustwo: Double = 3.0
is also described by these inference rules.
We can imagine that we also add inference rules for other arithmetic operators in a similar manner (cf. Figure 18.2).
Let us add boolean values to our JavaScripty variant:
\[ \begin{array}{rrrl} \text{values} & v& \mathrel{::=}& b \\ \text{booleans} & b \end{array} \]
case class B(b: Boolean) extends Expr // e ::= b
def isValue(e: Expr): Boolean = e match {
case N(_) | B(_) => true
case _ => false
}
val e_true = B(true)
val e_trueplustwo = Binary(Plus, e_true, N(2))
defined class B defined function isValue e_true: B = B(b = true) e_trueplustwo: Binary = Binary(bop = Plus, e1 = B(b = true), e2 = N(n = 2.0))
For the moment, we only add boolean literals and consider the following set of inference rules defining evaluation:
\(\inferrule[EvalNum]{ \phantom{ \Downarrow} }{ n \Downarrow n }\)
\(\inferrule[EvalBool]{ \phantom{ \Downarrow} }{ b \Downarrow b }\)
\(\inferrule[EvalPlus]{ e_1 \Downarrow n_1 \and e_2 \Downarrow n_2 }{ e_1 \mathbin{\texttt{+}} e_2 \Downarrow n_1 + n_2 }\)
def eval(e: Expr): Expr = e match {
// EvalNum
case n @ N(_) => n
// EvalBool
case b @ B(_) => b
// EvalPlus
case Binary(Plus, e1, e2) => {
val N(n1) = eval(e1)
val N(n2) = eval(e2)
N(n1 + n2)
}
}
e_oneplustwoval v_oneplustwo = eval(e_oneplustwo)
assert(v_oneplustwo == N(3))
e_trueval v_true = eval(e_true)
assert(v_true == B(true))
defined function eval res4_1: Binary = Binary(bop = Plus, e1 = N(n = 1.0), e2 = N(n = 2.0)) v_oneplustwo: Expr = N(n = 3.0) res4_4: B = B(b = true) v_true: Expr = B(b = true)
An alternative would be to have just one rule for values:
\(\inferrule[EvalVal]{ \phantom{ \Downarrow} }{ v \Downarrow v }\)
\(\inferrule[EvalPlus]{ e_1 \Downarrow n_1 \and e_2 \Downarrow n_2 }{ e_1 \mathbin{\texttt{+}} e_2 \Downarrow n_1 + n_2 }\)
where have rewritten \(\TirName{EvalNum}\) and \(\TirName{EvalBool}\) to \(\TirName{EvalVal}\) here to apply to all values \(v\), including both numbers and booleans. We can reimplement eval
to match these rules:
def eval(e: Expr): Expr = e match {
// EvalVal
case v if isValue(v) => v
// EvalPlus
case Binary(Plus, e1, e2) => {
val N(n1) = eval(e1)
val N(n2) = eval(e2)
N(n1 + n2)
}
}
e_oneplustwoval v_oneplustwo = eval(e_oneplustwo)
assert(v_oneplustwo == N(3))
e_trueval v_true = eval(e_true)
assert(v_true == B(true))
defined function eval res5_1: Binary = Binary(bop = Plus, e1 = N(n = 1.0), e2 = N(n = 2.0)) v_oneplustwo: Expr = N(n = 3.0) res5_4: B = B(b = true) v_true: Expr = B(b = true)
Recall that a judgment form (e.g., \(e \Downarrow v\)) is an inductively-defined relation. A particular judgment holds, for example, \[ 1 \mathbin{\texttt{+}} 2 \Downarrow 3 \] when we can find a derivation for it (cf. Section 15.2).
It is essentially undefined behavior when a judgment does not hold. For example, with these rules, there is no derivation for the judgment \[ \mathbf{true} \mathbin{\texttt{+}} 2 \Downarrow v \] for any value \(v\). In code, this might manifest as an exception:
e_trueplustwoval v_trueplustwo = eval(e_trueplustwo)
scala.MatchError: B(true) (of class ammonite.$sess.cmd3$Helper$B) ammonite.$sess.cmd5$Helper.eval(cmd5.sc:6) ammonite.$sess.cmd6$Helper.<init>(cmd6.sc:2) ammonite.$sess.cmd6$.<clinit>(cmd6.sc:7)
We see that the particular issue is that \(+\) can only apply to numbers in the \(\TirName{EvalPlus}\) rule: specifically, \(n_1 + n_2\). When the operator does not apply to input values, this is called a type error. If we detect type error at run time, then this is called dynamic typing.
In particular, we do not fail haphazardly. Instead, we want to identify specifically the expression that has the type error:
case class DynamicTypeError(e: Expr) extends Exception {
override def toString: String = s"TypeError: in expression $e"
}
def eval(e: Expr): Expr = e match {
// EvalVal
case v if isValue(v) => v
case Binary(Plus, e1, e2) => {
(eval(e1), eval(e2)) match {
// EvalPlus
case (N(n1), N(n2)) => N(n1 + n2)
// Otherwise, we have a type error.
case _ => throw DynamicTypeError(e)
}
}
}
defined class DynamicTypeError defined function eval
We introduce the exception type DynamicTypeError
so that we can report the specific expression that has the type error.
e_trueplustwoval v_trueplustwo = eval(e_trueplustwo)
TypeError: in expression Binary(Plus,B(true),N(2.0)) ammonite.$sess.cmd7$Helper.eval(cmd7.sc:13) ammonite.$sess.cmd8$Helper.<init>(cmd8.sc:2) ammonite.$sess.cmd8$.<clinit>(cmd8.sc:7)
The \(\TirName{EvalPlus}\) rules above define semantics that does not match JavaScript because they require \(e_1\) and \(e_2\) in \(e_1 \mathbin{\texttt{+}} e_2\) to evaluate to number values (i.e., \(n_1\) and \(n_2\)). JavaScript permits other types of values and then performs a conversion before performing the addition.
Suppose we want to extend the definition of the evaluation judgment form so that there is a derivation of the judgment \[ \mathbf{true} \mathbin{\texttt{+}} 2 \Downarrow v \] for some value \(v\). That is, we want to define type coercions so that we can apply \(n_1 + n_2\) in \(\TirName{EvalPlus}\).
Let us introduce a new judgment form for type coercions: \[ v \rightsquigarrow n \] to say, “Value \(v\) coerces to number \(n\).” In the case that values are numbers or booleans (i.e., \(v\mathrel{::=}n\mid b\)), we define this judgment form for coercions as follows:
\(\fbox{$v \rightsquigarrow n$}\)
\(\inferrule[ToNumberNum]{ }{ n \rightsquigarrow n }\)
\(\inferrule[ToNumberTrue]{ }{ \mathbf{true} \rightsquigarrow 1 }\)
\(\inferrule[ToNumberFalse]{ }{ \mathbf{false} \rightsquigarrow 0 }\)
that we implement with the toNumber
function:
def toNumber(e: Expr): Double = {
require(isValue(e))
match {
e // ToNumberNum
case N(n) => n
// ToNumberTrue
case B(true) => 1
// ToNumberFalse
case B(false) => 0
}
}
defined function toNumber
\(\inferrule[EvalVal]{ \phantom{ \Downarrow} }{ v \Downarrow v }\)
\(\inferrule[EvalPlus]{ e_1 \Downarrow v_1 \and e_2 \Downarrow v_2 \and v_1 \rightsquigarrow n_1 \and v_2 \rightsquigarrow n_2 }{ e_1 \mathbin{\texttt{+}} e_2 \Downarrow n_1 + n_2 }\)
def eval(e: Expr): Expr = e match {
// EvalVal
case v if isValue(v) => v
// EvalPlus
case Binary(Plus, e1, e2) => {
val v1 = eval(e1)
val v2 = eval(e2)
N(toNumber(v1) + toNumber(v2))
}
}
e_trueplustwoval v_trueplustwo = eval(e_trueplustwo)
defined function eval res10_1: Binary = Binary(bop = Plus, e1 = B(b = true), e2 = N(n = 2.0)) v_trueplustwo: Expr = N(n = 3.0)
We can imagine that we also add inference rules for other arithmetic and boolean operators in a similar manner, though we also need a judgment form for coercing into booleans \(v \rightsquigarrow b\) (cf. Figure 18.2).
Let us consider extending the above language with variable uses and binding (as before in Chapter 14):
\[ \begin{array}{rrrl} \text{expressions} & e& \mathrel{::=}& x\mid\mathbf{const}\;x\;\texttt{=}\;e_1\texttt{;}\;e_2 \\ \text{variables} & x, y \end{array} \]
case class Var(x: String) extends Expr // e ::= x
case class ConstDecl(x: String, e1: Expr, e2: Expr) extends Expr // e ::= const x = e1; e2
val e_const_i_two_trueplusi = ConstDecl("i", N(2), Binary(Plus, B(true), Var("i")))
defined class Var defined class ConstDecl e_const_i_two_trueplusi: ConstDecl = ConstDecl( x = "i", e1 = N(n = 2.0), e2 = Binary(bop = Plus, e1 = B(b = true), e2 = Var(x = "i")) )
Because of variables, we need a slightly richer judgment form with an additional parameter: \[ E \vdash e \Downarrow v \] which says informally, “In value environment \(E\), expression \(e\) evaluates to value \(v\).” This relation has three parameters: \(E\), \(e\), and \(v\). The other parts of the judgment is simply punctuation that separates the parameters. The \(\vdash\) symbol is called the “turnstile” symbol.
First, we need to “refactor” the \(\TirName{EvalVal}\) and \(\TirName{EvalPlus}\) rules to account for the additional value environment parameter:
\(\inferrule[EvalVal]{ \phantom{E \vdash \Downarrow} }{ E \vdash v \Downarrow v }\)
\(\inferrule[EvalPlus]{ E \vdash e_1 \Downarrow v_1 \and E \vdash e_2 \Downarrow v_2 \and v_1 \rightsquigarrow n_1 \and v_2 \rightsquigarrow n_2 }{ E \vdash e_1 \mathbin{\texttt{+}} e_2 \Downarrow n_1 + n_2 }\)
Reading the rules bottom-up, observe that we are simply passing the value environment \(E\) into recursive calls of \(E \vdash e \Downarrow v\).
A value environment \(E\) is a finite map from variables \(x\) to values \(v\) and can be described by the following grammar: \[ \begin{array}{rrrl} \text{value environments} & E, \mathit{env}& \mathrel{::=}& \cdot \mid E[x \mapsto v] \end{array} \] We write \(\cdot\) for the empty environment and \(E[x \mapsto v]\) as the environment that maps \(x\) to \(v\) but is otherwise the same as \(E\) (i.e., extends \(E\) with mapping \(x\) to \(v\)). Additionally, we write \(E(x)\) for looking up the value of \(x\) in environment \(E\). More precisely, we can define look up as follows by induction on the structure of \(E\): \[\begin{array}{rcll} E[y \mapsto v](x) & \stackrel{\text{\tiny def}}{=}& v& \text{if $y= x$} \\ E[y \mapsto v](x) & \stackrel{\text{\tiny def}}{=}& E(x) & \text{otherwise} \\ \cdot(x) & \phantom{\stackrel{\text{\tiny def}}{=}} & & \text{undefined} \\ \end{array} \]
While we give a syntax for value environments in the above to define them mathematically, we may choose to implement them in other ways. For example, we choose to represent value environments Env
using the Map[String, Expr]
data type from Scala standard library with lookup
and extend
functions:
type Env = Map[String, Expr]
val empty: Env = Map.empty
def lookup(env: Env, x: String): Expr = env(x)
def extend(env: Env, x: String, v: Expr): Env = {
require(isValue(v))
+ (x -> v)
env }
defined type Env empty: Env = Map() defined function lookup defined function extend
Let us consider inference rules that define evaluating variable uses and variable binding:
\(\inferrule[EvalVar]{ \phantom{\Downarrow} }{ E \vdash x \Downarrow E(x) }\)
\(\inferrule[EvalConstDecl]{ E \vdash e_1 \Downarrow v_1 \and E[x \mapsto v_1] \vdash e_2 \Downarrow v_2 }{ E \vdash \mathbf{const}\;x\;\texttt{=}\;e_1\texttt{;}\;e_2 \Downarrow v_2 }\)
The \(\TirName{EvalVar}\) rule says that a variable use \(x\) evaluates to the value to which it is bound in the environment \(E\). Or operationally, to evaluate a variable use \(x\), look up the value corresponding to \(x\) in the environment \(E\).
The \(\TirName{EvalConstDecl}\) rule is particularly interesting because we see explicitly that \[ \mathbf{const}\;x\;\texttt{=}\;e_1\texttt{;}\;e_2 \] the scope of variable \(x\) is the expression \(e_2\) because \(e_2\) is evaluated in an extended environment with a binding for \(x\).
It is informative to see how the rules correspond to implementing an interpreter:
def eval(env: Env, e: Expr): Expr = e match {
// EvalVal
case v if isValue(v) => v
// EvalPlus
case Binary(Plus, e1, e2) => {
val v1 = eval(env, e1)
val v2 = eval(env, e2)
N(toNumber(v1) + toNumber(v2))
}
// EvalVar
case Var(x) => lookup(env, x)
// EvalConstDecl
case ConstDecl(x, e1, e2) => {
val v1 = eval(env, e1)
eval(extend(env, x, v1), e2)
}
}
e_const_i_two_trueplusival v_const_i_two_trueplusi = eval(empty, e_const_i_two_trueplusi)
assert(v_const_i_two_trueplusi == N(3))
defined function eval res13_1: ConstDecl = ConstDecl( x = "i", e1 = N(n = 2.0), e2 = Binary(bop = Plus, e1 = B(b = true), e2 = Var(x = "i")) ) v_const_i_two_trueplusi: Expr = N(n = 3.0)
Figure 18.1 describes the syntax of a JavaScripty variant with variables, numbers, and booleans using a number of syntactic categories. The main syntactic category is expressions. We consider a program to be an expression. Expressions \(e\) consist of variables, a variable binding expression, value literals, unary operator expressions, binary operator expressions, and a conditional if-then-else expression. Value literals \(v\) can be numbers (double-precision floating point) and booleans. This set of arithmetic and logic expressions is the usual core of any programming language. Strings, side effects, and functions are notably missing.
We give in Figure 18.2, a big-step operational semantics for the JavaScripty variant defined above. That is, we give inference rules that define the evaluation judgment form: \(E \vdash e \Downarrow v\).
In rule \(\TirName{EvalArith}\), we lump all of the arithmetic operators \(\texttt{+}\), \(\texttt{-}\), \(\texttt{*}\), and \(\texttt{/}\) together. We abuse notation here slightly by treating the \(\mathit{bop}\) as the corresponding meta-language operator in \(n_1 \mathbin{\mathit{bop}} n_2\).
It is informative to study the complete set of inference rules and think about how the rules correspond to implementing an interpreter.
Observe that the \(\TirName{EvalAndTrue}\), \(\TirName{EvalAndFalse}\), \(\TirName{EvalOrTrue}\), \(\TirName{EvalOrFalse}\), \(\TirName{EvalIfTrue}\), and \(\TirName{EvalIfFalse}\) rules use coercions to booleans but do not necessarily return boolean. The \(\TirName{EvalEquality}\) rule shows that equality \(\texttt{===}\) and disequality \(\texttt{!==}\) apply to any values without coercion, while the \(\TirName{EvalInequality}\) rule says that inequalites apply after coercing to numbers.
Does this reveal any bugs in your implementation in the previous lab?
Let us consider extending our big-step semantics for JavaScripty with string values.
\[ \begin{array}{rrrl} \text{values} & v& \mathrel{::=}& \mathit{str} \\ \text{strings} & \mathit{str} \end{array} \]
We have seen that in JavaScript some operators are overloaded for numbers and strings (e.g., +
for string concatenation and <
, <=
, >
, and >=
for string comparisons).
Where we need to do some detective work is when these string operations apply with type coercions. When is a value coerced into a number versus a string?
Let’s consider string concatenation +
:
"hello" + 3
3 + "hello"
It appears that string concatenation applies if either operand is a string \(\mathit{str}\).
How about string comparison?
0 < "1"
0 < "hello"
"0" < 1
"a" < "ab"
It appears that string comparison only applies if both operands are strings.
To capture these observations, we replace rules \(\TirName{EvalArith}\) and \(\TirName{EvalInequality}\) in Figure 18.2 with the following rules shown in Figure 18.3.
The rules \(\TirName{EvalPlusString\(_1\)}\) and \(\TirName{EvalPlusString\(_2\)}\) apply string concatenation if either \(e_1\) or \(e_2\) evaluate to strings, whereas \(\TirName{EvalInequalityString}\) applies if and only if both \(e_1\) and \(e_2\) evaluate to strings. The other rules carefully state that number operations apply in all other cases.