Meeting 08 - Abstract Syntax and Parsing

Author

Bor-Yuh Evan Chang

Published

Thursday, September 19, 2024

\(\newcommand{\TirName}[1]{\text{#1}} \newcommand{\inferrule}[3][]{ \let\and\qquad \begin{array}{@{}l@{}} \TirName{#1} \\ \displaystyle \frac{#2}{#3} \end{array} } \newcommand{\infer}[3][]{\inferrule[#1]{#2}{#3}} \)

What questions does your neighbor have?

In-Class Slides
In-Class Jupyter
Book Chapter

Announcements

  • HW 2 due this Friday Monday 9/23 6pm

Today

  • Revisit last bit of Concrete Syntax
  • Abstract Syntax
  • Triage Your Questions
    • Using VSCode or the terminal to test your code (on coding.csel.io)?
    • Auto-testing with GitHub Actions?
    • Lab 1?
  • Revisit and Go Deeper On:
    • Concrete Syntax (Meeting 07), if time permits

Questions?

  • Review:
    • How do you show that a grammar is ambiguous?

Questions?

One-Slide Review

Concrete syntax is …

Abstract syntax is …

Parsers convert …, which have deal with …

… and … are ways to to describe and deal with a common form of ambiguity.

Concrete syntax is the surface structure of a language (=strings).

Abstract syntax is the deep structure of a language (=trees/terms).

Parsers convert concrete syntax into abstract syntax, which have deal with ambiguity.

Precedence and associativity are ways to describe and deal with a common form of ambiguity.

Motivation

  • We’ve seen pitfalls when writing grammars.
  • Parser makes life easier down the road in the rest of the tool. We’ve seen it is hard enough to recognize tree structure by itself.

An Ambiguous Grammar

\[\begin{array}{rrrl} \text{expressions} & e & \mathrel{::=}& n \mid e\;\texttt{/}\;e \mid e\;\texttt{-}\;e \end{array}\]

Abstract Syntax Trees

sealed trait Expr                                  // e ::=
case class N(n: Int) extends Expr                  //   n
case class Divide(e1: Expr, e2: Expr) extends Expr // | e1 / e2
case class Minus(e1: Expr, e2: Expr) extends Expr  // | e1 - e2

It matches the ambiguous grammar when read as concrete syntax. But it is also standard to write such a grammar and read it (or expect the reader to read it) as abstract syntax.

Abstract Syntax Trees

Minus(N(10), Divide(N(10), N(10)))

Abstract Syntax Trees versus Parse Trees

Parsing

Take a Theory of Computation course for more language theory. We focus here on the practical aspect of reading and writing BNF grammars.

There are also many advanced parser libraries and generator tools. You might use one in a Compiler Construction course.

An Ambiguous Grammar

\[ \begin{array}{rrrl} \text{expressions} & e & \mathrel{::=}& n \mid e + e \\ \text{numbers} & n \end{array} \]

defined trait Expr
defined class N
defined class Plus

Recursive-Descent Parsing

Automate the top-down, leftmost parsing derivation.

Ask: What’s that?

And trying productions left-to-right until finding a prefix match.

Recursive-Descent Parsing

Two rules:

  1. Best to have an unambiguous grammar.

Why? Predictability.

  1. No left-recursion in the grammar.

Why? Avoid unbounded “lookahead”.

Combinator Parsing

import $ivy.$                                                       

What’s a combinator?

A higher-order function/method.

So a combinator parsing library is simply one where you use it by calling methods with your own callback functions

We’ve seen a higher-order function/method before, right? Yes,

1
2
3

Restricting the Concrete Syntax

\[\begin{array}{rrrl} \text{terms} & t, \mathit{term} & \mathrel{::=}& \mathit{num} \mid\texttt{(}\; \mathit{expr} \;\texttt{)} \\ \text{expressions} & e, \mathit{expr} & \mathrel{::=}& \mathit{term} \mathbin{\texttt{+}} \mathit{term} \\ \text{numbers} & n, \mathit{num} \end{array}\]

Not ambiguous. No left recursion.

These are s-expressions!

Let’s Implement a Parser!

defined object ExprParser
defined object AParser
defined object ExprParser

First, \[ \mathit{num} \]

Then, def parse(s: String): Either[String, String].

Then, \[ \mathit{term} \mathrel{::=}\mathit{num} \]

res18_0: Either[String, String] = Right(value = "1")
res18_1: Either[String, String] = Left(
  value = "Failure: '(' expected but '<' found"
)

Then, def parse(s: String): Either[String, Expr].

res19_0: Either[String, Expr] = Right(value = N(n = 1.0))
res19_1: Either[String, Expr] = Left(
  value = "Failure: '(' expected but '<' found"
)

Then, \[ \mathit{term} \mathrel{::=}\texttt{(}\; \mathit{expr} \;\texttt{)} \]

\[ \mathit{expr} \mathrel{::=}\mathit{term} \mathbin{\texttt{+}} \mathit{term} \]

res20_0: Either[String, Expr] = Right(
  value = Plus(e1 = Plus(e1 = N(n = 1.0), e2 = N(n = 2.0)), e2 = N(n = 3.0))
)
res20_1: Either[String, Expr] = Right(
  value = Plus(e1 = N(n = 1.0), e2 = Plus(e1 = N(n = 2.0), e2 = N(n = 3.0)))
)
res21: Either[String, Expr] = Left(value = "Failure: end of input expected")