Meeting 07 - Concrete Syntax

Author

Bor-Yuh Evan Chang

Published

Tuesday, September 17, 2024

\(\newcommand{\TirName}[1]{\text{#1}} \newcommand{\inferrule}[3][]{ \let\and\qquad \begin{array}{@{}l@{}} \TirName{#1} \\ \displaystyle \frac{#2}{#3} \end{array} } \newcommand{\infer}[3][]{\inferrule[#1]{#2}{#3}} \)

What questions does your neighbor have?

In-Class Slides
Book Chapter

Announcements

  • HW 2 due this Friday 9/20 6pm
    • Jupyter assignment

Today

  • Preview HW 2
  • Concrete Syntax
  • Triage Your Questions
    • Lab 1?
  • Revisit and Go Deeper On:
    • Inductive Data Types (Meeting 06), if time permits

Questions?

  • Review:
    • What’s an inductive data type?

Questions?

What is the purpose of a programming language specification?

We want to describe a programming language unambigously so that everyone understands how to use it.

Who uses a language specification?

Language designer, language implementer, and language user.

What is the difference between syntax and semantics?

Can a given syntax have two different semantics (in different languages)?

Informally,

  • Syntax is the form of programs (e.g., expressions).
  • Semantics is the meaning of programs (e.g., expressions). How programs evaluate.

What is the difference between concrete and abstract syntax?

Concrete syntax is the surface structure of a language (i.e., sequences/strings).

Abstract syntax is the deep structure of a language (i.e., trees/terms).

Concrete Syntax

Formal Language Terminology

In formal language theory, a language \(L\) is …

… a set of strings of characters in some alphabet \(\Sigma\) (i.e., a subset of \(\Sigma^*\))

For example,

\[ \begin{array}{rrl} \Sigma & \stackrel{\text{\tiny def}}{=}& \{ \texttt{a}, \texttt{b} \} \\ L_1 & \stackrel{\text{\tiny def}}{=}& \{ \texttt{aa}, \texttt{bb} \} \\ L_2 & \stackrel{\text{\tiny def}}{=}& \{ \texttt{a}, \texttt{aa}, \texttt{aaa}, \ldots \} \end{array} \]

Formal Language Terminology

A sentence is …

… a string of a language

\[ \texttt{aa} \in L_1 \]

Grammars

A (context-free) grammar describes …








BNF (Backus-Naur Form) is …

… (context-free) languages

… a notation for (context-free) languages

Grammars

A grammar consists terminals, non-terminals, and productions:

\[ N \mathrel{::=}\alpha_1 \mid\alpha_2 \mid\cdots \]

where \(N \in \mathcal{N}\) is a non-terminal and \(\alpha \in (\Sigma \cup\mathcal{N})^\ast\).

Lexical versus Syntactic

A lexeme is …








A token is …

… a sequence of characters that form a syntactic unit (lowest level unit)

… a category of lexemes

For example,

  • An \(\mathit{if}\) token consists of the lexeme [i, f]
  • A \(\mathit{num}\) token has possible lexemes { [0], [1], [4, 2] }

A lexer (i.e., lexical analyzer)

  • groups characters into lexemes and classifies them into tokens

A parser (i.e., syntactic analyzer)

  • recognizes sequences of tokens

Generators versus Recognizers

A grammar can be viewed as either generating a the set of strings in a language \(L\) or recognizing whether a given string \(s\) is in a language \(L\).

Example Grammars

\[ e \mathrel{::=}n \mid e + e \mid e * e \]

  • What are the terminals and non-terminals?
  • What is \(n\)? It is a terminal. If it is a meta-variable, then it stands for a element in some set and must stated elsewhere.
  • What is the alphabet?

Each non-terminal describes an inductively-defined language.

Parsing Derivations

Give a parsing derivation for 1 + 2 * 3 with \(e \mathrel{::=}n \mid e + e \mid e * e\).

Parse Trees

Give a parse tree for 1 + 2 * 3 with \(e \mathrel{::=}n \mid e + e \mid e * e\).

Ambiguity

What should 1 + 2 * 3 evaluate to (i.e., what is the semantics of 1 + 2 * 3)?

This string can be thought of as either (1 + 2) * 3 or 1 + (2 * 3).

The same string can be read in two different ways!

“Ambiguous grammars need parentheses!”

Ambiguity

Give another parsing derivation for 1 + 2 * 3 with \(e \mathrel{::=}n \mid e + e \mid e * e\).

Ambiguity

Give the corresponding parse tree as the previous derivation for 1 + 2 * 3 with \(e \mathrel{::=}n \mid e + e \mid e * e\).

What’s the “parenthesization”?

Resolving Ambiguity: Associativity

\[ e \mathrel{::=}n \mid e - e \]

Is this grammar ambiguous?

Consider 7 - 3 - 2

What does 7 - 3 - 2 evaluate to?

(7 - 3) - 2 or 7 - (3 - 2)?

Exercise: Can we rewrite the grammar so that we get left associativity?

Is the following unambiguous and what associativity does it have?

\[ e \mid n \mid e - n \]

It has left recursion. Can we rewrite the grammar so that we get right associativity?

Resolving Ambiguity: Precedence

\[ e \mathrel{::=}n \mid n + e \mid n * e \]

Is this grammar ambiguous?

Is it what we want (intuitively)?

Consider 7 + 8 * 5 and 7 * 8 + 5.

Exercise: Write parse trees.

Resolving Ambiguity: Precedence

Consider the ambiguous grammar again:

\[ e \mathrel{::=}n \mid e + e \mid e * e \]

What does this grammar do?

\[ \begin{array}{rrl} e & \mathrel{::=}& t \mid t + e \\ t & \mathrel{::=}& n \mid n * t \end{array} \]

Exercise: Write parse trees for 7 + 8 * 5 and 7 * 8 + 5 using both grammars.

Bonus: Resolving Ambiguity

\[ \begin{array}{rrl} s & \mathrel{::=}& \mathit{ifstmt} \mid\cdots \\ \mathit{ifstmt} & \mathrel{::=}& \texttt{if} \; e \; \texttt{then} \; s \\ & \mid& \texttt{if} \; e \; \texttt{then} \; s \; \texttt{else} \; s \end{array} \]

Ambiguous?

Write Grammars

Write a grammar for a language that has the string \(\texttt{aa}\).

Write Grammars

Write a grammar for a language that has an even number of \(\texttt{a}\)’s.

Write Grammars

Write a grammar for a language that has an odd number of \(\texttt{a}\)’s.

Write Grammars

Write a grammar that generates matching open and close braces { }.