flyinghyrax.net

Formal Regular Expressions

Part of my notes from ITEC 420 "Computability Theory and Formal Languages" at Radford University, Fall 2015

Syntax

A regular expression is a string of symbols that describes a language (as an arithmetic expression is a string of symbols that describes a numeric value).

For some alphabet \(\Sigma\), a valid (syntactically correct) regular expressions R is:

\(R \rightarrow \varnothing \mid \epsilon \mid symbol \mid R^* \mid R^+ \mid (R) \mid RR \mid R \cup R\)
\(symbol \rightarrow \text{any symbol in } \Sigma\)

This definition can be broken up into base cases and recursive cases:

A regular expression is:

And if \(\alpha\) and \(\beta\) are regular expressions, then so are:

Semantics

Assume L is a function that, given some regular expressions, returns the equivalent regular language:

L : reg. expr. -> language

Then each of the forms given in the syntactic definition above has the following meanings:

Operators

Precedence

From highest to lowest:

  1. Kleene star
  2. Concatenation
  3. Union

So \(a^*b \cup c = ((a^*)b) \cup c\)

Properties

Union

Regular expressions describe sets, so union behaves as with sets...

Concatenation

Kleene star

(these are... less intuitive)