Formal Regular Expressions
Part of my notes from ITEC 420 "Computability Theory and Formal Languages" at Radford University, Fall 2015
Syntax
A regular expression is a string of symbols that describes a language (as an arithmetic expression is a string of symbols that describes a numeric value).
For some alphabet \(\Sigma\), a valid (syntactically correct) regular expressions R is:
\(R \rightarrow \varnothing \mid \epsilon \mid symbol \mid R^* \mid R^+ \mid (R) \mid RR \mid R \cup R\)
\(symbol \rightarrow \text{any symbol in } \Sigma\)
This definition can be broken up into base cases and recursive cases:
A regular expression is:
- \(\varnothing\)
- \(\epsilon\)
- Any single symbol in \(\Sigma\)
And if \(\alpha\) and \(\beta\) are regular expressions, then so are:
- \(\alpha\beta\)
- \(\alpha \cup \beta\)
- \(\alpha^*\)
- \(\alpha^+\)
- \((\alpha)\)
Semantics
Assume L is a function that, given some regular expressions, returns the equivalent regular language:
L : reg. expr. -> language
Then each of the forms given in the syntactic definition above has the following meanings:
- \(L(\varnothing) = \{\}\) (the empty language)
- \(L(\epsilon) = \{\epsilon\}\) (language containing the empty string)
- \(\exists c \in \Sigma \rightarrow L(c) = \{c\}\) (the language containing c)
- \(L(\alpha\beta) = L(\alpha)L(\beta)\) (concatenation)
- \(L(\alpha \cup \beta) = L(\alpha) \cup L(\beta)\) (union)
- \(L(\alpha^*) = (L(\alpha))^*\) (Kleene star)
- \(L(\alpha^+) = L(\alpha\alpha^*) = L(\alpha)(L(\alpha))^*\) (Kleene plus)
- \(L((\alpha)) = L(\alpha)\) (grouping with parenthesis)
Operators
Precedence
From highest to lowest:
- Kleene star
- Concatenation
- Union
So \(a^*b \cup c = ((a^*)b) \cup c\)
Properties
Union
Regular expressions describe sets, so union behaves as with sets...
- Commutative: \(a \cup b = b \cup a\)
- Associative: \((a \cup b) \cup c = a \cup (b \cup c)\)
- Idempotent: \(a \cup a = a\)
- \(\varnothing\) for identity: \(a \cup \varnothing = a\)
Concatenation
- Associative: \((ab)c = a(bc)\)
- \(\epsilon\) for identity: \(a\epsilon = a\)
- \(\varnothing\) for "zero": \(a\varnothing = \varnothing\)
- Distributes over union: \((a \cup b) c = (ac) \cup (bc)\), \(c (a \cup b) = (ca) \cup (cb)\)
Kleene star
(these are... less intuitive)
- \(\varnothing^* = \epsilon\)
- \(\epsilon^* = \epsilon\)
- \((a^*)^* = a^*\)
- \(a^*a^* = a^*\)
- \((a \cup b)^* = (a^*b^*)^*\)
- \(a^* \subseteq b^* \Rightarrow a^*b^* = b^*a^* = b^*\)
- \(a \subseteq b^* \Rightarrow a^*b^* = b^*a^* = b^*\)