Formal languages
Languages over an alphabet
The empty set \(\emptyset\) is an initial object in the category of sets Set: for any set \(Y\) there is a unique function \(\emptyset_Y:\emptyset \to Y\). 3
For sets \(X\) and \(Y\), let
\[\mathrm{hom}_{\mathbf{Set}}(X,Y)=Y^X\]be the set of functions with domain \(X\) and image contained in \(Y\). In particular, \(Y^\emptyset = \{\emptyset_Y\}\).
Let \(\mathbb{N}=\{0,1,2,\ldots\}\). For \(n \in \mathbb{N}\), define
\[[n] = \{i \in \mathbb{N} : i < n\}.\]Thus, \([0]=\{\}=\emptyset\), \([1]=\{0\}\), \([2]=\{0,1\}\), \([3]=\{0,1,2\}\), etc.
For \(m,n \in \mathbb{N}\), define
\[[m,n] = \{i \in \mathbb{N}: m \leq i < n\}.\]If \(m \geq n\), then \([m,n]=\emptyset\).
Let \(A\) be a finite set that we shall call an alphabet.
Write \(\epsilon_A=\emptyset_A\), the unique function \(\emptyset \to A\). Thus \(A^\emptyset = \{\epsilon_A\}\).
A word over the alphabet \(A\) of length \(n\), \(n \in \mathbb{N}\), is an element of \(A^{[n]}\). 4
A word over \(A\) of length 0 is an element of \(A^{[0]}\), and
\[A^{[0]} = A^\emptyset = \{\epsilon_A\},\]so the unique word over \(A\) of length 0 is \(\epsilon_A\). We call \(\epsilon_A\) the empty word over the alphabet \(A\).
If \(N\) is an empty set, then as we have stated above, \(\emptyset^N = \{\emptyset\}\). If \(N\) is a nonempty set, then \(\emptyset^N = \emptyset\). Thus, if the alphabet \(A\) is empty, then \(A^{[n]}=\emptyset\) for \(n > 0\).
If \(x \in A^{[m}]\) and \(y \in A^{[n]}\), define the concatenation \(x*y \in A^{[m+n]}\) by
\[(x*y)(i) = \begin{cases} x(i)&i \in [m]\\ y(i-m)&i \in [m,m+n] \end{cases}, \quad i \in [m+n].\]Free monoid on a set
Define
\[A^* = \bigcup_{n \in \mathbb{N}} A^{[n]}.\]The sets in this union are pairwise disjoint; this is true whether or not the alphabet \(A\) is empty, for different reasons. If the alphabet \(A\) is empty, then \(A^{[0]}=\{\epsilon_A\}\), and \(A^{[n]}=\emptyset\) for \(n>0\); and in this case the sets are indeed pairwise disjoint. If the alphabet \(A\) is nonempty, then a word of some length is not a word of any other length, so in this case the sets are pairwise disjoint.
\((A^*,*,\epsilon_A)\) is the free monoid on \(A\). 4 (\(A^*\) is also called the Kleene star of \(A\).5)
For \(x \in A^*\), define \(\mathrm{len}(x)\) to be the unique \(n \in \mathbb{N}\) such that \(x \in A^{[n]}\), the length of a word. 6
If \(A\) is an empty set, then \(A^*=\{\epsilon_A\}\).
If \(A\) is a set with one element, say \(A=\{a\}\), then for each \(n>0\), the unique function \([n] \to A\) is \(i \mapsto a\) for \(i \in [n]\). Thus when \(A=\{a\}\), it is the case that for \(n>0\) the set \(A^{[n]}\) has exactly one element, function
\[i \mapsto a, \quad i \in [n],\]that is, the unique word of length \(n\) over \(A=\{a\}\),
\[\underbrace{a \cdots a}_{n}.\]If \(A=\{a\}\) then \(x \mapsto \mathrm{len}(x)\) is an isomorphism of monoids \(A^* \to \mathbb{N}\).
A language over the alphabet \(A\) is any subset of \(A^*\), the free monoid on \(A\).
Free semigroup on a set
Define
\[A^+ = \bigcup_{n \in \mathbb{N} \setminus\{0\}} A^{[n]}.\]\((A^+,*)\) is the free semigroup on \(A\). 7
\[A^+ = A^* \setminus \{\epsilon_A\}.\]Operations on languages
We remind ourselves that a language over an alphabet \(A\) is any subset of the free monoid \(A^*\). In other words, the power set \(\mathscr{P}(A^*)\) is the set of all languages over an alphabet \(A\).
Union. For \(L_1, L_2 \subset A^*\), the union \(L_1 \cup L_2 \subset A^*\).
Concatenation. For \(L_1, L_2 \subset A^*\),
\[L_1 L_2 = \{x*y : x \in L_1, y \in L_2\} \subset A^*.\]Kleene star. For \(L \subset A^*\), the Kleene star of \(L\) is 5
\[L^* = \bigcup_{n \in \mathbb{N}} L^{[n]} \subset A^*.\]