Close
Skip to content

Regular Expression Reference

regex-500px

Regular Expressions (REs) provide a mechanism to select specific strings from a set of character strings. RE’s are a context-independent syntax that can represent a wide variety of character sets and character set orderings, where these character sets are interpreted according to the current locale. While many regular expressions can be interpreted differently depending on the current locale, many features, such as character class expressions, provide for contextual invariance across locales.

The Basic Regular Expression (BRE) notation and construction rules in Basic Regular Expressions shall apply to most utilities supporting regular expressions. Some utilities, instead, support the Extended Regular Expressions (ERE) described in Extended Regular Expressions.

The use of regular expressions is generally associated with text processing. REs (BREs and EREs) operate on text strings; that is, zero or more characters followed by an end-of-string delimiter (typically NUL). Some utilities employing regular expressions limit the processing to lines; that is, zero or more characters followed by a <newline>. In the regular expression processing described in IEEE Std 1003.1-2001, the <newline> is regarded as an ordinary character and both a period and a non-matching list can match one. The Shell and Utilities volume of IEEE Std 1003.1-2001 specifies within the individual descriptions of those standard utilities employing regular expressions whether they permit matching of <newline>s; if not stated otherwise, the use of literal <newline>s or any escape sequence equivalent produces undefined results. Those utilities (like grep) that do not allow <newline>s to match are responsible for eliminating any <newline> from strings before matching against the RE.
RegularExpressionReference

Reference for foma regular expression syntax.

Regular expression operators

Concatenation (X Y)

The language or relation X concatenated with Y. The operator is not overtly signaled by spacing, etc., and two adjacent regular expressions will be concatenated regardless of white space when found at the level of precedence of the concatenation operator by the regular expression parser.

Restrictions: none

Kleene star (X*)

Zero or more iterations of X.

Restrictions: none

Kleene plus (X+)

One or more iterations of X. Equivalent to [X X*].

Restrictions: none

Iteration operators

In addition to the Kleene closures, there are a number of m-n-ary iteration operators:

 

Iteration operators
X^n exactly n iterations of X
X^>n at least n iterations of X
X^<n less than n iterations of X
X^{m,n} between m and n iterations of X

 

Restrictions: none

Union (X | Y)

Returns the union (disjunction) of FSMs X and Y.

Notational variants: XY (UNION, U+222A), XY (LOGICAL OR, U+2228)

Restrictions: none

Intersection (X & Y)

Returns the intersection (conjunction) of FSMs X and Y.

Notational variants: XY (UNION, U+2229), XY (LOGICAL AND, U+2227)

Restrictions: foma always calculates the path intersection of two FSMs. This may not be the logical intersection of the relations X and Y if X and Y are transducers and not automata.

Subtraction (X – Y)

Returns the difference of FSMs X and Y.

Restrictions: foma always calculates the path subtraction of two FSMs. This may not be the logical difference of the relations X and Y if X and Y are transducers and not automata.

Complement (~X)

Calculates the complement set of X.

Notational variants: ¬X (NOT SIGN, U+00AC)

Restrictions: undefined for non-identity transducers.

Composition (X .o. Y)

Calculates the composite transducer of relations X and Y.

Notational variants: XY (RING OPERATOR, U+2218)

Restrictions: none

Substitution ([X,Y,Z])</h2> Substitutes all occurrences of symbol <tt>Y</tt> in FSM <tt>X</tt> by <tt>Z</tt>. Restrictions: none <h2><a name="Optionality_(X)"></a>Optionality (X)</h2> Defines the language or relation that contains zero or one iteration of <tt>X</tt>. Equivalent to <tt>[X | 0]</tt>. Restrictions: none <h2><a name="Domain_extraction_(X.u)"></a>Domain extraction (X.u)</h2> Extracts the domain from a FSM. Notational variants: X₁ (SUBSCRIPT ONE, U+2081) Restrictions: none <h2><a name="Range_exctraction_(X.l)"></a>Range exctraction (X.l)</h2> Extracts the range from a FSM. Notational variants: <tt>X</tt>₂ (SUBSCRIPT TWO, U+2082) Restrictions: none <h2><a name="Inversion_(X.i)"></a>Inversion (X.i)</h2> Inverts a FSM. Notational variants: <tt>X</tt>⁻¹ (SUPERSCRIPT MINUS SUPERSCRIPT ONE, U+207B U+00B9) Restrictions: none <h2><a name="Term_negation_(\X)"></a>Term negation (\X)</h2> Any single symbol except <tt>X</tt>. Equivalent to <tt>[? - X]</tt>. Restrictions: see subtraction <h2><a name="Cross_product_(X:Y)"></a>Cross product (X:Y)</h2> Calculates the cross product (or Cartesian product) of FSMs <tt>X</tt> and <tt>Y</tt> producing a transducer representing the relation where every string in <tt>X</tt> is paired with every string in <tt>Y</tt>. Notational variants: X .x. Y, X × Y (MULTIPLICATION SIGN, U+00D7), both with lower precedence than <tt>:</tt> Restrictions: undefined if <tt>X</tt> and <tt>Y</tt> are not both automata. <h2><a name="Ignore_(X/Y)"></a>Ignore (X/Y)</h2> Denotes the language where instances of Y are arbitrarily interspersed in the language X. Restrictions: not well defined for transducers. <h2><a name="Ignore_inside_(X./.Y)"></a>Ignore inside (X./.Y)</h2> Denotes the language where instances of Y are arbitrarily interspersed in the language X, except that the first symbol and last symbol belong to X. Restrictions: not well-defined for transducers. <h2><a name="Left_quotient_(X\\\Y)"></a>Left quotient (X\\\Y)</h2> The operation <tt>X\\\Y</tt> is defined as: {w <tt>|</tt> ∃x((x ∈ X) ∧ (xw ∈ Y )}. Informally: the set of suffixes one can add to strings in X to get strings from Y. Restrictions: not well-defined for transducers. <h2><a name="Right_quotient_(X///Y)"></a>Right quotient (X///Y)</h2> The operation <tt>X</tt><tt>///</tt><tt>Y</tt> is defined as: {w <tt>|</tt> ∃x((x ∈ Y ) ∧ (wx ∈ X)}. Informally, this is the set of prefixes one can add to Y to get strings from X. Restrictions: not well-defined for transducers. <h2><a name="Precedes_(X<Y)"></a>Precedes (X<Y)</h2> Denotes the languages where every string from <tt>X</tt> precedes every string from <tt>Y</tt>. The precedence need not be immediate. Restrictions: not well-defined for transducers. <h2><a name="Follows_(X>Y)"></a>Follows (X>Y)</h2> Denotes the languages where every string from <tt>X</tt> follows every string from <tt>Y</tt>. The precedence need not be immediate. Restrictions: not well-defined for transducers. <h2><a name="Shuffle_(X_<>_Y)"></a>Shuffle (X <> Y)</h2> The shuffle (or asynchronous) product of <tt>X</tt> and <tt>Y</tt>, i.e. the set of words formed by any method of ‘shuffling’ the contents of <tt>X</tt> with <tt>Y</tt>. The shuffle is not perfect. Notational variants: <tt>X</tt> ∥ <tt>Y</tt> (PARALLEL TO, U+2225) Restrictions: none <h2><a name="Containment_operators_($X,_$.X,_$?X)"></a>Containment operators ($X, $.X, $?X)</h2> The operator $X denotes the language that contains a substring drawn from the language X. This is equivalent to <tt>[?* X ?*]</tt>. The operator <tt>$.X</tt> denotes the language that contains exactly one substring drawn from the language <tt>X</tt>, while <tt>$?X</tt> denotes the language that contains at most one substring from X. Restrictions: <tt>$.X</tt> and <tt>$?X</tt> are not well-defined for transducers. <h2><a name="Context_restriction_(X_=>_Y1_Z1,_...,_Yn_Zn)"></a>Context restriction (X => Y1 <i> Z1, ..., Yn </i> Zn)</h2> Denotes the language where every instance of a string from <tt>X</tt> is surrounded by string from some pair Yi and Zi on the left and right, respectively. Restrictions: not well-defined for transducers. <h2><a name="Priority_unions_(X_.P._Y,_X_.p._Y)"></a>Priority unions (X .P. Y, X .p. Y)</h2> Th upper-side priority union <tt>X .P. Y</tt> denotes the union of relations <tt>X</tt> and <tt>Y</tt>, with relations in Y discarded if a relation in X have the same input (domain). Equivalent to <tt>[X | [ ̃[X.u] .o. Y]]</tt>. The lower-side priority union is similar, except a relation in X has precedence over a relation in Y based on the range, not the domain. Restrictions: none <hr /> <img src="https://fomafst.github.io/regexref-priorityunion.png" /> <hr /> <h2><a name="Lenient_composition_(X_.O._Y)"></a>Lenient composition (X .O. Y)</h2> <h2></h2> The 'lenient' composition of <tt>X</tt> with <tt>Y</tt>. For those relations where strings in the domain of Y does not include some possible string from the range of <tt>X</tt>, the relation <tt>X</tt> is not composed with <tt>Y</tt>. Equivalent to <tt>[X .o. Y] .P. Y</tt>. See Karttunen (1998) for details and usage. <h2><a name="Logical_connectives_(X_→_Y,_X_↔_Y)"></a>Logical connectives (X → Y, X ↔ Y)</h2> The implication operator and the biconditional are shorthands whose definitions are as follows: <ul> <li><tt>X</tt> → <tt>Y</tt> is equivalent to <tt>~X | Y</tt></li> <li><tt>X</tt> ↔ <tt>Y</tt> is equivalent to <tt>[~X | Y] & [~Y | X]</tt></li> </ul> <h2><a name="Flag_elimination_(X.f)"></a>Flag elimination (X.f)</h2> Eliminates all flag diacritics in <tt>X</tt> and calculates the equivalent FSM without flags. If no flags are present, no action is taken. Restrictions: none <h2><a name="Replacement_operators"></a>Replacement operators</h2> The most general format of a replacement rule is <pre class="prettyprint">A -> B || L _ R ;</pre> which dictates the replacement of sequences <tt>A</tt> with <tt>B</tt> when occurring between <tt>L</tt> and <tt>R</tt>. The replacement arrow (<tt>-></tt>) can be of various types indicating different replacement strategies, optional replacement (->), longest-leftmost <tt>@-></tt>, shortest-leftmost <tt>@></tt>, etc. The directionality constraint <tt>||</tt> can also vary depending on the application. All replacement operators allow context specification through one of the directionality markers. The directionality constraint DIR can be one of:   <table class="wikitable"> <tbody> <tr> <td>DIR</td> <td>Interpretation</td> </tr> <tr> <td><tt>|</tt><tt>|</tt></td> <td>left and right contexts hold on input side</td> </tr> <tr> <td><tt>\\</tt></td> <td>left holds on input side, right holds on output side</td> </tr> <tr> <td><tt>//</tt></td> <td>left holds on output side, right holds on input side</td> </tr> <tr> <td><tt>\/</tt></td> <td>left holds on output side, right holds on output side</td> </tr> </tbody> </table>   The following replacement operators are available:   <table class="wikitable"> <tbody> <tr> <td>Operator</td> <td>Type</td> </tr> <tr> <td><tt>-></tt></td> <td>Unconditional replacement</td> </tr> <tr> <td><tt> <- </tt></td> <td>Unconditional inverse replacement</td> </tr> <tr> <td><tt> <-> </tt></td> <td>Unconditional replacement and inverse replacement</td> </tr> <tr> <td><tt> (->) </tt></td> <td>Optional replacement</td> </tr> <tr> <td><tt> (<-) </tt></td> <td>Optional inverse replacement</td> </tr> <tr> <td><tt> (<->) </tt></td> <td>Optional replacement and inverse replacement</td> </tr> <tr> <td><tt> @-> </tt></td> <td>Left-to-right longest-match replacement</td> </tr> <tr> <td><tt> @> </tt></td> <td>Left-to-right shortest-match replacement</td> </tr> <tr> <td><tt> <-@ </tt></td> <td>Left-to-right longest-match inverse replacement</td> </tr> <tr> <td><tt> <@ </tt></td> <td>Left-to-right shortest-match inverse replacement</td> </tr> <tr> <td><tt> (@->) </tt></td> <td>Optional left-to-right longest-match replacement</td> </tr> <tr> <td><tt> (@>) </tt></td> <td>Optional left-to-right shortest-match replacement</td> </tr> <tr> <td><tt> (<-@) </tt></td> <td>Optional left-to-right longest-match inverse replacement</td> </tr> <tr> <td><tt> (<@) </tt></td> <td>Optional left-to-right shortest-match replacement</td> </tr> </tbody> </table>   <h3><a name="Markup_(_..._)"></a>Markup (<tt>...</tt>)</h3> The LHS/RHS of a replacement rule may also be of the format <pre class="prettyprint">X -> Y ... Z ;</pre> in which case both <tt>Y</tt> and <tt>Z</tt> are inserted around instances of <tt>X</tt>. <tt>Y</tt> or <tt>Z</tt> may be omitted and all contextual specifications are available. <h3><a name="Epsilon_modifier_[._.]"></a>Epsilon modifier <tt>[. .]</tt></h3> The LHS of a rule may be wrapped in the <i>epsilon modifier</i>, in which case any epsilons on the LHS get a special interpretation, where only one empty string is assumed to exists between each symbol in the input string. For example, the rule: <pre class="prettyprint">[.a*.] -> x</pre> will produce a transducer that maps the input string <tt>a</tt> unambiguously to <tt>xxx</tt>. Also, <tt>[..]</tt> will simply produce a rule that inserts one instance of the RHS whenever the context is matched: <pre class="prettyprint">[..] -> x</pre> will map <tt>aaa</tt> to <tt>xaxaxax</tt>. <h3><a name="Multiple_contexts"></a>Multiple contexts</h3> Several possible contexts can be specified by separating the contexts with a comma. Example: <pre class="prettyprint">a -> b || c _ d , e _ f ;</pre> <h3><a name="Multiple_left-hand_sides"></a>Multiple left-hand sides</h3> Several left hand sides of a rule can be combined with a comma. In such a rule, the context specification aplies to all left-hand sides. Example: <pre class="prettyprint">a -> b , b -> a || c _ d ;</pre> <h3><a name="Parallel_rules_(,,)"></a>Parallel rules (,,)</h3> Separate rules can be compiled in parallel by separating each individual rules with a double comma (<tt>,,</tt>), i.e. <pre class="prettyprint">Rule1 ,, Rule2 ,, ... ,, RuleN ;</pre> <h3><a name="Transducers_with_backreferences"></a>Transducers with backreferences</h3> Rules of the format <pre class="prettyprint">T -> || L _ R ;</pre> are also possible. Here, T is usually a transducer. The contexts can be omitted, and several contexts can be specified as with ordinary rules, and the arrow can take any of the standard arrow types. The semantics of the rule is that the strings from the input side of <tt>T</tt> are replaced with T. This allows for a more general specification of chunking and insertion rules. For example, the rule <pre class="prettyprint">0:%[ Chunk 0:%] @-> ;</pre> is identical to the chunking rule <pre class="prettyprint">Chunk @-> %{ ... %}</pre> This type of rule allows for replacements that are otherwise hard to define. Gerdemann and van Noord(1999), who proposed the rule semantics, provide the following example: suppose we have a transducer of word-sequence/abbreviation pairs such as <pre class="prettyprint">define Abbr {nondeterministic finite automaton}:{NFA} | {deterministic finite automaton}:{DFA} ;</pre> Now, if we wanted to convert all known word sequences into the known acronyms whenever they occur between <tt><abbr> ... </abbr></tt> tags, we can issue the rule: <pre class="prettyprint">Abbr -> || {<abbr>} _ {</abbr>}</pre> which would for example transduce <tt><abbr>nondeterministic finite automaton</abbr></tt> to <tt><abbr>NFA</abbr></tt>. <h3><a name="Word-boundary_marker_(_.#._)"></a>Word-boundary marker (<tt>.#.</tt>)</h3> The special word-boundary marker may be used in context specification of both the context restriction operator as well as replacement rules. Example: <pre class="prettyprint">a => _ c | .#. ;</pre> specifies the language where <tt>a</tt> must always be followed by <tt>c</tt> or a word boundary. The word-boundary marker has no special meaning in other regular expression constructs; however, any <tt>.#.</tt>-symbols found in a context specification will always be interpreted as a word boundary, and the symbol is removed from the alphabet at the end of replacement rule or context restriction compilation. <h1><a name="Built-in_functions"></a>Built-in functions</h1> All built-in regular expression function names begin with an underscore (<tt>_</tt>). <h2><a name="_isunambiguous(X)"></a><tt>_isunambiguous(X)</tt></h2> Returns the empty string (ε) if <tt>X</tt> is an unambiguous transducer, or the empty language (∅) otherwise. <h2><a name="_isidentity(X)"></a><tt>_isidentity(X)</tt></h2> Returns the empty string (ε) if <tt>X</tt> is an identity transducer, or the empty language (∅) otherwise. <h2><a name="_isfunctional(X)"></a><tt>_isfunctional(X)</tt></h2> Returns the empty string (ε) if <tt>X</tt> is a functional (single-valued) transducer, or the empty language (∅) otherwise. <h2><a name="_notid(X)"></a><tt>_notid(X)</tt></h2> Returns an automaton containing all the words in <tt>X</tt> that do not map to themselves (are not in an identity relation). <h2><a name="_lm(X)"></a><tt>_lm(X)</tt></h2> Returns the letter machine equivalent to <tt>X</tt>. A letter machine is a FSM where every transition contains maximally one UTF-8 symbol. <h2><a name="_loweruniq(X)"></a><tt>_loweruniq(X)</tt></h2> Modifies <tt>X</tt> in such a way that each input word maps to a unique output word. Lower-side symbols are replaced by arbitrary symbols from the alphabet. In case the output side cannot be made unique using only existing symbols in the alphabet, the alphabet is extended with new random symbols to achieve uniqueness. <h2><a name="_allfinal(X)"></a><tt>_allfinal(X)</tt></h2> Returns the same FSM as <tt>X</tt>, with the exception that all states are marked as final states. <h2><a name="_unambpart(X)"></a><tt>_unambpart(X)</tt></h2> Returns an FSM containing only those paths in <tt>X</tt> that are unambiguous. That is, a mapping in <tt>X</tt> is preserved only if its input string has a unique path through the transducer. Example: <pre class="prettyprint">_unambpart(a:b | a:c | b:c);</pre> returns a transducer equivalent to <pre class="prettyprint">b:c</pre> since <tt>b:c</tt> is the only unambiguous path with respect to the input side. <h2><a name="_ambpart(X)"></a><tt>_ambpart(X)</tt></h2> Returns an FSM containing only those paths in <tt>X</tt> that are ambiguous. That is, a mapping in <tt>X</tt> is preserved only if its input string has at least two paths through the transducer. Example: <pre class="prettyprint">_ambpart(a:b | a:c | b:c);</pre> returns a transducer equivalent to <pre class="prettyprint">a:b | a:c</pre> That is, the path containing <tt>b:c</tt> is removes since it is unambiguous. <h2><a name="_ambdom(X)"></a><tt>_ambdom(X)</tt></h2> Returns an automaton containing all words in the domain of <tt>X</tt> that yield an ambiguous path through <tt>X</tt>. <pre class="prettyprint">_ambdom(a:b | a:c | b:c);</pre> returns a an automaton equivalent to <pre class="prettyprint">a</pre> <h2><a name="_eq(X,L,R)"></a><tt>_eq(X,L,R)</tt></h2> Filters from the output side of <tt>X</tt> all those strings where some substrings occurring between the delimiters <tt>L</tt> and <tt>R</tt> are different. Example: Consider the language <tt>%< a* b %> %< a b* %></tt>, which contains an infinite number of strings: <pre class="prettyprint"><b><a> <b><ab> <ab><a> <ab><ab> <ab><abbb> ...</pre> However, only one of the strings in this language has identical substrings between all instances of <tt><</tt> and <tt>></tt>, namely <tt><ab><ab></tt>. Hence, the language containing the single string <tt><ab><ab></tt> is produced by the regular expression:   <pre class="prettyprint">_eq(%< a* b %> %< a b* %> , %< , %>) ;</pre> This operation is mostly used to model reduplication in natural language lexicons. Usually, the bare words to be reduplicated are marked with delimiters, say <tt><</tt> and <tt>></tt>, after which one can produce the reduplicated forms. For example: <pre class="prettyprint">define Lexicon {cat}|{dog}|{horse}; define RLexicon %< Lexicon %> (%- %< \[%<|%>]+ %>); regex _eq(RLexicon, %<, %>) .o. %<|%> -> 0 ;</pre> and now we get: <pre class="prettyprint">foma[1]: lower-words cat cat-cat dog dog-dog horse horse-horse</pre> <h2><a name="_flatten(X,_EPS)"></a><tt>_flatten(X, EPS)</tt></h2> This function converts a transducer into an automaton where input-output symbols are interleaved. Since epsilons cannot be retained, they need to be converted to the <tt>EPS</tt> symbol. For example: <pre class="prettyprint">regex _flatten(a:0 0:b c, "EPS");</pre> produces an automaton which accepts only the word: <pre class="prettyprint">a EPS EPS b c c</pre> <h2><a name="_addloop(L,_a:b)"></a><tt>_addloop(L, a:b)</tt></h2> This function adds self-loops to all states in <tt>L</tt> with the symbol pair <tt>a:b</tt>. <tt>a</tt> or <tt>b</tt> may be <tt>0</tt>. <h2><a name="_addfinalloop(L,_a:b)"></a><tt>_addfinalloop(L, a:b)</tt></h2> Like <tt>_addloop(L, a:b)</tt> but only adds loops at final states. <h2><a name="_addnonfinalloop(L,_a:b)"></a><tt>_addnonfinalloop(L, a:b)</tt></h2> Like <tt>_addloop(L, a:b)</tt> but only adds loops at nonfinal states. <h2><a name="_leftrewr(L,_a:b)"></a><tt>_leftrewr(L, a:b)</tt></h2> This is a fast low-level single-symbol rewrite with left contexts only. It is sometimes useful for constructing complex transducers faster than with the generic operations. <pre class="prettyprint">_leftrewr(L, a:b)</pre> is semantically equivalent to <pre class="prettyprint">a -> b || .#. L _</pre> while <pre class="prettyprint">_leftrewr(?* L, a:b)</pre> is equivalent to: <pre class="prettyprint">a -> b || L _</pre> <h2><a name="_marktail(L,_a:b)"></a><tt>_marktail(L, a:b)</tt></h2> The function converts L into a single-symbol insertion transducer. As the above, this is a fast state/arc manipulation function that produces a transducer equivalent to a certain operation. In this case: <pre class="prettyprint"> _marktail(?* L, 0:x)</pre> is equivalent to <pre class="prettyprint">~$x .o. [..] -> x || L _ ;</pre> and <pre class="prettyprint">_marktail(?* R.r, 0:x).r</pre> is equivalent to <pre class="prettyprint">~$x .o. [..] -> x || _ R</pre> <h1><a name="First-order_logic_over_substrings_(experimental)"></a>First-order logic over substrings (experimental)</h1> <i>SUPPORT FOR FIRST-ORDER LOGIC DISCONTINUED from 0.9.16 onward</i> Foma also has a compiler for a type of first-order logic over substrings. Statements in the first-order logic must begin with a parenthesized quantification of a variable. When compiling statements in first-order logic, parenthesis symbols <tt>(</tt> and <tt>)</tt> lose their optionality meaning and work as grouping symbols. See the article on first-order logic for details.   <table class="wikitable"> <tbody> <tr> <td>Connectives</td> <td></td> </tr> <tr> <td><tt>∨</tt></td> <td>Disjunction</td> </tr> <tr> <td><tt>∧</tt></td> <td>Conjunction</td> </tr> <tr> <td><tt>→</tt></td> <td>Implication</td> </tr> <tr> <td><tt>↔</tt></td> <td>Biconditional</td> </tr> <tr> <td><tt>¬</tt></td> <td>Negation</td> </tr> </tbody> </table>     <table class="wikitable"> <tbody> <tr> <td>Quantifiers</td> <td>Type</td> </tr> <tr> <td>∀</td> <td>universal</td> </tr> <tr> <td>∃</td> <td>existential</td> </tr> </tbody> </table>     <table class="wikitable"> <tbody> <tr> <td>Available predicates</td> </tr> <tr> <td><tt>x ∈ L</tt></td> <td>the substring x is a member of language L</td> </tr> <tr> <td><tt>_S(x,L)</tt></td> <td>the substring x is followed immediately by a substring from L</td> </tr> <tr> <td><tt>x = y</tt></td> <td>the position of substrings x and y are identical</td> </tr> <tr> <td><tt>x ≠ y</tt></td> <td>equivalent to (x = y)</td> </tr> <tr> <td><tt>x ≺ y</tt></td> <td>substring x precedes y</td> </tr> <tr> <td><tt>x ≻ y</tt></td> <td>substring x succeeds y</td> </tr> <tr> <td><tt>x ≤ y</tt></td> <td>substring x precedes or is in equal position with y</td> </tr> <tr> <td><tt>x ≥ y</tt></td> <td>substring x succeeds or is in equal position with y</td> </tr> </tbody> </table>   Examples: <pre class="prettyprint">regex (∃x)(∃y)(x ∈ a ∧ y ∈ a ∧ x ≠ y);</pre> <ul> <li>denotes the languages that contains two instances of the string <tt>a</tt>, in different positions.</li> </ul> <pre class="prettyprint">regex (∃x)(x ∈ a) & (∃y)(y ∈ b);</pre> <ul> <li>denotes the language that contains both a substring <tt>a</tt> and a substring <tt>b</tt>.</li> </ul> <pre class="prettyprint">regex (∃x)(x ∈ L ∧ ¬(∃y)(y ∈ L ∧ ¬(x = y) ) );</pre> <ul> <li>the language where only exactly one substring from the language <tt>L</tt> is present. Equivalent to <tt>$.L</tt>.</li> </ul> <pre class="prettyprint">regex (∀y)( (y ∈ x) → (_S(a,y) ∧ _S(y,b) ) ∨ (_S(c,y) ∧ _S(y,d) ) );</pre> <ul> <li>the language where each instance of the string <tt>x</tt> is surrounded by <tt>a</tt> and <tt>b</tt>, or <tt>c</tt> and <tt>d</tt>. Equivalent to:</li> </ul> <pre class="prettyprint">regex x => a _ b , c _ d;</pre> <h1><a name="Operator_precedence"></a>Operator precedence</h1> The following table gives the operator precedence of all regular expression operators, from highest to lowest.   <table class="wikitable"> <tbody> <tr> <td>Operator precedence</td> </tr> <tr> <td><tt>\ </tt> : + * %%^%% .l .u .i .f .r ~ $ $. $? / /// \\\ /\/ (concatenation) > < | & - .P. .p. => -> (->) @-> etc. <> .x. .o. .O.

 

Reserved symbols

The following is a table of all reserved symbols in foma regular expressions, showing the character, the official Unicode character name, and its Code Point. These need to be escaped (by % or enclosing a string in quotes) for their literal meaning in regular expressions.

 

Character Character Name Code point
! EXCLAMATION MARK U+0021
" QUOTATION MARK U+0022
# NUMBER SIGN U+0023
$ DOLLAR SIGN U+0024
% PERCENT SIGN U+0025
& AMPERSAND U+0026
( LEFT PARENTHESIS U+0028
) RIGHT PARENTHESIS U+0029
* ASTERISK U+002A
+ PLUS SIGN U+002B
, COMMA U+002C
- HYPHEN-MINUS U+002D
. FULL STOP U+002E
/ SOLIDUS U+002F
0 DIGIT ZERO U+0030
: COLON U+003A
; SEMICOLON U+003B
< LESS-THAN SIGN U+003C
> GREATER-THAN SIGN U+003E
? QUESTION MARK U+003F
[ LEFT SQUARE BRACKET U+005B
\ REVERSE SOLIDUS U+005C
] RIGHT SQUARE BRACKET U+005D
^ CIRCUMFLEX ACCENT U+005E
_ LOW LINE U+005F
` GRAVE ACCENT U+0060
{ LEFT CURLY BRACKET U+007B
|| VERTICAL LINE U+007C
} RIGHT CURLY BRACKET U+007D
~ TILDE U+007E
¬ NOT SIGN U+00AC
¹ SUPERSCRIPT ONE U+00B9
× MULTIPLICATION SIGN U+00D7
Σ GREEK CAPITAL LETTER SIGMA U+03A3
ε GREEK SMALL LETTER EPSILON U+03B5
SUPERSCRIPT MINUS U+207B
SUBSCRIPT ONE U+2081
SUBSCRIPT TWO U+2082
RIGHTWARDS ARROW U+2192
LEFT RIGHT ARROW U+2194
FOR ALL U+2200
THERE EXISTS U+2203
EMPTY SET U+2205
ELEMENT OF U+2208
RING OPERATOR U+2218
PARALLEL TO U+2225
LOGICAL AND U+2227
LOGICAL OR U+2228
INTERSECTION U+2229
UNION U+222A
LESS-THAN OR EQUAL TO U+2264
GREATER-THAN OR EQUAL TO U+2265
PRECEDES U+227A
SUCCEEDS U+227B