Regular Expressions
A regular expression stands for a class of character strings rather than one particular string. For example, suppose that you want to find all interactions that contain U.S. Zip codes. U.S. Zip codes are five-digit numbers, so in theory you could write 100,000 screening rules (`Find("00000")`, `Find("00001")`, `Find("00002")`, and so on).
Fortunately, you can use the special symbol `\d`, which stands for any digit, to write a single screening rule with a regular expression: `RegExFind("\d\d\d\d\d")`. This screening rule matches any sequence of five digits.
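As a minimal sketch, the rule can be modeled with Java's `java.util.regex` package, because the product uses Java classes for regular expressions. The `ZipCodeRule` class and its `regExFind` method are hypothetical stand-ins, not the product's API. The sketch also assumes that `RegExFind` succeeds when the pattern occurs anywhere in the text, which corresponds to `Matcher.find()`. Inside a Java string literal each backslash is doubled, so `\d` is written `"\\d"`.

```java
import java.util.regex.Pattern;

class ZipCodeRule {
    // Hypothetical stand-in for the RegExFind screening rule. Assumption:
    // the rule succeeds if the pattern occurs anywhere in the text.
    static boolean regExFind(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // One pattern replaces a separate Find rule for every five-digit string.
        String fiveDigits = "\\d\\d\\d\\d\\d"; // Java literal for \d\d\d\d\d
        System.out.println(regExFind(fiveDigits, "Ship to Boston, MA 02134")); // true
        System.out.println(regExFind(fiveDigits, "Call ext. 1234"));           // false: only four digits
    }
}
```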
There are often several ways of writing the same regular expression. For instance, two items separated by a hyphen and enclosed in square brackets denote a range whose endpoints are those two items. So `[a-d]` matches a, b, c, or d, and `[5-8]` matches any digit from 5 through 8; hence `\d` is the same as `[0-9]`.
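The range notation can be checked with a short sketch that uses `Pattern.matches`, which tests whether the whole input matches the pattern. `RangeDemo` and `digitClassesAgree` are illustrative names, and the equivalence of `\d` and `[0-9]` shown here holds in Java's default matching mode.

```java
import java.util.regex.Pattern;

class RangeDemo {
    // Returns true when \d and [0-9] accept or reject each character of the
    // input identically, testing one character at a time.
    static boolean digitClassesAgree(String chars) {
        for (char c : chars.toCharArray()) {
            String s = String.valueOf(c);
            if (Pattern.matches("\\d", s) != Pattern.matches("[0-9]", s)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(Pattern.matches("[a-d]", "c")); // true: c lies between a and d
        System.out.println(Pattern.matches("[a-d]", "e")); // false
        System.out.println(Pattern.matches("[5-8]", "8")); // true: endpoints are included
        System.out.println(Pattern.matches("[5-8]", "9")); // false
        System.out.println(digitClassesAgree("0123456789abc x")); // true
    }
}
```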
The table "Elements of Regular Expressions" lists some of the most commonly used elements of regular expressions:
Symbol | Meaning | Example |
---|---|---|
`.` | Any single character, including a space | `b.t` matches bat, bet, bit, and but. |
`\d` | Any digit | `\d\d` matches any pair of digits from 00 to 99. |
`\s` | Any whitespace character, such as a space or a tab | `\d\s\d` matches 1 0, 5 9, and so on. |
`*` | Zero or more instances of the preceding expression | `o*f` matches oof, of, and f. |
`+` | One or more instances of the preceding expression | `bre+d` matches bred, breed, and breeed. |
`?` | Zero or one instance of the preceding expression | `c?rude` matches rude and crude. |
`{x}` | Exactly x instances of the preceding expression | `st.{2}k` matches steak, stork, and stink. |
`^` | When placed first inside square brackets, any character except those that follow it | `s[^e]t` matches sat, sit, and sot, but not set. |
`[ ]` | Any one of the characters or ranges within the brackets | `b[aeiou]at` matches boat but not brat. |
`\` | Turns off the special meaning of the following symbol | `\*` matches the character * (asterisk); `\.` matches the character . (period or full stop). |
`\|` | Or | `(b\|p)ig` matches big and pig. Do not be confused: `\|` means or in regular expressions, but `\|\|` means or as one of the Operators used in screening rule formulas. |
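The sketch below checks several of the table's examples with `java.util.regex`. It tests whole words with `Pattern.matches`, which is stricter than the find-anywhere behavior assumed for `RegExFind`; the `TableExamples` class and its rows are illustrative. Backslashes are doubled because the patterns are Java string literals. The last row shows that inside square brackets `|` is an ordinary character, so `[b|p]ig` also accepts "|ig"; grouping the alternatives with parentheses, as in `(b|p)ig`, expresses the Or without that side effect.

```java
import java.util.regex.Pattern;

class TableExamples {
    // Whole-word check: does the pattern match the entire word?
    static boolean matches(String regex, String word) {
        return Pattern.matches(regex, word);
    }

    public static void main(String[] args) {
        // Each row: {pattern (Java-escaped), word, expected result}.
        String[][] rows = {
            {"b.t", "but", "true"},
            {"\\d\\d", "42", "true"},
            {"\\d\\s\\d", "5 9", "true"},
            {"o*f", "f", "true"},
            {"bre+d", "breeed", "true"},
            {"bre+d", "brd", "false"},
            {"c?rude", "rude", "true"},
            {"st.{2}k", "stork", "true"},
            {"s[^e]t", "set", "false"},
            {"b[aeiou]at", "boat", "true"},
            {"b[aeiou]at", "brat", "false"},
            {"\\*", "*", "true"},
            {"(b|p)ig", "pig", "true"},
            // Inside brackets, | is literal, so this class also accepts "|".
            {"[b|p]ig", "|ig", "true"},
        };
        for (String[] r : rows) {
            boolean actual = matches(r[0], r[1]);
            System.out.printf("%-12s %-8s %b%n", r[0], r[1], actual);
            if (actual != Boolean.parseBoolean(r[2])) {
                throw new AssertionError(r[0] + " on " + r[1]);
            }
        }
    }
}
```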
Here are some other points to keep in mind:
- Space is just another character. The regular expression `savings account` contains a space, so it does not match the string savingsaccount.
- Word boundaries are not considered. The regular expression `read` matches not only read but also reader, ready, spread, bread, and so on.
- Use parentheses to group parts of a regular expression. For example, `RegExFind("(\d{3}\.){2}")` puts `\d{3}\.` in parentheses so that the repetition count `{2}` applies to all of `\d{3}\.`, not just to `\.`. This expression matches three digits followed by a period, twice in a row (for example, 198.351.). Further examples are provided in Examples of Screening Rules.
- Regular expressions use many more special characters and operators than those listed in the table "Elements of Regular Expressions." General tutorials are widely available, but because Genesys Knowledge Management uses Java classes for regular expressions, it is best to consult documentation for the regular expression syntax of the Java version in use, such as the reference for the `java.util.regex.Pattern` class.
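The points in this list can be illustrated with a small Java sketch. As before, it assumes that `RegExFind` succeeds when the pattern occurs anywhere in the text (modeled with `Matcher.find()`); `RulePointsDemo` is an illustrative name. The `\b` word-boundary lines rely on Java's own syntax and assume the product passes patterns through to `java.util.regex` unchanged.

```java
import java.util.regex.Pattern;

class RulePointsDemo {
    // Assumption: RegExFind succeeds if the pattern occurs anywhere in the
    // text, modeled here with Matcher.find().
    static boolean find(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // Space is an ordinary character, so it must appear in the text.
        System.out.println(find("savings account", "savingsaccount")); // false
        // No implicit word boundaries: "read" occurs inside "spread".
        System.out.println(find("read", "spread the news"));           // true
        // Java's \b anchor requires a word boundary at each end.
        System.out.println(find("\\bread\\b", "spread the news"));     // false
        System.out.println(find("\\bread\\b", "read it"));             // true
        // Parentheses make {2} repeat the whole group \d{3}\.
        System.out.println(find("(\\d{3}\\.){2}", "IP 198.351.7"));   // true
        // Without them, {2} applies only to \. and requires "..".
        System.out.println(find("\\d{3}\\.{2}", "IP 198.351.7"));     // false
    }
}
```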