Perl Regular Expressions
To match the contents of a variable against a regular expression, use the =~ operator. Regular expressions are also used by Perl built-in functions such as grep and split, and by the s operator.
Perl offers a very rich set of elements within its regular expressions, most of which are terse and therefore hard for a newcomer to follow when maintaining code. Perl's regular expression syntax predates the POSIX standard, and so does not follow it.
Perl 6, currently under development, will support grammars and rules rather than regular expressions. Grammars and rules will take pattern matching to a whole new level, and tools will be available to convert code. In other words, rules and grammars will do everything that the old regular expressions did, and more.
| Operator Type | Examples | Description |
|---|---|---|
| Literal Characters | a A y 6 % @ | Letters, digits and many special characters match themselves |
| | \$ \^ \+ \\ \? | Precede other special characters with \ to match them literally |
| | \n \t \r | Literal new line, tab, return |
| | \cJ \cG | Control codes |
| | \xa3 | Hex codes for any character |
| Anchors and assertions | ^ | Starts with |
| | $ | Ends with |
| | \b \B | On a word boundary, not on a word boundary |
| Character groups | [aAeEiou] | Any character listed from [ to ] |
| | [^aAeEiou] | Any character except a, A, e, E, i, o or u |
| | [a-fA-F0-9] | Any hex character (0 to 9, a to f or A to F) |
| | . | Any character at all (except new line, unless the s modifier is used) |
| | \s | Any space character (space, \n, \r or \t) |
| | \w | Any word character (letter, digit or _) |
| | \d | Any digit (0 through 9) |
| | \S \W \D | Any character that is NOT a space, NOT a word character, NOT a digit |
| Counts | + | 1 or more (“some”) |
| | * | 0 or more (“perhaps some”) |
| | ? | 0 or 1 (“perhaps a”) |
| | {4} | Exactly 4 |
| | {4,} | 4 or more |
| | {4,8} | Between 4 and 8 |
| | | Add a ? after any count (for example +? or {4,8}?) to make it sparse (match as few as possible) rather than greedy, which is the default |
| Alternation | \| (a single vertical bar) | Either, or |
| Grouping | ( ) | Group for count and save to variable |
| | (?: ) | Group for count but do not save |
| Variables | $xyz | Insert contents of $xyz into regular expression |
| | \1 \2 | Back reference to 1st, 2nd etc. matched groups |
After the closing / of your regular expression, you can add one or more modifiers to change its behaviour.
| Modifier | Description |
|---|---|
| i | Ignore case in matching |
| g | Global match. Return a list of all matches (list context) or return the next match (scalar context) |
| x | White space in the pattern is ignored and # starts a comment (otherwise white space matches exactly) |
| s | . matches everything including new line (otherwise it matches everything except new line) |
| m | ^ and $ also match at embedded new lines |
| o | Tell the compiler that the regular expression doesn’t change even if it includes a variable reference |
| e | s operator only. Evaluate the replacement as Perl code and substitute in the result |
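As a rough illustration of how the modifiers change behaviour, here is a small sketch. The sample text and variable names are hypothetical and are not taken from any particular program.

```perl
use strict;
use warnings;

# Hypothetical three-line sample string.
my $text = "Red apple\ngreen pear\nRED plum";

# i and g together: ignore case, and return every match in list context.
my @reds = $text =~ /red/ig;

# g in scalar context: each pass of the loop finds the next match.
# m lets $ match just before each embedded new line.
my @last_words;
while ($text =~ /(\w+)$/mg) {
    push @last_words, $1;
}

# s: without it . will not match the new line between the two words.
my $without_s = $text =~ /apple.green/  ? 1 : 0;
my $with_s    = $text =~ /apple.green/s ? 1 : 0;

# x: white space in the pattern is ignored and # starts a comment.
my ($colour, $fruit) = $text =~ /
    ^ (\w+)     # first word on a line
    \s+
    (pear)      # followed by this fruit
/mx;

# e: the replacement is evaluated as Perl code (s operator only).
(my $priced = "3 apples") =~ s/(\d+)/$1 * 2/e;

print "@reds | @last_words | $without_s $with_s | $colour $fruit | $priced\n";
```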
The following Perl functions and operators use regular expressions:
| Function / Operator | Use |
|---|---|
| (no operator) | If you write a regular expression without an operator, it matches the regular expression against the contents of the $_ variable |
| =~ | Match the regular expression to the right against the variable to the left |
| s | Substitute the matched regular expression with a replacement string |
| grep | Filter a list for all member scalars that match the regular expression |
| split | Split a scalar into a list, dividing the elements at the regular expression |
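Each of these can be tried out in a few lines. The sketch below uses an invented list of strings, and the variable names are assumptions made only for the example.

```perl
use strict;
use warnings;

# Hypothetical input list.
my @lines = ("alpha,beta;gamma", "no separators", "one , two");

# No operator: the pattern is matched against $_, set here by the for loop.
my $count = 0;
for (@lines) {
    $count++ if /[,;]/;
}

# =~ : match against a named variable instead of $_.
my $has_alpha = $lines[0] =~ /^alpha/ ? 1 : 0;

# s : substitute the matched text; a copy is changed so the original is kept.
(my $tidy = $lines[2]) =~ s/\s*,\s*/, /;

# grep: keep only the list members that match.
my @with_sep = grep { /[,;]/ } @lines;

# split: divide a scalar into a list wherever the pattern matches.
my @fields = split /[,;]/, $lines[0];

print "$count lines have separators; fields: @fields; tidy: $tidy\n";
```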
The above lists show the most commonly used elements of Perl regular expressions, and are not exhaustive.
In Perl, you can change the / regular expression delimiter to almost any other special character if you precede it with the letter m (for match); if you change to ( { or [, the closing delimiter becomes the balancing ) } or ].
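Changing the delimiter is most useful when the pattern itself contains slashes. A short sketch, with a made-up path and variable names, might look like this; the last line goes slightly beyond the text above by using braces with the s operator, which Perl also allows.

```perl
use strict;
use warnings;

# Hypothetical file path used only as test data.
my $path = "/usr/local/bin/perl";

# Default delimiter: every / inside the pattern has to be escaped.
my $slashes = $path =~ /^\/usr\/local\// ? 1 : 0;

# m with balanced braces: no escaping needed, and } closes what { opened.
my $braces = $path =~ m{^/usr/local/} ? 1 : 0;

# m with square brackets as the balanced pair.
my $brackets = $path =~ m[^/usr] ? 1 : 0;

# m with a single special character used at both ends.
my $program = $path =~ m!^/usr/local/bin/(\w+)! ? $1 : "";

# The s operator accepts alternative delimiters in the same way.
(my $moved = $path) =~ s{^/usr/local}{/opt};

print "$slashes $braces $brackets $program $moved\n";
```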