Tuesday, February 27, 2007

Perl Regular Expressions

To match the contents of a variable against a regular expression, use the =~ operator. Regular expressions are also used by Perl built-in functions such as grep and split, and by the s (substitute) operator.
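As a minimal sketch of the =~ operator, the following binds a variable to a pattern and records whether it matched. The sample string and variable names are made up for illustration. The negated form !~ is not mentioned above, but it is a standard Perl operator.

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for illustration.
my $line = "Perl Regular Expressions";

# =~ matches the variable on the left against the pattern on the right.
my $has_regular = ($line =~ /Regular/) ? 1 : 0;

# !~ is the negated form: true when the pattern does NOT match.
my $lacks_python = ($line !~ /Python/) ? 1 : 0;

print "contains 'Regular': $has_regular\n";
print "lacks 'Python': $lacks_python\n";
```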

Perl supports a very full set of elements within its regular expressions. Most of them are terse, which makes them hard for a newcomer to follow when maintaining code. Perl's regular expression syntax predates the POSIX standard, and so does not follow it.

Perl 6, currently under development, will support grammars and rules rather than regular expressions. Grammars and rules will take pattern matching to a whole new level, and tools will be available to convert code. In other words, rules and grammars will do everything that the old regular expressions did, and more.

Operator Type
    Examples            Description

Literal Characters (match a character exactly)
    a A y 6 % @         Letters, digits and many special characters match exactly
    \$ \^ \+ \\ \?      Precede other special characters with a \ to cancel their regex special meaning
    \n \t \r            Literal new line, tab, return
    \cJ \cG             Control codes
    \xa3                Hex codes for any character
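A short sketch of literal matching, with each escape written out in full with its backslash. The sample strings are hypothetical, and \x41 (the hex code for "A") is used here instead of \xa3 only to keep the example free of encoding concerns.

```perl
use strict;
use warnings;

# Single quotes, so Perl does not try to interpolate $5 in the string.
my $price = 'Total: $5.00 + tax?';

# \$ \. \+ and \? match the literal characters $ . + and ?
my $found_price = ($price =~ /\$5\.00 \+ tax\?/) ? 1 : 0;

# \t matches a tab; \x41 is the hex code for "A".
my $record    = "A\tB";
my $found_tab = ($record =~ /\x41\tB/) ? 1 : 0;

# \cJ is Control-J, which is the line feed character.
my $found_ctrl = ("line one\n" =~ /\cJ/) ? 1 : 0;

print "price: $found_price, tab: $found_tab, control: $found_ctrl\n";
```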

Anchors and assertions
    ^                   Starts with
    $                   Ends with
    \b \B               On a word boundary, NOT on a word boundary
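The anchors can be compared side by side by filtering one list four ways. This is only a sketch: the word list is invented, and grep is used simply as a convenient way to collect the strings that match.

```perl
use strict;
use warnings;

# Hypothetical word list for illustration.
my @words = ("cat", "concat", "category", "the cat sat");

my @starts = grep { /^cat/ }     @words;   # string starts with "cat"
my @ends   = grep { /cat$/ }     @words;   # string ends with "cat"
my @whole  = grep { /\bcat\b/ }  @words;   # "cat" as a whole word
my @inside = grep { /\Bcat/ }    @words;   # "cat" NOT at the start of a word

print "starts: @starts\n";   # cat category
print "ends:   @ends\n";     # cat concat
print "whole:  @whole\n";    # cat the cat sat
print "inside: @inside\n";   # concat
```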

Character groups (any 1 character from the group)
    [aAeEiou]           Any character listed from [ to ]
    [^aAeEiou]          Any character except a, A, e, E, i, o or u
    [a-fA-F0-9]         Any hex character (0 to 9, a to f or A to F)
    .                   Any character at all (not new line in some circumstances)
    \s                  Any space character (space, \n, \r or \t)
    \w                  Any word character (letter, digit or _)
    \d                  Any digit (0 through 9)
    \S \W \D            Any character that is NOT a space, word character or digit respectively
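The character groups can be checked one character at a time. In this sketch the test strings are made up, and each result is stored as 1 (matched) or 0 (did not match).

```perl
use strict;
use warnings;

# Each pattern below consumes exactly one character per group.
my $hex_ok     = ("F" =~ /^[a-fA-F0-9]$/) ? 1 : 0;   # F is a hex character
my $hex_bad    = ("G" =~ /^[a-fA-F0-9]$/) ? 1 : 0;   # G is not
my $has_vowel  = ("rhythm" =~ /[aAeEiou]/)  ? 1 : 0; # no listed vowel present
my $has_other  = ("rhythm" =~ /[^aAeEiou]/) ? 1 : 0; # some other character is
my $dot_nl     = ("\n" =~ /./)    ? 1 : 0;           # . skips new line by default
my $tab_space  = ("\t" =~ /\s/)   ? 1 : 0;           # tab counts as space
my $only_space = (" \t" =~ /\S/)  ? 1 : 0;           # nothing that is NOT a space
my $shape      = ("a_7" =~ /^\w\w\d$/) ? 1 : 0;      # word, word, digit

print "$hex_ok $hex_bad $has_vowel $has_other $dot_nl $tab_space $only_space $shape\n";
# prints: 1 0 0 1 0 1 0 1
```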

Counts (apply to previous element)
    +                   1 or more (“some”)
    *                   0 or more (“perhaps some”)
    ?                   0 or 1 (“perhaps a”)
    {4}                 Exactly 4
    {4,}                4 or more
    {4,8}               Between 4 and 8

Add a ? after any count to make it sparse (match as few as possible) rather than the default greedy (match as many as possible).
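The difference between a greedy and a sparse count is easiest to see on a string with more than one possible end point. The following sketch uses invented sample strings, and uses parentheses only to capture what each pattern matched.

```perl
use strict;
use warnings;

# Hypothetical sample text for illustration.
my $html = "<b>bold</b> and <i>italic</i>";

# Greedy: .+ takes as much as it can, so the match runs to the LAST >
my ($greedy) = $html =~ /(<.+>)/;

# Sparse: .+? takes as little as it can, so the match stops at the FIRST >
my ($sparse) = $html =~ /(<.+?>)/;

# Fixed and ranged counts applied to \d
my $four_exact = ("2007"      =~ /^\d{4}$/)   ? 1 : 0;   # exactly 4 digits
my $four_plus  = ("12345"     =~ /^\d{4,}$/)  ? 1 : 0;   # 4 or more
my $too_many   = ("123456789" =~ /^\d{4,8}$/) ? 1 : 0;   # 9 digits is over 8

# ? makes the preceding element optional
my $optional_u = ("color" =~ /^colou?r$/) ? 1 : 0;

print "greedy: $greedy\n";   # the whole string
print "sparse: $sparse\n";   # <b>
print "$four_exact $four_plus $too_many $optional_u\n";   # 1 1 0 1
```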

Alternation
    |                   Either, or

Grouping
    ( )                 Group for count and save to variable
    (?: )               Group for count but do not save
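A sketch of alternation and the two kinds of grouping, using a date string as sample input. The variable names are illustrative. In list context a successful match returns the saved groups, and after a match in scalar context the first saved group is available in $1.

```perl
use strict;
use warnings;

# Sample input for illustration.
my $date = "Tuesday, February 27, 2007";

# | chooses between alternatives; each ( ) saves what it matched.
my ($month, $day, $year) =
    $date =~ /(January|February|March) (\d+), (\d{4})/;

# (?: ) groups the alternatives without saving them,
# so only the outer ( ) produces a saved value.
my ($day_name) = $date =~ /^((?:Mon|Tues|Wednes)day)/;

# Because (?: ) is not saved, (cat|dog) is the FIRST saved group.
my $animal = "";
if ("grey cat" =~ /gr(?:a|e)y (cat|dog)/) {
    $animal = $1;
}

print "$month / $day / $year\n";   # February / 27 / 2007
print "$day_name\n";               # Tuesday
print "$animal\n";                 # cat
```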

Variables
    $xyz                Insert contents of $xyz into regular expression
    \1 \2               Back reference to 1st, 2nd etc matched groups
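The sketch below shows a variable being inserted into a pattern and a back reference finding a doubled word. The sample strings are invented. The \Q...\E quoting shown at the end is not covered above; it is included as a standard Perl feature that is useful when the inserted variable may contain special characters.

```perl
use strict;
use warnings;

# Hypothetical sample data for illustration.
my $wanted = "Perl";
my $title  = "Perl Regular Expressions";

# The contents of $wanted become part of the pattern.
my $found = ($title =~ /^$wanted\b/) ? 1 : 0;

# \1 must match the same text that the first ( ) saved,
# so this finds a word that is immediately repeated.
my ($repeated) = "this is is a test" =~ /\b(\w+) \1\b/;

# An inserted variable is treated as a pattern, so its . is "any character".
my $literal = "a.c";
my $loose   = ("abc" =~ /^$literal/)       ? 1 : 0;   # . matches the b
my $quoted  = ("abc" =~ /^\Q$literal\E/)   ? 1 : 0;   # \Q..\E makes the . literal

print "found: $found, repeated: $repeated\n";   # found: 1, repeated: is
print "loose: $loose, quoted: $quoted\n";       # loose: 1, quoted: 0
```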


After the closing / of your regular expression, you can add one or more modifiers to change its behaviour.

Modifier    Description

i           Ignore case in matching
g           Global match. Return a list of all matches (list context) or return the next match (scalar context)
x           White space in the pattern is ignored and # starts a comment (otherwise white space matches exactly)
s           . matches everything including new line (otherwise it matches everything except new line)
m           ^ and $ also match at embedded new lines
o           Tell the compiler that the regular expression doesn’t change even if it includes a variable reference
e           s operator only. Evaluate the replacement as Perl code and substitute in the result
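Most of the modifiers can be tried on one multi-line string. This is a sketch with invented sample data; the o modifier is left out because its effect is on compilation rather than on what is matched.

```perl
use strict;
use warnings;

# Hypothetical three-line sample for illustration.
my $text = "Perl one\nperl two\nPERL three";

# i: ignore case
my $plain  = ($text =~ /^perl/)  ? 1 : 0;   # 0: the string starts with "Perl"
my $nocase = ($text =~ /^perl/i) ? 1 : 0;   # 1

# g in list context: every match is returned
my @all = $text =~ /perl/gi;                # Perl perl PERL

# m: $ also matches before each embedded new line
my @line_ends = $text =~ /(\w+)$/gm;        # one two three

# s: . is allowed to match the new line
my $dot_default = ($text =~ /one.perl/)  ? 1 : 0;   # 0
my $dot_s       = ($text =~ /one.perl/s) ? 1 : 0;   # 1

# x: white space and comments inside the pattern are ignored
my ($year, $month, $day) = "2007-02-27" =~ /
    ^ (\d{4})     # year
    - (\d{2})     # month
    - (\d{2}) $   # day
/x;

# e: the replacement is evaluated as Perl code
(my $doubled = "3 apples") =~ s/(\d+)/$1 * 2/e;     # 6 apples

print "@all | @line_ends | $year $month $day | $doubled\n";
```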

The following Perl functions and operators use regular expressions

Function / Operator    Use

(none)                 A regular expression written without an operator is matched against the contents of the $_ variable
=~                     Match the regular expression to the right against the variable to the left
s                      Substitute the text matched by the regular expression with a replacement string
grep                   Filter a list for all member scalars that match the regular expression
split                  Split a scalar into a list, dividing the elements at the regular expression
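The following sketch runs each of these on a small list of configuration-style lines. The data and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input lines for illustration.
my @lines = ("alpha=1", "# comment", "beta=2");

# No operator: inside the loop each element is in $_,
# and a bare pattern is matched against $_.
my @settings;
for (@lines) {
    next if /^#/;            # skip lines that start with #
    push @settings, $_;
}

# grep: keep only the members that match
my @comments = grep { /^#/ } @lines;

# split: divide a scalar at each match of the pattern
my ($key, $value) = split /=/, $settings[0];

# s with =~: substitute within the variable on the left
my $renamed = $settings[1];
$renamed =~ s/beta/gamma/;

print "settings: @settings\n";     # alpha=1 beta=2
print "comments: @comments\n";     # # comment
print "key=$key value=$value\n";   # key=alpha value=1
print "renamed: $renamed\n";       # gamma=2
```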


The above lists show the most commonly used elements of Perl regular expressions, and are not exhaustive.

In Perl, you can change the / regular expression delimiter to almost any other special character if you precede it with the letter m (for match). If you choose (, { or [ as the opening delimiter, the closing delimiter becomes ), } or ] respectively.
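Changing the delimiter is most useful when the pattern itself contains slashes, as in this sketch with a made-up file path. The s operator accepts alternative delimiters in the same way, which is shown in the last line as an additional illustration.

```perl
use strict;
use warnings;

# Hypothetical path for illustration.
my $path = "/usr/local/bin/perl";

# With / as the delimiter, every / in the pattern needs a backslash.
my $with_slashes = ($path =~ /^\/usr\/local\//) ? 1 : 0;

# With m and a balanced pair, the slashes can be written plainly.
my $with_braces = ($path =~ m{^/usr/local/}) ? 1 : 0;

# Any other special character works as both opening and closing delimiter.
my $with_bang = ($path =~ m!^/usr/local/!) ? 1 : 0;

# s takes paired delimiters too: s{pattern}{replacement}
(my $short = $path) =~ s{/local}{};

print "$with_slashes $with_braces $with_bang\n";   # 1 1 1
print "$short\n";                                   # /usr/bin/perl
```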

Posted by Babai at 09:28:57
