
Regular Expressions

Upload: esmond-edwards

Post on 05-Jan-2016


Page 1: Regular Expressions. Overview Regular expressions allow you to do complex searches within text documents. Examples: Search 8-K filings for restatements

Regular Expressions


Overview

Regular expressions allow you to do complex searches within text documents.

Example: Search 8-K filings for restatements. A Boolean search for “restate” would yield too many “false positives.”

Regular expressions provide tremendous flexibility.


Getting Started

Open your “RegexBuddy” program.

We are going to build regular expressions to find specific text in this document using a variety of “Tokens.”


Specifying Literal Text

Literal defined: a literal just means that the characters are to be interpreted “as is.” The application will not attempt to interpret the characters.

For example, suppose you were looking for the text “\t”. You need to tell the application that you are looking for “\t” and not a tab, because \t typically represents a tab character.
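A minimal sketch of the same idea in code, assuming Python's `re` module rather than the Perl flavor these slides use (the variable names are illustrative only). `re.escape` builds a pattern that matches the text “as is”:

```python
import re

# The two-character text we want to find: a backslash followed by "t".
needle = r"\t"

# re.escape backslash-escapes special characters so the regex engine
# treats them literally; here the resulting pattern is \\t.
literal_pattern = re.escape(needle)

# One real tab character, plus the backslash-t text spelled out.
sample = "col1\tcol2 and the text " + r"\t" + " spelled out"

# Used directly as a pattern, \t matches the real tab character.
tab_hits = re.findall(needle, sample)

# The escaped pattern matches only the backslash-t text.
literal_hits = re.findall(literal_pattern, sample)
```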


Specifying Literal Text

Click on “Insert Token,” then click on “Literal Text.”

In the text box, type “\t” and click OK. You will see “\\t” in the regular expression window. The first “\” tells Perl to interpret the following “\” literally.


Non-printable characters

• \t – Tab
• \r – Carriage return
• \n – Newline (UNIX/Linux)
• \r\n – Newline (Windows)
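As a rough illustration, these tokens can be used to split text on either style of newline. The sketch below assumes Python's `re` module (not the Perl flavor used in RegexBuddy), and the sample strings are made up:

```python
import re

unix_text = "line one\nline two"
windows_text = "line one\r\nline two"
tabbed = "name\tvalue"

# \r?\n makes the carriage return optional, so the pattern accepts both
# the UNIX/Linux newline (\n) and the Windows newline (\r\n).
newline = re.compile(r"\r?\n")

unix_lines = newline.split(unix_text)
windows_lines = newline.split(windows_text)

# \t matches the tab between the two fields.
fields = re.split(r"\t", tabbed)
```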


Dot and Short-Hand Character Classes

Dot

• . – Match any character but newline (unless modified with s)

Short-Hand Character Classes

• \w – Match any word character (includes numbers and “_”)
• \W – Match any non-word character
• \d – Match a digit character
• \D – Match a non-digit character
• \s – Match a whitespace character
• \S – Match a non-whitespace character
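A short sketch of the dot and a few short-hand classes, assuming Python's `re` module (where the “s” modifier is spelled `re.DOTALL`); the sample text is invented for illustration:

```python
import re

text = "Form 8-K filed 2016-01-05"

digits = re.findall(r"\d+", text)   # runs of digit characters
words = re.findall(r"\w+", text)    # runs of word characters
gaps = re.findall(r"\s", text)      # single whitespace characters

# The dot does not match a newline unless the "s" modifier is set.
dot_default = re.search(r"a.c", "a\nc")
dot_with_s = re.search(r"a.c", "a\nc", re.DOTALL)
```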


Character Class and Anchors

Character Class

• [456] – matches 4, 5 or 6.
• [^456] – matches anything but 4, 5 or 6.

Create an expression that matches either “Balls” or “Balks”.

Anchors

• \A – beginning of the string
• \z – end of the string
• ^ – beginning of the line
• $ – end of the line
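One possible answer to the exercise, together with the anchors, sketched with Python's `re` module as an assumption. Two differences from the Perl flavor are worth noting: end of string is written `\Z` in Python, and `^` and `$` match at each line break only when the multi-line modifier is set.

```python
import re

# A character class covers the one letter that differs.
balls_or_balks = re.findall(r"Bal[lk]s", "Balls Balks Balms")

# A negated class: every character that is not 4, 5 or 6.
not_456 = re.findall(r"[^456]", "345678")

text = "one two\nthree four"

# \A and \Z anchor to the whole string, even in multi-line mode.
string_start = re.findall(r"\A\w+", text, re.MULTILINE)
string_end = re.findall(r"\w+\Z", text, re.MULTILINE)

# ^ and $ anchor to every line when the multi-line modifier is set.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)
line_ends = re.findall(r"\w+$", text, re.MULTILINE)
```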


Alternation

Alternation is essentially “OR.”

• | – is inserted between alternatives.
• Boy|Girl – matches “Boy” or “Girl”
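A small sketch of alternation, assuming Python's `re` module. The second pattern uses a non-capturing group `(?:...)`, which is not covered in these slides, to show that parentheses limit how far the “OR” reaches:

```python
import re

# Each alternative is tried at every position in the text.
either = re.findall(r"Boy|Girl", "Boy meets Girl")

# Without parentheses, "|" splits the whole pattern in two. Grouping the
# alternatives lets the trailing "s" apply to both of them.
plurals = re.findall(r"(?:Boy|Girl)s", "Boys and Girls and Boy")
```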


Quantifiers

• x? – Match 0 or 1 occurrence of x
• x* – Match 0 or more occurrences of x
• x+ – Match 1 or more occurrences of x
• (xyz)+ – Match 1 or more occurrences of xyz
• x{m,n} – Match at least m and up to n occurrences of x
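The quantifiers above can be checked one by one. This is a sketch assuming Python's `re` module; `full` is a hypothetical helper that reports whether the entire string matches the pattern:

```python
import re

def full(pattern, text):
    """Return True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

samples = ("a", "ax", "axx")

optional = [full(r"ax?", s) for s in samples]   # 0 or 1 x
star = [full(r"ax*", s) for s in samples]       # 0 or more x
plus = [full(r"ax+", s) for s in samples]       # 1 or more x

# The quantifier applies to the whole parenthesized group.
grouped = [full(r"(xyz)+", s) for s in ("xyz", "xyzxyz", "xy")]

# At least 2 and at most 3 occurrences of x.
ranged = [full(r"x{2,3}", s) for s in ("x", "xx", "xxx", "xxxx")]
```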


Grouping and Backreferencing

• (string) – use for backreferencing
• $1 – reference to contents of first set of parentheses
• $2 – reference to contents of second set of parentheses

In regex toolkit

Put the following in the regular expression window: (.*)\s(.*)

Put the following in the “Test” window: John Smith

Select Group 2 from the highlight drop-down.
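The same experiment can be sketched in code, assuming Python's `re` module, where Perl's $1 and $2 correspond to `group(1)` and `group(2)` on the match object:

```python
import re

# Two groups separated by a single whitespace character.
match = re.match(r"(.*)\s(.*)", "John Smith")

first_group = match.group(1)    # contents of the first set of parentheses
second_group = match.group(2)   # contents of the second set of parentheses
```

Here the second group should hold the same text that RegexBuddy highlights when Group 2 is selected.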


Greediness

Normally, expressions match as many characters as possible (they are greedy).

$_ = "ab12345AB";

The substitution s/ab[0-9]*/X/ will replace as follows: XAB

We can turn off greediness by adding a “?” after the greedy quantifier (*). The substitution s/ab[0-9]*?/X/ will replace as follows: X12345AB
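A sketch of the same two substitutions, assuming Python's `re.sub` in place of Perl's s/// operator:

```python
import re

text = "ab12345AB"

# Greedy: [0-9]* takes every digit it can, so "ab12345" is replaced.
greedy = re.sub(r"ab[0-9]*", "X", text)

# Lazy: [0-9]*? takes as few digits as possible (none), so only "ab"
# is replaced.
lazy = re.sub(r"ab[0-9]*?", "X", text)
```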


Substitution of subpatterns

Remember, using () causes Perl to remember the contents.

Suppose we want to replace Fred with Freddy:

• Put “(Fred)” in the regular expression window
• Put \1dy in the replace window
• Put Fred Couples in the Test window
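The same replacement can be sketched with Python's `re.sub` (an assumption; the slides use the RegexBuddy replace window), where \1 in the replacement text refers back to the first group:

```python
import re

# (Fred) is remembered as group 1; \1dy writes it back followed by "dy".
result = re.sub(r"(Fred)", r"\1dy", "Fred Couples")
```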


Look Ahead and Look Behind

Allows you to check ahead or back for a particular pattern before continuing the match. The pattern that is checked is not included in the match itself.

/PATTERN(?=pattern)/ Positive look ahead

/PATTERN(?!pattern)/ Negative look ahead

/(?<=pattern)PATTERN/ Positive look behind

/(?<!pattern)PATTERN/ Negative look behind
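A sketch of all four forms, assuming Python's `re` module, which uses the same lookaround syntax without the surrounding slashes. The sample strings are invented for illustration:

```python
import re

words = "restatement restated"

# Positive look ahead: "restate" only when "ment" follows. The "ment"
# is checked but is not part of the match.
followed_by_ment = re.findall(r"restate(?=ment)", words)

# Negative look ahead: "restate" plus any word characters, but only
# when "ment" does not come right after "restate".
not_ment = re.findall(r"restate(?!ment)\w*", words)

amounts = "cost $45 and 30 units"

# Positive look behind: digits preceded by a dollar sign.
dollars = re.findall(r"(?<=\$)\d+", amounts)

# Negative look behind: whole numbers not preceded by a dollar sign.
# The \b keeps the match from starting in the middle of "45".
plain_numbers = re.findall(r"(?<!\$)\b\d+", amounts)
```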


Mode Modifiers

• Dot matches new lines (s in Perl)
• Case insensitive (i in Perl)
• ^$ match at line breaks (m in Perl)
• Free-spacing (x in Perl)
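For comparison, a sketch of the four modifiers in Python's `re` module, where they are passed as flags (`re.DOTALL`, `re.IGNORECASE`, `re.MULTILINE`, `re.VERBOSE`). The sample text and the date pattern are illustrative assumptions:

```python
import re

text = "Restated results\nfollow below"

# s: the dot also matches the newline between the two lines.
dot_plain = re.search(r"results.follow", text) is not None
dot_all = re.search(r"results.follow", text, re.DOTALL) is not None

# i: letter case is ignored.
any_case = re.findall(r"restated", text, re.IGNORECASE)

# m: ^ matches at the start of every line, not just the string.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)

# x: whitespace and comments inside the pattern are ignored, so a long
# expression can be laid out and documented piece by piece.
date = re.compile(r"""
    (\d{2})         # day
    -
    ([A-Za-z]{3})   # month abbreviation
    -
    (\d{4})         # year
""", re.VERBOSE)
date_parts = date.match("05-Jan-2016").groups()
```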


Note on Regex

Regular expressions can be used on many platforms besides Perl.

For example, SAS has built-in support for Perl regular expressions.