An Introduction to Regular Expressions

Upload: yamagata-europe

Post on 27-May-2015


Page 1: An Introduction to Regular expressions
Page 2: An Introduction to Regular expressions

And are they contagious?

Page 3: An Introduction to Regular expressions

There is no official standard for regular expressions, so no real definition.

Simply put, you can call it a text pattern to search and/or replace text.

Easy peasy!

Page 4: An Introduction to Regular expressions

Perl programming language

Perl-compatible

.NET

Java

JavaScript

… What, no cherry flavour?

Page 5: An Introduction to Regular expressions

Back to grammar school!

Page 6: An Introduction to Regular expressions

The regex a matches any occurrence of that character, for example both the a in Jack and the word a in:
Jack is a boy.

The regex cat matches the cat inside cats in:
About cats and dogs.

Page 7: An Introduction to Regular expressions

square bracket [
backslash \
caret ^
dollar sign $
period or dot .
vertical bar or pipe symbol |
question mark ?
asterisk or star *
plus sign +
opening round bracket (
closing round bracket )
opening curly bracket {

Page 8: An Introduction to Regular expressions

Special characters are reserved for special use. They need to be preceded by a backslash if you want to match them as literal characters. This is called escaping.

If you want to match 1+1=2, the correct regex is 1\+1=2
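The difference between the escaped and the unescaped pattern can be tried out in a few lines. This is a minimal sketch that assumes Python's re module (Python is not one of the flavours listed earlier, but it follows the same Perl-style syntax); the sample sentence is made up for illustration.

```python
import re

text = "We all know that 1+1=2."

# Unescaped, + is a quantifier: the pattern means "one or more 1s, then 1=2",
# which does not occur in the text.
unescaped = re.search(r"1+1=2", text)

# Escaped by hand, as on the slide: + is now a literal plus sign.
escaped = re.search(r"1\+1=2", text)

# re.escape adds the backslashes for you.
auto = re.search(re.escape("1+1=2"), text)

print(unescaped)        # None
print(escaped.group())  # 1+1=2
print(auto.group())     # 1+1=2
```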

Page 9: An Introduction to Regular expressions

tab \t
carriage return \r
line feed \n
beginning of line ^
end of line $
word boundary \b
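As a rough illustration of the anchors and the word boundary, here is a small sketch assuming Python's re module. The sample text is invented, and note that in Python ^ and $ only match at every line when the re.MULTILINE flag is set; other flavours and tools may behave differently by default.

```python
import re

text = "cat and concat\ncats like cat"

# \b on both sides: only the standalone word cat, not concat or cats.
whole_words = re.findall(r"\bcat\b", text)

# ^ anchors the match to the beginning of a line.
at_line_start = re.findall(r"^cat\b", text, flags=re.MULTILINE)

# $ anchors the match to the end of a line.
at_line_end = re.findall(r"\bcat$", text, flags=re.MULTILINE)

print(len(whole_words), len(at_line_start), len(at_line_end))
```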

Page 10: An Introduction to Regular expressions

If regular expressions are Unicode enabled, you can search for any character using its Unicode value.

Depending on the syntax: \u0000 or \x{0000}
hard space (non-breaking space): \u00A0 or \x{00A0}
® sign: \u00AE or \x{00AE}
...
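A short sketch of searching by Unicode value, assuming Python's re module, which accepts the \uXXXX form in patterns (the \x{...} form belongs to other flavours). The sample string is hypothetical.

```python
import re

# Sample text containing a hard space (U+00A0) and a ® sign (U+00AE).
text = "Yamagata\u00a0Europe\u00ae"

hard_spaces = re.findall(r"\u00A0", text)
registered = re.findall(r"\u00AE", text)

# A typical use: replace hard spaces by normal spaces.
cleaned = re.sub(r"\u00A0", " ", text)

print(len(hard_spaces), registered, cleaned)
```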

Page 11: An Introduction to Regular expressions

Quantifiers allow you to specify the number of occurrences to match against.

X?      X, once or not at all
X*      X, zero or more times
X+      X, one or more times
X{n}    X, exactly n times
X{n,}   X, at least n times
X{n,m}  X, at least n but not more than m times

Page 12: An Introduction to Regular expressions

The regex colou?r matches both colour and color.

You can also group items together by using brackets: Nov(ember)? will match Nov and November.

The regex a+ is the same as a{1,} and matches a or aaaaa.

The regex w{3} matches the www in www.qa-distiller.com
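The quantifier examples above can be checked with a small sketch, assuming Python's re module. The helper name matches is made up for this example; it returns the full text of every match.

```python
import re

def matches(pattern, text):
    """Return the full text of every match of pattern in text."""
    return [m.group() for m in re.finditer(pattern, text)]

optional_u = matches(r"colou?r", "colour or color")
optional_group = matches(r"Nov(ember)?", "Nov and November")
one_or_more = matches(r"a+", "a or aaaaa")
same_as_plus = matches(r"a{1,}", "a or aaaaa")
exactly_three = matches(r"w{3}", "www.qa-distiller.com")

print(optional_u)      # ['colour', 'color']
print(optional_group)  # ['Nov', 'November']
print(one_or_more)     # ['a', 'aaaaa']
print(exactly_three)   # ['www']
```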

Page 13: An Introduction to Regular expressions

Simply place the characters you want to match between square brackets. If you want to match an a or an e, use [ae]. You could use this in gr[ae]y to match either gray or grey.

A character class matches only a single character. The order of the characters inside the brackets is not important.

You can also use ranges: [0-9] matches a single digit between 0 and 9.
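A quick sketch of character classes, assuming Python's re module; the sample strings are invented. It shows that a class stands for exactly one character and that the order inside the brackets does not matter.

```python
import re

text = "gray or grey, but not graey"

# [ae] stands for exactly one character, so graey is not matched.
class_ae = re.findall(r"gr[ae]y", text)

# The order inside the brackets makes no difference.
class_ea = re.findall(r"gr[ea]y", text)

# A range: each match is a single digit.
digits = re.findall(r"[0-9]", "Room 42")

print(class_ae)  # ['gray', 'grey']
print(digits)    # ['4', '2']
```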

Page 14: An Introduction to Regular expressions

Typing a caret after the opening square bracket will negate the character class. q[^u] means: "a q followed by a character that is not a u".

It will match the q and the space after the q in
Iraq is a political quagmire.
but not the q of quagmire, because it is followed by the letter u.
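The sketch below, assuming Python's re module, reproduces the example and adds one consequence worth knowing: a negated class still has to match a character, so q[^u] does not match a q at the very end of the text.

```python
import re

sentence = "Iraq is a political quagmire."

# One match: the q of Iraq plus the space that follows it.
found = [m.group() for m in re.finditer(r"q[^u]", sentence)]

# A negated class must still consume a character,
# so a q at the end of the text is not matched.
at_end = re.search(r"q[^u]", "Iraq")

print(found)   # ['q ']
print(at_end)  # None
```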

Page 15: An Introduction to Regular expressions

\d  digit           [0-9]
\w  word character  [A-Za-z0-9_]
\s  whitespace      [ \t\r\n]

Negated versions:
\D  not a digit           [^\d]
\W  not a word character  [^\w]
\S  not a whitespace      [^\s]
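A small sketch of the shorthand classes, assuming Python's re module. In Python 3 these classes are Unicode-aware for text patterns by default, so the re.ASCII flag is used here to get the ASCII-only definitions shown above; the sample strings are made up.

```python
import re

numbers = re.findall(r"\d+", "Order 66, page 7", flags=re.ASCII)
words = re.findall(r"\w+", "snake_case rocks!", flags=re.ASCII)
pieces = re.split(r"\s+", "a \t b\nc")

# The negated version matches everything the plain version does not.
non_digits = re.findall(r"\D+", "a1b22c", flags=re.ASCII)

print(numbers)     # ['66', '7']
print(words)       # ['snake_case', 'rocks']
print(pieces)      # ['a', 'b', 'c']
print(non_digits)  # ['a', 'b', 'c']
```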

Page 16: An Introduction to Regular expressions

The dot matches a single character, without caring what that character is.

The regex e. matches every e together with the character that follows it (in we, have and problem) in:
Houston, we have a problem
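Here is the same example as a sketch, assuming Python's re module. It also shows one common exception: in Python, as in most flavours, the dot does not match a line feed unless a special mode (re.DOTALL in Python) is switched on.

```python
import re

sentence = "Houston, we have a problem"

# Each match is an e plus whatever single character follows it.
dot_matches = re.findall(r"e.", sentence)

# By default the dot does not match a line feed ...
no_newline = re.search(r"e.", "e\n")

# ... unless re.DOTALL is set.
with_newline = re.search(r"e.", "e\n", flags=re.DOTALL)

print(dot_matches)  # ['e ', 'e ', 'em']
```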

Page 17: An Introduction to Regular expressions

If you want to search for cat or dog, separate both options with a vertical bar or pipe symbol: cat|dog matches the cat in
Are you sure you want a cat?

You can add more options like this: green|black|yellow|white
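Alternation can be sketched as follows, assuming Python's re module; the second sample sentence is invented for illustration.

```python
import re

pets = re.findall(r"cat|dog", "Are you sure you want a cat?")
colours = re.findall(r"green|black|yellow|white", "A black and white photo")

print(pets)     # ['cat']
print(colours)  # ['black', 'white']
```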

Page 18: An Introduction to Regular expressions

Which of the following completely matches regex a(ab)*a
1) abababa
2) aaba
3) aabbaa
4) aba
5) aabababa

Page 19: An Introduction to Regular expressions

Which of the following completely matches regex ab+c?
1) abc
2) ac
3) abbb
4) bbc
5) abbcc

Page 20: An Introduction to Regular expressions

Which of the following completely matches regex a.[bc]+
1) abc
2) abbbbbbbb
3) azc
4) abcbcbcbc
5) ac
6) asccbbbbcbcccc

Page 21: An Introduction to Regular expressions

Which of the following completely matches regex (very )+(fat )?(tall|ugly) man
1) very fat man
2) fat tall man
3) very very fat ugly man
4) very very very tall man
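If you want to check your answers to the four quizzes, a sketch like the following will do it. It assumes Python's re module, where re.fullmatch only succeeds when the whole string matches the pattern; the names quizzes and complete_matches are made up for this example. Run it to see the results.

```python
import re

# Each quiz regex with its candidate strings, copied from the slides.
quizzes = {
    r"a(ab)*a": ["abababa", "aaba", "aabbaa", "aba", "aabababa"],
    r"ab+c?": ["abc", "ac", "abbb", "bbc", "abbcc"],
    r"a.[bc]+": ["abc", "abbbbbbbb", "azc", "abcbcbcbc", "ac", "asccbbbbcbcccc"],
    r"(very )+(fat )?(tall|ugly) man": [
        "very fat man",
        "fat tall man",
        "very very fat ugly man",
        "very very very tall man",
    ],
}

def complete_matches(pattern, candidates):
    """Return the candidates that the pattern matches from start to end."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

for pattern, candidates in quizzes.items():
    print(pattern, "->", complete_matches(pattern, candidates))
```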

Page 22: An Introduction to Regular expressions

Still awake?

Page 23: An Introduction to Regular expressions

Positive lookahead: X(?=Y)
Match something that is followed by something else.
Yamagata(?= Europe) matches only the first Yamagata in:
Yamagata Europe, Yamagata Intech Solutions

Negative lookahead: X(?!Y)
Match something that is not followed by something else.
Yamagata(?! Europe) matches only the second Yamagata in:
Yamagata Europe, Yamagata Intech Solutions
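Because the original highlighting is hard to reproduce in plain text, the sketch below prints where each lookahead matches. It assumes Python's re module. Note that the lookahead itself is not part of the match: only Yamagata is returned.

```python
import re

names = "Yamagata Europe, Yamagata Intech Solutions"

# Start positions of Yamagata when it IS followed by " Europe".
followed = [m.start() for m in re.finditer(r"Yamagata(?= Europe)", names)]

# Start positions of Yamagata when it is NOT followed by " Europe".
not_followed = [m.start() for m in re.finditer(r"Yamagata(?! Europe)", names)]

# The text inside the lookahead is checked but not included in the match.
matched_text = re.search(r"Yamagata(?= Europe)", names).group()

print(followed)      # [0]
print(not_followed)  # [17]
print(matched_text)  # Yamagata
```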

Page 24: An Introduction to Regular expressions

Positive lookbehind: (?<=Y)X
Match something following something else.
(?<=a)b matches the first b (the one after an a) in:
thingamabob

Negative lookbehind: (?<!Y)X
Match something not following something else.
(?<!a)b matches the last b (the one after an o) in:
thingamabob
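The two lookbehind examples can be told apart by the position of the match. A minimal sketch, assuming Python's re module:

```python
import re

word = "thingamabob"

# Positions of every b that comes directly after an a.
after_a = [m.start() for m in re.finditer(r"(?<=a)b", word)]

# Positions of every b that does NOT come directly after an a.
not_after_a = [m.start() for m in re.finditer(r"(?<!a)b", word)]

print(after_a)      # [8]
print(not_after_a)  # [10]
```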

Page 25: An Introduction to Regular expressions

Round brackets create a capturing group that you can refer back to. You use the backreference with a backslash followed by the number of the group.

The regex
Java(script) is a \1ing language
matches
Javascript is a scripting language

The regex
(Java)(script) is a \2ing language that is not the same as \1
matches
Javascript is a scripting language that is not the same as Java
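Both backreference examples can be verified with a short sketch, assuming Python's re module. The third check uses an invented sentence to show that \1 must repeat exactly the text the group captured.

```python
import re

one_group = re.fullmatch(
    r"Java(script) is a \1ing language",
    "Javascript is a scripting language",
)

two_groups = re.fullmatch(
    r"(Java)(script) is a \2ing language that is not the same as \1",
    "Javascript is a scripting language that is not the same as Java",
)

# \1 has to repeat the captured text, so this sentence does not match.
mismatch = re.fullmatch(
    r"Java(script) is a \1ing language",
    "Javascript is a programming language",
)

print(one_group.group(1))  # script
print(two_groups.groups()) # ('Java', 'script')
print(mismatch)            # None
```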

Page 26: An Introduction to Regular expressions

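Use the regex \b(\w+) \1\b to find doubled words. In this Dutch sentence ("She stroked her hair in in the car.") it finds haar haar and in in:
Ze streelde haar haar in in de auto.

With exceptions: \b(?!haar\b)(\w+) \1\b skips haar haar, which is correct Dutch for "her hair", and finds only in in:
Ze streelde haar haar in in de auto.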
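The doubled-word search, with and without the exception, can be sketched like this, assuming Python's re module:

```python
import re

sentence = "Ze streelde haar haar in in de auto."

# Every word that is immediately repeated.
doubled = [m.group() for m in re.finditer(r"\b(\w+) \1\b", sentence)]

# The negative lookahead rejects any match that starts with the word haar.
doubled_except_haar = [
    m.group() for m in re.finditer(r"\b(?!haar\b)(\w+) \1\b", sentence)
]

print(doubled)              # ['haar haar', 'in in']
print(doubled_except_haar)  # ['in in']
```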

Page 27: An Introduction to Regular expressions

You want to add brackets around step numbers:
This is step 5 from chapter 1. Continue with step 45 from page 15.

Use the regex ([sS]tep) (\d+) to find all instances. Replace it by \1 (\2)

Or alternatively, search for (?<=[sS]tep )\d+ and replace it by (\0), where \0 stands for the whole match.
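Both replacements can be sketched with re.sub, assuming Python's re module. One flavour difference to be aware of: in a Python replacement string the whole match is written \g<0> rather than \0.

```python
import re

text = "This is step 5 from chapter 1. Continue with step 45 from page 15."

# Variant 1: capture the word and the number, then rebuild the text.
with_groups = re.sub(r"([sS]tep) (\d+)", r"\1 (\2)", text)

# Variant 2: match only the number that follows "step " and wrap the whole
# match in brackets. Python writes the whole match as \g<0>.
with_lookbehind = re.sub(r"(?<=[sS]tep )\d+", r"(\g<0>)", text)

print(with_groups)
# This is step (5) from chapter 1. Continue with step (45) from page 15.
```

The numbers after chapter and page are left alone in both variants, because neither pattern matches a number that is not preceded by step.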

Page 28: An Introduction to Regular expressions

Powerful, for individual text-based files

More powerful, batch operations, command line

No back references

RegEx Text File Filter

RegEx search

Very limited

Powerful, called GREP

Page 29: An Introduction to Regular expressions

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

-> Do not try to do everything in one uber-regex
-> Regular expressions are not parsers