An Introduction to Regular Expressions
And are they contagious?
There is no single official standard for regular expressions, so there is no one real definition.
Simply put, you can call it a text pattern used to search and/or replace text.
Easy peasy!
Perl programming language
Perl-compatible
.NET
Java
JavaScript
… What, no cherry flavour?
Back to grammar school!
a matches any occurrence of that character, for example both the a in Jack and the word a in: Jack is a boy.
cat matches the cat in: About cats and dogs.
square bracket [
backslash \
caret ^
dollar sign $
period or dot .
vertical bar or pipe symbol |
question mark ?
asterisk or star *
plus sign +
opening round bracket (
closing round bracket )
opening curly bracket {
Special characters are reserved for special use. They need to be preceded by a backslash if you want to match them as literal characters. This is called escaping.
If you want to match 1+1=2, the correct regex is 1\+1=2
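A quick way to see why escaping matters, sketched in Python's re module (the sample sentence and variable names are made up for illustration):

```python
import re

text = "We all know that 1+1=2."

# Unescaped, the plus is a quantifier: "one or more 1s, then 1=2".
unescaped = re.search(r"1+1=2", text)

# Escaped, the plus is a literal character.
escaped = re.search(r"1\+1=2", text)

# re.escape adds the backslashes for you.
auto_escaped = re.escape("1+1=2")

print(unescaped)         # None
print(escaped.group())   # 1+1=2
print(auto_escaped)
```

The unescaped version finds nothing, which is the typical symptom of a forgotten backslash.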
tab \t
carriage return \r
line feed \n
beginning of line ^
end of line $
word boundary \b
If the regular expression engine is Unicode-enabled, you can search for any character using its Unicode value.
Depending on the syntax: \u0000 or \x{0000}
Hard space \u00A0 or \x{00A0}
® sign \u00AE or \x{00AE}
...
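As a small sketch, Python's re module uses the \u00A0 spelling (it does not, as far as I know, accept the \x{00A0} form). The sample text is invented for the example:

```python
import re

text = "Acme\u00AE widgets cost 5\u00A0EUR"

# The \uXXXX escape can be written inside the pattern itself.
registered = re.search(r"\u00AE", text)
hard_spaces = re.findall(r"\u00A0", text)

# Replace every hard space with an ordinary space.
cleaned = re.sub(r"\u00A0", " ", text)

print(registered.group() == "®")   # True
print(len(hard_spaces))            # 1
print(cleaned)
```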
Quantifiers allow you to specify the number of occurrences to match against:
X?      X, once or not at all
X*      X, zero or more times
X+      X, one or more times
X{n}    X, exactly n times
X{n,}   X, at least n times
X{n,m}  X, at least n but not more than m times
The regex colou?r matches both colour and color.
You can also group items together by using round brackets: Nov(ember)? will match Nov and November.
The regex a+ is the same as a{1,} and matches a or aaaaa.
The regex w{3} matches the www in www.qa-distiller.com
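The quantifier examples can be tried out like this; a minimal sketch in Python, with invented sample sentences:

```python
import re

# ? makes the preceding item optional.
spellings = re.findall(r"colou?r", "colour or color")

# A group can be made optional as a whole.
months = [m.group() for m in re.finditer(r"Nov(ember)?", "Nov 5 and November 6")]

# a+ and a{1,} accept exactly the same strings.
plus_match = re.fullmatch(r"a+", "aaaaa")
brace_match = re.fullmatch(r"a{1,}", "aaaaa")

# w{3} matches exactly three w characters in a row.
www = re.search(r"w{3}", "www.qa-distiller.com").group()

print(spellings)   # ['colour', 'color']
print(months)      # ['Nov', 'November']
print(www)         # www
```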
Simply place the characters you want to match between square brackets.
If you want to match an a or an e, use [ae]. You could use this in gr[ae]y to match either gray or grey.
A character class matches only a single character; the order of the characters inside the brackets is not important.
You can also use ranges: [0-9] matches a single digit between 0 and 9.
Typing a caret after the opening square bracket will negate the character class.
q[^u] means: "a q followed by a character that is not a u".
In "Iraq is a political quagmire." it will match the q and the space after the q in Iraq, but not the q of quagmire, because that one is followed by the letter u.
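Character classes and their negated form, sketched in Python (the "groy" and "Room 42" samples are made up):

```python
import re

# [ae] matches exactly one character: an a or an e.
greys = re.findall(r"gr[ae]y", "gray or grey, not groy")

# A range inside a class still matches a single character.
digits = re.findall(r"[0-9]", "Room 42")

# [^u] must match a real character, here the space after the q of Iraq.
q_not_u = re.findall(r"q[^u]", "Iraq is a political quagmire.")

print(greys)     # ['gray', 'grey']
print(digits)    # ['4', '2']
print(q_not_u)   # ['q ']
```

Note that the negated class consumes a character, so the match is two characters long.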
\d  digit           [0-9]
\w  word character  [A-Za-z0-9_]
\s  whitespace      [ \t\r\n]
Negated versions:
\D  not a digit           [^\d]
\W  not a word character  [^\w]
\S  not a whitespace      [^\s]
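A short sketch of the shorthand classes in Python. One assumption to be aware of: on Python 3 text strings these classes are Unicode-aware by default, so they match more than the ASCII sets listed above unless you pass re.ASCII. The sample text is invented:

```python
import re

text = "Order 66, aisle_7"

numbers = re.findall(r"\d+", text)      # runs of digits
words = re.findall(r"\w+", text)        # runs of word characters
spaces = re.findall(r"\s", text)        # single whitespace characters
non_digits = re.findall(r"\D+", text)   # runs of anything but digits

print(numbers)      # ['66', '7']
print(words)        # ['Order', '66', 'aisle_7']
print(len(spaces))  # 2
print(non_digits)   # ['Order ', ', aisle_']
```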
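The dot matches a single character, without caring what that character is (in most flavours it does not match a line break by default).
The regex e. matches each e together with the character that follows it in: Houston, we have a problem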
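The dot in action, as a minimal Python sketch. The re.DOTALL flag is Python's way of letting the dot match a line break too:

```python
import re

# Every e plus whatever single character follows it.
pairs = re.findall(r"e.", "Houston, we have a problem")

# By default the dot stops at a line break.
no_newline = re.search(r"e.", "e\n")
with_dotall = re.search(r"e.", "e\n", flags=re.DOTALL)

print(pairs)         # ['e ', 'e ', 'em']
print(no_newline)    # None
print(with_dotall is not None)   # True
```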
If you want to search for cat or dog, separate both options with a vertical bar or pipe symbol:
cat|dog matches the cat in: Are you sure you want a cat?
You can add more options like this: green|black|yellow|white
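Alternation, sketched in Python (the second sample sentence is invented):

```python
import re

pet = re.search(r"cat|dog", "Are you sure you want a cat?").group()
colours = re.findall(r"green|black|yellow|white", "a black and white photo")

print(pet)       # cat
print(colours)   # ['black', 'white']
```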
Which of the following completely matches regex a(ab)*a 1) abababa 2) aaba 3) aabbaa 4) aba 5) aabababa
Which of the following completely matches regex ab+c? 1) abc 2) ac 3) abbb 4) bbc 5) abbcc
Which of the following completely matches regex a.[bc]+ 1) abc 2) abbbbbbbb 3) azc 4) abcbcbcbc 5) ac 6) asccbbbbcbcccc
Which of the following completely matches regex (very )+(fat )?(tall|ugly) man 1) very fat man 2) fat tall man 3) very very fat ugly man 4) very very very tall man
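If you want to check your answers, here is a small sketch in Python. It assumes that "completely matches" means the regex has to cover the whole string, which is what re.fullmatch tests; the dictionary name is made up:

```python
import re

quizzes = {
    r"a(ab)*a": ["abababa", "aaba", "aabbaa", "aba", "aabababa"],
    r"ab+c?": ["abc", "ac", "abbb", "bbc", "abbcc"],
    r"a.[bc]+": ["abc", "abbbbbbbb", "azc", "abcbcbcbc", "ac", "asccbbbbcbcccc"],
    r"(very )+(fat )?(tall|ugly) man": [
        "very fat man",
        "fat tall man",
        "very very fat ugly man",
        "very very very tall man",
    ],
}

# For each regex, keep only the candidates that match from start to end.
answers = {
    pattern: [c for c in candidates if re.fullmatch(pattern, c)]
    for pattern, candidates in quizzes.items()
}

for pattern, matching in answers.items():
    print(pattern, "->", matching)
```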
Still awake?
Positive lookahead: X(?=Y)
Match something that is followed by something else. The lookahead itself is not part of the match.
Yamagata(?= Europe) matches the Yamagata in Yamagata Europe, but not the one in Yamagata Intech Solutions.
Negative lookahead: X(?!Y)
Match something that is not followed by something else.
Yamagata(?! Europe) matches the Yamagata in Yamagata Intech Solutions, but not the one in Yamagata Europe.
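The two lookaheads side by side, as a small Python sketch that reports where each match starts:

```python
import re

text = "Yamagata Europe, Yamagata Intech Solutions"

followed = [m.start() for m in re.finditer(r"Yamagata(?= Europe)", text)]
not_followed = [m.start() for m in re.finditer(r"Yamagata(?! Europe)", text)]

# The lookahead only checks; it does not become part of the match.
matched_text = re.search(r"Yamagata(?= Europe)", text).group()

print(followed)       # [0]
print(not_followed)   # [17]
print(matched_text)   # Yamagata
```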
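Positive lookbehind: (?<=Y)X
Match something that follows something else.
(?<=a)b matches the first b in thingamabob (the one right after an a).
Negative lookbehind: (?<!Y)X
Match something that does not follow something else.
(?<!a)b matches the last b in thingamabob (the one right after the o).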
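The same idea for lookbehind, sketched in Python. Keep in mind that Python's re module only accepts lookbehind patterns of a fixed length; other flavours differ on this point:

```python
import re

word = "thingamabob"

after_a = [m.start() for m in re.finditer(r"(?<=a)b", word)]
not_after_a = [m.start() for m in re.finditer(r"(?<!a)b", word)]

print(after_a)       # [8]
print(not_after_a)   # [10]
```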
Round brackets create a capturing group that you can refer back to. You use the backreference with a backslash + the number of the group.
The regex Java(script) is a \1ing language
matches Javascript is a scripting language
The regex (Java)(script) is a \2ing language that is not the same as \1
matches Javascript is a scripting language that is not the same as Java
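Both backreference examples, as a minimal Python sketch (variable names are made up):

```python
import re

one_group = re.fullmatch(
    r"Java(script) is a \1ing language",
    "Javascript is a scripting language",
)

two_groups = re.fullmatch(
    r"(Java)(script) is a \2ing language that is not the same as \1",
    "Javascript is a scripting language that is not the same as Java",
)

print(one_group.group(1))    # script
print(two_groups.groups())   # ('Java', 'script')
```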
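Use the regex \b(\w+) \1\b to find doubled words.
In the Dutch sentence "Ze streelde haar haar in in de auto." (She stroked her hair in in the car.) it finds both haar haar and in in.
Here haar haar ("her hair") is a legitimate repetition, so you can add an exception: \b(?!haar\b)(\w+) \1\b
In the same sentence this finds only in in.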
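The doubled-word check and its exception, sketched in Python on the same sentence:

```python
import re

sentence = "Ze streelde haar haar in in de auto."

doubled = [m.group() for m in re.finditer(r"\b(\w+) \1\b", sentence)]

# The negative lookahead rejects any match that starts with the word haar.
doubled_except_haar = [
    m.group() for m in re.finditer(r"\b(?!haar\b)(\w+) \1\b", sentence)
]

print(doubled)               # ['haar haar', 'in in']
print(doubled_except_haar)   # ['in in']
```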
You want to add brackets around step numbers:
This is step 5 from chapter 1. Continue with step 45 from page 15.
Use the regex ([sS]tep) (\d+) to find all instances. Replace it by \1 (\2)
Or alternatively replace (?<=[sS]tep )\d+ by (\0), where \0 refers to the whole match (the exact syntax varies by tool).
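Both replacements sketched with Python's re.sub. One flavour difference to note: in a Python replacement string the whole match is written \g<0>, not \0:

```python
import re

text = "This is step 5 from chapter 1. Continue with step 45 from page 15."

# Variant 1: capture the word and the number, then rebuild them.
with_groups = re.sub(r"([sS]tep) (\d+)", r"\1 (\2)", text)

# Variant 2: match only the number, using a lookbehind for the word.
with_lookbehind = re.sub(r"(?<=[sS]tep )\d+", r"(\g<0>)", text)

print(with_groups)
print(with_groups == with_lookbehind)   # True
```

Chapter and page numbers are left alone in both variants, because neither is preceded by the word step.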
Powerful, for individual text-based files
More powerful, batch operations, command line
No back references
RegEx Text File Filter
RegEx search
Very limited
Powerful, called GREP
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
-> Do not try to do everything in one uber-regex
-> Regular expressions are not parsers