Advanced Regular Expressions in .NET
Patrick Delancy
NOTICE!!!
This slide deck has been adapted from a
presentation that was intended to be given live,
in person… like with a real person in front of
real people. You know… breathing the same air
and all that.
The key points have been transcribed onto
separate slides, so you still get some benefit
from reading through it all, but you are still
missing out on all of the great stories, witty
banter, hilarious costumes, stunning arias … or
something like that.
If you REALLY want to get the most out of this
presentation, go to patrickdelancy.com and ask
him to come give it to your group!
This presentation will help you understand what Regex
is capable of.
Don’t bother trying to memorize the syntax; just remember the concepts.
Then you can make a more intelligent decision about
when you should and should not use Regex.
Common Features
...but not ubiquitous
● Non-capturing groups
● Look ahead
● Look behind
● Free-spacing
Non-Capturing Groups
^(.*)(@)(.*)$
email@ddress.com
[1] = email
[2] = @
[3] = ddress.com
^(.*)(?:@)(.*)$
email@ddress.com
[1] = email
[2] = ddress.com
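If you want to poke at the two patterns above yourself, here is a minimal sketch. The `NonCapturingDemo` class and its `Captures` helper are made-up names for illustration, and the input `email@ddress.com` is assumed from the captures shown on the slide.

```csharp
using System.Text.RegularExpressions;

public static class NonCapturingDemo
{
    // Returns the text of every numbered group except group 0 (the whole match).
    public static string[] Captures(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        if (!m.Success) return new string[0];

        var values = new string[m.Groups.Count - 1];
        for (int i = 1; i < m.Groups.Count; i++)
            values[i - 1] = m.Groups[i].Value;
        return values;
    }
}
```

Calling `Captures` with `^(.*)(@)(.*)$` should return three values, while the `(?:@)` version should return only two, because the `@` is still matched but no longer stored.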
Look Ahead
\b\w+(?=\.)   # match the word at the end of each sentence
              # but don’t capture the period.
See Dick. See Jane. See Dick and Jane run.
Matches: Dick, Jane, run
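A small sketch of the look-ahead pattern in action. The class and method names are hypothetical; only the pattern comes from the slide.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookAheadDemo
{
    // Collects each word that is immediately followed by a period.
    // The look-ahead (?=\.) checks for the period without consuming it,
    // so the period never becomes part of the match.
    public static List<string> SentenceEndings(string text)
    {
        var words = new List<string>();
        foreach (Match m in Regex.Matches(text, @"\b\w+(?=\.)"))
            words.Add(m.Value);
        return words;
    }
}
```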
Look Behind
(?<=\b19)\d{2}\b   # match all years in the 1900’s
                   # capturing only the 2-digit year
1842 1902 1776 1985 2003 1999
Matches: 02, 85, 99
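The look-behind version works the same way, just facing the other direction. This is a sketch with invented names (`LookBehindDemo`, `TwoDigitYears`); the pattern is the one from the slide.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookBehindDemo
{
    // Returns the last two digits of every four-digit year that starts with 19.
    // The look-behind (?<=\b19) requires "19" just before the match
    // but leaves it out of the matched text.
    public static List<string> TwoDigitYears(string text)
    {
        var years = new List<string>();
        foreach (Match m in Regex.Matches(text, @"(?<=\b19)\d{2}\b"))
            years.Add(m.Value);
        return years;
    }
}
```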
Free Spacing (Ignore Pattern Whitespace)
new Regex(@"\b[^@]+   # pattern can now span multiple lines
            @[^\b]+\b # and include white space for readability
", RegexOptions.IgnorePatternWhitespace);
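Here is a runnable sketch of free-spacing mode. The pattern is a hypothetical, deliberately loose stand-in (not the one on the slide, and certainly not a real e-mail validator); the point is only the layout.

```csharp
using System.Text.RegularExpressions;

public static class FreeSpacingDemo
{
    // Hypothetical pattern for illustration only.
    // With IgnorePatternWhitespace, unescaped white space is skipped and
    // everything from # to the end of the line is treated as a comment.
    private static readonly Regex LooseEmail = new Regex(@"
        ^ [^@\s]+    # local part: characters that are not @ or white space
        @            # the separator, matched literally
        [^@\s]+ $    # domain part, up to the end of the input
        ", RegexOptions.IgnorePatternWhitespace);

    public static bool LooksLikeEmail(string value) => LooseEmail.IsMatch(value);
}
```

One thing to keep in mind: in this mode a literal space in the pattern has to be escaped or written as `\s`, otherwise it is silently dropped.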
Less-Common Features
...in more advanced engines
● Named Captures
● Comments
● Inline Directives
● Conditional Alternation
● Atomic Groups
● Compiled Patterns
● Unicode Categories and
Named Character Blocks
Comments
^.*@.*$ # comment to the end of the line
^.*@(?# this is an inline comment).*$
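A quick sketch of the difference between the two comment styles. The class and method names are made up; the patterns are the ones above. As far as I know, the end-of-line `#` style only works when free-spacing is switched on, while the `(?# ...)` style works anywhere.

```csharp
using System.Text.RegularExpressions;

public static class CommentDemo
{
    // End-of-line comment: needs IgnorePatternWhitespace (or an inline (?x)).
    public static bool WithEndOfLineComment(string value) =>
        Regex.IsMatch(value, @"^.*@.*$ # comment to the end of the line",
                      RegexOptions.IgnorePatternWhitespace);

    // Inline comment: no option required.
    public static bool WithInlineComment(string value) =>
        Regex.IsMatch(value, @"^.*@(?# this is an inline comment).*$");
}
```

Without the option, the `# comment...` text is treated as literal characters to match, so the first pattern would quietly stop matching anything useful.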
Inline Directives
John the (?ix) (?: wiser | better and greater | privy )
John the Wiser, John the BetterAndGreater, john the privy, John the Better and Greater
Matches: John the Wiser, John the BetterAndGreater
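To see why only two of the four names match, here is a sketch using the slide's pattern. `InlineDirectiveDemo` and `Find` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class InlineDirectiveDemo
{
    // "John the " comes before (?ix), so it is case-sensitive and its spaces are literal.
    // Everything after (?ix) ignores case AND ignores white space in the pattern,
    // so "better and greater" really means "betterandgreater".
    public const string Pattern =
        @"John the (?ix) (?: wiser | better and greater | privy )";

    public static List<string> Find(string text)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(text, Pattern))
            found.Add(m.Value);
        return found;
    }
}
```

"john the privy" fails on the lowercase j, and "John the Better and Greater" fails because the input has spaces that the pattern no longer contains.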
Conditional Alternation
^Type:(?:(?<ssn>SSN)|(?<eid>EID)), ID:(?(ssn)\d{3}\-\d{2}\-\d{4}|[-\d]+)$
Type:SSN, ID:352-23-4567
Type:EID, ID:35-2234567
Type:SSN, ID:35-2234567
Type:EID, ID:???
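A sketch of the conditional alternation pattern, wrapped in a hypothetical `ConditionalDemo.IsValid` helper. The `(?(ssn)yes|no)` part picks the strict SSN format only when the `ssn` group captured something, and falls back to "digits and dashes" otherwise.

```csharp
using System.Text.RegularExpressions;

public static class ConditionalDemo
{
    private static readonly Regex IdLine = new Regex(
        @"^Type:(?:(?<ssn>SSN)|(?<eid>EID)), ID:(?(ssn)\d{3}\-\d{2}\-\d{4}|[-\d]+)$");

    // True when the ID format agrees with the declared type.
    public static bool IsValid(string line) => IdLine.IsMatch(line);
}
```

With the four sample lines, the first two should pass and the last two should fail: an SSN-typed line needs the 3-2-4 layout, and `???` is not made of digits or dashes.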
Atomic Grouping / Possessive Quantifiers
\b(in|integer|insert)\b
integer integers in insert
\b(?>in|integer|insert)\b
integer integers in insert
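The difference between the two patterns is easiest to see side by side. This sketch uses a made-up `AtomicDemo.Matches` helper to list what each one finds.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AtomicDemo
{
    // Returns every match of the pattern in the text, in order.
    public static List<string> Matches(string pattern, string text)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern))
            found.Add(m.Value);
        return found;
    }
}
```

The ordinary group tries "in" first, fails the trailing `\b`, then backtracks to try "integer" and "insert". The atomic group `(?>...)` throws away its alternatives as soon as "in" succeeds, so when `\b` fails there is nothing left to try. That makes the atomic version faster to fail, but here it also means only the standalone "in" matches.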
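Compiled Patterns
// instance: the pattern is parsed once and the object can be reused
var pattern = new Regex(@"a+h+!+");
return pattern.IsMatch(value);
// static: the input comes first, then the pattern
var pattern = @"a+h+!+";
return Regex.IsMatch(value, pattern);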
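The snippets above do not show the `RegexOptions.Compiled` flag itself, so this sketch is an assumption about how it would typically be used: build the `Regex` once, keep it in a static field, and reuse it. `CompiledDemo` and its method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class CompiledDemo
{
    // Built once and reused. RegexOptions.Compiled asks the engine to generate
    // code for the pattern up front, trading start-up cost for faster matching.
    private static readonly Regex Scream =
        new Regex(@"a+h+!+", RegexOptions.Compiled);

    public static bool IsScream(string value) => Scream.IsMatch(value);

    // Static call for comparison: same result, input first, pattern second.
    public static bool IsScreamStatic(string value) => Regex.IsMatch(value, @"a+h+!+");
}
```

Whether compiling pays off depends on how often the pattern runs; for a pattern used a handful of times, the extra start-up work may cost more than it saves.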
Named Character Blocks & Unicode Groups
\b(?:\p{IsGreek}+\s?)+\p{Pd}\s(?>\p{IsBasicLatin}+\s?)+
Κατα Μαθθαίον - The Gospel of Matthew
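A sketch of the named-block pattern. `\p{IsGreek}` is a named Unicode block, `\p{Pd}` is the "dash punctuation" category, and `\p{IsBasicLatin}` covers plain ASCII (spaces included). The class and method names are invented for illustration.

```csharp
using System.Text.RegularExpressions;

public static class UnicodeBlockDemo
{
    private static readonly Regex GreekThenLatin = new Regex(
        @"\b(?:\p{IsGreek}+\s?)+\p{Pd}\s(?>\p{IsBasicLatin}+\s?)+");

    // True when the text contains Greek words, a dash, then Basic Latin text.
    public static bool HasGreekTitleWithTranslation(string text) =>
        GreekThenLatin.IsMatch(text);
}
```

Note that `\p{IsGreek}` only covers the basic Greek block, so text typed with characters from the Greek Extended block may not match.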
Unique Features
...in the .NET RegEx engine
● Balancing Groups
● Character Class Subtraction
● Explicit Capture Only
Balancing Groups
^(?:[^{}]|(?<open>{)|(?<-open>}))*(?(open)(?!))$
{ if (true) { return "A"; } else { return "B"; } }
{ if (true) { return "A"; } else { return "B"; }
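Here is the balancing-group pattern wrapped in a hypothetical `BalancingDemo.IsBalanced` helper. Each `{` pushes a capture onto the `open` group, each `}` pops one off with `(?<-open>...)`, and the final `(?(open)(?!))` forces a failure if anything is still left on the stack.

```csharp
using System.Text.RegularExpressions;

public static class BalancingDemo
{
    private static readonly Regex Balanced = new Regex(
        @"^(?:[^{}]|(?<open>{)|(?<-open>}))*(?(open)(?!))$");

    // True when every { has a matching } and no } arrives before its {.
    public static bool IsBalanced(string code) => Balanced.IsMatch(code);
}
```

The first sample line should pass and the second, which is missing its last closing brace, should fail. A stray `}` with nothing open also fails, because there is no capture to pop.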
Character Class Subtraction
[0-9-[1-8]]
0123456789
[0-9-[1-8-[2-7]]]
0123456789
[\w-[aeiou]]
Lazy dog, quick fox, blah, blah, blah.
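To check what each subtraction leaves behind, this sketch glues together every character the class matches. `SubtractionDemo.Keep` is a made-up helper.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class SubtractionDemo
{
    // Concatenates every character of the input that the character class matches.
    public static string Keep(string characterClass, string input)
    {
        var kept = new StringBuilder();
        foreach (Match m in Regex.Matches(input, characterClass))
            kept.Append(m.Value);
        return kept.ToString();
    }
}
```

`[0-9-[1-8]]` keeps only 0 and 9. The nested form first reduces `[1-8-[2-7]]` to just 1 and 8, then removes those from 0-9. `[\w-[aeiou]]` keeps word characters except the lowercase vowels.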
Explicit Capture Only
^(?<name>[^@\+]+(\+[^\+]+)?)@(?<domain>(\w+)\.(com|net|org))$
e+mail@ddress.com
[name] = e+mail
[1] = +mail
[domain] = ddress.com
[2] = ddress
[3] = com
(?n)^(?<name>[^@\+]+(\+[^\+]+)?)@(?<domain>(\w+)\.(com|net|org))$
e+mail@ddress.com
[name] = e+mail
[domain] = ddress.com
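A sketch that lists which groups exist with and without explicit capture. It uses `RegexOptions.ExplicitCapture`, which is the option form of the inline `(?n)` switch; the class and method names are hypothetical, and the input `e+mail@ddress.com` is assumed from the captures shown on the slide.

```csharp
using System.Text.RegularExpressions;

public static class ExplicitCaptureDemo
{
    public const string Pattern =
        @"^(?<name>[^@\+]+(\+[^\+]+)?)@(?<domain>(\w+)\.(com|net|org))$";

    // Lists the group names the engine will track for the pattern.
    public static string[] GroupNames(bool explicitOnly)
    {
        RegexOptions options = explicitOnly ? RegexOptions.ExplicitCapture : RegexOptions.None;
        return new Regex(Pattern, options).GetGroupNames();
    }

    // Reads one named group from a match against the input.
    public static string Named(string input, string group) =>
        Regex.Match(input, Pattern, RegexOptions.ExplicitCapture).Groups[group].Value;
}
```

Without the option, the plain parentheses become groups 1 through 3 (in .NET, unnamed groups are numbered before named ones). With it, only `name` and `domain` remain, and the plain parentheses act like non-capturing groups.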
Patrick Delancy
patrickdelancy.com
This Presentation:
patrickdelancy.com/presentations/...
@patrickdelancy
linkedin.com/in/patrickdelancy
google.com/+patrickdelancy
Some Additional Resources
• https://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines - This is a little outdated, but still a good overview of how Regex implementations vary.
• https://msdn.microsoft.com/en-us/library/20bw873z(v=vs.110).aspx#SupportedNamedBlocks - Here is a reference of all of the named Unicode blocks that .NET supports in Regex. Linked here because I told you I would :)
• http://www.regular-expressions.info/refflavors.html - This is a very comprehensive reference for many common Regex engines. Some content may be out of date as new versions of each platform are released.
• http://www.regexplanet.com/ - An online pattern tester. Not the best interface, but very capable and has some nice features.