Advanced Regular Expressions in .NET

Patrick Delancy

Posted 12-Apr-2017

Page 1: Advanced Regular Expressions in .NET

Advanced Regular Expressions in .NET

Patrick Delancy

Page 2: Advanced Regular Expressions in .NET

NOTICE!!!

This slide deck has been adapted from a presentation that was intended to be given live, in person… like with a real person in front of real people. You know… breathing the same air and all that.

The key points have been transcribed onto separate slides, so you still get some benefit from reading through it all, but you are still missing out on all of the great stories, witty banter, hilarious costumes, stunning arias… or something like that.

If you REALLY want to get the most out of this presentation, go to patrickdelancy.com and ask him to come give it to your group!

Page 3: Advanced Regular Expressions in .NET

This presentation will help you understand what Regex is capable of.

Page 4: Advanced Regular Expressions in .NET

Don’t bother trying to memorize the syntax; just remember the concepts.

Page 5: Advanced Regular Expressions in .NET

Then you can make a more intelligent decision about when you should and should not use Regex.

Page 6: Advanced Regular Expressions in .NET

Common Features

...but not ubiquitous

● Non-capturing groups

● Look ahead

● Look behind

● Free-spacing

Page 7: Advanced Regular Expressions in .NET

Non-Capturing Groups

^(.*)(@)(.*)$

email@ddress.com

[0] = email@ddress.com
[1] = email
[2] = @
[3] = ddress.com

^(.*)(?:@)(.*)$

email@ddress.com

[0] = email@ddress.com
[1] = email
[2] = ddress.com
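The difference can be checked by counting groups. This is a minimal sketch, assuming a .NET project that allows top-level statements and assuming the sample input is `email@ddress.com` (reconstructed from the captures listed on the slide); the variable names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

// Same input, with and without a non-capturing group around "@".
string input = "email@ddress.com"; // assumed sample input

Match capturing = Regex.Match(input, @"^(.*)(@)(.*)$");
Match nonCapturing = Regex.Match(input, @"^(.*)(?:@)(.*)$");

// Groups[0] is always the whole match, so Count is "capturing groups + 1".
Console.WriteLine(capturing.Groups.Count);       // 4
Console.WriteLine(nonCapturing.Groups.Count);    // 3
Console.WriteLine(nonCapturing.Groups[2].Value); // ddress.com
```

With `(?:@)` the domain moves from group 3 to group 2, because the `@` no longer takes up a group number.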

Page 8: Advanced Regular Expressions in .NET

Look Ahead

\b\w+(?=\.)   # match the word at the end of each sentence
              # but don’t include the period in the match

See Dick. See Jane. See Dick and Jane run.

Dick
Jane
run
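A short sketch of the same look-ahead in C#, assuming top-level statements; `lastWords` is just an illustrative name.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "See Dick. See Jane. See Dick and Jane run.";

// (?=\.) requires a period right after the word but leaves it out of the match.
string[] lastWords = Regex.Matches(text, @"\b\w+(?=\.)")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join(", ", lastWords)); // Dick, Jane, run
```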

Page 9: Advanced Regular Expressions in .NET

Look Behind

(?<=\b19)\d{2}\b   # match all years in the 1900s,
                   # keeping only the 2-digit year in the match

1842 1902 1776 1985 2003 1999

02
85
99
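The look-behind can be sketched the same way, again assuming top-level statements; `shortYears` is an illustrative name.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string years = "1842 1902 1776 1985 2003 1999";

// (?<=\b19) requires "19" at the start of a word just before the match,
// but those two digits are not part of the matched text.
string[] shortYears = Regex.Matches(years, @"(?<=\b19)\d{2}\b")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join(", ", shortYears)); // 02, 85, 99
```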

Page 10: Advanced Regular Expressions in .NET

Free Spacing (Ignore Pattern Whitespace)

new Regex(@"
    \b[^@\s]+    # pattern can now span multiple lines
    @[^@\s]+\b   # and include white space for readability
", RegexOptions.IgnorePatternWhitespace);

Page 11: Advanced Regular Expressions in .NET

Less-Common Features

...in more advanced engines

● Named Captures

● Comments

● Inline Directives

● Conditional Alternation

● Atomic Groups

● Compiled Patterns

● Unicode Categories and Named Character Blocks

Page 12: Advanced Regular Expressions in .NET

Named Captures

^(?<name>.*)(?:@)(?<domain>.*)$

email@ddress.com

[0] = email@ddress.com
[name] = email
[domain] = ddress.com
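Reading named captures from C# might look like the sketch below, assuming top-level statements and assuming the sample input is `email@ddress.com` (reconstructed from the captures on the slide).

```csharp
using System;
using System.Text.RegularExpressions;

string input = "email@ddress.com"; // assumed sample input

Match m = Regex.Match(input, @"^(?<name>.*)(?:@)(?<domain>.*)$");

// Named groups are looked up by name instead of by position.
string name = m.Groups["name"].Value;
string domain = m.Groups["domain"].Value;

Console.WriteLine(name);   // email
Console.WriteLine(domain); // ddress.com
```

Looking groups up by name keeps the calling code stable if more groups are added to the pattern later.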

Page 13: Advanced Regular Expressions in .NET

Comments

^.*@.*$ # comment to the end of the line

^.*@(?# this is an inline comment).*$
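One detail worth checking in code: in .NET the end-of-line `#` comment is only treated as a comment when free-spacing mode (`RegexOptions.IgnorePatternWhitespace`) is on, while the inline `(?# ...)` form works with default options. A minimal sketch, assuming top-level statements and an illustrative input string:

```csharp
using System;
using System.Text.RegularExpressions;

string input = "email@ddress.com"; // illustrative input

// Inline comment: works with default options.
bool inlineComment = Regex.IsMatch(input, @"^.*@(?# this is an inline comment).*$");

// End-of-line comment: needs free-spacing mode.
bool eolComment = Regex.IsMatch(
    input,
    @"^.*@.*$ # comment to the end of the line",
    RegexOptions.IgnorePatternWhitespace);

// Without the option, " # comment..." is literal pattern text after $,
// so nothing can match.
bool eolAsLiteral = Regex.IsMatch(input, @"^.*@.*$ # comment to the end of the line");

Console.WriteLine(inlineComment); // True
Console.WriteLine(eolComment);    // True
Console.WriteLine(eolAsLiteral);  // False
```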

Page 14: Advanced Regular Expressions in .NET

Inline Directives

John the (?ix) (?: wiser | better and greater | privy )

John the Wiser, John the BetterAndGreater, john the privy, John the Better and Greater

John the Wiser
John the BetterAndGreater
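The directive only affects the part of the pattern that follows it, which is why `John the ` stays case-sensitive and keeps its literal spaces. A sketch of the slide's example, assuming top-level statements; `hits` is an illustrative name.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "John the Wiser, John the BetterAndGreater, john the privy, John the Better and Greater";

// (?ix) switches on ignore-case and free-spacing from this point onward,
// so the spaces inside the group are ignored and its words match in any case.
string pattern = @"John the (?ix) (?: wiser | better and greater | privy )";

string[] hits = Regex.Matches(text, pattern)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

foreach (string hit in hits)
{
    Console.WriteLine(hit);
}
// John the Wiser
// John the BetterAndGreater
```

`john the privy` is skipped because the leading `John` is still case-sensitive, and `John the Better and Greater` is skipped because the spaces in the alternative were ignored when the pattern was parsed.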

Page 15: Advanced Regular Expressions in .NET

Conditional Alternation

^Type:(?:(?<ssn>SSN)|(?<eid>EID)), ID:(?(ssn)\d{3}\-\d{2}\-\d{4}|[-\d]+)$

Type:SSN, ID:352-23-4567
Type:EID, ID:35-2234567
Type:SSN, ID:35-2234567
Type:EID, ID:???
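To see which of the four sample lines pass, the pattern can be run against each one. A minimal sketch, assuming top-level statements; `results` is an illustrative name.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// (?(ssn)A|B): use A if the "ssn" group captured something, otherwise B.
var idPattern = new Regex(
    @"^Type:(?:(?<ssn>SSN)|(?<eid>EID)), ID:(?(ssn)\d{3}\-\d{2}\-\d{4}|[-\d]+)$");

string[] samples =
{
    "Type:SSN, ID:352-23-4567",
    "Type:EID, ID:35-2234567",
    "Type:SSN, ID:35-2234567",
    "Type:EID, ID:???",
};

bool[] results = samples.Select(s => idPattern.IsMatch(s)).ToArray();

for (int i = 0; i < samples.Length; i++)
{
    Console.WriteLine($"{samples[i]} -> {results[i]}");
}
// The first two lines match; the last two do not.
```

The third line fails because an SSN type forces the 3-2-4 digit layout, and the fourth fails because `???` contains neither digits nor hyphens.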

Page 16: Advanced Regular Expressions in .NET

Atomic Grouping / Possessive Quantifiers

\b(in|integer|insert)\b

integer
integers
in
insert

\b(?>in|integer|insert)\b

integer
integers
in
insert
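The effect of the atomic group shows up when each sample word is tested on its own. This sketch assumes top-level statements and treats the four words as separate inputs, which may differ from how the slide presented them.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string[] words = { "integer", "integers", "in", "insert" };

var backtracking = new Regex(@"\b(in|integer|insert)\b");
var atomic = new Regex(@"\b(?>in|integer|insert)\b");

string[] plainHits = words.Where(w => backtracking.IsMatch(w)).ToArray();
string[] atomicHits = words.Where(w => atomic.IsMatch(w)).ToArray();

Console.WriteLine(string.Join(", ", plainHits));  // integer, in, insert
Console.WriteLine(string.Join(", ", atomicHits)); // in
```

Once the atomic group has matched `in`, the engine may not go back and try `integer` or `insert` when the trailing `\b` fails, so only the bare word `in` still matches.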

Page 17: Advanced Regular Expressions in .NET

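Compiled Patterns

// Instance: the pattern is parsed once and the Regex object can be reused.
var pattern = new Regex(@"a+h+!+");
return pattern.IsMatch(value);

// Static: the input comes first, then the pattern.
var pattern = @"a+h+!+";
return Regex.IsMatch(value, pattern);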
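A sketch of an explicitly compiled pattern, assuming the slide title refers to `RegexOptions.Compiled`; the sample inputs are made up for illustration, and top-level statements are assumed.

```csharp
using System;
using System.Text.RegularExpressions;

// RegexOptions.Compiled trades a slower construction for faster matching,
// so it tends to pay off for patterns that are created once and reused often.
var scream = new Regex(@"a+h+!+", RegexOptions.Compiled);

bool matchesScream = scream.IsMatch("aaahhh!!!"); // illustrative input
bool matchesOther = scream.IsMatch("oh");         // illustrative input

Console.WriteLine(matchesScream); // True
Console.WriteLine(matchesOther);  // False
```

Whether compilation is worth it depends on how often the pattern runs; for a pattern used once, the start-up cost can outweigh the gain.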

Page 18: Advanced Regular Expressions in .NET

Named Character Blocks & Unicode Groups

\b(?:\p{IsGreek}+\s?)+\p{Pd}\s(?>\p{IsBasicLatin}+\s?)+

Κατα Μαθθαίον - The Gospel of Matthew
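The building blocks of that pattern can be tried one at a time. A minimal sketch, assuming top-level statements; the Greek word is written with `\u` escapes so the example does not depend on source-file encoding.

```csharp
using System;
using System.Text.RegularExpressions;

string kata = "\u039A\u03B1\u03C4\u03B1"; // the Greek word "Κατα"

// \p{IsGreek} is a named block: code points U+0370 to U+03FF.
bool greekOnly = Regex.IsMatch(kata, @"^\p{IsGreek}+$");
bool latinIsGreek = Regex.IsMatch("Kata", @"^\p{IsGreek}+$");

// \p{Lu} is a Unicode category: an uppercase letter in any script.
string firstUpper = Regex.Match(kata, @"\p{Lu}").Value;

// \p{Pd} is the "dash punctuation" category, which includes the plain hyphen.
bool hyphenIsDash = Regex.IsMatch("-", @"^\p{Pd}$");

Console.WriteLine(greekOnly);    // True
Console.WriteLine(latinIsGreek); // False
Console.WriteLine(hyphenIsDash); // True
```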

Page 19: Advanced Regular Expressions in .NET

Unique Features

...in the .NET Regex engine

● Balancing Groups

● Character Class Subtraction

● Explicit Capture Only

Page 20: Advanced Regular Expressions in .NET

Balancing Groups

^(?:[^{}]|(?<open>{)|(?<-open>}))*(?(open)(?!))$

{ if (true) { return "A"; } else { return "B"; } }

{ if (true) { return "A"; } else { return "B"; }
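A sketch of the brace check in C#, assuming top-level statements. The braces are escaped here for readability; that is a choice made for this example rather than something the slide does.

```csharp
using System;
using System.Text.RegularExpressions;

// (?<open>\{)   pushes a capture onto the "open" stack for every "{".
// (?<-open>\})  pops one for every "}" and fails if the stack is empty.
// (?(open)(?!)) fails the match if anything is still on the stack at the end.
var balancedBraces = new Regex(@"^(?:[^{}]|(?<open>\{)|(?<-open>\}))*(?(open)(?!))$");

string balanced = "{ if (true) { return \"A\"; } else { return \"B\"; } }";
string unbalanced = "{ if (true) { return \"A\"; } else { return \"B\"; }";

bool balancedOk = balancedBraces.IsMatch(balanced);
bool unbalancedOk = balancedBraces.IsMatch(unbalanced);

Console.WriteLine(balancedOk);   // True
Console.WriteLine(unbalancedOk); // False
```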

Page 21: Advanced Regular Expressions in .NET

Character Class Subtraction

[0-9-[1-8]]

0123456789

[0-9-[1-8-[2-7]]]

0123456789

[\w-[aeiou]]

Lazy dog, quick fox, blah, blah, blah.
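Collecting the matched characters makes the subtraction visible. A minimal sketch, assuming top-level statements; `Keep` is a hypothetical helper introduced only for this example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: join every single-character match into one string.
static string Keep(string input, string pattern) =>
    string.Concat(Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value));

// 0-9 minus 1-8 leaves 0 and 9.
string outer = Keep("0123456789", @"[0-9-[1-8]]");

// Nested: 1-8 minus 2-7 is {1, 8}, so 0-9 minus {1, 8} remains.
string nested = Keep("0123456789", @"[0-9-[1-8-[2-7]]]");

// Word characters minus lowercase vowels.
string consonants = Keep("Lazy dog", @"[\w-[aeiou]]");

Console.WriteLine(outer);      // 09
Console.WriteLine(nested);     // 02345679
Console.WriteLine(consonants); // Lzydg
```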

Page 22: Advanced Regular Expressions in .NET

Explicit Capture Only

^(?<name>[^@\+]+(\+[^\+]+)?)@(?<domain>(\w+)\.(com|net|org))$

e+mail@ddress.com

[0] = e+mail@ddress.com
[name] = e+mail
[1] = +mail
[domain] = ddress.com
[2] = ddress
[3] = com

(?n)^(?<name>[^@\+]+(\+[^\+]+)?)@(?<domain>(\w+)\.(com|net|org))$

e+mail@ddress.com

[0] = e+mail@ddress.com
[name] = e+mail
[domain] = ddress.com
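The group counts can be compared directly. This sketch assumes top-level statements and assumes the sample input is `e+mail@ddress.com` (reconstructed from the captures on the slide). Note that .NET numbers unnamed groups first and named groups after them.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "e+mail@ddress.com"; // assumed sample input
string pattern = @"^(?<name>[^@\+]+(\+[^\+]+)?)@(?<domain>(\w+)\.(com|net|org))$";

Match plain = Regex.Match(input, pattern);
Match explicitOnly = Regex.Match(input, "(?n)" + pattern);

// Whole match + 3 unnamed groups + 2 named groups.
Console.WriteLine(plain.Groups.Count);                   // 6
Console.WriteLine(plain.Groups[1].Value);                // +mail

// With (?n), plain parentheses no longer capture.
Console.WriteLine(explicitOnly.Groups.Count);            // 3
Console.WriteLine(explicitOnly.Groups["name"].Value);    // e+mail
Console.WriteLine(explicitOnly.Groups["domain"].Value);  // ddress.com
```

Passing `RegexOptions.ExplicitCapture` to the constructor or static method has the same effect as the inline `(?n)`.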

Page 24: Advanced Regular Expressions in .NET

Some Additional Resources

• https://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines - This is a little outdated, but still a good overview of how Regex implementations vary.

• https://msdn.microsoft.com/en-us/library/20bw873z(v=vs.110).aspx#SupportedNamedBlocks - Here is a reference for all of the named Unicode blocks that .NET supports in Regex. Linked here because I told you I would :)

• http://www.regular-expressions.info/refflavors.html - This is a very comprehensive reference for many common Regex engines. Some content may be out of date as new versions of each platform are released.

• http://www.regexplanet.com/ - An online pattern tester. Not the best interface, but very capable and has some nice features.