beneath the surface - rubyconf 2013

Photo by Mr. Christopher Thomas, Creative Commons Attribution-ShareAlike 2.0 Generic License. Beneath the Surface: Regular Expressions in Ruby. @nellshamrell

Upload: nell-shamrell

Post on 14-Jun-2015


DESCRIPTION

This is the final version of this talk, given at RubyConf 2013. Many of us approach regular expressions with a certain fear and trepidation, using them only when absolutely necessary. We can get by when we need to use them, but we hesitate to dive any deeper into their cryptic world. Ruby has so much more to offer us. This talk showcases the incredible power of Ruby and the Onigmo regex library Ruby runs on. It takes you on a journey beneath the surface, exploring the beauty, elegance, and power of regular expressions. You will discover the flexible, dynamic, and eloquent ways to harness this beauty and power in your own code.

TRANSCRIPT

Page 1: Beneath the Surface - Rubyconf 2013

Photo by Mr. Christopher Thomas, Creative Commons Attribution-ShareAlike 2.0 Generic License

Beneath the Surface: Regular Expressions in Ruby

@nellshamrell

Page 2: Beneath the Surface - Rubyconf 2013

^4[0-9]{12}(?:[0-9]{3})?$

Source: regular-expressions.info
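
The pattern on this slide looks less intimidating once it is read piece by piece. The sketch below is only an illustration: the helper name visa_shaped? and the sample numbers are made up here, and the pattern checks only the shape of a number (a leading 4 followed by 12 or 15 more digits), not whether a card is valid.

```ruby
# Pattern copied from the slide (source: regular-expressions.info).
#   ^4            the line starts with a literal 4
#   [0-9]{12}     followed by exactly twelve digits
#   (?:[0-9]{3})? optionally followed by three more digits (non-capturing)
#   $             and then the line ends
VISA_SHAPE = /^4[0-9]{12}(?:[0-9]{3})?$/

# Hypothetical helper, not part of the talk.
def visa_shaped?(number)
  VISA_SHAPE.match?(number)
end

sixteen_digits  = "4" + "1" * 15
thirteen_digits = "4" + "2" * 12
fourteen_digits = "4" + "1" * 13
wrong_prefix    = "5" + "1" * 15

visa_shaped?(sixteen_digits)   # => true
visa_shaped?(thirteen_digits)  # => true
visa_shaped?(fourteen_digits)  # => false
visa_shaped?(wrong_prefix)     # => false
```

Note that ^ and $ anchor to line boundaries in Ruby; \A and \z anchor to the whole string.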

Page 3: Beneath the Surface - Rubyconf 2013

We fear what we do not understand

Page 4: Beneath the Surface - Rubyconf 2013
Page 5: Beneath the Surface - Rubyconf 2013

Regular Expressions

+ Ruby

Photo by Shayan, Creative Commons Attribution-ShareAlike 2.0 Generic License

Page 6: Beneath the Surface - Rubyconf 2013

Regex Matching in Ruby

Ruby Methods

Onigmo

Page 7: Beneath the Surface - Rubyconf 2013

Onigmo

Page 8: Beneath the Surface - Rubyconf 2013

Oniguruma

Fork

Onigmo

Page 9: Beneath the Surface - Rubyconf 2013

Onigmo

Reads Regex

Page 10: Beneath the Surface - Rubyconf 2013

Onigmo

Reads Regex

Parses Into

Abstract Syntax Tree

Page 11: Beneath the Surface - Rubyconf 2013

Onigmo

Reads Regex

Parses Into

Abstract Syntax Tree

Compiles Into

Series of Instructions

Page 13: Beneath the Surface - Rubyconf 2013

A Finite State Machine Shows How Something Works

Page 14: Beneath the Surface - Rubyconf 2013

Annie the Dog

Page 15: Beneath the Surface - Rubyconf 2013

Annie the Dog

[Diagram: Annie has two states, In the House and Out of House. A Door transition leads from each state to the other.]
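
The Annie the Dog diagram can be written down as a tiny state machine. This is a minimal sketch: the symbols, the TRANSITIONS table, and the next_state helper are names invented for illustration, not something from the talk.

```ruby
# Each state maps an event to the state that follows it.
# Annie has two states and a single event: going through the door.
TRANSITIONS = {
  in_the_house: { door: :out_of_house },
  out_of_house: { door: :in_the_house }
}.freeze

# Hypothetical helper: look up where an event leads from a given state.
# Hash#fetch raises KeyError for an unknown state or event.
def next_state(state, event)
  TRANSITIONS.fetch(state).fetch(event)
end

state = :in_the_house
state = next_state(state, :door)  # => :out_of_house
state = next_state(state, :door)  # => :in_the_house
```

A regex engine works on the same idea, only with characters as the events and many more states.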

Page 18: Beneath the Surface - Rubyconf 2013

Finite State Machine

Page 21: Beneath the Surface - Rubyconf 2013

Multiple States

Page 22: Beneath the Surface - Rubyconf 2013

/force/

Page 23: Beneath the Surface - Rubyconf 2013

re = /force/
string = "Use the force"
re.match(string)

Page 24: Beneath the Surface - Rubyconf 2013

[Diagram: state machine f → o → r → c → e for /force/, stepped through "Use the force" one character at a time]

Path Doesn't Match

Still Doesn't Match

Path Matches!

(Fast Forward)

We Have A Match!

Page 32: Beneath the Surface - Rubyconf 2013

re = /force/
string = "Use the force"
re.match(string)
=> #<MatchData "force">
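
The walk through "Use the force" can be inspected from Ruby itself. The sketch below uses only standard MatchData methods to show which characters the engine skipped before the path matched.

```ruby
re = /force/
string = "Use the force"

match = re.match(string)

match[0]         # => "force"     the text that matched
match.begin(0)   # => 8           index where the match starts
match.pre_match  # => "Use the "  characters tried and skipped first
```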

Page 34: Beneath the Surface - Rubyconf 2013

/Y(olk|oda)/

Pipe

Page 35: Beneath the Surface - Rubyconf 2013

re = /Y(olk|oda)/
string = "Yoda"
re.match(string)

Page 36: Beneath the Surface - Rubyconf 2013

[Diagram: state machine for /Y(olk|oda)/ matched against "Yoda": Y, then a branch into either o → l → k or o → d → a]

Which To Choose?

Saves To Backtrack Stack

Uh Oh, No Match

Backtracks To Here

We Have A Match!

Page 44: Beneath the Surface - Rubyconf 2013

re = /Y(olk|oda)/
string = "Yoda"
re.match(string)
=> #<MatchData "Yoda">
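
A short sketch of the alternation in action. The group captures whichever branch finally matched; the extra input strings "Yolk" and "Yeti" are added here for illustration.

```ruby
re = /Y(olk|oda)/

yoda = re.match("Yoda")
yoda[0]  # => "Yoda"  whole match
yoda[1]  # => "oda"   the branch that matched after backtracking

yolk = re.match("Yolk")
yolk[1]  # => "olk"   the first branch matches, no backtracking needed

re.match("Yeti")  # => nil, both branches fail
```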

Page 46: Beneath the Surface - Rubyconf 2013

/No+/

Plus Quantifier

Page 47: Beneath the Surface - Rubyconf 2013

re = /No+/
string = "Noooo"
re.match(string)

Page 48: Beneath the Surface - Rubyconf 2013

[Diagram: state machine for /No+/ matched against "Noooo": N → o, with a loop on o]

Return Match? Or Keep Looping?

Greedy Quantifier Keeps Looping

Page 52: Beneath the Surface - Rubyconf 2013

Greedy quantifiers match as much as possible

Page 53: Beneath the Surface - Rubyconf 2013

Greedy quantifiers use maximum effort for maximum return

Page 54: Beneath the Surface - Rubyconf 2013

[Diagram: the /No+/ state machine continuing through "Noooo"]

We Have A Match!

Page 57: Beneath the Surface - Rubyconf 2013

re = /No+/
string = "Noooo"
re.match(string)
=> #<MatchData "Noooo">

Page 58: Beneath the Surface - Rubyconf 2013

Lazy Quantifiers

Page 59: Beneath the Surface - Rubyconf 2013

Lazy quantifiers match as little as possible

Page 60: Beneath the Surface - Rubyconf 2013

Lazy quantifiers use minimum effort for minimum return

Page 61: Beneath the Surface - Rubyconf 2013

/No+?/

? Makes Quantifier Lazy

Page 62: Beneath the Surface - Rubyconf 2013

re = /No+?/
string = "Noooo"
re.match(string)

Page 63: Beneath the Surface - Rubyconf 2013

[Diagram: state machine for /No+?/ matched against "Noooo": N → o, with a loop on o]

Return Match? Or Keep Looping?

We Have A Match!

Page 67: Beneath the Surface - Rubyconf 2013

re = /No+?/
string = "Noooo"
re.match(string)
=> #<MatchData "No">
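
Putting the greedy and lazy forms side by side makes the difference easy to see. The third pattern, /No*?/, is not in the slides; it is added here as one more illustration of "as little as possible".

```ruby
string = "Noooo"

greedy    = /No+/.match(string)[0]   # => "Noooo"  keeps looping on "o"
lazy      = /No+?/.match(string)[0]  # => "No"     stops after the required "o"
lazy_star = /No*?/.match(string)[0]  # => "N"      zero "o"s already satisfies *?
```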

Page 68: Beneath the Surface - Rubyconf 2013

Greedy quantifiers are greedy but reasonable

Page 69: Beneath the Surface - Rubyconf 2013

/.*moon/

Star Quantifier

Page 70: Beneath the Surface - Rubyconf 2013

re = /.*moon/
string = "That's no moon"
re.match(string)

Page 71: Beneath the Surface - Rubyconf 2013

[Diagram: state machine for /.*moon/ matched against "That's no moon": a looping . state followed by m → o → o → n]

Loops

Which To Match?

(Fast Forward)

Keeps Looping

No More Characters?

Backtrack or Fail?

Backtracks

Huzzah!

We Have A Match!

Page 88: Beneath the Surface - Rubyconf 2013

re = /.*moon/
string = "That's no moon"
re.match(string)
=> #<MatchData "That's no moon">
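
Adding a capture group around .* shows how much the greedy star kept after it backtracked. This is a sketch for illustration; the capture group and the second input string are not in the slides.

```ruby
string = "That's no moon"

whole = /.*moon/.match(string)
whole[0]  # => "That's no moon"

# .* first swallows the whole string, then hands characters back
# until "moon" can match at the end.
kept = /(.*)moon/.match(string)
kept[1]   # => "That's no "

# With two moons, the greedy star gives back as little as it can,
# so the literal "moon" lines up with the LAST occurrence.
last = /(.*)moon/.match("moon, moon")
last[1]   # => "moon, "
```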

Page 89: Beneath the Surface - Rubyconf 2013

Backtracking = Slow

Page 90: Beneath the Surface - Rubyconf 2013

/No+w+/

Page 91: Beneath the Surface - Rubyconf 2013

re = /No+w+/
string = "Noooo"
re.match(string)

Page 92: Beneath the Surface - Rubyconf 2013

[Diagram: state machine for /No+w+/ matched against "Noooo": N → o (looping) → w (looping)]

Loops

Uh Oh

Backtrack or Fail?

Backtracks

Match FAILS

Page 103: Beneath the Surface - Rubyconf 2013

Possessive Quantifiers

Page 104: Beneath the Surface - Rubyconf 2013

Possessive quantifiers do not backtrack

Page 105: Beneath the Surface - Rubyconf 2013

Makes Quantifier Possessive

/No++w+/

Page 106: Beneath the Surface - Rubyconf 2013

[Diagram: state machine for /No++w+/ matched against "Noooo": N → o (looping, possessive) → w (looping)]

Loops

Uh Oh

Backtrack or Fail?

Match FAILS

Page 114: Beneath the Surface - Rubyconf 2013

Possessive quantifiers fail faster by controlling backtracking

Page 115: Beneath the Surface - Rubyconf 2013

Use possessive quantifiers with caution
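
A sketch of both sides of the trade-off. The first pair repeats the slide example; the \d+5 and \d++5 patterns are an added illustration of why caution is needed, since a possessive quantifier can turn a match into a failure.

```ruby
string = "Noooo"

# Greedy: o+ takes every "o", then gives them back one at a time
# while the engine searches for a "w" that never appears.
greedy = /No+w+/.match(string)        # => nil, after backtracking

# Possessive: o++ keeps every "o" it matched, so the engine fails
# as soon as w+ cannot match.
possessive = /No++w+/.match(string)   # => nil, without backtracking

# Caution: refusing to give characters back can lose a valid match.
digits = "12345"
with_backtracking    = /\d+5/.match(digits)   # => #<MatchData "12345">
without_backtracking = /\d++5/.match(digits)  # => nil
```

In the last line \d++ consumes the final 5 and will not release it, so the literal 5 in the pattern has nothing left to match.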

Page 116: Beneath the Surface - Rubyconf 2013
Page 118: Beneath the Surface - Rubyconf 2013
Page 119: Beneath the Surface - Rubyconf 2013

snake_case to CamelCase

Find first letter of string and capitalize it

Find any character that follows an underscore and capitalize it

Remove underscores

Page 123: Beneath the Surface - Rubyconf 2013

Find first letter of string and capitalize it

snake_case to CamelCase

Page 124: Beneath the Surface - Rubyconf 2013

case_converter_spec.rb

before(:each) do
  @case_converter = CaseConverter.new
end

it "capitalizes the first letter" do
  result = @case_converter.upcase_chars("method")
  result.should == "Method"
end

Page 127: Beneath the Surface - Rubyconf 2013

/\A/

\A Anchors Match To Beginning Of String

Page 128: Beneath the Surface - Rubyconf 2013

/\A\w/

\w Matches Any Word Character

Page 129: Beneath the Surface - Rubyconf 2013

case_converter.rb

def upcase_chars(string)
  re = /\A\w/
  string.gsub(re){|char| char.upcase}
end

Spec Passes!

Page 132: Beneath the Surface - Rubyconf 2013

case_converter_spec.rb

it "capitalizes the first letter" do
  result = @case_converter.upcase_chars("_method")
  result.should == "_Method"
end

Spec Fails!

Page 135: Beneath the Surface - Rubyconf 2013

Spec Failure:

Expected: "_Method"
Got: "_method"

Page 136: Beneath the Surface - Rubyconf 2013

/\A\w/

Problem: \w Matches Letters AND Underscores

Page 137: Beneath the Surface - Rubyconf 2013

/\A[a-z]/

[a-z] Matches Only Lowercase Letters

Page 138: Beneath the Surface - Rubyconf 2013

/\A_[a-z]/

_ Matches an underscore

Page 139: Beneath the Surface - Rubyconf 2013

/\A_?[a-z]/

? Makes underscore optional

Page 140: Beneath the Surface - Rubyconf 2013

case_converter.rb

def upcase_chars(string)
  re = /\A_?[a-z]/
  string.gsub(re){|char| char.upcase}
end

Spec Passes!
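
The failing and passing patterns can be compared directly. In this sketch the lambda upcase_match is a hypothetical stand-in for the body of upcase_chars, so that both patterns can be run against the same inputs.

```ruby
naive = /\A\w/        # \w also matches "_", so "_" gets "upcased" to itself
fixed = /\A_?[a-z]/   # optional underscore, then a lowercase letter

# Hypothetical helper: upcase whatever the pattern matches.
upcase_match = ->(re, string) { string.gsub(re) { |chars| chars.upcase } }

upcase_match.call(naive, "method")   # => "Method"
upcase_match.call(naive, "_method")  # => "_method"  the failing spec
upcase_match.call(fixed, "method")   # => "Method"
upcase_match.call(fixed, "_method")  # => "_Method"
```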

Page 142: Beneath the Surface - Rubyconf 2013

Find any character that follows an underscore and capitalize it

snake_case to CamelCase

Page 143: Beneath the Surface - Rubyconf 2013

case_converter_spec.rb

it "capitalizes letters after an underscore" do
  result = @case_converter.upcase_chars("some_method")
  result.should == "Some_Method"
end

Page 145: Beneath the Surface - Rubyconf 2013

/\A_?[a-z]/

Page 146: Beneath the Surface - Rubyconf 2013

/\A_?[a-z]|[a-z]/

| Pipe For Alternation

Page 147: Beneath the Surface - Rubyconf 2013

/\A_?[a-z]|(?<=_)[a-z]/

(?<=_) Look Behind

Page 148: Beneath the Surface - Rubyconf 2013

case_converter.rb

def upcase_chars(string)
  re = /\A_?[a-z]|(?<=_)[a-z]/
  string.gsub(re){|char| char.upcase}
end

Spec Passes!
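
A look behind checks what precedes the current position without making it part of the match. The sketch below isolates that piece of the pattern; the comparison with /_[a-z]/ is added for illustration.

```ruby
string = "some_method"

# With the look behind, only the letter is matched.
string.scan(/(?<=_)[a-z]/)                         # => ["m"]
string.gsub(/(?<=_)[a-z]/) { |char| char.upcase }  # => "some_Method"

# Without it, the underscore is part of the match as well.
string.scan(/_[a-z]/)                              # => ["_m"]
```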

Page 150: Beneath the Surface - Rubyconf 2013

Remove underscores

snake_case to CamelCase

Page 151: Beneath the Surface - Rubyconf 2013

case_converter_spec.rb

it "removes underscores" do
  result = @case_converter.rmv_underscores("some_method")
  result.should == "somemethod"
end

Page 154: Beneath the Surface - Rubyconf 2013

/_/

Matches An Underscore

Page 155: Beneath the Surface - Rubyconf 2013

case_converter.rb

def rmv_underscores(string)
  re = /_/
  string.gsub(re, "")
end

Spec Passes!

Page 158: Beneath the Surface - Rubyconf 2013

Combine results of two methods

snake_case to CamelCase

Page 159: Beneath the Surface - Rubyconf 2013

case_converter_spec.rb

it "converts snake_case to CamelCase" do
  result = @case_converter.snake_to_camel("some_method")
  result.should == "SomeMethod"
end

Page 162: Beneath the Surface - Rubyconf 2013

case_converter.rb

def snake_to_camel(string)
  rmv_underscores(upcase_chars(string))
end

Spec Passes!
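
For reference, here is one way the three methods from the slides might sit together in a single class. It is a sketch assembled from the fragments above; the second input, "_private_helper", is an added example rather than one of the talk's specs.

```ruby
class CaseConverter
  # Upcase the first letter (allowing one leading underscore)
  # and any lowercase letter that directly follows an underscore.
  def upcase_chars(string)
    re = /\A_?[a-z]|(?<=_)[a-z]/
    string.gsub(re) { |char| char.upcase }
  end

  # Strip every underscore.
  def rmv_underscores(string)
    string.gsub(/_/, "")
  end

  # Capitalize first, then remove underscores. The order matters:
  # the look behind needs the underscores to still be present.
  def snake_to_camel(string)
    rmv_underscores(upcase_chars(string))
  end
end

converter = CaseConverter.new
converter.upcase_chars("some_method")        # => "Some_Method"
converter.rmv_underscores("some_method")     # => "somemethod"
converter.snake_to_camel("some_method")      # => "SomeMethod"
converter.snake_to_camel("_private_helper")  # => "PrivateHelper"
```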

Page 167: Beneath the Surface - Rubyconf 2013

Develop regular expressions in small pieces

Page 168: Beneath the Surface - Rubyconf 2013
Page 169: Beneath the Surface - Rubyconf 2013

If you write code, you can write regular expressions

Page 170: Beneath the Surface - Rubyconf 2013

Move beyond the fear