And now you have two problems: Ruby regular expressions for fun and profit. Luca Mearelli (@lmea), Codemotion Rome - 2013

Upload: luca-mearelli

Post on 19-Jan-2015


DESCRIPTION

A wise hacker said: "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." Regular expressions are a powerful tool and a first-class citizen in Ruby, so it is tempting to overuse them. But knowing them and using them properly is a fundamental skill for every developer. We'll see hands-on examples of proper regexp usage in Ruby code, look at bad and ugly cases, and learn how to approach writing and debugging them.

TRANSCRIPT

Page 1: And Now You Have Two Problems

And now you have two problems

Ruby regular expressions for fun and profit

Luca Mearelli @lmea

Codemotion Rome - 2013

Page 2: And Now You Have Two Problems

Regular expressions

• cat, catch, indicate, ...

• 2013-03-22, YYYY-MM-DD, ...

• $ 12,500.80

patterns to describe the contents of a text

Page 3: And Now You Have Two Problems

Regexps: good for...

Pattern matching

Search and replace

Page 4: And Now You Have Two Problems

Regexp in ruby

Regexp object: Regexp.new("cat")

literal notation #1: %r{cat}

literal notation #2: /cat/
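
As a small sketch of the three notations above (the constant names are only illustrative), all of them build a Regexp with the same source, and %r{} tends to be convenient when the pattern itself contains slashes:

```ruby
# Three ways to build the same pattern (constant names are illustrative).
CAT_FROM_NEW     = Regexp.new("cat")
CAT_FROM_PERCENT = %r{cat}
CAT_FROM_SLASHES = /cat/

# All three carry the same pattern source.
p CAT_FROM_NEW.source       # "cat"
p CAT_FROM_PERCENT.source   # "cat"
p CAT_FROM_SLASHES.source   # "cat"

# Each one matches the same input.
p [CAT_FROM_NEW, CAT_FROM_PERCENT, CAT_FROM_SLASHES].all? { |re| re.match?("catch") }  # true

# %r{} avoids escaping slashes inside the pattern.
p %r{/usr/bin}.match?("/usr/bin/ruby")   # true
```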

Page 5: And Now You Have Two Problems

Regexp syntax

literals: /cat/ matches any ‘cat’ substring

the dot: /./ matches any character

character classes: /[aeiou]/ /[a-z]/ /[01]/

negated character classes: /[^abc]/
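
A minimal sketch of the basic syntax above, using a made-up word list (WORDS is an assumption, not part of the slides):

```ruby
# Sample words, chosen only for illustration.
WORDS = %w[cat catch indicate dog]

# A literal matches anywhere inside the string.
p WORDS.grep(/cat/)          # ["cat", "catch", "indicate"]

# The dot matches any single character (but not a newline by default).
p("cat" =~ /c.t/)            # 0
p("ct" =~ /c.t/)             # nil

# Character classes and negated character classes.
p "sky".match?(/[aeiou]/)    # false
p "b2".match?(/[01]/)        # false
p "abcd"[/[^abc]/]           # "d"
```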

Page 6: And Now You Have Two Problems

Regexp syntax

Modifiers

case insensitive: /./i

only interpolate #{} blocks once: /./o

multiline mode - '.' will match newline: /./m

extended mode - whitespace is ignored: /./x
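
The modifiers can be tried out with a short sketch like the following. ISO_DATE and cached_pattern are hypothetical names introduced only for this example:

```ruby
# i: case insensitive
p "CAT".match?(/cat/i)        # true

# m: the dot also matches newlines
p "a\nb".match?(/a.b/)        # false
p "a\nb".match?(/a.b/m)       # true

# x: whitespace and comments inside the pattern are ignored
ISO_DATE = /
  (\d{4}) - (\d{2}) - (\d{2})   # year, month, day
/x
p ISO_DATE.match("2013-03-22").captures   # ["2013", "03", "22"]

# o: the #{} interpolation is evaluated only the first time the literal runs
def cached_pattern(word)
  /#{word}/o
end
p cached_pattern("cat").source   # "cat"
p cached_pattern("dog").source   # "cat" (the first interpolation is reused)
```

The last two lines show why /o deserves care: later calls silently ignore the new argument.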

Page 7: And Now You Have Two Problems

Regexp syntax

Shorthand classes

/\d/ digit, /\D/ non-digit

/\s/ whitespace, /\S/ non-whitespace

/\w/ word character, /\W/ non-word character

/\h/ hex digit, /\H/ non-hex digit
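
A quick way to see what each shorthand class picks up is to scan one sample string with all of them. SAMPLE below is a made-up input:

```ruby
# Made-up sample mixing letters, digits, punctuation and hex digits.
SAMPLE = "id 42: ff00aa"

p SAMPLE.scan(/\d+/)      # ["42", "00"]
p SAMPLE.scan(/\w+/)      # ["id", "42", "ff00aa"]
p SAMPLE.scan(/\S+/)      # ["id", "42:", "ff00aa"]
p SAMPLE.scan(/\h+/)      # ["d", "42", "ff00aa"]
p SAMPLE.scan(/\s/).size  # 2
```

Note that \h also picks up the "d" of "id", since it is a valid hex digit.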

Page 8: And Now You Have Two Problems

Regexp syntax

Anchors

/^/ beginning of line, /$/ end of line

/\b/ word boundary, /\B/ non-word boundary

/\A/ beginning of string, /\z/ end of string

/\Z/ end of string; if the string ends with a newline, it matches just before that newline
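
The difference between line anchors and string anchors is easiest to see on a multi-line string. This is a small sketch with a made-up two-line text (TWO_LINES is an assumption):

```ruby
# Two lines, with a trailing newline.
TWO_LINES = "first line\nsecond line\n"

# ^ matches at the start of every line, \A only at the start of the string.
p TWO_LINES.scan(/^\w+/)     # ["first", "second"]
p TWO_LINES.scan(/\A\w+/)    # ["first"]

# $ matches at the end of any line.
p(TWO_LINES =~ /line$/)      # 6
# \Z matches at the end of the string, just before a trailing newline.
p(TWO_LINES =~ /line\Z/)     # 18
# \z matches only at the very end, and the string ends with "\n".
p(TWO_LINES =~ /line\z/)     # nil

# \b keeps the match on whole words.
p "catalog cat".scan(/\bcat\b/)   # ["cat"]
```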

Page 9: And Now You Have Two Problems

Regexp syntax

alternation: /cat|dog/ matches ‘cat’ and ‘dog’ in ‘cats and dogs’

0-or-more: /ab*/ matches ‘a’ ‘ab’ ‘abb’...

1-or-more: /ab+/ matches ‘ab’ ‘abb’ ...

given-number: /ab{2}/ matches ‘abb’ but not ‘ab’ or the whole ‘abbb’ string
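
The alternation and quantifier rules above can be checked with a few one-liners (a sketch; the \A and \z anchors are added here only to test whole strings):

```ruby
# Alternation: either branch can match.
p "cats and dogs".scan(/cat|dog/)                      # ["cat", "dog"]

# 0-or-more: 'a' followed by any number of 'b'.
p %w[a ab abb].all? { |s| s.match?(/\Aab*\z/) }        # true

# 1-or-more: at least one 'b' is required.
p "a".match?(/ab+/)                                    # false
p "abb".match?(/ab+/)                                  # true

# Given number: exactly two 'b', so only 'abb' is taken out of 'abbb'.
p "abbb"[/ab{2}/]                                      # "abb"
p "ab".match?(/ab{2}/)                                 # false
```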

Page 10: And Now You Have Two Problems

Regexp syntax

greedy matches: /.+cat/ matches ‘the cat is cat’ in ‘the cat is catching a mouse’

lazy matches: /.+?\scat/ matches only ‘the cat’ in ‘the cat is catching a mouse’
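
Extracting the matched text makes the greedy/lazy difference visible. A minimal sketch using the sentence from the slide (SENTENCE is just a name for it):

```ruby
SENTENCE = "the cat is catching a mouse"

# Greedy: .+ grabs as much as it can, then backs off to the LAST 'cat'.
p SENTENCE[/.+cat/]       # "the cat is cat"

# Lazy: .+? grabs as little as it can, stopping at the FIRST ' cat'.
p SENTENCE[/.+?\scat/]    # "the cat"
```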

Page 11: And Now You Have Two Problems

Regexp syntax

grouping: /(\d{3}\.){3}\d{3}/ matches IP-like strings made of four 3-digit groups, e.g. 192.168.001.010

capturing: /a (cat|dog)/ the match is captured in $1 to be used later

non capturing: /a (?:cat|dog)/ no content captured

atomic grouping: /(?>a+)/ doesn’t backtrack
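
The four kinds of groups can be compared side by side. In this sketch the constant names and the \A...\z anchors around the IP-like pattern are additions made for the example:

```ruby
CAPTURING     = /a (cat|dog)/
NON_CAPTURING = /a (?:cat|dog)/
IP_LIKE       = /\A(\d{3}\.){3}\d{3}\z/   # anchors added for whole-string checks

# Capturing group: the alternative that matched is available as group 1.
p CAPTURING.match("I have a cat")[1]              # "cat"

# Non-capturing group: same match, nothing captured.
p NON_CAPTURING.match("I have a dog").captures    # []

# Grouping a repeated sub-pattern.
p IP_LIKE.match?("192.168.001.010")               # true
p IP_LIKE.match?("192.168.1.10")                  # false

# Atomic group: once a+ has consumed every 'a', it will not give one back.
p "aaa".match?(/a+a/)        # true
p "aaa".match?(/(?>a+)a/)    # false
```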

Page 12: And Now You Have Two Problems

String substitution

"My cat eats catfood".sub(/cat/, "dog") # => My dog eats catfood

"My cat eats catfood".gsub(/cat/, "dog") # => My dog eats dogfood

"My cat eats catfood".gsub(/\bcat(\w+)/, "dog\\1") # => My cat eats dogfood

"My cat eats catfood".gsub(/\bcat(\w+)/){|m| $1.reverse} # => My cat eats doof

Page 13: And Now You Have Two Problems

String parsing

"Codemotion Rome: Mar 20 to Mar 23".scan(/\w{3} \d{1,2}/)
# => ["Mar 20", "Mar 23"]

"Codemotion Rome: Mar 20 to Mar 23".scan(/(\w{3}) (\d{1,2})/)
# => [["Mar", "20"], ["Mar", "23"]]

"Codemotion Rome: Mar 20 to Mar 23".scan(/(\w{3}) (\d{1,2})/) { |a, b| puts b + "/" + a }
# 20/Mar
# 23/Mar
# => "Codemotion Rome: Mar 20 to Mar 23"

Page 14: And Now You Have Two Problems

Regexp methods

if "what a wonderful world" =~ /(world)/
  puts "hello #{$1.upcase}"
end
# hello WORLD

if /(world)/.match("The world")
  puts "hello #{$1.upcase}"
end
# hello WORLD

match_data = /(world)/.match("The world")
puts "hello #{match_data[1].upcase}"
# hello WORLD

Page 15: And Now You Have Two Problems

Rails app examples

# in routing

match 'path/:id', :constraints => { :id => /[A-Z]\d{5}/ }

# in validations

validates :phone, :format => /\A\d{2,4}\s*\d+\z/

validates :phone, :format => { :with => /\A\d{2,4}\s*\d+\z/ }

validates :phone, :format => { :without => /\A02\s*\d+\z/ }
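
The validation patterns can be exercised in plain Ruby, without Rails. In this sketch the constant names and the acceptable_phone? helper are hypothetical. Combining :with and :without in one check is an assumption made for illustration, not how the slide's validations are wired:

```ruby
PHONE_FORMAT    = /\A\d{2,4}\s*\d+\z/   # pattern from the :with example
EXCLUDED_PREFIX = /\A02\s*\d+\z/        # pattern from the :without example

# Hypothetical helper: accept numbers in the expected format,
# unless they start with the excluded prefix.
def acceptable_phone?(value)
  PHONE_FORMAT.match?(value) && !EXCLUDED_PREFIX.match?(value)
end

p acceptable_phone?("06 1234567")             # true
p acceptable_phone?("02 1234567")             # false
# \A and \z anchor the whole string, so extra lines are rejected.
p acceptable_phone?("06 1234567\n<script>")   # false
```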

Page 16: And Now You Have Two Problems

Rails examples

# in ActiveModel::Validations::NumericalityValidator
def parse_raw_value_as_an_integer(raw_value)
  raw_value.to_i if raw_value.to_s =~ /\A[+-]?\d+\Z/
end

# in ActionDispatch::RemoteIp
# IP addresses that are "trusted proxies" that can be stripped from
# the comma-delimited list in the X-Forwarded-For header. See also:
# http://en.wikipedia.org/wiki/Private_network#Private_IPv4_address_spaces
TRUSTED_PROXIES = %r{
  ^127\.0\.0\.1$                | # localhost
  ^(10                          | # private IP 10.x.x.x
    172\.(1[6-9]|2[0-9]|3[0-1]) | # private IP in the range 172.16.0.0 .. 172.31.255.255
    192\.168                      # private IP 192.168.x.x
  )\.
}x

WILDCARD_PATH = %r{\*([^/\)]+)\)?$}

Page 17: And Now You Have Two Problems

Regexps are dangerous

"If I was going to place a bet on something about Rails security, it'd be that there are more regex vulnerabilities in the tree. I am uncomfortable with how much Rails leans on regex for policy decisions."

Thomas H. Ptacek (Founder @ Matasano, Feb 2013)

Page 18: And Now You Have Two Problems

Tip #1: Beware of nested quantifiers

/(x+x+)+y/ =~ 'xxxxxxxxxy'

/(xx+)+y/ =~ 'xxxxxxxxxx'

/(?>x+x+)+y/ =~ 'xxxxxxxxx'
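
As a sketch of one way out, a nested pattern can often be rewritten without the nesting. FLAT below is my own rewrite, not from the slides. It accepts the same inputs as the nested and atomic versions, but gives the engine only one way to split the run of x's, which is what matters when a long input fails to match:

```ruby
NESTED = /(x+x+)+y/      # from the slide: many ways to split the x's
ATOMIC = /(?>x+x+)+y/    # from the slide: atomic group, no backtracking inside
FLAT   = /xx+y/          # assumed rewrite: two or more x's followed by y

# Short inputs only, so even the nested version returns immediately.
INPUTS = ["xxxxxxxxxy", "xxxxxxxxxx", "xy", "xxy"]

INPUTS.each do |input|
  results = [NESTED, ATOMIC, FLAT].map { |re| re.match?(input) }
  puts "#{input}: #{results.inspect}"
end
```

All three patterns agree on every input. The difference is only in how much work a failed match can cost.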

Page 19: And Now You Have Two Problems

Tip #2: Don’t make everything optional

/[-+]?[0-9]*\.?[0-9]*/ =~ '.'

/[-+]?([0-9]*\.?[0-9]+|[0-9]+)/

/[-+]?[0-9]*\.?[0-9]+/
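
The effect of an all-optional pattern shows up as soon as it is anchored to the whole string. In this sketch the \A and \z anchors and the constant names are additions for the example:

```ruby
ALL_OPTIONAL = /\A[-+]?[0-9]*\.?[0-9]*\z/
STRICTER     = /\A[-+]?([0-9]*\.?[0-9]+|[0-9]+)\z/

# Every piece of ALL_OPTIONAL may match nothing, so junk gets through.
p ALL_OPTIONAL.match?(".")     # true
p ALL_OPTIONAL.match?("+")     # true
p ALL_OPTIONAL.match?("")      # true

# STRICTER requires at least one digit.
p STRICTER.match?(".")         # false
p STRICTER.match?("+")         # false
p STRICTER.match?("")          # false
p ["3.14", "-.5", "42"].all? { |s| STRICTER.match?(s) }   # true
```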

Page 20: And Now You Have Two Problems

Tip #3: Evaluate tradeoffs

/\b[A-Z0-9._%+-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,4}\b/

/(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ 
\t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\"

.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ 
\t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*))*)?;\s*)/

Page 21: And Now You Have Two Problems

Tip #4: Capture repeated groups and don’t repeat a captured group

/!(abc|123)+!/ =~ '!abc123!'
# $1 == '123'

/!((abc|123)+)!/ =~ '!abc123!'
# $1 == 'abc123'
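
If every repetition is needed, one possible approach is to capture the whole repeated run first and then scan it. The repetitions helper below is hypothetical and only sketches that idea:

```ruby
# A repeated capturing group keeps only its last iteration.
p "!abc123!".match(/!(abc|123)+!/)[1]      # "123"

# Capturing the repeated group as a whole keeps everything.
p "!abc123!".match(/!((abc|123)+)!/)[1]    # "abc123"

# Hypothetical helper: capture the whole run, then split it with scan.
def repetitions(text)
  whole = text[/!((?:abc|123)+)!/, 1]
  whole ? whole.scan(/abc|123/) : []
end

p repetitions("!abc123!")    # ["abc", "123"]
p repetitions("no match")    # []
```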

Page 22: And Now You Have Two Problems

Tip #5: Use interpolation with care

str = "cat"

/#{str}/ =~ "My cat eats catfood"

/#{Regexp.quote(str)}/ =~ "My cat eats catfood"
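
With "cat" both forms behave the same. The difference only appears when the interpolated string contains regexp metacharacters. A small sketch with a made-up value (NEEDLE is an assumption):

```ruby
# A string that happens to contain a quantifier character.
NEEDLE = "1+1"

# Interpolated as-is, "+" means "one or more of the preceding 1".
p("1+1=2" =~ /#{NEEDLE}/)                   # nil
p("11=2"  =~ /#{NEEDLE}/)                   # 0

# Regexp.quote escapes the metacharacters, so the text is matched literally.
p Regexp.quote(NEEDLE)                      # "1\\+1"
p("1+1=2" =~ /#{Regexp.quote(NEEDLE)}/)     # 0
p("11=2"  =~ /#{Regexp.quote(NEEDLE)}/)     # nil
```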

Page 23: And Now You Have Two Problems

Tip #6: Don’t use ^ and $ to match the string’s beginning and end

validates :url, :format => /^https?/

"http://example.com" =~ /^https?/

"javascript:alert('hello!');%0Ahttp://example.com"

"javascript:alert('hello!');\nhttp://example.com" =~ /^https?/

"javascript:alert('hello!');\nhttp://example.com" =~ /\Ahttps?/

Page 24: And Now You Have Two Problems

From 060bb7250b963609a0d8a5d0559e36b99d2402c6 Mon Sep 17 00:00:00 2001
From: joernchen of Phenoelit <[email protected]>
Date: Sat, 9 Feb 2013 15:46:44 -0800
Subject: [PATCH] Fix issue with attr_protected where malformed input could circumvent protection

Fixes: CVE-2013-0276
---
 activemodel/lib/active_model/attribute_methods.rb | 2 +-
 activemodel/lib/active_model/mass_assignment_security/permission_set.rb | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/activemodel/lib/active_model/attribute_methods.rb b/activemodel/lib/active_model/attribute_methods.rb
index f033a94..96f2c82 100644
--- a/activemodel/lib/active_model/attribute_methods.rb
+++ b/activemodel/lib/active_model/attribute_methods.rb
@@ -365,7 +365,7 @@ module ActiveModel
         end
         @prefix, @suffix = options[:prefix] || '', options[:suffix] || ''
-        @regex = /^(#{Regexp.escape(@prefix)})(.+?)(#{Regexp.escape(@suffix)})$/
+        @regex = /\A(#{Regexp.escape(@prefix)})(.+?)(#{Regexp.escape(@suffix)})\z/
         @method_missing_target = "#{@prefix}attribute#{@suffix}"
         @method_name = "#{prefix}%s#{suffix}"
       end
diff --git a/activemodel/lib/active_model/mass_assignment_security/permission_set.rb b/activemodel/lib/active_model/mass_assignment_security/permission_set.rb
index a1fcdf1..10faa29 100644
--- a/activemodel/lib/active_model/mass_assignment_security/permission_set.rb
+++ b/activemodel/lib/active_model/mass_assignment_security/permission_set.rb
@@ -19,7 +19,7 @@ module ActiveModel
       protected
       def remove_multiparameter_id(key)
-        key.to_s.gsub(/\(.+/, '')
+        key.to_s.gsub(/\(.+/m, '')
       end
     end
--
1.8.1.1
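
Both changes in the patch above can be reproduced in isolation. The sketch below copies the old and new patterns from the diff into hypothetical helper methods. The inputs are made up to show the behaviour and are not the actual exploit strings:

```ruby
# Patterns before and after the patch, wrapped in hypothetical helpers.
def old_attribute_matcher(prefix, suffix)
  /^(#{Regexp.escape(prefix)})(.+?)(#{Regexp.escape(suffix)})$/
end

def new_attribute_matcher(prefix, suffix)
  /\A(#{Regexp.escape(prefix)})(.+?)(#{Regexp.escape(suffix)})\z/
end

def old_remove_multiparameter_id(key)
  key.to_s.gsub(/\(.+/, '')
end

def new_remove_multiparameter_id(key)
  key.to_s.gsub(/\(.+/m, '')
end

# Made-up input with an embedded newline.
name = "admin=\nignored"
p old_attribute_matcher("", "=").match?(name)    # true: ^ and $ match one line only
p new_attribute_matcher("", "=").match?(name)    # false: \A and \z span the whole string
p new_attribute_matcher("", "=").match?("admin=")  # true

# Without /m the dot stops at the newline, so nothing is stripped here.
key = "admin(\n1i)"
p old_remove_multiparameter_id(key)    # "admin(\n1i)"
p new_remove_multiparameter_id(key)    # "admin"
```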

Page 25: And Now You Have Two Problems

From 99123ad12f71ce3e7fe70656810e53133665527c Mon Sep 17 00:00:00 2001
From: Aaron Patterson <[email protected]>
Date: Fri, 15 Mar 2013 15:04:00 -0700
Subject: [PATCH] fix protocol checking in sanitization [CVE-2013-1857]

Conflicts:
 actionpack/lib/action_controller/vendor/html-scanner/html/sanitizer.rb
---
 .../action_controller/vendor/html-scanner/html/sanitizer.rb | 4 ++--
 actionpack/test/template/html-scanner/sanitizer_test.rb | 10 ++++++++++
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/actionpack/lib/action_controller/vendor/html-scanner/html/sanitizer.rb b/actionpack/lib/action_controller/vendor/html-scanner/html/sanitizer.rb
index 02eea58..994e115 100644
--- a/actionpack/lib/action_controller/vendor/html-scanner/html/sanitizer.rb
+++ b/actionpack/lib/action_controller/vendor/html-scanner/html/sanitizer.rb
@@ -66,7 +66,7 @@ module HTML
     # A regular expression of the valid characters used to separate protocols like
     # the ':' in 'http://foo.com'
-    self.protocol_separator = /:|(&#0*58)|(&#x70)|(%|&#37;)3A/
+    self.protocol_separator = /:|(&#0*58)|(&#x70)|(&#x0*3a)|(%|&#37;)3A/i
     # Specifies a Set of HTML attributes that can have URIs.
     self.uri_attributes = Set.new(%w(href src cite action longdesc xlink:href lowsrc))
@@ -171,7 +171,7 @@ module HTML
     def contains_bad_protocols?(attr_name, value)
       uri_attributes.include?(attr_name) &&
-      (value =~ /(^[^\/:]*):|(&#0*58)|(&#x70)|(%|&#37;)3A/ && !allowed_protocols.include?(value.split(protocol_separator).first.downcase))
+      (value =~ /(^[^\/:]*):|(&#0*58)|(&#x70)|(&#x0*3a)|(%|&#37;)3A/i && !allowed_protocols.include?(value.split(protocol_separator).first.downcase.strip))
     end
   end
 end
diff --git a/actionpack/test/template/html-scanner/sanitizer_test.rb b/actionpack/test/template/html-scanner/sanitizer_test.rb
index 4e2ad4e..dee60c9 100644
--- a/actionpack/test/template/html-scanner/sanitizer_test.rb
+++ b/actionpack/test/template/html-scanner/sanitizer_test.rb
@@ -176,6 +176,7 @@ class SanitizerTest < ActionController::TestCase
 %(<IMG SRC="jav&#x0A;ascript:alert('XSS');">),
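
To see what the extra alternative and the /i flag change, the two separator patterns from the diff can be compared directly. The attribute values below are made up for illustration and are not taken from the Rails test suite:

```ruby
# Separator patterns copied from the diff, before and after the fix.
OLD_SEPARATOR = /:|(&#0*58)|(&#x70)|(%|&#37;)3A/
NEW_SEPARATOR = /:|(&#0*58)|(&#x70)|(&#x0*3a)|(%|&#37;)3A/i

# Made-up values hiding the ':' behind a hex entity or a lowercase escape.
HEX_ENTITY   = "javascript&#x3A;alert(1)"
LOWER_ESCAPE = "javascript%3aalert(1)"

# The old pattern sees no separator at all in these values.
p(HEX_ENTITY =~ OLD_SEPARATOR)     # nil
p(LOWER_ESCAPE =~ OLD_SEPARATOR)   # nil

# The new pattern finds the separator right after "javascript".
p(HEX_ENTITY =~ NEW_SEPARATOR)     # 10
p(LOWER_ESCAPE =~ NEW_SEPARATOR)   # 10
```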

Page 26: And Now You Have Two Problems

Tools

Print a cheatsheet!

Info:

http://www.regular-expressions.info

Debug:

http://rubular.com

http://rubyxp.com

Visualize:

http://www.regexper.com/

Page 27: And Now You Have Two Problems

Thank you!