scanning strings at supersonic speed (euruko 2011)
TRANSCRIPT
8/6/2019 Scanning Strings at Supersonic Speed (EuRuKo 2011)
http://slidepdf.com/reader/full/scanning-strings-at-supersonic-speed-euruko-2011 1/46
Scanning Strings at Supersonic Speed
@murphy (Kornelius Kalnbach)
[email protected], @murphy_karasu, murfy
Everything that can go wrong will go wrong
sofatutor.com
Scanning
"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et
dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex
ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit
anim id est laborum."
Strings
"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et
dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex
ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit
anim id est laborum."
Strings
"<!DOCTYPE html><html lang="en"><head><charset="utf-8"><title>Agenda –EuRuKo 20title><link rel="stylesheet" href="/styleshe
screen.css"><link rel="stylesheet" href="fanc
jquery.fancybox-1.3.4.css"><linkrel="altertype="application/atom+xml" title="ATO
feed"href="http://euruko2011.org/feed.atomrel="alternate"type="application/atom+x
title="github feed"…".scan
Strings
"class Song
  def initialize name, author
    @name = name
    @author = author
  end
  def to_s
    "this is :#{@name}----#{@author}"
  end
end".scan
Supersonic
3KB
Mach 1.8 = 1900 km/h
2514 pages per second
7.5 MB/s
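The figures on this slide appear to fit together as a throughput calculation. A quick check, assuming that 3 KB is the size of one page and that 1 MB = 1000 KB (the slide states neither):

```ruby
# Assumptions: "3KB" is the size of one scanned page, and 1 MB = 1000 KB.
page_size_kb     = 3
pages_per_second = 2514

throughput_mb_s = page_size_kb * pages_per_second / 1000.0
puts format("%.1f MB/s", throughput_mb_s) # prints "7.5 MB/s"
```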
Speed
Why?
• Syntax Highlighting
• Parsing (e.g. HAML)
• Rite
Shootout
12.283
3.565
11.571
Target
write a supersonic scanner with pure Ruby code
Resources
a fast machine ♥
Resources
big examples
5+ seconds
here: 160 MB = 9 times ruby-head
Resources
rvm
rvm.beginr
♥ ♥ ♥
Resources
Time
Endurance
Craziness
← rvm.beginrescueend.com
General Ideas
C
• avoid convenient APIs
• write everything yourself
• write for the CPU
General Ideas
Ruby
• embrace the core libraries
• write less code
• write for the interpreter
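A small illustration of "embrace the core libraries", not taken from the talk: the same count done once with a hand-written loop and once with a single core method. The example string and variable names are assumptions chosen for the sketch.

```ruby
text = "<head><title>EuRuKo 2011</title>"

# Hand-written loop: every iteration is executed by the interpreter.
manual = 0
text.each_char { |c| manual += 1 if c == "<" }

# Core library call: the loop runs inside String#count
# (implemented in C on MRI).
builtin = text.count("<")

p [manual, builtin] # => [3, 3]
```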
String
s = "<head><title>EuRuKo 2011</title>"
String + RegExp
s = "<head><title>EuRuKo 2011</title>"
puts s.scan(/<(\w+)/)
head
title
String + RegExp
s = "<head><title>EuRuKo 2011</title>"
s.scan(/<(\w+)/) do |tag|
  puts tag
end
head
title
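What `String#scan` hands back differs slightly between the two forms on these slides. A short sketch of both, using the same string; the variable names `tags` and `names` are only for illustration:

```ruby
s = "<head><title>EuRuKo 2011</title>"

# Without a block, scan returns one array per match,
# each holding that match's capture groups.
tags = s.scan(/<(\w+)/)
p tags  # => [["head"], ["title"]]

# With a block, each match is yielded as an array of its captures,
# so destructure the block parameter to get the plain string.
names = []
s.scan(/<(\w+)/) { |(tag)| names << tag }
p names # => ["head", "title"]
```

`puts` flattens arrays, which is why the slides print `head` and `title` on separate lines in both cases.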
StringScanner
Why?
• avoid big RegExp
• control the scan process
• use patterns depending on state
• create patterns on the fly
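To make "use patterns depending on state" concrete, here is a minimal sketch of a two-state tokenizer built on `StringScanner`. The method name `tokenize`, the states `:content` and `:tag`, and the token kinds are all hypothetical; the talk does not show this code.

```ruby
require 'strscan'

# Hypothetical two-state tokenizer: which pattern is tried depends on
# whether the scanner is inside a tag or in plain content.
def tokenize(html)
  scanner = StringScanner.new(html)
  state   = :content
  tokens  = []

  until scanner.eos?
    case state
    when :content
      if scanner.scan(/<\/?/)
        state = :tag
      elsif (text = scanner.scan(/[^<]+/))
        tokens << [:text, text]
      end
    when :tag
      if (name = scanner.scan(/\w+/))
        tokens << [:name, name]
      elsif scanner.scan(/>/)
        state = :content
      else
        scanner.getch # skip anything this sketch does not handle
      end
    end
  end
  tokens
end

p tokenize("<head><title>EuRuKo 2011</title>")
```

Each pattern stays small because it only has to be correct in its own state. This sketch also reports the name of the closing tag, since it does not distinguish opening from closing tags.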
Benchmark
s = "<head><title>EuRuKo 2011</title>"
s *= 5_000_000
s.scan(/<(\w+)/) do |tag|
  tag
end
[Bar chart: sonic™, 1.8.7, 1.9.2, jruby, rbx, mac]
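One way to time this loop is the standard library's `Benchmark.realtime`. The sketch below assumes a much smaller repeat count than the slide's 5_000_000 so that it finishes quickly; the counter is only there to confirm that the block ran for every match.

```ruby
require 'benchmark'

# Assumption: 50_000 repetitions instead of the slide's 5_000_000.
s = "<head><title>EuRuKo 2011</title>" * 50_000

count = 0
seconds = Benchmark.realtime do
  s.scan(/<(\w+)/) { |_match| count += 1 }
end

puts "#{count} tags in #{seconds.round(3)} s"
```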
StringScanner
require 'strscan'
scanner = StringScanner.new(s)
until scanner.eos?
  if scanner.scan(/<(\w+)/)
    tag = scanner[1]
  else
    scanner.getch
  end
end
[Bar chart: sonic™, 1.8.7, 1.9.2, jruby, rbx, mac]
StringScanner on MacRuby
http://www.macruby.org/trac/ticket/938
StringScanner
scanner = StringScanner.new(s)
until scanner.eos?
  if scanner.scan(/<(\w+)/)
    tag = scanner[1]
  else
    scanner.getch
  end
end
sonic™ 21.2
1.8.7 13.6
1.9.2 5.4
jruby 4.7
rbx 29.1
mac 34.7
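Before comparing timings, it is worth confirming that the `StringScanner` loop and `String#scan` extract the same tags. A minimal check; the helper names and the repeat count of 20_000 are assumptions for this sketch:

```ruby
require 'strscan'

s = "<head><title>EuRuKo 2011</title>" * 20_000

# Tags found by String#scan with a block.
def tags_with_string_scan(s)
  tags = []
  s.scan(/<(\w+)/) { |(tag)| tags << tag }
  tags
end

# Tags found by the StringScanner loop from the slide:
# try the pattern at the current position, otherwise skip one character.
def tags_with_string_scanner(s)
  tags = []
  scanner = StringScanner.new(s)
  until scanner.eos?
    if scanner.scan(/<(\w+)/)
      tags << scanner[1]
    else
      scanner.getch
    end
  end
  tags
end

a = tags_with_string_scan(s)
b = tags_with_string_scanner(s)
puts "same result: #{a == b}"
```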
General Ideas
Ruby
• embrace the core libraries
• write less code
• write for the interpreter
Less Code
class HTML
  def initialize html
    @scanner = StringScanner.new(html)
  end
  def scan
    until @scanner.eos?
      if @scanner.scan(/<(\w+)/)
        yield @scanner[1]
      ...
      end
    end
Less Code
class HTML
  def initialize html
    @scanner = StringScanner.new(html)
  end
  def scan
    until eos?
      if scan(/<(\w+)/)
        yield self[1]
      ...
      end
    end
Less Code
class HTML
  def initialize html
    @scanner = StringScanner.new(html)
  end
  delegate :eos?, :scan, :[]
  def scan
    until eos?
      if scan(/<(\w+)/)
        yield self[1]
      ...
      end
    end
Less Code
class HTML
  def initialize html
    @scanner = StringScanner.new(html)
  end
  delegate :eos?, :scan, :[]
  def tokenize
    until eos?
      if scan(/<(\w+)/)
        yield self[1]
      ...
      end
    end
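The slide does not show where `delegate` comes from. In plain Ruby, one way to get the same effect is `Forwardable#def_delegators` from the standard library. The sketch below assumes that substitution, uses the hypothetical class name `HTMLWithDelegation`, and fills the elided `...` with a `getch` fallback so that it runs.

```ruby
require 'forwardable'
require 'strscan'

# Hypothetical class name; the slide calls it HTML.
class HTMLWithDelegation
  extend Forwardable
  # Forward these calls to the wrapped StringScanner.
  def_delegators :@scanner, :eos?, :scan, :getch, :[]

  def initialize(html)
    @scanner = StringScanner.new(html)
  end

  def tokenize
    until eos?
      if scan(/<(\w+)/)
        yield self[1]
      else
        getch # assumption: stands in for the rules elided on the slide
      end
    end
  end
end

found = []
HTMLWithDelegation.new("<head><title>EuRuKo 2011</title>").tokenize { |tag| found << tag }
p found # => ["head", "title"]
```

Renaming the method to `tokenize` matters here: a method named `scan` in the class would shadow the delegated `scan` and call itself.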
Less Code
class HTML < StringScanner
  def initialize html
    super
  end
  def tokenize
    until eos?
      if scan(/<(\w+)/)
        yield self[1]
      ...
      end
    end
Less Code
class HTML < StringScanner
  def tokenize
    until eos?
      if scan(/<(\w+)/)
        yield self[1]
      ...
      end
    end
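A runnable version of this last step might look as follows. It is a sketch: the `getch` branch is an assumption standing in for the rules elided as `...` on the slide.

```ruby
require 'strscan'

# Inheriting from StringScanner makes eos?, scan, getch and []
# available directly, so no wrapper object or delegation is needed.
class HTML < StringScanner
  def tokenize
    until eos?
      if scan(/<(\w+)/)
        yield self[1]
      else
        getch # assumption: skip input that the elided rules would handle
      end
    end
  end
end

tag_names = []
HTML.new("<head><title>EuRuKo 2011</title>").tokenize { |tag| tag_names << tag }
p tag_names # => ["head", "title"]
```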
General Ideas
Ruby
• embrace the core libraries
• write less code
• write for the interpreter
out = Encoder.new.encode Scanner.new(in)
• Scanner: simple, 9 rules
• Encoder: does nothing
Single Core
out = Encoder.new.encode Scanner.new(in)
[Bar chart: sonic™, jruby, 1.…]
• Scanner: simple, 9 rules
• Encoder: does nothing
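The talk shows neither class, so the following is only a sketch of how such a pipeline could be wired together. `Scanner#each_token`, the two rules, and the pass-through `Encoder` are hypothetical stand-ins. The variable is called `input` because `in` is a reserved word in Ruby.

```ruby
require 'strscan'

# Hypothetical stand-in; the talk's scanner has nine rules, this one has two.
class Scanner
  def initialize(source)
    @source = source
  end

  # Yields [kind, text] pairs for every piece of the input.
  def each_token
    scanner = StringScanner.new(@source)
    until scanner.eos?
      if (word = scanner.scan(/\w+/))
        yield :word, word
      else
        yield :other, scanner.getch
      end
    end
  end
end

# Hypothetical stand-in that "does nothing": token text passes through unchanged.
class Encoder
  def encode(scanner)
    out = +""
    scanner.each_token { |_kind, text| out << text }
    out
  end
end

input = "def hello; end\n"
out = Encoder.new.encode Scanner.new(input)
puts out == input # prints "true"
```

With an encoder that only concatenates, the measured time is dominated by the scanner, which is presumably the point of the setup.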
threads = []
in.lines.each_slice 300_000 do |lines|
  threads << Thread.new do
    chunk = lines.join
    Thread.current[:out] = Encoder.new.encode Scanner…
  end
end
threads.each(&:join)
out = threads.map { |thread| thread[:out] }.join
[Bar chart: sonic™, jruby, 1.…]
threads = []
chunk_offsets = [0]
in.lines.each_slice slice_size do |lines|
  chunk_offsets << chunk_offsets.last + lines.join.by…
end
chunk_offsets.each_cons(2) do |this_chunk, next_chunk|
  threads << Thread.new do
    chunk = code[this_chunk...next_chunk]
    Thread.current[:out] = Encoder.new.encode Scanner…
  end
end
threads.each(&:join)
out = threads.map { |thread| thread[:out] }.join
[Bar chart: sonic™, jruby, 1.…]
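The slide text is clipped, so the following is a self-contained sketch of the chunk-offset idea rather than the talk's exact code. It assumes byte offsets (`bytesize` and `byteslice`), a tiny `slice_size`, and a hypothetical `encode` lambda in place of the Encoder and Scanner pair. As before, `input` replaces `in`, which is a reserved word.

```ruby
# Hypothetical stand-in for the Encoder/Scanner pair.
encode = ->(chunk) { chunk.upcase }

input      = "line one\nline two\nline three\nline four\nline five\n"
slice_size = 2 # lines per chunk; tiny so that the sketch produces several chunks

# Record the byte offset at which each chunk starts, plus the final end offset.
chunk_offsets = [0]
input.lines.each_slice(slice_size) do |lines|
  chunk_offsets << chunk_offsets.last + lines.sum(&:bytesize)
end

# One thread per chunk; each stores its result in a thread-local slot.
threads = chunk_offsets.each_cons(2).map do |this_chunk, next_chunk|
  Thread.new do
    chunk = input.byteslice(this_chunk...next_chunk)
    Thread.current[:out] = encode.call(chunk)
  end
end

threads.each(&:join)
out = threads.map { |thread| thread[:out] }.join
puts out == encode.call(input) # prints "true"
```

Splitting at line boundaries only gives correct output when no token spans a chunk boundary. Note also that on MRI the global interpreter lock keeps Ruby threads from executing Ruby code in parallel, so any speedup from this approach depends on the implementation.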
threads = []
chunk_offsets = [0]
in.lines.each_slice slice_size do |lines|
  chunk_offsets << chunk_offsets.last + lines.join.by…
end
chunk_offsets.each_cons(2) do |this_chunk, next_chunk|
  threads << Thread.new do
    chunk = code[this_chunk...next_chunk]
    Thread.current[:out] = Encoder.new.encode Scanner…
  end
end
threads.each(&:join)
out = threads.map { |thread| thread[:out] }.join
[Bar chart: sonic™, 2 cores, 4 cores]
Questions?
not allowed:
• When will CodeRay 1.0 be released?
Thank you!
• @euruko
• @yukihiro_matz
• @bovensiepen
• @heinz_gies
• my girlfriend