anthony molinaro, openx, erlang la meetup slides
DESCRIPTION
Knowing Your Options: What a micro optimization exercise taught me about Ports, NIFs, and RE2. From the first Erlang LA
TRANSCRIPT
Knowing Your Options
What a micro optimization exercise taught me about Ports, NIFs, and RE2
Wednesday, June 8, 2011
Introductions
• Me (https://github.com/djnym)
• OpenX (http://openx.org/)
The Problem
• General
• Given a list of patterns and a string determine if the string matches one of the patterns
• Specifically
• IAB Spiders and Bots check of User Agent
Current Solution
• Implemented in Java
• 324 alternates in a large pattern
• each segment in pattern is basically a substring match
• there are a couple of ‘^’ and other regex pieces, not too many, but enough to want to leave this as a regex
• case insensitive match
Example
indy\\+library|infolink|inktomi search|inktomi\\+search|internet ninja|internet\\+ninja|internetseer|inverse ip insight|inverse\\+ip\\+insight|isilo|jakarta|jobo|justview|keynote|kilroy|larbin|libwww-perl|linkbot|linkchecker|linklint|linkscan|linkwalker|lisa|^lwp|lydia|magus bot|magus\\+bot|mediapartners-google|mfc_tear_sample|microsoft scheduled cache content download service|microsoft url control|microsoft\\+scheduled\\+cache\\+content\\+download\\+service|microsoft\\+url\\+control|minuteman|miva|mj12bot|mobipocket webcompanion|mobipocket\\+webcompanion|monitor|monster|mozilla/5\\.0 \\(compatible; msie 5\\.0\\)|
Try 1 : re module
• Precompile the large pattern of alternates using re:compile/2
• Use re:run/3 to match
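The code slides for this attempt did not survive in the transcript. The two steps above might look roughly like the following sketch. It is not the author's code: the module name `re_alternates_sketch` and both function names are hypothetical, and `lists:join/2` needs a reasonably recent OTP release.

```erlang
-module(re_alternates_sketch).
-export([compile/1, is_bot/2]).

%% Join the individual patterns with "|" and compile them once,
%% case-insensitively, so the compiled regex can be reused on every call.
compile(Patterns) ->
    Alternates = lists:join("|", Patterns),
    {ok, MP} = re:compile(Alternates, [caseless]),
    MP.

%% True if the user agent matches any of the alternates.
%% {capture, none} asks re:run/3 for a plain match | nomatch answer.
is_bot(UserAgent, MP) ->
    case re:run(UserAgent, MP, [{capture, none}]) of
        match   -> true;
        nomatch -> false
    end.
```

Compiling once and passing the compiled pattern around avoids paying the compile cost of a 324-alternate pattern on every request.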
Try 1 : Code 1
Try 1 : Code 2
Try 1 : Code 3
Try 1 : Results
• Poor!
1> re_test:test_all("ua.10000").
Processed 10000 resulting in 100 matches and 9900 nomatches
RE Alternates : 69341006 : 6934.100600 micros avg
ok
• about 7 ms per call (70 seconds for 10000)
• about 2x current overhead of component
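The `re_test:test_all/1` harness itself is not shown in the transcript. The output format suggests it reports a total in microseconds and a per-call average (69341006 / 10000 = 6934.1006). A minimal sketch of such a measurement, assuming a hypothetical `timing_sketch` module and a matcher passed in as a fun, could be:

```erlang
-module(timing_sketch).
-export([time_all/2]).

%% Run Fun on every input, timing each call with timer:tc/1, and return
%% {Matches, NoMatches, TotalMicros, AvgMicros}.
%% Fun is expected to return true or false for each input.
time_all(Fun, Inputs) ->
    {M, N, Total} =
        lists:foldl(
          fun(Input, {Ms, Ns, T}) ->
                  {Micros, Result} = timer:tc(fun() -> Fun(Input) end),
                  case Result of
                      true  -> {Ms + 1, Ns, T + Micros};
                      false -> {Ms, Ns + 1, T + Micros}
                  end
          end, {0, 0, 0}, Inputs),
    {M, N, Total, Total / max(1, length(Inputs))}.
```

Timing each call separately keeps file reading and result counting out of the measured interval.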
Try 2 : perl port
• Curious about Perl performance, I implemented a simple program to run the alternates pattern in Perl. It ran really fast, so I decided to turn it into a port
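The port code slides are missing from the transcript, so the following is only a sketch of the Erlang side of a line-oriented port. The line protocol (one user agent per line in, "1" or "0" per line out), the module name `perl_port_sketch`, and the script name in the comment are all assumptions. The deck links to an article on writing a port under OTP principles, so the original was probably wrapped in a gen_server, which this sketch omits.

```erlang
-module(perl_port_sketch).
-export([start/1, match/2, stop/1]).

%% Start the external program. Cmd is a shell command line, for example
%% "perl match_ua.pl" (a hypothetical script name).
%% {line, 4096} makes the port deliver output one line at a time.
start(Cmd) ->
    open_port({spawn, Cmd}, [binary, {line, 4096}, exit_status]).

%% Send one user agent per line and wait for a one-line reply.
%% Assumed protocol: the program prints "1" for a match and "0"
%% otherwise, and flushes its output after every line.
%% Must be called from the process that opened the port, because port
%% messages are delivered to the port owner.
match(Port, UserAgent) ->
    true = port_command(Port, [UserAgent, "\n"]),
    receive
        {Port, {data, {eol, <<"1">>}}} -> true;
        {Port, {data, {eol, _}}}       -> false;
        {Port, {exit_status, Status}}  -> {error, {exit_status, Status}}
    after 5000 ->
        {error, timeout}
    end.

stop(Port) ->
    port_close(Port).
```

Every call pays for a round trip through a pipe to another OS process, which is one plausible reason a port stays well above the cost of an in-VM call even when the external matcher is fast.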
Try 2 : Code 1
Try 2 : Code 2
Try 2 : Code 3
Try 2 : Code 4
Try 2 : Code 5
Try 2 : Code 6
Try 2 : Results
• Better
1> re_test:test_all("ua.10000").
Processed 10000 resulting in 100 matches and 9900 nomatches
Perl Server : 8151691 : 815.169100 micros avg
ok
• about 815 microseconds per call (8.15 seconds for 10000)
Try 3 : re module again
• Wanted to sanity check my use of re module and see if separate patterns and regexes would improve performance
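One reading of "separate patterns" is a list of individually compiled regexes tried in turn, which fits the "RE List" label in the results. A sketch under that assumption (the module and function names are hypothetical, and this is not the author's code):

```erlang
-module(re_list_sketch).
-export([compile_all/1, is_bot/2]).

%% Compile each pattern separately, once, case-insensitively.
compile_all(Patterns) ->
    [begin {ok, MP} = re:compile(P, [caseless]), MP end || P <- Patterns].

%% lists:any/2 stops at the first compiled pattern that matches, so a
%% matching user agent does not have to be checked against every pattern.
is_bot(UserAgent, MPs) ->
    lists:any(fun(MP) ->
                      re:run(UserAgent, MP, [{capture, none}]) =:= match
              end, MPs).
```

A user agent that matches nothing still costs one `re:run/3` call per pattern, which is the common case in the test data (9900 of 10000).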
Try 3 : Code 1
Try 3 : Code 2
Try 3 : Results
• Better Still?
1> re_test:test_all("ua.10000").
Processed 10000 resulting in 100 matches and 9900 nomatches
RE List : 7776324 : 777.632400 micros avg
ok
• about 777 microseconds per call (7.77 seconds for 10000)
Try 4 : re2 NIF
• From the re2 website (http://code.google.com/p/re2/)
"Backtracking engines are typically full of features and convenient syntactic sugar but can be forced into taking exponential amounts of time on even small inputs. RE2 uses automata theory to guarantee that regular expression searches run in time linear in the size of the input."
• NIF available (https://github.com/tuncer/re2.git)
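A sketch of the same alternates match through the NIF follows. It assumes the re2 application is built and on the code path, and that the binding exposes `re2:compile/2` with a `caseless` option and `re2:match/2`. Those names are given from memory of the binding's API (newer releases may prefer `re2:run`), so check them against the installed version. The module name `re2_sketch` is hypothetical.

```erlang
-module(re2_sketch).
-export([compile/1, is_bot/2]).

%% Assumption: re2:compile/2 accepts a binary pattern and [caseless].
compile(Patterns) ->
    Alternates = iolist_to_binary(lists:join("|", Patterns)),
    {ok, RE} = re2:compile(Alternates, [caseless]),
    RE.

%% Assumption: re2:match/2 returns nomatch, or match / {match, Captures}
%% on success, mirroring the shape of the re module's results.
is_bot(UserAgent, RE) ->
    case re2:match(UserAgent, RE) of
        nomatch    -> false;
        match      -> true;
        {match, _} -> true
    end.
```

The calling code has the same shape as the `re` version, so swapping engines is mostly a matter of changing the module name.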
Try 4 : Code 1
Try 4 : Results
• Awesome!
1> re_test:test_all("ua.10000").
Processed 10000 resulting in 100 matches and 9900 nomatches
RE2 Alternates : 265289 : 26.528900 micros avg
ok
• about 26 microseconds per call (265 milliseconds for 10000)
But...
• larger lists (1800+ elements) required raising the maximum memory from 8MB to 32MB
• less regex syntax: no backreferences, no zero-width lookaheads
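RE2 itself defaults to a memory budget of 8MB per compiled pattern, which matches the first bullet. A sketch of raising it at compile time, assuming the NIF exposes that budget as a `{max_mem, Bytes}` compile option (verify the option name against the installed binding; the module name `re2_max_mem_sketch` is hypothetical):

```erlang
-module(re2_max_mem_sketch).
-export([compile_large/1]).

%% Assumption: {max_mem, Bytes} is the binding's name for RE2's memory
%% budget option. 32MB is the figure from the slide.
%% Returns whatever re2:compile/2 returns, so the caller can inspect a
%% compile error instead of crashing here.
compile_large(Patterns) ->
    Alternates = iolist_to_binary(lists:join("|", Patterns)),
    re2:compile(Alternates, [caseless, {max_mem, 32 * 1024 * 1024}]).
```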
Questions and Links
• http://trapexit.org/Reading_Lines_from_a_File
• http://trapexit.org/Writing_an_Erlang_Port_using_OTP_Principles
• https://github.com/tuncer/re2.git
• http://code.google.com/p/re2/