SAT and CSP competitions & benchmark libraries: some lessons learnt? Toby Walsh NICTA & UNSW Sydney, Australia

Post on 19-Dec-2015

Page 1

SAT and CSP competitions & benchmark libraries: some lessons learnt?

Toby Walsh
NICTA & UNSW
Sydney, Australia

Page 2

What’s the best way to benchmark systems?

Page 3

Outline

» Benchmark libraries
  » Founding CSPLib.org
» Competitions
  » SAT competition judge
  » TPTP competition judge
  » …

Page 4

Why?

» Why did I set up CSPLib.org?
  » I needed problems against which to benchmark my latest inference techniques
    » Zebra and random problems don’t cut it!
  » I thought it would help unify and advance the CP community

Page 5

Random problems

» +ve
  » Easy to generate
  » Hard (if chosen from the phase transition)
  » Impossible to cheat
    » If you can solve 1000-variable random 3SAT problems at l/n=4.2 (clauses per variable), I’ll be impressed
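
As a rough illustration of how easy such problems are to generate, here is a minimal sketch of a random 3SAT generator at a fixed clause-to-variable ratio. The function names `random_3sat` and `to_dimacs` are hypothetical, and the sampling model (three distinct variables per clause, each negated with probability 1/2) is an assumption about the usual fixed-clause-length model rather than something the slides specify.

```python
import random


def random_3sat(n_vars, ratio=4.2, seed=None):
    """Return a list of 3-literal clauses over variables 1..n_vars."""
    rng = random.Random(seed)
    n_clauses = round(ratio * n_vars)
    clauses = []
    for _ in range(n_clauses):
        # Pick three distinct variables, then negate each with probability 1/2.
        variables = rng.sample(range(1, n_vars + 1), 3)
        clauses.append([v if rng.random() < 0.5 else -v for v in variables])
    return clauses


def to_dimacs(n_vars, clauses):
    """Serialise clauses as DIMACS CNF text (one clause per line, 0-terminated)."""
    lines = [f"p cnf {n_vars} {len(clauses)}"]
    lines += [" ".join(map(str, clause)) + " 0" for clause in clauses]
    return "\n".join(lines) + "\n"


# 1000 variables at ratio 4.2 gives 4200 clauses.
instance = random_3sat(1000, ratio=4.2, seed=0)
dimacs_text = to_dimacs(1000, instance)
```

Fixing the seed makes an instance reproducible, which matters if the same random set is to be shared between solvers.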

Page 6

Random problems

» -ve
  » Lack the structure found in real-world problems
  » Unrepresentative
    » E.g. random 3SAT problems either have many solutions or none
  » Different methods work well on them
    » Random SAT: forward looking algorithms
    » Industrial SAT: backward looking algorithms

Page 7

Why?

» Thesis: every mature field has a benchmark library
  » Deduction started in 1960s
    » TPTP set up in 1993
  » SAT started in 1960s
    » SAT DIMACS challenge in 1992
    » SATLib set up in 1999
  » CP started in 1970s
    » CSPLib set up in 1998

Page 8

Why?

» Thesis: every mature field has a benchmark library
  » Spatial and temporal reasoning started in early 80s (or before?)
    » It’s been approximately 30 years so it’s about time you guys set one up!

Page 9

Benchmark libraries

» CSPLib.org
  » Over 35k unique visitors
  » Still not everything I’d want it to be
  » But state of the art for experimentation is now much better than it was
    » I haven’t seen a zebra for a very long time

Page 10

An ideal library

» Desiderata taken from:
  » CSPLib: a benchmark library for constraints, Proc. CP-99

Page 11

An ideal library

» Location
  » On the web and easy to find
    » TPTP.org
    » CSPLib.org
    » SATLib.org
    » QBFLib.org
    » …
    » http://elib.zib.de/pub/mp-testdata/tsp/tsplib/tsplib.html
    » http://mat.gsia.cmu.edu/COLOR/instances.html

Page 12

An ideal library

» Easy to use
  » Tools to make benchmarking as painless as possible
    » tptp2X, …
» Diverse
  » To help prevent over-fitting

Page 13

An ideal library

» Large
  » Growing continuously
  » Again helps to prevent over-fitting
» Extensible
  » To new problems or domains

Page 14

An ideal library

» Complete
  » One stop for your problems
» Topical
  » For instance, it should report current best solutions found

Page 15

An ideal library

» Independent
  » Not tied to a particular solver or proprietary input language
» Mix of difficulties
  » Hard and easy problems
  » Solved and open problems
  » With perhaps even a difficulty index?

Page 16

An ideal library

» Accurate
  » It should be trusted
» Used
  » A valued resource for the community

Page 17

Problem format

» Lo-tech or hi-tech?

Page 18

Lo-tech formats

» DIMACS format used in SATLib

    c a simple example
    p cnf 3 2
    1 -1 0
    1 2 3 0

This represents two clauses over 3 variables: (x or not x) and (x or y or z)
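
To make the point about how little machinery the format needs, here is a minimal sketch of a DIMACS CNF reader. The name `parse_dimacs` is hypothetical, and the sketch assumes the plain CNF variant only (comment lines starting with `c`, one `p cnf` problem line, clauses terminated by `0`).

```python
def parse_dimacs(text):
    """Parse DIMACS CNF text into (n_vars, clauses)."""
    n_vars = None
    n_clauses = None
    clauses = []
    current = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("c"):
            continue  # skip blank lines and comments
        if line.startswith("p"):
            # Problem line: "p cnf <variables> <clauses>"
            _, fmt, nv, nc = line.split()
            if fmt != "cnf":
                raise ValueError(f"unsupported format: {fmt}")
            n_vars, n_clauses = int(nv), int(nc)
            continue
        for token in line.split():
            literal = int(token)
            if literal == 0:
                clauses.append(current)  # 0 closes the current clause
                current = []
            else:
                current.append(literal)
    if n_vars is None:
        raise ValueError("missing 'p cnf' problem line")
    if len(clauses) != n_clauses:
        raise ValueError("clause count does not match problem line")
    return n_vars, clauses


example = """c a simple example
p cnf 3 2
1 -1 0
1 2 3 0
"""
n_vars, clauses = parse_dimacs(example)
```

Positive integers stand for variables and negative integers for their negations, so the whole parser is integer reading plus a clause terminator.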

Page 19

Lo-tech formats

» DIMACS format used in SATLib
  » +ve
    » All programming languages can read integers!
    » Small amount of extensibility built in (e.g. QBF)
  » -ve
    » Larger extensions are problematic (e.g. beyond CNF to arbitrary Boolean circuits)

Page 20

Hi-tech formats

» CP competition

<instance>
  <presentation name="4-queens" description="This problem involves placing 4 queens on a chessboard" nbSolutions="at least 1" format="XCSP1.1 (XML CSP Representation 1.1)" />
  <domains nbDomains="1">
    <domain name="dom0" nbValues="4" values="1..4" />
  </domains>
  <variables nbVariables="4">
    <variable name="X0" domain="dom0"/>
    …
  </variables>
  <relations nbRelations="3">
    <relation name="rel0" domain="dom0 dom0" nbConflicts="10" conflicts="(1,1)(1,2)(2,1)(2,2)(2,3)(3,2)(3,3)(3,4)(4,3)(4,4)" />
    …
  </relations>
  <constraints nbConstraints="6">
    <constraint name="C0" scope="X0 X1" relation="rel0"/>
    …
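
For comparison with the lo-tech case, here is a hedged sketch of reading such an instance with a standard XML parser. The embedded `XCSP_SNIPPET` is a cut-down, well-formed adaptation of the fragment above: the closing tags, the second variable, and the reduced counts are my completion for illustration, not part of the slides, and the helper names `parse_range` and `violated` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Cut-down, well-formed adaptation of the slide's fragment (an assumption).
XCSP_SNIPPET = """<instance>
  <domains nbDomains="1">
    <domain name="dom0" nbValues="4" values="1..4" />
  </domains>
  <variables nbVariables="2">
    <variable name="X0" domain="dom0"/>
    <variable name="X1" domain="dom0"/>
  </variables>
  <relations nbRelations="1">
    <relation name="rel0" domain="dom0 dom0" nbConflicts="10"
              conflicts="(1,1)(1,2)(2,1)(2,2)(2,3)(3,2)(3,3)(3,4)(4,3)(4,4)" />
  </relations>
  <constraints nbConstraints="1">
    <constraint name="C0" scope="X0 X1" relation="rel0"/>
  </constraints>
</instance>"""

root = ET.fromstring(XCSP_SNIPPET)


def parse_range(values):
    """Turn a range string such as '1..4' into [1, 2, 3, 4]."""
    low, high = values.split("..")
    return list(range(int(low), int(high) + 1))


domains = {d.get("name"): parse_range(d.get("values")) for d in root.iter("domain")}
variables = {v.get("name"): domains[v.get("domain")] for v in root.iter("variable")}

# Each relation lists forbidden value pairs as "(a,b)(c,d)...".
conflicts = {
    r.get("name"): {
        tuple(int(x) for x in pair.split(","))
        for pair in re.findall(r"\(([^)]*)\)", r.get("conflicts"))
    }
    for r in root.iter("relation")
}


def violated(assignment):
    """Return the names of constraints whose scope takes a forbidden tuple."""
    bad = []
    for c in root.iter("constraint"):
        scope = c.get("scope").split()
        values = tuple(assignment[name] for name in scope)
        if values in conflicts[c.get("relation")]:
            bad.append(c.get("name"))
    return bad
```

The XML library removes the tokenising work, but the ranges and conflict tuples packed into attribute strings still need their own small parsers.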

Page 21

Hi-tech formats

» XML
  » +ve
    » Easy to extend
    » Parsing tools can be provided
  » -ve
    » Complex and verbose
    » Computers can parse terse structures easily

Page 22

No-tech formats

» CSPLib
  » Problems are specified in natural language
  » No agreement at that time on an input language
  » One focus was on how you model a problem
  » Today there is more consensus on modelling languages like Zinc

Page 23

No-tech formats

» CSPLib
  » Problems are specified in natural language
  » But you can still provide in one place
    » Input data
    » Results
    » Code
    » Parsers …

Page 24

Getting problems

» Submit them yourself
  » Initially, you must do this so the library has some critical mass the first time people look at it
  » But it becomes tiresome and unrepresentative to do so continually
» Ask at every talk
  » Tried for several years but it (almost) never worked

Page 25

Getting problems

» Need some incentive
  » Offer money?
  » Price of entry for the competition?
  » If you have a competition, users will submit problems that their solver is good at?

Page 26

Competitions

Page 27

Libraries + Competitions

» You can have a library without a competition
  » But you can’t have a competition without a library

Page 28

Libraries + Competitions

» Libraries then competition
  » TPTP then CASC
  » Easy and safe!
» Libraries and competition
  » Planning
  » RoboCup
  » …

Page 29

Increasing complexity

» Constraints
  » 1st year, binary extensional
  » 2nd year, limited number of globals
  » 3rd year, unlimited
» Planning
  » Increasing complexity
  » Time, metrics, uncertainty, …

Page 30

Benefits

» Gets ideas implemented
» Rewards engineering
  » Progress needs both science and engineering!
» Puts it all together

Page 31

Benefits

» Gives greater importance to important low-level issues
  » In SAT:
    » Watched literals
    » VSIDS
    » …

Page 32

Benefits

» Witness the progress in SAT
  » 1985, 10s vars
  » 1995, 100s vars
  » 2005, 1000s vars
  » …
  » Not just Moore’s law at play!
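
A rough back-of-the-envelope check of the last point, under two assumptions that are mine rather than the slides': hardware speed doubles about every two years, and a naive solver needs time proportional to 2^n on n variables. Writing n' for the problem size reachable in the same time after 20 years:

```latex
\begin{align*}
\text{hardware speed-up over 20 years} &\approx 2^{20/2} = 2^{10} = 1024,\\
2^{n'} = 1024 \cdot 2^{n} &\;\Longrightarrow\; n' = n + \log_2 1024 = n + 10.
\end{align*}
```

On those assumptions, faster hardware alone would add only about ten variables between 1985 and 2005, whereas the figures above go from tens to thousands, so most of the gain has to come from better algorithms and engineering.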

Page 33

Pitfalls

» Competitions require lots of work
  » Organizers get limited (academic) reward
  » One solution is to also organize competition special issues

Page 34

Pitfalls

» Competitions encourage incremental improvements
  » Don’t have them too often!
» You may discover a local minimum
  » E.g. MDPs for speech recognition
  » Give out best new solver prize?

Page 35

The Chaff story

» Industrial problems, SAT & UNSAT instances
  » 2008, 1st MiniSAT (son of zChaff)
  » 2007, 1st RSAT (son of MiniSAT)
  » 2006, 1st MiniSAT
  » 2005, 1st SatELite GTI (MiniSAT+preprocessor)
  » 2004, 1st zChaff (Forklift from 2003 was better)
  » 2003, 1st Forklift
  » 2002, 1st zChaff

Page 36

Other issues

» Man-power
  » Organizers
    » One is not enough?
  » Judges
    » All rules need interpretation
» Compute-power
  » Find a friendly cluster

Page 37

Other issues

» Multiple tracks
  » SAT/UNSAT
  » Random/industrial/crafted
  » …
  » Certified/Uncertified

Page 38

Other issues

» Holding problems back if possible
  » Release some problems so competitors can ensure solver compliance
  » But hold most back so competition is blind!

Page 39

Other issues

» Multiple phases
  » Too many solvers for all to compete with long timeouts
  » First phase to test correctness
  » Second phase to throw out the slow solvers (who cost you many timeouts)
  » Third phase to differentiate between better solvers
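
The three phases above can be sketched as a small filtering pipeline. Everything here is illustrative: the function names, the toy solvers (which "solve" an instance of difficulty d in d / speed seconds), the timeouts, and the rule of ranking by number of instances solved are all assumptions, not the rules of any actual competition.

```python
def make_toy_solver(speed, buggy=False):
    """Hypothetical solver: returns (answer, seconds), or None on timeout."""
    def solve(instance, timeout):
        seconds = instance / speed
        if seconds > timeout:
            return None
        return ("UNSAT" if buggy else "SAT", seconds)
    return solve


def phase1_correct(solvers, checks, timeout):
    """Phase 1: drop any solver that gives a wrong answer on a known instance."""
    survivors = {}
    for name, solve in solvers.items():
        wrong = False
        for instance, expected in checks:
            result = solve(instance, timeout)
            if result is not None and result[0] != expected:
                wrong = True
        if not wrong:
            survivors[name] = solve
    return survivors


def count_solved(solve, instances, timeout):
    return sum(1 for inst in instances if solve(inst, timeout) is not None)


def phase2_fast(solvers, instances, timeout, keep):
    """Phase 2: short timeout, keep only the `keep` solvers that solve most."""
    ranked = sorted(solvers,
                    key=lambda n: count_solved(solvers[n], instances, timeout),
                    reverse=True)
    return {name: solvers[name] for name in ranked[:keep]}


def phase3_rank(solvers, instances, timeout):
    """Phase 3: longer timeout on harder instances to separate the finalists."""
    return sorted(solvers,
                  key=lambda n: count_solved(solvers[n], instances, timeout),
                  reverse=True)


solvers = {
    "fast": make_toy_solver(10.0),
    "medium": make_toy_solver(2.0),
    "slow": make_toy_solver(0.1),
    "buggy": make_toy_solver(100.0, buggy=True),
}
# All toy instances are satisfiable; the number is the difficulty.
phase1 = phase1_correct(solvers, checks=[(1, "SAT"), (2, "SAT")], timeout=1.0)
phase2 = phase2_fast(phase1, instances=[5, 10, 20], timeout=5.0, keep=2)
ranking = phase3_rank(phase2, instances=[50, 100, 400], timeout=30.0)
```

Only the finalists ever see the long timeout, which is where most of the compute budget would otherwise go.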

Page 40

Other issues

» Reward function
  » <#completed, average time, …>
  » solution purse + speed purse
    » Points for each problem divided between those solvers that solve it
» Getting buy in from competitors
  » It will (and should) evolve over time!
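
The purse idea in the reward function above can be sketched as follows. The solution purse is split equally among the solvers that solve a problem, as the slide describes. The way the speed purse is split here (in proportion to 1 / (1 + seconds)), the purse sizes, and the name `purse_scores` are illustrative assumptions, not the official formula of any competition.

```python
def purse_scores(results, solution_purse=1000.0, speed_purse=1000.0):
    """Score solvers from results of the form {instance: {solver: seconds or None}}.

    None means the solver did not solve the instance within the timeout.
    """
    scores = {}
    for instance, times in results.items():
        for solver in times:
            scores.setdefault(solver, 0.0)
        solved = {s: t for s, t in times.items() if t is not None}
        if not solved:
            continue  # nobody solved it, so the purses go unclaimed
        # Solution purse: equal shares for every solver that solved the instance.
        share = solution_purse / len(solved)
        # Speed purse: faster solvers get a larger share (assumed weighting).
        weights = {s: 1.0 / (1.0 + t) for s, t in solved.items()}
        total_weight = sum(weights.values())
        for s in solved:
            scores[s] += share + speed_purse * weights[s] / total_weight
    return scores


results = {
    "easy": {"A": 1.0, "B": 3.0, "C": 1.0},
    "hard": {"A": None, "B": 9.0, "C": None},
}
scores = purse_scores(results)
```

In this toy run solver B is the slowest on the easy problem but takes both purses on the hard one, so it finishes well ahead: solving what nobody else can is worth more than being quick on what everyone solves.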

Page 41

Other issues

» Prizes
  » Give out many!
  » Good for people’s CVs
  » Good motivator for future years

Page 42

Other issues

» Open or closed source?
  » Open to share progress
  » Closed to get the best
» Last year’s winner
  » Condition of entry
  » To see progress is being made!

Page 43

Other issues

» Smallest unsolved problem
  » Give a prize!
» Timing
  » Run during the conference
  » Creates a buzz so people enter next year
  » Get a slot in program to discuss results
  » Get a slot in banquet to give out prizes

Page 44

Conclusions

» Benchmark libraries
  » When an area is several decades old, why wouldn’t you have one?
» Competitions
  » Designed well, held not too frequently, & with buy-in from the community, why wouldn’t you?

Page 45

Questions

» Disagreements
» Other opinions
» Different experiences
» …