
To bug or not to bug: Lessons learned developing bug-compatible software 22 June 2005 Eric Keiter Dept. 9237: Electrical and Microsystems Modeling Sandia National Laboratories Albuquerque, NM, USA Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract DE-AC04-94AL85000.


Page 1

To bug or not to bug: Lessons learned developing bug-compatible software

22 June 2005

Eric Keiter

Dept. 9237: Electrical and Microsystems Modeling

Sandia National Laboratories

Albuquerque, NM, USA

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract DE-AC04-94AL85000.

Page 2

Outline

Context
• What is circuit simulation?
• What is Xyce?
• What do you mean, “bug-compatible”?

Lessons learned

Development tools: Bugzilla, Bonsai, CVS, autotools, etc.

The release process

Page 3

Analog Circuit Simulation (SPICE or Xyce)

• System of coupled differential algebraic equations (DAEs)
• Kirchhoff’s laws, arbitrary network
• Formulation: Modified Nodal Analysis (modified KCL)
  • Most equations are Kirchhoff Current Law (KCL) equations.
  • Most solution variables are nodal voltages.
  • Most currents are obtained via an Ohm’s law relationship.
• Nonlinear system of equations:
  • Implicit time integration
  • Newton’s method for the nonlinear solve at each time step.
  • Linear system of equations (Ax=b) solved at each Newton iteration.

[Figure labels: Kirchhoff’s Current Law (KCL); Kirchhoff’s Voltage Law (KVL)]
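The solution loop on this slide (implicit time integration, Newton's method at each time step, a linear solve at each Newton iteration) can be sketched on a toy circuit. This is a minimal illustration, not Xyce or SPICE code. The circuit (a source behind R1, a diode from node 1 to node 2, and R2 in parallel with a capacitor from node 2 to ground), all component values, and all function names are assumptions made for this example. The source is treated as a known voltage, so this is plain nodal analysis rather than full Modified Nodal Analysis.

```python
import math

# Hypothetical component values, chosen only for illustration.
VS = 1.0        # source voltage (V)
R1 = 1.0e3      # series resistor (ohm)
R2 = 1.0e3      # load resistor (ohm)
CAP = 1.0e-6    # load capacitor (F)
IS = 1.0e-14    # diode saturation current (A)
VT = 0.02585    # thermal voltage (V)


def residual_and_jacobian(v1, v2, v2_old, h):
    """KCL residuals at nodes 1 and 2 (backward Euler) and their Jacobian."""
    exp_term = math.exp((v1 - v2) / VT)
    i_d = IS * (exp_term - 1.0)      # diode current, node 1 -> node 2
    g_d = IS / VT * exp_term         # diode conductance d(i_d)/d(v1 - v2)
    f1 = (v1 - VS) / R1 + i_d
    f2 = -i_d + v2 / R2 + CAP * (v2 - v2_old) / h
    jac = ((1.0 / R1 + g_d, -g_d),
           (-g_d, g_d + 1.0 / R2 + CAP / h))
    return (f1, f2), jac


def solve_2x2(a, b):
    """Solve the linear system a x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    x0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x1 = (a[0][0] * b[1] - a[1][0] * b[0]) / det
    return x0, x1


def newton_solve(v1, v2, v2_old, h, tol=1e-9, max_iter=100):
    """Newton's method for one time step; each iteration solves J dv = -F."""
    for _ in range(max_iter):
        (f1, f2), jac = residual_and_jacobian(v1, v2, v2_old, h)
        dv1, dv2 = solve_2x2(jac, (-f1, -f2))
        v1, v2 = v1 + dv1, v2 + dv2
        if max(abs(dv1), abs(dv2)) < tol:
            return v1, v2
    raise RuntimeError("Newton iteration did not converge")


def simulate(h=1.0e-5, n_steps=1000):
    """Step the circuit from rest; returns the final node voltages."""
    v1 = v2 = 0.0
    for _ in range(n_steps):
        # The previous solution is both the initial guess and v2_old.
        v1, v2 = newton_solve(v1, v2, v2, h)
    return v1, v2


v1_final, v2_final = simulate()
print(f"v1 = {v1_final:.4f} V, v2 = {v2_final:.4f} V")
```

After many time constants the capacitor current vanishes, so the current through R1 should match the current through R2. A production simulator would assemble a sparse matrix and call a sparse solver where this sketch uses a 2x2 Cramer's rule.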

Page 4

Simulation Hierarchy

Analog circuit simulation is just one of many types of simulation used by electrical designers.

Tradeoff between fidelity and speed/problem size.
• Digital (VHDL) simulation: very fast, but low fidelity.
• TCAD device simulation: very slow, but very high fidelity.
• Circuit simulation: in-between.

[Figure: simulation hierarchy along a speed versus fidelity axis. Levels shown: digital/mixed-signal simulation (digital, VHDL), analog (ODE) simulation (like SPICE; Xyce), and device-scale (PDE) simulation (TCAD device; Charon). Other labels: software, board and device parasitics.]

Page 5

History of analog circuit simulation (SPICE)

SPICE = Simulation Program with Integrated Circuit Emphasis. Developed at UC Berkeley.
• 1969: Class project, called CANCER = Computer Analysis of Nonlinear Circuits, Excluding Radiation. 6000 lines of Fortran.
• 1971: SPICE 1. Released into public domain.
• 1975: SPICE 2. PhD thesis of Larry Nagel. 8000 lines of Fortran.
• 1989: SPICE 3. PhD thesis of Tom Quarles. 135,000 lines of C.
• Xyce, excluding solver libraries, is ~400,000 lines of C++. Progress!

SPICE was popular because it was public domain, and it was quickly adopted by universities and by industry. SPICE became an industry standard for analog circuit simulation.

Around 1990, most circuit simulation innovation became commercialized, mostly with codes built on the SPICE3 engine.
• PSpice, HSPICE, SmartSpice.
• Sporty Spice, Scary Spice, Posh Spice

More recently, EDA companies have been buying each other, and retiring a lot of SPICE-related products.

Page 6

More Miscellaneous Context

• 10 years ago, conventional wisdom: iterative matrix solvers don’t work for circuits. Wrong!
• In 2000, there were over 500 PSpice licenses at Sandia.
• Many SPICE models are terrible; part of the standard.
  • Simplifications based on 1960s computers.
  • Discontinuities, bad derivatives.
• As SPICE is a design tool, a bad model can be OK.
  • Sometimes, a designer just wants a generic behavior, rather than a precise one.
  • “Bug compatible” is expected.

Page 7

One Perspective on SPICE

Q: If you could go back over your career and change something within your field, what would you change?

Pease: “I’d shoot the guys who were going to invent SPICE.”

Robert A. Pease, ad insert from EDN “Movers and Shakers” 2003 supplement.

• Taken from a talk by Larry Nagel, the inventor of SPICE

Page 8

Xyce Requirements

Xyce is an ASC project, started in 1999.

Requirements:
• SPICE-compatible
• Massively parallel
• Include special Sandia physics models (i.e., radiation)
• Useful to unsophisticated users

Another way to put it:
• “Be the same as SPICE, but also be much better”
• “I never want to see the ‘Time step too small’ error again”
  (Ken Marx, Sandia Electrical Analyst)

Page 9

Xyce Status, circa 2005

We’ve mostly succeeded:
• SPICE-compatible (sort of)
• Radiation models
• Demonstrated parallel scalability on “real” problems (see chart)
• Real customers using Xyce on real DSW problems.

We still have a lot to do!

[Chart: NWCC/Spirit simulation time. Time (s), 0 to 9000, versus number of processors, 0 to 150.]

[Photograph of Permafrost ASIC (PA)]

PA Scaling study (~800,00 devices)

Page 10

[Chart: Scaled problem speedup, 14,000 devices/processor. Scaled speedup (0 to 1024) versus number of processors (0 to 1024). Curves: optimal speedup, total solution.]

Largest Analog Circuit Calculation Ever Performed!
• Xyce™ massively parallel circuit analysis code
• Nonlinear transmission line problem
• 14 million electrical devices modeled
• 6 million unknowns
• Over factor of 500 speedup using 1024 processors of LLNL’s ASCI White supercomputer
• 14,000 electrical devices per processor
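The figures on this slide can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted above. Treating "over factor of 500" as a lower bound, and defining scaled efficiency as speedup divided by processor count, are assumptions of this example.

```python
# Numbers quoted on the slide.
devices_per_processor = 14_000
processors = 1024
speedup_lower_bound = 500    # "over factor of 500", treated as a lower bound

# Problem size grows with the processor count (scaled problem).
total_devices = devices_per_processor * processors

# Scaled efficiency: achieved speedup relative to the optimal (linear) speedup.
efficiency_lower_bound = speedup_lower_bound / processors

print(f"total devices: {total_devices:,}")
print(f"scaled efficiency: at least {efficiency_lower_bound:.1%}")
```

The product, about 14.3 million devices, is consistent with the "14 million electrical devices" bullet, and the speedup corresponds to roughly half of the optimal line on the chart.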

Page 11

How did we do it? Lessons learned

• Hire good people
• Get everyone on the same page.
  • Difficult, but REALLY important!
• Open mind about legacy (i.e., SPICE) codes
• Support the low end as well as the high end
  • Most Xyce simulations are not massive.
• Use development tools (within reason)
  • Bugzilla, Bonsai, UML, etc.
  • Regression testing: nightly, weekly, monthly
  • Rigorous requirement-tracking, issue-tracking, and bug-tracking.
• Formal release process

Page 12

Lesson: Hire good people

“Hire the best people you can and then get out of their way!” – Bill Camp, circa 1993.

This may seem obvious, but in practice it is difficult. Look for:
• Lots of programming experience, even if not SPICE (or whatever)
• Programs for fun, has done so for years (maybe as a kid)
• Background in numerical programming

Handholding a bad developer is (eventually) a waste of time.

One great developer is worth 5 lousy developers.

Page 13

Lesson: Get everyone on the same page

Note: Make sure you have a page.
• Good people often disagree.
• Good people, when they are new, are going to do “dumb” things.
  • This includes everyone, including the best people.
  • Don’t get upset: plan for it!
• Have an established code-development “business model”.
• Interact with new developers a LOT. Correct misconceptions early.
• Have documentation ready: developer guide, theory guide, etc.

This is the most important issue.

Page 14

The Same Page: Coding standard/guidelines

This is the first issue for Xyce where getting all the developers to agree was difficult.
• C++ or C-style comments?
• Philosophy of pointers and/or references?
• Curly bracket usage?
• Standard class and function comment headers.
• Should we use STL? (ASC Red didn’t support it)
• Directory structure?

The important thing was not so much WHAT we chose, but that we chose SOMETHING and STUCK TO IT.

Many of these things are a matter of taste.

In general, to achieve consensus, final choices were:
• Acceptably imperfect
• Something everyone was willing to live with

Page 15

The Same Page: Plan for new developers in the code.

• Some parts of the code benefit a LOT from a very sophisticated structure
  • Especially modules with just one developer (Xyce topology)
• However, be careful:
  • Some parts of the code will be touched by many developers, with varying skill levels
  • Xyce Device module (>10 developers since 2000)
  • Continual refactoring and abstracting in the multi-developer sections can be problematic

Part of my initial development of the Xyce device package was getting other developers working in it and giving me feedback. This included (among others) 2 undergraduate student interns.

Page 16

Lesson: Open mind about legacy code

• Initially, we thought all of SPICE was lousy.
  • Slamming it was popular (by users and developers): “I’d shoot the people who were going to invent SPICE”
  • Much of it looked strange, “out of the mainstream”
  • The last “free” version of SPICE came out in the late 1980s.
• Reality: some of it was lousy, some of it very clever.
  • Kundert sparse direct solver
  • Voltage limiting
• As the project has evolved, we’ve referred to the SPICE3 source a lot.
  • It is the standard we are attempting to emulate, after all.
  • To an extent, we have to be “bug-compatible”!
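Voltage limiting, listed above as one of the clever parts, can be illustrated on a single diode equation. The sketch below is a simplified limiter written for this example in the spirit of SPICE-style junction limiting. It is not copied from the SPICE3 or Xyce source, and the circuit values, the critical-voltage formula as used here, and the function names are assumptions. A plain Newton iteration overshoots to a large diode voltage and then creeps back by about one thermal voltage per iteration. The limited version replaces the large step with a logarithmic one.

```python
import math

# Hypothetical values for a diode in series with a resistor and a 5 V source.
VS, R = 5.0, 1.0e3           # source voltage (V), resistor (ohm)
IS, VT = 1.0e-14, 0.02585    # saturation current (A), thermal voltage (V)
# Assumed critical voltage: large steps landing above it get limited.
V_CRIT = VT * math.log(VT / (math.sqrt(2.0) * IS))


def limit_junction_voltage(v_new, v_old):
    """Limit a large Newton step that lands above V_CRIT."""
    if v_new > V_CRIT and abs(v_new - v_old) > 2.0 * VT:
        arg = 1.0 + (v_new - v_old) / VT
        return v_old + VT * math.log(arg) if arg > 0.0 else V_CRIT
    return v_new


def solve_diode(use_limiting, tol=1e-10, max_iter=500):
    """Newton's method on the KCL equation (v - VS)/R + i_d(v) = 0.

    Returns the diode voltage and the number of iterations used.
    """
    v = 0.0
    for iteration in range(1, max_iter + 1):
        exp_term = math.exp(v / VT)
        f = (v - VS) / R + IS * (exp_term - 1.0)
        df = 1.0 / R + IS / VT * exp_term
        v_new = v - f / df
        if use_limiting:
            v_new = limit_junction_voltage(v_new, v)
        if abs(v_new - v) < tol:
            return v_new, iteration
        v = v_new
    raise RuntimeError("Newton iteration did not converge")


v_plain, n_plain = solve_diode(use_limiting=False)
v_limited, n_limited = solve_diode(use_limiting=True)
print(f"plain Newton:   v = {v_plain:.6f} V in {n_plain} iterations")
print(f"with limiting:  v = {v_limited:.6f} V in {n_limited} iterations")
```

Both runs should reach the same operating point, with the limited run needing far fewer iterations. With a larger source voltage the plain iteration can overflow the exponential outright, which is one reason a simulator that drops such a trick may fail on circuits that SPICE handles.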

Page 17

Prototype

Make your initial mistakes in a code that you plan to throw away.

Summer 2000: Xyce prototype was written.
• Some of the structure was retained for Xyce.
• Some was thrown away.

Page 18

Developer tools

Rational Rose - UML

CVS/Bonsai

Bugzilla

Regression testing

Certification testing

Page 19

Xyce UML Diagrams

[Figure: package diagram, class diagram (“Abstract Factory” design pattern), and the generated C++ code]

We used Rational Rose for initial code generation.

For a larger project, round-trip engineering may have been worth doing - for us, it wasn’t.

One danger: spending months developing a UML diagram before attempting code generation. The result: un-compilable code.
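For readers unfamiliar with the "Abstract Factory" pattern named in the diagram, here is a minimal sketch. The classes below (a matrix and solver family with serial and parallel variants) are hypothetical names invented for this example. They are not taken from the Xyce class diagram, and Xyce itself is written in C++.

```python
from abc import ABC, abstractmethod


class Matrix(ABC):
    @abstractmethod
    def describe(self) -> str: ...


class Solver(ABC):
    @abstractmethod
    def describe(self) -> str: ...


class SerialMatrix(Matrix):
    def describe(self) -> str:
        return "serial sparse matrix"


class SerialDirectSolver(Solver):
    def describe(self) -> str:
        return "serial direct solver"


class ParallelMatrix(Matrix):
    def describe(self) -> str:
        return "distributed sparse matrix"


class ParallelIterativeSolver(Solver):
    def describe(self) -> str:
        return "parallel iterative solver"


class LinearAlgebraFactory(ABC):
    """Abstract factory: creates a family of objects that belong together."""

    @abstractmethod
    def create_matrix(self) -> Matrix: ...

    @abstractmethod
    def create_solver(self) -> Solver: ...


class SerialFactory(LinearAlgebraFactory):
    def create_matrix(self) -> Matrix:
        return SerialMatrix()

    def create_solver(self) -> Solver:
        return SerialDirectSolver()


class ParallelFactory(LinearAlgebraFactory):
    def create_matrix(self) -> Matrix:
        return ParallelMatrix()

    def create_solver(self) -> Solver:
        return ParallelIterativeSolver()


def build_linear_system(factory: LinearAlgebraFactory):
    """Client code depends only on the abstract interfaces."""
    return factory.create_matrix().describe(), factory.create_solver().describe()


print(build_linear_system(SerialFactory()))
print(build_linear_system(ParallelFactory()))
```

The client function never names a concrete class, so switching between the two families means passing a different factory.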

Page 20

CVS/Bonsai

As with most codes, configuration management is very important.

We use CVS/Bonsai.

Hyperlinked to Bugzilla.

Page 21

Bugzilla

• We use Bugzilla extensively. Over 700 issues since 2000.
• We use it for tracking all sorts of project planning; it is not just for user-reported bugs.
• Most Bugzilla issues are entered by developers.
• A typical usage:
  • In FY05, we’ve used it to track all the development needed for our Advanced Development (AD) project.
  • One “blocker” issue for AD, which sits on top of a large number of specific issues (like “develop the SOI device model”).

Bugzilla plays a major, major role in our release process.
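The "blocker issue on top of specific issues" arrangement is a dependency tree. The sketch below models it with a plain data structure. It does not use any Bugzilla API, and the issue numbers, the second summary, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Issue:
    issue_id: int
    summary: str
    resolved: bool = False
    depends_on: List["Issue"] = field(default_factory=list)

    def open_dependencies(self) -> List["Issue"]:
        """Return every unresolved issue reachable through depends_on."""
        found = []
        for dep in self.depends_on:
            if not dep.resolved:
                found.append(dep)
            found.extend(dep.open_dependencies())
        return found

    def can_close(self) -> bool:
        """An umbrella issue can close only when nothing below it is open."""
        return not self.open_dependencies()


# Hypothetical issue numbers and summaries.
soi_model = Issue(701, "develop the SOI device model")
docs = Issue(702, "document the new device in the reference guide", resolved=True)
umbrella = Issue(700, "Advanced Development (AD) project", depends_on=[soi_model, docs])

print([issue.issue_id for issue in umbrella.open_dependencies()])
print(umbrella.can_close())
```

Resolving the last open sub-issue flips `can_close()` to true, which is the property that makes an umbrella issue useful for release planning.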

Page 22

Testing: Regression and Certification

Regression testing: last-chance “safety net”
• Nightly tests
• Weekly tests
• 49 different builds

Certification testing: used for release process
• Each test tied to a specific Bugzilla issue
• Many certification tests ultimately become regression tests later, after a release.
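A common way to implement a simulator regression check is to compare each new result against a stored "gold" result within a tolerance. The slides do not describe how the Xyce tests compare output, so the function below, its name, and its tolerances are assumptions for illustration only.

```python
def waveforms_match(result, gold, rel_tol=1e-6, abs_tol=1e-9):
    """Return True if every sample agrees with the gold value within tolerance."""
    if len(result) != len(gold):
        return False
    return all(abs(r - g) <= abs_tol + rel_tol * abs(g)
               for r, g in zip(result, gold))


# Hypothetical node-voltage samples from a stored gold run and a new run.
gold_run = [0.0, 1.0, 2.0]
new_run = [0.0, 1.0, 2.0000001]

print(waveforms_match(new_run, gold_run))           # small drift is accepted
print(waveforms_match([0.0, 1.0, 2.1], gold_run))   # a real change is flagged
```

Combining an absolute and a relative tolerance keeps the check meaningful both near zero and for large values. Comparing floating-point output for exact equality across 49 builds would likely produce false failures.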

Page 23

The release process

Page 24

The release process

Planning: identify features and Bugzilla issues for this release.

QA: To start QA, every fixed issue must have a certification test demonstrating the fix.

The code stays in “QA” until every certification test passes on every supported platform.

If any test fails, we drop out of QA, fix the code, and restart QA from the beginning.

Production builds: created if Xyce gets through QA without failure, for all platforms.

Certification: Signed by a manager.

Regression: Certification tests are migrated to regression tests, post-release.

Page 25

The release process: Planning

The Planning phase happens first.

Developer meeting.

Decisions made:
• Which Bugzilla issues will be fixed (addressed) for the release, and which ones will be deferred?
• Which issues are “deal killers”, and which ones are expendable?
• Who is assigned to each issue? (We usually know already, but sometimes assignments change.)
• What date do we freeze/branch the code?

For each issue, developers commit to produce a “certification test” that proves the issue is fixed and/or addressed.

These certification tests are used in QA.

Note: Of course, some issues can’t easily be tested - those issues simply get documented. (see Dilbert cartoon)

Page 26

The release process: Branching

Branching the code is crucial.

On the freeze date, tag and branch to create a “release branch”.

From that point onward:
• New features are only developed in the “development (main) branch”.
• The release branch will be changed minimally, for small bugfixes.

Later in the release process, we sometimes branch the code again, to plan for patch releases.

After the release process is over, the branches are merged back together.

Page 27

The release process: QA

The QA phase dominates the release process, in terms of time.

Typically, we plan for 3 rounds of QA.

In our last release, we needed 6 (too many!)

We typically plan for each QA round to take 2 weeks.

Once the code is in a round of QA, it should be completely frozen.

If the code needs to be fixed (certification test fails):
• The current round of QA stops.
• The release branch code is fixed, and QA restarts from the beginning.

QA is complete only when the code can pass every test, without any code modifications.

In general, the release branch code should change minimally over all the QA cycles.
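The restart rule above can be written as a small loop: a round ends at the first failing certification test, the code is fixed, and the next round starts again from the first test. The sketch below is an illustration of that rule only. The function names, the callbacks, and the test and platform names are hypothetical, and the real process involves people, not a script.

```python
def first_failure(tests, platforms, run_test):
    """Return the first (test, platform) pair that fails, or None if all pass."""
    for platform in platforms:
        for test in tests:
            if not run_test(test, platform):
                return (test, platform)
    return None


def run_qa(tests, platforms, run_test, fix_code, max_rounds=10):
    """Repeat QA rounds until one round passes with no code changes.

    Returns the number of rounds that were needed.
    """
    for round_number in range(1, max_rounds + 1):
        failure = first_failure(tests, platforms, run_test)
        if failure is None:
            return round_number      # a full clean round: QA is complete
        fix_code(failure)            # the round stops, the release branch is fixed
    raise RuntimeError("QA did not complete within the allowed rounds")


# Hypothetical example: two certification tests fail on one platform each.
broken = {("bug_101", "linux"), ("bug_205", "osx")}
rounds_needed = run_qa(
    tests=["bug_101", "bug_205", "bug_310"],
    platforms=["linux", "osx"],
    run_test=lambda test, platform: (test, platform) not in broken,
    fix_code=lambda failure: broken.discard(failure),
)
print(rounds_needed)
```

Each fix costs one whole extra round, which is why the number of rounds, at about two weeks each, dominates the release schedule.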

Page 28

The release process: Post-QA

Once we get through a round of QA with no certification test failures, we can move to the next phase(s).

Code is branched again, to plan for a possible “patch” release.

Production builds

Documentation: Users/Reference guides, Release notes.

Certification: This includes setting up and filing paperwork, which requires getting signatures from:
• Manager
• HPEMS PI
• Technical lead
• Testing team lead

Announcement/Distribution

Release branch is merged back into the development branch. If the release process has taken a long time (months), this is a big job.

Certification tests are migrated to regression tests, within reason.

Page 29

The End