The Sound of One System Crashing
Deploying Sakai at Indiana University
David Goodrum, Lance Speelmon, Barry Walsh
Indiana University
2
AmphiHerpin
• Description: Lessons (some hard) from the implementation of Sakai 2.4 at eight campuses. Technical, application and communication issues stressed the Teaching and Learning, Infrastructure, Application and Support organizations over a five-day period at the beginning of the fall 2007 semester.
3
Good Headline?
4
The Week from Hell
Monday Tuesday Wednesday
5
It’s all Relative
• August 98.84%; 12 outages
• September 99.69%; 4 outages
• October 100%; 0 outages
• November 99.92%; 1 outage
• December 99.67%; 1 outage
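To make these percentages concrete, the sketch below converts a monthly availability figure into minutes of downtime. It assumes availability is measured against every minute of the calendar month (24x7), which the slide does not state. The function name `downtime_minutes` is illustrative only.

```python
import calendar

# Availability figures copied from the slide (2007).
AVAILABILITY_PCT = {
    8: 98.84,   # August
    9: 99.69,   # September
    10: 100.0,  # October
    11: 99.92,  # November
    12: 99.67,  # December
}


def downtime_minutes(year, month, availability_pct):
    """Minutes of downtime implied by an availability percentage.

    Assumption: the percentage covers all minutes in the calendar month.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (100.0 - availability_pct) / 100.0


if __name__ == "__main__":
    for month, pct in AVAILABILITY_PCT.items():
        minutes = downtime_minutes(2007, month, pct)
        print(f"{calendar.month_name[month]}: {minutes:.0f} min down")
```

Under that assumption, August's 98.84% works out to roughly 518 minutes (about 8.6 hours) of downtime, which is why a figure that looks close to 99% still made for a very bad week.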
6
It was the worst/best of times etc.
• Faculty and students were in the ditch;
• Learning Technologies staff were almost powerless because of the technical problems;
But…
• Many people performed in outstanding ways
– Communication!!
– Problem determination
– Reaching out to the Sakai community
7
Problem Determination:
• Onion Peeling
• Each layer revealed another problem/bottleneck
• Race not to ‘the swift’
– No substitute for analytical ability;
– Cool heads were at a premium!
8
The Main Culprits
• Less than adequate load testing
• Connection pooling software (DBCP)
• App Server settings
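The link between the first two culprits can be illustrated with a toy load test against a bounded pool. This is a minimal sketch, not DBCP and not IU's configuration: `BoundedPool`, `load_test`, and every number below are hypothetical. It only shows how a pool limit that is invisible at light load turns into failed requests once concurrency exceeds it.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class BoundedPool:
    """Toy stand-in for a database connection pool (hypothetical, not DBCP)."""

    def __init__(self, max_active, max_wait_s):
        self._slots = threading.BoundedSemaphore(max_active)
        self._max_wait_s = max_wait_s

    def run(self, work_s):
        # Wait for a free "connection"; give up after max_wait_s seconds.
        if not self._slots.acquire(timeout=self._max_wait_s):
            return False
        try:
            time.sleep(work_s)  # simulated query time
            return True
        finally:
            self._slots.release()


def load_test(pool, concurrent_users, work_s):
    """Fire concurrent_users simultaneous requests; return (ok, failed)."""
    with ThreadPoolExecutor(max_workers=concurrent_users) as executor:
        results = list(
            executor.map(lambda _: pool.run(work_s), range(concurrent_users))
        )
    return results.count(True), results.count(False)


if __name__ == "__main__":
    pool = BoundedPool(max_active=2, max_wait_s=0.05)
    print("light load:", load_test(pool, 2, 0.5))
    print("heavy load:", load_test(pool, 6, 0.5))
```

With two simulated users every request gets a connection. With six, the requests beyond the pool size time out while waiting, a failure mode that only a test at realistic concurrency would reveal before the semester starts.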
9
Proposed Enhancements, with Deliverables, Timeline, and Status/Next Steps
• System: Database memory upgraded
– Deliverables: Reduction in system response time
– Timeline: August 2007
– Status/Next Steps: Completed
• System: Periodic extract of critical OnCourse production data (as established by the OnCourse Priorities Committee) deployed in a separate environment, allowing faculty and staff to conduct course management activities in the face of OnCourse service interruption. The data can be exported for analysis and manual changes but these changes will not be reflected in OnCourse.
– Deliverables: Parameter driven query in the OneStart service tab to link to extract of critical production data from OnCourse.
– Timeline/Next Steps: Nov 26 Planning meeting; Determine frequency of extract; Pilot implementation; Full implementation
• System: Re-provision the test environment to replicate the production environment during service interruptions in the production environment, providing read-only access to critical functions for a limited subset of users.
– Deliverables: Limited access to critical functions in fully replicated environment during system interruptions.
– Timeline/Next Steps: Phase I: Environment configuration, planning and testing; Phase II: Move replicated environment to alternative campus
• System: Research and develop additional monitoring and alarming systems
– Deliverables: Early alerting to system technologists of possible problems
– Status/Next Steps: [Status/next steps]
• System: Research and develop additional load testing and profiling software
– Deliverables: Robust testing before high load periods
– Status/Next Steps: [Status/next steps]
• Procedural: Work with faculty to determine critical functionality
– Deliverables: List of priority functions during critical times
– Timeline: November 2007
– Status/Next Steps: Completed
• Procedural: Development of training on how faculty can prepare for and work around technologically-based interruptions
– Deliverables: Training and materials developed by the campus centers for teaching and learning
– Timeline: December 2007
• Procedural: Provide ongoing videoconferencing for system outages and critical troubleshooting
– Deliverables: UITS Collaboration Polycom video bridge with phone access
– Timeline: October 2007
– Status/Next Steps: Completed
• Communications: Document UITS Support Communications Process
– Deliverables: Process documentation
– Timeline: November 2007
– Status/Next Steps: Drafted [can we provide?]; Completed and Posted
• Communications: Develop targeted mailing lists to facilitate quicker turn around on notification to impacted communities.
– Deliverables: More comprehensive pre-developed notification lists.
– Timeline: December 2007
• Communications: Research and develop additional notification mechanisms, e.g., information posted on web splash screen.
– Deliverables: Post system outages/impacts on relevant web splash screens; Other (?)
– Status/Next Steps: [Status/next steps]
10
Bad Timing
Barry
What is the suggestions enhancement process at IU?
https://oncourse.iu.edu/access/content/user/ocadmin/developmentcl.htm
Oncourse Enhancements Process (current)
Suggestions analysis process at IU
START: Suggestions, questions and comments entered by users
Support team responds immediately to support issues and bug reports
Remaining entries compiled monthly, forwarded to Functional Requirements Committee (FRC)
FRC members analyze suggestions for trends & update summary reports
FRC combines suggestions data, opt-out rationales, and other sources into a proposal to the Priorities Committee
Faculty Priorities Committee rank orders priorities for upcoming development