Recovery-Oriented Computing User Study Training Materials October 2003

Upload: walt

Post on 07-Feb-2016

Page 1: Recovery-Oriented Computing User Study

Recovery-Oriented Computing User Study

Training Materials

October 2003

Page 2: Recovery-Oriented Computing User Study

Slide 2

Overview

• Informed consent & Introduction

• User study scenario & your role

• Training (20 minutes)

• Two study sessions (30 minutes each)

• Wrap-up and questionnaire

Page 3: Recovery-Oriented Computing User Study

Slide 3

Informed Consent

• Please read the overview of the study and the informed consent form
  – please feel free to ask any questions you have about the experiment, its goals, its procedures, etc.

• If you agree to participate in the experiment, please sign the informed consent form

Page 4: Recovery-Oriented Computing User Study

Slide 4

Introduction

• This study is evaluating new recovery tools
  – the tools are designed to help system administrators recover from problems affecting server systems

• You will be playing the role of a system administrator
  – in each of two sessions, you will be trying to recover an e-mail server system from a pre-existing problem

Page 5: Recovery-Oriented Computing User Study

Slide 5

Introduction (2)

• In each session, you may (or may not) be given an experimental recovery tool to use

• We are trying to understand when the tool is useful for you and when it is not
  – so if you are given the tool, please think carefully about whether or not to use it when you are attempting to recover from a problem
    » at the end of the session, you will be asked to explain why you chose to use (or not use) the tool

Page 6: Recovery-Oriented Computing User Study

Slide 6

The Scenario

Page 7: Recovery-Oriented Computing User Study

Slide 7

User Study Scenario

• You are one of several system administrators of an electronic mail (e-mail) service
  – the administrators work in shifts
  – the study starts when you arrive for your shift

• You arrive to find users complaining that the e-mail service is not working
  – you will be provided with details of the complaint
  – the e-mail failure may be caused by:
    » failure of the e-mail software, or
    » an error made by the administrator on the previous shift

Page 8: Recovery-Oriented Computing User Study

Slide 8

User Study Scenario: Your Role

• Your responsibilities and goals:
  – restore the e-mail service to normal operation as quickly as possible
  – minimize the amount of lost e-mail and user work

• Note:
  – you should prioritize restoring service over preserving changes made by other administrators

Page 9: Recovery-Oriented Computing User Study

Slide 9

User Study Scenario: Resources

• Resources you will have:
  – a log of all actions performed by administrators in previous shifts
  – a day-old backup of the server’s file systems
  – the Internet
  – a test e-mail account
  – a guru
    » during each session, you may make up to one request for help to the guru

• Plus any experimental recovery tool that we provide (described later)

Page 10: Recovery-Oriented Computing User Study

Slide 10

Training: E-mail Server

Page 11: Recovery-Oriented Computing User Study

Slide 11

E-mail Overview

• This study concerns e-mail store servers
  – e-mail stores receive and store e-mail for their users
    » users’ mailboxes live on the e-mail store
  – they do not handle sending or routing of outgoing mail

• E-mail stores use two protocols
  – SMTP: used to deliver incoming e-mail to a mailbox
    » SMTP is spoken between a remote server that sends the message, and the local recipient e-mail store server
  – IMAP: used to retrieve & manipulate mail in a mailbox
    » IMAP is spoken between a user’s e-mail client and their local e-mail store server

Page 12: Recovery-Oriented Computing User Study

Slide 12

E-mail Server Configuration

• Mailboxes are text files in /var/mail, e.g. /var/mail/user173

• sendmail: process that receives and delivers incoming e-mail

• imapd: process that provides remote access to mailboxes

• Mail store configuration files can be found in /etc/mail

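[Diagram: E-mail Server (Linux), undovmN.cs.berkeley.edu, N={1,2,3}. Incoming e-mail arrives from the Internet over SMTP at the SMTP server process (sendmail). Users reading e-mail connect over IMAP to the IMAP server process (imapd). Both processes work on the mailboxes in /var/mail/userNNN.]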
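Because the mailboxes are plain text files, you can inspect one directly while checking whether mail is arriving. The sketch below is only an illustration: it assumes the files are in the common mbox format, which the slides do not state, and `summarize_mailbox` is a hypothetical helper name. It uses Python's standard-library `mailbox` module and only reads the file.

```python
import mailbox


def summarize_mailbox(path):
    """Return (sender, subject) pairs for each message in an mbox-format file.

    Assumption: the mailbox file is in mbox format. The training slides only
    say that mailboxes are text files, so this may not hold on the study server.
    """
    # create=False makes the call fail instead of creating an empty file
    # when the path does not exist.
    box = mailbox.mbox(path, create=False)
    try:
        return [(msg.get("From", ""), msg.get("Subject", "")) for msg in box]
    finally:
        box.close()
```

For example, `summarize_mailbox("/var/mail/user173")` would list the senders and subjects in that user's mailbox, provided the format assumption holds. It is safer to run this on a copy of the file, since sendmail or imapd may be writing to the original.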

Page 13: Recovery-Oriented Computing User Study

Slide 13

Simple Familiarization Task

• Take some time to get familiar with the console and the e-mail system
  – by performing a basic task as described below

• Goals:
  – ensure sendmail is running
  – reconfigure server to recognize mail sent to [email protected]
  – restart sendmail to activate reconfiguration

• First step:
  – connect to undovm3.cs.berkeley.edu with ssh

continues...

Page 14: Recovery-Oriented Computing User Study

Slide 14

Simple Familiarization Task (2)

• Next, check if sendmail is running:
  – execute the command:
      ps ax | grep sendmail

• Reconfigure server to accept new host name:
  – edit /etc/mail/local-host-names to add the line:
      roc.cs.berkeley.edu

• Finally, restart sendmail:
  – run /etc/init.d/sendmail restart

• Try this task now!
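The reconfiguration step amounts to adding one line to a text file. If you would rather script it than use an editor, a minimal sketch follows. `add_local_host_name` is a hypothetical helper, not part of the study materials. It assumes that each non-blank line of the file is one host name and that lines starting with `#` are comments.

```python
from pathlib import Path


def add_local_host_name(config_path, host_name):
    """Append host_name to the file unless it is already listed.

    Returns True if the file was changed, False if the name was already there.
    Assumption: one host name per line, and '#' starts a comment line.
    """
    path = Path(config_path)
    text = path.read_text() if path.exists() else ""
    existing = {
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }
    if host_name in existing:
        return False
    # Make sure the new entry starts on its own line.
    if text and not text.endswith("\n"):
        text += "\n"
    path.write_text(text + host_name + "\n")
    return True
```

Calling `add_local_host_name("/etc/mail/local-host-names", "roc.cs.berkeley.edu")` would make the same change as the manual edit. The sendmail restart described above is still needed afterwards.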

Page 15: Recovery-Oriented Computing User Study

Slide 15

Training: Experimental Recovery Tool

Page 16: Recovery-Oriented Computing User Study

Slide 16

Recovery Tool: an Undo System

• The undo system can undo administrative changes to the e-mail store, including:
  – changes to configuration files
  – software upgrades
  – deleted or altered files

• It can be used to restore the e-mail server to a previously known-good state
  – by “rewinding” to a date when the system worked OK

• The undo system preserves incoming e-mail and user mailbox changes

Page 17: Recovery-Oriented Computing User Study

Slide 17

When Can the Undo System Help?

• The undo system is useful:
  – when you cannot tell what is causing a problem
    » but you know that the system was working at some point in the past
  – when a problem affects system state
    » typically, the same cases where restoring a backup would fix the problem

• It does not help when the problem does not affect state
  – like if a server process (e.g., sendmail) has crashed cleanly without corrupting state

Page 18: Recovery-Oriented Computing User Study

Slide 18

Why Use the Undo System?

• Unlike using a backup, the undo system also repairs the side effects of problems
  – example: if a problem caused e-mail to be lost, using undo to fix the problem will restore the lost e-mail
    » the undo system does this by recording incoming e-mail and users’ mailbox edits, then restoring them during recovery

• Undo is also useful when you cannot diagnose a problem
  – simply undo the system to a point in time when it was known to be working

Page 19: Recovery-Oriented Computing User Study

Slide 19

Undo System Operation

• An undo cycle has two stages:
  – rewind: the e-mail system’s state is reverted to the way it appeared at a past time (the “rewind point”)
    » all changes to the system made since the rewind point are undone, including:
      • changes made by administrators
      • changes due to software bugs
      • incoming e-mail delivery and user mailbox edits
  – commit: makes the rewind permanent but restores incoming e-mail & user mailbox edits to present time

• Net effect: undo cycle undoes all changes except incoming e-mail and mailbox edits

Page 20: Recovery-Oriented Computing User Study

Slide 20

Illustration of Undo Cycle

• Before undo: the timeline holds admin changes and user events (incoming e-mail, mailbox edits), with the rewind point marked

• After rewind: the admin changes and user events after the rewind point are undone

• After commit: the user events are restored; note that admin changes remain undone

[Figure: three timelines, one per stage. Legend: user event, admin change, undone changes, restored user events.]

Page 21: Recovery-Oriented Computing User Study

Slide 21

Controls for the Undo System

• Rewind: begins an undo cycle
  – defines a rewind point and undoes all later changes
  – may cause e-mail server to automatically reboot
  – takes 4 to 5 minutes to execute

• Commit: completes the undo cycle
  – makes the rewind permanent
    » restores incoming e-mail & mailbox edits to present time
  – takes about 5 minutes to execute

• Cancel: aborts the undo cycle
  – restores e-mail server to the state it was in before rewinding
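To make the three controls concrete, here is a toy model of an undo cycle. It is a sketch for intuition only and says nothing about how the real tool is built. The `Event` and `UndoModel` names, the integer timestamps, and the choice to keep an event that falls exactly on the rewind point are all assumptions made for illustration. The model also ignores any repairs you might make between rewind and commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    time: int          # hypothetical timestamp, e.g. minutes since start
    kind: str          # "admin" (admin change) or "user" (mail delivery, mailbox edit)
    description: str


class UndoModel:
    """Toy model of the Rewind / Commit / Cancel controls (not the real tool)."""

    def __init__(self, events):
        self.events = sorted(events, key=lambda e: e.time)
        self._saved = None         # full event list from before the rewind
        self._rewind_point = None

    def rewind(self, rewind_point):
        """Undo every event after rewind_point, user events included."""
        if self._saved is not None:
            raise RuntimeError("already rewound; commit or cancel first")
        self._saved = self.events
        self._rewind_point = rewind_point
        # Assumption: an event exactly at the rewind point is kept.
        self.events = [e for e in self.events if e.time <= rewind_point]

    def commit(self):
        """Keep the rewind, but restore the user events that were undone."""
        if self._saved is None:
            raise RuntimeError("nothing to commit; rewind first")
        restored = [e for e in self._saved
                    if e.time > self._rewind_point and e.kind == "user"]
        self.events = self.events + restored
        self._saved = None
        self._rewind_point = None

    def cancel(self):
        """Abort the cycle and return to the state from before the rewind."""
        if self._saved is None:
            raise RuntimeError("nothing to cancel; rewind first")
        self.events = self._saved
        self._saved = None
        self._rewind_point = None
```

With admin changes at times 1 and 3 and user events at times 2 and 4, calling `rewind(2)` and then `commit()` leaves the events from times 1, 2 and 4. The admin change at time 3 stays undone, while the e-mail delivered at time 4 comes back. Calling `cancel()` instead of `commit()` brings back all four events.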

Page 22: Recovery-Oriented Computing User Study

Slide 22

Undo System Interface

• Main window: normal state
    » time is divided into 5-minute intervals
    » each interval contains user events like incoming mail
    » it’s fastest to rewind to a checkpoint

[Screenshot: main window in its normal state. Callouts: timeline (color indicates relative load); current time; checkpoints; current undo status; intervals; intervals containing checkpoints.]

Page 23: Recovery-Oriented Computing User Study

Slide 23

Undo System Interface (2)

• Main window: rewound state

[Screenshot: main window in its rewound state. Callouts: current undo status; Commit and Cancel buttons; current time (in the past) indicates undo point; history of undo operations.]

Page 24: Recovery-Oriented Computing User Study

Slide 24

Undo System Interface (3)

• Event window
  – used to initiate rewind
  – to view, double-click on an interval in main window

[Screenshot: event window. Callouts: selected event (rewind point); current time; click to invoke undo cycle; description of event (here, user170 is examining their mailbox); event sequence #.]

Page 25: Recovery-Oriented Computing User Study

Slide 25

Familiarization, Part II

• Try out the undo system interface
  – note: actually performing an undo cycle may take 10 or more minutes to complete

• Familiarize yourself with the various resources available to you during the study
  – Outlook Express e-mail client
  – the test e-mail account: [email protected] N={1,2,3}
  – the system backup: /backup
  – books, documentation, the Internet
  – guru advice: at most one question per session

Page 26: Recovery-Oriented Computing User Study

Slide 26

Resources for More Information

• E-mail in general
  – About Internet email protocols: http://perl.about.com/library/weekly/aa020600a.htm
  – E-mail references: http://www.newt.com/email/references.html

• Sendmail
  – O’Reilly Sendmail book (next to your workstation)
  – Sendmail home page: http://www.sendmail.org
  – SMTP RFC: http://www.isi.edu/in-notes/rfc2821.txt

• IMAPd
  – IMAP general info: http://www.imap.org/
  – UW-IMAP home page: http://www.washington.edu/imap/
  – IMAP RFC: http://www.isi.edu/in-notes/rfc3501.txt