
Chapter 12: System Management

Understanding Operating Systems, Fourth Edition


Objectives

You should be able to describe:

• The tradeoffs to be considered when attempting to improve overall system performance

• The roles of system measurement tools such as positive and negative feedback loops

• Two system monitoring techniques

• The importance of sound accounting practices by system administrators

• The fundamentals of patch management


Evaluating an Operating System

• To evaluate an operating system, you need to understand:
  – Its design goals and history
  – How it communicates with users
  – How resources are managed
  – Tradeoffs made to achieve goals

• An operating system’s strengths and weaknesses need to be weighed in relation to:
  – Users
  – Hardware
  – Purpose


Cooperation Among Components

• Performance of any one resource depends on performance of other system resources

• Any system improvement can be made only after extensive analysis of:
  – Needs of the system’s resources, requirements, managers, and users

• System changes often result in trading one set of problems for another

• Consider performance of entire system and not just individual components


Role of Memory Management

• Before making memory-related changes, consider actual system operating environment

• There’s a tradeoff between memory use and CPU overhead
  – As memory management algorithms grow more complex, CPU overhead increases and overall performance can suffer
  – Some operating systems perform remarkably better with additional memory


Role of Processor Management

• Multiprogramming requires synchronization among Memory Manager, Processor Manager, and I/O devices
  – Tradeoff: Better use of CPU versus increased overhead, slower response time, and decreased throughput


Role of Processor Management (continued)

• System saturation point could be reached if CPU is fully utilized but allowed to accept additional jobs
  – Results in higher overhead and less time to run programs

• Under heavy loads, CPU time required to manage I/O queues could dramatically increase time required to run jobs

• With long queues forming at channels, control units, and I/O devices, CPU could be idle waiting for processes to finish I/O


Role of Device Management

• Ways to improve I/O device utilization include buffering, blocking, and rescheduling I/O requests to optimize access times
  – Tradeoffs: Increased CPU overhead and additional memory space used

• Blocking reduces number of physical I/O requests, but increases overhead

• Buffering helps CPU match slower speed of I/O devices but requires memory space
  – Tradeoff: Reduced multiprogramming versus better use of I/O devices
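The blocking tradeoff above can be made concrete with a little arithmetic. The following is a minimal sketch, not taken from the slides: the function names, the 1,000-record file, and the 80-byte record size are all illustrative assumptions.

```python
import math


def physical_io_requests(num_records: int, records_per_block: int) -> int:
    """Physical I/O requests needed to transfer num_records logical records."""
    if num_records < 0 or records_per_block < 1:
        raise ValueError("need num_records >= 0 and records_per_block >= 1")
    # Each physical request moves one whole block; a partial last block
    # still costs one full request.
    return math.ceil(num_records / records_per_block)


def buffer_bytes(record_size: int, records_per_block: int) -> int:
    """Memory needed to buffer one block (the space side of the tradeoff)."""
    return record_size * records_per_block


if __name__ == "__main__":
    # Hypothetical file: 1,000 records of 80 bytes each.
    for factor in (1, 10, 50):
        print(factor, physical_io_requests(1000, factor), buffer_bytes(80, factor))
```

A larger blocking factor lowers the request count but raises the buffer size, which is the memory cost the slide refers to.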


Role of Device Management (continued)

• Rescheduling requests helps optimize I/O times
  – Overhead function
  – Speed of both CPU and I/O device must be weighed against time to execute reordering algorithm



Table 12.1: System with three CPUs and four disk drives of different speeds. Assuming system requires 1,000 instructions to reorder I/O requests, advantages of reordering vary depending on combination of CPU and disk.



Figure 12.1: Combination of CPU 1 and Disk Drive A without reordering

A system consisting of CPU 1 and Disk Drive A has to access Track 1, Track 9, Track 1, and then Track 9, and the arm is already located at Track 1.

Without reordering, data access requires: 35 + 35 + 35 = 105 ms



Figure 12.2: Combination of CPU 1 and Disk Drive A with reordering

After reordering, the arm can perform both accesses on Track 1 before traveling, in 35 ms, to Track 9.

With reordering, data access requires: 35 + 30 = 65 ms (35 ms for the single arm movement plus 30 ms for CPU 1 to execute the reordering instructions)



• Reordering requests isn’t always warranted
  – Consider CPU 1 and much faster Disk Drive C
    • Without reordering, access time: 5 + 5 + 5 = 15 ms
    • With reordering, access time: 5 + 30 = 35 ms

• Reordering algorithm is either always on or always off
  – Can’t be changed by systems operator without reconfiguration
  – Initial setting must be determined by evaluating system on the average
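The two comparisons above (Disk Drive A and Disk Drive C with CPU 1) follow from one simple model, sketched here under stated assumptions: every change of track costs the drive's access time, staying on the same track costs nothing, and reordering adds a fixed CPU cost of 30 ms, the figure that appears in the slides' arithmetic for CPU 1. The function names are hypothetical, and sorting by distance from the starting track is only a stand-in for whatever reordering algorithm a real system would use.

```python
def arm_time_ms(requests, start_track, access_ms):
    """Total arm-movement time: a track change costs access_ms, staying costs 0."""
    total = 0
    current = start_track
    for track in requests:
        if track != current:
            total += access_ms
            current = track
    return total


def compare_reordering(requests, start_track, access_ms, reorder_ms):
    """Return (time without reordering, time with reordering) in ms."""
    without = arm_time_ms(requests, start_track, access_ms)
    # Simplified reordering: serve the tracks closest to the start first.
    reordered = sorted(requests, key=lambda t: abs(t - start_track))
    with_reordering = arm_time_ms(reordered, start_track, access_ms) + reorder_ms
    return without, with_reordering


if __name__ == "__main__":
    requests = [1, 9, 1, 9]  # Track 1, Track 9, Track 1, Track 9; arm starts at Track 1
    print("Drive A:", compare_reordering(requests, 1, access_ms=35, reorder_ms=30))
    print("Drive C:", compare_reordering(requests, 1, access_ms=5, reorder_ms=30))
```

With the slow drive the reordering cost pays for itself; with the fast drive it does not, which is why the setting has to be chosen for the average case.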


Role of File Management

• Secondary storage allocation schemes help user organize and access files on system
  – Different schemes offer different flexibility, but tradeoff for increased file flexibility is increased CPU overhead

• Example: Accessing all records of a file stored noncontiguously could be time-consuming and require compaction, which takes CPU time

• Volume’s directory location affects retrieval time

• File management is closely related to device on which files are stored


Role of File Management (continued)

Table 12.2: A system with four disk drives of different speeds and a CPU speed of 1.2 ms

If file’s directory is loaded into memory, access speed affects only initial retrieval and not subsequent retrievals


Role of Network Management

• The Network Manager
  – Routinely synchronizes the load among remote processors
  – Tries to select most efficient communication paths over multiple data communication lines
  – Allows network administrator to monitor use of individual computers and shared hardware
  – Ensures compliance with software license agreements
  – Simplifies updating data files and programs on networked computers


Measuring System Performance

• Total system performance can be defined as efficiency with which a computer system meets its goals

• System efficiency is not easily measured
  – Affected by three major components: user programs, operating system programs, and hardware

• System performance can be very subjective and difficult to quantify


Measurement Tools

• Most designers and analysts rely on following measures of system performance:
  – Throughput
  – Capacity
  – Response time
  – Turnaround time
  – Resource utilization
  – Availability
  – Reliability


Measurement Tools (continued)

• Throughput: Composite measure that indicates productivity of system as a whole
  – Usually measured under steady-state conditions
  – Reflects quantities such as “the number of jobs processed per day” or “the number of online transactions handled per hour”
  – Can also be a measure of volume of work handled by one system unit
  – Can be monitored by either hardware or software



• Capacity: Maximum throughput level
  – Resource becomes saturated and processes in system aren’t being passed along
    • Thrashing is a result
      – Main memory has been over-committed and level of multiprogramming has reached a peak point
  – Can be monitored by either hardware or software
  – Bottlenecks can be detected by monitoring queues forming at each resource



• Response time: Interval required to process user’s request
  – From when user presses key to send message until system indicates receipt of message
  – Depends on:
    • Workload handled by system at time of request
    • Type of job or request being submitted
  – Should include both average values and variance
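A short sketch of why response time should be reported with both average and variance. The sample values are invented for illustration, and the helper name is hypothetical; only the standard library `statistics` module is used.

```python
from statistics import mean, pvariance


def summarize_response_times(samples_seconds):
    """Return (mean, population variance) of measured response times."""
    return mean(samples_seconds), pvariance(samples_seconds)


if __name__ == "__main__":
    # Two invented workloads with the same average response time.
    steady = [2.0, 2.0, 2.0, 2.0]
    erratic = [0.5, 0.5, 0.5, 6.5]
    print("steady :", summarize_response_times(steady))
    print("erratic:", summarize_response_times(erratic))
```

Both workloads average 2.0 seconds, but only the variance reveals that users of the second one occasionally wait much longer.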



• Turnaround time: Response time for batch jobs
  – Time from submission of job until output is returned to user
  – Same dependencies and measurement requirements as response time



• Resource utilization: Measure of how much each unit is contributing to overall operation
  – Usually given as percentage of time a resource is actually in use
    • Example: Is CPU busy 60 percent of time?
  – Helps analyst to determine:
    • If there is a balance among system units
    • If system is I/O-bound or CPU-bound
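Utilization as a percentage of time in use can be sketched as follows. The unit names and busy times are made-up measurements, and picking the busiest unit is only one rough, assumed way to hint at whether a system is I/O-bound or CPU-bound.

```python
def utilization_percent(busy_time, elapsed_time):
    """Percentage of the observation interval during which a unit was in use."""
    if elapsed_time <= 0:
        raise ValueError("elapsed_time must be positive")
    return 100.0 * busy_time / elapsed_time


def busiest_unit(busy_times, elapsed_time):
    """Return (name, utilization percent) of the most heavily used unit."""
    name = max(busy_times, key=busy_times.get)
    return name, utilization_percent(busy_times[name], elapsed_time)


if __name__ == "__main__":
    # Hypothetical measurements over a 60-second interval.
    observed = {"cpu": 36, "disk": 54}
    for unit, busy in observed.items():
        print(unit, utilization_percent(busy, 60))
    print("busiest:", busiest_unit(observed, 60))
```

In this invented sample the CPU is busy 60 percent of the time while the disk is busy 90 percent, which would suggest an I/O-bound workload.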



• Availability: Indicates likelihood that resource will be ready when user needs it
  – Influenced by two factors:
    • Mean time between failures (MTBF): Average time unit is operational before it breaks down
    • Mean time to repair (MTTR): Average time needed to fix failed unit and put it back in service

\[ \text{Availability}(A) = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}} \]
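A minimal sketch of the availability calculation, availability = MTBF / (MTBF + MTTR). The MTBF and MTTR figures in the example are invented, and the yearly downtime helper assumes continuous use (24 hours, 365 days per year).

```python
HOURS_PER_YEAR = 24 * 365  # continuous operation assumed


def availability(mtbf_hours, mttr_hours):
    """Fraction of time the unit is expected to be ready for use."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def expected_downtime_hours_per_year(mtbf_hours, mttr_hours):
    """Expected hours per year that the unit is out of service."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical unit: fails on average every 4,000 hours, takes 2 hours to repair.
    print(availability(4000, 2))
    print(expected_downtime_hours_per_year(4000, 2))
```

Shortening the repair time raises availability just as lengthening the time between failures does, since both enter the same ratio.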



Table 12.3: Availability of certain platforms based on 24 hours, 365 days/year use



• Reliability: Measures probability that unit will not fail during given time period
  – Function of MTBF

\[ R(t) = e^{-(1/\text{MTBF})(t)} \]
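The reliability function can be evaluated directly. This sketch assumes the exponential form R(t) = e^(-t/MTBF), with t and MTBF in the same time unit; the 4,000-hour MTBF is an invented example value.

```python
import math


def reliability(t, mtbf):
    """Probability that a unit runs for time t without failing (exponential model)."""
    if mtbf <= 0 or t < 0:
        raise ValueError("need mtbf > 0 and t >= 0")
    return math.exp(-t / mtbf)


if __name__ == "__main__":
    # Hypothetical unit with an MTBF of 4,000 hours.
    for hours in (10, 100, 1000, 4000):
        print(hours, round(reliability(hours, 4000), 4))
```

Reliability falls as the time period grows, and at t equal to the MTBF it has already dropped to about 37 percent.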



• Measures of performance can’t be taken in isolation from workload being handled by system

• Overall system performance varies with time
  – Important to define the actual working environment before making generalizations


Feedback Loops

• Feedback loop: A mechanism to monitor system’s resource utilization so adjustments can be made
  – Prevents processor from spending more time doing overhead than executing jobs

• Types of feedback loops:
  – Negative feedback loop
  – Positive feedback loop


Feedback Loops (continued)

• Negative feedback loop: Causes the arrival rate of processes to decrease when system becomes too congested
  – Helps stabilize system
  – Keeps queue lengths close to estimated mean values

• Positive feedback loop: Causes arrival rate to increase when system becomes underutilized
  – Used in paged virtual memory systems
  – More difficult to implement than negative loops
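The two loop types can be sketched as a single adjustment rule applied once per monitoring interval. This is a toy model under loud assumptions: the target queue length, step size, rate limits, and fixed service rate are all invented, and a real operating system would use its own monitored quantities and policies.

```python
def adjust_arrival_rate(rate, queue_length, target, step=1, min_rate=0, max_rate=10):
    """Return the new arrival rate after one feedback decision."""
    if queue_length > target:
        # Negative feedback: system is congested, admit fewer processes.
        return max(min_rate, rate - step)
    if queue_length < target:
        # Positive feedback: system is underutilized, admit more processes.
        return min(max_rate, rate + step)
    return rate


def simulate(ticks, service_rate, target, initial_rate):
    """Run the loop for a number of intervals; return (queue, rate) per interval."""
    queue, rate, history = 0, initial_rate, []
    for _ in range(ticks):
        # Queue grows by arrivals and shrinks by completed work, never below zero.
        queue = max(0, queue + rate - service_rate)
        rate = adjust_arrival_rate(rate, queue, target)
        history.append((queue, rate))
    return history


if __name__ == "__main__":
    for queue, rate in simulate(ticks=12, service_rate=4, target=10, initial_rate=8):
        print(queue, rate)
```

With these values the queue overshoots the target and then swings back, which hints at why tuning such loops, and positive ones in particular, takes care.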



Figure 12.3: Negative feedback loop



Figure 12.4: Positive feedback loop


Monitoring

• Implemented using hardware or software
  – Hardware monitors are more expensive
    • Have minimum impact on system because they’re outside of it and attached electronically
    • Examples: counters, clocks, and comparators
  – Software monitors are relatively inexpensive
    • Can distort results of analysis because they become part of system
    • Must be developed for each specific system
    • Difficult to move from system to system


Monitoring (continued)

• In early systems, performance measurements monitored only CPU speed

• Today’s measurements include other hardware units, OS, compilers, and other system software

• Measurements are made in a variety of ways
  – Using real programs, usually production programs
    • Run with different configurations of CPUs, operating systems, and other components
    • Results are called benchmarks
  – Using simulation models



• Benchmarks demonstrate specific advantages of a new CPU, operating system, compiler, or piece of hardware
  – Useful when comparing systems that have gone through extensive changes
  – Results highly dependent upon:
    • System’s workload
    • System’s design and implementation
    • Specific requirements of applications loaded on system



Table 12.4: Benchmarking results



Table 12.4 (continued): Benchmarking results


Accounting

• Pays bills and keeps system financially operable

• In a single-user environment, it’s easy to calculate cost of system

• In a multiuser environment, computer costs are usually distributed among users
  – Based on how much each one uses system’s resources


Accounting (continued)

• For distribution, operating system must be able to:
  – Set up user accounts
  – Assign passwords
  – Identify which resources are available to each user
  – Define quotas for available resources, such as disk space or maximum CPU time allowed per job

• Pricing policies vary from system to system



• Pricing policies include some or all of the following:
  – Total amount of time spent between job submission and completion
  – CPU time, main memory usage
  – Secondary storage used during program execution
  – Secondary storage used during billing period
  – Use of system software, number of I/O operations
  – Time spent waiting for I/O completion
  – Number of input records read, output records printed, page faults
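A pricing policy built from a few of the items listed above might be sketched like this. Every rate, field name, and usage figure here is hypothetical; actual policies vary from system to system and may charge for different items.

```python
from dataclasses import dataclass


@dataclass
class JobUsage:
    cpu_seconds: float
    memory_mb_seconds: float
    io_operations: int
    records_printed: int


# Hypothetical price per unit of each billed resource.
HYPOTHETICAL_RATES = {
    "cpu_seconds": 0.01,
    "memory_mb_seconds": 0.0001,
    "io_operations": 0.001,
    "records_printed": 0.05,
}


def job_charge(usage, rates=HYPOTHETICAL_RATES):
    """Charge for one job: sum of (amount used x unit price), rounded to cents."""
    total = sum(getattr(usage, item) * price for item, price in rates.items())
    return round(total, 2)


if __name__ == "__main__":
    usage = JobUsage(cpu_seconds=10, memory_mb_seconds=1000,
                     io_operations=200, records_printed=5)
    print(job_charge(usage))
```

Changing a single rate, for example lowering the price of a plentiful resource, is one way such a table could carry a pricing incentive.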



• Pricing policies often used as a way to achieve specific operational goals

• Pricing incentives can be used to encourage users to access more plentiful and cheap resources

• Method of billing information depends on environment

• Maintaining billing records online:
  – Status of each user can be checked before the user’s job is allowed to enter the READY queue
  – Results in increased overhead
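The check performed before a job enters the READY queue could look roughly like the sketch below. The `Account` fields and the admission rule (positive balance and CPU quota not exceeded) are assumptions made for illustration, not a description of any particular system.

```python
from dataclasses import dataclass


@dataclass
class Account:
    user: str
    balance: float              # remaining funds in the user's account
    cpu_quota_seconds: float    # maximum CPU time allowed in the billing period
    cpu_used_seconds: float = 0.0


def may_enter_ready_queue(account, requested_cpu_seconds):
    """Return True if the user's job may be admitted to the READY queue."""
    has_funds = account.balance > 0
    within_quota = (account.cpu_used_seconds + requested_cpu_seconds
                    <= account.cpu_quota_seconds)
    return has_funds and within_quota


if __name__ == "__main__":
    alice = Account("alice", balance=25.0, cpu_quota_seconds=600, cpu_used_seconds=580)
    print(may_enter_ready_queue(alice, 10))   # fits within quota
    print(may_enter_ready_queue(alice, 60))   # would exceed quota
```

Running this lookup for every submitted job is the extra overhead that online billing records introduce.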


Patch Management

• Systematic updating of the operating system and other system software

• Patch: Piece of programming code that replaces or changes code that makes up software

• Primary reasons for software patches:
  – Need for vigilant security precautions against attacks
  – Need to assure system compliance with government regulations
  – Need to keep systems running at peak efficiency

• Patch management ranks among the eight most-used technologies in the 2004 E-Crime Watch survey (Table 12.5)


Patch Management (continued)

Table 12.5: 2004 E-Crime Watch survey results of security and law enforcement executives


Patching Fundamentals

• Essential steps to take before patch installation:

1. Identify the required patch

2. Verify patch’s source and integrity

3. Test patch in a safe environment

4. Deploy patch throughout system

5. Audit system to gauge success of patch deployment

• Never patch operating system without a recent data backup in hand


Patching Fundamentals (continued)

• Patch Availability: Identify the criticality of patch
  – If critical, plan to apply patch as soon as possible
  – If not critical, possible to delay installation until a regular patch cycle begins

• Patch Integrity: Validate source and integrity
  – Check digital signature or validation tool that comes with software
  – Validate digital signature used by vendor to send new software on a regular basis
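One simple integrity check is to compare the downloaded patch file against a digest published by the vendor. Note the substitution: this sketch uses a SHA-256 checksum comparison, which is not the same as verifying a digital signature, and it assumes the expected digest was obtained through a trusted channel. The function names are hypothetical.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def patch_matches_digest(path, expected_hex):
    """Return True if the file's digest equals the vendor-published digest."""
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

A mismatch means the file was corrupted or altered in transit and should not move on to testing.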



• Patch Testing: Test new patch on a sample system or an isolated machine
  – Test to see:
    • If system reboots after patch is installed
    • If patched software performs its assigned tasks
  – Tested system should resemble complexity of target network
  – Test contingency plans to uninstall patch and recover old software if something goes wrong



• Patch Deployment: Installation of patch
  – On single-user computer, patch deployment is a simple task
    • Install software and reboot computer
  – On multiplatform system with many users, task is exceptionally complicated
    • Should have an accurate inventory of all hardware and software
      – Can be gleaned from network mapping software
    • Can launch deployment in stages
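Launching deployment in stages can be sketched as splitting the inventory into groups and halting before the next group if any installation fails. The host names, stage sizes, and the `install` callback are placeholders; a real rollout would call whatever deployment tool the site uses.

```python
from typing import Callable, List, Sequence, Tuple


def plan_stages(hosts: Sequence[str], stage_sizes: Sequence[int]) -> List[List[str]]:
    """Split hosts into consecutive stages; leftover hosts form a final stage."""
    stages, start = [], 0
    for size in stage_sizes:
        if start >= len(hosts):
            break
        stages.append(list(hosts[start:start + size]))
        start += size
    if start < len(hosts):
        stages.append(list(hosts[start:]))
    return stages


def deploy_in_stages(stages: Sequence[Sequence[str]],
                     install: Callable[[str], bool]) -> Tuple[List[str], List[str]]:
    """Install stage by stage; stop before the next stage if any host failed."""
    patched, failed = [], []
    for stage in stages:
        for host in stage:
            (patched if install(host) else failed).append(host)
        if failed:
            break  # keep the problem contained to the current stage
    return patched, failed


if __name__ == "__main__":
    inventory = ["h1", "h2", "h3", "h4", "h5", "h6"]  # hypothetical inventory
    stages = plan_stages(inventory, [1, 2])
    print(stages)
    print(deploy_in_stages(stages, install=lambda host: host != "h3"))
```

Starting with a small first stage limits how many machines are affected if the patch misbehaves outside the test environment.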



• Audit the Finished System: Confirm resulting system meets expectations
  – Verifying all computers are patched correctly and perform fundamental tasks as expected
  – Verifying no users had unauthorized versions of software on computers and were thus ineligible for patch
  – Verifying no users were left with unpatched software on computers



• Audit the Finished System (continued)
  – Should include:
    • Documentation of changes made to system
    • Success or failure of each stage of process
  – Keep a log of all system changes for future reference
  – Get feedback from users to verify deployment’s success


Software Options

• Patches can be managed in two ways:
  – Manually, one at a time
  – Automatically using software

• Deployment software falls into two groups:
  – Agent-based software
    • Agent must be installed on all target systems before patches can be deployed
  – Agentless software
    • Attractive for administrators of large, complex networks


Timing the Patch Cycle

• Critical patches must be applied immediately

• Less-critical patches can be scheduled at convenience of systems group

• Routine patches can be:
  – Applied monthly or quarterly
  – Timed to coincide with vendor’s service pack release

• Advantage of routine patch cycles:
  – Allow for thorough review of patch and testing cycles before deployment


Summary

• The operating system is the orchestrated cooperation of every piece of hardware and software

• When one part of the system is favored, it’s often at the expense of others

• System managers must make sure they are using appropriate measurement tools and techniques to verify effectiveness of the system

• System managers must evaluate degree of improvement
