Operating System Support for improving data locality on CC-NUMA machines CSE597A Presentation By V.N.Murali

WHY CC-NUMA?

• Scales as the number of nodes increases.

• Attractive properties: transparent access to local and remote memory, at the cost of higher access latency for remote memory.

• Two variations: CC-NUMA (Stanford DASH, MIT Alewife, Sequent) and CC-NOW (Sun S3.mp).

OS support

• Most important issue: data locality.

• OS-supported page migration and replication improve performance by as much as 30%.

Issues in Migration/Replication

• When should pages be migrated?

• When should pages be replicated?

• Both are needed to boost performance.

• When not to migrate/replicate is also important.

• Which system parameter can be used to decide? Ideas?

Differences with S/W shared memory

• In S/W DSM, migration and replication (M&R) are needed for correctness. On CC-NUMA, M&R is purely an optimization.

• In S/W DSM, M&R is triggered by page faults. On CC-NUMA, M&R is triggered by cache misses.

• If a workload exhibits good cache locality, M&R brings less benefit. Hence pages are moved only when selective criteria are met.

• Study based on SimOS environment.

Solution

• How do we improve data locality?

• Three access patterns: a) primarily accessed by a single process, b) mostly read access by many processes, c) both read and write access by many processes.

• Which method has to be applied for a), b), and c)?

Costs to be considered

• 1) Cost of determining candidate pages for M&R (cost of cache misses/TLB misses).

• 2) Overhead of M&R (new mappings, allocating a page, flushing the TLB).

• 3) Actual data transfer.

• 4) Memory pressure!

Key Parameters

Parameter          Semantics
Reset interval     Number of cycles after which all counters are reset
Trigger threshold  Number of misses after which a page is “hot” and considered for M/R
Sharing threshold  Number of misses from another processor after which a page is considered for replication
Write threshold    Number of writes after which a page is no longer replicated
Migrate threshold  Number of migrations after which a page is no longer migrated

Summary of the algorithm

• “Hot page”: a page whose miss counter for a processor reaches the trigger threshold.

• If the miss counter for this page on any other processor reaches the sharing threshold, the page is considered for replication; otherwise it is considered for migration.

• A page is replicated only if its write counter has not exceeded the write threshold, and migrated only if its migrate counter has not exceeded the migrate threshold.
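
The decision rule summarized above can be sketched in a few lines of Python. This is only an illustration of the slide's logic, not the actual kernel code: the names `Thresholds`, `PageCounters`, and `decide` are hypothetical, the default threshold values are placeholders (the slides give no numbers), and the periodic counter reset driven by the reset interval is left out.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    NONE = "none"
    MIGRATE = "migrate"
    REPLICATE = "replicate"


@dataclass
class Thresholds:
    # Placeholder values for illustration; the slides name the parameters only.
    trigger: int = 128
    sharing: int = 32
    write: int = 1
    migrate: int = 1


@dataclass
class PageCounters:
    misses: dict = field(default_factory=dict)  # processor id -> miss count
    writes: int = 0       # writes seen in the current interval
    migrations: int = 0   # migrations seen in the current interval


def decide(page: PageCounters, cpu: int, t: Thresholds) -> Action:
    """Choose an action for `page` after a miss from processor `cpu`."""
    # A page is hot only once this processor's counter reaches the trigger threshold.
    if page.misses.get(cpu, 0) < t.trigger:
        return Action.NONE
    # Shared if any other processor's counter has reached the sharing threshold.
    shared = any(n >= t.sharing for p, n in page.misses.items() if p != cpu)
    if shared:
        # Replicate only if the write counter has not exceeded the write threshold.
        return Action.REPLICATE if page.writes <= t.write else Action.NONE
    # Otherwise migrate, unless the page has already migrated too often.
    return Action.MIGRATE if page.migrations <= t.migrate else Action.NONE
```

With these placeholder thresholds, a page missed 128 times by processor 0 alone is migrated, while the same page with 40 misses from processor 1 and no writes is replicated.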

Implementation details

• The directory controller maintains the miss counters and generates a low-priority interrupt.

• It batches a few hot pages before raising the interrupt.

• On a write to a replicated page, the replicas are collapsed to a single page.
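
As a rough software analogy for the batching step, the sketch below collects hot pages and hands them over in groups. In the slides this work is done by the directory controller hardware; the `HotPageBatcher` class, the callback standing in for the interrupt, and the batch size are all assumptions made for illustration.

```python
class HotPageBatcher:
    """Collects hot page numbers and delivers them to the OS in batches."""

    def __init__(self, batch_size, raise_interrupt):
        self.batch_size = batch_size            # hypothetical; the slides give no number
        self.raise_interrupt = raise_interrupt  # callback standing in for the interrupt
        self.pending = []

    def record_hot_page(self, page):
        # Ignore a page that is already waiting in the current batch.
        if page in self.pending:
            return
        self.pending.append(page)
        # Raise one interrupt for the whole batch instead of one per page.
        if len(self.pending) >= self.batch_size:
            batch, self.pending = self.pending, []
            self.raise_interrupt(batch)
```

Batching amortizes the interrupt and handler entry cost over several pages, which is presumably why the interrupt can also be low priority.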

IRIX changes

• Replication support

• Finer-grained locking

• Page table back mappings

Workloads

• Engineering workload: large, sequential, memory-intensive; used a Verilog simulator and Flashlite.

• Parallel application: Raytrace, a parallel graphics algorithm.

• Scientific workload: Splash

• Decision support database

• Multiprogrammed software: Pmake

Performance analysis

• Three factors: a) user stall time, b) fraction of misses satisfied in local memory, c) kernel overhead.

• Engineering: large user stall time, hence the best performance gain. Both migration and replication were used successfully.

• Raytrace: mostly read-only accesses. Benefits mainly from replication.

• Splash: three parallel applications (Raytrace, Ocean, Volume rendering). For Ocean, migration is helpful. Raytrace and Volume rendering can benefit from replication.

• Database: mostly read accesses, hence replication helps.

Alternative policies

• Static policies and dynamic policies.

• Static: round robin, first touch, post facto (similar to the optimal page replacement algorithm).

• Dynamic: migration only, replication only, migration and replication combined.
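
Two of the static policies listed above are simple enough to sketch. The code below assumes a static policy fixes a page's home node at first access and never changes it; the function names and the page/node representation are hypothetical. Post facto is omitted because it needs the full access trace in advance.

```python
from itertools import count


def round_robin_placement(num_nodes):
    """Place each newly seen page on the next node in turn, ignoring who touches it."""
    next_index = count()
    placement = {}

    def place(page, touching_node):
        if page not in placement:
            placement[page] = next(next_index) % num_nodes
        return placement[page]

    return place


def first_touch_placement():
    """Place each page on the node of the first processor that touches it."""
    placement = {}

    def place(page, touching_node):
        # setdefault keeps the original home node on later touches.
        return placement.setdefault(page, touching_node)

    return place
```

Under first touch, a page first accessed from node 3 stays on node 3 even if node 1 later becomes its main user, which is the kind of case the dynamic policies are meant to correct.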