Operating System Support for improving data locality on CC-NUMA machines
CSE597A Presentation
By
V.N.Murali
WHY CC-NUMA?
• Scales as the number of nodes increases.
• Attractive properties: transparent access to local and remote memory, at the cost of increased access latency to remote memory.
• Two variations: CC-NUMA (Stanford DASH, MIT Alewife, Sequent) and CC-NOW (SUN s3.mp).
OS support
• Most important issue: data locality.
• OS-supported page migration and replication can improve performance by as much as 30%.
Issues in Migration/Replication
• When should pages be migrated?
• When should pages be replicated?
• Both are needed to boost performance.
• When not to migrate/replicate is also important.
• Which system parameter can be used to decide? Ideas?
Differences with S/W shared memory
• Migration and replication (M&R) in S/W DSM are needed for correctness. On CC-NUMA, M&R is purely an optimization.
• M&R in S/W DSM is triggered by page faults. On CC-NUMA, M&R is triggered by cache misses.
• If a workload exhibits good cache locality, it benefits less from M&R. Hence pages must be moved selectively.
• Study based on SimOS environment.
Solution
• How do we improve data locality?
• Three access patterns: a) primarily accessed by a single process; b) mostly read accesses by many processes; c) both read and write accesses by many processes.
• Which method should be applied to a), b), and c)?
Costs to be considered
• 1) Cost of determining candidate pages for M&R (cost of cache misses/TLB misses).
• 2) Overhead of M&R (new mappings, allocating a page, flushing the TLB).
• 3) Actual data transfer.
• 4) Memory pressure!
Key Parameters
| Parameter | Semantics |
|---|---|
| Reset interval | Number of cycles after which all counters are reset |
| Trigger threshold | Number of misses after which a page is “hot” for M/R |
| Sharing threshold | Number of misses from another processor after which a page is considered for replication |
| Write threshold | Number of writes after which a page is not replicated |
| Migrate threshold | Number of migrations after which a page is not migrated |
Summary of the algorithm
• “Hot page”: a page whose miss counter for some processor reaches the trigger threshold.
• If the miss counter for this page on any other processor reaches the sharing threshold, the page is considered for replication; otherwise it is considered for migration.
• A page is replicated only if its write counter has not exceeded the write threshold. It is migrated only if its migrate counter has not exceeded the migrate threshold.
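The decision steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation discussed in the presentation: the names `Thresholds`, `PageCounters`, and `decide`, and the default threshold values, are hypothetical and chosen for readability.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Thresholds:
    # Hypothetical default values, for illustration only.
    trigger: int = 128   # misses after which a page is "hot"
    sharing: int = 32    # misses from another processor that indicate sharing
    write: int = 1       # writes beyond which a page is not replicated
    migrate: int = 1     # migrations beyond which a page is not migrated


@dataclass
class PageCounters:
    # Per-processor miss counts for one page, plus write and migrate counters.
    misses: dict = field(default_factory=lambda: defaultdict(int))
    writes: int = 0
    migrates: int = 0


def decide(page, cpu, t):
    """Return 'replicate', 'migrate', or 'none' for a page as seen by cpu."""
    if page.misses[cpu] < t.trigger:
        return "none"  # the page is not hot for this processor
    # Has any other processor reached the sharing threshold on this page?
    shared = any(n >= t.sharing for p, n in page.misses.items() if p != cpu)
    if shared:
        # Candidate for replication, unless the page is written too often.
        return "replicate" if page.writes <= t.write else "none"
    # Candidate for migration, unless the page has already moved too often.
    return "migrate" if page.migrates <= t.migrate else "none"
```

With small thresholds, a page missed only by one processor comes out as a migration candidate, a page also missed by a second processor comes out as a replication candidate, and a frequently written shared page is left alone.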
Implementation details
• The directory controller maintains the miss counters and generates a low-priority interrupt.
• It batches a few hot pages before raising the interrupt.
• On a write to a replicated page, the replicas are collapsed to a single page.
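The counting and batching behaviour described above can be modelled with a toy Python class. This is a sketch under stated assumptions: `DirectoryController`, `record_miss`, `reset`, and the `raise_interrupt` callback are hypothetical names standing in for hardware behaviour, and the batch size is an arbitrary parameter.

```python
class DirectoryController:
    """Toy model of per-page, per-processor miss counting with batched interrupts."""

    def __init__(self, trigger, batch_size, raise_interrupt):
        self.trigger = trigger                  # misses after which a page is hot
        self.batch_size = batch_size            # hot pages collected per interrupt
        self.raise_interrupt = raise_interrupt  # callback standing in for the interrupt
        self.misses = {}                        # (page, cpu) -> miss count
        self.pending = []                       # hot pages not yet reported

    def record_miss(self, page, cpu):
        key = (page, cpu)
        self.misses[key] = self.misses.get(key, 0) + 1
        # Report a page once, at the moment its counter reaches the threshold.
        if self.misses[key] == self.trigger:
            self.pending.append(key)
            if len(self.pending) >= self.batch_size:
                batch, self.pending = self.pending, []
                self.raise_interrupt(batch)

    def reset(self):
        # Called once per reset interval: all counters start over.
        self.misses.clear()
```

Batching amortizes the fixed cost of entering the kernel over several pages, which is presumably why the interrupt is both low priority and delayed.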
IRIX changes
• Replication support
• Finer grain locking
• Page table back mappings
Workloads
• Engineering workload: large, sequential, and memory intensive; uses a Verilog simulator and Flashlite.
• Parallel application: Raytrace, a parallel graphics algorithm.
• Scientific workload: Splash
• Decision support database
• Multiprogrammed software: Pmake
Performance analysis
• Three factors: a) user stall time, b) fraction of misses satisfied in local memory, c) kernel overhead.
• Engineering: large user stall time, hence the best performance gain. Both migration and replication were used successfully.
• Raytrace: mostly read-only accesses. Benefits mainly from replication.
• Splash: three parallel applications (Raytrace, Ocean, and Volume rendering). Migration is helpful for Ocean. Raytrace and Volume rendering can benefit from replication.
• Database: mostly read accesses, hence replication.
Alternative policies
• Static policies and dynamic policies.
• Static: round robin, first touch, post facto (similar to the optimal page replacement algorithm).
• Dynamic: migration only, replication only, migration and replication.
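Two of the static policies listed above, round robin and first touch, are simple enough to sketch in Python. The trace format (a time-ordered list of `(node, page)` pairs) and the function names are assumptions made for this illustration. `local_fraction` computes the share of accesses that land on the page's home node, a simplified stand-in for the fraction of misses satisfied in local memory.

```python
def round_robin_placement(pages, num_nodes):
    """Assign pages to nodes in turn, ignoring which node accesses them."""
    return {page: i % num_nodes for i, page in enumerate(pages)}


def first_touch_placement(accesses):
    """Place each page on the node that touches it first.

    accesses: iterable of (node, page) pairs in time order.
    """
    placement = {}
    for node, page in accesses:
        placement.setdefault(page, node)  # only the first access decides
    return placement


def local_fraction(accesses, placement):
    """Fraction of accesses made by the node that holds the page."""
    local = sum(1 for node, page in accesses if placement[page] == node)
    return local / len(accesses)


# Hypothetical trace: node 0 touches A once, then node 1 uses A and B.
trace = [(0, "A"), (1, "B"), (1, "A"), (1, "A")]
first_touch = first_touch_placement(trace)
print(first_touch, local_fraction(trace, first_touch))
```

In this trace first touch pins page A to node 0 even though node 1 uses it more afterwards, which is the kind of case a static policy cannot correct and a dynamic policy can.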