
  • White Paper:

    Parallel processing in PowerMILL 10

    Delcam plc

  • Mark Jacobs - Principal Engineer

    Abstract

    This paper aims to remove the marketing hype surrounding parallel processing and its performance impact on CAM systems. Delcam's research to date helps to separate the fact from the fiction and gives you a true understanding of parallel processing in the CAM environment.

    In particular, this paper addresses the following questions:

    What is parallel computing?

    What influence does hardware configuration have on toolpath calculation times?

    How does parallel computing really benefit end users?

    How will Delcam continue to harness the power of the latest multi-core processors to benefit all aspects of CAM programming?

  • Contents

    1. Introduction .......... 1

    1.1 What savings can be expected? .......... 2

    1.2 How realistic are these potential savings? .......... 2

    2. Increased productivity in PowerMILL 10 .......... 2

    2.1 Background processing .......... 2

    2.2 Parallel processing .......... 3

    3. Performance improvements in PowerMILL 10 .......... 3

    4. Hardware effects .......... 4

    4.1 Adding more cores .......... 4

    4.2 Adding more processors .......... 5

    4.3 Parallel processing and background processing .......... 6

    4.4 Which computer? .......... 6

    5. Future developments .......... 6

    5.1 Faster toolpath calculations .......... 6

    5.2 Larger models .......... 6

    6. Terminology and further reading .......... 7

  • 1. Introduction

    The new buzz words in computing at the moment seem to be multi-core and parallel processing. Increasing the clock speed of the processor has been replaced by increasing the number of processor cores in your computer. But what advantage does increasing the number of cores in a processor give you, and does the reality actually live up to the hype?

    In PowerMILL 10, parallel computing techniques have been applied in two distinct ways:

    Firstly, you can prepare, calculate or edit toolpaths in the foreground while calculating other toolpaths in the background, with minimal degradation in processing speed. This effectively doubles your potential productivity. This is what Delcam terms Background Processing. It works on any hardware but the benefits are greater on multi-core machines.

    Secondly, parallel processing performs different parts of a complex calculation at the same time. Essentially, this takes a single function and processes it on all the cores in the CPU chip to reduce overall calculation time. To benefit from parallel processing you need a computer with more than one processor.

    A third and, we believe, unique benefit offered by PowerMILL 10 is that parallel processing is used in both foreground and background calculations simultaneously. This will undoubtedly provide a greater performance gain for PowerMILL 10 than with any previous PowerMILL release.

    Tests conducted on a range of strategies at Delcam show that toolpath calculation improvements of three or four times compared with earlier versions on a single-core PC are possible. The actual improvement depends heavily on your hardware configuration and on the toolpath strategy you are calculating. This is discussed later in the paper.

    PowerMILL 10 benefits

    Four times faster raster toolpath calculations

    Reduces programming time by up to 2.5 times*

    Less waiting time whilst toolpaths are calculating

    Increases capacity for additional work

    Significantly improves manufacturing productivity

    Reduced lead times

    Ability to handle even larger memory intensive files

    *On tests conducted at Delcam over a range of toolpaths. (See Section 3. Performance Improvements in PowerMILL 10 - page 3.)

    Figure 1: Intel Core i7 Processor - One of many multi-core processors tested in PowerMILL 10 benchmarks.

    Figure 2: PowerMILL 10 diagram showing parallel processing of both foreground and background calculations on a quad-core PC.

    1.

  • 1.1. What savings can be expected?

    If we look at a simplistic example, where in an average working week toolpath calculations consume 50% of an 8-hour shift, then annually this would equate to approximately 120 days of continuous toolpath calculations.

    Tests conducted on a range of toolpaths show that using PowerMILL 10 on a quad-core machine results in a 60% saving in toolpath calculation times, therefore taking you from the 120 days to less than 50 days.

    In dollar terms, if the internal cost for generating toolpaths is $50 per hour (taking into consideration operator costs, down time, and machining delays due to data starvation), this would equate to 70 days' worth of toolpath calculation savings, or cost reductions in excess of $28,000.00.
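    The arithmetic behind these figures can be reproduced in a few lines. This is a minimal sketch rather than Delcam's own calculation: the 240 working days per year is an assumption (the paper does not state it; it is the value that reproduces the 120-day figure), a "day" is taken to be one 8-hour shift, and the function and parameter names are illustrative only.

```python
def toolpath_savings(hours_per_shift=8.0, calc_fraction=0.5,
                     working_days_per_year=240,  # assumption, not stated in the paper
                     time_saving=0.6, cost_per_hour=50.0):
    """Return (days before, days after, days saved, cost saved) per year.

    A "day" here means one full shift spent on toolpath calculation.
    """
    calc_hours = hours_per_shift * calc_fraction * working_days_per_year
    days_before = calc_hours / hours_per_shift
    days_after = days_before * (1.0 - time_saving)
    days_saved = days_before - days_after
    cost_saved = days_saved * hours_per_shift * cost_per_hour
    return days_before, days_after, days_saved, cost_saved


before, after, saved, cost = toolpath_savings()
print(f"{before:.0f} days -> {after:.0f} days, saving {saved:.0f} days (${cost:,.0f})")
```

    Under these assumptions the sketch gives 120 days reduced to 48, a saving of 72 days worth $28,800, which is in line with the rounded figures quoted above (less than 50 days, about 70 days saved, in excess of $28,000).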

    1.2. So how realistic are these potential savings?

    While it is unrealistic to assume across-the-board savings of three or four times, as some claim, it is realistic to expect major speed increases in calculation time. You will also see significant productivity gains as you can plan, create, and edit toolpaths in the foreground while calculating toolpaths in the background, giving you a new competitive edge at a time when you need it most.

    2. Increased productivity in PowerMILL 10

    Many of the enhancements in PowerMILL 10 either shorten calculation times or enable calculations to process during PowerMILL's idle time. Both of these aspects greatly improve productivity.

    Background processing - allows you to calculate toolpaths, boundaries, or individual stock model states in the background while continuing to interact with PowerMILL.

    Parallel processing - the overall calculation is divided into subtasks that can be processed simultaneously. This is only possible when more than one processor is available. Parallel processing can greatly reduce calculation times.

    Specific toolpath speed-ups - less memory is used when area clearance toolpaths are calculated, and calculation time has been reduced. This is particularly beneficial when working on large models as it reduces the likelihood of running out of memory.
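    The idea of dividing one calculation into subtasks can be sketched as follows. This is an illustration of the general technique only, not PowerMILL's algorithm: `surface_height`, `raster_pass` and the sample spacing are hypothetical stand-ins, and each raster pass is treated as an independent subtask whose results are recombined in order.

```python
from concurrent.futures import ThreadPoolExecutor
import math


def surface_height(x, y):
    # Hypothetical stand-in for the model surface; not a PowerMILL function.
    return math.sin(x) * math.cos(y)


def raster_pass(y, x_samples):
    # One raster pass: tool heights along a line of constant y.
    return [surface_height(x, y) for x in x_samples]


def calculate_serial(ys, x_samples):
    # Baseline: calculate the passes one after another.
    return [raster_pass(y, x_samples) for y in ys]


def calculate_parallel(ys, x_samples, workers=4):
    # Each pass is an independent subtask. map() returns results in input
    # order, so the passes are recombined exactly as in the serial version.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda y: raster_pass(y, x_samples), ys))


x_samples = [i * 0.1 for i in range(200)]
ys = [j * 0.5 for j in range(40)]
serial = calculate_serial(ys, x_samples)
parallel = calculate_parallel(ys, x_samples)
print(parallel == serial)
```

    Note that in standard CPython, threads do not execute Python code on several cores at once, so this sketch shows the structure (split, process, recombine) rather than the speed-up; a process pool or native code would be needed to see real gains.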

    2.1. Background processing

    PowerMILL 10 allows you to perform background operations, such as toolpath or boundary creation, while at the same time you can continue preparing, editing or even calculating toolpaths in the foreground of PowerMILL.
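    The general pattern can be sketched as a job queue served by a worker thread. This is a minimal illustration under stated assumptions, not PowerMILL's implementation: `BackgroundQueue` and `calculate_toolpath` are hypothetical names, and the "calculation" is a placeholder.

```python
import queue
import threading


class BackgroundQueue:
    """Hypothetical stand-in for a background calculation queue."""

    def __init__(self):
        self._jobs = queue.Queue()
        self.results = {}
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        # Process queued jobs one at a time, in the order they were added.
        while True:
            name, func, args = self._jobs.get()
            try:
                self.results[name] = func(*args)
            except Exception as exc:
                # Record the failure so that later jobs still run.
                self.results[name] = exc
            finally:
                self._jobs.task_done()

    def add(self, name, func, *args):
        # Returns immediately, so the caller can carry on with foreground work.
        self._jobs.put((name, func, args))

    def wait(self):
        # Block until every queued job has finished.
        self._jobs.join()


def calculate_toolpath(stepover, passes):
    # Placeholder calculation; a real toolpath calculation would go here.
    return [round(i * stepover, 3) for i in range(passes)]


background = BackgroundQueue()
background.add("roughing", calculate_toolpath, 2.0, 5)
background.add("finishing", calculate_toolpath, 0.5, 5)

foreground = calculate_toolpath(1.0, 3)  # foreground work continues meanwhile
background.wait()
print(foreground, sorted(background.results))
```

    The design choice worth noting is that `add` only places the job on the queue and returns, which is what leaves the foreground free while the worker processes the jobs in order.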

    Figure 3: Potential savings (in days) using PowerMILL 10 on a multi-core PC.

    Figure 4: Potential savings (in $) using PowerMILL 10 on a multi-core PC.

    2.

  • To use background processing all you have to do is click the new Queue button rather than Calculate on a toolpath dialog. PowerMILL checks that everything is set up correctly (such as block, tool...), and adds the toolpath to the background queue. While you continue working, PowerMILL calculates the toolpaths in the queue in a background process.

    Figure 5: Time compression of background processing in PowerMILL 10. While preparing, editing or calculating toolpaths in the foreground you can also be calculating toolpaths in the background. (Diagram labels: Preparation, Edit, Calculation; Foreground, Background.)

    Note: background processing works for boundaries and stock models as well as toolpaths.

    2.2. Parallel processing

    Possibly the most important, but least visible, improvement in PowerMILL 10 is the use of parallel processing in toolpath calculations.

    In PowerMILL 9, Point Distribution performed many calculations in parallel to improve its performance. In PowerMILL 10, the code that calculates how a tool runs over the model also uses parallel processing. As a result, raster machining calculations run almost entirely in parallel.

    Figure 6: Raster toolpath multi-threading on all 4 cores.

    Other strategies utilizing this code include:

    Constant Z

    3D Offset

    Area clearance

    Interleaved constant Z

    Optimised constant Z

    Boundary calculations

    In addition, the calculation to apply a toolpath to the stock model runs entirely in parallel.

    Parallel processing happens automatically if your computer is suitable; you do not need to do anything to activate it.

    3. Performance improvements in PowerMILL 10

    Delcam has tested PowerMILL 10 on a range of typical 3-axis parts. These tests show major speed improvements for raster machining when using multi-processor machines. Figure 7 on page 4 shows the raster toolpath calculation time for PowerMILL 10 as a percentage of the time taken by PowerMILL 9 on the same computer, for a number of different processor configurations.

    3.

  • There are slight improvements on the single processor machine, caused by other optimizations.

    Raster machining benefits the most from parallel processing at the moment. Other strategies benefit too, but not to the same extent. What matters for most users is the overall performance for a typical machining project. The graph below compares the performance of PowerMILL 10 and PowerMILL 9 for a benchmark using a range of strategies.

    Figure 8: PowerMILL 10 benchmark calculations compared with PowerMILL 9 on different processor configurations.

    On the quad-core processors the benchmark runs about 1.5 times faster in PowerMILL 10. These benchmark tests can be requested by [email protected] and will also be included in the PowerMILL examples folder on the installation DVD.

    4. Hardware effects

    It is tempting to think that once parallel processing is supported then the way to improve performance is to add more and more processors. However, the test results show that things are not quite that simple. It is apparent from both performance graphs that the processor configuration has a significant effect on the calculation time. It is not immediately obvious why two dual-core processors are significantly slower than a single quad-core, and it is surprising that two quad-core processors (eight cores in total) perform worse than a single quad-core.

    The trends we can see in both graphs are:

    Adding more cores improves performance, but...

    More than four cores makes little difference, and...

    Adding more processors reduces performance.
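    The first two trends follow the pattern predicted by Amdahl's law, which Section 4.1 describes. A minimal sketch, assuming the usual form of the law, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of cores. The fractions used below are illustrative, not PowerMILL measurements, and the function name is hypothetical.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Theoretical speedup when `parallel_fraction` of the work scales with `cores`."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


for p in (0.5, 0.75, 0.9, 0.95):  # illustrative fractions, not measured values
    row = ", ".join(f"{n} cores: {amdahl_speedup(p, n):.2f}x" for n in (1, 2, 4, 8, 32))
    print(f"{p:.0%} parallel -> {row}")
```

    With p = 0.5 the speedup approaches, but never reaches, 2x however many cores are added, which is why extra cores bring diminishing returns. The third trend has a different cause, discussed in Section 4.2.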

    4.

    Figure 7: PowerMILL 10 raster calculation compared with PowerMILL 9 on different processor configurations.

    4.1. Adding more cores

    Why then does the benefit of adding processors tail off?

    The problem of parallel processing in a computer program is very much like organising the production of a single product in a company. You need to decide who does what, how their efforts are coordinated and how different people's output gets combined into the final product.

    The interaction between people means some form of management is necessary, which is generally an overhead. The first problem is to make a management system that works at all. The second problem is to minimise this overhead.

    Consider the production of a magazine. The basic process is:

    Write articles.

    Edit and collate, to produce the magazine.

  • Let us assume that the effort to produce a typical magazine breaks down as shown in the table:

    Task               Man hours
    Write articles     20
    Edit magazine      2
    Production time    22

    In this case the obvious target for parallel processing is the writing of the articles. If there are four articles, then three authors and the editor could write one each, taking 5 hours and reducing the total production time to 7 hours (about three times faster than one person).

    What happens if more authors are available? It might be possible to get two authors to work on each article, but this would require much closer cooperation, and it is unlikely that they would complete the work in half the time it would take for a single author to do it. If they could complete an article in 3 hours between them, then a team of seven authors plus the editor could produce the whole magazine in 5 hours. Doubling the team from four to eight has only increased the speed by 40%.

    If there were forty articles and forty authors, there is a different problem. The authors could write an article each, taking half an hour. However, there is still only one editor, who will still take two hours to edit the magazine. Putting more and more people into writing articles is never going to speed this up; now you have to think about using more than one editor.

    Deciding how to apply parallel processing in CAM software is very similar. There is often one step that takes the majority of the time. When you parallelise this task it exposes other tasks that are now taking the most time. To get the calculation to go significantly faster you need to parallelise other tasks.

    Performance gains are limited by the fraction of the program that can be parallelised to run on multiple cores simultaneously; this effect of diminishing returns is known as Amdahl's law. For example, if only 50% of a toolpath calculation can be parallelised, the theoretical speedup with 32 processors would be about 1.9x, as shown in Figure 9, and it could never exceed 2x no matter how many cores are available.

    Figure 9: Amdahl's law illustrating the maximum theoretical speedup achievable using up to 32 processors, for tasks where different proportions of the work can be done in parallel.

    In PowerMILL the most performance-critical process is the production of gouge-free tool passes over the model. This now runs in parallel, but the nature of the algorithms means that most of the improvements are achieved with four processes running in parallel. To achieve further speed increases we need to make other routines work in parallel.

    4.2. Adding more processors

    Figures 7 and 8 show that two dual-core processors are slower than a single quad-core. They also show that two quad-core processors are slower than the single quad-core. We have already seen that PowerMILL gains limited benefits from more than four cores, but why is performance reduced when the cores are in separate processor packages?

    5.

  • The clock speed of modern processors is so high that a major limit on their performance is the time it takes to access main memory. Processor manufacturers reduce this bottleneck by including fast cache memory on the processor chip. Frequently used data is kept in the cache where it can be accessed quickly. When a processor has multiple cores they share the same on-chip cache.

    When processor cores are working in parallel, the communication between cores to coordinate their tasks can take advantage of the shared cache provided they are all on the same chip. However, the benefit of the shared cache is lost if some of the cores are on a separate chip and communication has to use an external bus or main memory.

    A further overhead arises because cache coherency must be maintained - the contents of the caches must be kept in step with the contents of main memory, and vice-versa. It is quite complex to keep a single cache up to date, but the problem becomes much more complex and time consuming when coherency has to be maintained between two or more caches and main memory.

    4.3. Parallel processing and background processing

    Background processing allows you to organise your activities so that you don't have to wait for PowerMILL to calculate toolpaths; parallel processing reduces toolpath calculation times.

    Toolpath calculations in PowerMILL 10 benefit from parallel processing whether they are running in the foreground or in the background. Therefore, by running foreground and background calculations simultaneously, it is possible (on a suitably equipped computer) to make full use of up to eight cores and 8GB of memory.

    4.4. Which computer should I use?

    PowerMILL 10 will work on the same hardware as PowerMILL 9. Background processing works on single processor machines and dual-processor machines will show noticeable performance benefits from parallel processing as well.

    To obtain the most benefit from PowerMILL 10 we recommend:

    Intel Core2 Quad Q9550 (2.83GHz, 1333MHz FSB, 12MB L2 Cache, Quad Core) 375W

    8GB (4 x 2.0GB DIMM) 800MHz ECC Dual Channel Memory (requires 64-bit O/S)

    512MB PCIe x16 nVidia Quadro FX 3700 (MRGA15), Dual Monitor DVI or VGA Graphics Card (configured in a hardware mirror)

    2 x 320GB (7,200 rpm) SATA 3.0Gb/s Hard Drive with NCQ and 16MB DataBurst Cache

    Genuine Windows Vista Business x64 SP1

    If calculation time is a major issue then dual quad-core processors will help foreground and background calculations to run at maximum speed.

    2 x Intel Xeon X5450 (3.00GHz, 1333MHz, 2x6MB Cache, Quad Core)

    16GB, 667MHz, ECC Memory (8x2GB)

    512MB PCIe x16 nVidia Quadro FX 3700 (MRGA15), Dual Monitor DVI or VGA Graphics Card

    2 x 320GB (7,200 rpm) SATA 3.0Gb/s Hard Drive with NCQ and 16MB DataBurst Cache

    Genuine Windows Vista Business x64 SP1 WITH Media

    5. Future developments

    5.1. Faster toolpath calculations

    We expect that future versions of PowerMILL will give faster overall calculation times in two ways:

    1. Increasing the amount of multi-threading in the program. This will improve the overall benchmark time on dual-core and quad-core machines.

    2. Optimising data structures to make better use of processor caches. This will allow multi-chip computers to work more efficiently and will improve the dual quad-core times significantly.

    5.2. Larger Models

    Future versions of PowerMILL will include full 64-bit support. On 64-bit machines, the amount of RAM that can be used will only be limited by what can be installed. This will allow extremely large or complex parts to be processed successfully.

    6.

  • 6. Terminology and Further Reading

    There are a lot of very similar terms used to describe parallel computing. We have tried to use terminology consistently as follows:

    Processor - the part of the computer that does the real work, sometimes known as a Central Processing Unit or CPU. In the past, processors were packaged singly, but increasingly these days multi-core processors include two, four or more processors on a single chip.

    Background processing - the ability to prepare (or calculate) toolpaths in the foreground whilst calculating another toolpath in the background. In this case two separate calculations are performed at the same time. This is sometimes referred to as multi-tasking.

    Parallel processing - the ability to perform different parts of a single calculation simultaneously, essentially taking a single function and dividing it into parts that can be processed at the same time on different processors. This is sometimes referred to as multi-threading.

    Parallel computing - the ability to perform many calculations simultaneously. This can be either parallel processing or background processing (or both). This is sometimes referred to as multi-processing or multi-core processing.

    There is a lot of material about parallel computing available on-line. Below is a sample of useful links; most include references to much more detailed information.

    http://en.wikipedia.org/wiki/Parallel_computing is a good overview of the whole subject of parallel computing.

    http://en.wikipedia.org/wiki/Multicore_(computing) discusses the evolution and different types of multi-core processor.

    http://en.wikipedia.org/wiki/Multiprocessing goes into detail about different types of multiprocessing.

    http://en.wikipedia.org/wiki/Multithreading_(computer_hardware) talks about the different types of multi-threading.

    7.