
Intel® Atom™ Processor - Graphics Developer's Guide

How to maximize graphics and game performance on Intel®

Atom™ processor-based platforms

Copyright © 2008-2011 Intel Corporation

All Rights Reserved

Revision: 1.0

Contributors: Ron Fosner, Orion Granatir

World Wide Web: http://www.intel.com


Disclaimer and Legal Information

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO

LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR

OTHER INTELLECTUAL PROPERTY RIGHT.

UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD

CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are

available on request.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm

Software Source Code Disclaimer

Any software source code reprinted in this document is furnished under a software license and may only be used or copied in accordance with the terms of that license:

Intel Sample Source Code License Agreement

This license governs use of the accompanying software. By installing or copying all or

any part of the software components in this package, you (“you” or “Licensee”) agree

to the terms of this agreement. Do not install or copy the software until you have

carefully read and agreed to the following terms and conditions. If you do not agree

to the terms of this agreement, promptly return the software to Intel Corporation

(“Intel”).


1. Definitions:

A. "Materials" are defined as the software (including the Redistributables and Sample Source as defined herein), documentation, and other materials, including any updates and upgrade thereto, that are provided to you under this Agreement.

B. "Redistributables" are the files listed in the "redist.txt" file that is included in

the Materials or are otherwise clearly identified as redistributable files by Intel.

C. “Sample Source” is the source code file(s) that: (i) demonstrate(s) certain functions for particular purposes; (ii) are identified as sample source code;

and (iii) are provided hereunder in source code form.

D. "Intel's Licensed Patent Claims" means those claims of Intel's patents that (a) are infringed by the Sample Source or Redistributables, alone and not in combination, in their unmodified form, as furnished by Intel to Licensee and (b) Intel has the right to license.

2. License Grant: Subject to all of the terms and conditions of this Agreement:

A. Intel grants to you a non-exclusive, non-assignable, copyright license to use

the Material for your internal development purposes only.

B. Intel grants to you a non-exclusive, non-assignable copyright license to reproduce the Sample Source, prepare derivative works of the Sample Source and distribute the Sample Source or any derivative works thereof that you create, as part of the product or application you develop using the Materials.

C. Intel grants to you a non-exclusive, non-assignable copyright license to distribute the Redistributables, or any portions thereof, as part of the product or application you develop using the Materials.

D. Intel grants Licensee a non-transferable, non-exclusive, worldwide, non-sublicenseable license under Intel's Licensed Patent Claims to make, use, sell, and import the Sample Source and the Redistributables.

3. Conditions and Limitations:

A. This license does not grant you any rights to use Intel's name, logo or trademarks.

B. Title to the Materials and all copies thereof remain with Intel. The Materials are copyrighted and are protected by United States copyright laws. You will not remove any copyright notice from the Materials. You agree to prevent

any unauthorized copying of the Materials. Except as expressly provided herein, Intel does not grant any express or implied right to you under Intel patents, copyrights, trademarks, or trade secret information.


C. You may NOT: (i) use or copy the Materials except as provided in this Agreement; (ii) rent or lease the Materials to any third party; (iii) assign this Agreement or transfer the Materials without the express written consent of Intel; (iv) modify, adapt, or translate the Materials in whole or in part except as provided in this Agreement; (v) reverse engineer, decompile, or

disassemble the Materials not provided to you in source code form; or (vii) distribute, sublicense or transfer the source code form of any components of the Materials and derivatives thereof to any third party except as provided in this Agreement.

4. No Warranty:

THE MATERIALS ARE PROVIDED “AS IS”. INTEL DISCLAIMS ALL EXPRESS OR IMPLIED WARRANTIES WITH RESPECT TO THEM, INCLUDING ANY

IMPLIED WARRANTIES OF MERCHANTABILITY, NON-INFRINGEMENT, AND FITNESS FOR ANY PARTICULAR PURPOSE.

5. Limitation of Liability: NEITHER INTEL NOR ITS SUPPLIERS SHALL BE

LIABLE FOR ANY DAMAGES WHATSOEVER (INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF BUSINESS PROFITS, BUSINESS INTERRUPTION, LOSS OF BUSINESS INFORMATION, OR OTHER LOSS) ARISING OUT OF THE USE OF OR INABILITY TO USE THE SOFTWARE, EVEN IF INTEL HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. BECAUSE SOME JURISDICTIONS PROHIBIT THE EXCLUSION OR LIMITATION OF LIABILITY FOR CONSEQUENTIAL OR INCIDENTAL

DAMAGES, THE ABOVE LIMITATION MAY NOT APPLY TO YOU.

6. USER SUBMISSIONS: You agree that any material, information or other

communication, including all data, images, sounds, text, and other things embodied therein, you transmit or post to an Intel website or provide to

Intel under this Agreement will be considered non-confidential ("Communications"). Intel will have no confidentiality obligations with

respect to the Communications. You agree that Intel and its designees will be free to copy, modify, create derivative works, publicly display, disclose, distribute, license and sublicense through multiple tiers of distribution and licensees, incorporate and otherwise use the Communications, including derivative works thereto, for any and all commercial or non-commercial purposes

7. TERMINATION OF THIS LICENSE: This Agreement becomes effective on the

date you accept this Agreement and will continue until terminated as provided for in this Agreement. Intel may terminate this license at any time if you are in breach of any of its terms and conditions. Upon termination, you will immediately return to Intel or destroy the Materials and all copies thereof.

8. U.S. GOVERNMENT RESTRICTED RIGHTS: The Materials are provided with "RESTRICTED RIGHTS". Use, duplication or disclosure by the Government is subject to restrictions set forth in FAR52.227-14 and DFAR252.227-7013 et seq. or its successor. Use of the Materials by the Government constitutes acknowledgment of Intel's rights in them.

9. APPLICABLE LAWS: Any claim arising under or relating to this Agreement shall be governed by the internal substantive laws of the State of Delaware, without regard to principles of conflict of laws. You may not export the Materials in violation of applicable export laws.

* Other names and brands may be claimed as the property of others.

Intel and Intel Atom are trademarks of Intel Corporation in the U.S. and/or other countries.

Copyright (C) 2008 – 2011, Intel Corporation. All rights reserved.

Revision History

Revision Number   Description                                Revision Date
1.0               Intel® Atom™ Processor Developer's Guide   Feb 2011


Contents

Disclaimer and Legal Information
Software Source Code Disclaimer
Revision History

1 About this Document
    1.1 Intended Audience
    1.2 Conventions, Symbols, and Terms
    1.3 Related Information

2 Intel® Atom™ Processor Optimization
    2.1 Overview
    2.2 Intel® Atom™ Processor Series
        2.2.1 Detecting Intel® Atom™ Processors
    2.3 Intel® Atom™ Processor Block Diagram
    2.4 Front End
        2.4.1 Locating x87 instructions
        2.4.2 Avoid x87 instructions
        2.4.3 Intel® Hyper-Threading Technology
    2.5 Execution Core
        2.5.1 Optimization with Intel® Streaming SIMD Extensions (Intel® SSE)
        2.5.2 Optimization for In-order Execution
        2.5.3 64-bit support
    2.6 Tools
        2.6.1 Intel® Composer XE (Compilers and Libraries)
        2.6.2 Intel® VTune™ Amplifier XE
        2.6.3 Intel® Graphics Performance Analyzers (Intel® GPA) - Platform Analyzer
    2.7 Intel® Atom™ Processor-based Platform Optimizations
        2.7.1 Tune for Power
        2.7.2 Tools

3 Intel® Atom™ Processor Integrated Graphics
    3.1 Overview
    3.2 Understanding the Intel® Atom™ Processor 3D Graphics Systems
    3.3 Intel® Graphics Media Accelerator 950/3150
    3.4 Intel® Graphics Media Accelerator 500/600
    3.5 Graphics API Support
    3.6 Detecting GPUs

4 Quick Tips: Graphics Performance Tuning
    4.1 Primitive Processing
        4.1.1 Vertex Capabilities
        4.1.2 Tips On Vertex/Primitive Processing
    4.2 Shader Capabilities
        4.2.1 Tips on Shader Capabilities
    4.3 Texture Sample and Pixel Operations
        4.3.1 Tips on Texture Sampling / Pixel Operations
    4.4 Managing Constants on Microsoft DirectX*
    4.5 Graphics Memory
        4.5.1 Resource Management
        4.5.2 Checking for Available Memory
    4.6 Creating a Microsoft DirectX* 9 Device for Intel® Atom™ Processor Graphics

5 Performance Analysis with Intel® Graphics Performance Analyzers
    5.1 Intel® GPA Monitor
    5.2 Intel® GPA System Analyzer HUD
    5.3 Intel® GPA Frame Analyzer
    5.4 Diagnosing Performance Bottlenecks

6 Support

7 References

8 Optimization Notice


1 About this Document

This document provides development hints and tips to ensure that your customers will have a great experience playing your games and running other interactive 3D graphics applications on platforms with Intel® Atom™ processors. This document details software development practices encompassing the entire range of Intel® Atom™ processors with a focus on performance analysis using Microsoft DirectX*. Intel® Software Development Products useful in optimizing and profiling graphics applications are discussed throughout this document.

Figure 1 - Intel® Atom™ processor is the brand name for a family of low-power processors and platforms designed specifically for mobile Internet devices.

Intel Atom processors enable a broad range of devices including netbooks, entry-level desktops, tablets, handhelds, smartphones, consumer electronics (CE) devices, and other companion devices. Today Intel Atom processors integrate features such as controllers for memory, graphics, video, and display for a host of new applications that deliver flexibility and innovation. In the future, 32nm-based System-on-Chip (SoC) solutions will provide even greater functionality and form factor options.


Intel Atom processors are optimized to enable new connected experiences with a range of capabilities:

- A new range of power-efficient devices with excellent performance enabled by industry-leading 45nm high-k metal gate technology and, soon, 32nm silicon process technology
- Highly integrated application processor that transforms everyday devices
- Smaller, more compact designs with a thermal design power (TDP) ranging from less than 1 watt to 13 watts
- Low power options in select devices enabling incredibly low idle power, allowing devices to conserve energy
- Better performance and increased system responsiveness enabled by Intel® Hyper-Threading Technology (Intel® HT Technology)

Therefore, it makes sense to write your 3D applications to take advantage of this broad market and optimize the experience for the greatest number of people. By following the tips and tricks in this document, you have the opportunity to make your application shine with the graphics volume market leader.

1.1 Intended Audience

This document is targeted at experienced graphics developers who are familiar with OpenGL*/Microsoft DirectX*, C/C++, multithreaded and shader programming, Microsoft Windows* operating systems, and 3D graphics.

1.2 Conventions, Symbols, and Terms

The following conventions are used in this document.

Table 1 Coding Style and Symbols Used in this Document

Source code:

    for(int i=0; i<10; ++i){
        cout << i << endl;
    }

The following terms are used in this document.

Table 2 Terms Used in this Document

1. Intel Integrated Graphics Hardware (IIG)
    a. GPU – Graphics Processing Unit
    b. GMCH – Graphics and Memory Controller Hub – a parent component architecture and chipset housing some Intel integrated graphics hardware (GPU)
    c. GMA – Graphics Media Accelerator – component name describing the GPU chipset component in Intel integrated graphics
    d. UMA – Unified Memory Architecture – an architecture where the graphics subsystem does not have exclusive dedicated memory and uses the host system's memory (SDRAM)
    e. DVMT – Dynamic Video Memory Technology – a memory allocation scheme in UMA systems which allocates an exclusive, dynamically resizable chunk of main memory to the graphics (driver)
    f. VF – Vertex Fetch
    g. VS – Vertex Shader
    h. PS – Pixel Shader
    i. GS – Geometry Shader
    j. EU – Execution Unit, a vector machine component
    k. CS – Command Stream manager component controlling 3D and media
    l. I$ – Instruction cache
    m. SO – Stream Output

2. Imagination Technologies POWERVR*
    a. USSE – Universal Scalable Shader Engine
    b. CGS – Coarse Grain Scheduler
    c. ISP – Image Synthesis Processor

3. SWGP – Software geometry processing, a superset of CPU-based processing that includes CPU vertex processing. SWGP is not equivalent to the Microsoft DirectX* reference device.

4. SWVP – Software vertex processing

5. HWVP – Hardware vertex processing

1.3 Related Information

There are several other places you can look for additional information on Intel graphics, including the following sites:

Intel® HD Graphics: http://software.intel.com/en-us/articles/intel-graphics-developers-guides/

Intel® 4 Series Chipsets (the Intel® 4500, X4500, and X4500HD GMAs) Developer's Guide: http://software.intel.com/en-us/articles/intel-graphics-media-accelerator-developers-guide/

Intel® 3 Series Express Chipsets including the Intel® 3000 GMA and Intel® X3000 GMA Developer's Guide: http://software.intel.com/en-us/articles/intel-gma-3000-and-x3000-developers-guide/

We hope your questions are covered in these resources, including this guide. We are constantly updating these resources and welcome your comments and suggestions. If you have questions not answered in these resources, or have suggestions on improving the guide, please get in touch with us at: [email protected]. If you are actively working with Intel already, you can also reach us through your engineering or account management contacts.


2 Intel® Atom™ Processor Optimization

2.1 Intel® Atom™ Processor Overview

The Intel® Atom™ processor was designed for general performance requirements of modern workloads while maintaining low power consumption.

The key features allowing the Intel Atom processors to maintain this low power consumption and efficient performance include:

- Intel® Hyper-Threading Technology provides two logical processors for multitasking and multi-threading workloads.

- Support for Single-Instruction Multiple-Data (SIMD) extensions up to Intel® Streaming SIMD Extensions 3 (Intel® SSE3) and Supplemental Streaming SIMD Extensions 3 (SSSE3).

- Enhanced Intel SpeedStep® Technology enables the operating system (OS) to program a processor to transition to lower frequency and/or voltage levels while executing a workload.

- Support for deep power down technology to reduce static power consumption by turning off power to cache and other sub-systems in the processor.

- For greater power efficiency, Intel Atom processors utilize in-order processing. This differs from common out-of-order processors found in desktops and laptops. Unlike other Intel® processors, Intel Atom processors will not reorder an instruction stream to extract instruction-level parallelism.
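The SIMD support listed above can be checked at run time. The following is a minimal sketch, not code from this guide: it assumes the feature-flag layout of CPUID leaf 1 (SSE and SSE2 in EDX bits 25 and 26, SSE3 and SSSE3 in ECX bits 0 and 9), and the names SimdSupport and decodeSimdSupport are hypothetical. The function takes the raw register values so that it does not depend on any particular compiler intrinsic.

```cpp
#include <cstdint>

// Hypothetical helper type: which SIMD extensions CPUID leaf 1 reports.
struct SimdSupport {
    bool sse;
    bool sse2;
    bool sse3;
    bool ssse3;
};

// Decode the feature flags returned by CPUID leaf 1.
// ecx and edx are the raw register values, for example CPUInfo[2] and
// CPUInfo[3] after __cpuid(CPUInfo, 1) with the Microsoft intrinsic.
inline SimdSupport decodeSimdSupport(std::uint32_t ecx, std::uint32_t edx)
{
    SimdSupport s;
    s.sse   = (edx & (1u << 25)) != 0;  // EDX bit 25: SSE
    s.sse2  = (edx & (1u << 26)) != 0;  // EDX bit 26: SSE2
    s.sse3  = (ecx & (1u << 0))  != 0;  // ECX bit 0:  SSE3
    s.ssse3 = (ecx & (1u << 9))  != 0;  // ECX bit 9:  SSSE3
    return s;
}
```

An application could call this once at startup and select an SSSE3, SSE2, or scalar code path accordingly.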

Note: For an in-depth resource on optimizing for Intel® Atom™ processors, please review the Intel® 64 and IA-32 Architectures Software Developer's Manuals: http://www.intel.com/products/processor/manuals/. For Intel Atom processors, see Chapter 12 of the Intel® Architecture Optimization Reference Manual.

2.2 Intel® Atom™ Processor Series

The best place to get information about all available Intel® Atom™ processors is ark.intel.com.


Intel® Atom™ Processor Series   Devices                                          64-bit Support   Hyper-threading
N270, N280                      Netbooks                                         No               Yes
N4xx series, N5xx series        Netbooks                                         Yes              Yes
D4xx series, D5xx series        Entry level desktops (e.g. nettops)              Yes              Yes
230, 330                        Entry level desktops (e.g. nettops)              Yes              Yes
Z5xx                            Mobile Internet Devices (MIDs), some netbooks    No               Yes (except Z510)
Z6xx                            Mobile Internet Devices (MIDs), some netbooks    No               Yes
CE4100                          Consumer Electronics (e.g. Internet TV)          No               Yes
E-series                        Embedded                                         No               Yes

2.2.1 Detecting Intel® Atom™ Processors

An application can use the CPUID instruction to determine information about the host processor. This includes detecting Intel® Atom™ processors and support for features like Intel® Hyper-Threading Technology.

Note: For a more in-depth discussion and full cross-platform processor detection, please refer to Chapter 14 of the Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 1: Basic Architecture.

Beginning with the Intel486™ processor family, the type of CPU can be determined based on the processor identification signature. For all currently shipping Intel Atom processors (those manufactured using the 45 nm process), the processor identification signature will be (values are in binary):

Extended Family   Extended Model   Type   Family Code   Model No.
00000000          0001             00     0110          1100
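As a minimal sketch of how the signature above could be tested in code, the helper below splits the EAX value returned by CPUID leaf 1 into its fields and compares them with the 45 nm values from the table. The names CpuSignature, decodeSignature, and isAtom45nmSignature are hypothetical, and the check covers only the signature listed here; processors from other generations may report different values.

```cpp
#include <cstdint>

// Fields of the processor signature returned in EAX by CPUID leaf 1.
struct CpuSignature {
    unsigned stepping;        // bits 0-3
    unsigned model;           // bits 4-7
    unsigned family;          // bits 8-11
    unsigned type;            // bits 12-13
    unsigned extendedModel;   // bits 16-19
    unsigned extendedFamily;  // bits 20-27
};

// Split the raw EAX value into its signature fields.
inline CpuSignature decodeSignature(std::uint32_t eax)
{
    CpuSignature s;
    s.stepping       =  eax        & 0xFu;
    s.model          = (eax >> 4)  & 0xFu;
    s.family         = (eax >> 8)  & 0xFu;
    s.type           = (eax >> 12) & 0x3u;
    s.extendedModel  = (eax >> 16) & 0xFu;
    s.extendedFamily = (eax >> 20) & 0xFFu;
    return s;
}

// Hypothetical check against the table above: extended family 00000000b,
// extended model 0001b, type 00b, family 0110b, model 1100b.
// The stepping is ignored.
inline bool isAtom45nmSignature(std::uint32_t eax)
{
    const CpuSignature s = decodeSignature(eax);
    return s.extendedFamily == 0x00u &&
           s.extendedModel  == 0x1u  &&
           s.type           == 0x0u  &&
           s.family         == 0x6u  &&
           s.model          == 0xCu;
}
```

With the Microsoft intrinsic, the argument would be CPUInfo[0] after __cpuid(CPUInfo, 1). Combining this check with the brand-string test is one way to guard against signatures that are not listed here.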


The following source code determines whether the application is running on an Intel Atom processor and checks for Intel Hyper-Threading Technology. Please note that this source code is not completely cross-platform because it doesn't properly support older CPUs (e.g., Intel386™ and Intel486 CPUs). For a more in-depth discussion and full cross-platform processor detection, please review the Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 1: Basic Architecture.

#include <intrin.h>   // __cpuid (Microsoft compiler intrinsic)
#include <stdio.h>    // printf_s
#include <string.h>   // memset, memcpy
#include <tchar.h>    // _tmain, _TCHAR

struct CPUInfoStruct
{
    union {
        char    CPUBrandString[48];
        __int32 nCPUBrandString[4*3];   // 3 CPUID calls x 4 registers = 48 bytes
    };
    int  nSteppingID;
    int  nModel;
    int  nFamily;
    int  nProcessorType;
    int  nBasicProcessorID;
    int  nExtendedModel;
    int  nExtendedFamily;
    bool bAtomProcessor;
    bool bHyperThreading;
    char CPUString[13];
};

bool isAtom( const CPUInfoStruct& info )
{
    // firstChar is beginning pointer, c is end minus the string
    // length we're looking for
    char const * firstChar = info.CPUBrandString;

    // Atom(TM) = 8 chars
    char const * c =
        info.CPUBrandString + sizeof(info.CPUBrandString)/sizeof(char) - 8;

    // search backwards, looking for 'A' 't' 'o' 'm' '(' 'T' 'M' ')' or until
    // we decrement past firstChar
    while ( c >= firstChar )
    {
        if ( c[0] == 'A' &&
             c[1] == 't' &&
             c[2] == 'o' &&
             c[3] == 'm' &&
             c[4] == '(' &&
             c[5] == 'T' &&
             c[6] == 'M' &&
             c[7] == ')' )
        {
            return true;
        }
        --c;
    }
    return false;
}

// This function fills up the CPU Info Struct for us.
// Multiple CPUID calls are necessary to get all the information and
// each call gives you more information about the depth of calls you can make.
void fillCPUInfo(CPUInfoStruct& info)
{
    __int32 CPUInfo[4] = {0};

    ::memset(&info, 0, sizeof(CPUInfoStruct));

    // The cpuid intrinsic calls the cpuid instruction and returns 4 32-bit
    // values; the results depend upon the InfoType parameter passed in.
    // Get the maximum InfoType value we can call for this
    // processor and also get the ID string
    ::__cpuid(CPUInfo, 0);

    // Swap last two to put in readable form
    int temp = CPUInfo[2];
    CPUInfo[2] = CPUInfo[3];
    CPUInfo[3] = temp;

    // Copy 12 characters
    ::memcpy(info.CPUString, &(CPUInfo[1]), 12 ); // 13th position is zero

    // Check to see if we can make the next call
    if( CPUInfo[0] < 1 )
    {
        return;
    }

    // Call with InfoType == 1
    // CPUInfo will be set to the following:
    // 0: Bits 0-3: Stepping ID
    // 0: Bits 4-7: Model Number
    // 0: Bits 8-11: FamilyCode
    // 0: Bits 12-13: Processor Type
    // 0: Bits 14-15: Reserved
    // 0: Bits 16-19: Extended Model
    // 0: Bits 20-27: Extended Family
    // 0: Bits 28-31: Reserved
    // 3: Bit 28: Hyper-threading technology
    ::__cpuid(CPUInfo, 1);

    info.nSteppingID     = CPUInfo[0] & 0xf;             // bits 0-3
    info.nModel          = (CPUInfo[0] >> 4) & 0xf;      // bits 4-7
    info.nFamily         = (CPUInfo[0] >> 8) & 0xf;      // bits 8-11
    info.nProcessorType  = (CPUInfo[0] >> 12) & 0x3;     // bits 12-13
    info.nExtendedModel  = (CPUInfo[0] >> 16) & 0xf;     // bits 16-19
    info.nExtendedFamily = (CPUInfo[0] >> 20) & 0xff;    // bits 20-27
    info.bHyperThreading = (CPUInfo[3] & 0x10000000) != 0; // bit 28

    // Check to see if we can get the Processor Brand String
    // Call with InfoType == 0x80000000
    ::__cpuid(CPUInfo, 0x80000000);

    // Extended info supported up to 0x80000004? (compare as unsigned)
    if( static_cast<unsigned int>(CPUInfo[0]) < 0x80000004u )
    {
        return;
    }

    // Yes, make the 3 calls (16 chars each or 4 ints each)
    // to make up the brand string - it's null terminated
    ::__cpuid(info.nCPUBrandString + 0, 0x80000002);
    ::__cpuid(info.nCPUBrandString + 4, 0x80000003);
    ::__cpuid(info.nCPUBrandString + 8, 0x80000004);

    // Now we can check for Atom(TM) processors
    info.bAtomProcessor = isAtom( info );
}

int _tmain(int argc, _TCHAR* argv[])
{
    // This is how to use the code
    CPUInfoStruct info;  // Create the struct
    fillCPUInfo(info);   // Fill it

    // Query the bits
    printf_s("CPUString : %s\n", info.CPUString);
    printf_s("Brand String : %s\n", info.CPUBrandString);
    printf_s("Hyperthreaded?: %s\n", info.bHyperThreading ? "Yes": "No");
    printf_s("is it an Atom : %s\n", info.bAtomProcessor ? "Yes": "No");
    return 0;
}
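The backwards character-by-character scan in isAtom can also be expressed with the standard library. The sketch below is an alternative under stated assumptions, not the guide's method: brandContainsAtom is a hypothetical name, and it uses std::search over a buffer of known size, so it does not rely on the brand string being null terminated and never moves a pointer before the start of the buffer.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical alternative to isAtom(): forward search for "Atom(TM)"
// inside a fixed-size brand string buffer of `size` bytes.
inline bool brandContainsAtom(const char* brand, std::size_t size)
{
    static const char needle[] = "Atom(TM)";
    // Exclude the terminating '\0' of the needle from the search pattern.
    const char* const needleEnd = needle + sizeof(needle) - 1;
    const char* const brandEnd  = brand + size;
    return std::search(brand, brandEnd, needle, needleEnd) != brandEnd;
}
```

With the struct above, the call would be brandContainsAtom(info.CPUBrandString, sizeof(info.CPUBrandString)).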

2.3 Intel® Atom™ Processor Block Diagram


2.4 Front End

The front end features a power-optimized pipeline that can deliver up to two instructions per cycle to the instruction queue for scheduling. This means the ideal number of cycles per instruction retired (CPI) is 0.5.

Tip: By default, Intel® VTune™ Amplifier XE will show “Clocks per Instructions Retired – CPI”. For this metric, the ideal is 0.5. However, many things can prevent the ideal scenario, such as delays due to cache misses. See Section 2.6.2 for more details on Intel VTune Amplifier XE.

It's important to avoid legacy x87 instructions (see Section 2.4.1, “Locating x87 instructions”, for more details). Back-to-back x87 instructions can cause the front end to stall because it can decode only one x87 instruction per cycle.

Tip: Avoid x87 instructions; see Section 2.4.2 for more details. In general, Intel® SSE will have better performance at lower power utilization. Whenever possible, use Intel SSE for floating point-intensive operations. See Section 2.5.1 for more information about Intel SSE.

2.4.1 Locating x87 instructions

If the compiler generates code using x87 instructions, then the disassembly view will appear similar to the following:

for(int i=0; i!=n ;i++)

003238C8 mov esi,dword ptr [ebp+8]

003238CB push edi

003238CC mov edi,dword ptr [dest]

003238CF add esi,8

003238D2 mov ebx,4000h

N[i] = V[i] / magnitude(V[i]);

003238D7 fld dword ptr [esi-4]

003238DA fld dword ptr [esi-8]

003238DD fld dword ptr [esi]

003238DF fld st(1)

003238E1 fmulp st(2),st

003238E3 fld st(2)

003238E5 fmulp st(3),st

003238E7 fxch st(1)

003238E9 faddp st(2),st

003238EB fmul st(0),st

003238ED faddp st(1),st

003238EF fstp dword ptr [ebp-4]

003238F2 fld dword ptr [ebp-4]

003238F5 call _CIsqrt (3255B0h)

003238FA fstp dword ptr [ebp-4]

...


The assembly instructions beginning with the letter 'f', including fmul, fld, faddp, and fmulp, are legacy pre-Intel® Pentium® processor x87 math coprocessor instructions. Furthermore, the second-to-last instruction shown, call _CIsqrt, invokes a function to compute the square root rather than computing it inline. This sort of assembly is not ideal for high-performance code.
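The guide does not show the C++ source behind this disassembly, so the following is only a minimal sketch of what a single-precision version of the normalization loop might look like. The `Vec3` type and the `normalizeAll` name are hypothetical. The sketch keeps every intermediate value in `float` and uses the `float` overload of `std::sqrt`, which gives a compiler targeting Intel SSE the chance to emit the square root inline; whether it does so depends on the compiler and the options described in Section 2.4.2.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical 3-component vector type; the guide does not show the original.
struct Vec3 { float x, y, z; };

// Normalizes n vectors from V into N using only single-precision math.
// std::sqrt(float) selects the float overload, so no value is promoted
// to double along the way.
void normalizeAll(const Vec3* V, Vec3* N, std::size_t n)
{
    for (std::size_t i = 0; i != n; ++i)
    {
        const float lenSq = V[i].x * V[i].x + V[i].y * V[i].y + V[i].z * V[i].z;
        // Zero-length vectors are mapped to zero instead of dividing by zero.
        const float invLen = (lenSq > 0.0f) ? 1.0f / std::sqrt(lenSq) : 0.0f;
        N[i].x = V[i].x * invLen;
        N[i].y = V[i].y * invLen;
        N[i].z = V[i].z * invLen;
    }
}
```

After rebuilding with the settings in Section 2.4.2, the disassembly view is the way to confirm that the 'f'-prefixed instructions are gone.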

2.4.2 Avoid x87 instructions

2.4.2.1 Proper Microsoft Visual Studio* Settings

In Microsoft Visual Studio*, there is a setting that will avoid generating x87 instructions.

In the Project Properties, under the C/C++ group, open the Code Generation options. Set “Enable Enhanced Instruction Set” to “Streaming SIMD Extensions 2” so the compiler will generate Intel® SSE instructions to better use all the execution units and avoid generating x87 instructions. Also change “Floating Point Model” to “Fast” so that the compiler may use 32-bit floating point operations instead of double precision. Changing these options requires a rebuild of all the code to take effect.

Tip: For Microsoft Visual Studio*, set Enhanced Instruction Set to Streaming SIMD Extensions 2 (/arch:SSE2).

Tip: For Microsoft Visual Studio*, set Floating Point Model to Fast.

2.4.2.2 Proper GCC Settings

The -ffast-math option is appropriate for games and allows the compiler to generate faster math code that doesn't exactly implement IEEE or ISO rules and specifications for math functions. This option sets -fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans and -fcx-limited-range.

Please note that this option might have adverse effects on functionality that requires a high level of precision or cross-platform support.

Tip: For GCC, use the -ffast-math option.

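The -mssse3 switch enables the compiler to generate Supplemental SSE3 (SSSE3) instructions. Since all Intel Atom processors support SSSE3, this will better utilize all execution units and avoid generating x87 instructions. On 32-bit x86 targets, GCC may also need -mfpmath=sse before it uses SSE rather than x87 for scalar floating point math.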

Tip: For GCC, use the -mssse3 option.

Changing these options will require a rebuild of all the code to take effect.


2.4.2.3 Proper Intel® C++ Composer XE 2011 Settings

The /fp:fast and -fp-model fast options are appropriate for games and allow the compiler to generate faster math code that doesn't exactly implement IEEE or ISO rules and specifications for math functions.

Please note that this option might have adverse effects on functionality that requires a high level of precision or cross-platform support.

Tip: For Intel® C++ Composer XE 2011 for Windows*, use the /fp:fast option.

Tip: For Intel® C++ Composer XE 2011 for Linux*, use the -fp-model fast option.

The SSE3_ATOM option allows the compiler to generate Supplemental SSE3 (SSSE3) and MOVBE instructions. Since all Intel Atom processors support SSSE3, this will better utilize all execution units and avoid generating x87 instructions.

Tip: For Intel® C++ Composer XE 2011 for Windows*, use the /QxSSE3_ATOM option.

Tip: For Intel® C++ Composer XE 2011 for Linux*, use the -xSSE3_ATOM option.

2.4.3 Intel® Hyper-Threading Technology

Intel® Hyper-Threading Technology (Intel® HT Technology) enables multiple threads to run on each core. Intel HT Technology is designed to increase processor throughput and overall performance on threaded software. Nearly all Intel Atom processors support Intel HT Technology.

Tip: Use threading to fully utilize all components of the Intel® Atom™ processor. This is especially true for multi-core Intel Atom processors.

The instruction queue is statically partitioned for scheduling instruction execution from two threads. The scheduler is able to pick one instruction from either thread and dispatch it to either port 0 or port 1 for execution. The hardware chooses between the two threads when fetching, decoding, and dispatching instructions, based on fairness as well as each thread's readiness to make forward progress.

Basically, if one thread isn't using all execution units due to stalling from dependencies or unbalanced instruction streams, a second thread can run on underutilized execution units.

Note: Intel® GMA 950/3150 graphics offload vertex processing to the CPU. This means that 3D game applications and the associated vertex processing work from the driver will be utilizing CPU resources.

The multithreaded graphics driver will be running alongside your app and utilizing resources. Threading might incur a performance penalty due to oversubscription.


However, on multi-core Intel Atom processors, use of multithreading is paramount to achieving maximum performance.

Tip: Use the Intel® Software Development Products to help measure performance and scaling with multithreading. See Section 2.6 for more information.
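As a rough illustration of the threading tip, the sketch below splits a simple array operation across one worker per logical processor, using only the C++ standard library. The `parallelScale` function is a hypothetical example, not something taken from the guide, and a real game would more likely use a persistent task system than create threads per call. Because the graphics driver also runs threads, one worker per logical processor can oversubscribe the CPU, so the worker count is worth measuring rather than assuming.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Scales every element of `data` by `factor`, splitting the range across
// one worker per logical processor reported by the standard library.
void parallelScale(std::vector<float>& data, float factor)
{
    // hardware_concurrency() may return 0 when the count is unknown.
    const unsigned hw = std::max(1u, std::thread::hardware_concurrency());
    const std::size_t workers =
        std::min<std::size_t>(hw, std::max<std::size_t>(1, data.size()));
    const std::size_t chunk = (data.size() + workers - 1) / workers;

    std::vector<std::thread> pool;
    for (std::size_t w = 0; w < workers; ++w)
    {
        const std::size_t begin = w * chunk;
        const std::size_t end = std::min(data.size(), begin + chunk);
        if (begin >= end) break;  // nothing left to hand out
        pool.emplace_back([&data, factor, begin, end] {
            for (std::size_t i = begin; i < end; ++i) data[i] *= factor;
        });
    }
    for (std::thread& t : pool) t.join();
}
```

Each worker touches a disjoint slice of the vector, so no locking is needed in this sketch.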

2.5 Execution Core

Since the front end can issue two instructions per cycle, the execution cores should be making forward progress with two instructions whenever possible. The compiler will handle most of the details of selecting the best instruction ordering.

Several instructions take more than one cycle to complete. In most cases, other multi-cycle instructions can be pipelined with longer instructions. However, single-cycle instructions will block due to the requirements of program order. Divides and 64-bit floating point operations are examples of multi-cycle instructions that do not pipeline well. Multiplies are an example of instructions that pipeline well.

Tip: Divide instructions should only be used when absolutely necessary. In Intel® VTune™ Amplifier XE, “DIV” and “CYCLES_DIV_BUSY” events can be used to determine if divides are a bottleneck in your program.

Tip: Use 32-bit floating point instead of 64-bit floating point whenever possible. 64-bit instructions take longer to complete and generally can't be pipelined as well as 32-bit versions.

Tip: It's important to use Intel® Streaming SIMD Extensions (Intel® SSE) for performance critical code that is computationally intense. See “Optimization with Intel® Streaming SIMD Extensions (Intel® SSE)” for more information.
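One common way to follow the divide tip is to hoist a loop-invariant divide out of the loop and multiply by the reciprocal instead. The sketch below is illustrative only; `normalizeCoords` is a hypothetical function, and it assumes the caller passes a non-zero `width`. Note that multiplying by a reciprocal can differ from a true divide in the last bit of the result, which is usually acceptable for graphics work.

```cpp
#include <cstddef>

// Converts pixel coordinates to the [0, 1] range.
// The divide is hoisted out of the loop: one divide, then n multiplies.
void normalizeCoords(const float* px, float* out, std::size_t n, float width)
{
    const float invWidth = 1.0f / width;  // single divide; width must be non-zero
    for (std::size_t i = 0; i != n; ++i)
    {
        out[i] = px[i] * invWidth;        // multiplies pipeline better than divides
    }
}
```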

2.5.1 Optimization with Intel® Streaming SIMD Extensions (Intel® SSE)

All Intel® Atom™ processors support Intel® Streaming SIMD Extensions (Intel® SSE), Intel® Streaming SIMD Extensions 2 (Intel® SSE2), Intel® Streaming SIMD Extensions 3 (Intel® SSE3), and Supplemental Streaming SIMD Extensions 3 (SSSE3). Intel SSE instructions allow the CPU to work on four 32-bit floating point values with a single instruction. This can greatly increase floating point operations per second (FLOPS) and is vital for computationally intense code.


In general, Intel SSE will promote better power efficiency with increased throughput.

Tip: Setting proper compiler options will allow the compiler to automatically generate Intel® SSE instructions:

Compiler | Option to Enable Intel® SSE code generation

Microsoft Visual Studio* | Set “Enhanced Instruction Set” to “Streaming SIMD Extensions 2 (/arch:SSE2)”

GCC | -mssse3

Intel® C++ Composer XE 2011 for Windows* | /QxSSE3_ATOM

Intel® C++ Composer XE 2011 for Linux* | -xSSE3_ATOM

There are multiple ways to use Intel SSE. For developers interested in maximum control, intrinsics are the best way to utilize Intel SSE. Intrinsics are compiler-specific functions that generate highly efficient inline machine instructions. For developers targeting Microsoft DirectX* on PC or Microsoft Xbox*, the Microsoft XNA* Math Library wraps the use of intrinsics in a library that already supports vectors and matrices.

Tip: There are a few things to keep in mind when utilizing the Microsoft XNA* Math Library. First, be careful accessing individual elements. Getting and setting elements inside an Intel® SSE vector isn't free. It's best to put data into XMVECTORs and keep it there as long as possible. Also, make sure you are using properly aligned data.
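To make the intrinsics approach concrete, here is a minimal sketch that processes four floats per iteration with the SSE intrinsics from `<xmmintrin.h>`. The `mulAdd4` function is a hypothetical example and is not part of the guide or of the XNA Math Library. It uses the unaligned load and store intrinsics so that it works with any pointers; when the data is known to be 16-byte aligned, `_mm_load_ps` and `_mm_store_ps` can be used instead.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (_mm_*_ps)
#include <cstddef>

// Computes out[i] = a[i] * b[i] + c[i], four floats per SSE iteration,
// with a scalar loop for any leftover elements.
void mulAdd4(const float* a, const float* b, const float* c,
             float* out, std::size_t n)
{
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
    {
        const __m128 va = _mm_loadu_ps(a + i);   // unaligned loads: no
        const __m128 vb = _mm_loadu_ps(b + i);   // alignment requirement
        const __m128 vc = _mm_loadu_ps(c + i);
        const __m128 r  = _mm_add_ps(_mm_mul_ps(va, vb), vc);
        _mm_storeu_ps(out + i, r);
    }
    for (; i < n; ++i)
    {
        out[i] = a[i] * b[i] + c[i];             // scalar tail
    }
}
```

The values stay in SSE registers between the loads and the store, which matches the advice above about not extracting individual elements more often than necessary.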

2.5.2 Optimization for In-order Execution

Instruction scheduling heuristics and coding techniques that apply to out-of-order microarchitectures may not deliver optimal performance on an in-order microarchitecture. Likewise, instruction scheduling heuristics and coding techniques for an in-order pipeline like Intel® Atom™ microarchitecture may not achieve optimal performance on out-of-order microarchitectures.

Here is an example of where improperly ordered instructions can cause stalls that would otherwise be avoided in an out-of-order processor:


The easiest way to optimize for the in-order nature of Intel Atom processors is to utilize Intel® C++ Composer XE 2011 with the xL option. This option will allow the compiler to assume that the target system has an in-order processor and to aggressively unroll loops.

Tip: For Intel® C++ Composer XE 2011 for Windows*, use the /QxL option to enable optimizations for in-order processors.

Tip: For Intel® C++ Composer XE 2011 for Linux*, use the -xL option to enable optimizations for in-order processors.

Loop unrolling can help find instructions to pair with long-latency operations (e.g. multi-cycle instructions). For example, issuing multiple long-latency multiply instructions together will generate better throughput. It is worthwhile to investigate loop unrolling in critical sections of your application's code.
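A hand-unrolled loop might look like the following sketch, which unrolls a dot product by four and keeps four independent accumulators so that consecutive multiplies do not depend on each other. The `dotUnrolled` name and the unroll factor are illustrative assumptions; the best factor depends on the code and should be measured. Because the additions are reordered, the result can differ slightly from a straightforward loop for general floating point inputs.

```cpp
#include <cstddef>

// Dot product unrolled by four with independent accumulators, so each
// multiply-add chain does not wait on the previous iteration's result.
float dotUnrolled(const float* a, const float* b, std::size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
    {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    float sum = (s0 + s1) + (s2 + s3);
    for (; i < n; ++i)
    {
        sum += a[i] * b[i];  // leftover elements when n is not a multiple of 4
    }
    return sum;
}
```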


Tip: Unrolling loops can help pair instructions for execution and pipelining. However, unrolling loops can put more pressure on the front end. In Intel® VTune™ Amplifier XE, the “ICACHE_MISSES” event can be used to measure whether the increased instruction footprint is harmful.

2.5.3 64-bit support

It's worthwhile to note that 64-bit support is not ubiquitous. To reach the broadest market, target 32-bit whenever possible.

Intel® Atom™ Processor Series | Support for 64-bit

N270, N280 | No

N4xx series, N5xx series | Yes

D4xx series, D5xx series | Yes

230, 330 | Yes

Z5xx | No

Z6xx | No

CE4100 | No

E-series | No

2.6 Tools

2.6.1 Intel® Composer XE (Compilers and Libraries)

Intel C++ Composer XE 2011 (formerly the Intel® C++ Compiler) has several flags that are ideal for applications targeting Intel® Atom™ processor-based platforms:


Platform | Compiler Option | Details

Microsoft Windows* | /QxSSE3_ATOM | Allows the compiler to generate Supplemental SSE3 (SSSE3) and MOVBE instructions

Microsoft Windows* | /QxL | Enables optimization around in-order execution

Microsoft Windows* | /fp:fast | Allows the compiler to generate faster math code that doesn't exactly implement IEEE or ISO rules and specifications for math functions

Linux* | -xSSE3_ATOM | Allows the compiler to generate Supplemental SSE3 (SSSE3) and MOVBE instructions

Linux* | -xL | Enables optimization around in-order execution

Linux* | -fp-model fast | Allows the compiler to generate faster math code that doesn't exactly implement IEEE or ISO rules and specifications for math functions

Intel® Composer XE also includes a set of parallel development mechanisms called the Intel® Parallel Building Blocks. Intel® Threading Building Blocks helps developers to build performant task-based threading appropriate for games.

For more information on Intel® Composer XE, visit: http://software.intel.com/en-us/articles/intel-composer-xe/.

See Section 8 for a notice about optimizations with Intel® Software Development Products.

2.6.2 Intel® VTune™ Amplifier XE

Intel® VTune™ Amplifier XE is a great tool for locating bottlenecks of CPU workloads. In addition to general profiling information, it includes several profiling events that are appropriate for Intel® Atom™ processor-based platforms:


Intel® VTune™ Amplifier XE Event | Details

Clocks per Instruction Retired (CPU_CLK_UNHALTED.THREAD / INST_RETIRED.ANY) | Measures the average amount of latency per instruction retired (completed). Ideally, this should be 0.5 because the front end can decode and issue 2 instructions per clock.

DECODE_RESTRICTION | Counts the number of occurrences in a workload that encountered delays causing reduction of decode throughput. Avoid back-to-back x87 instructions.

BACLEARS | Can provide a means to evaluate whether loop unrolling is helping or hurting front end performance.

ICACHE_MISSES | Can help evaluate if loop unrolling is increasing the instruction footprint too much.

BR_MISSP_TYPE_RETIRED | Can provide a means to evaluate branch prediction issues due to branch types.

DIV and CYCLES_DIV_BUSY | Can provide a means to determine if divides are a bottleneck.
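The Clocks per Instruction Retired metric is a simple ratio of two counter totals, and it can be handy to compute it directly when post-processing exported counter data. The sketch below is a hypothetical helper, not part of any Intel tool; it takes the two totals as plain integers and also expresses the result relative to the ideal CPI of 0.5 stated above.

```cpp
#include <cstdint>

// Clocks per instruction retired from two raw counter totals.
// Returns 0.0 when no instructions were retired, to avoid dividing by zero.
double clocksPerInstruction(std::uint64_t clkUnhalted, std::uint64_t instRetired)
{
    if (instRetired == 0) return 0.0;
    return static_cast<double>(clkUnhalted) / static_cast<double>(instRetired);
}

// Fraction of the dual-issue ideal (CPI of 0.5) that was achieved;
// 1.0 means two instructions retired on every clock.
double issueEfficiency(double cpi)
{
    const double idealCpi = 0.5;
    return (cpi > 0.0) ? idealCpi / cpi : 0.0;
}
```

For example, 1000 unhalted clocks and 2000 retired instructions give a CPI of 0.5, the ideal; a CPI of 1.0 corresponds to half of the ideal issue rate.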

For more information on VTune™ Amplifier XE, visit: http://www.intel.com/software/products/vtune.

2.6.3 Intel® Graphics Performance Analyzers (Intel® GPA) - Platform Analyzer

Intel® Graphics Performance Analyzers (Intel® GPA) is a suite of tools designed for game development to help profile and analyze Microsoft DirectX* graphics applications. For more information on System Analyzer and Frame Analyzer, see Section 5.

Intel® GPA Platform Analyzer is a tool designed to visualize the execution profile of the tasks in a code base on the heterogeneous (CPU+GPU) PC platform over time. This tool collects trace data during the application run to provide detailed analysis of how code executes across all threads, and correlates the CPU work with work being done on the GPU. The tool automatically aligns clocks across all cores in the entire system so that CPU-based workloads can be analyzed together with GPU-based workloads on the timeline.

Note: Use Intel® GPA System Analyzer HUD to capture traces. For most Intel® Atom™ processor-based devices, Intel® GPA Platform Analyzer will need to be run on a separate machine and connect over a network. See Section 5 for more information.

Platform View requires a developer to instrument their code base; this involves marking up areas of the code with a simple API. Once a code base is properly instrumented, the tool will show performance over time, which includes multithreaded support and automatic Microsoft DirectX* driver markup.

For more information on GPA visit: http://software.intel.com/en-us/articles/intel-gpa/.

2.7 Intel® Atom™ Processor-based Platform Optimizations

2.7.1 Tune for Power

The Intel® Atom™ processor was designed to meet the performance requirements of modern workloads with minimal power consumption to facilitate small form-factor devices. It's important to be power-conscious when targeting Intel Atom processor-based platforms.

In general, avoid operations that frequently wake the hardware. For example, avoid spin waits and hardware polling. Activities that spin up a hard drive or media device (CD/DVD) can use a significant amount of power.

Target a fixed frame rate. It's better to let the hardware idle and conserve power instead of letting the frame rate be uncapped.

Tip: Reduce the number of cycles to complete a task and allow the hardware to sleep sooner.
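One way to target a fixed frame rate without spin waiting is to sleep until the next frame deadline. The following is a minimal sketch using only the C++ standard library; the `FrameLimiter` class is a hypothetical helper, it assumes `targetFps` is greater than zero, and the actual sleep granularity depends on the operating system. A title that presents with vertical sync enabled may not need it at all.

```cpp
#include <chrono>
#include <thread>

// Caps a loop at a fixed frame rate by sleeping until the next frame
// deadline, letting the CPU idle instead of spinning.
class FrameLimiter
{
public:
    using Clock = std::chrono::steady_clock;

    // targetFps must be greater than zero.
    explicit FrameLimiter(int targetFps)
        : frameTime_(std::chrono::duration_cast<Clock::duration>(
              std::chrono::nanoseconds(1000000000LL / targetFps))),
          next_(Clock::now() + frameTime_) {}

    // Call once at the end of each frame.
    void wait()
    {
        std::this_thread::sleep_until(next_);
        const Clock::time_point now = Clock::now();
        // If a frame ran more than one period late, restart the schedule
        // from now rather than trying to catch up with a burst of frames.
        next_ = ((now > next_ + frameTime_) ? now : next_) + frameTime_;
    }

private:
    Clock::duration   frameTime_;
    Clock::time_point next_;
};
```

In a game loop this would be constructed once, for example as `FrameLimiter limiter(30);`, with `limiter.wait();` called after each frame has been submitted.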


The Intel® Laptop Gaming TDK gives an application access to information about power source and battery life (see Section 2.7.2.1 for more details).

Tip: Save the game when the battery is about to die.

A developer can also use the Windows* Power Management functions. For example, listening for the WM_POWERBROADCAST event allows an application to detect system suspension, hibernation, closed lid, low battery, and more.

For developers targeting Linux* platforms, see http://lesswatts.org for more tips on optimizing for power.

2.7.2 Tools

2.7.2.1 Intel® Laptop Gaming Technology Development Kit (TDK)

The Intel® Laptop Gaming TDK provides an easy interface to add mobile-aware features to a game. Here are some examples:

Power source: GetPwrSrc() – returns information about power source (battery or A/C).

GetPercentBatteryLife() – returns the percentage of remaining battery life.

Get80211SignalStrength() – returns the network connectivity strength.

The TDK also includes functionality to build a Wi-Fi Ad-Hoc peer-to-peer network.

For more information on the Intel® Laptop Gaming TDK, visit: http://software.intel.com/en-us/articles/intel-laptop-gaming-technology-development-kit/
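As a rough sketch of how power information could drive game behavior, the code below turns a power source and a battery percentage into a simple action. It deliberately does not call the TDK, because this guide does not give the function signatures; the `PowerStatus` struct, the `PowerAction` enum, and the 5% and 20% thresholds are all illustrative assumptions. In a real title the two inputs would be filled from queries such as those listed above.

```cpp
// Hypothetical power snapshot; in a real title these two fields would be
// filled from a platform API such as the TDK queries listed above.
struct PowerStatus
{
    bool onBattery;       // true when running from battery, false on A/C
    int  batteryPercent;  // remaining charge, 0-100
};

enum class PowerAction { None, ReduceQuality, SaveNow };

// Thresholds are illustrative assumptions, not values from the guide.
PowerAction choosePowerAction(const PowerStatus& s)
{
    if (!s.onBattery)           return PowerAction::None;
    if (s.batteryPercent <= 5)  return PowerAction::SaveNow;
    if (s.batteryPercent <= 20) return PowerAction::ReduceQuality;
    return PowerAction::None;
}
```

Polling this once every few seconds is enough; checking it every frame would work against the advice on avoiding unnecessary wake-ups.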


3 Intel® Atom™ Processor Integrated Graphics

The latest generation of Intel® Atom™ processors contains an on-board low-power GPU, designed to provide a satisfying user experience when watching HD video and playing 3D games. The Intel Atom processor is powerful enough to play basic 3D games, with the on-board graphics processor providing a new level of 3D gaming support.

Figure 2. On-chip graphics architecture of the mobile chipset featuring Intel® Atom™ processor codenamed Pineview and platform controller hub codenamed Tigerpoint

3.1 Overview

Some versions of the Intel® Atom™ processors contain an on-board GPU. These range in power from the low-end Intel® Graphics Media Accelerator 500 (Intel® GMA 500) series to the current top-end Intel Atom graphics processor, the Intel® Graphics Media Accelerator 3150 (Intel® GMA 3150). The Intel GMA 500 and Intel GMA 600 are based upon the Imagination Technologies POWERVR* graphics processor, while the Intel GMA 950 and Intel GMA 3150 are based upon the Intel® 945G Express chipset. This variation in graphics power can complicate programming graphics on an Intel Atom processor with integrated graphics, so it's important to identify the particular GPU your application is running on and to program to the strengths of each GPU.


The following table lists the Intel Atom processors that have an integrated graphics processor and the GPUs found on those chipsets.

Tip: Validate on a system with Intel® GMA 500/600 graphics and a system with Intel® GMA 950/3150 graphics. The performance characteristics are different enough to warrant separate validation on these two classes of graphics hardware.

Since the Intel GMA 500/600 and Intel GMA 950/3150 are built on different core technologies, it's important to remember to properly check the Microsoft DirectX* Caps.

3.2 Understanding the Intel® Atom™ Processor 3D Graphics Systems

It is best to think of Intel® Atom™ processor-based graphics solutions in two categories: Intel® GMA 950/3150 and Intel® GMA 500/600. Unless otherwise stated, all advice in this guide applies to both the Intel GMA 950/3150 and Intel GMA 500/600. The Intel GMA 950/3150 and Intel GMA 500/600 are based on different core technologies, but are both integrated solutions backed by Intel Atom processor-based Intel chipsets, so they share similar characteristics.

Intel GMA 950/3150 has been designed with a deep pipelined architecture, where performance is maximized by allowing each stage of the pipeline to simultaneously operate on different primitives or portions of the same primitive. The main blocks of the pipeline are the Setup Engine, Rasterizer, Texture Pipeline, and Raster Pipeline. A typical programming sequence would be to send instructions to set the state of the pipeline followed by rendering instructions containing 3D primitive vertex data.

Graphics Solution | Intel® Atom™ Processor Series | Microsoft DirectX* Support | OpenGL* Support (Microsoft Windows*) | Vertex Processing

Intel® GMA 500 (Section 3.4) | Z5xx | DirectX* 9.0c (Shader Model 2) | OpenGL* 1.1 | Hardware

Intel GMA 600 (Section 3.4) | Z6xx | DirectX* 9.0c (Shader Model 2) | OpenGL* 1.1 | Hardware

Intel GMA 950 (Section 3.3) | N2xx | DirectX* 9.0c (Shader Model 2) | OpenGL* 1.4 | Software

Intel GMA 3150 (Section 3.3) | D4xx/D5xx, N4xx/N5xx | DirectX* 9.0c (Shader Model 2) | OpenGL* 1.4 | Software


Intel GMA 500/600 cores are a tile-based 3D rendering architecture. They feature a 3D graphics engine as well as a 2D graphics engine. Since it is a tile-based architecture, the 3D engine will render and process small sections of a screen (called 'tiles') to the frame buffer, rather than filling a frame buffer with an entire scene. Sending smaller sections of a scene to the engine permits more consistent utilization of the graphics hardware and allows for a small internal frame buffer (similar to a cache) which is flushed to an external frame buffer. Traditionally, larger frame buffers have been used, which increases power consumption. The graphics core does internal Z processing which permits better organization of the write operations and eliminates the need for a physical Z buffer, also saving power.

Tip: Target a fixed frame rate. It's better to let the hardware idle and conserve power instead of letting the frame rate be uncapped.

3.3 Intel® Graphics Media Accelerator 950/3150

The Intel® Graphics Media Accelerator 950 (Intel® GMA 950) is an integrated (on-board) graphics chip on the Mobile Intel® 945G Express chipset for Intel processors. It is a faster clocked version of the Intel GMA 900.

The Intel GMA 3150 is a very low power integrated (shared memory) graphics part that is located on the processor package (on die with the Intel® Atom™ processor). It features two processor cores clocked at 200 MHz.

Intel GMA 950/3150 are based on a deep pipelined architecture:

Intel GMA 950/3150 do not support hardware vertex processing. They support Microsoft DirectX* 9.0c with Shader Model 2.0 (with software Vertex Shader) and OpenGL* 1.4. In addition to 3D acceleration, Intel GMA 950/3150 have extensive hardware to accelerate 2D video.


3.4 Intel® Graphics Media Accelerator 500/600

The Intel® Graphics Media Accelerator 500/600 (Intel® GMA 500/600) are graphics solutions for embedded products (e.g. MIDs), netbooks, and other small mobile devices. They are based on Imagination Technologies POWERVR SGX* cores (e.g., POWERVR SGX535*).

Intel GMA 500 is clocked at 100 (UL11L) or 200 MHz (US15L, US15W chipset). Intel GMA 600 has a max clock speed of 400 MHz.

Intel GMA 500/600 cores are a tile-based 3D rendering architecture:

Intel GMA 500/600 do support hardware vertex processing. They support Microsoft DirectX* 9.0c with Shader Model 2.0 and OpenGL* 1.1. In addition to 3D acceleration, Intel GMA 500/600 both have extensive hardware to accelerate 2D video.

Note: For more information about POWERVR*, check out Imagination Technologies POWERVR Insider* SDK at: http://www.imgtec.com/powervr/insider/powervr-sdk.asp.

3.5 Graphics API Support

This is the current level of support found in the drivers for Intel® Atom™ processor GPUs. In addition, there are variations in the particular level of support as new features are added in the drivers. You should check the latest drivers to see if there have been any updates to the level of support.


GPU | Microsoft DirectX* | Vertex SM | Pixel SM | OpenGL* (Microsoft Windows*) | OpenGL* (Linux*)

Intel® GMA 500/600 | 9.0c | 3.0 | 3.0 | 1.1 | 2.0

Intel GMA 950/3150 | 9.0c | 3.0 (SW) | 2.0 | 1.5 | 2.0

Tip: In general, Intel® GMA 500/600 have a high level of Microsoft DirectX* Capabilities (DX Caps). For applications targeting Intel® GMA 950/3150 and Intel GMA 500/600, use Intel GMA 950/3150 capabilities as the target functionality. See Sections 4.1.1, 4.2, and 4.3 for more details.

3.6 Detecting GPUs

A short sample that demonstrates a way to detect the primary graphics device present in a system is available on Intel's Visual Computing Developer Community: http://software.intel.com/en-us/articles/gpu-detect-sample. The source code determines the primary graphics device based on the Vendor ID and Device ID.

Tip: Use proper GPU detection code to automatically set default feature levels.

The source code mentioned above can easily be extended for non-Intel hardware.

Since Intel® GMA 950/3150 and Intel® GMA 500/600 are built on different core technologies, it is important to identify which class of graphics hardware is present.
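The classification step described above can be sketched as a small lookup keyed on Vendor ID and Device ID. This is not the code from Intel's sample; `AtomGpuClass`, `DeviceEntry`, and `classifyGpu` are hypothetical names. The only hardware value the sketch hard-codes is Intel's PCI vendor ID, 0x8086. The device ID table is left to the caller on purpose, since this guide does not list device IDs and they should be taken from Intel's sample or documentation. With Microsoft DirectX* 9, the two IDs are typically read from the `VendorId` and `DeviceId` fields of `D3DADAPTER_IDENTIFIER9`.

```cpp
#include <cstdint>

enum class AtomGpuClass { Unknown, Gma500_600, Gma950_3150 };

// One row of a lookup table mapping a PCI device ID to a graphics class.
struct DeviceEntry
{
    std::uint16_t deviceId;
    AtomGpuClass  gpuClass;
};

const std::uint16_t kIntelVendorId = 0x8086;  // Intel's PCI vendor ID

// Classifies an adapter from its PCI vendor and device IDs using a
// caller-supplied table. Non-Intel vendors and unlisted devices are Unknown.
AtomGpuClass classifyGpu(std::uint16_t vendorId, std::uint16_t deviceId,
                         const DeviceEntry* table, int tableSize)
{
    if (vendorId != kIntelVendorId) return AtomGpuClass::Unknown;
    for (int i = 0; i < tableSize; ++i)
    {
        if (table[i].deviceId == deviceId) return table[i].gpuClass;
    }
    return AtomGpuClass::Unknown;
}
```

Treating `Unknown` as the most conservative feature level gives a safe default on hardware the table does not cover.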


4 Quick Tips: Graphics Performance Tuning

4.1 Primitive Processing

Intel® GMA 500/600 supports both Hardware Vertex Processing (HWVP) and Software Vertex Processing (SWVP). However, Intel GMA 950/3150 only supports Software Vertex Processing.

For some workloads on Intel GMA 500/600, CPU vertex processing may offer even greater performance enhancements. For this reason, it is recommended to use D3DCREATE_PUREDEVICE during device creation. This allows software processing to be enabled based on performance that is determined by the specific configuration, workload, and Intel integrated graphics capability. However, Intel GMA 950/3150 does not support D3DCREATE_PUREDEVICE.

Tip: See Section 4.6 for details on how to create a Microsoft DirectX* 9 Device that will properly use HWVP and SWVP as needed.

4.1.1 Vertex Capabilities

Capability | Intel® GMA 950/3150 | Intel® GMA 500/600

Max Primitive Count | 64K | 1.3 million (64K DX9 limit)

Max Vertex Index | 64K | 16.7 million

Vertex Processing | Software | Hardware

4.1.2 Tips On Vertex/Primitive Processing

1. Use IDirect3DDevice9::DrawIndexedPrimitive (DirectX* 9)

a. The vertex cache size will increase over time and can be discovered using D3DQUERYTYPE_VCACHE.


2. Ensure adequate batching of primitives to amortize runtime and driver overhead.

a. Maximize batch sizes; in general, bigger is better.

b. Minimize render state changes between batches to reduce the number of pipeline flushes.

c. Use instancing to enable better vertex throughput, especially for small batch sizes. This also minimizes state changes and Draw calls.

3. Use static vertex buffers as much as possible.

4. Do as much CPU-side clipping as possible. Use visibility tests to reject objects that fall outside the view frustum to reduce the impact of objects that are not visible.

a. Set D3DRS_CLIPPING to FALSE for objects that do not need clipping.

5. For Intel GMA 500/600, it is more important to sort by state than to sort by distance from the camera.
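As an illustration of tips 2b and 5, the following sketch orders a draw list by state first and uses camera distance only as a tie-breaker, then counts how many state rebinds a given submission order would need. The DrawItem structure, its fields, and both function names are hypothetical and not part of DirectX* or any Intel API; a real renderer would key on whatever state is most expensive to change for its own content.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-draw record; the field names are illustrative only.
struct DrawItem {
    std::uint32_t shaderId;   // shader pair used by this draw
    std::uint32_t textureId;  // texture set used by this draw
    float         viewDepth;  // distance from the camera
};

// Order draws so that items sharing a shader, then a texture, are adjacent.
// Depth is only a tie-breaker, i.e. state is sorted before distance.
inline void SortByState(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  if (a.shaderId != b.shaderId)   return a.shaderId < b.shaderId;
                  if (a.textureId != b.textureId) return a.textureId < b.textureId;
                  return a.viewDepth < b.viewDepth;
              });
}

// Count how many times the shader or texture would have to be rebound
// if the list were submitted in its current order.
inline std::size_t CountStateChanges(const std::vector<DrawItem>& items) {
    std::size_t changes = 0;
    for (std::size_t i = 1; i < items.size(); ++i) {
        if (items[i].shaderId != items[i - 1].shaderId ||
            items[i].textureId != items[i - 1].textureId) {
            ++changes;
        }
    }
    return changes;
}
```

Comparing CountStateChanges before and after SortByState on a captured draw list gives a quick, if rough, indication of how much state churn the sort removes; actual frame-time gains still need to be measured on the target device.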

4.2 Shader Capabilities

                                           Intel® GMA 950/3150    Intel® GMA 500/600
Vertex Shader Model                        2.0 (Software)         3.0
Pixel Shader Format                        2.0                    3.0
Dynamic Flow Control                       No                     Yes
Predication                                No                     Yes
Number Instruction Slots (Pixel Shader)    96                     512
Number of Temporary Registers              12                     32

4.2.1 Tips on Shader Capabilities

1. Use programmable shaders over fixed functions as much as possible. For example, use shader-based fog instead of fixed function fog.

2. Do not use dynamic flow control or predication. Intel® GMA 950/3150 does not support these features, and they can quickly become a bottleneck on Intel GMA 500/600. Static flow control, such as execution depending on uniform variables, is supported, but make sure to validate performance.

3. Favor per-vertex calculations over per-pixel calculations. For example, use per-vertex lighting instead of per-pixel lighting.

4. Keep pixel shaders as short and simple as possible.

5. Balance texture samples and shader complexity.

a. Texture samples are executed in parallel with shader execution. For best results, have a high ratio of ALU instructions (math operations) per texture sample.

b. Although large shaders can be supported via the cache structure, be aware that only a limited number of registers are available, and running out of them can reduce the efficiency of the execution units.

6. Space texture sampling calls away from where their results are used in pixel shaders when possible and practical.

7. Optimize your shader performance by making good use of your integrated graphics:

a. Reduce the use of macro/transcendental functions where possible. Instructions like LOG, LIT, ARL, POW, EXP, INV, RSQ, SQRT, SIN, COS, SINCOS, etc. are more expensive, particularly for full screen effects.

b. In general, use full precision for non-transcendental instructions.

8. The following common shader effects typically affect performance and should be tested for performance and optimization. Pay special attention to full screen post-processing effects, including per-pixel and multiple-pass techniques, when evaluating graphics-related performance bottlenecks.

a. Glow/Bloom

b. Motion Blur

c. Depth of Field

d. HDR/Tone Mapping

e. Heat Distortion

f. Atmospheric Effects

g. Dynamic Ambient Occlusion
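To see why full screen effects deserve this attention, it can help to do the arithmetic. The sketch below multiplies resolution, pixel shader length, and pass count to get a rough instruction count per frame and per second. The function names are hypothetical, and the example figures in the note that follows (a 1024x600 display, 20 instructions per pixel, two passes, 30 frames per second) are assumptions chosen for illustration, not measurements or limits of any Intel GMA part. Texture fetch latency, overdraw, and hardware scheduling are ignored.

```cpp
#include <cassert>
#include <cstdint>

// Rough count of pixel shader instructions issued per frame by full screen
// passes. Every input is an assumption supplied by the caller.
inline std::uint64_t FullScreenInstructionsPerFrame(std::uint32_t width,
                                                    std::uint32_t height,
                                                    std::uint32_t instructionsPerPixel,
                                                    std::uint32_t passes) {
    assert(width > 0 && height > 0);
    return static_cast<std::uint64_t>(width) * height *
           instructionsPerPixel * passes;
}

// The same estimate scaled to one second at a target frame rate.
inline std::uint64_t FullScreenInstructionsPerSecond(std::uint32_t width,
                                                     std::uint32_t height,
                                                     std::uint32_t instructionsPerPixel,
                                                     std::uint32_t passes,
                                                     std::uint32_t framesPerSecond) {
    return FullScreenInstructionsPerFrame(width, height,
                                          instructionsPerPixel, passes) *
           framesPerSecond;
}
```

With the assumed figures, FullScreenInstructionsPerFrame(1024, 600, 20, 2) yields 24,576,000 instructions per frame, or 737,280,000 per second at 30 frames per second. Halving either the shader length or the number of passes halves that figure, which is why tips 4 and 8 tend to pay off first on full screen effects.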

4.3 Texture Sample and Pixel Operations

Gfx Arch                            Intel® GMA 950/3150                             Intel® GMA 500/600
Format Support                      16/32-bit fixed point                           16/32-bit fixed point, 16/32-bit floating point operations
Max # of Samples                    Up to 8                                         Up to 8
Vertex Textures                     No (needs Shader Model 3)                       No (needs Shader Model 3)
Max 2D/3D/Cube Textures Dimension   2K/256/512                                      4K/4K/512
Filtering Type Support              Bilinear, Trilinear, and Anisotropic (max 4)    Bilinear, Trilinear, and Anisotropic (max 16)
Texture Compression                 DX9: DXT1/3/5                                   DX9: DXT1/3/5
Non Power of 2 Textures             Yes                                             Yes
Render to Texture                   Yes                                             Yes
Multi-Sample Render (MSAA)          No                                              No
Multi-Target Render                 No                                              Max=4
Max Texture Dimension               2048                                            4096

4.3.1 Tips on Texture Sampling / Pixel Operations

1. Use compressed textures and mipmaps.

2. Minimize the use of large textures even though the architecture supports up to 2K×2K (4K×4K on Intel GMA 500/600). For optimal performance, use texture sizes that are 256x256 or less.

3. Minimize the use of Trilinear and Anisotropic Filtering.

a. Choose the type of filtering based on its usage in a scene rather than using it everywhere.

4. Do not use floating point textures. Intel® GMA 950/3150 does not support them, and floating point textures can quickly become a bottleneck on Intel GMA 500/600.

5. Minimize the number of Clear calls.

a. Clear the color and Z/stencil buffers at the same time when required.

6. Minimize lock/blit of the Z and/or stencil buffer to minimize bandwidth impact.

7. On Intel GMA 950/3150, utilize shadow maps instead of stencil shadows, as stencil shadows are fill-intensive.

8. Multi-texture rendering is better than multi-pass rendering since multi-texture rendering reduces state changes, driver overhead, and CPU load. In addition, Intel integrated graphics utilizes main system memory for graphics. The intermediate pixels computed in multi-pass rendering need to be transported back to main memory and then back to the graphics subsystem when needed again, causing a full round trip over the bus per render target for each pass.

Tip: For Intel® GMA 950/3150, if the “Texture 2x2” state override in the Intel® GPA System Analyzer HUD shows a significant performance increase, the texture samplers are likely a bottleneck. See Section 5 for more details.
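The benefit of tips 1 and 2 can be estimated before any profiling by computing the memory footprint of a texture and its mip chain. The sketch below relies on the standard DXT block sizes (DXT1 stores each 4x4 texel block in 8 bytes, DXT3 and DXT5 in 16 bytes) and compares them with an uncompressed 32-bit format. The function names, and the convention that a bytesPerBlock of 0 selects the uncompressed format, are hypothetical choices made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Bytes for one mip level of an uncompressed 32-bit texture (4 bytes/texel).
inline std::uint64_t LevelBytesRgba8(std::uint32_t w, std::uint32_t h) {
    return static_cast<std::uint64_t>(w) * h * 4;
}

// Bytes for one mip level of a DXT-compressed texture. Dimensions are
// rounded up to whole 4x4 blocks, so tiny levels still cost one block.
inline std::uint64_t LevelBytesDxt(std::uint32_t w, std::uint32_t h,
                                   std::uint32_t bytesPerBlock) {
    const std::uint64_t blocksWide = (w + 3) / 4;
    const std::uint64_t blocksHigh = (h + 3) / 4;
    return blocksWide * blocksHigh * bytesPerBlock;
}

// Total bytes for the full mip chain down to 1x1.
// bytesPerBlock: 8 for DXT1, 16 for DXT3/DXT5, 0 for uncompressed 32-bit.
inline std::uint64_t MipChainBytes(std::uint32_t w, std::uint32_t h,
                                   std::uint32_t bytesPerBlock) {
    assert(w > 0 && h > 0);
    std::uint64_t total = 0;
    for (;;) {
        total += bytesPerBlock ? LevelBytesDxt(w, h, bytesPerBlock)
                               : LevelBytesRgba8(w, h);
        if (w == 1 && h == 1) break;
        w = std::max<std::uint32_t>(1, w / 2);
        h = std::max<std::uint32_t>(1, h / 2);
    }
    return total;
}
```

For a 256x256 texture with a full mip chain, this gives 349,524 bytes uncompressed, 87,408 bytes as DXT5, and 43,704 bytes as DXT1. Because integrated graphics samples textures from main system memory, a smaller footprint translates fairly directly into less bandwidth pressure, although the actual gain depends on access patterns and caching.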

4.4 Managing Constants on Microsoft DirectX*

Constants are external variables passed as parameters to the shaders; their values remain “constant” during each invocation of the shader program. Despite their name, constants are one of the most frequently changing values in a Microsoft DirectX* application. A shader program can initialize a constant variable statically to a value in the shader file or at runtime through the application.

Most of the recommendations described here are not completely new and may have been described elsewhere. However, they are still very much applicable to Intel integrated graphics, and this section attempts to present them in a cohesive manner. In addition to these points, it is worth noting that:

1. The driver optimizes access to the most frequently used constants. Use fewer than 32 constants to achieve the highest performance gain from this feature. Limit the use of dynamically indexed constants (C[ax], C[r]), as these cannot be optimized by the driver and cause high latency in shaders. These constants are normally found in vertex shaders.

2. Higher performance is obtained with local constants over global constants.

3. Immediate constants provide better performance than dynamically indexed constants. With dynamically indexed constants, the driver cannot determine the index value a priori and needs to create a full-size constant buffer space in memory instead of using the hardware constant buffer.

4. To take advantage of the optimization, limit the use of global constants and of dynamically indexed constants (C[ax]), as these skip the Intel integrated graphics optimization algorithm within the Intel driver.

4.5 Graphics Memory

Integrated graphics will continue to use the Unified Memory Architecture (UMA) and Dynamic Video Memory Technology (DVMT). As with past integrated graphics solutions, UMA specifies that memory resources can be used for video memory when needed. DVMT is an enhancement of the UMA concept, wherein the optimum amount of memory is allocated for balanced graphics and system performance.

DVMT ensures the most efficient use of available memory - regardless of frame buffer or main memory size - for balanced 2D/3D graphics performance and system performance. DVMT dynamically responds to system requirements and application demands by allocating the proper amount of display, texturing, and buffer memory after the operating system has booted. For example, a 3D application when launched may require more vertex buffer memory to enhance the complexity of objects or more texture memory to enhance the richness of the 3D environment. The operating system views the Intel graphics driver as an application, which uses Direct AGP, a high-speed mechanism for the graphics controller to communicate directly with system memory, to request allocation of additional memory for 3D applications, and returns the memory to the operating system when it is no longer required.

4.5.1 Resource Management

Allocate surfaces in priority order. The render surfaces that will be used most frequently should be allocated first.

The 3D engines' performance is dependent on the memory bandwidth available. Systems that have more bandwidth available will outperform systems with less bandwidth. The engines' performance is also dependent on the core clock frequency. The higher the frequency, the more data is processed.

Tip: On Microsoft DirectX* 9, use D3DPOOL_DEFAULT for lockable memory (dynamic vertex/index buffers).

Tip: On Microsoft DirectX* 9, use D3DPOOL_MANAGED for non-lockable memory (textures, back buffers, etc.).

Tip: Proper texture compression can greatly improve the utilization of memory bandwidth. The texture sampler has dedicated hardware for decompressing known texture formats (DXT1, DXT2, DXT3, etc.).

4.5.2 Checking for Available Memory

The operating system will manage memory for an application that uses Microsoft DirectX*. Video memory on Intel integrated graphics uses dynamically allocated DVMT (Dynamic Video Memory Technology). This means that the graphics memory will be dynamically allocated from main memory as needed.

Developers should consider DVMT memory as “local memory” in addition to any “dedicated” memory. Memory checks that only supply the available amount of “dedicated” graphics memory do not supply an appropriate number for the integrated graphics. In many software queries for integrated graphics, “Non-Local Video Memory” will show as ZERO (0). That number should not be used to determine “AGP” or “PCI Express” compatibility.

As a result of the dynamic allocation of graphics memory performed by the integrated graphics (based on application requests), you need to ensure that you understand all of the memory that is truly available to the graphics device.

The Microsoft DirectX* SDK (June 2010) includes the VideoMemory sample code which demonstrates 5 commonly used methods to detect the total amount of video memory. Of these tests only GetVideoMemoryViaD3D9 and GetVideoMemoryViaDXGI work properly on Intel GMA 950/3150 and Intel GMA 500/600. All other methods return only the local/dedicated graphics memory and consequently are incorrect for integrated graphics. For more information, see the sample code: http://msdn.microsoft.com/en-us/library/ee419018(v=VS.85).aspx.

4.6 Creating a Microsoft DirectX* 9 Device for Intel® Atom™ Processor Graphics

The following code shows how to correctly initialize and detect Microsoft DirectX* 9 Software Vertex Processing (SWVP). This sample also shows how to switch to software vertex processing for the devices that support it, and conversely, hardware vertex processing for the devices that support that.

Tip: To determine the available graphics memory, use the GetVideoMemoryViaD3D9 method in the Microsoft DirectX* SDK VideoMemory sample code. GetVideoMemoryViaDXGI also works, but does not have support for Microsoft Windows* XP.

    // pD3D, Adapter, DeviceType, hFocusWindow and pPresentationParameters
    // are supplied by the application.
    HRESULT hr;
    DWORD BehaviorFlags = 0;
    IDirect3DDevice9* pDevice = NULL;
    D3DCAPS9 Caps9;
    UINT nMinRequiredVertexShaderLevel = yourMinimumVSLevel; // e.g. D3DVS_VERSION(3,0)
    UINT nMinRequiredPixelShaderLevel = yourMinimumPSLevel;  // e.g. D3DPS_VERSION(2,0)

    // Clear any vertex processing flags
    BehaviorFlags &= ~(D3DCREATE_HARDWARE_VERTEXPROCESSING |
                       D3DCREATE_MIXED_VERTEXPROCESSING |
                       D3DCREATE_SOFTWARE_VERTEXPROCESSING);

    // We'll try to get a 'PURE' hardware device first
    BehaviorFlags |= D3DCREATE_PUREDEVICE;
    hr = pD3D->CreateDevice(Adapter,
                            DeviceType,
                            hFocusWindow,
                            BehaviorFlags | D3DCREATE_HARDWARE_VERTEXPROCESSING,
                            pPresentationParameters,
                            &pDevice);
    if(D3D_OK == hr)
    {
        // NOTE: We're using pDevice->GetDeviceCaps and not pD3D->GetDeviceCaps
        hr = pDevice->GetDeviceCaps(&Caps9);
    }
    if( (D3D_OK != hr)
        || (Caps9.VertexShaderVersion < nMinRequiredVertexShaderLevel)
        || (Caps9.PixelShaderVersion < nMinRequiredPixelShaderLevel) )
    {
        // We didn't get a usable 'PURE' hardware device. Release it if one
        // was created, then clear the flag and try mixed vertex processing.
        if(pDevice) { pDevice->Release(); pDevice = NULL; }
        BehaviorFlags &= ~D3DCREATE_PUREDEVICE;
        hr = pD3D->CreateDevice(Adapter,
                                DeviceType,
                                hFocusWindow,
                                BehaviorFlags | D3DCREATE_MIXED_VERTEXPROCESSING,
                                pPresentationParameters,
                                &pDevice);
        if(D3D_OK == hr)
        {
            hr = pDevice->GetDeviceCaps(&Caps9);
        }
        if( (D3D_OK != hr)
            || (Caps9.VertexShaderVersion < nMinRequiredVertexShaderLevel)
            || (Caps9.PixelShaderVersion < nMinRequiredPixelShaderLevel) )
        {
            // Mixed vertex processing is not usable either. Release the
            // device if one was created and fall back to software.
            if(pDevice) { pDevice->Release(); pDevice = NULL; }
            hr = pD3D->CreateDevice(Adapter,
                                    DeviceType,
                                    hFocusWindow,
                                    BehaviorFlags | D3DCREATE_SOFTWARE_VERTEXPROCESSING,
                                    pPresentationParameters,
                                    &pDevice);
            if(D3D_OK == hr)
            {
                pDevice->GetDeviceCaps(&Caps9);
                if(Caps9.PixelShaderVersion < nMinRequiredPixelShaderLevel)
                {
                    // Minimum specs for this application are
                    // higher than this system can handle.
                    // Exit this application gracefully...
                    pDevice->Release();
                    pDevice = NULL;
                    hr = E_FAIL;
                }
            }
        }
    }

5 Performance Analysis with Intel® Graphics Performance Analyzers

The Intel® Graphics Performance Analyzers (Intel® GPA) were created with the goal of making a great Microsoft DirectX* tool that would provide all the information needed to analyze frame captures and improve graphics performance on Intel graphics hardware.

There are four major components to Intel GPA:

- Intel® GPA Monitor – See Section 5.1

- Intel® GPA System Analyzer HUD (Heads-Up Display) - See Section 5.2

- Intel® GPA Frame Analyzer – See Section 5.3

- Intel® GPA Platform Analyzer – See Section 2.6.3

Intel GPA will work on most Microsoft DirectX* graphics parts including Intel GMA 950/3150. However, at this time Intel GPA does not support tile-based rendering like Intel GMA 500/600. For Intel GMA 500/600, there are tools provided by Imagination Technologies* for POWERVR*-based graphics such as PVRTrace* and PVRTune*.

For more information on Intel GPA, visit: http://software.intel.com/en-us/articles/intel-gpa/.

5.1 Intel® GPA Monitor

Intel® GPA Monitor connects Intel GPA to an application (locally or on a remote computer), and enables the configuration of the Intel GPA System Analyzer HUD mode and hot keys.

On most Intel® Atom™ processor-based devices, the analysis tools must be run on a separate machine. The Intel GPA Monitor can be configured to connect to any Microsoft DirectX* application or launch a specific application.

Note: Intel® GPA does not support Intel® GMA 500/600. For more information about POWERVR* and PVRTune*, check out the Imagination Technologies POWERVR Insider* SDK at: http://www.imgtec.com/powervr/insider/powervr-sdk.asp.

Tip: For remote analysis, start Intel® GPA Monitor on both the target machine (e.g. Intel® Atom™ processor-based device) and the host machine. Start the analysis tool (Intel GPA Frame Analyzer or Intel GPA Platform Analyzer) on the host machine and enter the target machine's IP address.

Review the Quick Start Guide included with Intel GPA for more information on using the Intel GPA Monitor.

5.2 Intel® GPA System Analyzer HUD

Intel® GPA System Analyzer HUD (Heads-Up Display) displays application performance metrics in real time, overlaid on Microsoft DirectX* applications. This tool provides high-level performance profiling of graphics applications, in order to determine whether the application is CPU-bound or GPU-bound. If the application is GPU-bound, there is a hotkey to capture a GPU frame for detailed analysis by the Intel® GPA Frame Analyzer. If the application is CPU-bound, there is a hotkey to capture a trace file for detailed analysis by the Intel® GPA Platform Analyzer. Press Ctrl-F1 in the HUD to see the hotkey list.

Note: Use Intel® GPA System Analyzer HUD to capture frames. For most Intel® Atom™ processor-based devices, Intel GPA Frame Analyzer will need to be run on a separate machine and connect over a network.

5.3 Intel® GPA Frame Analyzer

Intel® GPA Frame Analyzer provides a detailed view of a captured frame file, which contains all Microsoft DirectX* context used to render the selected 3D frame, as well as per-draw call/region GPU metrics. This tool provides performance info from applications at the frame level, render target level, and draw call level. It enables detailed analysis and “what if” optimization experiments without the need to recompile or rebuild an application.

Note: Use Intel® GPA System Analyzer HUD to capture frames. For most Intel® Atom™ processor-based devices, Intel GPA Frame Analyzer will need to be run on a separate machine and connect over a network.

5.4 Diagnosing Performance Bottlenecks

At a very high level, the graphics stack includes a rendering system that takes polygons, textures, and commands as input to display the resulting picture on an output device.

The graphics stack consists of the CPU, main memory, and the bus which delivers the visual payload of data to the Intel integrated graphics chipset. Several scenarios involving these components can affect overall performance. Considering that each of these computational systems resides along a highway where data is flowing, the following could occur:

- If any of these channels are underutilized, the system may be underperforming in terms of overall capacity to do more work.

- If any of these channels are overutilized, the system may be underperforming in terms of capacity to keep the data moving fast enough.

For optimal performance, the application should maximize the performance of the graphics subsystem and operate the other channels optimally to keep the graphics subsystem continuously productive with minimal starving or blocking situations.

Tip: For Intel® GMA 950/3150, if the “Disable Draw Calls” override in the Intel® GPA System Analyzer HUD does not show a significant performance increase, the CPU is likely a bottleneck. This could be the application, graphics driver, or both.

If the application is CPU-bound, there is a hotkey to capture a trace file for detailed analysis by the Intel® GPA Platform Analyzer. See Section 2.6.3 for more information.

Tip: If decreasing the screen resolution doesn't increase the frame rate, it's likely that the application is CPU-bound, vertex processing bound, or limited by fixed function processing (e.g. clipping).

If the application is GPU-bound, there is a hotkey to capture a GPU frame for detailed analysis by the Intel® GPA Frame Analyzer. See Section 5.3 for more information.

There are several overrides available to investigate possible GPU limitations. Here are just a few suggestions:

Override               Significant Frame Rate Increase                                 No Change
Disable Draw Calls     GPU-bound                                                       CPU-bound (application or driver)
Texture 2x2            Texture sampler or memory bandwidth bound                       --
Simple Pixel Shader    Probably pixel shader bound (possibly from texture sampling)    If GPU-bound, investigate vertex processing or other fixed function processing (e.g. clipping)

For more information about overrides in Intel GPA System Analyzer HUD, see the documentation included with Intel GPA.
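The override table can be read as a small decision procedure. The sketch below encodes one possible reading of it: check Disable Draw Calls first to separate CPU-bound from GPU-bound, then Texture 2x2, then Simple Pixel Shader. The type and function names are hypothetical, the order of the checks is an assumption, and the guide does not define what counts as a "significant" increase, so the 10% default threshold is also an assumption that should be tuned per application.

```cpp
#include <cassert>

// Frame rates measured without and with a given Intel GPA override enabled.
struct OverrideResult {
    double baselineFps;
    double overrideFps;
};

// "Significant" is not defined by the guide; the 10% default is an assumption.
inline bool IsSignificantIncrease(const OverrideResult& r,
                                  double threshold = 0.10) {
    assert(r.baselineFps > 0.0);
    return (r.overrideFps - r.baselineFps) / r.baselineFps > threshold;
}

enum class Bottleneck {
    Cpu,                    // application or driver
    TextureOrBandwidth,     // texture sampler or memory bandwidth
    PixelShader,            // possibly from texture sampling
    VertexOrFixedFunction   // vertex processing, clipping, etc.
};

// One possible reading of the override table, checked top to bottom.
inline Bottleneck Diagnose(const OverrideResult& disableDrawCalls,
                           const OverrideResult& texture2x2,
                           const OverrideResult& simplePixelShader) {
    if (!IsSignificantIncrease(disableDrawCalls)) return Bottleneck::Cpu;
    if (IsSignificantIncrease(texture2x2))        return Bottleneck::TextureOrBandwidth;
    if (IsSignificantIncrease(simplePixelShader)) return Bottleneck::PixelShader;
    return Bottleneck::VertexOrFixedFunction;
}
```

For example, if Disable Draw Calls raises the frame rate from 30 to 90 but neither Texture 2x2 nor Simple Pixel Shader moves it noticeably, this reading points at vertex processing or other fixed function work. Treat the result as a starting hypothesis for Intel GPA Frame Analyzer experiments rather than a conclusion.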

6 Support

Intel® 64 and IA-32 Architectures Software Developer's Manuals:

http://www.intel.com/products/processor/manuals/

Intel's integrated graphics chipset development community forum:

http://software.intel.com/en-us/forums/developing-software-for-visual-computing/

Game programming resources:

http://software.intel.com/en-us/visual-computing/

Intel® Software Network:

http://software.intel.com/en-us/

Intel® Software Partner Program:

http://www.intel.com/software/partner/visualcomputing/

Intel® Visual Adrenaline graphics and gaming campaign:

http://www.intel.com/software/visualadrenaline/

Intel® Graphics Performance Analyzers (Intel® GPA):

http://software.intel.com/en-us/articles/intel-gpa/

Intel® Composer XE:

http://software.intel.com/en-us/articles/intel-composer-xe/

Intel® VTune™ Amplifier XE:

http://software.intel.com/en-us/articles/intel-vtune-amplifier-xe/

8 Optimization Notice

Intel® compilers, associated libraries and associated development tools may include or utilize options that optimize for instruction sets that are available in both Intel® and non-Intel microprocessors (for example SIMD instruction sets), but do not optimize equally for non-Intel microprocessors. In addition, certain compiler options for Intel compilers, including some that are not specific to Intel micro-architecture, are reserved for Intel microprocessors. For a detailed description of Intel compiler options, including the instruction sets and specific microprocessors they implicate, please refer to the “Intel® Compiler User and Reference Guides” under “Compiler Options.” Many library routines that are part of Intel® compiler products are more highly optimized for Intel microprocessors than for other microprocessors. While the compilers and libraries in Intel® compiler products offer optimizations for both Intel and Intel-compatible microprocessors, depending on the options you select, your code and other factors, you likely will get extra performance on Intel microprocessors.

Intel® compilers, associated libraries and associated development tools may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel® Streaming SIMD Extensions 2 (Intel® SSE2), Intel® Streaming SIMD Extensions 3 (Intel® SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel® SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors.

While Intel believes our compilers and libraries are excellent choices to assist in obtaining the best performance on Intel® and non-Intel microprocessors, Intel recommends that you evaluate other compilers and libraries to determine which best meet your requirements. We hope to win your business by striving to offer the best performance of any compiler or library; please let us know if you find we do not.

(Notice revision #20101101)