mobile graphics (part1)

www.crs4.it/vic/ Visual Computing Group, Mobile Graphics, Marco Agus and Marcos Balsa, June 2015


Page 1: Mobile Graphics (part1)

www.crs4.it/vic/ Visual Computing Group

Mobile Graphics

• Marco Agus
• Marcos Balsa

June 2015

Page 2: Mobile Graphics (part1)

Outline

• Part 1: Evolution of mobile graphics
• Part 2: Graphics development for mobile systems
• Part 3: Mobile graphics trends and real-time visualization of massive models on mobile systems

Page 3: Mobile Graphics (part1)

Part 1

Evolution of mobile graphics

Page 4: Mobile Graphics (part1)

Mobile evolution… in movies

• Motorola DynaTAC
• Nickname: “brick phone”
• Weight: over 2 pounds
• Cost: thousands of dollars
• Battery life: around 35 minutes

“Money never sleeps… This is your wake-up call, pal… Go to work.”

Wall Street, 1987: Michael Douglas as Gordon Gekko

Page 5: Mobile Graphics (part1)

Mobile evolution… in movies

“Hello Neo… Do you know who this is?”

The Matrix, 1999: Laurence Fishburne as Morpheus

• Nokia 8110
• Nickname: “banana phone”
• 145 g, monochrome display, Smart SMS
• It cost 1000 EUR

Page 6: Mobile Graphics (part1)

Mobile evolution… in movies

Skyfall, 2012: Daniel Craig as James Bond

• Sony Xperia T
• Android smartphone
• 4.6” display, 1280x720
• It cost 600 EUR
• 13 Mpixel camera + position sensors

Page 7: Mobile Graphics (part1)

Mobile evolution… in movies

Iron Man 3, 2013: Robert Downey Jr. as Tony Stark

• Future devices?
• Transparent and foldable high-resolution screens
• Gesture interfaces
• Wearable / integrated into the body

Page 8: Mobile Graphics (part1)

Mobile evolution… in games

• Nokia Snake
  – Since 1997, installed on an estimated 350 million devices, making it one of the most widely distributed games ever created
  – Installed on Nokia devices until 2007

Page 9: Mobile Graphics (part1)

Mobile evolution… in games

• Angry Birds (Rovio)
  – First released for Apple's iOS in December 2009
  – 2 billion downloads across all platforms
  – Widespread diffusion and popularity
  – Adventure parks (Finland and Malaysia)

Page 10: Mobile Graphics (part1)

Mobile evolution… in games

• Unreal Engine (GDC & Google I/O 2014)
  – Running on an Nvidia Tegra K1 processor
  – Will support Google Tango and Samsung Gear VR
  – Easy porting of games
  – Sophisticated 3D effects

Page 11: Mobile Graphics (part1)

Mobile connectivity evolution

• Bandwidth is doubling every 18 months

• Mobile internet users have overtaken desktop internet users

• Smartphone traffic in 2017 is expected to reach 2.7 GB per person per month

Page 12: Mobile Graphics (part1)

Displays and User Interface

• Before 2007: the old days
  – PDAs: Palm OS / Windows Pocket / Windows CE
  – Stylus interaction (touch screens at an early stage)
• Touch era
  – 2007: iOS / iPhone
  – 2008: Android / HTC Dream (G1)
  – Touch-enabled devices (no stylus required)
• Nowadays
  – Wearables: <2”
  – Smartphones: 3-6”
  – Tablets: >7-10”
  – Integrated DLP projectors

Page 13: Mobile Graphics (part1)

Chip evolution (1/2)

Page 14: Mobile Graphics (part1)

Chip evolution (2/2)

Page 15: Mobile Graphics (part1)

Scenario

• Modern smartphones (and tablets) are compact visual computing powerhouses
• DIFFUSION: more than 4.6 billion mobile phone subscriptions [Ellison 2010]
• NETWORKING: high-speed internet connection (typical 1 GB/month plan)
  – 3G: < 0.6-3 Mbps ~ 100-400 KB/s (latency ~ 100-125 ms)
  – 4G: < 3-10 Mbps ~ 400 KB/s - 1 MB/s (latency ~ 60-70 ms)
  – 5G: 1 Gbps (from 2016?)
• MEMORY: increasing RAM and storage space
  – RAM: 1-3 GB
  – Storage: 8-64 GB
• COMPUTING: increasing processing power
  – CPU: 4-8 cores @ 2.5 GHz
  – GPU: 72-192 cores (~ALUs)
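As a rough worked example of the networking figures on this slide, the sketch below converts a link rate in Mbps to KB/s and estimates how long a 1 GB monthly plan would last at a sustained rate. The function names and the use of decimal units (1 GB = 1,000,000 KB) are assumptions made for illustration only.

```python
def mbps_to_kbytes_per_s(mbps: float) -> float:
    """Convert megabits per second to kilobytes per second (decimal units)."""
    return mbps * 1000.0 / 8.0


def seconds_to_use_plan(plan_gb: float, mbps: float) -> float:
    """Seconds needed to transfer plan_gb gigabytes at a sustained rate of mbps."""
    plan_kbytes = plan_gb * 1000.0 * 1000.0
    return plan_kbytes / mbps_to_kbytes_per_s(mbps)


if __name__ == "__main__":
    # Upper-bound rates quoted on the slide for 3G and 4G.
    for label, rate in (("3G", 3.0), ("4G", 10.0)):
        kbs = mbps_to_kbytes_per_s(rate)
        minutes = seconds_to_use_plan(1.0, rate) / 60.0
        print(f"{label}: {rate} Mbps = {kbs:.0f} KB/s, 1 GB lasts about {minutes:.0f} min")
```

At full 4G speed a typical 1 GB plan would be consumed in well under an hour, which is one reason data-efficient streaming matters for mobile visualization.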

Page 16: Mobile Graphics (part1)

Where are we going?

• Powerful devices for acquiring, processing and visualizing information

• Accessibility of information (anybody, any time, anywhere)

• Immense potential (integration of acquisition, processing, visualization, cloud computing, and collaborative tasks)

Page 17: Mobile Graphics (part1)

www.crs4.it/vic/ Visual Computing Group

Mobile Graphics

• Marco Agus
• Marcos Balsa

June 2015

Development

Page 18: Mobile Graphics (part1)

Mobile Graphics: Heterogeneity

• OS
• Architecture
• Programming languages
• 3D APIs
• IDEs


Page 20: Mobile Graphics (part1)

Mobile Graphics

• OS
  – Android
  – iOS
  – Windows Phone
  – Firefox OS, Ubuntu Phone, Tizen…
• Programming languages
  – C++
  – Obj-C / Swift
  – Java
  – C# / Silverlight
  – HTML5/JS/CSS
• Architectures
  – x86 (x86_64): Intel / AMD
  – ARM (32/64 bit): ARM + (Qualcomm, Samsung, Apple, NVIDIA, …)
  – MIPS (32/64 bit): Ingenic, Imagination
• 3D APIs
  – OpenGL / GL ES
  – D3D / ANGLE
  – Metal / Mantle / Vulkan (GL Next)
• Cross-development
  – Qt
  – Marmalade / Xamarin / Muio
  – Monogame / Shiva3D / Unity / UDK4 / Cocos2d-x

Page 21: Mobile Graphics (part1)

Operating Systems

Page 22: Mobile Graphics (part1)

Operating Systems

• Linux based (Qt…)
  – Ubuntu, Tizen, BBOS…
• Web based (HTML5)
  – ChromeOS, FirefoxOS, WebOS (deceased?)…
• Windows Phone
• iOS (~Unix + Cocoa)
• Android (Java VM)

2014: http://www.theregister.co.uk/2014/08/04/android_beats_ios_for_first_time/

Page 23: Mobile Graphics (part1)

Operating Systems

• Brief comparison (the focus here is research)
  – Windows Phone 8
    + Best IDE: Visual Studio (2013)
    - Windows development (porting Linux code?)
    - Market is quite restrictive (100$/year, 5 free uploads/year)
      - Certification + review
    - OpenGL support? Through ANGLE (over D3D)
  – iOS
    + Best devices? (homogeneity, at least)
    + Good IDE: Xcode + clang
    - Market is rather restrictive (100$/year)
      - Review
    + OpenGL support: PowerVR GPUs only
  – Android
    + Best platform? (more open, more devices, more flexible?)
    - Many IDEs, integration is not great (~tricky)
      + Visual Studio / Eclipse / QtCreator
      + With GCC / clang compiler
    + Market is very accessible (25$)
    + OpenGL support / OpenCL support

Monetization? / Research? / Monetization??

Page 24: Mobile Graphics (part1)

Programming Languages

http://www.tops-int.com/blog/which-programming-languages-are-used-for-web-desktop-and-mobile-apps/

Page 25: Mobile Graphics (part1)

Programming Languages

• C/C++
  – Classic, performance, codebase, control
• Objective-C
  – Somewhat different style (message based), well-documented API for iOS, mainly Cocoa/iOS
• Java
  – Android is VM/JIT based, ~portability (API), well known, widespread, codebase
• C#
  – VM based, ~Java evolution, Mono (Win, Android, iOS)
• Swift
  – Apple's new language, simplicity, performance, easy, LLVM-based compilers
• HTML5/JS
  – Web technologies, widespread, compatibility
• Perl, Python, Ruby, D, Go (Google), Hack (Facebook), …
  – More options, not so popular?

Page 26: Mobile Graphics (part1)

Programming Languages

Typically: a light front-end written in the main programming language of the platform API, loading the codebase (in whatever language) through dynamic code loading

Page 27: Mobile Graphics (part1)

Architectures

MIPS

ARM

x86

Page 28: Mobile Graphics (part1)

Architectures

CPU architectures

x86 – ARM – MIPS

Page 29: Mobile Graphics (part1)

Architectures

• x86 (CISC 32/64 bit)
  – Intel Atom Z3740/Z3770
    • Bay Trail (2 W)
  – AMD Mullins (not yet on the market)
    • 4.5 W
  - Power consumption
  + Performance (desktop-class GPU!)
  + Compatibility with old SW?

Page 30: Mobile Graphics (part1)

Architectures

• ARM
  – RISC 32/64 bit
  + Power efficiency
  + Performance/watt
  + Smaller area (RISC), lower cost
• MIPS
  – RISC 32/64 bit
  – Acquired by Imagination Technologies (2013)
  + Demonstrated its capabilities on consoles (PS/PS2/PSP/N64/Wii…), also on SGI

Page 31: Mobile Graphics (part1)

Architectures – RISC vs. CISC but…

• CISC (Complex Instruction Set Computer)
  – Fast program execution (optimized complex paths)
  – Complex instructions (e.g. memory-to-memory instructions)
• RISC (Reduced Instruction Set Computer)
  – Fast instructions (fixed cycles per instruction)
  – Simple instructions (fixed/reduced cost per instruction)
• FISC (Fast Instruction Set Computer)
  – Current RISC processors integrate many improvements from CISC: superscalar, branch prediction, SIMD, out-of-order
  – Philosophy: fixed/reduced cycle count per instruction (SIMD?)
  – Discussion (Post-RISC):
    • http://archive.arstechnica.com/cpu/4q99/risc-cisc/rvc-5.html

Page 32: Mobile Graphics (part1)

Architectures

RISC -> integrates complex instructions (ARM / MIPS)

CISC -> reduces instruction complexity (Intel Atom)

MMX/SSE/Out-of-Order

Page 33: Mobile Graphics (part1)

Architectures – x86

• Intel (32/64 bit)
  – Competitive with Bay Trail Atom Z3740 ~4 W
  – Pursuing low power consumption instead of performance
  – GPU: Intel HD graphics for Bay Trail ~ GF 8600M GT | GF210
  – Present in many tablets (e.g. Surface) with Windows Phone/Android
  – Present in a few smartphones
• AMD
  – Not yet competitive in low power: > 4 W
  – Good GPU performance (GCN, 192 cores)
  – No known smartphone/tablet shipped
• Supported on
  – Android, Windows Phone, Tizen, Firefox OS, Ubuntu Touch, …

Page 34: Mobile Graphics (part1)

Architectures – ARM

• ARM Ltd.
  – RISC processor (32/64 bit), getting to 64 bit ~2014/15
  – IP (intellectual property): instruction set / reference implementation
  – CPU / GPU (Mali)
• Licenses (instruction set OR reference design)
  – Instruction set license -> custom-made design (Snapdragon, Hummingbird in iPhone 4 & Galaxy S)
    • Optimizations (particular paths, improved core frequency control, …)
  – Reference design (Cortex A9, Cortex A15, Cortex A53/A57…)
• Licensees (instruction set OR reference design)
  – Apple, Qualcomm, Samsung, Nvidia, MediaTek, AMD @<2014…
  – Few instruction set licenses, mostly adopting the reference design
• Manufacturers
  – Contracted by licensees
    • GlobalFoundries, United Microelectronics, TSMC, and Intel (@2013)

Page 35: Mobile Graphics (part1)

Architectures – MIPS

• MIPS
  – RISC processor (32/64 bit)
  – IP (intellectual property): licensing
  – Recently acquired by Imagination Technologies
  – Can provide a full solution (System on Chip, SoC): wireless/CPU/GPU
    • Performance/watt should be comparable to that of ARM
    • Imagination GPUs have demonstrated their value
      – iDevices have always included its PowerVR SGX / Rogue cores
      – Good integration with the CPU and other components on the SoC could provide a very competitive solution (e.g. Qualcomm)
• Supported on
  – Android, Mer (fork of MeeGo)
• Knowledge from previous HW (PSP, PS, PS2, Wii…)
  – Pretty much the same as with ARM HW

Page 36: Mobile Graphics (part1)

Architectures

GPU architectures

Page 37: Mobile Graphics (part1)

Architectures – GPU

[Figure: simplified OpenGL 3D pipeline (ES 2.0 / 3.0). Stages: Primitives -> Vertex Processing -> Primitive Assembly -> Rasterization -> Pixel Processing -> Framebuffer Operations. Shaders: Vertex Shader, Tessellation Eval./Control Shader, Geometry Shader, Fragment Shader. Tessellation & Geom. Proc. is labelled “Desktop”.]

Image courtesy of: http://rnd.azoft.com/fluid-dynamics-simulation-on-ios/

Fixed hardware: vertex + pixel shaders

GPU unified shaders: vertex/pixel -- compute

Page 38: Mobile Graphics (part1)

Architectures – GPU

Device | CPU | GPU | Gflops FP32 | Relative (Tegra K1 = 1.00) | Cores/ALUs
Samsung Galaxy Nexus (2011) | TI OMAP 4460, 2-core A9 | PowerVR SGX 540 | 4.8 | 0.01 | 8 cores
Adv. Chinese dev (2014) | MediaTek MT6589T, 4-core A7 | PowerVR SGX 544MP2 | 9.6 | 0.03 | 16 cores
Apple iPhone 5 (2012) | Apple A6, 2-core (armv7) | PowerVR SGX 543 MP3 | 28.8 | 0.08 | 48 cores
2013 Android flagship (HTC One, Galaxy S4) | Snapdragon 600, 4-core Krait 300 | Adreno 320 | 51.2 | 0.14 | 64 cores
Avg Tegra 4 device (2014) | (4+1)-core A15 | GeForce ULP | 96.8 | 0.20 | 24 VS + 48 PS = 72 cores
Apple iPad 4 (2012) | Apple A6X, 2-core (armv7) | PowerVR SGX554 MP4 | 76.8 | 0.21 | 128 cores
Apple iPhone 5s (2014) | Apple A7, 2-core (armv8) | PowerVR G6430 | 115.2 | 0.21 | 128 cores
2014 Android flagship (Galaxy S5) | Samsung Exynos 5, 4-core A7 + 4-core A15 | ARM Mali T628 MP6 | 102.4 | 0.28 | 96 cores
2014 Android flagship (Galaxy S5) | Snapdragon 801, 4-core Krait 400 | Adreno 330 | 166.5 | 0.46 | 128 cores
Sony PS3 (2006) | PowerPC 1 core + 7 SPE | Nvidia G70 | 228.8 | 0.63 | 8 VS + 24 PS = 136 cores
Xbox 360 (2005) | 3-core PowerPC | ATI R500 Xenos | 240.0 | 0.66 | 192 cores
MiPad, Project Tango, HTC Volantis (2014) | 4+1 cores A15 | Nvidia Tegra K1 | 326 | 1.00 | 192 cores
OnePlus 2 | Snapdragon 810, 64-bit 4xA53 + 4xA57 | Adreno 430 | 324-388 | 1.0 | 256 cores

Core ~ #ALU/MADDs
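The Gflops figures can be related to the core counts with a simple peak-throughput estimate, following the note that a core is roughly one ALU/MADD: each ALU is assumed to retire one multiply-add (2 floating-point operations) per clock. The sketch below is illustrative only; the 0.85 GHz clock used for the Tegra K1 is an assumption chosen because it reproduces the 326 Gflops entry, not a value taken from the slides.

```python
def peak_gflops(alus: int, clock_ghz: float, flops_per_alu_per_clock: int = 2) -> float:
    """Theoretical peak throughput: ALUs x FLOPs per clock x clock rate (GHz)."""
    return alus * flops_per_alu_per_clock * clock_ghz


if __name__ == "__main__":
    # 192 ALUs as listed for the Tegra K1; the clock is an assumed value.
    k1 = peak_gflops(192, 0.85)
    print(f"Tegra K1 (192 ALUs @ 0.85 GHz, assumed): {k1:.1f} Gflops")
```

Peak numbers of this kind ignore memory bandwidth and thermal throttling, so sustained performance on a phone is usually lower.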

Page 39: Mobile Graphics (part1)

Architectures – GPU

Desktop ~ 2880 cores (GTX 780 Ti) ~5000 Gflops

vs.

Mobile ~ 256 cores (Tegra X1) ~512 Gflops @ FP32

PS4 ~ 1840 cores
Xbox One ~ 1240 cores

Page 40: Mobile Graphics (part1)

Architectures – GPU

• Immediate Mode Rendering (IMR)
• Tile Based Rendering (TBR)
• Tile Based Deferred Rendering (TBDR)

Page 41: Mobile Graphics (part1)

Architectures – GPU

• Immediate Mode Rendering (IMR)
  – Geometry is processed in submission order
    • High overdraw (shaded pixels can be overwritten)
  – Buffers are kept in system memory
    • High bandwidth / power / latency
  – Early-Z helps depending on geometry sorting
    • If the depth buffer value is closer than the fragment, the fragment is discarded

http://blog.imgtec.com/powervr/understanding-powervr-series5xt-powervr-tbdr-and-architecture-efficiency-part-4
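The dependence of early-Z on geometry sorting can be illustrated with a toy model of a single pixel. This is only a sketch under simplifying assumptions (one pixel, opaque fragments, smaller depth means closer); the function name is hypothetical and does not correspond to any GPU API.

```python
def shaded_fragments(depths, early_z=True):
    """Count fragments that reach the fragment shader for one pixel.

    depths: fragment depths in submission order (smaller = closer).
    With early-Z, a fragment is discarded before shading when the depth
    buffer already holds a closer (or equal) value. Without it, every
    fragment is shaded and the depth test only decides what is kept.
    """
    depth_buffer = float("inf")
    shaded = 0
    for d in depths:
        if early_z and d >= depth_buffer:
            continue  # rejected before shading: no overdraw cost
        shaded += 1
        if d < depth_buffer:
            depth_buffer = d
    return shaded


if __name__ == "__main__":
    print("front-to-back:", shaded_fragments([1, 2, 3, 4]))
    print("back-to-front:", shaded_fragments([4, 3, 2, 1]))
```

With four overlapping layers, front-to-back submission shades one fragment while back-to-front shades all four, which is why sorting matters on IMR hardware.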

Page 42: Mobile Graphics (part1)

Architectures – GPU

• Tile Based Rendering (TBR)
  – Rasterizing per tile (triangles in bins per tile): 16x16, 32x32
    • Buffers are kept in on-chip memory (GPU): fast! Geometry limit?
  – Triangles processed in submission order (TB-IMR)
    • Overdraw (front-to-back -> early-Z cull)
  – Early-Z helps depending on geometry sorting

http://blog.imgtec.com/powervr/understanding-powervr-series5xt-powervr-tbdr-and-architecture-efficiency-part-4
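The binning step of a tile-based renderer can be sketched as follows. This is a simplified illustration, not how any particular GPU implements it: triangles are assigned conservatively by bounding box, coordinates are assumed to be non-negative screen pixels, and the function names are hypothetical.

```python
def tiles_for_triangle(vertices, tile_size=16):
    """Return the set of (tx, ty) tiles touched by the triangle's bounding box.

    vertices: three (x, y) screen-space pixel coordinates (non-negative).
    """
    xs = [v[0] for v in vertices]
    ys = [v[1] for v in vertices]
    tx0, tx1 = int(min(xs)) // tile_size, int(max(xs)) // tile_size
    ty0, ty1 = int(min(ys)) // tile_size, int(max(ys)) // tile_size
    return {(tx, ty) for ty in range(ty0, ty1 + 1) for tx in range(tx0, tx1 + 1)}


def bin_triangles(triangles, tile_size=16):
    """Build per-tile bins of triangle indices, preserving submission order."""
    bins = {}
    for index, tri in enumerate(triangles):
        for tile in tiles_for_triangle(tri, tile_size):
            bins.setdefault(tile, []).append(index)
    return bins


if __name__ == "__main__":
    scene = [
        [(2, 2), (10, 3), (5, 12)],      # fits inside tile (0, 0)
        [(10, 10), (40, 12), (20, 30)],  # spans several tiles
    ]
    for tile, indices in sorted(bin_triangles(scene).items()):
        print(tile, indices)
```

Each tile can then be rasterized from its own bin with colour and depth buffers held on-chip; the size of the bins is one form of the "geometry limit" raised on the slide.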

Page 43: Mobile Graphics (part1)

Architectures – GPU

• Tile Based Deferred Rendering (TBDR)
  – Fragment processing (tex + shade) ~waits for Hidden Surface Removal (HSR)
    • Micro depth buffer: depth test before fragment submission
      – Whole tile: 1 frag/pixel
    • iPad 2x slower than desktop GeForce at HSR (FastMobileShaders_siggraph2011)
  – Possible to prefetch textures before shading/texturing
  – Hard to profile!!! ~~~Timing?

Limit: ~100K tri + complex shader

http://blog.imgtec.com/powervr/understanding-powervr-series5xt-powervr-tbdr-and-architecture-efficiency-part-4

Page 44: Mobile Graphics (part1)

Architectures – Power consumption

• Reduce working set -> tiling
• Optimize bandwidth -> deferring
• Minimize area/circuitry -> RISC?

[Figure: power consumption by memory access. Courtesy of: Shebanow, HPG 2013 keynote]

Shared memory -> fight for the bus!

BUT

Fewer CPU-GPU copies! (expected)

Page 45: Mobile Graphics (part1)

Architectures – GPU

• General issues
  – Shared memory: no memory copy between CPU and GPU!!!
    • ~70% of memory available for the app (GPU reserved memory + OS)
  – Precision is relevant
    • Halving precision ~doubles operations/second (1 FP128 = 8 FP16)
    • Vertex shader (mediump/highp) | fragment shader (~lowp for color)
  – Overdraw depends on the renderer: front-to-back for IMR/TBR
    • Depth-only pass can work on IMR/TBR depending on geometry count
  – Texture compression!!! Bandwidth, power, performance
    • Take ~5 Gb/s as typical bandwidth on embedded devices (vs. 100 Gb/s on desktop)
  – Texture mipmapping / compression reduces bandwidth
  – glReadPixels(), glCopyTexImage(), glTexSubImage() on FBO… Block! Sync!
  – glDiscardFramebufferEXT(): indicates that the application is done with a render attachment

Page 46: Mobile Graphics (part1)

Architectures – GPU

Texture compression | GL ES | Format (bpp) | Devices | Proprietary | Notes
PVRTC | >=1, Ext | RGB (2,4), RGBA (2,4) | PowerVR | Imagination | Good quality, not widespread
S3TC (DXT1/3 & 5) | >=1/2, Ext | RGB (4), RGBA (4,8) | Tegra, Intel HD | S3 | D3D
ATC | >=1, Ext | | Adreno | AMD | Maps to DXT with minor conversion
ETC1 | Core in 1, Ext in 2 | RGB (4) | All GLES 1 devices | Free | Most widespread, only RGB
ETC2/EAC | Ext in 2, Core in 3 | R (4), RG (8), RGB (4,8), RGBA (8) | GLES3 devices, most GLES2 devices | Free | Most widespread, good compression (ETC2), compatible with ETC1
ASTC | >=2, Ext | Many (0.89 to 8 bpp) | Mali / GLES3.1? | Free | Includes 3D, various formats and texture types. Not widespread
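To make the bits-per-pixel column concrete, the sketch below estimates the storage of a 2D texture at a given bpp, optionally with a full mipmap chain. It is an approximation for illustration: real block-compressed formats pad small mip levels to whole blocks, which this hypothetical helper ignores.

```python
def texture_bytes(width, height, bits_per_pixel, mipmaps=False):
    """Approximate storage in bytes for a 2D texture.

    With mipmaps=True, every level down to 1x1 is added; block padding
    of compressed formats is ignored.
    """
    total = 0
    w, h = width, height
    while True:
        total += w * h * bits_per_pixel // 8
        if not mipmaps or (w == 1 and h == 1):
            break
        w, h = max(1, w // 2), max(1, h // 2)
    return total


if __name__ == "__main__":
    raw = texture_bytes(1024, 1024, 32)   # uncompressed RGBA8
    etc1 = texture_bytes(1024, 1024, 4)   # 4 bpp, e.g. ETC1 RGB
    print(f"RGBA8: {raw / 1024:.0f} KB, 4 bpp: {etc1 / 1024:.0f} KB, ratio {raw // etc1}x")
```

A 1024x1024 texture drops from 4 MB to 512 KB at 4 bpp, and the same factor applies to the bandwidth needed to sample it.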

Page 47: Mobile Graphics (part1)

Architectures – GPU

• ASTC (Adaptive Scalable Texture Compression)
  – ARM's general solution for texture compression (open / complex HW)
  – 2D / 3D formats (normal, LDR/HDR, luminance, alpha, …)
  – 128 bits per block, mapping 4x4..12x12 pixels (2D) & 3^3…6^3 (3D)
  – 0.59 bpp on 3D textures with 6^3 pixels per block
  – ARM Mali T6xx GPU support & next-generation GPUs?
  – Quality & compression ratio! Free
  – Wait till GLES3 is expanded
• ETC2
  – Core in GLES3 and GL4.3: RGB + RGBA compressed formats (~S3TC)
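The ASTC bit rates quoted on the slides follow directly from the fixed 128-bit block size: the rate is 128 divided by the number of texels a block covers. A minimal worked example (the function name is hypothetical):

```python
def astc_bits_per_texel(block_w, block_h, block_d=1):
    """Bits per texel for an ASTC block footprint; every block is 128 bits."""
    return 128.0 / (block_w * block_h * block_d)


if __name__ == "__main__":
    for footprint in ((4, 4), (12, 12), (3, 3, 3), (6, 6, 6)):
        label = "x".join(str(n) for n in footprint)
        print(f"{label}: {astc_bits_per_texel(*footprint):.2f} bpp")
```

The 4x4 and 12x12 footprints give the 8 bpp and 0.89 bpp extremes for 2D textures, and 6x6x6 gives the 0.59 bpp quoted for 3D textures.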

Page 48: Mobile Graphics (part1)

Architectures – GPU

• Profiling tools
  – ARM SDK (ARM): Windows/Linux
    • DS-5 Streamline ($$): SW and GPU profiling and debugging
  – PowerVR SDK (Imagination): Linux/Windows/OSX
    • PVRUniSCo shader analyzer (#cycles)
  – Tegra SDK (NVIDIA): Linux/Windows/OSX
    • Tegra System Profiler
    • NVIDIA PerfHUD ES
  – Adreno SDK (Qualcomm)
    • Adreno Profiler: Windows only

Page 49: Mobile Graphics (part1)

3D APIs

Page 50: Mobile Graphics (part1)

3D APIs

Mantle Direct3D Metal

OpenGL Next 5.0 ?

Page 51: Mobile Graphics (part1)

3D APIs

• Direct3D
  – 3D API from Microsoft for Windows (and Xbox)
  – ANGLE library provides GL support on top of D3D
• Mantle
  – AMD 3D API with low-level access -> D3D12 | GL_NG
• Metal
  – Apple 3D API with low-level access
• OpenGL Desktop/ES/WebGL
  – GL for embedded systems, now in version 3.0
    • GLES3.1 ~ GL4.4 (GL_NG/Vulkan is coming…)

Page 52: Mobile Graphics (part1)

3D APIs

• Direct3D (Win & game research)
  – Games on Windows (mostly) / Xbox
  – Defines the state of the art in 3D functionality
    • OpenGL typically following
    • 3D graphics card vendors highly collaborative
    • Multithreaded programming
  – Proprietary, closed source (Microsoft)
  – Tested & stable: good support + tools
• Metal (Mac/iOS future?)
  – Apple 3D API with low-level access
  – Much in the way of Mantle?
    • Buffer & image, command buffers, sync…
  – Lean & mean -> simple + ~flexible

Page 53: Mobile Graphics (part1)

3D APIs

• Mantle
  – AMD effort: low-level, direct-access 3D API
  – Direct control of memory (CPU/GPU), multithreading done well
    • User-required synchronization
  – API calls per frame: <3k -> 100K
  – Resources: buffer & image
  – Simplified driver maintenance (vendors)
    • High-level API/frameworks/engines will be developed
  – Pipeline state
    • Shaders + targets (depth/color…) + resources + geometry
  – Command queues + synchronization
    • Compute / Draw / DMA (memory copy)
  – Bindless: shaders can refer to state resources
  – OpenGL NEXT seems to move in the ‘Mantle direction’
  – Direct3D 12 is already pursuing low-level access

Page 54: Mobile Graphics (part1)

3D APIs

Pre-compiled pipeline: shaders + resources execute

http://www.slideshare.net/DevCentralAMD/mantle-introducing-a-new-api-for-graphics-amd-at-gdc14

Page 55: Mobile Graphics (part1)

3D APIs

Command queues generated for each processing unit: graphics/compute/memory access

http://www.slideshare.net/DevCentralAMD/mantle-introducing-a-new-api-for-graphics-amd-at-gdc14

Page 56: Mobile Graphics (part1)

3D APIs

The pipeline defines the association of variables to resource descriptors

http://www.slideshare.net/DevCentralAMD/mantle-introducing-a-new-api-for-graphics-amd-at-gdc14

Page 57: Mobile Graphics (part1)

3D APIs

• OpenGL (Desktop/ES/WebGL)
  – Open / research / cross-platform
  – Lagging behind D3D -> legacy support
    • No more FIXED PIPELINE (1992)!! -- scientific visualization…
  – GLSL (2003)… GL 3.1 (2009): deprecation / no fixed pipeline
    • Compatibility profile -> legacy again… (till GL 4)
    • Core profile
      – GLSL shader required
      – VAO
        » Group of VBOs
        » We need a base VAO for using VBOs!
      – Simplifying -> VBO + GLSL only!

Page 58: Mobile Graphics (part1)

3D APIs

– OpenGL ES 1.1
  • Fixed pipeline, no glBegin/End, no GL_POLYGON -- VBO
– OpenGL ES 2 (OpenGL 1.5 + GLSL) ~ GL4.1
  • No fixed pipeline (shaders mandatory), ETC1 texture compression…
– OpenGL ES 3 ~ GL4.3
  • Occlusion queries + geometry instancing
  • 32-bit integer/float in GLSL
  • Core 3D textures, depth textures, ETC2/EAC, many formats…
  • Uniform Buffer Objects (packed shader parameters)
– OpenGL ES 3.1 ~ GL4.4
  • Compute shaders (atomics, load/store)
  • Separate shader objects (reuse)
  • Indirect draw (shader culling…)
  • NO geometry/tessellation

Page 59: Mobile Graphics (part1)

3D APIs

• GPGPU
  – OpenCL
    • On Android it is not much loved
      – Use the libs provided by the GPU vendor SDK
    • On iOS it is only accepted for system apps
      – Use old-school GPGPU (fragment shader -> framebuffer)
  – RenderScript
    • Google's solution for processing using the GPU…
    • Too niche! ~ Android
  – Compute shaders
    • GLES 3.1!!! General solution!!
  – DirectCompute on D3D

Page 60: Mobile Graphics (part1)

Cross-development

http://www.appian.com/blog/enterprise-mobility-2/are-mobile-platform-choices-limiting-enterprise-process-innovation

Page 61: Mobile Graphics (part1)

Cross-development

• C++
  – QtCreator, Qt 5.3: free non-commercial / GPL
  – Marmalade: free license
  – Cocos2d-x: free
• C#
  – Xamarin: free basic license
  – Mono for Android, MonoTouch for iOS
• HTML5/JS
  – Appcelerator / Titanium: free dev. license
  – PhoneGap / Cordova ~ browser view: free
• Cocoa/Objective-C
  – Marmalade Juice: free license
• Ruby
  – RhoMobile: free license

Page 62: Mobile Graphics (part1)

Cross-development

• C++
  – QtCreator, Qt 5.3
  – Marmalade
  – Cocos2d-x
• C#
  – Xamarin
  – Mono for Android, MonoTouch for iOS
• HTML5/JS
  – Appcelerator
  – PhoneGap / Cordova ~ browser view
  – Intel XDK (on top of Cordova)
  – Titanium
• Cocoa/Objective-C
  – Marmalade Juice
• Ruby
  – Rho

Many options: no one size fits all

Platform API access through the framework

Page 63: Mobile Graphics (part1)

Cross-development

• C++ use case: QtCreator
  – Qt (~supports Android, iOS, Windows Phone, Linux, Windows, Mac)
  – Provides API abstraction for UI, in-app purchases, ~touch input
  – HOWTO (e.g. Android):
    • Install the Android SDK
    • Install the Android NDK (native C++ support, toolchain, libraries, GL, CL…)
    • Point the environment variables ANDROID_SDK, ANDROID_NDK to their folders
    • Install Qt 5.4 (+ QtCreator 3.3), community edition
    • Create a new Android project
    • Play!
  – Notes:
    • Go for Qt 5.4 (touch events were tricky in previous versions)
    • Use QOpenGLWidget instead of QGLWidget
    • Enable touch events on each widget:
      – QWidget::setAttribute(Qt::WA_AcceptTouchEvents);

Page 64: Mobile Graphics (part1)

Cross-development

• Codebase
  – Codebase (C/C++, …) + meta-project (CMake, QMake, Autoconf, Makefile…)
  – Built as one library per architecture: ARMv7, ARM64, MIPS, x86
  – Target platforms: iOS, Android
  – ~Manually modified scripts
• Codebase (Internet) -> toolchain (clang, gcc…) -> iOS, Android
  – CMake: OpenCV
  – QMake: Qt 5.4
  – Autoconf: manually modify
• Libraries
  – cURL: HTTP
  – Xml: doc
  – DevIL: image
  – Assimp: 3D loading

Page 65: Mobile Graphics (part1)

Cross-development

• Codebase (Internet) -> toolchain (clang, gcc…) -> iOS, Android
  – CMake: OpenCV
  – QMake: Qt 5.4
  – Autoconf: manually modify

* Set up the environment:
  CC=clang -arch armv7 --sysroot $SYSROOT …
  CXX=clang++ …
  LD=ld …
  AR=ar …

* Point to NDK_DIR/toolchain/$ARCH/bin/, where $ARCH={armv7, x86, …}; search for gcc/g++/clang inside the NDK directories

* Once the environment is set up, ~[“typically”] most tools work (DEFINES, architecture types, platform-supported functions, …)

Needed for!
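The environment setup described on this slide can be scripted before invoking a build tool. The sketch below is only an illustration under stated assumptions: the directory layout NDK_DIR/toolchain/$ARCH/bin/ is taken from the slide and may not match a given NDK release, the function name and example paths are hypothetical, and CXX is used for the C++ compiler following the usual Autoconf convention.

```python
import os


def cross_env(ndk_dir, arch, sysroot):
    """Return a copy of the environment with cross-compiler variables set.

    Assumes the toolchain binaries live in <ndk_dir>/toolchain/<arch>/bin,
    as described on the slide; real NDK layouts differ between releases.
    """
    bin_dir = os.path.join(ndk_dir, "toolchain", arch, "bin")
    flags = f"--sysroot={sysroot}"
    env = dict(os.environ)
    env.update({
        "CC": os.path.join(bin_dir, "clang") + " " + flags,
        "CXX": os.path.join(bin_dir, "clang++") + " " + flags,
        "LD": os.path.join(bin_dir, "ld"),
        "AR": os.path.join(bin_dir, "ar"),
    })
    return env


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    env = cross_env("/opt/ndk", "armv7", "/opt/ndk/sysroot")
    for name in ("CC", "CXX", "LD", "AR"):
        print(f"{name}={env[name]}")
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when calling a configure script or make, so the settings do not leak into the rest of the shell session.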

Page 66: Mobile Graphics (part1)

Cross-development

• 3D framework / engine
  – HW/platform abstraction -> abstraction^2
    • L1) Use 3D API -> portability issues! (HW, OS)
      – Try using GL ES 2/3/3.1 < GL 2.1/3.3/4.4 (Desktop GLES!!)
    • L2) Abstract 3D API (HW, OS)
      – Minimum common function set {D3D, GLES, Metal, Vulkan…}
      – (~) buffer, image, shader, pipeline (config pkg)

[Diagram: Application -> wrapper (shader program, pipeline, buffer, image buffer) -> 3D API (Metal, GL ES, GL desktop, Vulkan, D3D) -> OS (WinPhone, Android, iOS, Unix/Linux, Windows, MacOS)]

Take a look at Metal!
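The "L2" idea of wrapping the 3D API behind a minimum common function set can be sketched as a small interface. Everything here is hypothetical and for illustration only: the class and method names are not from any real engine, and the stand-in backend merely records calls where a real one would wrap GL ES, Metal, D3D or Vulkan.

```python
from abc import ABC, abstractmethod


class GraphicsBackend(ABC):
    """Minimum common function set: buffer, shader, pipeline, draw."""

    @abstractmethod
    def create_buffer(self, data: bytes) -> int: ...

    @abstractmethod
    def create_shader(self, source: str) -> int: ...

    @abstractmethod
    def create_pipeline(self, shader: int) -> int: ...

    @abstractmethod
    def draw(self, pipeline: int, buffer: int) -> None: ...


class RecordingBackend(GraphicsBackend):
    """Stand-in backend that only records which calls were made."""

    def __init__(self, name: str):
        self.name = name
        self.calls = []
        self._next_handle = 0

    def _new_handle(self, kind: str) -> int:
        self._next_handle += 1
        self.calls.append(kind)
        return self._next_handle

    def create_buffer(self, data: bytes) -> int:
        return self._new_handle("buffer")

    def create_shader(self, source: str) -> int:
        return self._new_handle("shader")

    def create_pipeline(self, shader: int) -> int:
        return self._new_handle("pipeline")

    def draw(self, pipeline: int, buffer: int) -> None:
        self.calls.append("draw")


def render_triangle(backend: GraphicsBackend) -> None:
    """Application code: written once against the wrapper, not a specific API."""
    vbo = backend.create_buffer(b"\x00" * 36)
    shader = backend.create_shader("void main() {}")
    pipeline = backend.create_pipeline(shader)
    backend.draw(pipeline, vbo)


if __name__ == "__main__":
    backend = RecordingBackend("gles")
    render_triangle(backend)
    print(backend.name, backend.calls)
```

Keeping the interface this small is the trade-off the slide points at: portability is gained at the cost of exposing only what every target API supports.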

Page 67: Mobile Graphics (part1)

Cross-development

• WebGL
  – Based on OpenGL ES 2 (WebGL 2 -> ES 3)
    • Exceptions in WebGL 2 (from ES 3): MappedBuffers, drawRangeElements, ProgramBinaries
• JS performance
  – TypedArrays [Khronos13]
    • http://www.khronos.org/registry/typedarray/specs/latest/
  – asm.js (Mozilla)
    • JS used in an optimized way (e.g. var v1 = v2 | 0, ensuring the type is int)
    • TypedArray -> memory allocation for large arrays (pre-reserved)
• Porting C++ code
  – Emscripten: C++ -> LLVM -> JS (TypedArrays + asm.js)
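The `v2 | 0` idiom works because the JavaScript bitwise OR converts its operand to a signed 32-bit integer, which lets an asm.js-aware engine treat the variable as an int. The sketch below emulates that conversion in Python purely to show the resulting values; the function name is hypothetical.

```python
import math


def to_int32(value):
    """Emulate JavaScript's `value | 0`: truncate, then wrap to signed 32 bits."""
    if isinstance(value, float) and (math.isnan(value) or math.isinf(value)):
        return 0  # NaN and infinities become 0
    n = int(value) & 0xFFFFFFFF          # int() truncates toward zero
    return n - 0x100000000 if n >= 0x80000000 else n


if __name__ == "__main__":
    for v in (3.7, -3.7, 2 ** 31, 2 ** 32 + 5):
        print(v, "->", to_int32(v))
```

Fractions are dropped and values outside the 32-bit range wrap around, so the idiom is a type annotation rather than a safe rounding function.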

Page 68: Mobile Graphics (part1)

Mobile Graphics – Development

• Conclusions
  – 1) Native + platform UI…
    • C++ [any language] -> LLVM compiler -> target platform
    • Platform framework front-end -> one for each platform
    • Performance + flexibility
    • Call native code from platform code (JNI, Objective-C, …)
  – 2) Native through framework…
    • Qt | Marmalade…
    • C++ code uses the framework API
      – Framework API abstracts the platform API [N platforms]
      – BUT less flexible integration?
  – 3) Go web: HTML5/JS…
    • Rewrite, or use Emscripten -> JS code + WebGL
    • ~Free portability (Chrome / Firefox / IE…?)
    • BUT performance is 0.5x at most with asm.js