openvx, camera and streaminput overview
DESCRIPTION
OpenVX enables performance and power optimized computer vision algorithms for use cases such as face, body and gesture tracking, smart video surveillance, automatic driver assistance systems, object and scene reconstruction, augmented reality, visual inspection, robotics and more. The provisional release of the specification enables developers and implementers to provide feedback before specification finalization, which is expected within six months. OpenVX enables significant implementation innovation while maintaining a consistent API for developers. An OpenVX application expresses vision processing holistically as a graph of function nodes. An OpenVX implementer can optimize graph execution through a wide variety of techniques such as: acceleration of nodes on CPUs, GPUs, DSPs or dedicated hardware, compiler optimizations, node coalescing, and tiled execution to keep sections of processed images in local memories as they flow through the graph. Khronos has released a provisional tiled execution extension alongside the main OpenVX specification to enable user custom kernels to exploit this style of optimization. Additionally, Khronos has released the VXU™ utility library to enable developers using OpenVX to call individual nodes as standalone functions for easy code migration. OpenVX can be used directly by applications or to accelerate higher-level middleware, such as the popular OpenCV open source vision library that is often used for application prototyping. OpenVX will have extensive conformance tests to complement a focused and tightly defined finalized specification for consistent and reliable operation across multiple vendors and platforms making OpenVX an ideal foundation for shipping production vision applications. Finally, as any Khronos specification, OpenVX is extensible to enable nodes to be deployed to meet customer needs, ahead of being integrated into the core specification.TRANSCRIPT
© Copyright Khronos Group 2013 - Page 1
APIs for Vision Acceleration, Camera Control and Sensor Fusion
Neil Trevett Vice President NVIDIA, President Khronos
© Copyright Khronos Group 2013 - Page 2
Khronos Connects Software to Silicon
Open Consortium creating
ROYALTY-FREE, OPEN STANDARD
APIs for hardware acceleration
Defining the roadmap for
low-level silicon interfaces
needed on every platform
Graphics, video, audio, compute,
vision, sensor and
camera processing
Rigorous specifications AND
conformance tests for cross-
vendor portability
Acceleration APIs
BY the Industry
FOR the Industry
Well over a BILLION people use Khronos APIs
Every Day…
© Copyright Khronos Group 2013 - Page 3
Khronos Standards
Visual Computing - 3D Graphics - Heterogeneous Parallel Computing
3D Asset Handling - 3D authoring asset interchange
- 3D asset transmission format with compression
Acceleration in HTML5 - 3D in browser – no Plug-in
- Heterogeneous computing for JavaScript
Camera
Control API
Provisional 1.0
spec released!
Over 100 companies defining royalty-free
APIs to connect software to silicon
Sensor Processing - Vision Acceleration - Camera Control - Sensor Fusion
© Copyright Khronos Group 2013 - Page 4
Sensors & Vision Driving Key Mobile Use Cases
Augmented Reality
Natural UI with Face, Body and
Gesture Tracking
Computational Photography and
Videography
3D Scene and Object Reconstruction
Time
© Copyright Khronos Group 2013 - Page 5
Vision Pipeline Challenges and Opportunities
• Light / Proximity
• 2 cameras
• 3 microphones
• Touch
• Position
- GPS
- WiFi (fingerprint)
- Cellular trilateration
- NFC/Bluetooth Beacons
• Accelerometer
• Magnetometer
• Gyroscope
• Pressure / Temp / Humidity
19
Sensor Proliferation Diverse sensor awareness of
the user and surroundings
• Camera sensors >20MPix
• Novel sensor configurations
• Stereo pairs
• Active Structured Light
• Active TOF
• Plenoptic Arrays
Growing Camera Diversity Capturing color, range
and lightfields
Diverse Vision Processors Driving for high performance
and low power
• Camera ISPs
• Dedicated vision IP blocks
• DSPs and DSP arrays
• Programmable GPUs
• Multi-core CPUs
Flexible sensor and camera
control to generate
required image stream
Use best processing available
for image stream processing –
with code portability
Control/fuse vision data
by/with all other sensor data
on device
Camera Control API
© Copyright Khronos Group 2013 - Page 6
OpenVX – Power Efficient Vision Acceleration • Acceleration API for real-time vision
- Focus on mobile and embedded systems
• Enable diverse efficient implementations
- From CPUs, through GPUs and DSPs
to dedicated hardware
• Foundational API for vision acceleration
- Can be used by middleware libraries or
by applications directly
• Complementary to OpenCV
- Which is great for prototyping
• Khronos open source sample implementation
- To be released with final specification
- Sample - not reference - spec remains the
definitive definition of OpenVX operation
Open source sample
implementation
Hardware vendor
implementations
OpenCV open
source library
Other higher-level
CV libraries
Application
© Copyright Khronos Group 2013 - Page 7
OpenVX and OpenCV are Complementary
Governance Community driven open source
with no formal specification Formal specification defined and
implemented by hardware vendors
Conformance No conformance tests for consistency and every vendor implements different subset
Full conformance test suite / process creates a reliable acceleration platform
Portability APIs can vary depending on processor Hardware abstracted for portability
Scope Very wide
1000s of imaging and vision functions Multiple camera APIs/interfaces
Tight focus on hardware accelerated functions for mobile vision Use external camera API
Efficiency Memory-based architecture
Each operation reads and writes memory Graph-based execution
Optimizable computation, data transfer
Use Case Rapid experimentation Production development & deployment
© Copyright Khronos Group 2013 - Page 8
OpenVX Graphs – The Key to Efficiency • Vision processing directed graphs for power and performance efficiency
- Each Node can be implemented in software or accelerated hardware
- Nodes may be fused by the implementation to eliminate memory transfers
- Processing can be tiled to keep data entirely in local memory/cache
• EGLStreams can provide data and event interop with other Khronos APIs
- BUT use of other Khronos APIs are not mandated
• VXU Utility Library for access to single nodes
- Easy way to start using OpenVX by calling each node independently
OpenVX Node
OpenVX Node
OpenVX Node
OpenVX Node
Heterogeneous
Processing
Native
Camera
Control
Example OpenVX Graph
© Copyright Khronos Group 2013 - Page 9
OpenVX 1.0 Function Overview • Core data structures
- Images and Image Pyramids
- Processing Graphs, Kernels, Parameters
• Image Processing
- Arithmetic, Logical, and statistical operations
- Multichannel Color and BitDepth Extraction and Conversion
- 2D Filtering and Morphological operations
- Image Resizing and Warping
• Core Computer Vision
- Pyramid computation
- Integral Image computation
• Feature Extraction and Tracking
- Histogram Computation and Equalization
- Canny Edge Detection
- Harris and FAST Corner detection
- Sparse Optical Flow
Open VX 1.0 defines
Framework for
creating, managing and
executing graphs
Focused set of widely
used functions that are
readily accelerated
Implementers can add
functions as extensions
Widely used extensions
adopted into future
versions of the core
OpenVX Specification
Evolution
© Copyright Khronos Group 2013 - Page 10
Example Graph - Stereo Machine Vision
Camera 1 Compute Depth
Map (User Node)
Detect and track objects (User Node)
Camera 2
Image Pyramid
Stereo Rectify with
Remap
Stereo Rectify with
Remap
Compute Optical Flow
Object
coordinates
OpenVX Graph
Delay
Tiling extension enables user nodes (extensions) to also optimally run in local memory
© Copyright Khronos Group 2013 - Page 11
OpenVX Participants and Timeline • Aiming for specification finalization by mid-2014
• Itseez is working group chair (the convener of OpenCV)
• Qualcomm and TI are specification editors
© Copyright Khronos Group 2013 - Page 12
OpenVX and OpenCL are Complementary
Use Case General Heterogeneous programming Domain targeted - vision processing
Architecture Language-based
– needs online compilation Library-based
- no online compiler required
Target Hardware
‘Exposed’ architected memory model – can impact performance portability
Abstracted node and memory model - diverse implementations can be optimized
for power and performance
Precision Full IEEE floating point mandated Minimal floating point requirements –
optimized for vision operators
Ease of Use General-purpose math libraries with
no built-in vision functions Fully implemented vision operators and
framework ‘out of the box’
© Copyright Khronos Group 2013 - Page 13
Need for Camera Control API • We have APIs for imaging/vision in applications and pre- and post-ISP processing
- E.g. OpenCL can be used to accelerate imaging on CPU, GPU, DSP etc.
• BUT no open standard API for advanced control of ISP and camera subsystem
- ISP is involved in camera control via 3A algorithms
• Need sophisticated image stream generation for advanced imaging & vision apps
- Advanced, high-frequency burst control of camera and sensor operation
- Portable support more types of sensors – including depth sensors
- Tighter system integration – e.g. synch of camera and MEMS sensors
Pre-processing Image Signal
Processor (ISP) Post-processing
Sensor, Color Filter Array
Lens, Flash, Focus, Aperture
Bayer RGB/YUV Image/Vision
Applications
Lens, sensor, aperture control
3A - Auto Exposure (AE), Auto White Balance (AWB), Auto Focus (AF)
Scope of Camera Control API
© Copyright Khronos Group 2013 - Page 14
Advanced Camera Control Use Cases • High-dynamic range (HDR) and computational flash photography
- High-speed burst with individual frame control over exposure and flash
• Subject isolation and depth detection - High-speed burst with individual frame control over focus
• Rolling shutter elimination
- High-precision intra-frame synchronization between camera and motion sensor
• Augmented Reality
- 60Hz, low-latency capture with motion sensor synchronization
- Multiple Region of Interest (ROI) capture
- Synchronized stereo sensors for scene scaling
- Detailed feedback on camera operation per frame
• Time-of-flight or structured light depth camera processing
- Aligned stacking of data from multiple sensors
© Copyright Khronos Group 2013 - Page 15
Khronos Camera API Requirements • Application control over ISP processing (including 3A)
- Including multiple, re-entrant ISPs
• Control multiple sensors with synch and alignment
- E.g. Stereo pairs, Plenoptic arrays, TOF or structured light depth cameras
• Enhanced per frame detailed control
- Format flexibility, Region of Interest (ROI) selection
• Global timing & synchronization
- E.g. Between cameras and MEMS sensors
• Flexible processing/streaming
- Multiple input and output streams with RAW, Bayer or YUV Processing
- Streaming of rows (not just frames)
Enable new camera functionality not available on current platforms
and align with future platform directions for easy adoption
© Copyright Khronos Group 2013 - Page 16
Camera API Design Milestones and Philosophy • C-language API starting from proven designs
- e.g. FCAM
• Design alignment with widely used hardware standards
- e.g. MIPI CSI
• Focus on mobile, power-limited devices
- But do not preclude other use cases such as automotive, surveillance, DSLR…
• Minimize overlap and maximize interoperability with other Khronos APIs
- But other Khronos APIs are not mandated
• Support vendor-specific extensions
Apr13
Jul13
Group charter approved
4Q13
Architectural Design
1Q14
First draft specification
2Q14
Sample implementation
and tests
3Q14
Specification ratification
Working group proposed
© Copyright Khronos Group 2013 - Page 17
Camera API Architecture will be FCAM-based • No global state
- State travels with image requests
- Every stage in the pipeline may have different state
- Enables fast, deterministic state changes
• Synchronize devices
- Lens, flash, sound capture, gyro…
- Devices can schedule Actions
- E.g. to be triggered on exposure change
© Copyright Khronos Group 2013 - Page 18
• Android Exposes Java camera APIs to developers
- Controls underlying Camera HAL
• Camera HAL v1 API simplified basic point and shoot apps
- Difficult or impossible to do much else
• Camera HAL v3 API is a fundamentally different API
- Streams-based to enable more sophisticated camera applications
Potential Adoption on Android
Open source
project developed
by Nokia and
Stanford
Camera API
HAL V3 adopts many
FCAM ideas and can use
EGL in its implementation
Khronos Camera API builds on FCAM with a
goal of being forward compatible with
Android architecture
Khronos Camera API may be used to IMPLEMENT
Android Camera HAL – and provide an advanced
native camera API in NDK
© Copyright Khronos Group 2013 - Page 19
Sensor Industry Fragmentation …
© Copyright Khronos Group 2013 - Page 20
Portable Access to Sensor Fusion
Apps Need Sophisticated Access to Sensor Data
Without coding to specific sensor hardware
Apps request semantic sensor information StreamInput defines possible requests, e.g.
“Provide Skeleton Position” “Am I in an elevator?”
Processing graph provides sensor data stream Utilizes optimized, smart, sensor middleware Apps can gain ‘magical’ situational awareness
Advanced Sensors Everywhere RGB and depth cameras, multi-axis
motion/position, touch and gestures, microphones, wireless controllers, haptics
keyboards, mice, track pads
© Copyright Khronos Group 2013 - Page 21
StreamInput Concepts • Application-defined filtering and conversion
- Set up a graph of nodes to generate required semantics
- Standardized node intercommunication
- Filter nodes modify data from inputs
- Can create virtual input devices
• Node types enable specific data abstractions
- Unicode text node for keyboards,
speech recognition, stylus, etc.
- 3D skeleton node for depth-sensing camera
or a motion-tracking suit …
• Extensibility to any sensor type
- Can define new node data types, state and methods
• Sensor Synchronization
- Universal time stamp on every sample
• Provision of device meta-data
- Including pictorial representations
Input Device
Input Device
Input Device
Filter Node
Filter Node
App Filter Node
Universal Timestamps
Standardized Node Intercommunication
© Copyright Khronos Group 2013 - Page 22
StreamInput Adoption Flexibility • Defines access to high-quality fused sensor stream and context changes
- Implementers can optimize and innovate generation of the sensor stream
OS Sensor OS APIs (E.g. Android SensorManager or
iOS CoreMotion)
Low-level native API defines access to
fused sensor data stream and context-awareness
…
Applications
Sensor Sensor
Sensor
Hub Sensor
Hub
StreamInput implementations
compete on sensor stream quality,
reduced power consumption,
environment triggering and context
detection – enabling sensor
subsystem vendors to increased
ADDED VALUE
Middleware (E.g. Augmented Reality engines,
gaming engines)
Platforms can provide
increased access to
improved sensor data stream
– driving faster, deeper
sensor usage by applications
Middleware engines need platform-
portable access to native, low-level
sensor data stream
Mobile or embedded
platforms without sensor
fusion APIs can provide
direct application access
to StreamInput
© Copyright Khronos Group 2013 - Page 23
Khronos APIs for Augmented Reality
Advanced Camera Control and stream
generation
3D Rendering and Video
Composition
On GPU
Audio
Rendering
Application
on CPUs, GPUs
and DSPs
Sensor
Fusion
Vision
Processing
MEMS
Sensors
Camera Control
API
EGLStream - stream data
between APIs
Precision timestamps
on all sensor samples
AR needs not just advanced sensor processing, vision
acceleration, computation and rendering - but also for
all these subsystems to work efficiently together
© Copyright Khronos Group 2013 - Page 24
Summary • Khronos is building a family of interoperating APIs for portable and
power-efficient vision processing
• OpenVX 1.0 has been provisionally released and non-members are invited to
provide feedback on the forums - http://www.khronos.org/message_boards/forumdisplay.php/110-OpenVX-General
• Khronos Camera and sensor fusion APIs are currently in design and complement
and integrate with OpenVX
• Any company is welcome to join Khronos to influence the direction of mobile
and embedded vision processing!
- $10K annual membership fee for access to all Khronos API working groups
- Well-defined IP framework protects your IP and conformant implementations
• www.khronos.org