August 28, 2013 Sam Siewert
CS A490 Digital Media and Interactive Systems
Lecture 2 – Hardware and Software Fundamentals
RT Digital Media Systems
Embedded Systems
– Set-Top Boxes and IPTV
– Mobile Media: Smart Phones, Tablets, eBook Readers, Netbooks, Blu-ray and DVD Players, iPods, etc.
– Consumer/Prosumer/DVB/DCI Digital Camera Systems (SD, HD, HD-SDI, 2K, 4K, 6K); Resolutions/Formats: http://en.wikipedia.org/wiki/File:Vector_Video_Standards2.svg
– Game Consoles: Xbox, PS3, Wii, Nintendo
– Mobile Systems and Cloud-Based Media Driving Innovation
Scalable Systems (Head-End, Cloud, CDN)
– Post Production for Digital Cinema, TV, Web: 2K, 4K, 6K Streams from Digital Cameras; Frame/Color Editing, CGI (Computer Generated Imagery), Soundtrack, Write to Distribution Media
– Digital Cable Head-Ends: Serve 10K+ Customers with Broadcast, On-Demand, Guide Data, DOCSIS Internet, VoIP
– IPTV Head-Ends: Internet, Switched Digital Video, On-Demand
– Web/CDN Viral Video and Social Networking Video/Audio Streaming
– Digital Cinema: HD Digital Projectors, 3D Digital Projectors
– Cloud: iTunes, Hulu, Netflix, Sony Store, Xfinity, eBooks, GoogleTV
– Augmented Reality
– Closed Circuit Security Systems: Multi-Camera NTSC/HD
Sam Siewert 2
Old School Media
NTSC OTA (1941; Color Added 1953; US Full-Power Analog Broadcasts Ended 2009)
– Analog, Interlaced, Continuous OTA Broadcast Transmission
– Tuner with Immediate CRT Display
– No Buffers, No Routing, No De-mux
– No Compression
Analog Cable
AM/FM OTA
Film Projectors
New Digital Media
Digital Cable
– QAM-256, 30+ Mbps, 10+ MPEG Programs per 6 MHz Channel
– Minimal Buffering (in Set-Top Box for Digital Tuning and On-Demand)
– Dedicated Coaxial RF Carrier (Hybrid Fiber-Coaxial Networks)
– On-Demand, Trick-Play, Start-Over
– DOCSIS for Internet and Return Path (Streaming Control)
ATSC Digital OTA
– Supports HD 1080p or Multiple SD Programs per 6 MHz Channel
– Digital Modulation (8VSB) at 19+ Mbps per Channel
Digital Cinema
– 1080p, 2K, 4K Resolutions
– Automated Digital Delivery and Projection
IPTV, IP Radio and Mobile Media
– Routed, Buffered, Compressed
– Multiplexed Video/Audio Transport Streams
– File Download or Network Streaming
– Streaming Most Often over UDP or RTP/UDP with RTSP Control, No Retransmission
Differences
Analog vs. Digital Encoding for Transmission
– NTSC: Analog Modulation of RF Channels (AM Video, FM Audio)
– Broadband: QPSK, QAM, 8VSB OTA
– Baseband Packet-Switched Networks (Optical, Ethernet)
Digital Media Is Routed (Diversely?), Buffered, Compressed, Multiplexed (Shares Transmission Carrier), Transported by IP (Large Packets), QoS?
Continuous Transmission with Instant Tuning vs. Digital Network Streaming vs. Download and Playback (e.g. YouTube)
NTSC (Analog TV)
– AM Video to CRT, FM Audio
– Chroma Added Later
– Odd/Even Lines (Interlaced)
– 29.97 FPS (30 Before Color)
– Vertical Blanking (CRT Retrace Time, Closed Captioning)
– 525 Lines, 262.5 per Field, 60 Fields per Second (59.94 with Color)
MPEG Fundamentals
[Figure: Basic Head-End Broadband MPEG System. A server (SPTS playback, MPTS playback, QAM driver, control interface, config & playlist, bit-stream pre-mux tools) with a PRO-1000 Quad card on PCI delivers broadcast and VoD video services over DVB-ASI and QAM-RF to STBs, with a DVB-ASI analyzer, QAM-SA, and an IP network also shown.]
Linux in Digital Media
– Common in Digital Cable Set-Top Boxes
– Common in Android Mobile Media
– Used in Digital Video VoD Head-Ends
– Used in Post Production (After Pre-Production and “Filming” on Stage or Location)
– Common for IPTV
Digital Transport QoS
Latency
– To Tune in a Program, Turn On
– To Deliver a Video Frame or Audio PCM Sample
– To Start, FF, REW, Start-Over, Pause
Bandwidth
– Resolution, Lossy/Lossless Compression, High Motion
– Pixel Encoding for Color
– Frame Rate
– Constant Bit-Rate Transport?
– Variable Bit-Rate Transport and Encoding?
Jitter
– Decode and Presentation Rates
– Elasticity in Decode-to-Presentation Buffering Necessary
January 21, 2008 Sam Siewert
Linux System Options
(Linux for Soft Real-time for Interactive and Digital Media Systems)
Outline
Many-Core Linux Host(s)
– Intel Nehalem, Westmere, …, Atom CE
– AMD Shanghai Quad/Quad-Core
– Cavium MIPS64, Tilera, ARM Cortex
Multi-Core Linux with Integrated Graphics
– iGPU
– dGPU
– MICA
GP-GPU Vector Processing over PCI-E (NVIDIA Tesla/Fermi, AMD)
Liu and Layland Paper Discussion
– Digital Video and Audio Encoding
– Digital Media Capture, Post Production, Delivery, Playback
CPU Scheduling Overview
– Scheduling Methods and Classes
– Policy, Feasibility
– Tuning Execution
NPTL (Native POSIX Thread Library)
NPTL Example Code Walkthrough
Conceptual View of RT Resources
Three-Space View of Utilization Requirements
– CPU Margin?
– IO Latency (and Bandwidth) Margin?
– Memory Capacity (and Latency) Margin?
Upper Right Front Corner: Low Margin; Origin: High Margin
Mobile: Must Consider Battery Life (Power) Too
[Figure: three axes labeled CPU-Utility, IO-Utility, and Memory-Utility.]
Processing: Initial Focus
– Processing and Scaling: Frame Transformation, Encode, Decode Is Critical
– Memory for Buffering (Frame Transformations, CPU-Integrated or GPU-Offloaded, e.g. Linux VDPAU)
– I/O for Networking (Transport)
– I/O for Storage (On-Demand, Post, Non-Linear Editing)
Flynn’s Computer Architecture Taxonomy
– Single Instruction, Single Data: SISD (Traditional Uni-processor)
– Multiple Instruction, Single Data: MISD (Voting Schemes and Active-Active Controllers)
– Single Instruction, Multiple Data: SIMD (e.g. SSE 4.2, GP-GPU, Vector Processing)
– Multiple Instruction, Multiple Data: MIMD (Distributed Systems (MPMD), Clusters with MPI/PVM (SPMD), AMP/SMP)
GPC Has Gone MIMD, with SIMD Instruction Sets and SIMD Offload (GP-GPU)
NUMA vs. UMA: Trend Away from UMA Toward NUMA (from a Shared Memory Controller Hub, MCH, to On-CPU Memory Controllers with an I/O Hub, IOH)
SMP with One OS: Shared Memory, CPU-Balanced Interrupt Handling, Process Load Balancing, Multi-User, Multi-Application, CPU Affinity Possible
MIMD: Single Program Multi-Data vs. Multi-Program Multi-Data
CPU Scheduling Taxonomy
[Figure: execution scheduling taxonomy tree. Top level: Global-MP (Distributed, Asymmetric AMP, Symmetric SMP OS) and Local-Uniprocessor, which divides into Preemptive and Non-Preemptive; a preemptive/non-preemptive subtree sits under each Global-MP leaf. Labeled methods include Fixed-Priority, Dynamic-Priority, Hybrid, Cooperative, Batch (FCFS, SJN), Co-Routine, Continuation Function, Heuristic, EDF/LLF, RR Timeslice (desktop), Multi-Frequency Executives, Static and Dynamic variants, Rate Monotonic, Deadline Monotonic, and Dataflow; SMT (Micro-Parallel) is also shown. Traditional HRT methods are shown in green; scalable interactive and soft real-time methods in red.]
A Service Release and Response
[Figure: timeline of one service release. Event sensed, interrupt, dispatch, execution, preemption (interference), dispatch, execution, completion (IO queued), actuation (IO completion). Marked intervals: input latency, dispatch latency, execution time (Ci, the WCET), interference time, and output latency.]
Response Time = Time_Actuation − Time_Sensed (From Release to Response)
Many-Core MIMD Thread Scaling
Symmetric MP and NUMA Many-Core Thread Scaling
– SMP: Uniform Memory Access Latency, Full Load Balancing
– NUMA: Non-Uniform Memory Access, Affinity Required
– Amdahl’s Law
SIMD Vector Instructions
Intel MMX, SSE 1, 2, 3, 4.x
Code Generation Using SIMD Extensions to Accelerate Algorithms (Edge Enhancement)
– http://software.intel.com/en-us/articles/using-intel-streaming-simd-extensions-and-intel-integrated-performance-primitives-to-accelerate-algorithms/
PSF
Offload, Co-Proc, Vector Proc
1. GPU (Graphics Processing Units)
– Evolved for Consumer CGI and Games: Physics Engines, 3D Rendering + Texture (4D Vector Operations), Game Engines and Simulation, HD Output (HDMI, HD-SDI), Headless GP-GPU
– Higher End Used for Digital Cinema / Post Production, Broadcast: PNY Quadro FX, NVIDIA CUDA for Post
– GP-GPU Being Used to Accelerate Encode, Transcode, Trans-rate, etc. (http://www.elementaltechnologies.com/)
2. Built-In SIMD Instruction Set Extensions
– Intel SSE
GP-GPU, What Is It?
– Ideal for Large Bitwise, Integer, and Floating Point Vector Math
– Flynn’s Taxonomy SIMD Architecture, Often Leveraging GP-GPU Co-Processors, or Cell for MPMD
– Single Instruction/Program, Single Data: SISD (Traditional Uni-processor)
– Multiple Instruction, Single Data: MISD (Voting Schemes and Active-Active Controllers)
– Single Instruction/Program, Multiple Data: SIMD (SSE 4.2, Vector Processing); SPMD (Single Program Multiple Data), GP-GPU
– Multiple Instruction, Multiple Data: MIMD (Distributed Systems (MPMD), Clusters with MPI/PVM (SPMD), AMP/SMP)
SSE – Streaming SIMD Extensions
– 128-bit Registers Known as XMM0 through XMM7 (XMM0 through XMM15 in 64-bit Mode)
– Large Operands and Operators (Multi-Word), e.g. 128-bit XOR of Two Operands
– Multiple Multiply and Accumulate Operations for Floating Point (DSP Kernel Operations)
  – e.g. 4-Component Vector Addition
  – 4 Single-Precision Pixel Multiply-and-Accumulate Steps per Instruction Pair (mulps + addps)
Scalar C: 16 operations (for each of 4 components: load two operands, add, store)

```c
vec_res.x = v1.x + v2.x;
vec_res.y = v1.y + v2.y;
vec_res.z = v1.z + v2.z;
vec_res.w = v1.w + v2.w;
```

SSE: 3 operations to load, add, store (movaps requires 16-byte-aligned memory operands)

```asm
movaps xmm0, address-of-v1        ; xmm0 = v1.w | v1.z | v1.y | v1.x
addps  xmm0, address-of-v2        ; xmm0 = v1.w+v2.w | v1.z+v2.z | v1.y+v2.y | v1.x+v2.x
movaps address-of-vec_res, xmm0   ; store all 4 sums
```
Scheduling Parallel/Cluster HW
MIMD
– OS SMP Threading Provides Load Balancing, Affinity Operations, Routable Interrupts (e.g. MSI-X), e.g. NPTL
– RTOS AMP Is Most Often Used in Embedded Systems
MPMD
– OpenCL, CUDA, DirectCompute (DirectX Extension)
– Cell Broadband Engine (BE) Developer’s Kit
– Intel OpenMP, Linux Cluster, MPI
Note on OS/CPU Virtualization and Digital Media
– Hypervisors
  Type 1: runs directly on the host's hardware to control it and to monitor guest operating systems; each guest OS runs one level above the hypervisor (e.g. VMware ESXi)
  Type 2: runs within a conventional operating system; with the hypervisor as a distinct second software level, guest operating systems run at the third level above the hardware (e.g. VMware for Windows)
– Enables Guest OSes to Share Resources on a System
– Digital Media Typically Scales without Virtualization Due to Its Client/Server Workload, but Can Exploit Virtualization for IT Reasons
Elements of a Scheduling Class
Scheduling Policy
– How Is the Dispatch Decision Made?
– Non-Preemptive: Cooperative or Batch (Hard-Coded)
– Preemptive
  Fixed-Priority Encoding
  – Rate Monotonic (Shortest Period Gets Highest Priority)
  – Deadline Monotonic (Shortest Deadline Gets Highest Priority)
  Dynamic Priority (Programmed Priorities)
  – EDF or Deadline Driven: Earliest Deadline Gets Highest Priority, Updated Continuously
  – LLF (Least Laxity First): Least Laxity (Time to Deadline Minus Remaining Execution Time) Gets Highest Priority, Updated Continuously
  Heuristic (Fuzzy Logic Scheduler, Heuristically Guided Iterative Repair)
Scheduling Feasibility Determination
– Will the Schedule Work?
– Can a Set of Services Be Scheduled Given the CPU, I/O, and Memory Resources Available?
– RM LUB (Next Week)
– Lehoczky, Sha, Ding Theorem (Next Week)
– EDF Feasibility (Several Weeks Away)
Ability to Tune Schedule
– If Actuals Differ from Expected:
  WCET: Expected vs. Observed
  Maximum Release Frequency for a Service: Expected vs. Observed
Real-Time Service Types
Types of Services
– Hard Real-Time (Flight Software, Anti-Lock Braking)
– Soft Real-Time (Multi-media, Audio, Video, Virtual Reality)
– Best Effort (e.g. Desktop Applications)
– Isochronal Hard Real-Time (Digital Feedback Control Systems)
– Isochronal Soft Real-Time (Continuous Media, Video, Audio)
Real-Time Service Types in Terms of Utility
– Utility Curve Shows the Value/Harm of a Response over Time from Release, Both Before and After the Deadline Relative to Release
– Full Utility: Service Performs as Required
– Zero Utility: Service Is Not Provided; Drop-out Causes No Harm
– Negative Utility: Harm to System and/or User and Significant Loss of Assets
Hard Real-Time Service Utility
[Figure: utility (0% to 100%) vs. time from release, with the deadline marked.]
After Deadline, Utility Is Negative
Soft Real-Time Service Utility
[Figure: utility (0% to 100%) vs. time from release, with the deadline marked and a curve F(t) after it.]
After Deadline, Utility Diminishes According to Some Function F(t)
Best Effort Service Utility
[Figure: utility (0% to 100%) vs. time from release.]
Deadline Does Not Exist
Isochronal Hard Real-Time Utility
[Figure: utility (0% to 100%) vs. time from release, with the deadline marked.]
Before Deadline, Utility Is Negative; After Deadline, Utility Is Negative
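Isochronal Soft Real-Time Utility (QoS Digital Media – Requires Buffering)
[Figure: utility (0% to 100%) vs. time from release, with the deadline marked and curves F(t) on both sides of it.]
Before Deadline, Utility Is < 100%; After Deadline, Utility Is < 100% (Each Side Following Some F(t))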
How Does NPTL Work?
No Thread Manager or M-on-N Mapping
– Previous POSIX Threading Model:
  Manager Becomes a Bottleneck
  Two-Level Scheduling Is Not Deterministic
  Many Pthreads (M) to N Kernel Threads Still an Issue
  O(n) Scheduling for Each Manager
Direct Mapping of User to Kernel Thread, or 1-to-1
– Each User-Space Pthread Maps Directly onto a Kernel Thread (Real-Time Policies Require Root Privilege)
– Deterministic (Apart from Non-Determinism Due to Kernel Preemptibility Issues)
– O(1) Scheduling
Scheduling Policies Selectable, Similar to RTOS Tasking
Linux NPTL Scheduling Policies
Fixed Priority Preemptive
– SCHED_FIFO: Priority Preemptive; a Thread Runs Until It Blocks, Yields, or Is Preempted by a Higher-Priority Thread
– SCHED_RR: Priority Preemptive with Round-Robin Time-Slicing Among Equal-Priority Threads (Fairness Enforced in the Kernel)
– SCHED_OTHER: The OS Default Time-Sharing Policy; Should Not Be Used for Real-Time Services
POSIX Threads Have
– Policy (FIFO, RR, OTHER)
– Priority (RT Min to RT Max)
– Creation (Fork)
– Join (Wait for Thread Completion at Rendezvous)
– Synchronization Methods: Semaphores, Message Queues
– Asynchronous Communication Methods: Signals, Queued Signals
POSIX RT Extensions Include
– Virtual Timer Services
– Signals Tied to Timer Services
– Priority Inversion Protection (Priority-Inheritance Mutexes via PTHREAD_PRIO_INHERIT, Supported on Modern Linux/glibc)
July 7, 2004 Sam Siewert
NPTL Coding
Code Walk-through
Thread Scheduling Policy
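```c
pthread_attr_init(&rt_sched_attr);
pthread_attr_setinheritsched(&rt_sched_attr, PTHREAD_EXPLICIT_SCHED);
pthread_attr_setschedpolicy(&rt_sched_attr, SCHED_FIFO);

rt_max_prio = sched_get_priority_max(SCHED_FIFO);
rt_min_prio = sched_get_priority_min(SCHED_FIFO);

/* Run main at one below max; SCHED_FIFO requires root (or CAP_SYS_NICE) */
rt_param.sched_priority = rt_max_prio - 1;
rc = sched_setscheduler(getpid(), SCHED_FIFO, &rt_param);
if (rc) perror("sched_setscheduler");

/* Give the thread attribute an explicit priority rather than relying on
   implementation defaults (here: same priority as main) */
pthread_attr_setschedparam(&rt_sched_attr, &rt_param);

pthread_attr_getscope(&rt_sched_attr, &scope);
if (scope == PTHREAD_SCOPE_SYSTEM)
    printf("PTHREAD SCOPE SYSTEM\n");
else if (scope == PTHREAD_SCOPE_PROCESS)
    printf("PTHREAD SCOPE PROCESS\n");
else
    printf("PTHREAD SCOPE UNKNOWN\n");
```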
Thread Creation and Join
```c
rc = pthread_create(&main_thread, &rt_sched_attr, testThread, (void *)0);
if (rc) {
    /* pthread_* functions return an error number and do not set errno */
    printf("ERROR; pthread_create() rc is %d: %s\n", rc, strerror(rc));
    exit(-1);
}

pthread_join(main_thread, NULL);

rc = pthread_attr_destroy(&rt_sched_attr);
if (rc)
    printf("attr destroy: %s\n", strerror(rc));
```