
August 28, 2013 Sam Siewert

CS A490 Digital Media and Interactive Systems

Lecture 2 – Hardware and Software Fundamentals

RT Digital Media Systems

Embedded Systems
– Set-Top Boxes and IPTV
– Mobile Media: Smart Phone, Tablet, eBook Readers, Netbooks, Blu-ray & DVD Players, iPods, etc.
– Consumer/Pro-sumer/DVB/DCI Digital Camera Systems (SD, HD, HD-SDI, 2K, 4K, 6K). Resolutions/Formats: http://en.wikipedia.org/wiki/File:Vector_Video_Standards2.svg
– Game Consoles: Xbox, PS3, Wii, Nintendo
– Mobile Systems and Cloud-based Media Driving Innovation

Scalable Systems (Head-End, Cloud, CDN)
– Post Production for Digital Cinema, TV, Web: 2K, 4K, 6K Streams from Digital Cameras; Frame/Color Editing; CGI (Computer Generated Imagery); Soundtrack; Write to Distribution Media
– Digital Cable Head-Ends: Serve 10K+ Customers, Broadcast, On-Demand, Guide Data, DOCSIS Internet, VoIP
– IPTV Head-Ends: Internet, Switched-Digital Video, On-Demand
– Web/CDN Viral Video and Social Networking Video/Audio Streaming
– Digital Cinema: HD Digital Projectors, 3D Digital Projectors
– Cloud: iTunes, Hulu, Netflix, Sony Store, Xfinity, eBooks, GoogleTV
– Augmented Reality
– Closed Circuit Security Systems: Multi-Camera NTSC/HD


Old School Media

NTSC OTA (1941, 1953 color, 2009 dead)
– Analog, Interlaced, Continuous OTA Broadcast Transmission
– Tuner with Immediate CRT Display
– No Buffers, No Routing, No De-mux
– No Compression

Analog Cable
AM/FM OTA
Film Projectors


New Digital Media

Digital Cable
– QAM 256, 30+ Mbps, 10+ MPEG Programs per 6 MHz Channel
– Minimal Buffering (In Set-top Box for Digital Tuning and On-Demand)
– Dedicated Coaxial RF Carrier (Hybrid Fiber to Coaxial Networks)
– On-Demand, Trick-Play, Start-Over
– DOCSIS for Internet and Return Path (Streaming Control)

ATSC Digital OTA
– Supports HD 1080p or Multiple SD Programs per 6 MHz Channel
– Digital Modulation (8VSB) at 19+ Mbps per Channel

Digital Cinema
– 1080p, 2K, 4K Resolutions
– Automated Digital Delivery and Projection

IPTV, IP Radio and Mobile Media
– Routed, Buffered, Compressed
– Multiplexed Video/Audio Transport Streams
– File Download or Network Streaming
– Streaming over UDP or RTP/UDP with RTSP Most Often, No Re-transmission


Differences Analog vs Digital

Encoding for Transmission
– NTSC Frequency Modulation on Channels
– Broadband QPSK, QAM, 8VSB OTA
– Baseband Packet Switched Networks (Optical, Ethernet)

Routed (Diversely?)
Buffered
Compressed
Multiplexed (Shares Transmission Carrier)
Transported by IP (Large Packets)
QoS?
Continuous Transmission with Instant Tuning vs. Digital Network Streaming vs. Download and Playback (e.g. YouTube)

NTSC (Analog TV)

– AM Video to CRT
– FM Audio
– Chroma Added Later
– Odd/Even Lines (Interlaced)
– 29.97 FPS (30 before color)
– Vertical Blanking (CRT Retrace Time, Closed Captioning)
– 525 Lines, 262.5 per Field, 60 Fields per Second

Presentation Notes

Reducing the visibility of interference between the chrominance signal and the FM sound carrier required a slight reduction of the frame rate from 30 frames per second to 30/1.001 (approximately 29.97) frames per second, and a change of the line frequency from 15,750 Hz to 15,750/1.001 Hz (approximately 15,734.26 Hz).

Time codes may use a number of frame rates. Common ones are:
– 24 frame/sec (film, ATSC, 2K, 4K, 6K)
– 25 frame/sec (PAL (Europe, Uruguay, Argentina), SECAM, DVB, ATSC)
– 29.97 (30 ÷ 1.001) frame/sec (NTSC American System (US, Canada, Mexico, Colombia, etc.), ATSC, PAL-M (Brazil))
– 30 frame/sec (ATSC)


MPEG Fundamentals

Basic Head-End Broadband MPEG System

[Figure: block diagram of a head-end server. Block labels: PCI, QAM-RF, DVB-ASI, Server, DVB-ASI Analyzer, STBs, QAM-SA, IP Network, SPTS Playback, MPTS Playback, QAM Driver, Control Interface, Video Services, Bit-streams, Pre-mux Tools, PRO-1000 Quad, Broadcast, VoD Services, Config & Playlist]

Linux in Digital Media
– Common in Digital Cable Set-Top Boxes
– Common in Android Mobile Media
– Used in Digital Video VoD Head-Ends
– Used in Post Production (After Pre-Production “Filming” on Stage or Location)
– Common for IPTV

Digital Transport QoS

Latency
– To Tune in a Program, Turn-on
– To Deliver a Video Frame or Audio PCM Sample
– To Start, FF, REW, Start-Over, Pause

Bandwidth
– Resolution, Lossy/Lossless Compression, High Motion
– Pixel Encoding for Color
– Frame Rate
– Constant Bit-rate Transport?
– Variable Bit-rate Transport and Encoding?

Jitter
– Decode and Presentation Rates
– Elasticity in Decode to Presentation Buffering Necessary


January 21, 2008 Sam Siewert

Linux System Options

(Linux for Soft Real-time for Interactive and Digital Media Systems)


Outline

Many-Core Linux Host(s)
– Intel Nehalem, Westmere, …, Atom CE
– AMD Shanghai Quad/Quad-core
– Cavium MIPS64, Tilera, ARM Cortex

Multi-Core Linux with Integrated Graphics
– iGPU
– dGPU
– MICA

GP-GPU Vector Processing PCI-E (NVIDIA Tesla/Fermi, AMD)

Liu and Layland Paper Discussion
– Digital Video and Audio Encoding
– Digital Media Capture, Post Production, Delivery, Playback

CPU Scheduling Overview
– Scheduling Methods and Classes
– Policy, Feasibility
– Tuning Execution

NPTL – Native POSIX Threads Library
NPTL Example Code Walkthrough

Conceptual View of RT Resources

Three-Space View of Utilization Requirements
– CPU Margin?
– IO Latency (and Bandwidth) Margin?
– Memory Capacity (and Latency) Margin?

Upper Right Front Corner – Low-Margin
Origin – High-Margin
Mobile – Must Consider Battery Life Too (Power)

[Figure: three-axis utilization space. Axes: CPU-Utility, IO-Utility, Memory-Utility]

Processing – Initial Focus
– Processing and Scaling Frame Transformation, Encode, Decode is Critical
– Memory for Buffering (Frame Transformations, CPU Integrated or GPU Offloaded – e.g. Linux VDPAU)
– I/O for Networking (Transport)
– I/O for Storage (On-Demand, Post, Non-Linear Editing)

Flynn’s Computer Architecture Taxonomy
– Single Instruction, Single Data: SISD (Traditional Uni-processor)
– Multiple Instruction, Single Data: MISD (Voting schemes and active-active controllers)
– Single Instruction, Multiple Data: SIMD (e.g. SSE 4.2, GP-GPU, Vector Processing)
– Multiple Instruction, Multiple Data: MIMD (Distributed systems (MPMD), Clusters with MPI/PVM (SPMD), AMP/SMP)

– GPC has gone MIMD with SIMD Instruction Sets and SIMD Offload (GP-GPU)
– NUMA vs. UMA (Trend away from UMA to NUMA or MCH vs. IOH)
– SMP with One OS (Shared Memory, CPU-balanced Interrupt Handling, Process Load Balancing, Multi-User, Multi-Application, CPU Affinity Possible)
– MIMD: Single Program Multi-Data vs. Multi-Program Multi-Data

CPU Scheduling Taxonomy

[Figure: tree diagram rooted at Execution Scheduling. Node labels recovered from the slide:]
– Top level: Global-MP, Local-Uniprocessor
– Global-MP: Distributed, Asymmetric (AMP), Symmetric (SMP OS) (Preemptive, Non-Preemptive Subtree Under Each Global-MP Leaf)
– Local-Uniprocessor: Preemptive, Non-Preemptive
– Preemptive: Fixed-Priority (Rate Monotonic, Deadline Monotonic), Hybrid, Dynamic-Priority (EDF/LLF)
– Non-Preemptive: Cooperative (Co-Routine, Continuation Function), Batch (FCFS, SJN)
– Other labels: Heuristic, RR Timeslice (desktop), Multi-Frequency Executives, Static, Dynamic, Dataflow, SMT (Micro-Parallel)
– Color key: Traditional HRT Shown in GREEN; Scalable Interactive and Soft Real-time Shown in RED

A Service Release and Response
– Ci WCET
– Input/Output Latency
– Interference Time

[Figure: timeline of one service release. Events along the Time axis: Event Sensed, Interrupt, Dispatch, Preemption, Dispatch, Completion (IO Queued), Actuation (IO Completion). Intervals: Input-Latency, Dispatch-Latency, Execution, Interference, Execution, Output-Latency]

Response Time = Time(Actuation) – Time(Sensed) (From Release to Response)

Many-Core MIMD Thread Scaling

Symmetric MP and NUMA Many-Core Thread Scaling
– SMP: Uniform Memory Access Latency, Full Load Balancing
– NUMA: Non-Uniform Memory Access, Affinity Required
– Amdahl’s Law

SIMD Vector Instructions
– Intel MMX, SSE 1, 2, 3, 4.x
– Code Generation Using SIMD Extensions to Accelerate Algorithms (Edge Enhancement): http://software.intel.com/en-us/articles/using-intel-streaming-simd-extensions-and-intel-integrated-performance-primitives-to-accelerate-algorithms/


Offload, Co-Proc, Vector Proc

1. GPU (Graphics Processing Units)
– Evolved for Consumer CGI and Games: Physics Engines; 3D Rendering + Texture (4D Vector Operations); Game Engines and Simulation; HD Output: HDMI, HD-SDI, Headless GP-GPU
– Higher End Used for Digital Cinema / Post Production, Broadcast: PNY Quadro FX, NVIDIA CUDA for Post
– GP-GPU Being Used to Accelerate Encode, Transcode, Trans-rate, etc. - http://www.elementaltechnologies.com/

2. Built-In SIMD Instruction Set Extensions
– Intel SSE

Links:
http://www3.pny.com/Communities/HDBroadcastFilm.aspx
http://www.nvidia.com/object/io_1252562432787.html

GP-GPU, What Is It?
– Ideal for Large Bitwise, Integer, and Floating Point Vector Math
– Flynn’s Taxonomy: SIMD Architecture often leverages GP-GPU Co-Processors or Cell for MPMD

– Single Instruction/Prog, Single Data: SISD (Traditional Uni-processor)
– Multiple Instruction, Single Data: MISD (Voting schemes and active-active controllers)
– Single Instruction/Prog, Multiple Data: SIMD (SSE 4.2, Vector Processing); SPMD (Single Program Multiple Data), GP-GPU
– Multiple Instruction, Multiple Data: MIMD (Distributed systems (MPMD), Clusters with MPI/PVM (SPMD), AMP/SMP)

SSE – Streaming SIMD Extensions
– 128-bit registers known as XMM0 through XMM7
– Large Operands and Operators (Multi-Word), e.g. 128-bit XOR of Two Operands
– Multiple Multiply and Accumulate Operations for Floating Point (DSP Kernel Operations)
  – E.g. 4 Component Vector addition
  – 4 Single Precision Pixel Multiply and Accumulate in Single Instruction

Scalar version: 16 operations to load 2 operands, add, store

    vec_res.x = v1.x + v2.x;
    vec_res.y = v1.y + v2.y;
    vec_res.z = v1.z + v2.z;
    vec_res.w = v1.w + v2.w;

SSE version: 3 SSE operations to load, add, store

    movaps xmm0,address-of-v1          ;xmm0=v1.w | v1.z | v1.y | v1.x
    addps xmm0,address-of-v2           ;xmm0=v1.w+v2.w | v1.z+v2.z | v1.y+v2.y | v1.x+v2.x
    movaps address-of-vec_res,xmm0

Scheduling Parallel/Cluster HW

MIMD
– OS SMP threading, provides load balancing, affinity operations, routable interrupts (e.g. MSI-X), e.g. NPTL
– RTOS AMP is most often used in Embedded Systems

MPMD
– OpenCL, CUDA, DirectCompute (DirectX extension)
– Cell BE Developer’s Kit
– Intel OpenMP, Linux Cluster, MPI

Note on OS/CPU Virtualization and Digital Media
– Hypervisors
  Type 1: runs directly on the host's hardware to control the hardware and to monitor guest operating systems; the guest operating system thus runs one level above the hypervisor (e.g. VMWare ESXi)
  Type 2: runs within a conventional operating system environment; with the hypervisor layer as a distinct second software level, guest operating systems run at the third level above the hardware (e.g. VMWare for Windows)
– Enables Guest OS to Share Resources on System
– Typically DM Scales without Virtualization due to Client/Server Workload, but can Exploit for IT reasons

Elements of a Scheduling Class

Scheduling Policy
– How is Dispatch Decision Made?
– Non-Preemptive, Cooperative or Batch (Hard Coded)
– Preemptive
  Fixed Priority Encoding
  – Rate Monotonic (Shortest Period Gets Highest Priority)
  – Deadline Monotonic (Shortest Deadline Gets Highest Priority)
  Dynamic Priority - Programmed Priorities
  – EDF or Deadline Driven: Earliest Deadline Gets Highest Priority, Updated Continuously
  – LLF (Least Laxity First): Least Laxity (Time to Deadline Minus Remaining Execution Time) Gets Highest Priority, Updated Continuously
  Heuristic (Fuzzy Logic Scheduler, Heuristically Guided Iterative Repair)

Scheduling Feasibility Determination
– Will Schedule Work?
– Can a Set of Services Be Scheduled Given: CPU Resources Available, I/O Resources Available, Memory Resources Available
– RM LUB (Next Week)
– Lehoczky, Sha, Ding Theorem (Next Week)
– EDF Feasibility (Several Weeks Away)

Ability to Tune Schedule
– If Actuals Differ From Expected
  WCET: Expected vs. Observed
  Maximum Release Frequency for a Service: Expected vs. Observed

Real-Time Service Types

Types of Services
– Hard Real-Time (Flight Software, Anti-Lock Braking)
– Soft Real-Time (Multi-media, Audio, Video, Virtual Reality)
– Best Effort (E.g. Desktop Applications)
– Isochronal Hard Real-Time (Digital Feedback Control Systems)
– Isochronal Soft Real-Time (Continuous Media, Video, Audio)

Real-Time Service Types in Terms of Utility
– Utility Curve Shows Value/Harm of Response Over Time From Release, Both Before and After Deadline Relative to Release
– Full Utility: Service Performs as Required
– Zero Utility: Service is Not Provided, Drop-out Causes No Harm
– Negative Utility: Harm to System and/or User and Significant Loss of Assets

Hard Real-Time Service Utility

[Figure: utility (0% to 100%) versus time, with Release and Deadline marked]

After Deadline, Utility is Negative

Soft Real-Time Service Utility

[Figure: utility (0% to 100%) versus time, with Release and Deadline marked and curve F(t) after the deadline]

After Deadline, Utility Diminishes According to Some Function F(t)

Best Effort Service Utility

[Figure: utility (0% to 100%) versus time, with Release marked]

Deadline Does Not Exist

Isochronal Hard Real-Time Utility

[Figure: utility (0% to 100%) versus time, with Release and Deadline marked]

After Deadline, Utility is Negative
Before Deadline, Utility is Negative

Isochronal Soft Real-Time Utility (QoS Digital Media – Requires Buffering)

[Figure: utility (0% to 100%) versus time, with Release and Deadline marked and curves F(t) on both sides of the deadline]

After Deadline, Utility is < 100%
Before Deadline, Utility is < 100%


How Does NPTL Work?

No Thread Manager or M-on-N Mapping
– Previous POSIX Threading Model
– Manager Becomes Bottleneck
– Two-Level Scheduling Not Deterministic
– Many Pthreads (M) to N Kernel Threads Still an Issue
– O(n) Scheduling for each Manager

Direct Mapping of User to Kernel Thread or 1-to-1
– User Space Pthread Maps Directly onto Kernel Thread (Real-Time Policies Require Root Privilege)
– Deterministic (Non-Determinism due to Kernel Preemptability Issues)
– O(1) Scheduling

Scheduling Policies Selectable Similar to RTOS Tasking

Linux NPTL Scheduling Policies

Fixed Priority Preemptive
– SCHED_FIFO: This is Priority Preemptive
– SCHED_RR: This is Fair, but at Kernel Level
– SCHED_OTHER: This is OS default and should not be used for real-time services

POSIX Threads have
– Policy (FIFO, RR, OTHER)
– Priority (RT min to RT max)
– Creation (Fork)
– Join (Wait for thread completion at rendezvous)
– Synchronization Methods: Semaphores, Message Queues
– Asynchronous Communication Methods: Signals, Queued Signals

POSIX RT Extensions Include
– Virtual Timer Services
– Signals Tied to Timer Services
– Priority Inversion Protection (Availability on Linux TBD)

Presentation Notes

Alternatives to Typical RTOS Scheduling
– Local, Non-Preemptive, Cooperative Systems: Best for A Few Simple Services
– Hybrid Multi-Frequency Executive
  Multi-Frequency Executive (i.e. Multiple Dispatch Queues)
  High Frequency (100 x Low @ 100 Hz)
  Med Frequency (10 x Low @ 10 Hz)
  Low Frequency (1 Hz)
  State Machines Yield and Place Continuation on Queue
– Global-MP, Static, Asymmetric with Local, Non-Preemptive Cooperative Pipelined Dataflow Systems
  Credit-Based Dataflow for CPU to CPU Message Queues
  Credits Provide Flow Control
  Can Enqueue Messages Until Credit Runs Out, Then Stall
  Continuation-Passing-Style for Cooperative

July 7, 2004 Sam Siewert

NPTL Coding

Code Walk-through

Thread Scheduling Policy

    /* Request SCHED_FIFO explicitly instead of inheriting the creator's policy */
    pthread_attr_init(&rt_sched_attr);
    pthread_attr_setinheritsched(&rt_sched_attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&rt_sched_attr, SCHED_FIFO);

    rt_max_prio = sched_get_priority_max(SCHED_FIFO);
    rt_min_prio = sched_get_priority_min(SCHED_FIFO);

    /* Put the calling process under SCHED_FIFO, one below the maximum priority */
    rt_param.sched_priority = rt_max_prio-1;
    rc=sched_setscheduler(getpid(), SCHED_FIFO, &rt_param);
    if (rc < 0)
        perror("sched_setscheduler");   /* typically fails without root privilege */

    /* Report the contention scope held in the attribute object */
    pthread_attr_getscope(&rt_sched_attr, &scope);

    if(scope == PTHREAD_SCOPE_SYSTEM)
        printf("PTHREAD SCOPE SYSTEM\n");
    else if (scope == PTHREAD_SCOPE_PROCESS)
        printf("PTHREAD SCOPE PROCESS\n");
    else
        printf("PTHREAD SCOPE UNKNOWN\n");

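Thread Creation and Join

    rc = pthread_create(&main_thread, &main_sched_attr, testThread, (void *)0);
    if (rc) {
        /* pthread_create() returns the error number and does not set errno,
           so report it with strerror() (needs <string.h>) rather than perror() */
        printf("ERROR; pthread_create() rc is %d (%s)\n", rc, strerror(rc));
        exit(-1);
    }

    /* Wait for the thread to finish, then release the attribute object */
    pthread_join(main_thread, NULL);

    if(pthread_attr_destroy(&rt_sched_attr) != 0)
        perror("attr destroy");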
