2004.10.11 - SLIDE 1 IS246 - FALL 2004

Lecture 12: Computation

IS246 Multimedia Information

Prof. Marc Davis
UC Berkeley SIMS

Monday and Wednesday 2:00 pm – 3:30 pm
Fall 2004

http://www.sims.berkeley.edu/academics/courses/is246/f04/

2004.10.11 - SLIDE 2 IS246 - FALL 2004

Today’s Agenda

• Review of Last Time

– Object Lesson Assignment

• Computation

• Discussion Questions

• Action Items for Next Time

2004.10.11 - SLIDE 4 IS246 - FALL 2004

Object Lesson Assignment

• Any questions?

• Any problems?

• Any comments?

2004.10.11 - SLIDE 6 IS246 - FALL 2004

Computation in Intellectual History

• Computation as instrumentality
– PCs, PDAs, embedded processors, etc.

• Computation as ideas
– Modeling process
– Languages for modeling process
– Primitives, combination, abstraction
– Parameterization
– Black boxing functionality
– Optimization

2004.10.11 - SLIDE 7 IS246 - FALL 2004

Programming As Representation

• There is a structure of formal symbols that can be manipulated according to a precisely defined and well-understood system of rules

• There is a mapping through which the relevant properties of the domain can be represented by symbol structures

• This mapping is systematic in that a community of programmers can agree as to what a given structure represents

2004.10.11 - SLIDE 8 IS246 - FALL 2004

Programming As Representation

• There are operations that manipulate the symbols in such a way as to produce veridical results—to derive new structures that represent the domain in such a way that the programmers would find them accurate representations

• Programs can be written that combine these operations to produce desired results

2004.10.11 - SLIDE 9 IS246 - FALL 2004

Levels of Representation

• Physical machine
– Wires, chips, disks; or pipes, valves, sinks; or Tinkertoys

• Logical machine
– AND gates, OR gates, inverters, etc.

• Abstract machine
– Machine instructions for manipulating stored symbols

• High-level language
– Java, C++, Lisp, etc. (operators, data structures, etc.)

• Representation language
– Represents primitives and operations about a domain

2004.10.11 - SLIDE 10 IS246 - FALL 2004

Algorithms and Programming

• Algorithm
– A step-by-step description of a procedure to achieve a desired result

• Programming
– Primitives
– Means of combination
– Means of abstraction

2004.10.11 - SLIDE 11 IS246 - FALL 2004

From Algorithms to Programs

• Algorithm
– A step-by-step description of a procedure to achieve a desired result
– How can we walk a square?
  • Walk forward
  • Turn
  • Walk forward
  • Turn
  • Walk forward
  • Turn
  • Walk forward

2004.10.11 - SLIDE 12 IS246 - FALL 2004

LOGO Square Example

TO SQUARE
  FORWARD 5
  RIGHT 90
  FORWARD 5
  RIGHT 90
  FORWARD 5
  RIGHT 90
  FORWARD 5
END

2004.10.11 - SLIDE 13 IS246 - FALL 2004

LOGO Square Example

TO SQUARE :SIZE
  FORWARD :SIZE
  RIGHT 90
  FORWARD :SIZE
  RIGHT 90
  FORWARD :SIZE
  RIGHT 90
  FORWARD :SIZE
END

2004.10.11 - SLIDE 14 IS246 - FALL 2004

LOGO Square Example

TO WINDOW :SIZE
  SQUARE :SIZE
  SQUARE :SIZE
  SQUARE :SIZE
  SQUARE :SIZE
END
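The SQUARE and WINDOW procedures can be traced without a graphics display. What follows is a minimal Python sketch, not LOGO itself: the `Turtle` class and its methods are hypothetical stand-ins for the FORWARD and RIGHT primitives, and only 90-degree turns are handled. It suggests why four identical SQUARE calls produce four separate panes: SQUARE has no final turn, so each call leaves the turtle back at its starting point but rotated by 90 degrees.

```python
class Turtle:
    """Hypothetical stand-in for the LOGO turtle; records every point visited."""

    # Unit steps for headings 0, 90, 180, 270 degrees (north, east, south, west).
    STEPS = [(0, 1), (1, 0), (0, -1), (-1, 0)]

    def __init__(self):
        self.x, self.y, self.heading = 0, 0, 0
        self.path = [(0, 0)]

    def forward(self, distance):
        dx, dy = self.STEPS[self.heading // 90]
        self.x += dx * distance
        self.y += dy * distance
        self.path.append((self.x, self.y))

    def right(self, degrees):
        # Assumption: only multiples of 90 are supported in this sketch.
        self.heading = (self.heading + degrees) % 360


def square(turtle, size):
    """Mirror of TO SQUARE :SIZE: four sides, three turns, no final turn."""
    for side in range(4):
        turtle.forward(size)
        if side < 3:
            turtle.right(90)


def window(turtle, size):
    """Mirror of TO WINDOW :SIZE: SQUARE used as a new building block."""
    for _ in range(4):
        square(turtle, size)


t = Turtle()
window(t, 5)
corners = sorted(set(t.path))  # distinct grid points touched by the turtle
```

Under these assumptions, `corners` holds the nine points of a 3 x 3 grid centered on the starting position, which is the outline of a four-pane window.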

2004.10.11 - SLIDE 15 IS246 - FALL 2004

AutoBuddy Example

2004.10.11 - SLIDE 16 IS246 - FALL 2004

Central Idea: Movies as Programs

• Movies change from being static data to programs

• Shots are inputs to a program that computes new media based on content representation and functional dependency (US Patents 6,243,087 & 5,969,716)

[Diagram labels: Media, Parser, Content Representation, Producer]

2004.10.11 - SLIDE 17 IS246 - FALL 2004

AutoBuddy: A MovieKit

[Diagram labels: Computer w/ video game, TV, Camera, Driver, Gunner, AutoBuddy Software, Buddy Driving Movie]

Driver and Gunner play Zone Rangers, a driving video game.

They can hear each other but not see each other.

They both see and hear the same video game output.

AutoBuddy films the Driver and Gunner with digital video cameras.

AutoBuddy semi-automatically generates a buddy driving movie from the Driver and Gunner movies plus a movie of the video game output.

2004.10.11 - SLIDE 18 IS246 - FALL 2004

How AutoBuddy Makes the Movie

• 3 Digital Movies (QuickTime)
– Driver, Gunner, and Video Game

• Synchronize & Crop
– Synchronize movies (AudioStreams)
– Find beginning and end of game

• Create Shots
– Create 3 new movies: Driver in car, Gunner in car, both in car
– Add parametric special effects

• Dialog-based Cutting
– Cut between 3 new movies and game video based on who is talking
– Cutting rules for continuity editing

• Add Credits
– Insert stills of Driver, Gunner

2004.10.11 - SLIDE 19 IS246 - FALL 2004

AutoBuddy Dialog-Based Cutting

• AutoBuddy analyzes the Driver and Gunner audio to determine who is speaking at each point in the movie

• Produces a stream of speech events with durations and values (Driver, Gunner, both, or neither)

[Timeline figure: speech events over time, labeled Driver, Gunner, Both, and Pause]
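A stream of speech events with durations and values can be sketched as a run-length encoding of per-frame speaker labels. This is an illustration under stated assumptions, not AutoBuddy's method: the slides do not say how speech is detected, so the sketch assumes a voice-activity flag per frame already exists for each player. The function name `speech_events` and the `frame_seconds` parameter are hypothetical.

```python
from itertools import groupby


def speech_events(driver_talking, gunner_talking, frame_seconds=0.5):
    """Collapse per-frame talking flags into (value, duration) events.

    Assumption: both inputs are equal-length sequences of booleans,
    one entry per analysis frame of `frame_seconds` seconds.
    """
    labels = []
    for driver, gunner in zip(driver_talking, gunner_talking):
        if driver and gunner:
            labels.append("Both")
        elif driver:
            labels.append("Driver")
        elif gunner:
            labels.append("Gunner")
        else:
            labels.append("Neither")
    # Consecutive identical labels become one event with a summed duration.
    return [(label, len(list(run)) * frame_seconds)
            for label, run in groupby(labels)]


events = speech_events(
    driver_talking=[True, True, False, False, True, False],
    gunner_talking=[False, False, False, True, True, False],
)
```

Here `events` starts with a one-second Driver event, followed by half-second Neither, Gunner, Both, and Neither events.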

2004.10.11 - SLIDE 20 IS246 - FALL 2004

Dialog-Based Cutting

• AutoBuddy uses a set of cutting rules that cut between shots based on patterns of speech events

• Example: if Driver speaks and then Gunner speaks, show Driver and cut to Gunner slightly before Gunner starts to speak

• Example: if there is a long pause between Driver and Gunner speaking, cut to the game video

[Figure: a row of input speech events aligned with a row of output video cuts; labels include Driver, Gunner, and Pause]
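The two example rules can be sketched as a small function that turns speech events into a cut list. This is a hedged illustration only: the actual AutoBuddy rule set is not given in the slides, and the function name `cut_list`, the `lead` time of 0.25 seconds, and the `long_pause` threshold of 2 seconds are all assumed values chosen for the example.

```python
def cut_list(events, lead=0.25, long_pause=2.0):
    """Turn (speaker, duration) speech events into (shot, start_time) cuts.

    Assumed rules, modeled on the two examples in the slide:
    - when a new speaker starts, cut to that speaker `lead` seconds early;
    - when nobody speaks for at least `long_pause` seconds, cut to the game;
    - shorter pauses hold the current shot.
    """
    cuts = []
    t = 0.0
    for speaker, duration in events:
        if speaker == "Neither":
            shot = "Game" if duration >= long_pause else None
        else:
            shot = speaker
        if shot is not None and (not cuts or cuts[-1][0] != shot):
            start = t
            if cuts and shot != "Game":
                # Anticipate the speaker, but never before the previous cut.
                start = max(cuts[-1][1], t - lead)
            cuts.append((shot, start))
        t += duration
    return cuts


cuts = cut_list([
    ("Driver", 2.0),
    ("Gunner", 1.5),
    ("Neither", 3.0),
    ("Driver", 1.0),
])
```

With these inputs the Gunner shot starts at 1.75 seconds, a quarter second before the Gunner begins speaking at 2.0, and the long pause at 3.5 seconds triggers a cut to the game video.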

2004.10.11 - SLIDE 21 IS246 - FALL 2004

AutoBuddy Composite Shots

• Driver, Gunner, and Both shots are multi-layer composites

• The view out of the car's rear window is generated from the video game's rear-view mirror image
– Flipped, scaled, smoothed, and placed

• Back of car is a static image from a 3D model

• Images of Driver/Gunner are generated by background subtraction

• Front of car is a static image from a 3D model

2004.10.11 - SLIDE 22 IS246 - FALL 2004

AutoBuddy Special Effects

• Car is shaken based on “gas pedal”
– Gas pedal parsed from acceleration indicator in game video
– Car and people are shaken 90 degrees out of phase

• Gunfire art added to frames based on game audio
– Audio Streams used to detect gunfire in game audio
– Able to detect gunfire even when other audio effects are present

• Explosions
– Game video analyzed to determine when explosions happen
– Images are lightened and rumbled during explosions
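One way to read "shaken 90 degrees out of phase" is as two sinusoidal offsets driven by the same gas-pedal value, with the people layer a quarter cycle ahead of the car layer. The sketch below assumes exactly that; the function name `shake_offsets`, the pixel amplitude, and the shake frequency are hypothetical and do not come from the slides.

```python
import math


def shake_offsets(t, gas_pedal, max_pixels=4.0, hertz=8.0):
    """Return (car_offset, people_offset) in pixels at time t seconds.

    Assumptions: gas_pedal is a value in [0, 1] parsed from the game video,
    and both layers share one frequency with a 90-degree phase difference.
    """
    amplitude = max_pixels * max(0.0, min(1.0, gas_pedal))
    phase = 2 * math.pi * hertz * t
    car = amplitude * math.sin(phase)
    people = amplitude * math.sin(phase + math.pi / 2)  # quarter cycle ahead
    return car, people


car, people = shake_offsets(t=0.0, gas_pedal=1.0)
```

At `t = 0` with the pedal fully down, the car layer sits at its rest position while the people layer is at its maximum displacement, so the two never peak together.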

2004.10.11 - SLIDE 23 IS246 - FALL 2004

Computation for Designing Artifacts

• Four computational ideas/techniques from Carlo Sequin
– Procedural generation
– Parameterization
– Optimization
– Evolutionary power

2004.10.11 - SLIDE 24 IS246 - FALL 2004

Procedural Generation

• Rather than creating artifacts directly, the user may design a generating program that will then generate the desired artifact

• The empowering aspect of this approach is that the generating procedure will not just create the one artifact originally desired, but, with minor variations to the program, it can produce many different artifacts that may all fit a specified set of constraints or usage

2004.10.11 - SLIDE 25 IS246 - FALL 2004

Parameterization

• For classes of frequently needed artifacts, the procedural generation mentioned above can be captured in a robust and more general program that contains a modest number of parameters that can be easily adjusted by non-programming users

• A judicious selection and coupling of such parameters can enhance the likelihood that any arbitrary combination of parameters still produces a meaningful output, although it may be far from desirable or optimal with respect to some specific application

• However, the ease of modifying the parameter values and previewing the expected outcome would allow even novice users to achieve better results than unaided users typically obtain
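A minimal sketch of a parameterized generator, assuming a regular polygon as the artifact (the slides do not name a specific one). The function `regular_polygon` and its three parameters are hypothetical; the clamping lines illustrate the point that any arbitrary combination of parameter values should still yield a meaningful output.

```python
import math


def regular_polygon(sides=4, radius=1.0, rotation_degrees=0.0):
    """Generate the vertices of a regular polygon centered on the origin."""
    # Clamp parameters so that every input still produces a valid shape.
    sides = max(3, int(sides))
    radius = max(0.0, float(radius))
    start = math.radians(rotation_degrees)
    return [
        (radius * math.cos(start + 2 * math.pi * k / sides),
         radius * math.sin(start + 2 * math.pi * k / sides))
        for k in range(sides)
    ]


hexagon = regular_polygon(sides=6, radius=2.0)
fallback = regular_polygon(sides=1, radius=-3.0)  # clamped to a zero-size triangle
```

A non-programming user only adjusts `sides`, `radius`, and `rotation_degrees`; the generating procedure itself stays untouched.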

2004.10.11 - SLIDE 26 IS246 - FALL 2004

Optimization

• Given that the tedium of creating individual artifacts can be greatly reduced by procedural generation, users can explore a far larger space of possibilities than they could if they had to craft each artifact individually

• This allows them to home in on a more optimal solution than they could by building a few prototypes

• If the constraints and goal functions are well understood, then the generating program may contain its own evaluation loop that allows it to explore many options on its own and gradually converge towards a local optimum
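The evaluation loop described in the last bullet can be sketched as simple hill climbing over a single parameter. This is one possible technique, swapped in for illustration rather than taken from the source; the goal function `score`, the target size of 7.3, and the name `hill_climb` are all assumptions.

```python
def hill_climb(score, start, step=1.0, min_step=1e-3):
    """Adjust one parameter until no nearby value scores better.

    Tries a step in each direction, keeps the better candidate if it
    improves the score, and otherwise halves the step. The result is a
    local optimum of `score`, not necessarily the global one.
    """
    best = start
    while step >= min_step:
        candidate = max((best - step, best + step), key=score)
        if score(candidate) > score(best):
            best = candidate
        else:
            step /= 2
    return best


def score(size):
    # Hypothetical goal function: the ideal artifact size is assumed to be 7.3.
    return -(size - 7.3) ** 2


best_size = hill_climb(score, start=1.0)
```

Because the loop only ever accepts improvements in its immediate neighborhood, it converges gradually and can stop at a local optimum, which matches the caveat in the slide.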

2004.10.11 - SLIDE 27 IS246 - FALL 2004

Evolutionary Power

• The ease of exploration afforded by the use of procedural generation permits an informed user to more clearly see and locate the boundaries of the paradigm captured in a generating program

• By making these boundaries more visible, it also becomes more obvious to ask what lies beyond

• Often such questions can be answered with a modest re-programming effort that enlarges the scope of the generator

2004.10.11 - SLIDE 29 IS246 - FALL 2004

Discussion Questions

• Prof. Davis on Computation
– How can we think about motion pictures and computation in terms of each other as “representational” systems?
– How could we describe an editing process computationally?
– How would we need to describe video to be able to operate on it computationally, i.e., to program for and/or with it?

2004.10.11 - SLIDE 32 IS246 - FALL 2004

Readings for Next Week

• Monday 10/18 Computational Media Theory
– Course Reader
  • Manovich, L. The Language of New Media. The MIT Press, Cambridge, Massachusetts, 2001; pp. 19-61.
  • Bloch, G.R. From Concepts to Film Sequences, Yale University Department of Computer Science, 1987; pp. 1-8.
  • Dorai, C. and Venkatesh, S. Computational Media Aesthetics: Finding Meaning Beautiful. IEEE Multimedia, 8 (4); pp. 10-12.
  • RECOMMENDED: Davis, M. and Levitt, D. Time-Based Media Processing System (US Patent 6,243,087), Interval Research Corporation, USA, 2001; pp. 1-20.

2004.10.11 - SLIDE 33 IS246 - FALL 2004

Readings for Next Week

• Wednesday 10/20 Assignment 3 (Short Media Production) Overview and Ideation
– Course Reader
  • Zettl, H. Video Basics 3. Wadsworth, Belmont, CA, 2001; pp. 26-42.
– Textbook
  • Bordwell, D. and Thompson, K. Film Art: An Introduction. McGraw Hill, New York, 2004; pp. 2-47.

2004.10.11 - SLIDE 34 IS246 - FALL 2004

Extras

2004.10.11 - SLIDE 35 IS246 - FALL 2004

Universal Turing Machine

• An abstract representation of a computing device
– It has a read/write/erase head that scans a (possibly infinite) one-dimensional (bi-directional) tape divided into squares, each of which can be inscribed with a symbol (e.g., 0 or 1)
– Computation begins with the machine in a given “state” scanning a given square
– It can erase what it finds on the square and print a symbol (e.g., 0 or 1), move to an adjacent square, and go into a new state
– A table of instructions specifies, for each state, what the machine should write, which direction it should move in, and which state it should go into
– The machine’s behavior is determined by three parameters
  • The state the machine is in
  • The square it is scanning
  • A table of instructions
– The table can list only finitely many states, each of which becomes implicitly defined by the role it plays in the table of instructions
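The description above maps onto a short simulator. This is a minimal sketch, assuming a table keyed by (state, symbol) that yields (symbol to write, direction, next state); the names `run_turing_machine` and `FLIP`, the blank symbol `_`, and the step limit are illustrative choices, not part of the lecture.

```python
from collections import defaultdict


def run_turing_machine(table, tape, state="start", halt="halt",
                       blank="_", max_steps=10_000):
    """Run a one-tape machine and return the final tape contents.

    `table` maps (state, scanned_symbol) to (write, move, next_state),
    where move is "L" or "R". The tape grows on demand in both directions.
    """
    cells = defaultdict(lambda: blank, enumerate(tape))
    head = 0
    for _ in range(max_steps):
        if state == halt:
            break
        write, move, state = table[(state, cells[head])]
        cells[head] = write
        head += 1 if move == "R" else -1
    if not cells:
        return ""
    low, high = min(cells), max(cells)
    return "".join(cells[i] for i in range(low, high + 1)).strip(blank)


# Example table: invert every bit, then halt at the first blank square.
FLIP = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", "_"): ("_", "R", "halt"),
}

result = run_turing_machine(FLIP, "1011")
```

The single state `start` has no meaning outside the table; it is defined only by the rows that mention it, as the last bullet notes.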

2004.10.11 - SLIDE 36 IS246 - FALL 2004

von Neumann Machine

• Stored program
– Program is no longer external to machine itself
– A “conditional control transfer” permitted the program sequence to be interrupted and reinitiated at any point
– Computer can modify both its data and programs

• von Neumann machines allowed the construction of practical digital computers

2004.10.11 - SLIDE 37 IS246 - FALL 2004

Discussion Questions (Hillis)

• Jeremy Kashnow on Hillis
– Gregory Bateson defined information as “the difference that makes a difference.” What differences make a difference when developing multimedia metadata?
  • Are metadata requirements application-specific?
  • How do we develop a system that is general enough to be uniform across applications, yet specific enough to be useful?

2004.10.11 - SLIDE 38 IS246 - FALL 2004

Discussion Questions (Hillis)

• Jeremy Kashnow on Hillis
– What types of multimedia metadata can be computed algorithmically or heuristically and what types must be created organically (with human intervention)?
  • Can all metadata be computed using a universal computing machine if given enough time and memory space, or are there some facets that require a different computing mechanism?
  • Are there noncomputable problems in this space?

2004.10.11 - SLIDE 39 IS246 - FALL 2004

Discussion Questions (Hillis)

• Melanie Feinberg on Hillis on Computation
– Computers understand two things: on and off (or zero and one, or true and false). These binary data points can be manipulated by very few logical statements (and, or, not). Through successive levels of abstraction, these building blocks can represent more complex structures. For a computer, then, aren't representations of words and pictures, or anything for that matter, essentially similar (as long as a representation and a mapping to a lower level of abstraction actually exists)? What implications does that have for media metadata?

2004.10.11 - SLIDE 40 IS246 - FALL 2004

Discussion Questions (Hillis)

• Melanie Feinberg on Hillis on Computation
– If, as Saussure says, signs are shaped by social forces over time, what does this mean regarding the ability of signs to be expressed in a way that computers understand? What does this imply regarding the representation of semes and syntagms?

2004.10.11 - SLIDE 41 IS246 - FALL 2004

Discussion Questions (Winograd)

• Anita Wilhelm on Winograd and Flores
– When discussing the representational structures of computer systems, Winograd and Flores state that “Straightforward mappings (such as simply storing English sentences) raise insuperable problems of effectiveness. The operations for coming to a conclusion are no longer the well-understood operations of arithmetic, but call for some kind of higher-level reasoning” (p. 85).
– With this in mind, how are the basic elements of language, as we have previously discussed, different from the basic elements of film, as Eco has depicted (semes, signs, figures)? How are they the same?

2004.10.11 - SLIDE 42 IS246 - FALL 2004

Discussion Questions (Winograd)

• Anita Wilhelm on Winograd and Flores
– Based on these similarities or differences, then, does it make sense to try to describe film metadata in verbal language form? Or is there a better language system which we could use (such as the iconic language we have mentioned in class)? If so, what characteristics would that language system have to have? What are the problems, if any, with only using a verbal language system?

2004.10.11 - SLIDE 43 IS246 - FALL 2004

Discussion Questions (Winograd)

• Anita Wilhelm on Winograd and Flores
– Discussing programs that produce visual output, Winograd and Flores state: “One such program produces figures containing circular forms and might be appropriately described as ‘drawing a circle’ even though the concept of circle did not play a role in the design of its mechanism at any level.”
– With that said, since Metz sees the fundamental element of film to be a frame, would an iconic (pictorial, frame-like) language really be the best descriptor we can achieve? Or is there something even better? If so, what kind of system would be better?

2004.10.11 - SLIDE 44 IS246 - FALL 2004

Discussion Questions (Winograd)

• Erick Herrarte on Winograd and Flores
– Winograd and Flores write, “The problem is that representation is in the mind of the beholder. There is nothing in the design of the machine or the operation of the program that depends in any way on the fact that the symbol structures are viewed as representing anything at all.” The reading said little about the graphical user interface of applications. How do advances in user interface design help bring the representations in the mind of the beholder closer to the intended representation?

2004.10.11 - SLIDE 45 IS246 - FALL 2004

Discussion Questions (Winograd)

• Erick Herrarte on Winograd and Flores
– Where would Winograd and Flores put the user interface in their levels of abstraction? Would they create a level above "high-level language"? Can we separate a program from its user interface?
– Do devices such as a cell phone or PDA provide affordances and hints to representations in the domain through their design?

2004.10.11 - SLIDE 46 IS246 - FALL 2004

Discussion Questions (Winograd)

• Ana Ramirez on Winograd and Flores
– The abstraction boundaries must be crossed occasionally in order to increase performance. One such case is in systems that must operate in real-time. How does this show up in multimedia systems?

2004.10.11 - SLIDE 47 IS246 - FALL 2004

Discussion Questions

• On Winograd and Flores (Ana Ramirez)
– Winograd and Flores discuss the importance of abstraction and separation of concerns in computing. Increased levels of abstraction and separation of concerns result in an increase in programmer productivity and sometimes a decrease in performance. Moore's law has shifted the emphasis to programmer productivity: because computers double in speed every 18 months, most performance problems disappear once the newest processor comes out. We are quickly approaching a point where Moore's law will no longer hold. Will we find a new technology that continues this rapid increase in processor speed, or will the emphasis move from programmer productivity back to program performance? How will the role of abstraction and separation of concerns change?

2004.10.11 - SLIDE 48 IS246 - FALL 2004

Discussion Questions

• On History of Computation (Catherine Lai)
– Given the pervasive presence of computing in modern science and technology, has the history of computing established a significant presence in the history of science and technology?
– Is technology the creator of demand or a response to it?
– What role do governments play in fostering and directing technological innovation and development?
– What does the term “computer” mean to you? Have we prematurely united its multiple historical resources into one thing? What about computing? Is there a dual nature or tripartite structure to it?