CISC 7610 Lecture 6 Midterm review - Michael I Mandel - m.mr-pc.org/t/cisc7610/2018fa/midtermReview.pdf

CISC 7610 Lecture 6
Midterm review

Topics:

● Midterm instructions

● Quick quick review

● Example/practice questions

Midterm instructions

1) This is a closed-book, closed-note exam, except for one 8.5x11-inch sheet of notes.

2) You may not consult with any other person during the exam.

3) You may not use any communication device or computer during the exam.

4) You have 125 minutes to finish.

5) Write all work on the exam paper. Use the reverse side if needed, but clearly indicate when you do.

6) The exam will be graded out of 100 points, but 110 points are available, so 10 of them are bonus points.

7) It is highly recommended that you provide some answer for every question so you can receive partial credit. Unanswered questions will receive 0 points.


Quick quick review


CISC 7610 Lecture 1
Introduction to multimedia databases

● Main question

– How can a system process and store multimedia data so that users can find what they are looking for in the future?

● Examples of multimedia databases

– Distribution and interaction for audio, video, and images

– Production and distribution for audio and video

– Surveillance and intelligence for speech, images, video

● Issues related to multimedia datatypes

– Content-based queries

– Storage of audio and video


CISC 7610 Lecture 2a
Review of relational databases

● Relational database management systems

– uses relational data structures / representation

– has a declarative data manipulation language

● Structured query language (SQL)

● Example data modeling problem: music collection

● Entity-relationship diagrams

– draw ER diagram for example problem

– convert ER diagram to schema

– create tables, insert data

– query data

● Normalization: remove redundancy, link with foreign keys
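
The create / insert / query steps above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the schema from the lecture: the table names, column names, and sample rows are all made up for the example.

```python
import sqlite3

# In-memory database; all table, column, and row names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE artists (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE albums (
        id        INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        artist_id INTEGER NOT NULL REFERENCES artists(id)
    );
""")

# Normalized: the artist name is stored once and linked by a foreign key.
conn.execute("INSERT INTO artists (id, name) VALUES (1, 'Artist A')")
conn.executemany(
    "INSERT INTO albums (title, artist_id) VALUES (?, ?)",
    [("Album X", 1), ("Album Y", 1)],
)

# Declarative query: states what is wanted, not how to find it.
rows = conn.execute("""
    SELECT artists.name, albums.title
    FROM albums JOIN artists ON albums.artist_id = artists.id
    ORDER BY albums.title
""").fetchall()
print(rows)
```

Storing the artist once and joining at query time is the redundancy removal that the normalization bullet refers to.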


CISC 7610 Lecture 2b
The beginnings of NoSQL

● Big data: volume, velocity, variety, veracity, value

● Google’s infrastructure: dependable software on commodity hardware; GFS, MapReduce, BigTable

● Hadoop: open-source Google infrastructure; HDFS, MapReduce, HBase

● Sharding: scale by assigning data to different databases based on a key. Several issues make this difficult

● CAP theorem: Consistency, Availability, Partition-tolerance

● Amazon’s Dynamo: key-value store, consistent hashing, NWR notation
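
Consistent hashing can be sketched as a toy ring in Python. This is an illustration of the idea only, not Dynamo's implementation: the class name, the node names, the use of MD5, and the 100 virtual nodes per server are all assumptions.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (any stable hash would do)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes (hypothetical names)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        """Place vnodes points for this node on the ring."""
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def lookup(self, key):
        """Return the node owning the first point clockwise from the key."""
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


keys = [f"img{i}.jpg" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.lookup(k) for k in keys}
ring.add("node-d")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
# Only keys claimed by the new node change owner; the rest stay put.
print(len(moved), all(after[k] == "node-d" for k in moved))
```

Compare this with sharding by `hash(key) % number_of_nodes`, where adding a node reassigns most keys.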


CISC 7610 Lecture 3
Multimedia data and data formats

● Redundant vs irrelevant information

● Perceptual limits of multimedia data: audio and video

● JPEG encoding of images

– 8x8 blocks, YCbCr, DCT, Quantization, Encoding

● MPEG encoding of audio

– quantization in audio, multiband quantization

– hide quantization noise behind louder sounds

● MPEG and H.264 encoding of video

– I, P, B frames
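
The DCT and quantization steps of JPEG can be tried on a single 8x8 block in plain Python. This is a simplified sketch: it uses one quantization step size for every coefficient, whereas real JPEG uses a table with a different step per frequency, and it omits the color conversion and entropy coding stages. The sample block is made up.

```python
import math

N = 8


def dct2(block):
    """2-D DCT-II of an 8x8 block, with the scaling used in JPEG."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out


def quantize(coeffs, step):
    """Uniform quantization: divide by a step size and round to an integer."""
    return [[round(value / step) for value in row] for row in coeffs]


# A smooth block: a gentle horizontal ramp, level-shifted by 128 as in JPEG.
block = [[(100 + 2 * y) - 128 for y in range(N)] for x in range(N)]
q = quantize(dct2(block), step=10)  # single step size is a simplification
nonzero = sum(1 for row in q for value in row if value != 0)
print(q[0][0], nonzero, "nonzero of", N * N)
```

For a smooth block like this one, nearly all quantized coefficients are zero, which is what makes the later encoding step effective.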


CISC 7610 Lecture 3b
Metadata and APIs

● Metadata is data about data

● Created automatically when media are recorded

– EXIF metadata on JPEG images when captured

– ID3 metadata on MP3 files when created

● Derived by machine perception algorithms running on remote servers via APIs

– RESTful API: a web service with well-defined semantics and syntax for something

● On HTTP: Manipulate resources using HTTP verbs, get JSON back

– Various commercial APIs for analyzing images, video, speech in various ways
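
A call to such an API might look roughly like the following. The endpoint URL, the key, the request fields, and the response shape are all hypothetical placeholders, not any real service's API. The request is built but not sent, so the sketch runs offline.

```python
import json
import urllib.request

# Hypothetical endpoint and key: replace with a real service's values.
API_URL = "https://api.example.com/v1/images/labels"
API_KEY = "YOUR_API_KEY"


def build_request(image_url):
    """Build (but do not send) a POST request asking for image labels."""
    body = json.dumps({"image_url": image_url, "features": ["labels"]})
    return urllib.request.Request(
        API_URL,
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


def parse_labels(response_text):
    """Pull (label, score) pairs out of a JSON response of the assumed shape."""
    payload = json.loads(response_text)
    return [(item["name"], item["score"]) for item in payload["labels"]]


request = build_request("https://example.com/cat.jpg")
# Sending would be: urllib.request.urlopen(request).read().decode("utf-8")
sample_response = '{"labels": [{"name": "cat", "score": 0.98}]}'
print(request.get_method(), parse_labels(sample_response))
```

The returned labels are derived metadata that can be stored alongside the image and queried later.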


CISC 7610 Lecture 4
Approaches to multimedia databases

● Graph databases: Neo4j

– Relationships have properties too

– Cypher query language

– Converting to/from relational model

● Document databases: MongoDB

– Like a key-value store, but with self-documenting values

– Like a traditional RDBMS, but more scalable, with less mismatch with object-oriented programming


Example/practice questions


Practice question: DB Comparison

Characteristic                  RDBMS    KV        Graph    Doc
[Specifically]                  MySQL    Dynamo    Neo4j    MongoDB
Schema heterogeneity
Hierarchical data
Joins
Non-programmer ad hoc queries
OO impedance mismatch
Distributed reads
Distributed writes
Multi-master consistency
Multi-master availability
ACID transactions

For each database, rate each characteristic as “high”, “medium”, or “low”


Practice question: Video interaction

● Consider a multimedia database designed to facilitate discussions about videos.

● The database consists of the following entities:

– People: have a user name and a date of birth

– Videos: have a date of creation and a user who uploaded them

– Comments: have a date of creation, a user who created them, a video they are responding to, and optionally another comment that they are responding to

● Draw an ER diagram representing these data

● Discuss the pros and cons of implementing this model in:

– RDBMS, Graph database, Document database, key-value store
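
For the RDBMS option, one possible schema can be sketched with Python's sqlite3 module. This is one design among several, not a model answer: the table names, column names, and sample rows are assumptions. The optional reply-to link becomes a nullable self-referencing foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE people (
        user_name     TEXT PRIMARY KEY,
        date_of_birth TEXT NOT NULL
    );
    CREATE TABLE videos (
        id       INTEGER PRIMARY KEY,
        created  TEXT NOT NULL,
        uploader TEXT NOT NULL REFERENCES people(user_name)
    );
    CREATE TABLE comments (
        id        INTEGER PRIMARY KEY,
        created   TEXT NOT NULL,
        author    TEXT NOT NULL REFERENCES people(user_name),
        video_id  INTEGER NOT NULL REFERENCES videos(id),
        parent_id INTEGER REFERENCES comments(id)  -- NULL for top-level comments
    );
""")
conn.execute("INSERT INTO people VALUES ('alice', '1990-01-01')")
conn.execute("INSERT INTO people VALUES ('bob', '1992-02-02')")
conn.execute("INSERT INTO videos VALUES (1, '2018-10-01', 'alice')")
conn.execute("INSERT INTO comments VALUES (1, '2018-10-02', 'bob', 1, NULL)")
conn.execute("INSERT INTO comments VALUES (2, '2018-10-03', 'alice', 1, 1)")

# Walking a reply thread needs a recursive common table expression.
thread = conn.execute("""
    WITH RECURSIVE thread(id, author, depth) AS (
        SELECT id, author, 0 FROM comments
        WHERE parent_id IS NULL AND video_id = 1
        UNION ALL
        SELECT c.id, c.author, t.depth + 1
        FROM comments AS c JOIN thread AS t ON c.parent_id = t.id
    )
    SELECT id, author, depth FROM thread ORDER BY depth
""").fetchall()
print(thread)
```

The recursive query is a useful point for the discussion: the relational model enforces the references, but following arbitrarily deep reply chains is more awkward than in a graph database, where it is an ordinary traversal.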


Practice question: Video surveillance database

● Consider a database to store surveillance videos from an airport terminal.

● Compression: How does video compression use spatial and temporal coherence to save space?

● Explain I-frames, P-frames, and B-frames in MPEG-2 video compression

● To save space, we want to drop certain frames. Which ones should we drop?
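
One way to reason about the last question is to ask which frames no other frame depends on. The sketch below assumes MPEG-2 style prediction, where only I- and P-frames serve as references, so B-frames can be removed without breaking the decoding of anything else. The function names and the 12-frame group of pictures are illustrative.

```python
def droppable(frame_types):
    """Return indices of frames that no other frame uses as a reference.

    Assumes MPEG-2 style prediction: only I- and P-frames are references,
    so every B-frame is safe to drop.
    """
    return [i for i, t in enumerate(frame_types) if t == "B"]


def drop_frames(frame_types):
    """Keep only the frames that other frames may depend on (I and P)."""
    skip = set(droppable(frame_types))
    return [t for i, t in enumerate(frame_types) if i not in skip]


gop = list("IBBPBBPBBPBB")  # an example group of pictures in display order
kept = drop_frames(gop)
print("".join(kept), f"{len(kept)}/{len(gop)} frames kept")
```

Dropping a P-frame instead would break every later frame predicted from it, up to the next I-frame.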


Practice question: Image sharing site

● Your server is currently at 10% of its write capacity, but 80% of its read capacity

– What would be the most cost-effective way to scale it?

● Your server is getting very popular in China, but images are loading too slowly

– What would be the most cost-effective way to scale it?

Page 17: CISC 7610 Lecture 6 Midterm review - Michael I Mandelm.mr-pc.org/t/cisc7610/2018fa/midtermReview.pdf · CISC 7610 Lecture 1 Introduction to multimedia databases Main question –

Short answer

● What is the difference between a declarative and imperative query language? Name one of each

● Describe four issues that make sharding difficult or undesirable to implement

● Describe the CAP theorem from the perspective of a program running on a database node

● Describe different object linking models in a document database. What are their pros and cons?
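
For the last question, two common linking models (embedding and referencing) can be compared using plain Python dictionaries as stand-ins for documents. The field names and sample data are hypothetical, and no database is involved.

```python
# Embedded model: comments live inside the video document.
video_embedded = {
    "_id": "v1",
    "title": "Lecture 6",
    "comments": [
        {"author": "alice", "text": "Great review"},
        {"author": "bob", "text": "Thanks"},
    ],
}

# Referenced model: comments are separate documents pointing at the video.
videos = {"v1": {"_id": "v1", "title": "Lecture 6"}}
comments = [
    {"_id": "c1", "video_id": "v1", "author": "alice", "text": "Great review"},
    {"_id": "c2", "video_id": "v1", "author": "bob", "text": "Thanks"},
]


def comments_embedded(video):
    """One lookup: the children arrive with the parent document."""
    return [c["text"] for c in video["comments"]]


def comments_referenced(video_id):
    """Second lookup (an application-side join) to collect the children."""
    return [c["text"] for c in comments if c["video_id"] == video_id]


print(comments_embedded(video_embedded) == comments_referenced("v1"))
```

Embedding gives fast reads of a whole document but lets it grow without bound and duplicates shared data. Referencing keeps documents small and avoids duplication at the cost of extra lookups.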

Page 18: CISC 7610 Lecture 6 Midterm review - Michael I Mandelm.mr-pc.org/t/cisc7610/2018fa/midtermReview.pdf · CISC 7610 Lecture 1 Introduction to multimedia databases Main question –

Map-reduce

● Consider the following documents

● We want to use the Map-reduce framework to identify the number of times each pair of images occurs on the same page

● Perform by hand the following operations:

– Map: input 1 page, output key-value pairs

– Shuffle: group k-v pairs with same key together

– Reduce: summarize values for each key

D1: img1.jpg, img2.jpg, img3.jpg

D2: img2.jpg, img3.jpg, img4.jpg

D3: img1.jpg, img3.jpg, img5.jpg

D4: img2.jpg, img4.jpg, img6.jpg
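
To check a hand-worked answer, the three phases can be simulated in plain Python on the four pages above. This runs in one process and only mimics the Map-reduce data flow. The function names are illustrative and are not part of Hadoop or any other framework.

```python
from collections import defaultdict
from itertools import combinations

pages = {
    "D1": ["img1.jpg", "img2.jpg", "img3.jpg"],
    "D2": ["img2.jpg", "img3.jpg", "img4.jpg"],
    "D3": ["img1.jpg", "img3.jpg", "img5.jpg"],
    "D4": ["img2.jpg", "img4.jpg", "img6.jpg"],
}


def map_page(images):
    """Map: one page in, one ((imgA, imgB), 1) pair out per image pair."""
    return [(pair, 1) for pair in combinations(sorted(set(images)), 2)]


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce: sum the values for each key."""
    return {key: sum(values) for key, values in groups.items()}


mapped = [kv for images in pages.values() for kv in map_page(images)]
counts = reduce_counts(shuffle(mapped))
for pair, n in sorted(counts.items()):
    print(pair, n)
```

Sorting the image names inside the mapper ensures that (img2, img3) and (img3, img2) end up under the same key.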


Cypher, API, JSON

Consider the following JSON docs

{"name": "Brooklyn College",

"leader": "Michelle J. Anderson",

"departments": [

{"name": "Finance"},

{"name": "Special projects"},

{"name": "Enrollment management"},

{"name": "Student affairs"},

{"name": "Academic affairs"}

],

"workers": [

{"name": "Tony Thomas"},

{"name": "Michael Hewitt"}

]

}

{"name": "Academic affairs",

"leader": "Anne Lopes",

"departments": [

{"name": "School of Business"},

{"name": "School of Humanities and Social Sciences"},

{"name": "School of Natural and behavioral sciences"},

{"name": "School of Visual, Media and Performing Arts"}

],

"workers": [

{"name": "Sara Crosby"},

{"name": "Lisa Schwebel"},

{"name": "Sabrina Cerezo"}

]

}

{"name": "Computer and Information Science",

"leader": "Yedidyah Langsam",

"departments": [

],

"workers": [

{"name": "Parikh, Rohit"},

{"name": "Raphan, Theodore"},

{"name": "Arnow, David M."},

{"name": "Augenstein, Moshe"},

{"name": "Bar-Noy, Amotz"},

{"name": "Dexter, Scott D."},

{"name": "Langsam, Yedidyah"},

{"name": "Rudowsky, Ira"},

{"name": "Sokol, Dina"},

{"name": "Tenenbaum, Aaron"},

{"name": "Weiss, Gerald"},

{"name": "Whitlock, Paula"},

{"name": "Yanofsky, Noson S."},

{"name": "Yarmish, Gabriel"},

{"name": "Zhou, Neng-Fa"},

{"name": "Ziegler, Chaim"},

{"name": "Cox, James L."},

{"name": "Mandel, Michael"},

{"name": "Schnabolk, Charles"},

{"name": "Thurm, Joseph"},

{"name": "Chen, Hui"},

{"name": "Cogan, Eva"},

{"name": "Halevi, Tzipora"},

{"name": "Kletenik, Devorah"},

{"name": "Levitan, Rebecca"}

]

}

{"name": "School of Natural and behavioral sciences",

"leader": "Kleanthis Psarris",

"departments": [

{"name": "Anthropology and Archaeology"},

{"name": "Biology"},

{"name": "Chemistry"},

{"name": "Computer and Information Science"},

{"name": "Earth and Environmental Sciences"},

{"name": "Health and Nutritional Sciences"},

{"name": "Kinesiology"},

{"name": "Mathematics"},

{"name": "Physics"},

{"name": "Psychology"}

],

"workers": [

{"name": "Crystal Schloss"}

]

}


Cypher, API, JSON

● Write a Cypher query to insert all of these documents into a Neo4j database according to the schema below
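
The schema figure is not reproduced in this transcript, so the sketch below assumes one: `Unit` and `Person` nodes with a `name` property, linked by `LEADS`, `HAS_DEPARTMENT`, and `WORKS_IN` relationships. Those labels and relationship types are guesses, not the course's schema. The Python code turns one JSON document into parameterized Cypher `MERGE` statements and prints them. It does not connect to a database.

```python
import json

# Abbreviated copy of one document from above (lists shortened for space).
doc = json.loads("""
{"name": "Academic affairs",
 "leader": "Anne Lopes",
 "departments": [{"name": "School of Business"}],
 "workers": [{"name": "Sara Crosby"}]}
""")


def to_cypher(doc):
    """Return (statement, parameters) pairs that merge one document.

    Labels and relationship types are assumptions made for this sketch.
    """
    unit = doc["name"]
    statements = [(
        "MERGE (u:Unit {name: $unit}) "
        "MERGE (p:Person {name: $person}) "
        "MERGE (p)-[:LEADS]->(u)",
        {"unit": unit, "person": doc["leader"]},
    )]
    for dept in doc["departments"]:
        statements.append((
            "MERGE (u:Unit {name: $unit}) "
            "MERGE (d:Unit {name: $dept}) "
            "MERGE (u)-[:HAS_DEPARTMENT]->(d)",
            {"unit": unit, "dept": dept["name"]},
        ))
    for worker in doc["workers"]:
        statements.append((
            "MERGE (u:Unit {name: $unit}) "
            "MERGE (p:Person {name: $person}) "
            "MERGE (p)-[:WORKS_IN]->(u)",
            {"unit": unit, "person": worker["name"]},
        ))
    return statements


statements = to_cypher(doc)
for text, params in statements:
    print(text, params)
```

`MERGE` rather than `CREATE` keeps a unit that appears in two documents (once as a child, once as a parent) from being inserted twice.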

● Write a Cypher query to find all of the departments in Brooklyn College

● Write a Cypher query to find the path between the Computer and Information Science department and Brooklyn College

● Write a Cypher query to find all of the people who work in the School of Natural and Behavioral Sciences

● Write a Cypher query to find all of the subordinates to Anne Lopes
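
To check what the first two queries should return, the same traversal can be done by hand in plain Python. The dictionary repeats the department lists from the JSON documents above. The function names are illustrative, and "departments in Brooklyn College" is read here as including nested ones, which is an interpretation of the question. In Cypher, a variable-length relationship pattern plays the role of the loop and the recursion below.

```python
# Parent unit -> child units, copied from the JSON documents above.
departments = {
    "Brooklyn College": [
        "Finance", "Special projects", "Enrollment management",
        "Student affairs", "Academic affairs",
    ],
    "Academic affairs": [
        "School of Business",
        "School of Humanities and Social Sciences",
        "School of Natural and behavioral sciences",
        "School of Visual, Media and Performing Arts",
    ],
    "School of Natural and behavioral sciences": [
        "Anthropology and Archaeology", "Biology", "Chemistry",
        "Computer and Information Science",
        "Earth and Environmental Sciences",
        "Health and Nutritional Sciences", "Kinesiology",
        "Mathematics", "Physics", "Psychology",
    ],
    "Computer and Information Science": [],
}


def all_departments(root):
    """Every unit reachable from root by following department links."""
    found = []
    stack = list(departments.get(root, []))
    while stack:
        unit = stack.pop()
        found.append(unit)
        stack.extend(departments.get(unit, []))
    return found


def path_between(root, target):
    """Depth-first search for the chain of units from root down to target."""
    if root == target:
        return [root]
    for child in departments.get(root, []):
        tail = path_between(child, target)
        if tail:
            return [root] + tail
    return None


print(len(all_departments("Brooklyn College")))
print(" -> ".join(path_between("Brooklyn College",
                               "Computer and Information Science")))
```

The unit names must match exactly between documents, including capitalization, for the links to connect.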