
Page 1: Data Analytics for Big Data

DINAMIC

Data Analytics for Big Data

Vandana P. Janeja

Information Systems Department, University of Maryland, Baltimore County, MD, USA

Page 2: Data Analytics for Big Data

Big Data

• What is Big Data?

• Recently much good science, whether physical, biological, or social, has been forced to confront - and has often benefited from - the Big Data phenomenon.

• Big Data refers to the explosion in the quantity (and sometimes, quality) of available and potentially relevant data, largely the result of recent and unprecedented advancements in data recording and storage technology. (p. 115)

Diebold, F.X. (2003), "Big Data Dynamic Factor Models for Macroeconomic Measurement and Forecasting: A Discussion of the Papers by Reichlin and Watson," in M. Dewatripont, L.P. Hansen and S. Turnovsky (eds.), Advances in Economics and Econometrics: Theory and Applications, Eighth World Congress of the Econometric Society, Cambridge University Press, 115-122.

Page 3: Data Analytics for Big Data

Big data spans four dimensions:

Volume, Velocity, Variety, and Veracity

Page 4: Data Analytics for Big Data

• Volume: Enterprises are awash with ever-growing data of all types, amassing terabytes, petabytes, and even exabytes of information.

– Turn 12 terabytes of Tweets created each day into improved product sentiment analysis

– Convert 350 billion annual meter readings to better predict power consumption
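To put the Volume examples in perspective, the daily and annual figures can be converted into average per-second rates. This is a back-of-the-envelope sketch that assumes decimal terabytes (10^12 bytes), a 365-day year, and perfectly uniform arrivals; real traffic is bursty, so peak rates would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60             # 86,400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY   # assumption: ignores leap years
TB = 10**12                                # assumption: decimal terabyte, not 2**40


def bytes_per_second(bytes_per_day: float) -> float:
    """Average sustained ingest rate for a given daily volume."""
    return bytes_per_day / SECONDS_PER_DAY


def events_per_second(events_per_year: float) -> float:
    """Average event rate for a given annual count."""
    return events_per_year / SECONDS_PER_YEAR


tweet_rate = bytes_per_second(12 * TB)     # 12 TB of Tweets per day
meter_rate = events_per_second(350e9)      # 350 billion meter readings per year

print(f"Tweets: {tweet_rate / 10**6:.1f} MB/s on average")        # Tweets: 138.9 MB/s on average
print(f"Meter readings: {meter_rate:,.0f} readings/s on average")  # Meter readings: 11,098 readings/s on average
```

Even as averages, these rates run around the clock, which is why storage and processing have to scale out rather than be sized for occasional batch jobs.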

Page 5: Data Analytics for Big Data

• Velocity: Sometimes 2 minutes is too late.

– For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value.

– Scrutinize 5 million trade events created each day to identify potential fraud

– Analyze 500 million daily call detail records in real time to predict customer churn faster
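Using data "as it streams in" means deciding on each event when it arrives instead of waiting for a nightly batch. A minimal sketch of that idea is a per-account sliding window that flags bursts of trade events. The event schema `(account_id, timestamp)`, the function `flag_bursts`, and the thresholds `window_seconds` and `max_events` are all hypothetical, and a burst rule like this is only an illustration, not a real fraud model. It assumes events for each account arrive in timestamp order.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

Event = Tuple[str, float]  # hypothetical schema: (account_id, timestamp in seconds)


def flag_bursts(events: Iterable[Event], window_seconds: float = 60.0,
                max_events: int = 3) -> Iterator[Event]:
    """Yield each event that pushes its account above max_events within the
    trailing window. Events are handled one at a time, in arrival order, so a
    flag is raised without waiting for the rest of the stream."""
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    for account, ts in events:
        window = recent[account]
        window.append(ts)
        # Evict timestamps that have fallen out of the trailing window.
        while window and ts - window[0] > window_seconds:
            window.popleft()
        if len(window) > max_events:
            yield account, ts


stream = [("A", 0), ("B", 5), ("A", 10), ("A", 20), ("A", 30), ("B", 200), ("A", 100)]
for account, ts in flag_bursts(stream):
    print(f"possible burst: account {account} at t={ts}s")  # possible burst: account A at t=30s
```

Because `flag_bursts` is a generator, it can sit directly on top of an unbounded feed; memory grows with the number of active accounts and the window size, not with the length of the stream.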

Page 6: Data Analytics for Big Data

• Variety: Big data is any type of data, structured and unstructured: text, sensor data, audio, video, click streams, log files and more. New insights are found when analyzing these data types together.

– Monitor hundreds of live video feeds from surveillance cameras to target points of interest

– Exploit the 80% data growth in images, video and documents to improve customer satisfaction
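One small example of "analyzing these data types together" is joining a structured table with free-form log text: the log says what happened, the table says who it happened to. The sketch below assumes a hypothetical customer table, a hypothetical `customer=... action=...` log format, and a hypothetical helper `errors_by_plan`; lines that do not match the pattern are skipped rather than guessed at.

```python
import re
from collections import Counter

# Structured data: a small customer table (hypothetical fields).
customers = {
    "c1": {"name": "Ana", "plan": "premium"},
    "c2": {"name": "Ben", "plan": "basic"},
}

# Semi-structured data: raw log lines (hypothetical format).
log_lines = [
    "10:00:01 customer=c1 action=view page=/help",
    "10:00:07 customer=c2 action=error page=/checkout",
    "10:01:12 customer=c2 action=error page=/checkout",
    "malformed line without fields",
]

LOG_PATTERN = re.compile(r"customer=(\w+) action=(\w+)")


def errors_by_plan(lines, table):
    """Parse log lines, join them to the customer table on customer id,
    and count error events per subscription plan."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # skip lines that do not fit the expected format
        customer_id, action = match.groups()
        if action == "error" and customer_id in table:
            counts[table[customer_id]["plan"]] += 1
    return counts


print(errors_by_plan(log_lines, customers))
```

Neither source alone answers "which plan hits checkout errors?"; the answer only appears after the join.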

Page 7: Data Analytics for Big Data

• Veracity: 1 in 3 business leaders don’t trust the information they use to make decisions.

– How can you act upon information if you don’t trust it?

– Establishing trust in big data presents a huge challenge as the variety and number of sources grow.

Page 8: Data Analytics for Big Data

Analytics

Page 9: Data Analytics for Big Data

Is it all about algorithms?

Page 10: Data Analytics for Big Data

Page 11: Data Analytics for Big Data

Will it make a difference if some of this data is from France and some from Maryland?

Will it make a difference if some of this data is from LA and some from Baltimore?

Will it make a difference if some of this data is from Maryland and some from D.C.?

Will it make a difference if some of this data is from Howard County, MD and some from Montgomery County, MD?

Page 12: Data Analytics for Big Data

US HIGHWAYS

• 42,000 Americans are killed on highways each year.

• Nearly one-third of all fatal crashes each year are caused by substandard road conditions and roadside hazards.

• Motor vehicle crashes cost the United States $231 billion annually, including $21 billion from Federal and State tax revenue.

• Americans waste $67 billion each year due to congestion.

Ref: http://www.house.gov/transportation/press/press2005/release9.html

According to 2001 statistics, NJ ranks 12th in intersection fatalities, at 32.1% of all state highway fatalities, and 12th in pedestrian fatalities, at 17.7% of all state highway fatalities (USDOT).

Page 13: Data Analytics for Big Data

LA Times 4/27/09 12pm

Page 14: Data Analytics for Big Data

Dr. William Schaffner, chairman of Preventive Medicine at Vanderbilt University Medical Center in Nashville, Tenn., said doctors like him have been advised by the CDC and state health department to set up a system that would test patients with flu-like symptoms and help define how widespread this outbreak is. He said the severity of the virus is hard to gauge because of the wide discrepancy in how it has affected Mexicans and Americans, and because it is occurring in places that are warm, which is very unusual. "The genetic make up of this virus has influenza experts scratching their heads," he said. "One of the things that has us worried is that could this be a virus that could continue to make mischief during the warmest parts of the year. That would be a big thing. For a respiratory virus to be active during the summer months" would be very unique.

CDC Officials Confirm Swine Flu Cases Up to 40; Outbreak May Worsen: ABC News 4/27/09 1pm

Page 15: Data Analytics for Big Data

Data Mining: Concepts and Techniques

Knowledge Discovery (KDD) Process

– Data mining: the core of the knowledge discovery process

[Figure: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
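The KDD stages can be sketched end to end on a toy problem. This is a minimal illustration, assuming two hypothetical source "databases" of shopping baskets and choosing frequent item-pair counting as the mining step; the slide does not prescribe any particular mining task, and the `min_support` threshold of 0.5 is arbitrary.

```python
from collections import Counter
from itertools import combinations

# Databases: two hypothetical sources of baskets keyed by transaction id.
store_db = [
    {"tid": 1, "items": ["milk", "bread"]},
    {"tid": 2, "items": ["milk", "bread", "eggs"]},
    {"tid": 3, "items": None},               # dirty record: missing items
]
web_db = [
    {"tid": 4, "items": ["Milk", "bread"]},  # inconsistent casing
    {"tid": 5, "items": ["eggs", "bread"]},
]


def clean(records):
    """Data cleaning: drop records with missing items and normalize case."""
    return [{"tid": r["tid"], "items": sorted({i.lower() for i in r["items"]})}
            for r in records if r["items"]]


def integrate(*sources):
    """Data integration: merge cleaned sources into one 'warehouse' keyed by tid."""
    warehouse = {}
    for source in sources:
        for r in source:
            warehouse[r["tid"]] = r
    return warehouse


def select(warehouse):
    """Selection: keep only the task-relevant data (the item sets)."""
    return [r["items"] for r in warehouse.values()]


def mine(baskets):
    """Data mining: count how many baskets contain each item pair."""
    counts = Counter()
    for items in baskets:
        counts.update(combinations(items, 2))
    return counts


def evaluate(pair_counts, n_baskets, min_support=0.5):
    """Pattern evaluation: keep pairs whose support meets the threshold."""
    return {pair: c / n_baskets for pair, c in pair_counts.items()
            if c / n_baskets >= min_support}


baskets = select(integrate(clean(store_db), clean(web_db)))
patterns = evaluate(mine(baskets), len(baskets))
print(patterns)  # {('bread', 'milk'): 0.75, ('bread', 'eggs'): 0.5}
```

Note that the casing fix in `clean` matters: without it, "Milk" and "milk" would be counted as different items and the bread/milk pattern would be underestimated.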

Page 16: Data Analytics for Big Data

Big Data Framework

• Automatic parallelization

• Run-time:

– Data partitioning

– Task scheduling

– Handling machine failures

– Managing inter-machine communication

• Completely transparent to the programmer/analyst/user
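The division of labor above can be illustrated with a single-machine sketch in the MapReduce style: the analyst writes only a map function and a reduce function, while partitioning and scheduling happen underneath. Here a thread pool stands in for a cluster, and the helper names (`partition`, `map_chunk`, `reduce_counts`, `word_count`) are hypothetical. A real framework such as Hadoop MapReduce would also handle machine failures and inter-machine communication, which this sketch does not model; Python threads also give no CPU speedup here, so the point is the structure, not performance.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(lines, n_parts):
    """Data partitioning: deal the input lines round-robin into n_parts chunks."""
    return [lines[i::n_parts] for i in range(n_parts)]


def map_chunk(chunk):
    """Map task (analyst-written): count words within one partition."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Reduce task (analyst-written): merge per-partition counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(lines, n_workers=4):
    """Run-time side: partition the data and let the executor schedule
    map tasks onto workers; the caller never manages threads directly."""
    chunks = partition(lines, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_counts(partials)


docs = [
    "big data spans volume velocity variety",
    "big data needs veracity",
    "data mining is the core of knowledge discovery",
]
print(word_count(docs, n_workers=2).most_common(2))  # [('data', 3), ('big', 2)]
```

Because word counts merge associatively, the result does not depend on how the lines were partitioned or in which order the map tasks finished, which is what lets the run-time split and schedule the work freely.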

Page 17: Data Analytics for Big Data

Relevant IS Courses

• IS 410 Introduction to Database Design

• IS 420 Database Application Development

• IS 427 Introduction to Artificial Intelligence: Concepts and Applications

• IS 428 Data Mining Techniques and Applications

• IS 498 Special Topics

• Independent studies