Big & Open Data: Challenges for Smartcity

Big and Open Data: Challenges for Smartcity. Victoria López, Grupo G-TeC (www.tecnologiaUCM.es), Universidad Complutense de Madrid. http://grasia.fdi.ucm.es. ICIST 2014, Valencia.

Upload: grupo-g-tec

Post on 10-May-2015


DESCRIPTION

This work is about how both private enterprise and government wish to improve the value of their data and how they deal with this issue. The talk summarizes ways of thinking about Big Data and Open Data and their use by organizations or individuals. Big Data is explained across its stages: collecting, storing, analyzing and extracting value. This data is collected from numerous sources including sensor networks, government data holdings, company market databases, and public profiles on social networking sites. Organizations use many data analytics techniques to study both structured and unstructured data. Due to the volume, velocity and variety of data, some specific techniques have been developed. MapReduce, Hadoop and related tools such as RHadoop are trending topics nowadays. Data that come from government must be open. Every day more and more cities and countries are opening their data. Open Data is then presented as a specific case of public data with a special role in Smartcity. The main goal of Big and Open Data in Smartcity is to develop systems that are useful for citizens. In this sense RMap (Mapa de Recursos) is shown as an Open Data application, an open system for Madrid City Council, available for smartphones and developed entirely by the research group G-TeC (www.tecnologiaUCM.es).

TRANSCRIPT

Page 1: Big & Open Data:  Challenges for Smartcity

Big and Open data. Challenges for Smartcity

Victoria López
Grupo G-TeC

www.tecnologiaUCM.es
Universidad Complutense de Madrid

http://grasia.fdi.ucm.es

ICIST 2014, Valencia

Page 2: Big & Open Data:  Challenges for Smartcity

Index

• Introduction
• Fighting with Big Data: genome data
• What is Big Data?
• Technology transfer: Open Data opportunities
• Developing projects for Smartcity
• RMap, a real example in Madrid
• Conclusions

Page 3: Big & Open Data:  Challenges for Smartcity

Introduction

– Mobile technologies
– Intelligent agents
– Optimization and forecasting
– Bioinformatics, biostatistics
– …

– www.tecnologiaUCM.es

Page 4: Big & Open Data:  Challenges for Smartcity

Fighting with the Big Data

• Every day we need to deal with more and more data.
• For many years, new computers with more memory and higher speed seemed to be the solution to data growth.
• Many research areas were already fighting with Big Data: bioinformatics, genome data, DNA, RNA, proteins and, in general, all biological data have required computer monitoring and storage in large databases at laboratories and research centers around the world.

The future of genomics rests on the foundation of the Human Genome Project

Page 5: Big & Open Data:  Challenges for Smartcity

Fighting with the Big Data

• Whenever an organization or an individual is not able to deal with its data, it is facing a big data problem.
• Same philosophy as modern Big Data: large databases distributed around the world, with parallel processing when available and suitable
• (Sequence alignment and dynamic programming)
• The body of biological data is itself a big database.
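Sequence alignment is the classic dynamic-programming problem in bioinformatics that this slide refers to. The following is a minimal sketch of global alignment scoring in the Needleman-Wunsch style. The function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration; they do not come from the talk.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of sequences a and b.

    Keeps only one previous row of the dynamic-programming table,
    so memory grows with len(b) rather than len(a) * len(b).
    """
    # Row 0: aligning an empty prefix of a against b costs one gap per symbol.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against an empty b
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + pair,  # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(alignment_score("GATTACA", "GATTACA"))
    print(alignment_score("ACGT", "AGT"))
```

The table has one cell per pair of prefixes, so the work grows with the product of the two sequence lengths. This is why alignment over large genome collections quickly becomes a big data problem.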


Page 6: Big & Open Data:  Challenges for Smartcity

Big Data
From Data Warehouse to Big Data

1970: relational model invented
RDBMS declared mainstream until the 90s
One size fits all: "elephant" vendors, heavily encoded, even indexing by B-trees.

Page 7: Big & Open Data:  Challenges for Smartcity

Alex 'Sandy' Pentland, director of 'Media Lab' at Massachusetts Institute of Technology (MIT)

Nowadays business needs high availability of data, so new techniques must be developed: complex analytics, graph databases

Page 8: Big & Open Data:  Challenges for Smartcity

Unstructured data

Who generates Big Data?

Progress and innovation are no longer hampered by the ability to collect data, but by the ability to manage, analyze, synthesize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable way

Page 9: Big & Open Data:  Challenges for Smartcity

Big Data: 3+1+1 V’s

Page 10: Big & Open Data:  Challenges for Smartcity

Big Data

1. High availability is now a requirement
2. Host and cloud computing
3. Running in parallel

1. Data aggregation process
2. Analytics on data
3. Graph DBMS similarities
4. Not only SQL: Cassandra* and MongoDB**
5. Moving toward ACID: people from Google acknowledge ACID as a good idea for working with databases.

* The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance.
** Document-oriented storage

Page 11: Big & Open Data:  Challenges for Smartcity


• Main feature: scalability to many nodes
– Scan of 100 TB on 1 node @ 50 MB/sec = 23 days
– Scan on a cluster of 1000 nodes = 33 minutes
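The two scan times above follow from simple division. The sketch below reproduces them under two assumptions that are not stated on the slide: decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), and data split evenly across nodes that scan fully in parallel with no coordination overhead. The function name is hypothetical.

```python
TB = 10**12  # decimal terabyte, in bytes (assumed)
MB = 10**6   # decimal megabyte, in bytes (assumed)


def scan_time_seconds(data_bytes, rate_bytes_per_sec, nodes=1):
    """Time to scan the data once, assuming an even split and perfect parallelism."""
    return data_bytes / (rate_bytes_per_sec * nodes)


single = scan_time_seconds(100 * TB, 50 * MB)
cluster = scan_time_seconds(100 * TB, 50 * MB, nodes=1000)

print(f"1 node:     {single / 86400:.1f} days")
print(f"1000 nodes: {cluster / 60:.1f} minutes")
```

A real cluster pays for scheduling, network transfer and stragglers, so the 1000-node figure is best read as a lower bound.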

MapReduce
– Parallel programming model
– Simple concept, smart, suitable for multiple applications
– Big datasets, multi-node on multiprocessors
– Sets of nodes: clusters or grids (distributed programming)
• By Google (2004)
– Able to process 20 PB per day
– Based on Map and Reduce, classical methods in functional programming related to the classic divide and conquer
– Comes from numerical analysis (big matrix products).
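The Map and Reduce steps listed above can be illustrated with the usual word-count example. This is a single-process sketch of the programming model only. It is not Hadoop or Google's implementation, and all function names are hypothetical. In a real cluster, each call to `map_phase` would run on the node holding that input split, and the shuffle step would move data between nodes.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: fold the values of one key into a single result."""
    return key, reduce(lambda a, b: a + b, values)


def word_count(documents):
    # Each document stands in for one input split processed independently.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "open data big data"]))
```

Because every map call depends only on its own split and every reduce call only on its own key, both phases can run in parallel. This is the divide-and-conquer idea the slide refers to.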

Big Data: MapReduce

Page 12: Big & Open Data:  Challenges for Smartcity

Big Data: MapReduce

• Friendly for non-technical users

Page 13: Big & Open Data:  Challenges for Smartcity

Big Data: Hadoop

– Used by Yahoo!, Facebook, Twitter, Amazon, eBay…
– Can be used in different architectures: both clusters (in-house) and grids (cloud computing)

http://hadoop.apache.org/

Page 14: Big & Open Data:  Challenges for Smartcity

Big Data: Datamining & Scalability

• Data mining techniques (machine learning, data clustering, predictive models, etc.) are compatible with big data through complex analytics
• Modeling prices in electricity Spanish markets under uncertainty. G. Miñana, H. Marrao, R. Caro, J. Gil, V. Lopez, B. González. In: F. Sun et al. (eds.), Knowledge Engineering and Management, Advances in Intelligent Systems and Computing 214, DOI: 10.1007/978-3-642-37832-4_46, Springer-Verlag Berlin Heidelberg 2014
• To get a scalable system
– Aggregation
– Generalization
– (Formal specification)
• Not only many cores, many nodes and out-of-memory data
– Host and cloud computing
– Not all problems can be solved with the same techniques; Hadoop is not enough

Page 15: Big & Open Data:  Challenges for Smartcity

Technology transfer

• A great opportunity for researchers working on technology transfer, who can increase their efforts in developing new techniques for
– Monitoring data (sensors, smartphones, …)
– Storing data (cloud computing, Amazon S3, EC2, Google BigQuery, Tableau, …)
– Cleaning, integrating and processing data (Data Curation at Scale: The Data Tamer System, M. Stonebraker et al., CIDR 2013)
– Analysing data (R, SAS… but also Google, Amazon, eBay…)
– Fully homomorphic encryption and searching on encrypted data

Page 16: Big & Open Data:  Challenges for Smartcity

Open Data

“Open data is data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and share alike.” OpenDefinition.org

Availability and Access: the data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet. The data must also be available in a convenient and modifiable form.

Reuse and Redistribution: the data must be provided under terms that permit reuse and redistribution including the intermixing with other datasets. The data must be machine-readable.

Universal Participation: everyone must be able to use, reuse and redistribute – there should be no discrimination against fields of endeavour or against persons or groups. For example, ‘non-commercial’ restrictions that would prevent ‘commercial’ use, or restrictions of use for certain purposes (e.g. only in education), are not allowed.

Page 17: Big & Open Data:  Challenges for Smartcity

Open Data


Page 18: Big & Open Data:  Challenges for Smartcity

Why Open Data by Open Knowledge Foundation


Page 19: Big & Open Data:  Challenges for Smartcity

Open Data for Smartcity

• What can a citizen expect when living in a city?
• Internet of Things
– Libraries
– Public transportation, traffic monitoring
– Pets, devices, cars, even people
• Intelligent agents
– Interacting without our control
– Credit card control (BBVA use case)

Page 20: Big & Open Data:  Challenges for Smartcity

Basic structure

Client/Server pattern

Diagram labels: PUBLIC DATA, Web Service, SERVER, CLIENT, WEB SERVER

Page 21: Big & Open Data:  Challenges for Smartcity

Diagram labels: NEW DATA IS COLLECTED; A SERVICE IS GIVEN; query; DATA TRANSFER
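The query and data-transfer exchange between client and server can be sketched with the Python standard library alone. Everything specific here is an assumption made for illustration: the sample records in `PUBLIC_DATA` are invented (not real Madrid data), and the `/resources` path and the `kind` query parameter are hypothetical. This is not how RMap or the Madrid City Council services are actually implemented.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse
from urllib.request import ProxyHandler, build_opener

# Hypothetical sample of a public dataset held by the server.
PUBLIC_DATA = [
    {"name": "Recycling point A", "kind": "recycling"},
    {"name": "Bike parking B", "kind": "parking"},
    {"name": "Recycling point C", "kind": "recycling"},
]


class ResourceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Server side: read the query, filter the public data, send JSON back.
        query = parse_qs(urlparse(self.path).query)
        kind = query.get("kind", [None])[0]
        rows = [r for r in PUBLIC_DATA if kind is None or r["kind"] == kind]
        body = json.dumps(rows).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging


def start_server():
    """Start the web service on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), ResourceHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def query_service(port, kind):
    """Client side: send a query and decode the transferred data."""
    opener = build_opener(ProxyHandler({}))  # talk to localhost directly
    url = f"http://127.0.0.1:{port}/resources?kind={kind}"
    with opener.open(url) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    server = start_server()
    print(query_service(server.server_address[1], "recycling"))
    server.shutdown()
```

In a mobile application the client half would live on the phone and the server half would sit in front of the open data source. The shape of the exchange stays the same.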


Page 22: Big & Open Data:  Challenges for Smartcity

Recycla.me


Page 23: Big & Open Data:  Challenges for Smartcity

Data Analytics

FROM (UNSTRUCTURED) DATA TO VALUE

Page 24: Big & Open Data:  Challenges for Smartcity

Recycla.me · Mapa de Recursos (Resource Map) · Recycla.te

Mariam Saucedo, Pilar Torralbo, Daniel Sanz, Ana Alfaro, Sergio Ballesteros, Lidia Sesma, Héctor Martos, Álvaro Bustillo, Arturo Callejo, Belén Abellanas, Jaime Ramos, Ignacio P. de Ziriza, Victor Torres, Alberto Segovia, Miguel Bueno, Mar Octavio de Toledo, Antonio Sanmartín, Carlos Fernández

Page 25: Big & Open Data:  Challenges for Smartcity

Madrid – Smart City: RMap

• Parks and gardens
• Parking for
– Cars
– Motorbikes
– Bikes
• Recycling points
– Fixed
– Mobile
– Clothes
• Stations
– Bioethanol
– Gas
– Oil
– Electric
• Routes for bikes
– Cycle lanes (Vías ciclistas)
– Safe streets (Calles seguras)
• Residential Priority Areas (Áreas de Prioridad Residencial)

Page 26: Big & Open Data:  Challenges for Smartcity


Page 27: Big & Open Data:  Challenges for Smartcity

Big and Open data. Challenges for Smartcity

Victoria López
Grupo G-TeC

www.tecnologiaUCM.es
Universidad Complutense de Madrid

ICIST 2014, Valencia