Fortune Time Institute: Big Data - Challenges for Smartcity
TRANSCRIPT
Shanghai 2014
Big and Open Data: Challenges for Smartcity
Dr. Victoria López
Grupo G-TeC
www.tecnologiaUCM.es
Universidad Complutense de Madrid
August 26th, 2014
55 Exchange Place, NYC
Big and Open Data: Challenges for Smartcity
What about Big Data?
Fighting with Big Data.
Big Data. Big Projects.
Privacy.
Open Data. Transparency.
Smartcities.
What about Big Data?
From Data Warehouse to Big Data (large databases)
1970: relational model invented
RDBMS declared mainstream until the 90s
One size fits all, "elephant" vendors: heavily encoded, even indexing by B-trees.
What about Big Data?
Big Data: 3+1+1 Vs
Value
Fighting with Big Data
Fighting with Big Data: bioinformatics. Genome data, DNA, RNA, proteins and, in general, all biological data have required computer monitoring and storage in large databases at laboratories and research centers around the world.
The Human Genome Project
Customer point of view
Looking for flights
Not a simple search
Web issues: Shortest path
Joking aside, behind our comfortable position there is some math and programming.
Restrictions:
Total time
Total cost
Date/hour
How to sort the results?
http://www.sorting-algorithms.com/
Web issues: Searching & Sorting
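As a rough illustration of the question "How to sort the results?", the sketch below orders flight results by the three restrictions listed on the slide. The flight records, the field names and the priority order (cost first, then time, then departure) are all assumptions made for the example, not something stated in the talk.

```python
from datetime import datetime

# Hypothetical search results; every value here is made up for illustration.
flights = [
    {"id": "A", "total_minutes": 540, "total_cost": 420.0,
     "departure": datetime(2014, 8, 26, 9, 30)},
    {"id": "B", "total_minutes": 480, "total_cost": 515.0,
     "departure": datetime(2014, 8, 26, 7, 0)},
    {"id": "C", "total_minutes": 480, "total_cost": 420.0,
     "departure": datetime(2014, 8, 26, 13, 15)},
]


def sort_results(results):
    """Order by total cost, then total time, then departure date/hour.

    The priority order is an assumption; a real site would let the
    customer choose it.
    """
    return sorted(
        results,
        key=lambda f: (f["total_cost"], f["total_minutes"], f["departure"]),
    )


ranked_ids = [f["id"] for f in sort_results(flights)]
print(ranked_ids)
```

A tuple key lets one sort call handle ties: flights with the same cost are compared by time, and only then by departure.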
How many?
Tidy your room now!
One teenager working = one afternoon at home
How many?
Tidy all New York rooms NOW!
One teenager working alone?
The solution: organization
Main feature: scalability to many nodes
Scan of 100 TB on 1 node @ 50 MB/sec = 23 days
Scan on a cluster of 1000 nodes = 33 minutes
Created by Google (2004)
Parallel programming model
Simple concept, smart, suitable for multiple applications
Big datasets spread over multiple nodes and multiprocessors
Sets of nodes: clusters or grids (distributed programming)
Able to process 20 PB per day
Based on Map & Reduce, classical methods in functional programming related to the classic Divide & Conquer
These methods come from numerical analysis (big matrix products).
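The scan times quoted on this slide can be checked with a few lines of arithmetic. This is only a sketch: it assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and perfectly linear scaling with no coordination overhead, and the function name is invented for the example.

```python
TB = 10**12  # assumption: decimal terabyte
MB = 10**6   # assumption: decimal megabyte


def scan_seconds(data_bytes, bytes_per_second_per_node, nodes=1):
    """Idealized scan time: the data is split evenly and read in parallel."""
    return data_bytes / (bytes_per_second_per_node * nodes)


single_node = scan_seconds(100 * TB, 50 * MB)
cluster = scan_seconds(100 * TB, 50 * MB, nodes=1000)

print(f"1 node: {single_node / 86400:.1f} days")
print(f"1000 nodes: {cluster / 60:.1f} minutes")
```

Under these assumptions the single node needs 2,000,000 seconds (about 23 days) and the 1000-node cluster needs 2,000 seconds (about 33 minutes), which matches the slide.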
Big Data: Map Reduce
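To make the Map & Reduce idea concrete, here is a minimal single-process sketch of the classic word-count example. It is not Hadoop or Google's implementation; the function names and the sample documents are assumptions, and a real cluster would run the map and reduce steps on many nodes in parallel.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group the emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into a single count."""
    return {key: reduce(lambda a, b: a + b, values)
            for key, values in groups.items()}


def word_count(documents):
    # Each document could be mapped on a different node; here it is a loop.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))


counts = word_count(["big data open data", "open data for smart city"])
print(counts["data"])
```

Because each document is mapped independently and each key is reduced independently, both phases can be divided among nodes, which is the Divide & Conquer connection mentioned on the slide.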
Hadoop: open-source implementation of the MapReduce computational model
Used by Yahoo!, Facebook, Twitter, Amazon, eBay
Can be used in different architectures: both clusters (in-house) and grids (cloud computing)
Storm and Spark follow a similar model but work in memory instead of on disk
https://hadoop.apache.org/ https://spark.apache.org/
Big Data: Hadoop, Spark
How much data?
Recommender Systems
Renew your car insurance
Semantic Web tools
Analysing & storing personal information
Businesses need to be competitive
Harvard Business Review (HBR) blog, "CMOs and CIOs Need to Get Along to Make Big Data Work"
Big Data & Business
Big Data for Big projects
Real Time
The Obama 2012 campaign used data analytics and the experimental method to assemble a winning coalition vote by vote. In doing so, it overturned the long dominance of TV advertising in U.S. politics and created something new in the world: a national campaign run like a local ward election, where the interests of individual voters were known and addressed.
Big Data for Big projects
Real Time
How Brazil vs. Germany played out on Twitter
Geotagged tweets mentioning key terms around the World Cup game, July 8, 2014
Where are my personal data?
Social Sensing
The near future: the Internet of Things
Open Data
Open data is data that can be freely used, reused and redistributed by anyone, subject only, at most, to the requirement to attribute and share alike. (OpenDefinition.org)
Availability and Access: the data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet. The data must also be available in a convenient and modifiable form.
Reuse and Redistribution: the data must be provided under terms that permit reuse and redistribution, including the intermixing with other datasets. The data must be machine-readable.
Universal Participation: everyone must be able to use, reuse and redistribute; there should be no discrimination against fields of endeavour or against persons or groups. For example, non-commercial restrictions that would prevent commercial use, or restrictions of use for certain purposes (e.g. only in education), are not allowed.
Open Data
Why Open Data by Open Knowledge Foundation
Our experience in developing systems for Madrid Open Data
Mariam Saucedo, Pilar Torralbo, Daniel Sanz
Recycla.me
Ana Alfaro, Sergio Ballesteros, Lidia Sesma
Héctor Martos, Álvaro Bustillo, Arturo Callejo
Belén Abellanas, Jaime Ramos, Ignacio P. de Ziriza
Victor Torres, Alberto Segovia, Miguel Bueno
Mar Octavio de Toledo, Antonio Sanmartín, Carlos Fernández
RESOURCE MAP: RECYCLA.TE
Parks and gardens
Parking for: cars, motorbikes, bikes
Recycling points: fixed, mobile, clothes
Stations: bioethanol, gas oil, electric
Routes for bikes: bike lanes, safe streets
Residential Priority Areas
Madrid Smart City
RMap demonstration
The way from data to value
Big Data Collection: monitoring; data cleaning and integration; hosted data platforms and the cloud
Big Data Storage: modern databases; distributed computing platforms; NoSQL, NewSQL
Big Data Systems: security; multicore scalability; visualization and user interfaces
Big Data Analytics: fast algorithms; data compression; machine learning tools; visualization & reporting
The list of stages proposed by MIT for dealing with Big Data
Conclusions
Big Data, Open Data and Smartcity
Era of Data Revolution (Alex 'Sandy' Pentland, http://www.media.mit.edu/people/sandy)
New technologies & development
New business
Great opportunities in Smartcity development
Dr. Victoria López
www.tecnologiaUCM.es
www.madrid.org
Madrid City Hall