Big Data, Potentials, Opportunities, and Challenges
Setia Pramana
Politeknik Statistika STIS
Sub Directorate Statistics Modeling BPS RI
About Me
1999-2000
2005
BSc in Statistics Brawijaya Univ.
Start working @ STIS
Journey to Europe
2006
M.Sc in App. Statistics Hasselt Univ. Belgium
M.Sc in BioStatistics Hasselt Univ. Belgium
Research Assistant
2007
2011
PhD Mathematics, Statistical Bioinformatics, Hasselt Univ. Belgium
Postdoc @ MEB Karolinska Institutet
2014
2015
@ STIS
Associate Professor
2018
Head of PPPM Polstat STIS
Head of Sub Directorate Statistics Modeling BPS
2019
Board Member
• UN Global Working Group Big Data for Official Statistics
• Asosiasi Ilmuan Data Indonesia
• Ikatan Statistisi Indonesia
• Forum Pendidikan Tinggi Statistika
• Masyarakat Biodiversiti dan Bioinformatika Indonesia
• Asosiasi Artificial Intelegent Indonesia
“Data is a new kind of wealth for our nation. Data is now more valuable
than oil. Data sovereignty must therefore be realized. Citizens' rights
over their personal data must be protected. The regulations must be
prepared promptly; there can be no compromise.”
President of the Republic of Indonesia, Joko Widodo, State Address, 16 August 2019
IN THE PAST, data sources were:
SURVEY
CENSUS
REGISTRATION
NOW, data sources also include:
BIG DATA
Administrative data
Commercial or transactional digital data
GPS tracking devices
Behavioral data
Opinion data
Data Explosion
• Interactions of billions of people using computers, GPS devices, cell phones, and medical devices.
• Online and mobile financial transactions, social media traffic, and GPS coordinates.
• “In the next five years, we’ll generate more data as humankind than we generated in the previous 5,000 years.” Eron Kelly, GM Microsoft
Definition?
“A paradigm for enabling the collection, storage, management, analysis and visualization, potentially under real-time constraints, of extensive datasets with heterogeneous characteristics.” (International Telecommunication Union, 2015)
Data Sources
Exhaust data
• Mobile phone data
• Financial transactions
• Online search and access logs
• Citizen card
• Postal data
Sensing data
• Satellite and UAV imagery
• Sensors in cities, transport and homes
• Sensors in nature, agriculture and water
• Wearable technology
• Biometric data
• Internet of Things (IoT)
Digital Content
• Social media data
• Web scraping
• Participatory sensing / crowdsourcing
• Health records
• Radio content
What People Do
What People Say
Measurement Revolution
Sources: Mobile Phone Data
• Owned by Mobile Network Operators (MNOs)
• Mobile Position (Active)
• Call Detail Records (CDR):
  • Contain: incoming and outgoing calls, SMS and MMS, and location (passive)
  • Stored in internal data warehouses and billing management systems
• DDR (Data Detail Record): Internet traffic between mobile devices and the network.
Mobile Positioning Data
• Location of mobile devices
• Statistical indicators can be generated:
  • The number of residences, geographically distributed according to available accuracy;
  • The number of workplaces, schools, secondary homes, and other regular locations;
  • Internal migration based on changes of residence within the country;
  • Change of workplace over time;
  • Cross-border migration based on regular travel between countries;
  • Population grid statistics (1 km²);
  • Temporary population statistics;
  • Assessment of the temporary population (hourly, daily, weekly, monthly, etc.);
  • Real-time assessment for a specific location during large-scale events, gatherings of people or actual emergency situations (e.g. what is the composition of the crowd in a specific location, how many people are affected by an earthquake or hurricane);
  • Risk assessment for law enforcement (planning the number of patrol units in an area based on the composition of the temporary population).
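As a rough illustration of the hourly temporary population indicator, the sketch below counts distinct devices per grid cell and hour. The record layout (device ID, ISO timestamp, latitude, longitude), the 0.01-degree cell size and the sample records are all assumptions made for this example; real operator data would be anonymized and would need its own gridding and calibration to the actual population.

```python
import math
from collections import defaultdict
from datetime import datetime

CELL_DEG = 0.01  # assumed cell size in degrees (roughly 1.1 km of latitude)

def grid_cell(lat, lon, cell_deg=CELL_DEG):
    """Map a coordinate to an integer (row, col) grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def hourly_population(records):
    """Count distinct devices seen in each (hour, grid cell) pair."""
    devices = defaultdict(set)
    for device_id, timestamp, lat, lon in records:
        hour = datetime.fromisoformat(timestamp).replace(
            minute=0, second=0, microsecond=0
        )
        devices[(hour, grid_cell(lat, lon))].add(device_id)
    return {key: len(ids) for key, ids in devices.items()}

# Hypothetical location records: (device_id, timestamp, lat, lon)
RECORDS = [
    ("A", "2019-08-17T09:05:00", -6.2051, 106.8203),
    ("A", "2019-08-17T09:40:00", -6.2058, 106.8209),  # same device, same cell
    ("B", "2019-08-17T09:15:00", -6.2058, 106.8209),
    ("C", "2019-08-17T09:20:00", -6.1752, 106.8271),
    ("A", "2019-08-17T10:02:00", -6.1752, 106.8271),  # device A has moved
]

counts = hourly_population(RECORDS)
for (hour, cell), n in sorted(counts.items()):
    print(hour.isoformat(), cell, n)
```

Counting distinct devices rather than raw events keeps one very active phone from inflating the estimate for a cell.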
Big Data for Energy
From source to load we make the grid efficient and reliable
From downtown to suburb, we deliver urban efficiency today
Smart Grid & Smart City IoT Solutions
Smart Grid Operator “IT/OT integration from field to control center to enterprise”
Smart Generator “Producing power efficiently”
Renewable Operator “Making renewables dispatchable”
Energy Services Provider “Bridging supply & demand”
Smart Energy
Smart Mobility
Smart Water
Smart Public Services
Smart Buildings & Homes
Smart Integration
Smart Data Center
Crowdsourcing
• The process of getting work, funding or information, usually online, from a crowd of people.
• The word Crowdsourcing is a combination of Crowd & Outsourcing
CROWD
OUTSOURCING
CROWDSOURCING
Web Crawling and Scraping
• Extract Information from Web
• Web crawling is the process of locating information on the World Wide Web (WWW): indexing all the words in a document, adding them to a database, then following all hyperlinks and indexing and adding that information to the database as well.
• Web scraping is the process of automatically requesting a web document and collecting information from it.
http://prowebscraping.com/web-scraping-vs-web-crawling/
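The difference between the two can be sketched in a few lines: the crawler follows links to discover pages, while the scraper pulls specific fields out of each page. To stay self-contained the example runs against a small in-memory "website"; the URLs, HTML layout and the `price` class name are hypothetical, and a real crawler would also need HTTP requests, robots.txt handling and rate limiting.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory website: URL -> HTML
PAGES = {
    "http://shop.example/": '<a href="/rice">Rice</a> <a href="/oil">Oil</a>',
    "http://shop.example/rice": '<h1>Rice 5kg</h1><span class="price">65000</span><a href="/">Home</a>',
    "http://shop.example/oil": '<h1>Cooking oil 1L</h1><span class="price">17500</span>',
}

class PageParser(HTMLParser):
    """Collects links (for crawling) and title/price (for scraping)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = None
        self.price = None
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "h1":
            self._field = "title"
        elif tag == "span" and attrs.get("class") == "price":
            self._field = "price"

    def handle_data(self, data):
        if self._field == "title":
            self.title = data.strip()
        elif self._field == "price":
            self.price = int(data.strip())

    def handle_endtag(self, tag):
        self._field = None

def crawl(start_url, fetch):
    """Breadth-first crawl from start_url; scrape every page that has a price."""
    seen, queue, records = {start_url}, deque([start_url]), []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        parser.close()
        if parser.price is not None:  # scraping step
            records.append({"url": url, "name": parser.title, "price": parser.price})
        for href in parser.links:  # crawling step
            next_url = urljoin(url, href)
            if next_url not in seen:
                seen.add(next_url)
                queue.append(next_url)
    return records

records = crawl("http://shop.example/", PAGES.get)
for record in records:
    print(record)
```

Passing `fetch` as a parameter keeps the crawl logic independent of how pages are retrieved, so the same loop could be pointed at a real HTTP client.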
Better Data, Better Government
• Quality and timely data are vital for enabling governments, international organizations, civil society, the private sector and the general public to make informed decisions
• Evidence based policy making
• Quality of Statistics:
  • Accuracy
  • Relevance
  • Timeliness
  • Accessibility
  • Coherence
  • Interpretability
STATISTICS INDONESIA
Crowdsource for Food Prices Nowcasting
• Collaboration with Pulse Lab UN Jakarta
• Use crowdsourcing premise UN Food security project
• Location: Kota Mataram, NTB
• Time: March to July 2015
Web Scraping: Online Shops
Online Shops Total Commodities
Hypermart 52 products
KlikMart 75 products
Bhinneka 40 products
Elektronik City 17 products
Zalora 36 products
BerryBenka 25 products
Mothercare 2 products
Babyzania 4 products
Apotek Century 10 products
Pusat Kosmetik 5 products
Sephora 2 products
Stationary 6 products
Gramedia 3 products
• Analysis is in progress
• Track the movement of consumer prices
• Identify the pattern of consumer price changes per commodity type and per e-commerce site
• Construct a CPI by substituting the conventionally collected consumer prices with e-commerce-based consumer prices, then
• Compare the survey-based CPI with the e-commerce-based CPI
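The comparison step could look roughly like the sketch below, which computes an unweighted Jevons index (the geometric mean of price relatives) once from scraped prices and once from survey prices. The Jevons formula is chosen here only as a simple illustration and is not claimed to be the method used in this work; the product names and prices are invented, and an actual CPI would also apply expenditure weights.

```python
from math import exp, log

def jevons_index(base_prices, current_prices):
    """Unweighted Jevons index (base = 100) over products present in both periods."""
    matched = sorted(set(base_prices) & set(current_prices))
    if not matched:
        raise ValueError("no products in common between the two periods")
    mean_log_relative = sum(
        log(current_prices[p] / base_prices[p]) for p in matched
    ) / len(matched)
    return 100.0 * exp(mean_log_relative)

# Hypothetical prices in rupiah; identical base period for both sources.
base = {"rice_5kg": 60000, "cooking_oil_1l": 15000, "sugar_1kg": 12500}
scraped_now = {"rice_5kg": 63000, "cooking_oil_1l": 15750, "sugar_1kg": 13125}
survey_now = {"rice_5kg": 61200, "cooking_oil_1l": 15300, "sugar_1kg": 12750}

ecommerce_index = jevons_index(base, scraped_now)
survey_index = jevons_index(base, survey_now)
gap = ecommerce_index - survey_index

print(f"e-commerce index: {ecommerce_index:.2f}")
print(f"survey index:     {survey_index:.2f}")
print(f"difference:       {gap:.2f} index points")
```

Restricting the index to matched products avoids mixing price changes with changes in the product assortment, which shifts quickly on e-commerce sites.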
Online Shop Commodities Prices
Data Sources
• Web scraping from e-commerce sites and Google Maps
• Data description:
  • E-commerce website (national)
    • 1,288 categories
    • 3,065,279 items
      • 4,212 items in West Sumatra
    • 264,240 shops
      • 892 shops in West Sumatra
  • Google Maps (6 sample regencies/cities in West Sumatra)
    • Kabupaten Solok
    • Padang
    • Padang Pariaman
    • Pariaman
    • Pesisir Selatan
    • Kota Solok
Crawling online ticketing
• Pegipegi
• Agoda
• Traveloka
• Data: hotel link, hotel ID, hotel name, hotel type, star rating, address, price, review score, number of reviews, room type, number of room types, number of rooms remaining, number of floors, number of restaurants, total number of rooms, year built, latitude, longitude, hotel city/area, and hotel facilities.
• Scraping data from the social media platform Instagram.
• Keywords: #wonderfulindonesia, #pesonaindonesia, #visitindonesia, #exploreinidonesia, #indonesiatourism.
• 1,897,450 posts: 480,100 posts have a geotag (25%); after cleaning: 411,630 posts.
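The counts above imply a geotagged share of 480,100 / 1,897,450, or about 25.3%. The filtering pipeline might be sketched as follows. The post structure and the sample posts are made up, and the cleaning rule used here (keep only geotags inside a rough bounding box around Indonesia) is an assumption, since the slides do not say how the data were cleaned.

```python
KEYWORDS = {
    "#wonderfulindonesia",
    "#pesonaindonesia",
    "#visitindonesia",
    "#exploreinidonesia",
    "#indonesiatourism",
}

# Assumed rough bounding box for Indonesia (degrees).
LAT_RANGE = (-11.0, 6.0)
LON_RANGE = (95.0, 141.0)

def matches_keywords(caption):
    """True if the caption contains at least one tracked hashtag."""
    return any(word.strip(".,!?").lower() in KEYWORDS for word in caption.split())

def has_geotag(post):
    return post.get("lat") is not None and post.get("lon") is not None

def in_bounding_box(post):
    return (LAT_RANGE[0] <= post["lat"] <= LAT_RANGE[1]
            and LON_RANGE[0] <= post["lon"] <= LON_RANGE[1])

def summarize(posts):
    """Count posts surviving each filtering stage."""
    matched = [p for p in posts if matches_keywords(p["caption"])]
    geotagged = [p for p in matched if has_geotag(p)]
    cleaned = [p for p in geotagged if in_bounding_box(p)]
    share = len(geotagged) / len(matched) if matched else 0.0
    return {
        "matched": len(matched),
        "geotagged": len(geotagged),
        "cleaned": len(cleaned),
        "geotag_share": share,
    }

# Hypothetical posts
POSTS = [
    {"caption": "Sunset in Bali #WonderfulIndonesia", "lat": -8.65, "lon": 115.22},
    {"caption": "#pesonaindonesia Bromo", "lat": None, "lon": None},
    {"caption": "Lunch #foodie", "lat": -6.2, "lon": 106.8},
    {"caption": "Throwback #visitindonesia!", "lat": 48.85, "lon": 2.35},
]

summary = summarize(POSTS)
print(summary)
```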
Geospatial Analysis of Social Media Data: Tourism Patterns in Indonesia
[Bar chart. Peak Season (20 December 2018 - 2 January 2019): 61,542; Low Season (21 January 2019 - 3 February 2019): 27,738]
Subjective Happiness Index
Subjective Happiness Index by Province
The saddest province
Not too sad & not too happy province
The happiest province
Analytics Approaches
• Descriptive: What happened or what is happening now?
• Diagnostic: Why did it happen or Why is it happening now?
• Predictive: What will happen next? What will happen under various conditions?
• Prescriptive: What are the options to create the most optimal/high value result/outcome?
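To make the first and third of these questions concrete, the sketch below applies them to a tiny series: a descriptive summary (the average so far) and a predictive step (a least-squares trend line extrapolated one period ahead). The weekly price values are invented for illustration, and a straight-line trend is only one of many possible predictive models.

```python
from statistics import mean

def fit_line(ys):
    """Ordinary least-squares line through (0, y0), (1, y1), ...; needs 2+ points."""
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical weekly average prices
weekly_prices = [100.0, 102.0, 104.0, 106.0]

# Descriptive: what happened?
average = mean(weekly_prices)

# Predictive: what will happen next, assuming the trend continues?
slope, intercept = fit_line(weekly_prices)
forecast = intercept + slope * len(weekly_prices)

print(f"average so far: {average:.1f}")
print(f"forecast for next week: {forecast:.1f}")
```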
Big Data Analytics
• Data is unstructured
• Data comes from different sources and has conflicts/missing data/outliers
• Usually a data fusion step is required
• Data are dynamic
• Often has a crowdsourcing component (e.g., Twitter)
• Often sensor processing steps are required (domain-specific)
• Because of the size of processed data things have to be done differently
• Involves high-performance computing and specialized algorithms
• Using advanced analytics techniques such as text analytics, machine learning, predictive analytics, data mining, statistics, and natural language processing.
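A data fusion step that copes with conflicts, missing values and outliers could be sketched as below. The three sources and their prices are hypothetical. Taking the median across sources and flagging outliers with a modified z-score based on the median absolute deviation (cut-off 3.5) are illustrative choices, not methods described in the slides.

```python
from statistics import median

def fuse(sources):
    """Combine item -> value dicts; conflicts resolved by median, missing values skipped."""
    fused = {}
    for item in sorted({key for source in sources for key in source}):
        values = [s[item] for s in sources if s.get(item) is not None]
        if values:
            fused[item] = median(values)
    return fused

def flag_outliers(values, threshold=3.5):
    """Return values whose modified z-score (MAD-based) exceeds the threshold."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []
    return [v for v in values if 0.6745 * abs(v - center) / mad > threshold]

# Hypothetical price observations from three sources
survey = {"rice": 12000, "sugar": None, "oil": 15000}
web = {"rice": 12400, "sugar": 13000}
crowd = {"rice": 12100, "oil": 15400}

fused = fuse([survey, web, crowd])
outliers = flag_outliers([100, 102, 98, 101, 500])

print(fused)
print(outliers)
```

The median is used in both steps because, unlike the mean, it is not pulled toward a single extreme or erroneous report.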
Data Science
“Applying advanced statistical tools to existing data to solve problems, generate new insights, improve products/services”
“Everything that has something to do with data: collecting, analyzing, modeling... yet the most important part is its applications, all sorts of applications”
What is Data Science?
• Theories and techniques from many fields and disciplines are used to investigate and analyze a large amount of data to help decision makers in many industries such as science, engineering, economics, politics, finance, and education
• Computer Science
  • Pattern recognition, visualization, data warehousing, high-performance computing, databases, AI
• Mathematics
  • Mathematical modeling
• Statistics
  • Statistical and stochastic modeling, probability
Data Science
• A Mashed Up Discipline
• A multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data
Data Science
• New Discipline
• Very few books covering the discipline as a whole
• An interdisciplinary field, like business analysis, that incorporates computer science, modeling, statistics, analytics, and mathematics.
Monica Rogati https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007
Summary
• We are now in the Big Data era
• Big data has huge potential
• Big data analytics plays an important role in all aspects
• Data scientist may become the sexiest job
Challenges
• Information technology (IT) infrastructure.
• Data collection and governance.
• Data integration and sharing.
• Data processing and analysis.
• Security and privacy.
• Professionals in big data analytics and smart energy management.