¿qué es big data? - iese.edu data iese pdd alumni reunion 2014 (p… · “big data is data that...

35
email: [email protected] blog: blog.iese.edu/FaceIT @JavZamora Prof. Javier Zamora IESE PDD Alumni Reunion Barcelona, 26 de marzo de 2014 Atando Cabos con Big Data: Buscando una Aguja el Pajar Digital 1 ¿Qué es Big Data? 2

Upload: dodieu

Post on 24-Mar-2018

218 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

email: [email protected] blog: blog.iese.edu/FaceIT

@JavZamora

Prof.&Javier&Zamora&!!IESE&PDD&Alumni&ReunionBarcelona,&26&de&marzo&de&2014

Atando&Cabos&con&Big&Data: Buscando&una&Aguja&el&Pajar&Digital

1

¿Qué es Big Data?

2

Page 2: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

¿Qué es Big Data?

3

4

Page 3: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

Prof. Javier Zamora, [email protected], Blog: http://blog.iese.edu/FaceIT, Twitter: JavZamora!IESE PDD Alumni Reunion, Barcelona, 26 de marzo de 2014

“The Internet of Things” en el móvil...

Red de Proximidad

Red de Llamadas

Red de Localización

Red de Amigos

5

Prof. Javier Zamora, [email protected], Blog: http://blog.iese.edu/FaceIT, Twitter: JavZamora!IESE PDD Alumni Reunion, Barcelona, 26 de marzo de 2014

Fuentes del Big Data

! Estructurados"! Transacciones ERP/CRM/SCM"! Transacciones de acciones"! Clickstream/Page views/Web transactions"! Web links/Blog references"

! No estructurados"! Emails"! Twitter feeds"! Documentos de texto"! Facebook posts"

! Sensores"! Mobile phone/GPS/Location data"! RFID, Bar Code Scanner Data"! Real-time machinery diagnostics/engines/equipment "

! Etc...

6

Page 4: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

“Big data is data that exceeds the processing

capacity of conventional database systems."

The data is too big, moves

too fast, or doesn't fit the strictures of your

database architectures."

To gain value fromthis data, you must choose

an alternative way to process it.”"

!Edd Dumbill"

Chairman O'Reilly Strata Conference"

7

¿Cuánto es Big Data?

8

Page 5: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

9

! 1,750 Exabytes generados en 2011 según IDC "

! A finales de 2012 se disponían sólo de 800 Exabytes de almacenamiento"

! 1 Exabyte impreso en un libro necesitaría 500,000 millones de árboles"

! Según McKinsey las empresas con más de 1,000 empleados almacenan una media de 235 Terabytes más información que la que contiene laUS Library of Congress

10

Page 6: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

Prof. Javier Zamora, [email protected], Blog: http://blog.iese.edu/FaceIT, Twitter: JavZamora!IESE PDD Alumni Reunion, Barcelona, 26 de marzo de 2014

Wal-Mart Stores, Inc.

! 1 millón de transacciones de clientes cada hora"

! Alimentando bases de datos con más de 2.5 Petabytes"

! 167 veces los libros de la Library of Congress

11

Prof. Javier Zamora, [email protected], Blog: http://blog.iese.edu/FaceIT, Twitter: JavZamora!IESE PDD Alumni Reunion, Barcelona, 26 de marzo de 2014

Wal-Mart Stores, Inc.

! 1 millón de transacciones de clientes cada hora"

! Alimentando bases de datos con más de 2.5 Petabytes"

! 167 veces los libros de la Library of Congress

12

Page 7: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

Prof. Javier Zamora, [email protected], Blog: http://blog.iese.edu/FaceIT, Twitter: JavZamora!IESE PDD Alumni Reunion, Barcelona, 26 de marzo de 2014

Las 3 “V”s del Big Data

! Volumen"! Necesidades de Almacenamiento"! Análisis masivo"

! Variedad"! Transaccional"! Social Media"! Móvil"

! Velocidad"! Generación"! Proceso

13

Prof. Javier Zamora, [email protected], Blog: http://blog.iese.edu/FaceIT, Twitter: JavZamora!IESE PDD Alumni Reunion, Barcelona, 26 de marzo de 2014

¿Cómo obtenemos la cuarta “V”?

14

Page 8: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

15

transformandoGigabytes endecisiones...

16

Page 9: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

Prof. Javier Zamora, [email protected], Blog: http://blog.iese.edu/FaceIT, Twitter: JavZamora!IESE PDD Alumni Reunion, Barcelona, 26 de marzo de 2014

¿Renovarse o morir?

17

Prof. Javier Zamora, [email protected], Blog: http://blog.iese.edu/FaceIT, Twitter: JavZamora!IESE PDD Alumni Reunion, Barcelona, 26 de marzo de 2014

18

Page 10: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

Prof. Javier Zamora, [email protected], Blog: http://blog.iese.edu/FaceIT, Twitter: JavZamora!IESE PDD Alumni Reunion, Barcelona, 26 de marzo de 2014

Tiros en la NBA 2006-2011: 700.000

MIT Sloan Sports Analytics Conference 2012 March 2-3, 2012, Boston, MA, USA

3

significant time pressure there is a need for efficient visualizations that streamline connections between spatial game data, reasoning artifacts, analysts, and other audiences.

3 Case Study: Who is the best shooter in the NBA?

As a means to evaluate the feasibility of spatial and visual analytics for the NBA, we conducted a case study that attempted to answer one simple question: who is the best shooter in the NBA? Conventional metrics and evaluative approaches fail to provide a simple answer to this question. For example, Nene Hilario and Dwight Howard led the league in FG% in the 2010-2011 season, but neither is considered to be a great shooter. What makes a shooter great? What criteria separate good shooters from bad shooters? When asked who was the best shooter in the league, NBA reporter David Aldridge suggested it was Ray Allen because of his ability to shoot well from many different locations on the court: “Everyone has their favorite spots on the court but it seems like Ray [Allen] is more comfortable in more places than anyone I’ve seen.”

There is a clear spatial criterion implied in Aldridge’s answer; he is implying that all shooters perform better

at some places than others, but great shooters possess the ability to shoot well from many locations on the floor – the capability to make baskets from several different court locations is commonly referred to as “shooting range.” Although this seems like a sensible and useful assessment criterion, shooting range has yet to be effectively quantified and consequently has been mostly neglected in current evaluative approaches. To determine the best shooters in the NBA, we generated an analytical framework that 1) employs spatial analysis to quantify shooting range, and 2) visual analytics to reveal the unique spatial tendencies of individual shooters. Below, we describe the case study’s data, methods, and results.

Data: Using game data sets for every NBA game played between 2006 and 2011, we compiled a spatial field goal database that included Cartesian coordinates (x,y) for every field goal attempted in this 5-year period. This data set includes player name, shot location, and shot outcome for over 700,000 field goal attempts. We mapped the shot data atop a base map of a NBA basketball court (Figure 1). Although a regulation NBA court is 4,700 ft2, (50ft x 94ft), almost all (>98%) field goal attempts occur within a 1,284 ft2 area in between the baseline and a relatively thin buffer around the 3-point arc; we call this area the “scoring area.” We divided the scoring area into a grid consisting of 1,284 unique “shooting cells,” each 1 ft2 (Figure 1). To quantify shooting range, we applied spatial analyses to evaluate shooting performance across the grid and within each shooting cell.

Figure 1: Our composite shot maps from 2006-2011 NBA game data. The map on the left summarizes the density of all field goal attempts during the study period. The map on the right reveals league-wide tendencies in both shot attempts and points per attempt. Larger squares indicate areas where many field goals were attempted; smaller squares indicate fewer attempts. The color of the squares is determined by a spectral color scheme and indicates the average points per attempt for each location. Orange areas indicate areas where more points result from an average attempt, and blue areas indicate fewer points per attempt.

We derived metrics that described spatial aspects of shooting performance throughout the scoring area. The most basic metric is called “Spread,” which is simply a count of the unique shooting cells in which a player has attempted at least one field goal. The raw result is a number between 0 and 1,284 and summarizes the spatial diversity of a player’s shooting attempts. By dividing this count by 1,284 and multiplying by 100, we generated Spread%, which indicates the percentage of the scoring area in which a player has attempted at least one field goal.

Spread = FGAijij∈SA∑

Spread = Total spatial spread of player across all scoring cells FGAij = 1, if at least one field goal has been attempted in cell i, 0 if not SA = Scoring area consisting of 1,284 scoring cells

Fuente: Kirk Goldsberry, “CourtVision: New Visual and Spatial Analytics for the NBA” - MIT Sloan Sports Analytics Conference 2012

19

Prof. Javier Zamora, [email protected], Blog: http://blog.iese.edu/FaceIT, Twitter: JavZamora!IESE PDD Alumni Reunion, Barcelona, 26 de marzo de 2014

Tiros en la NBA 2006-2011: 700.000

MIT Sloan Sports Analytics Conference 2012 March 2-3, 2012, Boston, MA, USA

3

significant time pressure there is a need for efficient visualizations that streamline connections between spatial game data, reasoning artifacts, analysts, and other audiences.

3 Case Study: Who is the best shooter in the NBA?

As a means to evaluate the feasibility of spatial and visual analytics for the NBA, we conducted a case study that attempted to answer one simple question: who is the best shooter in the NBA? Conventional metrics and evaluative approaches fail to provide a simple answer to this question. For example, Nene Hilario and Dwight Howard led the league in FG% in the 2010-2011 season, but neither is considered to be a great shooter. What makes a shooter great? What criteria separate good shooters from bad shooters? When asked who was the best shooter in the league, NBA reporter David Aldridge suggested it was Ray Allen because of his ability to shoot well from many different locations on the court: “Everyone has their favorite spots on the court but it seems like Ray [Allen] is more comfortable in more places than anyone I’ve seen.”

There is a clear spatial criterion implied in Aldridge’s answer; he is implying that all shooters perform better

at some places than others, but great shooters possess the ability to shoot well from many locations on the floor – the capability to make baskets from several different court locations is commonly referred to as “shooting range.” Although this seems like a sensible and useful assessment criterion, shooting range has yet to be effectively quantified and consequently has been mostly neglected in current evaluative approaches. To determine the best shooters in the NBA, we generated an analytical framework that 1) employs spatial analysis to quantify shooting range, and 2) visual analytics to reveal the unique spatial tendencies of individual shooters. Below, we describe the case study’s data, methods, and results.

Data: Using game data sets for every NBA game played between 2006 and 2011, we compiled a spatial field goal database that included Cartesian coordinates (x,y) for every field goal attempted in this 5-year period. This data set includes player name, shot location, and shot outcome for over 700,000 field goal attempts. We mapped the shot data atop a base map of a NBA basketball court (Figure 1). Although a regulation NBA court is 4,700 ft2, (50ft x 94ft), almost all (>98%) field goal attempts occur within a 1,284 ft2 area in between the baseline and a relatively thin buffer around the 3-point arc; we call this area the “scoring area.” We divided the scoring area into a grid consisting of 1,284 unique “shooting cells,” each 1 ft2 (Figure 1). To quantify shooting range, we applied spatial analyses to evaluate shooting performance across the grid and within each shooting cell.

Figure 1: Our composite shot maps from 2006-2011 NBA game data. The map on the left summarizes the density of all field goal attempts during the study period. The map on the right reveals league-wide tendencies in both shot attempts and points per attempt. Larger squares indicate areas where many field goals were attempted; smaller squares indicate fewer attempts. The color of the squares is determined by a spectral color scheme and indicates the average points per attempt for each location. Orange areas indicate areas where more points result from an average attempt, and blue areas indicate fewer points per attempt.

We derived metrics that described spatial aspects of shooting performance throughout the scoring area. The most basic metric is called “Spread,” which is simply a count of the unique shooting cells in which a player has attempted at least one field goal. The raw result is a number between 0 and 1,284 and summarizes the spatial diversity of a player’s shooting attempts. By dividing this count by 1,284 and multiplying by 100, we generated Spread%, which indicates the percentage of the scoring area in which a player has attempted at least one field goal.

Spread = FGAijij∈SA∑

Spread = Total spatial spread of player across all scoring cells FGAij = 1, if at least one field goal has been attempted in cell i, 0 if not SA = Scoring area consisting of 1,284 scoring cells

Fuente: Kirk Goldsberry, “CourtVision: New Visual and Spatial Analytics for the NBA” - MIT Sloan Sports Analytics Conference 2012

MIT Sloan Sports Analytics Conference 2012 March 2-3, 2012, Boston, MA, USA

3

significant time pressure there is a need for efficient visualizations that streamline connections between spatial game data, reasoning artifacts, analysts, and other audiences.

3 Case Study: Who is the best shooter in the NBA?

As a means to evaluate the feasibility of spatial and visual analytics for the NBA, we conducted a case study that attempted to answer one simple question: who is the best shooter in the NBA? Conventional metrics and evaluative approaches fail to provide a simple answer to this question. For example, Nene Hilario and Dwight Howard led the league in FG% in the 2010-2011 season, but neither is considered to be a great shooter. What makes a shooter great? What criteria separate good shooters from bad shooters? When asked who was the best shooter in the league, NBA reporter David Aldridge suggested it was Ray Allen because of his ability to shoot well from many different locations on the court: “Everyone has their favorite spots on the court but it seems like Ray [Allen] is more comfortable in more places than anyone I’ve seen.”

There is a clear spatial criterion implied in Aldridge’s answer; he is implying that all shooters perform better

at some places than others, but great shooters possess the ability to shoot well from many locations on the floor – the capability to make baskets from several different court locations is commonly referred to as “shooting range.” Although this seems like a sensible and useful assessment criterion, shooting range has yet to be effectively quantified and consequently has been mostly neglected in current evaluative approaches. To determine the best shooters in the NBA, we generated an analytical framework that 1) employs spatial analysis to quantify shooting range, and 2) visual analytics to reveal the unique spatial tendencies of individual shooters. Below, we describe the case study’s data, methods, and results.

Data: Using game data sets for every NBA game played between 2006 and 2011, we compiled a spatial field goal database that included Cartesian coordinates (x,y) for every field goal attempted in this 5-year period. This data set includes player name, shot location, and shot outcome for over 700,000 field goal attempts. We mapped the shot data atop a base map of a NBA basketball court (Figure 1). Although a regulation NBA court is 4,700 ft2, (50ft x 94ft), almost all (>98%) field goal attempts occur within a 1,284 ft2 area in between the baseline and a relatively thin buffer around the 3-point arc; we call this area the “scoring area.” We divided the scoring area into a grid consisting of 1,284 unique “shooting cells,” each 1 ft2 (Figure 1). To quantify shooting range, we applied spatial analyses to evaluate shooting performance across the grid and within each shooting cell.

Figure 1: Our composite shot maps from 2006-2011 NBA game data. The map on the left summarizes the density of all field goal attempts during the study period. The map on the right reveals league-wide tendencies in both shot attempts and points per attempt. Larger squares indicate areas where many field goals were attempted; smaller squares indicate fewer attempts. The color of the squares is determined by a spectral color scheme and indicates the average points per attempt for each location. Orange areas indicate areas where more points result from an average attempt, and blue areas indicate fewer points per attempt.

We derived metrics that described spatial aspects of shooting performance throughout the scoring area. The most basic metric is called “Spread,” which is simply a count of the unique shooting cells in which a player has attempted at least one field goal. The raw result is a number between 0 and 1,284 and summarizes the spatial diversity of a player’s shooting attempts. By dividing this count by 1,284 and multiplying by 100, we generated Spread%, which indicates the percentage of the scoring area in which a player has attempted at least one field goal.

Spread = FGAijij∈SA∑

Spread = Total spatial spread of player across all scoring cells FGAij = 1, if at least one field goal has been attempted in cell i, 0 if not SA = Scoring area consisting of 1,284 scoring cells

20

Page 11: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

Prof. Javier Zamora, [email protected], Blog: http://blog.iese.edu/FaceIT, Twitter: JavZamora!IESE PDD Alumni Reunion, Barcelona, 26 de marzo de 2014

Tiros en la NBA 2006-2011: Posiciones preferidas de 4 jugadores

MIT Sloan Sports Analytics Conference 2012 March 2-3, 2012, Boston, MA, USA

5

is the best way of quantifying shooting range or not, it seems to capture pure shooting ability better than FG% or eFG%.

Player Spread %

1. Kobe Bryant 1,071 83.4% 2. Lebron James 1,047 81.5% 3. Vince Carter 1,005 78.3% 4. Joe Johnson 992 77.3% 5. Rudy Gay 983 76.6% 6. Antawn Jamison 965 75.2% 7. Andre Igudola 962 74.9% 8. Ray Allen 952 74.1% 8. Kevin Durant 949 73.9%

10. Danny Granger 948 73.8% Table 1: Top 10 players in Spread metric

Player Range % 1. Steve Nash 406 31.6% 2. Ray Allen 386 30.1% 3. Kobe Bryant 383 29.8% 4. Dirk Nowitzki 373 29.0% 5. Rashard Lewis 354 27.6% 6. Joe Johnson 352 27.4% 7. Vince Carter 343 26.7% 8. Paul Pierce 332 25.9% 8. Rudy Gay 332 25.9%

10. Danny Granger 331 25.8% Table 2: Top 10 players in Range metric

Figure 3: The shooting ranges of Steve Nash, Ray Allen, Dirk Nowitzki, and Kobe Bryant. These four players had the highest Range values, but these graphics reveal that they achieve them in much different ways. For example, when compared to the three others, Dirk Nowitzki shoots relatively few 3-point shots and performs much better in the mid-range areas on the left side of the court, while Ray Allen excels in the corners of the court where Steve Nash rarely shoots.

4 Discussion

Basketball is a spatial sport. Shooting ability and other performance variables are spatially dependent. Understanding the spatial dimensions of performance variations is key to basketball expertise. Every coach, player, fan, scout, and general manager is well aware that the structure and dynamics of court space directly influence every second of every basketball game. To understand basketball, you need to understand space. In this regard, basketball is not unique. Spatial reasoning is a primary form of human intelligence that informs everything from tying our shoes, to launching spacecraft. Visual analytics is an emerging science that focuses on analytical reasoning as facilitated by intelligent visuals. Graphics including maps, charts, and diagrams are vital tools in every strategic domain in which space is key; maps inform military campaigns, magnetic resonance imagery informs orthopedic surgery, and IKEA diagrams enable us to assemble bookshelves. With this in mind, there is a surprising lack of visual reasoning artifacts used within the basketball community.

Steve Nash Ray Allen

Dirk Nowitski Kobe BryantFuente: Kirk Goldsberry, “CourtVision: New Visual and Spatial Analytics for the NBA” - MIT Sloan Sports Analytics Conference 2012

21

22

Page 12: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

Prof. Javier Zamora, [email protected], Blog: http://blog.iese.edu/FaceIT, Twitter: JavZamora!IESE PDD Alumni Reunion, Barcelona, 26 de marzo de 2014

Anton van Leeuwenhoek (1632-1723)

23

Prof. Javier Zamora, [email protected], Blog: http://blog.iese.edu/FaceIT, Twitter: JavZamora!IESE PDD Alumni Reunion, Barcelona, 26 de marzo de 2014

Ernst August Friedrich Ruska (1906-1988)

24

Page 13: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

http://www.youtube.com/watch?v=KCVs1U8xdxA

25

26

Page 14: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

27

1. DESCRIPTIVE

28

Page 15: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

1. DESCRIPTIVE

2. PREDICTIVE

29

1. DESCRIPTIVE

2. PREDICTIVE

3. PRESCRIPTIVE

30

Page 16: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

TextHal Varian Chief Economist at Google

The sexy job in the next ten years will be statisticians… The ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it that�s going to be ahugely important skill.�!

How the Web Challenges Managers!McKinsey Quarterly

31

TextHal Varian Chief Economist at Google

The sexy job in the next ten years will be statisticians… The ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it that�s going to be ahugely important skill.�!

How the Web Challenges Managers!McKinsey Quarterly

32

Page 17: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

Correlación "no es lo mismo que

Causalidad

33

Correlación "no es lo mismo que

Causalidad

34

Page 18: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

Prof. Javier Zamora, [email protected], Blog: http://blog.iese.edu/FaceIT, Twitter: JavZamora!IESE PDD Alumni Reunion, Barcelona, 26 de marzo de 2014

Experimentos Controlados Tests A/B

! Dividir el tráfico aleatoramiente entre dos (o más) versiones"! A (Control)"! B (Test)"! Recoger métricas

relevantes"! Analizar"

! Mejor método científico para probar causalidad"

Fuente: Ronny Kohavi

35

“to have a great idea have a lot of them”

Thomas A. Edison36

Page 19: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

37

Big Data vs. HiPPOs"!

Highest Paid Person’s Opinion

38

Page 20: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

Prof. Javier Zamora, [email protected], Blog: http://blog.iese.edu/FaceIT, Twitter: JavZamora!IESE PDD Alumni Reunion, Barcelona, 26 de marzo de 2014

Escépticos informados

39

Prof. Javier Zamora, [email protected], Blog: http://blog.iese.edu/FaceIT, Twitter: JavZamora!IESE PDD Alumni Reunion, Barcelona, 26 de marzo de 2014

Escépticos informados

Fuente: S. Shah, A. Horne and J. Capellá, HBR, abril 2012

40

Page 21: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

41

Prof. Javier Zamora, [email protected], Blog: http://blog.iese.edu/FaceIT, Twitter: JavZamora!IESE PDD Alumni Reunion, Barcelona, 26 de marzo de 2014

Comportamiento de las “tribus” urbanas

Fuente: Sense Networks & Prof. Alex Petland

42

Page 22: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

Fuente: Sense Networks & Prof. Alex Petland

43

Fuente: Sense Networks & Prof. Alex Petland

44

Page 23: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

Prof. Javier Zamora, [email protected], Blog: http://blog.iese.edu/FaceIT, Twitter: JavZamora!IESE PDD Alumni Reunion, Barcelona, 26 de marzo de 2014

Smartsteps

http://www.youtube.com/watch?v=wqjKTW3wJZ8

45

46

Page 24: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

47

48

Page 25: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

49

50

Page 26: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

51

https://www.youtube.com/watch?v=HkEOJnn_zlg

52

Page 27: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

53

“Personal data is the new oil of the internet

and the new currencyof the digital world.”

—Meglena Kuneva

European Consumer Commissioner

“Data is now the world’snew natural resource.”

—Virginia Rometty

IBM President & CEO 54

Page 28: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

! Fraude y Robo"! Invasión de Privacidad"! Ciber-Terrorismo & Vandalismo"! Discriminación de precio"! Denegación de seguro médico"! Robo de clientes o ideas"! Obama’s Privacy Bill of Rights Tenets"! UE con legislación más restrictiva

(multas de hasta 1M € o 2% del beneficio)

55

56

Page 29: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

57

58

Page 30: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

59

Prof. Javier Zamora, [email protected], Blog: http://blog.iese.edu/FaceIT, Twitter: JavZamora!IESE PDD Alumni Reunion, Barcelona, 26 de marzo de 2014

Creación de un Mercado de Datos

Fuente: World Economic Forum, febrero 2013

60

Page 31: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

Prof. Javier Zamora, [email protected], Blog: http://blog.iese.edu/FaceIT, Twitter: JavZamora!IESE PDD Alumni Reunion, Barcelona, 26 de marzo de 2014

Creación de un Mercado de Datos

Fuente: World Economic Forum, febrero 2013

61

Prof. Javier Zamora, [email protected], Blog: http://blog.iese.edu/FaceIT, Twitter: JavZamora!IESE PDD Alumni Reunion, Barcelona, 26 de marzo de 2014

“The end of the average”

62

Page 32: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

63

Prof. Javier Zamora, [email protected], Blog: http://blog.iese.edu/FaceIT, Twitter: JavZamora!IESE PDD Alumni Reunion, Barcelona, 26 de marzo de 2014

Fuente: Intel

Volviendo a la Ley de Moore...

La capacidad de procesamiento de información aumenta de forma imparable

64

Page 33: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

! Estudio del analista informático Martin Grötschel sobre la velocidad de resolución de un problema estándar de optimización en el periodo 1988-2003"

! Mejora por un factor de 43 millones! "! 2 factores"

! Procesadores: x1,000"! Algoritmos: x43,000

65

https://www.youtube.com/watch?v=6LYi2NAi8zE

66

Page 34: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

67

“To err is human, "but to really foul things up "

you need a computer.”"!

Paul Ehrlich

68

Page 35: ¿Qué es Big Data? - iese.edu DATA IESE PDD ALUMNI REUNION 2014 (P… · “Big data is data that exceeds the processing capacity of conventional database systems." The data is too

¡Gracias!

@JavZamora

email: [email protected] blog: blog.iese.edu/FaceIT !

69