Using Big Data to investigate the influence of climate and demography on wine consumer habits
Alastair Reed (1), Michael Shannon (1), Daniel Mathews (2)
(1) Viticulture and Winemaking, Melbourne Polytechnic. Contact: [email protected]
(2) School of Mathematical Sciences, Monash University
Today
Background
Australian wine retail sector
The study
Use of Big Data in wine to derive relationships between geography and climate
Results
Association between temperature, geography and consumer preference
Recommendations
Ongoing research and implications for future management
Introduction
The Australian wine retail sector is a clear duopoly
Dominated by two players: Wesfarmers Ltd [19%] and Woolworths Ltd [39%]
Data analysis opportunity!
Beverage Revenue
Wesfarmers Ltd $2.0 billion
Woolworths Ltd $4.1 billion
From: Data estimated by IBISWorld
What affects consumer preference?
Epigenetics of a varietal decision
1. Visual
Label, position, status
2. History
Regional bias, personal bias
3. Environment
Climatic effects, light levels
Decision Genes
[Diagram: activation of the Shiraz decision gene leads to a Shiraz sale; activation of the Sauvignon Blanc decision gene leads to a Sauvignon Blanc sale]
Decision gene expression can be influenced developmentally and/or environmentally
Developmental vs Environmental
Case Study: Champagne
[Figure: Online Chardonnay sales in Melbourne, Australia, during 2013. x-axis: date, 1/1/2013 to 12/27/2013; y-axis: 0 to 0.6]
[Figure repeated with annotations: warm average, cold average]
[Figure repeated with event annotations: Melbourne Cup, Easter, Mother’s Day, Tax Returns?, Football finals, NYE]
We wish to explain the environmental and developmental…
Can we quantify to what degree wine purchase decisions are influenced by the weather?
Can we explain to what degree wine purchase decisions are influenced by location at the city level?
The data…
Over 3 million transactions from across Victoria, Australia
Closely examined:
Shiraz
Chardonnay
Riesling
Sauvignon Blanc
Pinot Gris/Grigio
Cabernet Sauvignon
Merlot
Pinot Noir
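The charts that follow refer to "relative" sales. A minimal sketch of how such a share could be derived from raw transactions, assuming each record is a (fortnight, variety, bottles) tuple; that layout, the function name and the sample figures are hypothetical and do not come from the study's data:

```python
from collections import defaultdict


def relative_sales(transactions):
    """Return {period: {variety: share of that period's total quantity}}."""
    totals = defaultdict(float)
    by_variety = defaultdict(lambda: defaultdict(float))
    for period, variety, quantity in transactions:
        totals[period] += quantity
        by_variety[period][variety] += quantity
    return {
        period: {v: q / totals[period] for v, q in varieties.items()}
        for period, varieties in by_variety.items()
    }


# Hypothetical transactions: (fortnight index, variety, bottles sold)
sample = [
    (1, "Shiraz", 30), (1, "Sauvignon Blanc", 10),
    (2, "Shiraz", 20), (2, "Sauvignon Blanc", 20),
]
shares = relative_sales(sample)
```

Working with shares rather than raw volumes keeps a variety's figure comparable between busy and quiet periods.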
Wine Purchase Decision
Case Study: Victoria
[Figure: x-axis 1 to 12; y-axis 0 to 30; caption not recovered]
Geographically diverse state
Desert in north-west
Alpine in the north-east
Temperate in the south
Melbourne’s Climate
Average temperature: 13 – 25°C
Extreme temperatures: -2 – 46°C
Varieties correlate to temperature on a geographic scale
Association between relative Sauvignon Blanc (left) and Shiraz (right) sales and temperature, across Australia
[Figure, two panels, fitted lines in the order extracted:]
Panel 1 (x-axis 0.18 to 0.225; y-axis 5 to 30): f(x) = 353.25x − 52.24, R² = 0.692
Panel 2 (x-axis 0.05 to 0.13; y-axis 5 to 30): f(x) = −180.76x + 33.72, R² = 0.406
All analysed varieties were correlated to temperature on a temporal scale
Association between relative Shiraz sales and temperature
[Figure: x-axis 5 to 45; y-axis 0% to 60%; fitted line f(x) = −0.003597x + 0.2678, R² = 0.197]
All analysed varieties were correlated to temperature on a temporal scale
Association between relative Sauvignon Blanc sales and temperature
[Figure: x-axis 5 to 45; y-axis 0% to 35%; fitted line f(x) = 0.001692x + 0.1195, R² = 0.102]
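The f(x) and R² values on these slides have the form of an ordinary least-squares line fit. A minimal sketch of such a fit follows, assuming paired observations of temperature and relative sales; the `temps` and `shiraz_share` values are invented for illustration, and only the two `reported_*` constants are taken from the Shiraz slide:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, plus R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical observations (not the study's data)
temps = [10.0, 15.0, 20.0, 25.0, 30.0]
shiraz_share = [0.24, 0.21, 0.20, 0.17, 0.16]
slope, intercept, r_squared = fit_line(temps, shiraz_share)

# Coefficients of the Shiraz line as reported on the slide
reported_slope = -0.0035970
reported_intercept = 0.2677740
share_at_10 = reported_slope * 10 + reported_intercept
share_at_30 = reported_slope * 30 + reported_intercept
```

Read this way, the reported Shiraz line corresponds to a relative share of roughly 23% at 10°C and roughly 16% at 30°C. The low R² values on the slides indicate that temperature explains only part of the variation.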
Google search associates Shiraz to temperature
Association between relative fortnightly Google searches and average temperature (excluding Christmas period)
[Figure: x-axis Temperature (°C), 10 to 35; y-axis Google search (relative), 15 to 60; fitted line f(x) = −0.7793x + 56.79, R² = 0.371]
Google search associates Sauvignon Blanc to temperature
Association between relative fortnightly Google searches and average temperature (excluding Christmas period)
[Figure: x-axis 20 to 65; y-axis 0.1 to 0.22; fitted line f(x) = 0.000610x + 0.1445, R² = 0.133]
Link between red wine sales and temperature is consistently stronger than white, except Sauvignon Blanc…
Variety             Proportion of stores with      Average income** when      Average income when
                    significant correlation (r)    significant correlation    insignificant correlation
Cabernet Sauvignon  0.96 (0.29)                    $1632                      $1110
Merlot              0.86 (0.26)                    $1639                      $1436
Pinot Noir          0.57 (0.22)                    $1793                      $1371
Shiraz              0.98 (0.44)                    $1623                      $995
Chardonnay          0.45 (0.17)                    $1703                      $1535
Pinot Gris          0.67 (0.23)                    $1765                      $1303
Riesling            0.61 (0.25)                    $1778                      $1352
Sauvignon Blanc     0.96 (0.29)                    $1626                      $1244
Average             0.76 (0.27)                    $1695 a                    $1294 b
* >0.027
** fortnightly
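The slide does not say how significance was tested. As one possible illustration, the sketch below computes a Pearson correlation per store and the proportion of stores whose correlation is significant, using the Fisher z-transform with a normal approximation. The test choice, the 0.05 threshold, the sample size of 26 fortnights and the per-store r values are all assumptions, not details from the study:

```python
import math
from statistics import NormalDist


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length, non-constant series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fisher_p_value(r, n):
    """Approximate two-sided p-value for H0: no correlation (requires n > 3)."""
    if abs(r) >= 1.0:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def share_significant(store_rs, n, alpha=0.05):
    """Proportion of stores whose correlation is significant at level alpha."""
    hits = sum(1 for r in store_rs if fisher_p_value(r, n) < alpha)
    return hits / len(store_rs)


# Hypothetical per-store correlations, each based on 26 fortnightly points
hypothetical_rs = [0.6, 0.5, 0.1, -0.55]
proportion = share_significant(hypothetical_rs, n=26)
```

With these invented values, three of the four stores pass the threshold, which is the kind of figure the first column of the table reports per variety.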
Decision Gene approach
Relative purchase figures can be treated the same as allele frequencies (the frequency of gene variants), where an individual has two alleles for each gene
Genotypes:
aa = purchase
Aa or AA = no purchase
We can then use the frequencies to describe the characteristics of a population
Comparing the relative frequencies of alleles allows populations to be compared using distance matrices, visualised with traditional phylograms.
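To make the distance-matrix idea concrete, here is a minimal sketch that treats each variety as a two-allele "decision gene" and each store as a population. It uses Nei's D_A distance as the measure; the slides do not state which distance was used, so that choice, the store names and the frequencies are all assumptions for illustration:

```python
import math


def nei_da(freqs_x, freqs_y):
    """Nei's D_A distance over biallelic loci.

    Each entry is the frequency of the purchase allele at one locus (variety);
    the other allele's frequency is taken as 1 minus that value.
    """
    total = 0.0
    for p, q in zip(freqs_x, freqs_y):
        total += math.sqrt(p * q) + math.sqrt((1.0 - p) * (1.0 - q))
    return 1.0 - total / len(freqs_x)


def distance_matrix(populations):
    """Return (names, square matrix of pairwise D_A distances)."""
    names = list(populations)
    matrix = [[nei_da(populations[a], populations[b]) for b in names]
              for a in names]
    return names, matrix


# Hypothetical relative purchase frequencies for three varieties per store
stores = {
    "Store A": [0.25, 0.10, 0.15],
    "Store B": [0.20, 0.18, 0.12],
    "Store C": [0.10, 0.25, 0.20],
}
names, matrix = distance_matrix(stores)
```

In this invented example Store A sits closer to Store B than to Store C, so a tree built from the matrix would group A and B first.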
Clustering between distinct geographic areas
Phylogram generated using the Neighbour-Joining (NJ) method on sales frequencies of 7 varieties across 28 retail outlets (derived using POPTREE2 [Takezaki 2010])
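For readers unfamiliar with Neighbour-Joining, the sketch below shows a single joining step: it picks the pair of populations that minimises the NJ Q criterion and computes the branch lengths to the new internal node. The five-point distance matrix is a toy example, not the store data, and this is not a reproduction of POPTREE2's implementation:

```python
def nj_pair_to_join(dist):
    """Return ((i, j), q) for the pair minimising the Neighbour-Joining Q value."""
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if best_q is None or q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair, best_q


def nj_branch_lengths(dist, i, j):
    """Branch lengths from i and from j to the new node that joins them."""
    n = len(dist)
    diff = (sum(dist[i]) - sum(dist[j])) / (2 * (n - 2))
    length_i = dist[i][j] / 2 + diff
    return length_i, dist[i][j] - length_i


# Toy symmetric distance matrix for five populations
toy = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
pair, q_value = nj_pair_to_join(toy)
branches = nj_branch_lengths(toy, *pair)
```

A full tree repeats this step, replacing the joined pair with the new node and recomputing distances, until only two nodes remain.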
Summary
Significant associations can be made between developmental and environmental factors and consumer preference
Temporal and spatial trends can be identified but need further analysis for confirmation
We are looking for collaborators to consolidate this research. All welcome!