What is data mining ?
A general description of data mining, its business context, the differences between data mining and statistics, and an example of an application.
Johan Blomme, Circulation Manager, AMP
The Datamining Garden kick-off workshop, June 19th, 2007
Regus Pegasus, Diegem
1
1. Introduction : “Competing on Analytics”
2
• Thomas Davenport : organizations that have built their very business on the ability to collect, analyze and act on data are consistently the leaders in their industry.
• The demands of business today are creating an increasing need for access to data and the use of it to maintain a sustainable competitive advantage :
– the rapid construction of data-driven analytics :
• descriptive statistics ;
• predictive modeling and optimization techniques ;
– the rapid deployment of knowledge derived from data ;
– the need to give end users access to results in a form that helps them gain the insights they need to make critical business decisions.
3
                Industrial Age          Information Age
Processes :     linear, sequential      interwoven, collaborative
Tempo :         periodic, slow          continuous, rapid
Assets :        tangibles               intangibles
4
5
2. Business drivers of data mining
6
Time and information drive the information age, and competitiveness will be based on obtaining real-time information and acting on it promptly and effectively. The following changes indicate how to compete in the information age :
• more complex business environments due to globalization and deregulation ;
• greater impact of change from external causes ;
• a power shift from sellers to buyers, rapidly shifting customer demands and subsequent reduced product life cycles ;
• constant technology change ;
• faster business cycles and temporary competitive advantage ;
• the need to explore collaborative strategies ;
• constant change at ever-increasing speeds and shrinking strategy time horizons.
7
• Technology facilitates data gathering :
– e.g. RFID ;
– currently : applications mainly in production environment and logistics ;
– future possibilities : narrowcasting ;
– privacy issues !
8
• Technology transforms the way we live and interact :
– ubiquitous access to information is changing the economics of knowledge ;
– consumer preferences are becoming more complex and are changing more rapidly ;
– customers will increasingly choose how they would like to interact with organizations and will only do business with companies that meet their interaction needs ;
– the customer takes the lead ;
– technology changes the behaviour of consumers ; consequently, it is very important to track customer interactions and customer behaviour.
9
3. Data mining defined
10
• Data mining is the extraction of actionable knowledge from large datasets to acquire and sustain a competitive advantage.
• Data mining is about achieving the organization’s goals, not about the maths and the statistics.
11
• The introduction of data warehousing in the 90’s resulted in a wider acceptance of data mining :
– operational data stored in corporate data warehouses has the potential to be exploited as business intelligence ;
– data warehouses are multidimensional structures used for online analytical processing (OLAP) ;
– OLAP :
• analyzes information about past performance on an aggregate level ;
• verification-based approach : the user develops a hypothesis and then tests the data to prove or disprove the hypothesis ;
– data mining :
• prospective data analysis ;
• predicting future trends, allowing businesses to make proactive, knowledge-driven decisions.
Data mining and statistics/OLAP can complement each other : the inductively revealed relationships between variables can be used to formulate hypotheses and the insights gained
12
13
• Statistics vs. data mining :
– Statistical analysis is primarily concerned with confirmatory data analysis (model fitting) : testing if a proposed model of hypothetical relationships between variables does or does not provide a good explanation of the observed data.
Statistical models are based on assumptions or some theory about relationships between
variables and assume a deductive process
– Data mining : rather than verifying hypothetical patterns, data mining uses the data itself to detect such patterns.
Data mining : computational algorithms play a much greater role in building models through exploratory data analysis (EDA). The nature of the process is inductive.
14
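The contrast between the two approaches can be made concrete with a small sketch. It is only an illustration, not the method used at AMP : the variable names and the figures are invented, and Pearson correlation stands in for whatever model or algorithm would be used in practice. The confirmatory step checks one relationship stated in advance ; the exploratory step lets the data rank all pairs of variables.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data, one value per outlet, invented for illustration only.
data = {
    "price":      [2.0, 2.5, 3.0, 3.5, 4.0, 4.5],
    "promotions": [3, 1, 4, 1, 5, 2],
    "footfall":   [120, 150, 170, 210, 230, 260],
    "sales":      [60, 74, 88, 101, 118, 129],
}

# Confirmatory (deductive): one hypothesis is stated up front and checked.
hypothesis_r = pearson(data["promotions"], data["sales"])

# Exploratory (inductive): every pair of variables is scored and ranked,
# so the strongest relationships emerge from the data itself.
ranked = sorted(
    ((abs(pearson(data[a], data[b])), a, b) for a, b in combinations(data, 2)),
    reverse=True,
)
strongest = ranked[0]  # (absolute correlation, variable a, variable b)
```

In this invented data set the hypothesised relationship turns out to be weak, while the ranking surfaces stronger relationships that nobody specified beforehand. Those could then be tested formally, which is where the two approaches complement each other.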
15
[Figure : business value plotted against degree of intelligence, in ascending steps : standard reports, query / drill down, alerts, forecasting, predictive modeling, optimization]
16
The CRISP-DM model is an industry- and application-neutral standard for fitting data mining into the general problem-solving strategy of a business.
17
4. An example of DM
The case of demand planning of magazines (AMP)
18
Distribution of press products : 2.8 million copies every night
19
Business problem :
The market for printed magazines is declining. Key reasons :
- advertising is migrating to e-media ;
- publishers are not investing in the future of printed magazines at the same rate as they are in the future of e-media products ;
- the young generation is brought up in an e-media world and will be less inclined to read printed products ;
- publishers’ drive to reduce costs makes e-media publishing an attractive proposition, since paper, printing and distribution costs can be eliminated.
The big issue in single copy sales is that of unsolds. If sales volumes go down, the distribution cost per copy increases, since the overhead of the distribution system has to be spread over fewer magazines, and returns as a proportion of delivered magazines increase. The fee earned by distributors is based on the cover prices of magazines and the number of copies sold, instead of on a cost-to-serve model.
20
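The arithmetic behind the unsolds problem can be sketched as follows. The cost model (a fixed overhead plus a handling cost per delivered copy) and all figures are assumptions made for illustration ; they are not AMP data.

```python
def cost_per_copy_sold(fixed_overhead, variable_cost_per_copy, delivered, sold):
    """Distribution cost carried by each copy that is actually sold."""
    total_cost = fixed_overhead + variable_cost_per_copy * delivered
    return total_cost / sold

def return_rate(delivered, sold):
    """Unsold copies as a share of delivered copies."""
    return (delivered - sold) / delivered

# Hypothetical figures, chosen only to make the arithmetic visible.
overhead = 10_000.0   # fixed cost of the distribution system (assumed)
handling = 0.10       # variable cost per delivered copy (assumed)
delivered = 50_000    # the draw is kept constant in both scenarios

before = cost_per_copy_sold(overhead, handling, delivered, sold=25_000)
after = cost_per_copy_sold(overhead, handling, delivered, sold=20_000)

returns_before = return_rate(delivered, 25_000)
returns_after = return_rate(delivered, 20_000)
```

With these assumed numbers, a drop in sales from 25,000 to 20,000 copies raises the cost per copy sold from 0.60 to 0.75 and the return rate from 50 % to 60 %, while the fee base (copies sold) shrinks at the same time.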
Objective :
How to build an intelligent supply chain to improve supply chain efficiency,
reduce costs and increase profits ?
21
Business Understanding
• make-to-stock environment
• lack of visibility of supply chain, esp. day-to-day demand and stock positions
• excessive inventory levels
• return rates of + 60 % are not uncommon in our industry
=> Information is key : integrate internal SC activities of AMP with those of partners to gain efficiencies across the supply chain
[Diagram labels : Product Planning & Development ; Retail ; Catalog - Mail ; Internet, WWW ; Kiosks ; Suppliers ; Sales Force ; SAP Business Warehouse]
22
the traditional (linear) supply chain
23
the intelligent supply chain
Publisher, Distributor, Newsstand
Information & Intelligence Sharing for Effectiveness
Product Flow
Information Flow
• POS Data Sharing
• Inventory levels
• Forecasts
• Promotional Activities
• New Product Introduction
• Production & delivery schedules
24
Data Preprocessing
. data normalization
. handling missing data
25
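The two preprocessing steps named above can be sketched in a few lines. The slides do not say which techniques were used, so mean imputation and min-max scaling are assumptions here, picked as the simplest common choices ; the sales figures are invented.

```python
def impute_missing(series):
    """Replace None with the mean of the observed values (one simple strategy)."""
    observed = [v for v in series if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in series]

def min_max_normalize(series):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:  # constant series: avoid division by zero
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]

# Hypothetical weekly sales of one title at one outlet; None marks a missing report.
weekly_sales = [12, None, 8, 20, None, 16]

filled = impute_missing(weekly_sales)   # gaps replaced by the observed mean (14.0)
scaled = min_max_normalize(filled)      # lowest week maps to 0.0, highest to 1.0
```

Normalizing makes titles and outlets with very different volumes comparable ; whether mean imputation is appropriate depends on why the data are missing.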
Develop Forecast Model
. flat sales model
. intermittent data modeling
. discrete data : low volume model
. apply business rules
26
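The slides do not name the technique behind "intermittent data modeling". One widely used option for series with many zero-sales periods is Croston's method, sketched below as an assumption rather than as the model actually deployed. It smooths the size of non-zero demands and the interval between them separately, and forecasts their ratio.

```python
def croston(demand, alpha=0.1):
    """Croston's method: forecast of demand per period for an intermittent series.

    Returns None if the series contains no non-zero demand.
    """
    size = None        # smoothed size of non-zero demands
    interval = None    # smoothed number of periods between non-zero demands
    periods_since = 1  # periods elapsed since the last non-zero demand
    for d in demand:
        if d > 0:
            if size is None:
                # Initialise both estimates on the first non-zero demand.
                size, interval = float(d), float(periods_since)
            else:
                # Update only in periods where demand actually occurs.
                size += alpha * (d - size)
                interval += alpha * (periods_since - interval)
            periods_since = 1
        else:
            periods_since += 1
    if size is None:
        return None
    return size / interval

# Hypothetical outlet that sells 3 copies every third issue.
forecast = croston([0, 0, 3, 0, 0, 3, 0, 0, 3])
```

For this invented series the forecast is one copy per issue on average. The smoothing constant `alpha` is a free parameter that would have to be tuned on real sales history.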
Deploy Forecasts
. interpret results : simulation
. workflow integration (operations)
27
service degree level
monthly titles
28
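How a service level translates into a draw (the number of copies delivered) can be illustrated for a low-volume outlet. The sketch assumes that the service level means the probability of not selling out, and that demand per issue follows a Poisson distribution ; neither assumption is stated in the slides.

```python
from math import exp

def poisson_cdf(k, mean):
    """P(D <= k) for Poisson-distributed demand D with the given mean."""
    term = exp(-mean)  # P(D = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i  # P(D = i) computed from P(D = i - 1)
        total += term
    return total

def draw_for_service_level(mean_demand, service_level):
    """Smallest draw n such that P(demand <= n) >= service_level."""
    if not 0 < service_level < 1:
        raise ValueError("service_level must lie strictly between 0 and 1")
    n = 0
    while poisson_cdf(n, mean_demand) < service_level:
        n += 1
    return n

# Hypothetical outlet selling 2 copies per issue on average.
draw_95 = draw_for_service_level(2.0, 0.95)
draw_90 = draw_for_service_level(2.0, 0.90)
```

Under these assumptions the outlet needs 5 copies to reach a 95 % service level and 4 copies for 90 %, against an average sale of 2. This is the trade-off between out-of-stocks and unsolds at low volumes.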
[Figure : scatter plot of % weighted oos (vertical axis) against % unsolds (horizontal axis), both axes running from 0 to 100. Series : reference period, with a linear trend line, and draw regulation, with a logarithmic trend line. R² values shown : 0,5696 and 0,0213]
29
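The R² values on the chart indicate how much of the variation in out-of-stocks a trend line explains. The sketch below shows how a linear and a logarithmic trend can be fitted by ordinary least squares and how R² is computed. The data points are invented and the fitting procedure is an assumption ; the slides do not document how the chart was produced.

```python
from math import log

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def r_squared(ys, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical points (% unsolds, % weighted oos), invented for illustration.
unsolds = [20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
oos = [30.0, 22.0, 18.0, 12.0, 9.0, 7.0]

# Linear trend: y = a + b*x
a, b = fit_line(unsolds, oos)
linear_r2 = r_squared(oos, [a + b * x for x in unsolds])

# Logarithmic trend: y = c + d*ln(x), i.e. the same fit on ln(x)
log_x = [log(x) for x in unsolds]
c, d = fit_line(log_x, oos)
log_r2 = r_squared(oos, [c + d * lx for lx in log_x])
```

An R² close to 1 means that out-of-stocks move closely with unsolds ; an R² close to 0 means that the two are largely unrelated.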
Shared visibility across supply chain
• Improved understanding, forecasting and analysis of consumer demand
• Improved capability to respond and react to changes
• Improved stability, predictability and efficiency of supply chain operations
Increased Sales
• Improved fill rates
• Improved on-shelf availability
• More effective demand generation activities
Reduced Inventories
• Reduced lead times
• Reduced inventories
Reduced Costs
• Smoother SC execution
• More efficient processes
• Reduction of costs for handling returns
30