project proposal compiled
8/12/2019 Project Proposal Compiled
Business Intelligence Integration
Joel Da Costa, Takudzwa Mabande, Richard Migwalla
Antoine Bagula, Joseph Balikuddembe
Project Description
Business Intelligence (BI) is the practice of using computer software to aid data analysis and
decision making in businesses. It represents a set of processes, tools and technologies which
improve productivity, sales and service of an enterprise, and so profitability in general. BI works
primarily by collecting, organizing and analyzing corporate data and then creating useful
knowledge out of this analysis (reporting). BI as a whole incorporates a wide spectrum of software
functions including ad-hoc querying, on-line analytical processing (OLAP), dashboards,
scorecards, search, visualization and more.
BI differentiates itself through its interdepartmental focus and general overview which is geared
towards total business performance. The implementation of BI gives knowledge and understanding
to departmental groups which previously may not have had access to or understanding of the data.
Increased analytics and ad hoc reporting allow organisations to better understand trends within
their business and apply a variety of different measures and attributes to understanding these
trends. Once the BI system has been implemented, a company will typically find it has more ideas
for new initiatives, more efficient and precise data collection processes, more effective marketing
techniques, a better understanding of its customers' needs and characteristics, and a better
understanding of the state of the market. This improved business agility and efficiency through BI
results in a long term performance gain which can result in significant profit increases.
The BI system itself is typically segmented into several key areas. The first is Business Modelling,
which creates the framework of the system and establishes how information needs to flow. Data
warehouses serve as a centralized repository for all the data gathered, and are maintained through
the 'Extraction, Transformation and Loading' (ETL) processes. OLAP is a technique by which the data
sourced from the data warehouse is visualized and summarized to provide a view across multiple
dimensions, in order to quickly answer multi-dimensional queries. Essentially, OLAP tells a business
what has happened, while data mining explains why it happened and what is likely to happen in the
future based on past patterns.
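As a rough illustration of the multi-dimensional summarisation described above, the sketch below rolls a few sales facts up along one chosen dimension. The `Sale` class, its fields (`region`, `product`, `amount`) and the sample values are all hypothetical; they are not taken from the Sanlam data, and a real OLAP engine would typically precompute such aggregates over the data warehouse.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

class OlapRollUp {
    // Hypothetical fact row: two dimensions (region, product) and one measure (amount).
    static final class Sale {
        final String region;
        final String product;
        final double amount;

        Sale(String region, String product, double amount) {
            this.region = region;
            this.product = product;
            this.amount = amount;
        }
    }

    // Sum the measure along whichever dimension the caller selects.
    static Map<String, Double> rollUp(List<Sale> facts, Function<Sale, String> dimension) {
        Map<String, Double> totals = new TreeMap<>();
        for (Sale s : facts) {
            totals.merge(dimension.apply(s), s.amount, Double::sum);
        }
        return totals;
    }

    // Made-up sample data, for illustration only.
    static List<Sale> sampleFacts() {
        return Arrays.asList(
            new Sale("Western Cape", "Life cover", 100.0),
            new Sale("Western Cape", "Investments", 50.0),
            new Sale("Gauteng", "Life cover", 70.0));
    }

    public static void main(String[] args) {
        List<Sale> facts = sampleFacts();
        System.out.println(rollUp(facts, s -> s.region));   // totals per region
        System.out.println(rollUp(facts, s -> s.product));  // totals per product
    }
}
```

The same facts answer two different questions simply by changing the dimension passed to `rollUp`, which is the essence of viewing data "across multiple dimensions".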
Problem Statement
The project will focus on the underlying technologies which enable Business Intelligence (BI) and
their application to two key scenarios. Previously, various technologies have been developed and
implemented with a one-size-fits-all approach, but this approach is likely to result in less
effective and less accurate analysis. Different areas and analyses require more adaptability than
such a singular approach allows. Our aim then is to evaluate which technologies would be the most
effective for the particular cases. The technologies being evaluated are Bayesian Belief Networks,
Neural Networks and Artificial Immune Systems, which will be expanded on later in the proposal.
The project will be done in cooperation with Sanlam, who will provide the necessary data to be
analysed.
The first case is to analyse customer data in order to create profiles of them so that they may be
targeted with the correct marketing techniques. By doing so, it would allow an increase in sales as
well as a decrease in the cost of marketing. Using the data provided, the 3 different technologies
will be applied to try and gain the most accurate customer profiles.
The second case is Predictive Sales Forecasting. Using historical and current data, the same 3
technologies will be applied to try and create an accurate forecast of the future trends. Forecasting
allows better business decisions to be made and mitigation to be taken in order to improve the
likely outcome.
While the two cases operate individually, both are being implemented with the same aim. What we
want to ascertain from the project results is how each approach's results vary when measured
against the same data and benchmarked against the known sales figures. This will help define the
strengths and weaknesses of the particular technologies in developing BI functions.
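One way such benchmarking against known sales figures could be quantified is with the mean absolute percentage error (MAPE). The proposal does not name an error measure, so the choice of MAPE, the class name `ForecastError` and the sample figures below are assumptions made purely for illustration.

```java
class ForecastError {
    // Mean absolute percentage error between known figures and a forecast.
    // Lower is better; 0 means the forecast matched every known figure.
    static double mape(double[] actual, double[] forecast) {
        if (actual.length == 0 || actual.length != forecast.length) {
            throw new IllegalArgumentException("Series must be non-empty and of equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] == 0.0) {
                throw new IllegalArgumentException("MAPE is undefined when an actual value is zero");
            }
            sum += Math.abs((actual[i] - forecast[i]) / actual[i]);
        }
        return 100.0 * sum / actual.length;
    }

    public static void main(String[] args) {
        double[] knownSales = {100.0, 200.0};      // made-up known figures
        double[] forecastSales = {110.0, 180.0};   // made-up forecast from one technique
        System.out.println("MAPE (%): " + mape(knownSales, forecastSales));
    }
}
```

Computing the same measure for each technique on the same data gives a single number per technique that can be compared directly.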
Procedures and Methods
This project is primarily designed as a research venture, with the main objective being the
synthesis of usable research results, as per the Problem statement. The scope does however
extend beyond that, as shall be illustrated through the following breakdown of procedures and
objectives.
Implementation will take the form of a Java application. It will make use of 3 different intelligent
systems to analyse historical data provided by Sanlam, to predict the required output.
Rationale for chosen approach
Before considering the actual approach that will be used in addressing the Problem statement, it is
first necessary to mention the reasoning behind the choice of algorithms. As the research
conducted indicated, industry tends to favour the use of these 3 algorithms, with a distinct
preference for Bayesian Belief Networks. Furthermore, as per the initial meeting with the Sanlam
representative, these 3 algorithms are of particular interest to Sanlam. More detail on industry's
use of these algorithms can be seen in the Related Work section.
Thus, the next step is to elaborate on the chosen approach, i.e. the choice to address the problem
in the form of an application. The following points summarize the motivation behind this:
Application development allows further extensibility: By choosing to develop this project in
the form of an application, there is more room to generalize and adapt the application,
making it useful in other spheres of Business.
Extensibility also allows room for improvement. Thus, developing this application allows
room for continuation, evolution and progress.
Providing Sanlam with a concrete showing of the results obtained, as well as how they were
obtained is also reason for developing this application.
Recreating the results of this research experiment will also be made easier given the
platform of an application.
Development process
Because of the collaborative nature of this project, it is key that the primary stakeholder, i.e.
Sanlam, submits a clear description of its requirements and expectations. For this reason, the
project will involve a Sanlam delegate, as well as the project team. Meetings will be held with the
delegate in order to generate a specific set of user requirements from which the solution can be
derived.
Once the requirements have been finalised, the next phase will be implemented. This will
consist of developing the application, which will model abstractions of the selected intelligent
systems. The application will take various forms of clientele information (provided by Sanlam) as
input. This information will include elements such as incomes, premiums and purchasing
history, to name a few. Based on this input, the application will then use the embedded intelligent
systems to generate output, offering the user the choice as to which algorithm is applied in the
simulation. This functionality will thus allow for comparison of results. The output will be displayed
in a format that is relevant to business users, and a graphical user interface will be implemented as
part of the application. By hiding a significant portion of the underlying technicalities, and displaying
only what is relevant to shareholders and other business analysts, the interface will serve its
purpose (more detail on this is provided later).
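The idea of letting the user choose which algorithm is applied to the same input can be sketched with a common interface and a registry of implementations. Everything named here (`IntelligentSystem`, `AlgorithmSelection`, the registry keys) is hypothetical, and the two implementations are trivial stand-ins so that the sketch runs; they are not Bayesian, neural or immune-system models.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AlgorithmSelection {
    // Common contract each embedded technique would implement (hypothetical name).
    interface IntelligentSystem {
        double predict(double[] history);
    }

    // Stand-in technique: predicts the mean of the history. Placeholder only.
    static final class MeanBaseline implements IntelligentSystem {
        public double predict(double[] history) {
            double sum = 0.0;
            for (double v : history) {
                sum += v;
            }
            return sum / history.length;
        }
    }

    // Stand-in technique: repeats the most recent value. Placeholder only.
    static final class LastValueBaseline implements IntelligentSystem {
        public double predict(double[] history) {
            return history[history.length - 1];
        }
    }

    private final Map<String, IntelligentSystem> systems = new LinkedHashMap<>();

    void register(String name, IntelligentSystem system) {
        systems.put(name, system);
    }

    // Run the technique the user picked, so outputs on the same input can be compared.
    double run(String name, double[] history) {
        IntelligentSystem system = systems.get(name);
        if (system == null) {
            throw new IllegalArgumentException("Unknown technique: " + name);
        }
        return system.predict(history);
    }

    public static void main(String[] args) {
        AlgorithmSelection app = new AlgorithmSelection();
        app.register("mean", new MeanBaseline());
        app.register("last", new LastValueBaseline());
        double[] history = {1.0, 2.0, 3.0};  // made-up input
        System.out.println(app.run("mean", history));
        System.out.println(app.run("last", history));
    }
}
```

Because the user interface would depend only on the interface, a further technique could be added by registering one more implementation, without changing the code that displays or compares results.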
Ethical, Professional and Legal Issues
Ethical Issues
We will be using Sanlam sales and customer data which is to remain confidential. It may not be
redistributed to any external parties and no personal information may be extracted for use outside
the project. For demonstration purposes, the software may not display personal information that
may lead to the identification of particular individuals. This information may be used in the
generation of results/forecasts but it will be abstracted with the use of IDs for names if necessary.
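Replacing names with IDs, as described above, could look something like the following minimal sketch. The class name `Pseudonymiser` and the `CUST-0001` ID format are assumptions made for illustration. The name-to-ID mapping would itself be confidential and would have to stay inside the project.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class Pseudonymiser {
    // Mapping from real name to opaque ID; must itself be kept confidential.
    private final Map<String, String> idsByName = new LinkedHashMap<>();

    // Return a stable opaque ID: the same name always yields the same ID.
    String idFor(String name) {
        String id = idsByName.get(name);
        if (id == null) {
            id = String.format("CUST-%04d", idsByName.size() + 1);
            idsByName.put(name, id);
        }
        return id;
    }

    public static void main(String[] args) {
        Pseudonymiser p = new Pseudonymiser();
        System.out.println(p.idFor("Example Person A"));  // made-up name
        System.out.println(p.idFor("Example Person B"));
        System.out.println(p.idFor("Example Person A"));  // same ID as the first call
    }
}
```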
Legal Issues
All sales and customer data from Sanlam must be kept private within the realms of the project. Any
copies of the database must be deleted once testing is completed and may not be archived outside
of Sanlam. No copies of the database may be created for use outside the project for any purpose.
Related Work
Customer Profiling
Sebastiani et al. used Bayesian Networks to profile customers in order to predict profits. They used
two networks: the first to describe the probability of response from customers, and the second to
model price factors. The results were reasonable, and by capturing the characteristics of
customers, the models can potentially help to increase profits [1].
Similar work has been done by Elalfi et al., who combined artificial neural networks with genetic
algorithms. An algorithm was used to extract accurate and comprehensible rules from a database
using trained artificial neural networks, which in turn were trained by genetic algorithms to find the
optimal values for the model. These rules were then used to define customer profiles in order to
make for more profitable e-business [2].
Customer life cycles
Baesens et al. introduce a measure of a customer's future spending evolution that might improve
relationship marketing decision making. The method suggested predicts whether a customer will
increase or decrease spending from their initial purchase information. It had a 75% classification
accuracy in predicting the customer lifecycle using purchase volume and purchase category [3].
Repeat Purchase Modelling
Baesens et al. focus on the need for companies such as mail-order companies to identify which
customers are most likely to purchase before they send out costly catalogues. This involves
profiling customers according to several parameters and calculating the probability of repurchase.
A Bayesian Neural Network was used and had a correct classification result of 71% given the data
set used [4].
Modelling Customer Attitudes
Ishigaki et al. use Bayesian networks to model customer attitudes based on questionnaire data.
The model can then be used to gauge customers' feelings towards a product, and how they should
be marketed to. The model was fairly successful with a 73.5% success rate on testing [5].
Sales Forecasting
Recently, Chang et al. extended the idea of sales forecasting by including clustering in the
model. The K-means technique is used to cluster the data, which is then fed to a fuzzy neural
network that, once trained, can generate sales forecasts. The model proved very effective in
providing accurate forecasts, and was more accurate than a series of other models it was tested
against [6].
Anticipated Outcomes
We will create a package that will read in data from the Sanlam database, use different machine
learning techniques to profile customers and compare the accuracy of the different techniques
using actual data.
System
The software will be composed of:
An interface to the database that will read in relevant data.
The core of the program that will contain three different Intelligent System techniques that a
user can utilize.
The front-end interface that will give the user results of the classification comparing actual
data to inferred information.
The major component will be the implementation of the different techniques. However, it will still be
important to have good interfaces with the database and the user. The user interface will need to
display interpretable information on the performance of each technique, which will entail
aggregating the results in a way that a user will quickly and easily understand. It will need to allow
changes in parameters to allow optimisation for particular data sets.
Expected Impact
We expect to identify the best machine learning technique to use for customer profiling and sales
forecasting for Sanlam in particular. From our initial investigation it seems that Bayesian networks
are very good classifiers (useful in customer profiling) and neural networks are very good
forecasters. The performance of each technique however is highly dependent on the task, data
and results required. This may mean that the performance results in Sanlam's case will not
necessarily match the results for other organisations/companies.
Key Success Factors
The results of the simulations will need to be compared to existing data of what the simulations are
trying to predict. The comparisons will be used to rank each technique according to accuracy of its
results. All simulations will be expected to complete within an acceptable time frame (performance
and scalability are out of scope for this project but each implementation will need to run within a
determined acceptable time, thus making performance negligible in determining the best technique
to use).
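Ranking the techniques by accuracy could be sketched as follows for the classification case. The proposal does not fix a particular accuracy measure, so the use of simple classification accuracy (fraction of predictions matching the known labels), the class name `TechniqueRanking` and all sample values are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TechniqueRanking {
    // Fraction of predictions that match the known labels (0.0 to 1.0).
    static double accuracy(int[] actual, int[] predicted) {
        if (actual.length == 0 || actual.length != predicted.length) {
            throw new IllegalArgumentException("Label arrays must be non-empty and of equal length");
        }
        int correct = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] == predicted[i]) {
                correct++;
            }
        }
        return (double) correct / actual.length;
    }

    // Technique names ordered from most to least accurate.
    static List<String> rank(Map<String, Double> accuracyByTechnique) {
        List<String> names = new ArrayList<>(accuracyByTechnique.keySet());
        names.sort((a, b) -> Double.compare(accuracyByTechnique.get(b), accuracyByTechnique.get(a)));
        return names;
    }

    public static void main(String[] args) {
        int[] known = {1, 0, 1, 1};  // made-up known labels
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("technique-1", accuracy(known, new int[] {1, 0, 0, 1}));  // made-up predictions
        scores.put("technique-2", accuracy(known, new int[] {1, 0, 1, 1}));
        System.out.println(rank(scores));
    }
}
```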
Project Plan
Risk Management
The risks that follow are to be evaluated based on the following risk matrix:
Impact \ Probability | Low | Medium | High
Disastrous | C | B | A
Serious | D | C | B
Marginal | E | D | C
Trivial | F | E | D
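The grades in the matrix run diagonally from A (Disastrous impact, High probability) to F (Trivial impact, Low probability), so the lookup can be written as a small function. The enum and method names below are hypothetical; only the grade values come from the matrix above.

```java
class RiskMatrix {
    enum Probability { LOW, MEDIUM, HIGH }

    // Declared from most to least severe, matching the matrix rows.
    enum Impact { DISASTROUS, SERIOUS, MARGINAL, TRIVIAL }

    // Each step down in impact or in probability moves the grade one letter away from A.
    static char grade(Impact impact, Probability probability) {
        int stepsFromWorst = impact.ordinal()
                + (Probability.HIGH.ordinal() - probability.ordinal());
        return (char) ('A' + stepsFromWorst);
    }

    public static void main(String[] args) {
        for (Impact impact : Impact.values()) {
            StringBuilder row = new StringBuilder(impact.name());
            for (Probability probability : Probability.values()) {
                row.append(' ').append(grade(impact, probability));
            }
            System.out.println(row);
        }
    }
}
```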
The following table gives a breakdown of the predicted risks associated with this project, paying
special attention to their impact and probability. It also highlights 2 courses of action: avoidance,
which is an ongoing process, and mitigation, should the risk materialize.
Risk | Matrix Evaluation | Avoidance | Mitigation
1. Loss of a project team member. (This would occur if one or more members abandoned the Honours Programme for any number of reasons) | D. Serious / Low Probability | Pressure to stay on the project as failure to do so means not graduating. | Have sufficiently independent deliverable modules for each team member.
2. Delay in delivery of test data. (Dependent on Sanlam for data; external factor) | C. Disastrous / Low Probability | Pressure Sanlam to provide data as soon as possible. | Create random test data or use alternative available data.
3. Scope creep. (Plan too many tasks, cannot complete tasks in time) | E. Marginal / Low Probability | Project planned in detail with supervisor and department approval. | Start with fundamental features first and leave other things to the end.
4. Data loss due to hardware failure. (External factor) | C. Serious / Medium Probability | Frequent backups of all progress on different machines or storage devices. | Roll back to last backup.
5. Missing project deadlines. | C. Serious / Medium Probability | Constant reference to the project timeline and clear communication between project members. | Review and reassess deadlines, readjusting where necessary as cost-effectively as possible.
6. Misunderstanding user requirements. (Resultant of miscommunication/ambiguity in user-team interaction) | D. Serious / Low Probability | Constant communication with Sanlam to maintain correct direction. Also, providing Sanlam with project plan and design in order to detect flaws. | Iterations through development so that inconsistencies can be detected early.
Timeline & Gantt Chart
Resources Required
The resources required to complete the project are fairly standard, with the software and
equipment in the Honours Lab sufficing for development. Apart from this though, Joseph
Balikuddembe is necessary as a representative of Sanlam and as co-supervisor for the project.
Furthermore, the data regarding customers and sales that Sanlam will provide is crucial to the
project development.
Necessary Resources:
PCs
Sanlam Database Access
Java Development Platform
Deliverables
The following table gives a detailed list of the deliverables necessary for the completion of this
project:
Deliverable | Description
Final Project Proposal | Final copy of Proposal for evaluation.
Project Proposal Presentation | Presentation to supervisor and class.
Project Web Presence | Online availability of proposal and project timeline.
Project Poster | Poster representation of Project.
Project Web Page | Open availability of Project Webpage.
Project Report | A report on the results of the research.
Project Application | The actual project.
Further detail on dates is given in the Milestones.
References
[1] Sebastiani P., Ramoni M., Crea A. Profiling your Customers using Bayesian Networks. SIGKDD Explorations 1(2). 91-97.
[2] Elalfi A., Haque R., Elalami M. Extracting rules from trained neural network using GA for managing E-business. Applied Soft Computing 4. 65-77.
[3] Baesens B., Verstraeten G., Van Den Poel D., Egmont-Petersen M., Van Kenhove P., Vanthienen J. 2004. Bayesian network classifiers for identifying the slope of the customer lifecycle of long-life customers. European Journal of Operational Research 156. 508-523.
[4] Baesens B., Viaene S., Van Den Poel D., Vanthienen J., Dedene G. 2002. Bayesian neural network learning for repeat purchase modelling in direct marketing. European Journal of Operational Research 138. 191-211.
[5] Ishigaki T., Motomura Y., Dohi M., Kouchi M., Mochimaru M. Knowledge Extraction by Probabilistic Cognitive Structure Modeling Using a Bayesian Network for Use by a Retail Service. MEDES October 2009. 141-149.
[6] Chang P., Liu C., Fan C. Data clustering and fuzzy neural network for sales forecasting: A case study in printed circuit board industry. Knowledge-Based Systems 22. 344-355.