project proposal compiled


  • 8/12/2019 Project Proposal Compiled


    Business Intelligence Integration

    Joel Da Costa, Takudzwa Mabande, Richard Migwalla

    Antoine Bagula, Joseph Balikuddembe


    Project Description

    Business Intelligence (BI) is the practice of using computer software to aid data analysis and decision making in businesses. It represents a set of processes, tools and technologies which improve the productivity, sales and service of an enterprise, and thereby its overall profitability. BI works primarily by collecting, organizing and analyzing corporate data and then creating useful knowledge out of this analysis (reporting). BI as a whole incorporates a wide spectrum of software functions, including ad hoc querying, online analytical processing (OLAP), dashboards, scorecards, search and visualization.

    BI distinguishes itself through its interdepartmental focus and its overview of the business as a whole, which is geared towards total business performance. Implementing BI gives knowledge and understanding to departmental groups which previously may not have had access to, or an understanding of, the data. Increased analytics and ad hoc reporting allow organisations to better understand trends within their business and to examine these trends through a variety of measures and attributes. Once a BI system has been implemented, a company will typically find that it has more ideas for new initiatives, more efficient and precise data collection processes, more effective marketing techniques, a better understanding of its customers' needs and characteristics, and a better understanding of the state of the market. The agility and efficiency gained through BI translate into long-term performance gains, which can yield significant increases in profit.

    The BI system itself is typically segmented into several key areas. The first is Business Modelling, which creates the framework of the system and establishes how information needs to flow. Data warehouses serve as a centralized repository for all the data gathered and are maintained through Extraction, Transformation and Loading (ETL) processes. OLAP is a technique by which data sourced from the data warehouse is summarized and visualized across multiple dimensions, so that multi-dimensional queries can be answered quickly. Essentially, OLAP tells a business what has happened, while data mining helps explain why it happened and what is likely to happen in the future, based on past patterns.
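
    As a rough illustration of how ETL feeds OLAP-style summaries, the Java sketch below cleans raw rows, loads them into an in-memory list standing in for the data warehouse, and rolls a sales measure up along a chosen dimension. The row format, the `Sale` record with its region/product/amount fields, and all class and method names are hypothetical assumptions rather than part of any BI product; a real warehouse would be a database, and a real OLAP engine would typically pre-aggregate cubes. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

final class EtlOlapSketch {

    // Hypothetical fact record: one sale with two dimensions (region, product) and one measure.
    record Sale(String region, String product, double amount) {}

    // Extract + Transform: parse "region,product,amount" rows, trimming and normalising case.
    static List<Sale> extractTransform(List<String> rawRows) {
        List<Sale> sales = new ArrayList<>();
        for (String row : rawRows) {
            String[] f = row.split(",");
            if (f.length != 3) continue; // skip malformed rows instead of failing the whole load
            sales.add(new Sale(f[0].trim().toUpperCase(Locale.ROOT),
                               f[1].trim().toUpperCase(Locale.ROOT),
                               Double.parseDouble(f[2].trim())));
        }
        return sales;
    }

    // Load: append the cleaned batch to the (in-memory) warehouse.
    static void load(List<Sale> warehouse, List<Sale> batch) {
        warehouse.addAll(batch);
    }

    // OLAP-style roll-up: total the measure grouped by whatever key the dimension function yields.
    static Map<String, Double> rollUp(List<Sale> warehouse, Function<Sale, String> dimension) {
        Map<String, Double> totals = new TreeMap<>();
        for (Sale s : warehouse) {
            totals.merge(dimension.apply(s), s.amount(), Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Sale> warehouse = new ArrayList<>();
        load(warehouse, extractTransform(List.of(
                "north, life, 100", "North,funeral,50", "south,life,70", "bad row")));
        System.out.println(rollUp(warehouse, Sale::region));  // {NORTH=150.0, SOUTH=70.0}
        System.out.println(rollUp(warehouse, Sale::product)); // {FUNERAL=50.0, LIFE=170.0}
        // Two dimensions at once: region x product.
        System.out.println(rollUp(warehouse, s -> s.region() + "/" + s.product()));
    }
}
```

    The last call shows that a composite key is enough to move from a one-dimensional summary to a two-dimensional view of the same facts, which is the kind of multi-dimensional query OLAP is meant to answer quickly.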


    Problem Statement

    The project will focus on the underlying technologies that enable Business Intelligence (BI) and on their application to two key scenarios. To date, various technologies have been developed and implemented with a one-size-fits-all approach, which is likely to produce less effective and less accurate analysis. Different areas and analyses call for more adaptability than such a singular approach allows. Our aim, then, is to evaluate which technologies are most effective for each particular case. The technologies being evaluated are Bayesian Belief Networks, Neural Networks and Artificial Immune Systems, which are expanded on later in the proposal.

    The project will be carried out in cooperation with Sanlam, which will provide the data to be analysed.

    The first case is the analysis of customer data to build customer profiles, so that each customer can be targeted with appropriate marketing techniques. This should increase sales while reducing the cost of marketing. The three technologies will be applied to the data provided in order to obtain the most accurate customer profiles.

    The second case is Predictive Sales Forecasting. The same three technologies will be applied to historical and current data in an attempt to forecast future trends accurately. Forecasting allows better business decisions to be made and mitigating action to be taken to improve the likely outcome.

    While the two cases are independent, both are pursued with the same aim. From the project results we want to ascertain how each approach's results vary when applied to the same data and benchmarked against the known sales figures. This will help define the strengths and weaknesses of each technology for developing BI functions.
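
    One way to quantify how each approach's results vary when benchmarked against known sales figures is sketched below in Java, assuming that forecasts and actual figures are aligned arrays of equal length. For each technique it computes the mean absolute error (how close the forecasts are) and the variance of the signed errors (how consistent they are). The choice of these two metrics, the class name and all sample numbers are illustrative assumptions, not project results.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ForecastBenchmark {

    private static void check(double[] forecast, double[] actual) {
        if (forecast.length != actual.length || actual.length == 0) {
            throw new IllegalArgumentException("series must be non-empty and of equal length");
        }
    }

    // Mean absolute error: average distance between forecast and known sales.
    static double meanAbsoluteError(double[] forecast, double[] actual) {
        check(forecast, actual);
        double sum = 0;
        for (int i = 0; i < actual.length; i++) sum += Math.abs(forecast[i] - actual[i]);
        return sum / actual.length;
    }

    // Population variance of the signed errors: how consistent a technique is, not just how close.
    static double errorVariance(double[] forecast, double[] actual) {
        check(forecast, actual);
        int n = actual.length;
        double mean = 0;
        for (int i = 0; i < n; i++) mean += forecast[i] - actual[i];
        mean /= n;
        double var = 0;
        for (int i = 0; i < n; i++) {
            double d = forecast[i] - actual[i] - mean;
            var += d * d;
        }
        return var / n;
    }

    public static void main(String[] args) {
        double[] actual = {100, 120, 110, 130}; // made-up "known" sales figures
        Map<String, double[]> forecasts = new LinkedHashMap<>();
        forecasts.put("Bayesian network", new double[] {98, 125, 108, 128});
        forecasts.put("Neural network", new double[] {105, 115, 115, 125});
        forecasts.put("Immune system", new double[] {90, 130, 100, 140});
        forecasts.forEach((name, f) -> System.out.printf("%-16s MAE=%.2f errorVariance=%.2f%n",
                name, meanAbsoluteError(f, actual), errorVariance(f, actual)));
    }
}
```

    Reporting both numbers matters because two techniques can have similar average error while one of them swings much more widely from period to period.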


    Procedures and Methods

    This project is primarily a research venture whose main objective is to produce usable research results, as set out in the Problem Statement. The scope does, however, extend beyond that, as the following breakdown of procedures and objectives illustrates.

    The implementation will take the form of a Java application that uses three different intelligent systems to analyse historical data provided by Sanlam and predict the required output.

    Rationale for chosen approach

    Before describing the approach used to address the Problem Statement, it is necessary to explain the choice of algorithms. Our research indicated that industry tends to favour these three algorithms, with a distinct preference for Bayesian Belief Networks. Furthermore, as established at the initial meeting with the Sanlam representative, these three algorithms are of particular interest to Sanlam. More detail on industry's use of these algorithms is given in the Related Work section.

    The next step is to elaborate on the chosen approach, i.e. the decision to address the problem in the form of an application. The following points summarize the motivation behind this:

    • Application development allows further extensibility: by developing the project as an application, there is more room to generalize and adapt it, making it useful in other spheres of business.

    • Extensibility also leaves room for improvement, so developing this application allows for continuation, evolution and progress.

    • The application gives Sanlam a concrete demonstration of the results obtained, as well as of how they were obtained.

    • The application platform will also make the results of this research experiment easier to reproduce.

    Development process

    Because of the collaborative nature of this project, it is key that the primary stakeholder, i.e. Sanlam, provides a clear description of its requirements and expectations. For this reason, the project will involve a Sanlam delegate as well as the project team. Meetings will be held with the delegate to produce a specific set of user requirements from which the solution can be derived.

    Once the requirements have been finalised, the next phase will begin: developing the application, which will model abstractions of the selected intelligent systems. The application will take various forms of client information, provided by Sanlam, as input, including incomes, premiums and purchasing history. Based on this input, the application will use the embedded intelligent systems to generate output, letting the user choose which algorithm is applied in the simulation so that results can be compared. The output will be displayed through a graphical user interface in a format relevant to business users. The interface will hide most of the underlying technical detail and display only what is relevant to shareholders and other business analysts (more detail on this is provided later).
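
    A minimal Java sketch of letting the user choose which algorithm is applied: each intelligent system sits behind one shared interface, and a registry maps a user-facing name to an implementation, so the same client data can be run through each technique in turn. The `Client` fields echo the inputs mentioned above, but the field names, the registry keys and especially the three one-line rules are hypothetical placeholders standing in for trained Bayesian, neural and immune-system models.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TechniqueSelector {

    // Hypothetical client record using the kinds of fields the proposal lists.
    record Client(double income, double premium, int purchases) {}

    // Shared contract: each embedded intelligent system maps a client to a profile label.
    interface IntelligentSystem {
        String profile(Client client);
    }

    // Registry the GUI could offer as a drop-down; each lambda is a placeholder, not a real model.
    static Map<String, IntelligentSystem> registry() {
        Map<String, IntelligentSystem> r = new LinkedHashMap<>();
        r.put("bayesian", c -> c.premium() / c.income() > 0.05 ? "high-value" : "standard");
        r.put("neural", c -> c.purchases() >= 3 ? "high-value" : "standard");
        r.put("immune", c -> c.income() > 30000 ? "high-value" : "standard");
        return r;
    }

    // Run the user-selected technique over the same input so results can be compared.
    static List<String> run(String choice, List<Client> clients) {
        IntelligentSystem system = registry().get(choice);
        if (system == null) throw new IllegalArgumentException("unknown technique: " + choice);
        List<String> out = new ArrayList<>();
        for (Client c : clients) out.add(system.profile(c));
        return out;
    }

    public static void main(String[] args) {
        List<Client> clients = List.of(new Client(40000, 3000, 1), new Client(15000, 300, 4));
        for (String choice : registry().keySet()) {
            System.out.println(choice + " -> " + run(choice, clients));
        }
    }
}
```

    Keeping every technique behind one interface means the user interface never depends on a specific algorithm, which also suits the extensibility goals listed earlier.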


    Ethical, Professional and Legal Issues

    Ethical Issues

    We will be using Sanlam sales and customer data, which must remain confidential. It may not be redistributed to any external party, and no personal information may be extracted for use outside the project. In demonstrations, the software may not display personal information that could lead to the identification of particular individuals. Such information may be used in generating results and forecasts, but names will be replaced with IDs where necessary.
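
    Replacing names with IDs can be done with a simple pseudonymisation table, sketched here in Java. It assumes the name sits in the first column of each row; the class name and the `C0001`-style ID format are hypothetical. This is pseudonymisation rather than full anonymisation: the name-to-ID map is itself personal data and would need the same confidentiality treatment as the source data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class Pseudonymiser {

    // Name -> ID lookup; must stay inside the project and never be exported with results.
    private final Map<String, String> idsByName = new HashMap<>();
    private int next = 1;

    // Returns a stable surrogate ID (e.g. "C0001") for each distinct name.
    String idFor(String name) {
        return idsByName.computeIfAbsent(name, n -> String.format("C%04d", next++));
    }

    // Copies each row, replacing the name column (assumed to be index 0) with its ID.
    List<String[]> pseudonymise(List<String[]> rows) {
        List<String[]> out = new ArrayList<>();
        for (String[] row : rows) {
            String[] copy = row.clone();
            copy[0] = idFor(row[0]);
            out.add(copy);
        }
        return out;
    }

    public static void main(String[] args) {
        Pseudonymiser p = new Pseudonymiser();
        List<String[]> rows = List.of(
                new String[] {"Jane Doe", "45000"},
                new String[] {"John Roe", "30000"},
                new String[] {"Jane Doe", "47000"});
        // The same person keeps the same ID, so results can still be grouped per customer.
        for (String[] r : p.pseudonymise(rows)) System.out.println(String.join(",", r));
    }
}
```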

    Legal Issues

    All sales and customer data from Sanlam must be kept private within the confines of the project. Any copies of the database must be deleted once testing is complete and may not be archived outside Sanlam. No copies of the database may be created for use outside the project for any purpose.


    Related Work

    Customer Profiling

    Sebastiani et al. used Bayesian Networks to profile customers in order to predict profits. They used two networks: the first to describe the probability of a customer responding, and the second to model price factors. The results were reasonable, and by capturing the characteristics of customers, the models can potentially help to increase profits [1].
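
    To show the basic mechanics behind Bayesian-network customer profiling, the Java sketch below implements a naive Bayes classifier, the simplest Bayesian network classifier, in which the class node is the only parent of every attribute. This is a deliberate simplification and not the two-network model of Sebastiani et al.; the attributes, labels and training rows are made up, and add-one (Laplace) smoothing is an assumed design choice.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class NaiveBayesProfiler {

    private final Map<String, Integer> classCounts = new TreeMap<>();     // label -> training count
    private final Map<String, Integer> featureCounts = new HashMap<>();   // "label|index|value" -> count
    private final Map<Integer, Set<String>> valuesSeen = new HashMap<>(); // attribute index -> distinct values
    private int total = 0;

    // Record one labelled customer; features are categorical values such as income bands.
    void train(String[] features, String label) {
        total++;
        classCounts.merge(label, 1, Integer::sum);
        for (int i = 0; i < features.length; i++) {
            featureCounts.merge(label + "|" + i + "|" + features[i], 1, Integer::sum);
            valuesSeen.computeIfAbsent(i, k -> new HashSet<>()).add(features[i]);
        }
    }

    // Pick the label maximising log P(label) + sum_i log P(feature_i | label), with add-one
    // smoothing so a value never seen with a label does not force that label's probability to zero.
    String classify(String[] features) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Integer> e : classCounts.entrySet()) {
            int n = e.getValue();
            double score = Math.log((double) n / total);
            for (int i = 0; i < features.length; i++) {
                int count = featureCounts.getOrDefault(e.getKey() + "|" + i + "|" + features[i], 0);
                Set<String> seen = valuesSeen.getOrDefault(i, Set.of());
                int distinct = seen.size() + (seen.contains(features[i]) ? 0 : 1);
                score += Math.log((count + 1.0) / (n + distinct));
            }
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesProfiler p = new NaiveBayesProfiler();
        // Toy, made-up rows: {income band, age band} -> response to a campaign.
        p.train(new String[] {"high", "young"}, "responds");
        p.train(new String[] {"high", "old"}, "responds");
        p.train(new String[] {"low", "old"}, "ignores");
        p.train(new String[] {"low", "young"}, "ignores");
        p.train(new String[] {"low", "old"}, "ignores");
        System.out.println(p.classify(new String[] {"high", "old"}));  // responds
        System.out.println(p.classify(new String[] {"low", "young"})); // ignores
    }
}
```

    Working in log space avoids numeric underflow when many attributes are multiplied together; a richer Bayesian network would add edges between attributes instead of assuming they are independent given the class.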

    Similar work has been done by Elalfi et al., who combined neural networks with genetic algorithms. An algorithm was used to extract accurate and comprehensible rules from a database using trained artificial neural networks, which in turn were trained by genetic algorithms to find the optimal values for the model. These rules were then used to define customer profiles in order to make e-business more profitable [2].

    Customer Life Cycles

    Baesens et al. introduce a measure of a customer's future spending evolution that may improve relationship-marketing decision making. The proposed method predicts, from a customer's initial purchase information, whether that customer will increase or decrease spending. It achieved 75% classification accuracy in predicting the customer lifecycle using purchase volume and purchase category [3].

    Repeat Purchase Modeling

    Baesens et al. focus on the need for companies such as mail-order firms to identify which customers are most likely to purchase before sending out costly catalogues. This involves profiling customers according to several parameters and calculating the probability of repurchase. A Bayesian neural network was used and achieved a correct classification rate of 71% on the data set used [4].

    Modelling Customer Attitudes

    Ishigaki et al. use Bayesian networks to model customer attitudes based on questionnaire data. The model can then be used to gauge customers' feelings towards a product and how they should be marketed to. The model was fairly successful, with a 73.5% success rate in testing [5].

    Sales Forecasting

    Recently, Chang et al. extended sales forecasting by including clustering in the model. The K-means technique is used to cluster the data, which is then fed to a fuzzy neural network that, once trained, can generate sales forecasts. The model proved very effective in providing accurate forecasts and was more accurate than a series of other models it was tested against [6].
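
    The clustering step in Chang et al.'s pipeline can be illustrated with Lloyd's k-means algorithm. The Java sketch below works on one-dimensional values (for example, monthly sales totals) and takes the initial centroids from the caller; both are simplifying assumptions, and the fuzzy neural network that would consume the clusters is not shown.

```java
import java.util.Arrays;

final class KMeansSketch {

    // Lloyd's algorithm on 1-D values; returns the cluster index assigned to each value.
    static int[] cluster(double[] values, double[] initialCentroids, int maxIterations) {
        double[] centroids = initialCentroids.clone();
        int[] assignment = new int[values.length];
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: each value joins its nearest centroid (ties go to the lower index).
            for (int i = 0; i < values.length; i++) {
                int best = 0;
                for (int c = 1; c < centroids.length; c++) {
                    if (Math.abs(values[i] - centroids[c]) < Math.abs(values[i] - centroids[best])) best = c;
                }
                if (iter == 0 || assignment[i] != best) changed = true;
                assignment[i] = best;
            }
            if (!changed) break; // converged: no value moved to a different cluster
            // Update step: move each centroid to the mean of its members.
            double[] sum = new double[centroids.length];
            int[] count = new int[centroids.length];
            for (int i = 0; i < values.length; i++) {
                sum[assignment[i]] += values[i];
                count[assignment[i]]++;
            }
            for (int c = 0; c < centroids.length; c++) {
                if (count[c] > 0) centroids[c] = sum[c] / count[c]; // empty clusters keep their centroid
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[] monthlySales = {10, 12, 11, 50, 52, 49}; // made-up values
        System.out.println(Arrays.toString(cluster(monthlySales, new double[] {0, 100}, 20)));
    }
}
```

    In a pipeline like the one described, each cluster would then get its own training set for the forecasting model, so that periods with similar behaviour are learned together.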


    Anticipated Outcomes

    We will create a package that will read in data from the Sanlam database, use different machine learning techniques to profile customers, and compare the accuracy of the different techniques using actual data.

    System

    The software will be composed of:

    • An interface to the database that will read in the relevant data.

    • The core of the program, containing three different Intelligent System techniques that a user can utilize.

    • The front-end interface, which will present the user with classification results comparing actual data to inferred information.

    The major component will be the implementation of the different techniques. However, it will still be important to have good interfaces with the database and the user. The user interface will need to display interpretable information on the performance of each technique, which will entail aggregating the results in a way that a user can quickly and easily understand. It will also need to allow parameters to be changed so that each technique can be optimised for particular data sets.

    Expected Impact

    We expect to identify the best machine learning technique for customer profiling and sales forecasting, for Sanlam in particular. Our initial investigation suggests that Bayesian networks are very good classifiers (useful in customer profiling) and that neural networks are very good forecasters. However, the performance of each technique depends heavily on the task, the data and the results required, so the performance results in Sanlam's case will not necessarily match those for other organisations or companies.

    Key Success Factors

    The results of the simulations will need to be compared with existing data on what the simulations are trying to predict. These comparisons will be used to rank each technique by the accuracy of its results. All simulations will be expected to complete within an acceptable time frame: performance and scalability are out of scope for this project, but each implementation will need to run within a predetermined acceptable time, so performance will play a negligible role in determining the best technique to use.
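
    The success criterion can be expressed as a short Java sketch: compute each technique's accuracy against the known outcomes, exclude any run that exceeded the agreed time limit, and rank the rest by accuracy. Plain classification accuracy, the `Result` record, the 60-second limit and all sample values are assumptions for illustration; forecasting results would need an error measure rather than exact-match accuracy.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class TechniqueRanking {

    // One technique's outcome: its accuracy and how long its run took.
    record Result(String technique, double accuracy, long millis) {}

    // Fraction of predictions that exactly match the known outcomes.
    static double accuracy(List<String> predicted, List<String> actual) {
        if (predicted.size() != actual.size() || actual.isEmpty()) {
            throw new IllegalArgumentException("lists must be non-empty and of equal length");
        }
        int hits = 0;
        for (int i = 0; i < actual.size(); i++) {
            if (predicted.get(i).equals(actual.get(i))) hits++;
        }
        return (double) hits / actual.size();
    }

    // Time is a pass/fail gate only: runs over the limit are dropped, the rest ranked by accuracy.
    static List<Result> rank(List<Result> results, long timeLimitMillis) {
        List<Result> ranked = new ArrayList<>();
        for (Result r : results) {
            if (r.millis() <= timeLimitMillis) ranked.add(r);
        }
        ranked.sort(Comparator.comparingDouble(Result::accuracy).reversed());
        return ranked;
    }

    public static void main(String[] args) {
        List<String> actual = List.of("buy", "skip", "buy", "buy"); // made-up known outcomes
        List<Result> results = List.of(
                new Result("Bayesian network", accuracy(List.of("buy", "skip", "skip", "buy"), actual), 1_200),
                new Result("Neural network", accuracy(List.of("buy", "skip", "buy", "buy"), actual), 5_400),
                new Result("Immune system", accuracy(List.of("skip", "skip", "buy", "skip"), actual), 90_000));
        // Hypothetical 60-second limit: the immune-system run is excluded rather than ranked.
        rank(results, 60_000).forEach(r -> System.out.println(r.technique() + " " + r.accuracy()));
    }
}
```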


    Project Plan

    Risk Management

    The risks that follow are to be evaluated based on the following risk matrix:

                      Probability
    Impact        Low     Medium    High
    Disastrous    C       B         A
    Serious       D       C         B
    Marginal      E       D         C
    Trivial       F       E         D
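
    Because each grade depends only on impact and probability, the matrix can be encoded as a lookup table. The Java sketch below is a minimal encoding under that reading; the enum and method names are hypothetical.

```java
final class RiskMatrix {

    enum Probability { LOW, MEDIUM, HIGH }

    enum Impact { DISASTROUS, SERIOUS, MARGINAL, TRIVIAL }

    // Rows follow the Impact order and columns the Probability order, as in the matrix.
    private static final char[][] GRADES = {
            {'C', 'B', 'A'}, // Disastrous
            {'D', 'C', 'B'}, // Serious
            {'E', 'D', 'C'}, // Marginal
            {'F', 'E', 'D'}, // Trivial
    };

    static char grade(Impact impact, Probability probability) {
        return GRADES[impact.ordinal()][probability.ordinal()];
    }

    public static void main(String[] args) {
        System.out.println(grade(Impact.SERIOUS, Probability.MEDIUM)); // C
        System.out.println(grade(Impact.MARGINAL, Probability.LOW));   // E
    }
}
```

    Each step down in impact or in probability lowers the grade by one letter, so the grade could also be computed arithmetically, but an explicit table is easier to check against the matrix.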

    The following table breaks down the predicted risks associated with this project, paying special attention to their impact and probability. It also sets out two courses of action for each risk: avoidance, which is an ongoing process, and mitigation, should the risk materialize.

    1. Loss of a project team member (this would occur if one or more members abandoned the Honours Programme for any number of reasons)
       Matrix evaluation: D (Serious impact / Low probability)
       Avoidance: Pressure to stay on the project, as failing to do so means not graduating.
       Mitigation: Have sufficiently independent deliverable modules for each team member.

    2. Delay in delivery of test data (dependent on Sanlam for data; external factor)
       Matrix evaluation: C (Disastrous impact / Low probability)
       Avoidance: Pressure Sanlam to provide data as soon as possible.
       Mitigation: Create random test data or use alternative available data.

    3. Scope creep (planning too many tasks; tasks cannot be completed in time)
       Matrix evaluation: E (Marginal impact / Low probability)
       Avoidance: Project planned in detail, with supervisor and department approval.
       Mitigation: Start with fundamental features first and leave other things to the end.

    4. Data loss due to hardware failure (external factor)
       Matrix evaluation: C (Serious impact / Medium probability)
       Avoidance: Frequent backups of all progress on different machines or storage devices.
       Mitigation: Roll back to the last backup.

    5. Missing project deadlines
       Matrix evaluation: C (Serious impact / Medium probability)
       Avoidance: Constant reference to the project timeline and clear communication between project members.
       Mitigation: Review and reassess deadlines, readjusting where necessary as cost-effectively as possible.

    6. Misunderstanding user requirements (resulting from miscommunication or ambiguity in user-team interaction)
       Matrix evaluation: D (Serious impact / Low probability)
       Avoidance: Constant communication with Sanlam to maintain the correct direction; also, providing Sanlam with the project plan and design in order to detect flaws.
       Mitigation: Iterate through development so that inconsistencies can be detected early.
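
    The mitigation for risk 2 (creating random test data) could look like the Java sketch below. The schema, value ranges and ID format are hypothetical assumptions, not derived from Sanlam's data; a fixed seed keeps the generated data identical between runs, so all three techniques are tested on exactly the same input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class SyntheticData {

    // Hypothetical schema mirroring fields the proposal mentions (income, premium, purchase history).
    record Client(String id, double income, double premium, int purchases) {}

    // Same seed -> same sequence from java.util.Random -> reproducible test data.
    static List<Client> generate(int count, long seed) {
        Random rnd = new Random(seed);
        List<Client> clients = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            double income = 10_000 + rnd.nextDouble() * 90_000;         // uniform in [10k, 100k), assumed range
            double premium = income * (0.01 + rnd.nextDouble() * 0.09); // 1% to 10% of income, assumed
            int purchases = rnd.nextInt(6);                              // 0 to 5 past purchases
            clients.add(new Client(String.format("C%04d", i + 1), income, premium, purchases));
        }
        return clients;
    }

    public static void main(String[] args) {
        generate(3, 42L).forEach(System.out::println);
    }
}
```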


    Timeline & Gantt Chart


    Resources Required

    The resources required to complete the project are fairly standard: the software and equipment in the Honours Lab suffice for development. Beyond this, Joseph Balikuddembe is needed as Sanlam's representative and as co-supervisor of the project. Furthermore, the customer and sales data that Sanlam will provide is crucial to the project's development.

    Necessary Resources:

    • PCs

    • Sanlam Database Access

    • Java Development Platform

    Deliverables

    The following table lists the deliverables necessary for the completion of this project:

    Deliverable                     Description
    Final Project Proposal          Final copy of the proposal for evaluation.
    Project Proposal Presentation   Presentation to supervisor and class.
    Project Web Presence            Online availability of the proposal and project timeline.
    Project Poster                  Poster representation of the project.
    Project Web Page                Open availability of the project web page.
    Project Report                  A report on the results of the research.
    Project Application             The actual project.

    Further detail on dates is given in the Milestones.


    References

    [1] Sebastiani, P., Ramoni, M. and Crea, A. Profiling your Customers using Bayesian Networks. SIGKDD Explorations 1(2), 91-97.

    [2] Elalfi, A., Haque, R. and Elalami, M. Extracting rules from trained neural network using GA for managing E-business. Applied Soft Computing 4, 65-77.

    [3] Baesens, B., Verstraeten, G., Van Den Poel, D., Egmont-Petersen, M., Van Kenhove, P. and Vanthienen, J. 2004. Bayesian network classifiers for identifying the slope of the customer lifecycle of long-life customers. European Journal of Operational Research 156, 508-523.

    [4] Baesens, B., Viaene, S., Van Den Poel, D., Vanthienen, J. and Dedene, G. 2002. Bayesian neural network learning for repeat purchase modelling in direct marketing. European Journal of Operational Research 138, 191-211.

    [5] Ishigaki, T., Motomura, Y., Dohi, M., Kouchi, M. and Mochimaru, M. Knowledge Extraction by Probabilistic Cognitive Structure Modeling Using a Bayesian Network for Use by a Retail Service. MEDES, October 2009, 141-149.

    [6] Chang, P., Liu, C. and Fan, C. Data clustering and fuzzy neural network for sales forecasting: A case study in printed circuit board industry. Knowledge-Based Systems 22, 344-355.