recommender systems bener


Post on 16-Apr-2017






  • Recommender Systems: Challenges and Opportunities

    Ayse Bener, January 22, 2015

  • From Search to Recommendation

    "Search is what you do when you're looking for something. Discovery is when something wonderful that you didn't know existed, or didn't know how to ask for, finds you." (CNN Money, "The race to create smart Google")

  • Recommender problem

    Estimate a utility function to predict how a user will like an item.

    - The user is a: consumer, subscriber, member
    - The item is a: movie, app, travel destination
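One common way to write this problem down is sketched below. The symbols are notation introduced here for illustration and do not appear on the slides: C is the set of users, S the set of items, R an ordered set of utility values (for example ratings), and \hat{u} the estimated utility.

```latex
\[
u : C \times S \to R, \qquad
s_c^{*} = \arg\max_{s \in S} \hat{u}(c, s) \quad \text{for each user } c \in C
\]
```

Under this reading, the recommender learns \hat{u} from the known values of u and suggests to each user the unseen items with the highest estimated utility.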

  • Recommender

    A good recommendation:

    - Is relevant to the user
    - Is personalized
    - Is diverse
    - Expands the user's taste into neighboring areas (serendipity: an unsought finding)

  • Paradigm of Recommender Systems

    Recommender systems reduce information overload by estimating relevance.

    - Collaborative filtering: what is popular in a community
      - Inputs: user profile and community information
    - Content-based: provides more of what the user liked before
      - Inputs: user profile and item profile
    - Knowledge-based: what is best based on the user's needs
      - Inputs: user profile, item profile, and knowledge model
    - Hybrid method: combination of inputs and/or composition of different methods
      - Inputs: user profile, item profile, knowledge model, and community information
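As an illustration of the collaborative filtering paradigm above, here is a minimal user-based sketch. The ratings, user names, and item names are hypothetical, and using cosine similarity over co-rated items is only one of several possible choices, not something the slides prescribe.

```python
from math import sqrt

# Hypothetical ratings on a 1-5 scale: user -> {item: rating}.
ratings = {
    "ana": {"A": 5, "B": 3, "C": 4},
    "bo": {"A": 5, "B": 3, "C": 4, "D": 5},
    "cy": {"A": 1, "B": 5, "D": 2},
}


def cosine_similarity(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(user, item):
    """Similarity-weighted average of the ratings other users gave to item."""
    weighted, total = 0.0, 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        weighted += sim * their_ratings[item]
        total += sim
    # None signals that no neighbor has rated the item (a cold-start case).
    return weighted / total if total else None


print(round(predict("ana", "D"), 2))
```

Here "ana" agrees closely with "bo" and less with "cy", so the estimate for item "D" is pulled towards the rating "bo" gave it.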


  • Recommender Systems Challenges

    - Dealing with Big Data problems
    - Lack of useful data
    - Unstructured data
    - Missing data
    - New user and new item
    - Cold start problem
    - Temporality
    - Changing data
    - Changing user preferences and biases
    - Negative choices
    - Evaluating recommenders

  • Main Research Issues

    - Understanding the context and modeling context
    - Algorithms
    - Evaluation
    - Engineering

  • Bayesian Networks For Evidence-Based Decision-Making in Software Engineering

    Ayse Tosun Misirli and Ayse Bener, IEEE Transactions on Software Engineering, vol. 40, no. 6, June 2014

  • Recommendation systems for software engineering (RSSE)

    - Recommendation systems and prediction models should be designed so that they can integrate evidence, i.e., facts and probabilities systematically collected or measured from real data and observations, into practitioners' experience.

    - In this study, we follow the lead of computational biology and healthcare decision-making, and investigate the applications of Bayesian networks (BNs) in software engineering (SE).

  • The Bayesian Approach

    - Provides a natural statistical framework for evidence-based decision-making by incorporating an integrated summary of the available evidence and the associated uncertainty (of consequences)
    - Maintains observations, statistical distributions, prior assumptions, and expert judgment in a single model
    - Encodes causal relationships among variables for predicting future actions
    - Propagates information through the network, i.e., "gaming" over the network to see all possible scenarios and their outcomes and to give the best action
    - Imitates the process of human thinking while going beyond the capabilities of human reasoning, with fact-based, error-free intelligence drawn from enormous amounts of historical data
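The idea of updating beliefs from evidence can be sketched with a toy two-node network. Everything here is hypothetical and chosen for illustration only: the node names (Coverage, Defects), their states, and all probabilities are assumptions, not values from the study.

```python
# Hypothetical two-node network: Coverage -> Defects.
p_coverage = {"low": 0.4, "high": 0.6}      # prior P(Coverage)
p_many_given = {"low": 0.7, "high": 0.2}    # P(Defects = many | Coverage)


def posterior_coverage(defects_many=True):
    """P(Coverage | Defects) obtained by enumerating the joint distribution."""
    joint = {}
    for coverage, prior in p_coverage.items():
        p_many = p_many_given[coverage]
        likelihood = p_many if defects_many else 1 - p_many
        joint[coverage] = prior * likelihood
    evidence = sum(joint.values())  # marginal probability of the observation
    return {coverage: value / evidence for coverage, value in joint.items()}


print(posterior_coverage(defects_many=True))
```

Observing many defects raises the belief in low coverage above its prior of 0.4, which is the kind of evidence propagation the slide describes, here reduced to a single edge.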

  • Example of a simple BN with different variable types

  • Systematic Mapping of BNs in SWE

    To investigate the applications of BNs in SE:

    - main software engineering challenges addressed
    - techniques used to learn causal relationships among variables
    - techniques used to infer the parameters
    - variable types used as BN nodes

  • Empirical Analysis on Bayesian Decision-Making

    - A hybrid Bayesian network that would solve a specific software engineering challenge
      - predicting software reliability in terms of post-release defects
    - A mixed-data model to represent software life cycle phases by incorporating expert judgment (qualitative data collected through surveys) into quantitative data collected from software repositories
    - A hybrid BN that incorporates both continuous and categorical variables

  • Demographics for Two Software Companies

  • BN Models in this Study

  • Model Representation

    Model #1

    Model #2

    Model #3

  • Graphical Representation of BN (Co. A)

  • Graphical Representation of BN (Co. B)

  • Setting Prior Distributions

    - Model #1
      - expert knowledge
    - Model #2
      - Lilliefors significance test on all variables and on post-release defects
      - normal probability plots
    - Model #3
      - The requirements specification subnet, whose distributions were set based on expert knowledge, is combined with the development and testing subnet from Model #2, whose variables are assigned different distributions based on the significance tests
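The Lilliefors check mentioned for Model #2 can be sketched as follows. This computes only the test statistic (the Kolmogorov-Smirnov distance to a normal distribution whose mean and standard deviation are estimated from the sample); deciding significance needs Lilliefors critical values or a library routine such as the `lilliefors` function in statsmodels. The two samples are synthetic and hypothetical, not data from the study.

```python
from math import log
from statistics import NormalDist, fmean, stdev


def lilliefors_statistic(sample):
    """Largest gap between the empirical CDF and a normal CDF fitted to the sample."""
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(fmean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # Compare against the empirical CDF just after and just before x.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


n = 50
# Synthetic samples built from quantiles: one close to normal, one right-skewed.
near_normal = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
skewed = [-log(1 - (i + 0.5) / n) for i in range(n)]

print(lilliefors_statistic(near_normal), lilliefors_statistic(skewed))
```

A larger statistic means a poorer fit to normality, which would argue for assigning that variable a different prior distribution.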

  • Structure Learning

    - Expert judgement
    - Chi-plot
      - Independence between two variables
      - Copula models: a transformation of data with marginal distributions
      - Prior to modeling, it is necessary to check for the presence of dependence

    There is a positive monotone dependence between test cases and post-release defects, as data pairs are shifted towards the right from the center.
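The coordinates of a chi-plot can be computed as in the sketch below, which follows the usual rank-based definition attributed to Fisher and Switzer; whether the study used exactly this variant is an assumption. The function name and the data are hypothetical.

```python
from math import sqrt


def chi_plot_points(xs, ys):
    """Return (lambda_i, chi_i) pairs; points where chi is undefined are skipped."""
    n = len(xs)
    points = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        f = sum(xs[j] <= xs[i] for j in others) / (n - 1)
        g = sum(ys[j] <= ys[i] for j in others) / (n - 1)
        h = sum(xs[j] <= xs[i] and ys[j] <= ys[i] for j in others) / (n - 1)
        denom = f * (1 - f) * g * (1 - g)
        if denom <= 0:
            continue  # chi is undefined at the extremes of either margin
        chi = (h - f * g) / sqrt(denom)
        product = (f - 0.5) * (g - 0.5)
        sign = (product > 0) - (product < 0)
        # lambda measures how far the point lies from the center of the data.
        lam = 4 * sign * max((f - 0.5) ** 2, (g - 0.5) ** 2)
        points.append((lam, chi))
    return points


# Hypothetical data: number of test cases and post-release defects per release.
test_cases = [1, 2, 3, 4, 5, 6, 7, 8]
defects = [3, 5, 7, 9, 11, 13, 15, 17]
print(chi_plot_points(test_cases, defects))
```

Under independence the chi values scatter around zero; values that stay clearly on one side of zero indicate dependence, and their sign gives its direction.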

  • Inference

    - Bayesian learning for complex models using Monte Carlo methods, especially Gibbs sampling
      - insufficient statistics
      - incomplete data
    - Successively samples from the posterior distribution of each node in a Bayesian model given all the others (the full conditionals)
    - Successful when estimating the unknown parameters of probability distributions, or when conducting empirical analysis to infer the true values of a given sample
    - Makes it possible to predict future scenarios even when some of the input variables are missing
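The sampling scheme described above, drawing each variable in turn from its full conditional, can be sketched on a toy target. The target here (a standard bivariate normal with correlation `rho`) is an assumption chosen because its full conditionals are known in closed form; it is not one of the models from the study.

```python
import random
from math import sqrt


def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    cond_sd = sqrt(1 - rho ** 2)  # standard deviation of each full conditional
    x, y = 0.0, 0.0
    samples = []
    for step in range(burn_in + n_samples):
        x = rng.gauss(rho * y, cond_sd)  # draw x given the current y
        y = rng.gauss(rho * x, cond_sd)  # draw y given the new x
        if step >= burn_in:
            samples.append((x, y))
    return samples


def correlation(samples):
    """Sample correlation of the (x, y) pairs."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    var_x = sum((x - mean_x) ** 2 for x, _ in samples)
    var_y = sum((y - mean_y) ** 2 for _, y in samples)
    return cov / sqrt(var_x * var_y)


draws = gibbs_bivariate_normal(rho=0.8, n_samples=5000)
print(round(correlation(draws), 2))
```

The same loop handles a missing input: a variable without an observed value is simply treated as one more node to sample, while observed variables stay fixed at their values.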

  • Prediction Performance of the Models

  • Threats to Validity

    - Internal validity
      - Biases during data collection
      - Used scripts to extract data
      - Eliminated outliers
      - BNs for causality and to avoid over-fitting
    - Construct validity
      - A large set of metrics was chosen
      - Well-known performance measures are used
    - Conclusion validity
      - A non-parametric test (Mann-Whitney U-test), ANOVA, and t-test were used
    - External validity
      - We aim to transfer the methodology behind BN construction to enhance the usage of these graphical, probabilistic models in software engineering
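For readers unfamiliar with the Mann-Whitney U-test named under conclusion validity, the statistic itself can be computed as in this sketch. The function name and the error values are hypothetical; the study's actual data and tooling are not shown on the slides, and a p-value would additionally need tables or a statistics library.

```python
def mann_whitney_u(a, b):
    """Return (U_a, U_b) for two samples, using midranks for tied values."""
    pooled = sorted([(value, 0) for value in a] + [(value, 1) for value in b])
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of ranks i+1 .. j (1-based).
        midrank = (i + 1 + j) / 2
        rank_sum_a += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return u_a, u_b


# Hypothetical prediction errors of two models on the same releases.
errors_model_1 = [1, 2, 3]
errors_model_2 = [4, 5, 6]
print(mann_whitney_u(errors_model_1, errors_model_2))
```

A U value near zero for one sample means its values rank almost entirely below the other sample's, which is the pattern the test looks for without assuming normality.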

  • Conclusions

    - As in computational biology and healthcare, we need to make decisions under uncertainty using multiple data sources
    - As we understand the dynamics of BNs and the techniques used for model learning, these models would enable us to uncover hidden relationships between variables that cannot be easily identified by experts
    - Understanding the theory behind BNs also gives us the opportunity to adapt these models to different industrial settings by changing the set of metrics, their distributions, and the causal relationships among variables

  • Conclusions

    - Integrated tool support (intelligent software delivery platform)
    - Dione, to be integrated with IBM Rational