
Page 1: An Investigation into the Free/Open Source Software Phenomenon using Data Mining, Social Network Theory, and Agent-Based

Greg Madey
Computer Science & Engineering
University of Notre Dame

UIUC - NSF Workshop on Continuous (Re)Design of Open Source Software
University of Illinois, Urbana-Champaign
October 8-9, 2003

This research was partially supported by the US National Science Foundation, CISE/IIS-Digital Society & Technology, under Grant No. 0222829

Page 2: Contributors

• Vincent Freeh, Computer Science, North Carolina State University (Principal Investigator)
• Yongqin Gao, Computer Science and Engineering, University of Notre Dame (Graduate Student)
• Jeff Goett, University of Notre Dame (REU Student)
• Chris Hoffman, University of Notre Dame (REU Student)
• Nadir Kiyanclar, University of Notre Dame (REU Student)
• Greg Madey, Computer Science & Engineering, University of Notre Dame (Principal Investigator)
• Patrick McGovern, Director SourceForge.net, VA Software (Industrial Collaborator)
• Carlos Siu, University of Notre Dame (REU Student)
• Renee Tynan, Department of Management, College of Business, University of Notre Dame (Principal Investigator)
• Jin Xu, Computer Science & Engineering, University of Notre Dame (Graduate Student)

Page 3: Outline

• Research approach
• Tools and definitions: agents, models, simulations, collaborative social networks, computer experiments
• Data collection and analysis
• Example research question
• Simulation
• Computer experiments
• Results

Page 4: One Approach to Researching F/OSSD

• Online data
  – Screen scraping
  – Database dumps
• Modeling
  – Social network theory
  – Evolutionary assumptions
• Simulation
  – Verification and validation
  – Computer experiments
• Variation of Classical Scientific Method

Page 5: Classical Scientific Method

1. Observe the world
  a) Identify a puzzling phenomenon
2. Generate a falsifiable hypothesis (K. Popper)
3. Design and conduct an experiment with the goal of disproving the hypothesis
  a) If the experiment “fails”, then the hypothesis is accepted (until replaced)
  b) If the experiment “succeeds”, then reject the hypothesis, but additional insight into the phenomenon may be obtained and steps 2-3 repeated

Page 6: The Computer Experiment

Page 7: Agent-Based Simulation as a Component of the Scientific Method

[Diagram: a cycle linking three stages]
• Modeling (Hypothesis)
• Agent-Based Simulation (Experiment)
• Observation

Page 8: Agent-Based Simulation as a Component of the Scientific Method

[Diagram: the same cycle, with each stage annotated for this study]
• Modeling (Hypothesis): Social Network Model of F/OSS
• Agent-Based Simulation (Experiment): Grow Artificial SourceForge
• Observation: Analysis of SourceForge Data

Page 9: Agent-Based Modeling and Simulation

• Conceptual models of a phenomenon
• Simulations are computer implementations of the conceptual models
• Agents in models and simulations are distinct entities (instantiated objects)
  – Tend to be simple, but with large numbers of them (thousands, or more) - i.e., swarm intelligence
  – Contrasted with higher level AI “intelligent agents”
• Foundations in complexity theory
  – Self-organization
  – Emergence

Page 10: Collaborative Social Networks

• Research-paper co-authorship, small world phenomenon, e.g., Erdos number (Barabasi 2001, Newman 2001)
• Movie actors, small world phenomenon, e.g., Kevin Bacon number (Watts 1999, 2003)
• Interlocking corporate directorships
• Terrorist networks
• Open-source software developers (Madey et al, AMCIS 2002)
• Collaborators are nodes in a graph, and collaborative relationships are the edges of the graph => a framework to model data/phenomenon

Page 11: SourceForge

• VA Software
• Part of OSDN
• Started 12/1999
• Collaboration tools
• 70,000 Projects
• 90,000 Developers
• 700,000 Registered Users

Page 12: Savannah

• SourceForge Software?
• Free Software Foundation
• 1,600 Projects
• 16,000 Registered Users

Page 13: Observations

• Web mining
• Web crawler (scripts)
  – Python
  – Perl
  – AWK
  – Sed
• Monthly
• Since Jan 2001
• ProjectID
• DeveloperID
• Almost 2 million records
• Relational database

Sample records:

PROJ|DEVELOPER
8001|dev378
8001|dev8975
8001|dev9972
8002|dev27650
8005|dev31351
8006|dev12509
8007|dev19395
8007|dev4622
8007|dev35611
8008|dev8975
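The sample PROJ|DEVELOPER rows on this slide can be loaded into two lookup tables, one per side of the project/developer relationship. The sketch below is only an illustration of that step: the function name `load_memberships` and the in-memory string are assumptions, not the crawler or database schema the project actually used.

```python
from collections import defaultdict

# Sample rows as shown on the slide; the first line is the column header.
RAW = """PROJ|DEVELOPER
8001|dev378
8001|dev8975
8001|dev9972
8002|dev27650
8005|dev31351
8006|dev12509
8007|dev19395
8007|dev4622
8007|dev35611
8008|dev8975"""


def load_memberships(text):
    """Return (project -> developers, developer -> projects) as dicts of sets."""
    by_project = defaultdict(set)
    by_developer = defaultdict(set)
    for line in text.splitlines()[1:]:  # skip the header row
        line = line.strip()
        if not line:
            continue
        project_id, developer_id = line.split("|")
        by_project[project_id].add(developer_id)
        by_developer[developer_id].add(project_id)
    return dict(by_project), dict(by_developer)


by_project, by_developer = load_memberships(RAW)
print(len(by_project), "projects,", len(by_developer), "developers")
print("dev8975 works on:", sorted(by_developer["dev8975"]))
```

Keeping both directions of the mapping makes it cheap to ask either "who is on this project?" or "which projects does this developer belong to?", which is what the later network construction needs.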

Page 14: Collaboration Networks

[Figure: collaboration network diagram. Adapted from Newman, Strogatz and Watts, 2001]

Page 15: F/OSS Developers - Collaboration Social Network

[Figure: network diagram in which developers are nodes and projects are links. Edge labels as extracted from the figure, grouped by project:]

Project 15850: dev[46]-dev[83], dev[46]-dev[48], dev[46]-dev[56], dev[46]-dev[58]
Project 6882: dev[58]-dev[47], dev[47]-dev[79], dev[47]-dev[52], dev[47]-dev[55]
Project 7028: dev[46]-dev[99], dev[46]-dev[51], dev[46]-dev[57]
Project 7597: dev[46]-dev[45], dev[46]-dev[72], dev[46]-dev[55], dev[46]-dev[58], dev[46]-dev[61], dev[46]-dev[64], dev[46]-dev[67], dev[46]-dev[70]
Project 9859: dev[46]-dev[49], dev[46]-dev[53], dev[46]-dev[54], dev[46]-dev[59]

• 24 Developers
• 5 Projects
• 2 Linchpin Developers
• 1 Cluster
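A minimal sketch of how a summary like the one on this slide could be computed from project membership data. Everything here is illustrative: the `MEMBERS` data is hypothetical (it is not the figure's data), and "linchpin" is assumed to mean a developer who belongs to more than one project, which the slide does not define.

```python
from collections import defaultdict, deque
from itertools import combinations

# Hypothetical project -> developers data (not the figure's actual data).
MEMBERS = {
    "p1": {"ann", "bob", "cy"},
    "p2": {"cy", "dee"},
    "p3": {"eve", "fay"},
}


def developer_network(members):
    """Link every pair of developers who share at least one project."""
    graph = defaultdict(set)
    for developers in members.values():
        for dev in developers:
            graph[dev]  # touch the key so solo developers still become nodes
        for a, b in combinations(sorted(developers), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def clusters(graph):
    """Connected components, found with breadth-first search."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        result.append(component)
    return result


def linchpins(members):
    """Assumed definition: developers who belong to more than one project."""
    counts = defaultdict(int)
    for developers in members.values():
        for dev in developers:
            counts[dev] += 1
    return {dev for dev, n in counts.items() if n > 1}


graph = developer_network(MEMBERS)
print(len(graph), "developers,", len(clusters(graph)), "clusters")
print("linchpins:", sorted(linchpins(MEMBERS)))
```

In this toy data, `cy` is the only developer bridging two projects, so removing `cy` would split the larger cluster; that bridging role is what the linchpin count is meant to capture.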

Page 16: Topological Analysis of the Data

• Statistics inspected
  – Diameter
  – Average degree
  – Clustering coefficient
  – Degree distribution
  – Cluster size distribution
  – Relative size of major cluster
  – Fitness and life cycle
• Evolution of these statistics
• Dual networks
  – developer network and project network

Page 17: Terminology

• Diameter
  – Average length of shortest paths between all pairs of vertices
• Degree
  – The count of edges connected to a given vertex
• Average degree
  – Average of the degrees of all vertices in the network
• Cluster
  – The connected components of the network
• Clustering coefficient (CC)
  – CC_i: Fraction representing the number of links actually present relative to the total possible number of links among the vertices in the neighborhood of vertex i.
  – CC: average of all CC_i in a network
• Degree distribution
  – The distribution of degrees throughout a network
• Major cluster
  – The largest cluster in the network
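The definitions on this slide translate fairly directly into code. The sketch below is an illustration on a hypothetical four-vertex graph, not the analysis code used in the study. Two points are assumptions: vertices with fewer than two neighbors are given a clustering coefficient of 0, and "diameter" is computed as the slide defines it (the average shortest-path length), not as the maximum shortest-path length used in many graph theory texts.

```python
from collections import deque

# Hypothetical undirected graph: a triangle a-b-c plus a pendant vertex d.
GRAPH = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}


def average_degree(graph):
    """Mean number of edges per vertex."""
    return sum(len(nbrs) for nbrs in graph.values()) / len(graph)


def local_cc(graph, v):
    """Links present among v's neighbors / links possible among them."""
    nbrs = list(graph[v])
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention for vertices with < 2 neighbors
    present = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in graph[nbrs[i]]
    )
    return present / (k * (k - 1) / 2)


def clustering_coefficient(graph):
    """Average of the local coefficients over all vertices."""
    return sum(local_cc(graph, v) for v in graph) / len(graph)


def distances_from(graph, source):
    """Breadth-first search hop counts from source to each reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def average_path_length(graph):
    """Mean shortest-path length over all connected pairs of vertices."""
    total = pairs = 0
    for source in graph:
        for target, d in distances_from(graph, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs


print("average degree:", average_degree(GRAPH))
print("clustering coefficient:", round(clustering_coefficient(GRAPH), 4))
print("diameter (average path length):", round(average_path_length(GRAPH), 4))
```

Running one breadth-first search per vertex is fine for a toy graph; a network with tens of thousands of vertices would more likely need sampling or a dedicated graph library.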

Page 18: Degree Distribution: Developers

Page 19: Degree Distribution: Projects

Page 20: Diameter of Developer Network vs. Time

• Network size increased from 30,000 to 70,000

Page 21: Diameter of Project Network vs. Time

• Network size increased from 20,000 to 50,000.
• Diameter decreasing with time both for developer network and project network

Page 22: Clustering Coefficient of Developer Network vs. Time

Page 23: Clustering Coefficient of Project Network vs. Time

Page 24: Cluster Size Distribution

• R² with major cluster is 0.7426
• R² without major cluster is 0.9799
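The slides do not say how these R² values were obtained. One common approach, assumed here, is to fit a straight line to the cluster-size histogram on log-log axes and report the coefficient of determination of that fit. The sketch below uses a hypothetical histogram to show why a single very large "major cluster" can pull R² down.

```python
import math


def power_law_r2(sizes, counts):
    """Fit log(count) = a + b*log(size) by least squares; return (b, R^2)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, 1.0 - ss_res / ss_tot


# Hypothetical histogram: counts fall off roughly as size^-2.
sizes = [1, 2, 4, 8, 16]
counts = [1600, 400, 100, 25, 6]

# The same histogram plus one huge cluster that sits far off the trend line.
_, r2_without = power_law_r2(sizes, counts)
_, r2_with = power_law_r2(sizes + [20000], counts + [1])
print("R^2 without major cluster:", round(r2_without, 4))
print("R^2 with major cluster:   ", round(r2_with, 4))
```

The major cluster behaves like an outlier in log-log space, which is consistent with the slide reporting a noticeably better fit once it is excluded.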

Page 25: Relative Size of Major Cluster vs. Time

• Increase of the relative size of the major cluster
• Approaching steady-state?

Page 26: An Example Research Question

• What processes can explain the evolution of the project and developer social networks?
  – Randomly growing network (Erdos-Renyi, 1960)?
  – Evolving network with preferential attachment (Barabasi-Albert, 1999)?
  – Evolving network with preferential attachment and fitness (Barabasi-Albert, 2001)?
  – Others?
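To make the difference between the first two candidate processes concrete, the sketch below grows a network one node at a time under two attachment rules. It is a simplified illustration, not the study's simulation: uniform attachment stands in for the "randomly growing network" (the classic Erdos-Renyi model fixes the node set and adds random edges), each newcomer adds exactly one edge, and the function name and parameters are assumptions.

```python
import random


def grow_network(n_nodes, preferential, seed=0):
    """Grow a network one node at a time; each newcomer adds one edge.

    preferential=False: the target is picked uniformly at random.
    preferential=True: the target is picked with probability proportional
    to its current degree (the Barabasi-Albert attachment rule).
    Returns the list of node degrees.
    """
    rng = random.Random(seed)
    degree = [1, 1]        # start from two nodes joined by one edge
    endpoints = [0, 1]     # each node appears once per incident edge
    for new_node in range(2, n_nodes):
        if preferential:
            # Sampling from the edge-endpoint list is degree-proportional.
            target = rng.choice(endpoints)
        else:
            target = rng.randrange(new_node)
        degree.append(1)
        degree[target] += 1
        endpoints.extend((new_node, target))
    return degree


uniform = grow_network(5000, preferential=False)
pref = grow_network(5000, preferential=True)
print("max degree, uniform attachment:     ", max(uniform))
print("max degree, preferential attachment:", max(pref))
```

Preferential attachment tends to produce a few very high-degree hubs, which is the heavy-tailed behavior the empirical degree distributions are compared against later in the deck.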

Page 27: Computer Experiments

• Agent-based simulations
• Java programs using Swarm class library
  – Validation (docking) exercises using Java/Repast
• Grow artificial SourceForges (Epstein & Axtell, 1996)
  – Parameterized with observed data, e.g., developer behaviors
    • Join rates
    • New project additions
    • Leave projects
  – Evaluation of multiple models (hypotheses)
  – Verification/validation

Page 28: Cycles of Modeling & Simulation

[Diagram: the modeling/simulation/observation cycle, annotated for this study]
• Modeling (Hypothesis): Social Network Models, ER => BA => BA+Fitness => BA+Dynamic Fitness
• Agent-Based Simulation (Experiment): Grow Artificial SourceForge
• Observation: Analysis of SourceForge Data
  – Degree Distribution
  – Average Degree
  – Diameter
  – Clustering Coefficient
  – Cluster Size Distribution

Page 29: Model for SourceForge

• ABM based on bipartite graph
• Model description
  – Agent: developer
  – Behaviors: Create, join, abandon and idle
  – Preference: developer’s and project’s
  – Fitness
• Four models in iterations
  – ER, BA, BA with constant fitness and BA with dynamic fitness
• Comparison of empirical and simulated data
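A rough sketch of what one such agent-based model on a bipartite developer/project graph might look like. The study's simulations were Java programs; this Python version is only an illustration of the four behaviors listed above. All probabilities are placeholders rather than values fitted to SourceForge data, and weighting the join choice by current team size is an assumed stand-in for the "preference" component.

```python
import random


def simulate(steps, n_developers=50, seed=1,
             p_create=0.05, p_join=0.30, p_abandon=0.10):
    """Each step, every developer agent creates, joins, abandons, or idles.

    Returns (project -> member set, developer -> project set), i.e. both
    sides of the bipartite graph.
    """
    rng = random.Random(seed)
    projects = {}
    memberships = {d: set() for d in range(n_developers)}
    next_project = 0
    for _ in range(steps):
        for dev in range(n_developers):
            roll = rng.random()
            if roll < p_create:
                # Create: start a new project with this developer on it.
                projects[next_project] = {dev}
                memberships[dev].add(next_project)
                next_project += 1
            elif roll < p_create + p_join:
                # Join: pick a project the developer is not yet on,
                # weighted by current team size (preferential choice).
                candidates = [p for p in projects if dev not in projects[p]]
                if candidates:
                    weights = [len(projects[p]) + 1 for p in candidates]
                    chosen = rng.choices(candidates, weights=weights)[0]
                    projects[chosen].add(dev)
                    memberships[dev].add(chosen)
            elif roll < p_create + p_join + p_abandon:
                # Abandon: leave one of the developer's current projects.
                if memberships[dev]:
                    left = rng.choice(sorted(memberships[dev]))
                    memberships[dev].discard(left)
                    projects[left].discard(dev)
            # Otherwise the developer idles this step.
    return projects, memberships


projects, memberships = simulate(20)
team_sizes = sorted((len(m) for m in projects.values()), reverse=True)
print(len(projects), "projects; largest teams:", team_sizes[:5])
```

The simulated team sizes and developer degrees can then be summarized with the same network statistics used on the empirical data, which is the comparison the last bullet on this slide describes.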

Page 30: ER Model – Degree Distribution

• Degree distribution is a normal distribution, while it is a power law in the empirical data
• Fit Fails!

Page 31: ER Model - Diameter

• Average degree is decreasing while it is increasing in empirical data
• Diameter is increasing while it is decreasing in empirical data
• Fit Fails!

Page 32: ER Model – Clustering Coefficient

• Clustering coefficient is relatively low (under 0.3), while it is around 0.7 in empirical data.
• Fit fails!

Page 33: ER Model – Cluster Size Distribution

• Power law distribution with R² as 0.6667 (0.9653 without the major cluster) while R² in empirical data is 0.7426 (0.9799 without the major cluster)
• The actual distribution is different from empirical data
• Fit Fails!

Page 34: BA Model – Degree Distribution

• Power laws in degree distributions, similar to empirical data (o for simulated data and x for empirical data).
• For developer distribution: simulated data has R² as 0.9798 and empirical data has R² as 0.9714.
• For project distribution: simulated data has R² as 0.6650 and empirical data has R² as 0.9838.
• Partial Fit!

Page 35: BA Model – Diameter and Clustering Coefficient

• Small diameter and high clustering coefficient like empirical data
• Diameter and clustering coefficient are both decreasing like empirical data
• Good Fit!

Page 36: BA Model with Constant Fitness

• Power laws in degree distributions, similar to empirical data (o for simulated data and x for empirical data).
• For developer distribution: simulated data has R² as 0.9742 and empirical data has R² as 0.9714.
• For project distribution: simulated data has R² as 0.7253 and empirical data has R² as 0.9838.
• Improved fit!

Page 37: Discovery: Project Life Cycle

Page 38: BA Model with Dynamic Fitness

• Power laws in degree distribution, similar to empirical data (o for simulated data and x for empirical data).
• For developer distribution: simulated data has R² as 0.9695 and empirical data has R² as 0.9714.
• For project distribution: simulated data has R² as 0.8051 and empirical data has R² as 0.9838.
• Somewhat better fit!

Page 39: Models of the F/OSS Social Network (Alternative Hypotheses)

• General model features
  – Agents are nodes on a graph (developers or projects)
  – Behaviors: Create, join, abandon and idle
  – Edges are relationships (joint project participation)
  – Growth of network: random or types of preferential attachment, formation of clusters
  – Fitness
  – Network attributes: diameter, average degree, degree distribution, clustering coefficient
• Four specific models
  – ER (random graph) - (1960)
  – BA (preferential attachment) - (1999)
  – BA ( + constant fitness) - (2001)
  – BA ( + dynamic fitness) - (2003)

Page 40: Summary

Page 41: Summary

• Why Agent-Based Modeling and Simulation?
  – Can be used as components of the Scientific Method
  – A research approach for studying socio-technical systems
• Case study: F/OSS - Collaboration Social Networks
  – SourceForge conceptual models: ER, BA, BA with constant fitness and BA with dynamic fitness.
  – Simulations
    • Computer experiments that tested conceptual models
    • Provided insight into the phenomenon under study and guided data mining of collected observations

Page 42: Questions

• Validity of approaches
  – Social networks
  – Simulation
• Value/Utility of approaches
• Applicability to other areas of F/OSS research
  – Project sites, e.g., Mozilla.org
  – Individual projects, e.g., Linux kernel

Page 43: Thank you