basics of computation and modeling - lecture 2 in introduction to computational social science

BASICS OF COMPUTATION AND MODELING LECTURE 2, 2.9.2015 INTRODUCTION TO COMPUTATIONAL SOCIAL SCIENCE (CSS01) LAURI ELORANTA

Upload: lauri-eloranta

Post on 15-Jul-2015


Page 1: Basics of Computation and Modeling - Lecture 2 in Introduction to Computational Social Science

BASICS OF COMPUTATION AND MODELING

LECTURE 2, 2.9.2015

INTRODUCTION TO COMPUTATIONAL SOCIAL SCIENCE (CSS01)

LAURI ELORANTA


• LECTURE 1: Introduction to Computational Social Science [DONE]

• Tuesday 01.09. 16:00 – 18:00, U35, Seminar room 114

• LECTURE 2: Basics of Computation and Modeling [TODAY]

• Wednesday 02.09. 16:00 – 18:00, U35, Seminar room 113

• LECTURE 3: Big Data and Information Extraction

• Monday 07.09. 16:00 – 18:00, U35, Seminar room 114

• LECTURE 4: Network Analysis

• Monday 14.09. 16:00 – 18:00, U35, Seminar room 114

• LECTURE 5: Complex Systems

• Tuesday 15.09. 16:00 – 18:00, U35, Seminar room 114

• LECTURE 6: Simulation in Social Science

• Wednesday 16.09. 16:00 – 18:00, U35, Seminar room 113

• LECTURE 7: Ethical and Legal issues in CSS

• Monday 21.09. 16:00 – 18:00, U35, Seminar room 114

• LECTURE 8: Summary

• Tuesday 22.09. 17:00 – 19:00, U35, Seminar room 114

LECTURES SCHEDULE


• PART 1: COMPUTATION

• Role of Computation

• How Computers Work

• What is Programming

• PART 2: MODELING

• What is Modeling

• Unified Modeling Language (UML)

LECTURE 2 OVERVIEW


• Understanding computers as information processing systems helps us understand complex systems as information processing systems as well

• Knowing how computers, programs and programming languages work, and what you are able to do with them, helps us grasp how to approach a research problem on a practical level (e.g. which tools to choose)

• The choice of computers, programs and programming languages has practical consequences for the research problem we are trying to solve (e.g. the selection of tools affects the answers we are able to get)

MOTIVATION: WHY UNDERSTANDING COMPUTERS MATTERS

(Cioffi-Revilla 2014.)


• Computation is used in CSS as a language to formalize (1) theory and (2) empirical research for studying social complexity.

• Computation is in most cases applied computation: computation itself is rarely the object of research (as it is in computer science).

• Information processing paradigm

1. Computing as a fundamental part of complex social systems

2. Computing as tools for research

ROLE OF COMPUTATION IN CSS

(Cioffi-Revilla 2014.)


HOW COMPUTERS WORK


• Computers are formed of hardware and software

• Hardware: the physical parts of the computer that enable computing

• Micro-processor, physical memory (= electronic physical machines)

• Hardware provides the physical means for information processing

• Software: the (textual) instructions that tell the hardware what to do

• The non-physical parts: e.g. MS Word is a software program

• Example: your physical iPhone is hardware, the apps you run on it are software.

HARDWARE & SOFTWARE

(Hennessy & Patterson 2013.)


COMPUTER ARCHITECTURE

[Diagram: CPU, main memory (RAM), secondary memory (e.g. hard disk), input device, output device]

(Cioffi-Revilla 2014.)


• CPU = Central Processing Unit; does all the computing work

• Processes the instructions of a program

• Controller, Registers, Arithmetic & Logic Unit

• Main Memory (RAM, Random Access Memory)

• Fast memory, close to the CPU (so that the CPU and memory can work well together)

• Instructions for computing are loaded into main memory and executed from there by the CPU

• Secondary memory (Hard Disk)

• Slower memory with big volume

• Able to store big amounts of data, but the access is much slower

• Programs are first fetched from secondary memory into main memory before they are run by the CPU

• Input and Output, I/O devices

• Screen, mouse, keyboard, network connections, …

COMPUTER ARCHITECTURE

(Hennessy & Patterson 2013.)


• Computers run programs according to the instruction cycle, also called the fetch-execute cycle (or fetch-decode-execute cycle)

• Basically, it cycles through two steps

• 1. Fetch the next instruction of the program from main memory to CPU

• 2. Execute that instruction in CPU

• Repeat steps 1 & 2

INSTRUCTION CYCLE

(Hennessy & Patterson 2013.)
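The two-step cycle above can be sketched as a toy machine in Python. This is only an illustration: the instruction names (LOAD, ADD, PRINT, HALT), the single register and the `run` function are invented for the sketch and do not correspond to any real CPU.

```python
# Toy machine: "main memory" holds the program as a list of instructions.
# The instruction set below is hypothetical and exists only for this sketch.

def run(program):
    """Run a program with a fetch-execute loop and return what it printed."""
    register = 0          # a single CPU register
    program_counter = 0   # address of the next instruction in main memory
    output = []

    while True:
        # Step 1: fetch the next instruction from main memory.
        operation, operand = program[program_counter]
        program_counter += 1

        # Step 2: execute that instruction in the CPU.
        if operation == "LOAD":
            register = operand
        elif operation == "ADD":
            register += operand
        elif operation == "PRINT":
            output.append(register)
        elif operation == "HALT":
            return output
        else:
            raise ValueError(f"unknown instruction: {operation}")


program = [("LOAD", 5), ("ADD", 2), ("PRINT", None), ("HALT", None)]
print(run(program))  # prints [7]
```

The loop repeats steps 1 and 2 until it meets the HALT instruction, which is an assumption added here so that the sketch terminates.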


• In relation to the information processing paradigm, computers can be seen as quite similar to complex adaptive social systems

• Computers consist of an information processing system (CPU, memory) and its environment (reached via I/O devices)

• Can a social system be seen as an information processing “computer”?

AN ANALOGY BETWEEN COMPUTERS AND COMPLEX SYSTEMS

(Hennessy & Patterson 2013.)


WHAT IS PROGRAMMING


• Programming is the act of writing the instructions for the computer/CPU to execute

• A program is a set of those instructions

• An iPhone app is a program

• The textual form of those instructions is called CODE, and it is kept separate from the DATA, which is the information the CODE processes via the CPU

• Programs are written in special languages called programming languages

PROGRAMMING

(Hennessy & Patterson 2013.)


• The Central Processing Unit (CPU) can only understand instructions that are written in its “native” language

• This CPU language is called Machine Code, and it varies from CPU to CPU, based on make and model

• For example, ARM and Intel x86 CPUs use different machine codes

• Machine language is not (or is hardly) human readable. The closest human readable counterpart is low-level Assembly Language

• Machine code or machine language is a set of instructions executed directly by a computer's central processing unit (CPU). Each instruction performs a very specific task, such as a load, a jump, or an ALU operation on a unit of data in a CPU register or memory. Every program directly executed by a CPU is made up of a series of such instructions. (Wikipedia 2015, Machine code)

CPU HAS ITS OWN LANGUAGE

(Hennessy & Patterson 2013.)


• People write code in “human readable programming languages” (or semi-human-readable ones, such as assembly)

• One is able to see what the program does from the code

• The CPU does not understand human readable languages & code, as it only understands Machine Code

• Human readable programming languages need to be translated to machine code so that the CPU is able to execute the code

• There are two ways to do this:

1. Compiling

2. Interpreting

PROGRAMMING IS DONE IN HUMAN READABLE LANGUAGES

(Hennessy & Patterson 2013.)


• Compiling code: The human readable code is transformed (= compiled) once to machine code. After this the machine code program can be run many times.

• -> This is equivalent to translating a book into a foreign language (machine code). After the translation, the book can be read many times.

• Interpreting code: The human readable code is translated to machine code at the same time as it is executed by the CPU, so translation and execution are interleaved.

• -> This is equivalent to having a real-life conversation via a human interpreter.

• Whether a language is compiled or interpreted has practical effects

• Speed, how variables are resolved, etc.

HELPING MACHINES READ CODE: COMPILING AND INTERPRETING

(Hennessy & Patterson 2013.)
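The difference between "translate once, run many times" and "translate while running" can be loosely illustrated with Python's built-in `compile` and `exec`. This is an analogy only: Python translates source text into bytecode for its own virtual machine, not into machine code for the CPU, and the variable names below are made up for the sketch.

```python
source = "result = sum(n * n for n in range(10))"

# "Compile once, run many times": translate the source text a single time ...
code_object = compile(source, "<example>", "exec")
compiled_results = []
for _ in range(3):
    namespace = {}
    exec(code_object, namespace)      # ... then reuse the translated form
    compiled_results.append(namespace["result"])

# "Translate while running": hand over the raw text on every run,
# so it has to be translated again each time it is executed.
interpreted_results = []
for _ in range(3):
    namespace = {}
    exec(source, namespace)
    interpreted_results.append(namespace["result"])

print(compiled_results)      # prints [285, 285, 285]
print(interpreted_results)   # prints [285, 285, 285]
```

Both routes give the same answers; what differs is how often the translation work is repeated, which is one reason compiled programs tend to run faster.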


• The abstraction level of a programming language depends on how “far” it

is from Machine Code & dealing with hardware related specifics (such as

memory management)

• Languages can be compiled/interpreted to other languages

ABSTRACTION LEVEL OF THE LANGUAGE

Low-level language → High-level language

Machine Code → Assembly → C / C++ → Java / Scala → R → Visual Programming


• There are hundreds of programming languages

• http://en.wikipedia.org/wiki/List_of_programming_languages

• Languages differ in

• Syntax = how they are written, rules of writing instructions

• Semantics = what different words and concepts mean

• Pragmatics = what the language is used for

• Languages also differ in whether they are compiled or interpreted to machine code

PROGRAMMING LANGUAGES


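IN C LANGUAGE:

#include <stdio.h>

/* Program entry point: prints a greeting and reports success. */
int main(void)
{
    printf("Hello World\n");
    return 0;
}

SYNTAX & SEMANTICS: HELLO WORLD EXAMPLE

IN JAVA LANGUAGE:

public class HelloWorld {
    // Program entry point: prints a greeting.
    public static void main(String[] args) {
        System.out.println("Hello, World");
    }
}

IN SCHEME LANGUAGE:

; Define a procedure that writes the symbol Hello-World, then call it once.
(define hello-world
  (lambda ()
    (begin
      (write 'Hello-World)
      (newline))))

(hello-world)

IN PYTHON LANGUAGE:

# The function-call form works in Python 3 (and in Python 2 for a single string).
print("Hello, World!")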


• Data types:

• Most basic type of information in the language

• integer, real, boolean…

• Data structures:

• More complex structures of data.

• list, stack, array, tree

• Variables: places to store functions and data

• Assignments: a way to tie a certain value to certain variable

• X = 5 + 2;

• Functions:

• A command that performs certain functionality

• Takes arguments and returns a value

• print("Hello World") outputs: Hello World

• Control Structures:

• Control the flow of the program

• Loop, skip, iterate, do something while certain conditions hold

PROGRAMMING LANGUAGES INCLUDE

(Cioffi-Revilla 2014.)
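The building blocks listed above can be seen together in a few lines of Python. The variable names and numbers are invented for this sketch.

```python
# Data types: the most basic kinds of value in the language.
count = 7            # integer
share = 0.25         # real (floating point) number
is_member = True     # boolean

# Data structure: a list groups several values together.
ages = [23, 35, 41, 58]

# Variable and assignment: tie the value of 5 + 2 to the name x.
x = 5 + 2

# Function: takes arguments and returns a value.
def mean(values):
    return sum(values) / len(values)

# Control structures: loop over the list and act only when a condition holds.
over_40 = []
for age in ages:
    if age > 40:
        over_40.append(age)

print(x)           # prints 7
print(mean(ages))  # prints 39.25
print(over_40)     # prints [41, 58]
```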


• There are many different paradigms for how people write programs; below are the three most common:

• Procedural / Imperative Programming

• Line-by-line telling what the program should do:

• 1 Do This

• 2 Do that

• 3 Do those things

• Object-Oriented Programming (OOP)

• Based on objects that contain functions and data

• Objects preserve state

• Functional Programming (FP)

• Functions as first class citizens

PARADIGMS OF PROGRAMMING
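To make the three paradigms concrete, here is one small task (summing the squares of a list of numbers) written in each style. This is a sketch for comparison only; the class and function names are hypothetical.

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# Procedural / imperative: tell the program what to do, line by line.
procedural_total = 0
for n in numbers:
    procedural_total += n * n


# Object-oriented: an object bundles data (state) with the functions using it.
class SquareAccumulator:
    def __init__(self):
        self.total = 0          # state preserved between method calls

    def add(self, n):
        self.total += n * n


accumulator = SquareAccumulator()
for n in numbers:
    accumulator.add(n)


# Functional: functions are values that can be passed to other functions.
def square(n):
    return n * n

functional_total = reduce(lambda a, b: a + b, map(square, numbers), 0)

print(procedural_total, accumulator.total, functional_total)  # prints 30 30 30
```

All three produce the same result; the paradigms differ in how the program is organized, not in what can be computed.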


• An algorithm is a self-contained set of step-by-step operations to achieve a desired result.

• There are different algorithms for different purposes

• Search algorithms

• Sort algorithms

• Image processing algorithms

• Etc.

• A real life algorithm might be: how people get study credits

• Sign up for a course

• Participate in lectures

• Do lecture assignments and final work

• Return lecture assignments and final work

• If your work passes the grading, you get study credits

ALGORITHMS

(Cioffi-Revilla 2014.)
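The study-credits example above can be written down as a step-by-step procedure in Python. The function name, its parameters and the pass mark of 1 are assumptions made for this sketch, not rules taken from the course.

```python
def earns_credits(signed_up, attended_lectures, work_returned, grade, pass_grade=1):
    """Walk through the steps in order; any missing step means no credits."""
    if not signed_up:            # step 1: sign up for the course
        return False
    if not attended_lectures:    # step 2: participate in lectures
        return False
    if not work_returned:        # steps 3-4: do and return the assignments
        return False
    return grade >= pass_grade   # step 5: the work has to pass the grading


print(earns_credits(True, True, True, grade=3))    # prints True
print(earns_credits(True, True, False, grade=3))   # prints False
print(earns_credits(True, True, True, grade=0))    # prints False
```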


• The way one writes code matters, because you or someone else needs

to be able to easily understand & modify the code

• This may happen after long periods of time (after one has forgotten

how the program works)

• Good coding style produces code that is simple, readable,

understandable, concise and well structured

• Code is also a way to communicate how the program works

• Documenting your code is a crucial part of programming!

• General principles according to Cioffi-Revilla (2014)

• Readability

• Commenting

• Modularity

• Defensive coding

CODING STYLE
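A short Python sketch of how the four principles might look in practice. The survey scenario and the function names are invented for illustration.

```python
def response_rate(responses, invitations):
    """Return the share of invited people who responded (between 0 and 1)."""
    # Defensive coding: reject inputs that cannot be meaningful
    # instead of silently returning a misleading number.
    if invitations <= 0:
        raise ValueError("invitations must be positive")
    if not 0 <= responses <= invitations:
        raise ValueError("responses must be between 0 and invitations")
    return responses / invitations


def format_percent(rate):
    """Modularity: presentation is kept apart from the calculation."""
    return f"{rate * 100:.1f} %"


# Readability: descriptive names make the call read almost like a sentence.
print(format_percent(response_rate(responses=42, invitations=120)))  # prints 35.0 %
```

Each function does one thing, has a comment or docstring saying what it is for, and fails loudly on bad input, which makes the code easier to revisit after a long break.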


• A good summary on how to write, refactor and manage code and data:

• Gentzkow, Matthew and Jesse M. Shapiro. 2014. Code and Data for the Social Sciences: A Practitioner’s Guide. University of Chicago mimeo, http://web.stanford.edu/~gentzkow/research/CodeAndData.pdf

• Handles matters such as:

• Automation

• Version Control

• Directories

• Data Keys

• Abstraction

• Documentation

• Management

MANAGING AND REFACTORING CODE & DATA


• You learn programming by doing!

• Start with something small

• University of Helsinki: Many Computer Science Courses

• CSS02 – Introduction to Programming in Social Sciences (II period, 2015).

• MOOC Courses Online:

• Coursera

• Data Science Specialization (highly recommended): https://www.coursera.org/specialization/jhudatascience/1?utm_medium=catalog

• Codecademy

• http://www.codecademy.com/learn

• MIT OpenCourseWare

• http://ocw.mit.edu/courses/intro-programming/

• Udemy

• https://www.udemy.com/courses/Development/

WHERE TO LEARN PROGRAMMING


WHAT IS MODELING


• A model is a formal and purposeful representation and abstraction of reality

• Scientific Modeling is a scientific activity, the aim of which is to make a particular part or feature of the world easier to understand, define, quantify, visualize, or simulate by referencing it to existing and usually commonly accepted knowledge. It requires selecting and identifying relevant aspects of a situation in the real world and then using different types of models for different aims, such as conceptual models to better understand, operational models to operationalize, mathematical models to quantify, and graphical models to visualize the subject. (Wikipedia 2015, Scientific Modeling)

• Reality → Abstraction → Model of the Phenomenon

MODEL


1. Models of Phenomena: model based on real world phenomena (e.g.

how ants collect food)

2. Models of Data: modeling based on raw data (e.g. plotting)

3. Models of Theory: model is the structural and formal presentation of

a textual theory

• Different Modeling Perspectives (Ontological)

• Physical models (e.g. miniature buildings)

• Fictional models (e.g. Bohr model of the atom)

• Mathematical models: set-theory models, equations, ...

• Descriptions

• Mixed models

• A good summary on scientific modeling:

• http://plato.stanford.edu/entries/models-science/

MODELS AS REPRESENTATIONS

(Stanford Encyclopedia 2015.)


• Ontology is the philosophical study of the nature of being, becoming, existence, or reality, as well as the basic categories of being and their relations. Traditionally listed as a part of the major branch of philosophy known as metaphysics, ontology deals with questions concerning what entities exist or can be said to exist, and how such entities can be grouped, related within a hierarchy, and subdivided according to similarities and differences. (Wikipedia 2015, Ontology)

• In computer science and information science, an ontology is a formal naming and definition of the types, properties, and interrelationships of the entities that really or fundamentally exist for a particular domain of discourse. It is thus a practical application of philosophical ontology, with a taxonomy. (Wikipedia 2015, Ontology (information science))

ONTOLOGY


• The entire social world consists of social systems and their environments

• These systems consist of

• Classes

• Objects (of a certain class, called instances)

• Associations between classes and objects (e.g. relationships between entities)

• Real World (Referent Social System) → Model (abstracted Social System)

ONTOLOGY & SOCIAL SYSTEMS

(Cioffi-Revilla 2014.)


CAN YOU FIND CLASSES, OBJECTS AND ASSOCIATIONS?

This Image is Public Domain. From: http://www.publicdomainpictures.net.


FAMILY, PARENT, CHILD, GENDER/SEX, PARENT-CHILD-RELATIONSHIP, HETERONORMATIVITY, PHOTO STUDIO…

This Image is Public Domain. From: http://www.publicdomainpictures.net.


• Modeling raises deep epistemological and philosophy of science related questions, which are not unproblematic

• What is the true relationship between the model and reality?

• What can actually be researched with models?

• What questions are the models actually able to answer?

• Modeling also takes a certain stance on the philosophy of science, leaning towards empiricism & positivism, or at least critical realism.

MODELING IS PROBLEMATIC


• A really good primer on model thinking is the course given by Scott E. Page at the University of Michigan. One is able to participate in the course for free on Coursera: https://www.coursera.org/course/modelthinking

• Why Model?

• To be an intelligent citizen of the world

• To be a clearer thinker

• To understand and use data

• To better decide, strategize, and design

• Course videos are also freely available on YouTube:

• https://www.youtube.com/watch?v=K-gxhxGwJ38&index=2&list=PLGqc26s6O0E2P2BnK73JWXk4YYTgl3dmb

MODEL THINKING


MODELING WITH UNIFIED MODELING LANGUAGE (UML)


• The Unified Modeling Language (UML) is a general-purpose modeling language in the field of software engineering, which is designed to provide a standard way to visualize the design of a system. (Wikipedia 2015, UML)

• UML is a standardized notational system for graphically representing complex systems consisting of classes, objects, associations among them, dynamic interactions and other scientifically important features. (Cioffi-Revilla 2014)

• Developed during the 1990s

• Is published as an ISO standard

• Static Modeling: Models the static structure of the system

• Dynamic Modeling: Models the dynamic behavior of the system

UNIFIED MODELING LANGUAGE (UML)


• Use Case Diagrams

• Class Diagrams

• Sequence Diagrams

• State Diagrams

• Component Diagrams

• Deployment Diagrams

• Most useful for social science modeling might be the Class, State and Sequence diagrams

MAIN TYPES OF UML MODELS

(Bell 2003.)


• A class diagram represents the static structure of a complex system

• A class diagram consists of

• Rectangles representing classes and objects (name on top)

• Classes and objects can have

• Attributes (e.g. age, sex)

• Methods = a certain function the class or object is able to perform (e.g. getMarried())

• Links between rectangles representing associations between classes and objects

CLASS DIAGRAM

(Cioffi-Revilla 2014.)
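A class box with attributes and methods maps quite directly onto a class in an object-oriented language. The Python sketch below assumes a `Person` class with the attributes age and sex and a method corresponding to getMarried() from the slide; the `name` and `spouse` attributes and the example people are additions made for illustration.

```python
class Person:
    """A class: the general template shared by all persons in the model."""

    def __init__(self, name, age, sex):
        # Attributes: data stored in each individual object.
        self.name = name
        self.age = age
        self.sex = sex
        self.spouse = None

    def get_married(self, other):
        """Method: a function the object is able to perform."""
        self.spouse = other
        other.spouse = self


# Objects (instances): concrete members of the class.
anna = Person("Anna", 34, "female")
ben = Person("Ben", 36, "male")
anna.get_married(ben)

print(ben.spouse.name)  # prints Anna
```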


CLASSES

[Diagram: a generic class box (nameOfClass; Attributes, optional; Methods, optional) and two example classes: Family, and Person with the attributes -age, -weight, -height]


• Four types of associations represented by different arrowhead-links:

• Inheritance/generalization (empty arrowhead)

• Aggregation (empty diamond)

• Composition (black diamond)

• Generic association (plain link / directional arrow symbol)

CLASS DIAGRAM & ASSOCIATIONS

(Cioffi-Revilla 2014.)

(Image from: http://www.javacodegeeks.com/2013/01/quick-summary-object-associations.html)
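One possible way to express the four association types in code is sketched below in Python. All class names other than Person and Family (Student, Questionnaire, Question) and the `friend` link are hypothetical examples chosen for this sketch.

```python
class Person:
    def __init__(self, name):
        self.name = name
        self.friend = None            # generic association: a plain link


class Student(Person):                # inheritance: a Student is a kind of Person
    pass


class Family:
    def __init__(self, members):
        # Aggregation: the family groups persons that also exist on their own.
        self.members = list(members)


class Question:
    def __init__(self, text):
        self.text = text


class Questionnaire:
    def __init__(self, texts):
        # Composition: the questions are created by, and belong only to,
        # this questionnaire; they have no life outside it.
        self.questions = [Question(text) for text in texts]


maria = Student("Maria")
juho = Person("Juho")
maria.friend = juho
family = Family([maria, juho])
survey = Questionnaire(["Age?", "Occupation?"])

print(isinstance(maria, Person))   # prints True
print(len(family.members))         # prints 2
print(survey.questions[0].text)    # prints Age?
```

Python itself does not enforce the difference between aggregation and composition; the distinction lives in who creates the parts and whether they are shared elsewhere.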


ASSOCIATIONS

[Diagram: Person (-age, -weight, -height) "belongs to" Family]


• Multiples (multiplicities) represent the quantities involved in an association

• E.g. how many children a parent has in the particular model

• There are many different range options

• 0..1 = between 0 and 1

• 1 = exactly 1

• 0..* or * = between 0 and unspecified many

• 1..* = between 1 and unspecified many

• 0..N or N = between 0 and unspecified many

• 1..N = between 1 and unspecified many

CLASS DIAGRAM & MULTIPLES

(Cioffi-Revilla 2014.)
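The range notation above can be checked mechanically. The Python helper below is a sketch that understands only the numeric and `*` forms (for example `0..1`, `1`, `*`, `1..*`), not the `N` forms; the function name `satisfies` is made up for this example.

```python
def satisfies(multiplicity, count):
    """Check whether `count` linked objects fit a range such as '0..1' or '1..*'."""
    if ".." in multiplicity:
        lower_text, upper_text = multiplicity.split("..")
    else:
        # A single value: '*' is shorthand for 0..*, a number means exactly that.
        lower_text = "0" if multiplicity == "*" else multiplicity
        upper_text = multiplicity
    lower = int(lower_text)
    upper = None if upper_text == "*" else int(upper_text)   # None = no upper limit
    return count >= lower and (upper is None or count <= upper)


print(satisfies("0..1", 2))   # prints False
print(satisfies("1..*", 3))   # prints True
print(satisfies("1", 1))      # prints True
print(satisfies("*", 0))      # prints True
```

A check like this could be used to test whether the objects in a model respect the multiplicities drawn in its class diagram, for instance that every family has at least one person.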


MULTIPLES

[Diagram: Person (-age, -weight, -height) "belongs to" Family, with the multiplicities 1..* and 0..1 marked at the ends of the link]


HOW TO MODEL THIS IN UML CLASS DIAGRAM?

This Image is Public Domain. From: http://www.publicdomainpictures.net.


• Sketch a UML Class Diagram model that represents elections

• What are the main classes, objects and relationships between the

classes?

• Do you find the model useful?

ASSIGNMENT


• Gentzkow, M.; Shapiro, J. M. 2014. Code and Data for the Social Sciences: A Practitioner’s Guide. University of Chicago mimeo, http://faculty.chicagobooth.edu/matthew.gentzkow/research/CodeAndData.pdf

• Granger, C. 2015. Coding is not the new literacy. http://www.chris-granger.com/2015/01/26/coding-is-not-the-new-literacy/

• Epstein, J. M. 2008. Why Model? Keynote address to the Second World Congress on Social Simulation. George Mason University.

• Page, S. E. 2012. The Model Thinker: Prologue, Introduction and Chapter 1. Link provided by University of Michigan & Coursera:

• http://vserver1.cscs.lsa.umich.edu/~spage/ONLINECOURSE/R1Page.pdf

• Stanford Encyclopedia of Philosophy, 2012. Models in Science.

• http://plato.stanford.edu/entries/models-science/

• Bell, D. 2003. UML basics: An introduction to the Unified Modeling Language. The Rational Edge. https://www.ibm.com/developerworks/rational/library/content/RationalEdge/sep03/f_umlbasics_db.pdf

LECTURE 2 READING


• Cioffi-Revilla, C. 2014. Introduction to Computational Social Science. Springer-Verlag, London

• Gentzkow, M.; Shapiro, J. M. 2014. Code and Data for the Social Sciences: A Practitioner’s Guide. University of Chicago mimeo, http://faculty.chicagobooth.edu/matthew.gentzkow/research/CodeAndData.pdf

• Hennessy, J. L.; Patterson, D. A. 2013. Computer Organization and Design. Elsevier, Waltham.

• Stanford Encyclopedia of Philosophy, 2012. Models in Science.

• http://plato.stanford.edu/entries/models-science/

• Bell, D. 2003. UML basics: An introduction to the Unified Modeling Language. The Rational Edge. https://www.ibm.com/developerworks/rational/library/content/RationalEdge/sep03/f_umlbasics_db.pdf

• Wikipedia 2015, Scientific Modeling. http://en.wikipedia.org/wiki/Scientific_modelling

• Wikipedia 2015, Ontology. http://en.wikipedia.org/wiki/Ontology

• Wikipedia 2015, Ontology (information science) http://en.wikipedia.org/wiki/Ontology_(information_science)

REFERENCES


Thank You!

Questions and comments?

twitter: @laurieloranta