basics of computation and modeling - lecture 2 in introduction to computational social science
BASICS OF COMPUTATION AND MODELING
LECTURE 2, 2.9.2015
INTRODUCTION TO COMPUTATIONAL SOCIAL SCIENCE (CSS01)
LAURI ELORANTA
• LECTURE 1: Introduction to Computational Social Science [DONE]
• Tuesday 01.09. 16:00 – 18:00, U35, Seminar room 114
• LECTURE 2: Basics of Computation and Modeling [TODAY]
• Wednesday 02.09. 16:00 – 18:00, U35, Seminar room 113
• LECTURE 3: Big Data and Information Extraction
• Monday 07.09. 16:00 – 18:00, U35, Seminar room 114
• LECTURE 4: Network Analysis
• Monday 14.09. 16:00 – 18:00, U35, Seminar room 114
• LECTURE 5: Complex Systems
• Tuesday 15.09. 16:00 – 18:00, U35, Seminar room 114
• LECTURE 6: Simulation in Social Science
• Wednesday 16.09. 16:00 – 18:00, U35, Seminar room 113
• LECTURE 7: Ethical and Legal issues in CSS
• Monday 21.09. 16:00 – 18:00, U35, Seminar room 114
• LECTURE 8: Summary
• Tuesday 22.09. 17:00 – 19:00, U35, Seminar room 114
LECTURES SCHEDULE
• PART 1: COMPUTATION
• Role of Computation
• How Computers Work
• What is Programming
• PART 2: MODELING
• What is Modeling
• Unified Modeling Language (UML)
LECTURE 2 OVERVIEW
• Understanding computers as information processing systems helps us understand complex systems as information processing systems as well
• Knowing how computers, programs and programming languages work and what you are able to do with them helps us grasp how to approach a research problem on a practical level (e.g. which tools to choose)
• The choice of computers, programs and programming languages has practical consequences for the research problem we are trying to solve (e.g. the selection of tools affects the answers we are able to get)
MOTIVATION: WHY UNDERSTANDING COMPUTERS MATTERS
(Cioffi-Revilla 2014.)
• Computation is used in CSS as a language to formalize (1) theory and
(2) empirical research to research social complexity.
• Computation is in most cases applied computation: computation itself
is rarely researched (as is done in computer science).
• Information processing paradigm
1. Computing as a fundamental part of complex social systems
2. Computing as tools for research
ROLE OF COMPUTATION IN CSS
(Cioffi-Revilla 2014.)
HOW COMPUTERS WORK
• Computers are formed of hardware and software
• Hardware: the physical parts of the computer that enable computing
• Micro-processor, physical memory (= electronic physical machines)
• Hardware provides the physical means for information processing
• Software: the (textual) instructions that tell the hardware what to do
• The non-physical parts: e.g. MS Word is a software program
• Example: your physical iPhone is hardware, the apps you run on it are software.
HARDWARE & SOFTWARE
(Hennessy & Patterson 2013.)
COMPUTER ARCHITECTURE
[Diagram: CPU; Main Memory (RAM); Secondary Memory (e.g. Hard Disk); Input Device; Output Device]
(Cioffi-Revilla 2014.)
• CPU = Central Processing Unit; does all the computing work
• Processes the instructions of a program
• Controller, Registers, Arithmetic & Logic Unit
• Main Memory (RAM, Random Access Memory)
• Fast memory, close to the CPU (so that the CPU and memory can work well together)
• Instructions for computing are loaded to Main memory and executed from there by the CPU
• Secondary memory (Hard Disk)
• Slower memory with big volume
• Able to store big amounts of data, but the access is much slower
• Programs are first fetched from secondary memory to Main memory before they are run by the CPU
• Input and Output, I/O devices
• Screen, mouse, keyboard, network connections, …
COMPUTER ARCHITECTURE
(Hennessy & Patterson 2013.)
• Computers run programs according to the instruction cycle, also called the
fetch-execute cycle (or fetch-decode-execute cycle)
• Basically it is about cycling two steps
• 1. Fetch the next instruction of the program from main memory to CPU
• 2. Execute that instruction in CPU
• Repeat steps 1 & 2
INSTRUCTION CYCLE
(Hennessy & Patterson 2013.)
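The two repeating steps above can be sketched as a toy machine. This is only an illustration: the instruction names (LOAD, ADD, PRINT, HALT) and the single accumulator are hypothetical and do not correspond to any real CPU.

```python
def run(program):
    """Run a toy program: a list of (operation, argument) pairs."""
    accumulator = 0      # stands in for a CPU register
    pc = 0               # program counter: index of the next instruction
    output = []
    while True:
        op, arg = program[pc]    # step 1: fetch the next instruction
        pc += 1
        # step 2: execute the fetched instruction
        if op == "LOAD":
            accumulator = arg
        elif op == "ADD":
            accumulator += arg
        elif op == "PRINT":
            output.append(accumulator)
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown instruction: {op}")
    return output


program = [("LOAD", 5), ("ADD", 2), ("PRINT", None), ("HALT", None)]
print(run(program))  # [7]
```

The list named program plays the role of main memory here: the loop only ever looks at the one instruction that pc points to.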
• In relation to the information processing paradigm, computers can be
seen as quite similar to complex adaptive social systems
• A computer consists of an information processing system (CPU,
Memory) and its environment (reached via I/O devices)
• Can a social system be seen as an information processing
“computer”?
AN ANALOGY BETWEEN COMPUTERS AND COMPLEX SYSTEMS
(Hennessy & Patterson 2013.)
WHAT IS PROGRAMMING
• Programming is the act of writing the instructions for the Computer/CPU
to execute
• A Program is a set of those instructions
• An iPhone app is a program
• The textual form of those instructions is called CODE and it is separated
from the DATA, which is the information the CODE processes via the CPU
• Programs are written in special languages called programming languages
PROGRAMMING
(Hennessy & Patterson 2013.)
• The Central Processing Unit (CPU) can only understand instructions that are
written in its “native” language
• This CPU language is called Machine Code, and it varies from CPU to
CPU, based on make and model
• For example, ARM and Intel x86 CPUs use different machine codes
• Machine language is not (or is hardly) human readable. Its closest
human readable counterpart is low-level Assembly Language
• Machine code or machine language is a set of instructions executed directly by a computer's central
processing unit (CPU). Each instruction performs a very specific task, such as a load, a jump, or an ALU
operation on a unit of data in a CPU register or memory. Every program directly executed by a CPU is
made up of a series of such instructions. (Wikipedia 2015, Machine code)
CPU HAS ITS OWN LANGUAGE
(Hennessy & Patterson 2013.)
• People write code in “human readable programming languages” (or
semi-human-readable ones, such as assembly)
• One is able to see what the program does from the code
• The CPU does not understand human readable languages & code, as it only
understands Machine Code
• Human readable programming languages need to be translated to
machine code so that the CPU is able to execute the code
• There are two ways to do this:
1. Compiling
2. Interpreting
PROGRAMMING IS DONE IN HUMAN READABLE LANGUAGES
(Hennessy & Patterson 2013.)
• Compiling code: The human readable code is transformed (= compiled)
once to machine code. After this the machine code program can be run
many times.
• -> This is equivalent to translating a book into a foreign language
(machine code). After the translation, the book can be read many times.
• Interpreting code: The human readable code is translated to machine
code at the same time as it is executed by the CPU, so translation and
execution are interleaved.
• -> This is equivalent to having a real life conversation via a human
interpreter.
• Whether a language is compiled or interpreted has practical effects
• Speed, how variables are resolved, etc.
HELPING MACHINES READ CODE: COMPILING AND INTERPRETING
(Hennessy & Patterson 2013.)
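The difference between "translate once, run many times" and "translate while running" can be imitated with Python's built-in compile() and exec(). This is an analogy only: Python translates source text into bytecode for its own virtual machine, not into CPU machine code, and the variable names below are made up for the example.

```python
source = "total = sum(range(1, 11))"

# "Compiling": translate the source text once into a code object ...
code_object = compile(source, "<example>", "exec")

# ... and then run the already translated code as many times as needed.
results = []
for _ in range(3):
    namespace = {}
    exec(code_object, namespace)
    results.append(namespace["total"])


# "Interpreting": hand over the source text itself, so it is translated
# again every time it is run.
def run_from_source(text):
    namespace = {}
    exec(text, namespace)
    return namespace["total"]


print(results)                  # [55, 55, 55]
print(run_from_source(source))  # 55
```

Both routes give the same answer; what differs is when and how often the translation work is done, which is one reason the choice affects speed.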
• The abstraction level of a programming language depends on how “far” it
is from Machine Code & dealing with hardware related specifics (such as
memory management)
• Languages can be compiled/interpreted to other languages
ABSTRACTION LEVEL OF THE LANGUAGE
[Diagram: a scale from low level language to high level language: Machine Code, Assembly, C / C++, Java / Scala, R, Visual Programming]
• There are hundreds of programming languages
• http://en.wikipedia.org/wiki/List_of_programming_languages
• Languages differ in
• Syntax = how they are written, rules of writing instructions
• Semantics = what different words and concepts mean
• Pragmatics = what the language is used for
• Languages also differ in whether they are compiled or interpreted to
machine code
PROGRAMMING LANGUAGES
IN C LANGUAGE:
#include <stdio.h>

int main(void)
{
    printf("Hello World\n");
    return 0;
}
SYNTAX & SEMANTICS: HELLO WORLD EXAMPLE
IN JAVA LANGUAGE:
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello, World");
    }
}
IN SCHEME LANGUAGE:
(define hello-world
  (lambda ()
    (begin
      (write 'Hello-World)
      (newline)
      (hello-world))))
IN PYTHON LANGUAGE:
print("Hello, World!")
• Data types:
• Most basic type of information in the language
• integer, real, boolean…
• Data structures:
• More complex structures of data.
• list, stack, array, tree
• Variables: places to store functions and data
• Assignments: a way to tie a certain value to certain variable
• X = 5 + 2;
• Functions:
• A command that performs certain functionality
• Takes arguments and returns a value
• Print(“Hello World”) -> “Hello World”
• Control Structures:
• Control the flow of the program
• Loop, skip, iterate, do something while certain conditions hold
PROGRAMMING LANGUAGES INCLUDE
(Cioffi-Revilla 2014.)
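The building blocks listed above can be seen together in a few lines of Python. The variable names and the numbers are invented for illustration only.

```python
# Data types: the most basic kinds of values
count = 7          # integer
share = 0.25       # real (floating point) number
is_open = True     # boolean

# Data structure: a list holding several values
ages = [23, 35, 41, 19]

# Variable and assignment: tie the value of 5 + 2 to the name x
x = 5 + 2


# Function: takes arguments and returns a value
def mean(values):
    return sum(values) / len(values)


# Control structures: loop over the list, act only when a condition holds
over_30 = []
for age in ages:
    if age > 30:
        over_30.append(age)

print(x)           # 7
print(mean(ages))  # 29.5
print(over_30)     # [35, 41]
```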
• There are many different paradigms in the ways people write programs;
below are the three most common:
• Procedural / Imperative Programming
• Line-by-line telling what the program should do:
• 1 Do This
• 2 Do that
• 3 Do those things
• Object-Oriented Programming (OOP)
• Based on objects that contain functions and data
• Objects preserve state
• Functional Programming (FP)
• Functions as first class citizens
PARADIGMS OF PROGRAMMING
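As a rough sketch, the same small task (summing the squares of the even numbers in a list) can be written in each of the three styles. The task and all names, such as EvenSquareSummer, are made up for this example, and real programs usually mix the styles.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Procedural / imperative: tell the program line by line what to do
total = 0
for n in numbers:
    if n % 2 == 0:
        total += n * n


# Object-oriented: an object bundles data (its state) with functions
class EvenSquareSummer:
    def __init__(self):
        self.total = 0           # state preserved between calls

    def add(self, n):
        if n % 2 == 0:
            self.total += n * n


summer = EvenSquareSummer()
for n in numbers:
    summer.add(n)


# Functional: functions are values that are passed to other functions
def is_even(n):
    return n % 2 == 0


def square(n):
    return n * n


functional_total = sum(map(square, filter(is_even, numbers)))

print(total, summer.total, functional_total)  # 56 56 56
```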
• An algorithm is a self-contained set of step-by-step operations to achieve a desired result.
• There are different algorithms for different purposes
• Search algorithms
• Sort algorithms
• Image processing algorithms
• Etc.
• A real life algorithm might be: how people get study credits
• Sign up for a course
• Participate in lectures
• Do lecture assignments and final work
• Return lecture assignments and final work
• If your work passes the grading, you get study credits
ALGORITHMS
(Cioffi-Revilla 2014.)
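The study credit example above can be written down as a small function. The parameter names, the rule that at least one lecture must be attended, and the passing grade of 1 are assumptions made only for this sketch, not actual course rules.

```python
def earns_credits(signed_up, lectures_attended, work_returned, grade,
                  passing_grade=1):
    """Follow the steps one by one; stop as soon as one of them fails."""
    if not signed_up:                 # sign up for the course
        return False
    if lectures_attended == 0:        # participate in lectures
        return False
    if not work_returned:             # do and return the assignments
        return False
    return grade >= passing_grade     # the work must pass the grading


print(earns_credits(True, 8, True, 3))   # True
print(earns_credits(True, 8, False, 3))  # False
```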
• The way one writes code matters, because you or someone else needs
to be able to easily understand & modify the code
• This may happen after long periods of time (after one has forgotten
how the program works)
• Good coding style produces code that is simple, readable,
understandable, concise and well structured
• Code is also a way to communicate how the program works
• Documenting your code is a crucial part of programming!
• General principles according to Cioffi-Revilla (2014)
• Readability
• Commenting
• Modularity
• Defensive coding
CODING STYLE
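A short sketch of what the four principles can look like in practice. The function names and the plausible age range of 0 to 120 are assumptions chosen for the example.

```python
def parse_age(raw_value):
    """Convert one survey answer (text) to an age in years."""
    age = int(raw_value)    # int() raises ValueError for non-numeric text
    # Defensive coding: reject values that cannot be a real age
    if age < 0 or age > 120:
        raise ValueError(f"implausible age: {age}")
    return age


def mean_age(raw_values):
    """Return the mean age of a list of survey answers."""
    # Modularity: parsing lives in its own small, reusable function
    ages = [parse_age(value) for value in raw_values]
    if not ages:
        raise ValueError("no ages given")
    return sum(ages) / len(ages)


print(mean_age(["20", "30", "40"]))  # 30.0
```

Readability comes from descriptive names, commenting from the short docstrings and comments, modularity from splitting the work into two functions, and defensive coding from the explicit checks on the input.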
• A good summary on how to write, refactor and manage code and data:
• Gentzkow, Matthew and Jesse M. Shapiro. 2014. Code and Data for the
Social Sciences: A Practitioner’s Guide. University of Chicago mimeo,
http://web.stanford.edu/~gentzkow/research/CodeAndData.pdf
• Handles matters such as:
• Automation
• Version Control
• Directories
• Data Keys
• Abstraction
• Documentation
• Management
MANAGING AND REFACTORING CODE & DATA
• You learn programming by doing!
• Start with something small
• University of Helsinki: Many Computer Science Courses
• CSS02 – Introduction to Programming in Social Sciences (II period, 2015).
• MOOC Courses Online:
• Coursera
• Data Science Specialization (Highly recommended)
• https://www.coursera.org/specialization/jhudatascience/1?utm_medium=catalog
• Codecademy
• http://www.codecademy.com/learn
• MIT OpenCourseWare
• http://ocw.mit.edu/courses/intro-programming/
• Udemy
• https://www.udemy.com/courses/Development/
WHERE TO LEARN PROGRAMMING
WHAT IS MODELING
• A model is a formal and purposeful representation and abstraction of
reality
• Scientific Modeling is a scientific activity, the aim of which is to make a particular
part or feature of the world easier to understand, define, quantify, visualize, or
simulate by referencing it to existing and usually commonly accepted knowledge.
It requires selecting and identifying relevant aspects of a situation in the real
world and then using different types of models for different aims, such as
conceptual models to better understand, operational models to operationalize,
mathematical models to quantify, and graphical models to visualize the subject.
(Wikipedia 2015, Scientific Modeling)
• Reality -> Abstraction -> Model of the Phenomena
MODEL
1. Models of Phenomena: model based on real world phenomena (e.g.
how ants collect food)
2. Models of Data: modeling based on raw data (e.g. plotting)
3. Models of Theory: model is the structural and formal presentation of
a textual theory
• Different Modeling Perspectives (Ontological)
• Physical models (e.g. miniature buildings)
• Fictional models (e.g. Bohr model of atom)
• Mathematical models: set-theory models, equations..
• Descriptions
• Mixed models
• A good summary on scientific modeling:
• http://plato.stanford.edu/entries/models-science/
MODELS AS REPRESENTATIONS
(Stanford Encyclopedia of Philosophy 2012.)
• Ontology is the philosophical study of the nature of being,
becoming, existence, or reality, as well as the basic categories of
being and their relations. Traditionally listed as a part of the major
branch of philosophy known as metaphysics, ontology deals with
questions concerning what entities exist or can be said to exist, and how
such entities can be grouped, related within a hierarchy, and subdivided
according to similarities and differences. (Wikipedia 2015, Ontology)
• In computer science and information science, an ontology is a
formal naming and definition of the types, properties, and
interrelationships of the entities that really or fundamentally exist
for a particular domain of discourse. It is thus a practical application
of philosophical ontology, with a taxonomy. (Wikipedia 2015, Ontology
(information science))
ONTOLOGY
• Entire social world consists of social systems and their environments
• These systems consist of
• Classes
• Objects (of a certain class, called instances)
• Associations between classes and objects (e.g. relationships between
entities)
• Real World (Referent Social System) -> Model (Abstracted Social
System)
ONTOLOGY & SOCIAL SYSTEMS
(Cioffi-Revilla 2014.)
CAN YOU FIND CLASSES, OBJECTS AND ASSOCIATIONS?
This Image is Public Domain. From: http://www.publicdomainpictures.net.
FAMILY, PARENT, CHILD, GENDER/SEX, PARENT-CHILD-RELATIONSHIP, HETERONORMATIVITY, PHOTO STUDIO…
• Deep epistemological and philosophy of science related questions,
which are not unproblematic
• What is the true relationship between the model and reality?
• What can be actually researched with models?
• What questions are the models actually able to answer?
• Modeling also takes a certain stance on the philosophy of science,
leaning towards empiricism & positivism, or at least critical realism.
MODELING IS PROBLEMATIC
• A really good primer on model thinking is the course given by Scott E.
Page at the University of Michigan. One can take the course
for free on Coursera: https://www.coursera.org/course/modelthinking
• Why Model?
• To be an intelligent citizen of the world
• To be a clearer thinker
• To understand and use data
• To better decide, strategize, and design
• Course videos are also freely available on YouTube:
• https://www.youtube.com/watch?v=K-gxhxGwJ38&index=2&list=PLGqc26s6O0E2P2BnK73JWXk4YYTgl3dmb
MODEL THINKING
MODELING WITH UNIFIED MODELING LANGUAGE (UML)
• The Unified Modeling Language (UML) is a general-purpose modeling
language in the field of software engineering, which is designed to
provide a standard way to visualize the design of a system. (Wikipedia
2015, UML)
• UML is a standardized notational system for graphically representing
complex systems consisting of classes, objects, associations among
them, dynamic interactions and other scientifically important features.
(Cioffi-Revilla 2014)
• Developed during the 1990s
• Has also been published as an ISO standard
• Static Modeling: Models the static structure of the system
• Dynamic Modeling: Models the dynamic behavior of the system
UNIFIED MODELING LANGUAGE (UML)
• Use Case Diagrams
• Class Diagrams
• Sequence Diagrams
• State Diagrams
• Component Diagrams
• Deployment Diagrams
• Most useful for Social Science modeling might be the Class, State
and Sequence diagrams
MAIN TYPES OF UML MODELS
(Bell 2003.)
• A class diagram represents the static structure of a complex system
• A class diagram consists of
• Rectangles representing classes and objects (name on top)
• Classes and objects can have
• Attributes (e.g. age, sex)
• Methods = a certain function the class or object is able to perform
(e.g. getMarried())
• Links between rectangles representing associations between classes
and objects
CLASS DIAGRAM
(Cioffi-Revilla 2014.)
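In code, a class rectangle with attributes and methods corresponds closely to a class definition. The sketch below uses the attributes age and sex and the method getMarried() from the slide (written get_married in Python style); the name attribute, the spouse attribute and the example people are added assumptions.

```python
class Person:
    """A class: the template that all Person objects share."""

    def __init__(self, name, age, sex):
        # Attributes: the data each object carries
        self.name = name
        self.age = age
        self.sex = sex
        self.spouse = None

    def get_married(self, other):
        """Method: something a Person object is able to do."""
        self.spouse = other
        other.spouse = self


# Objects (instances): concrete members of the class
anna = Person("Anna", 34, "female")
ben = Person("Ben", 36, "male")
anna.get_married(ben)

print(ben.spouse.name)  # Anna
```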
CLASSES
[Diagram: class rectangle template (nameOfClass; Attributes (optional); Methods (optional)) and example classes Family and Person with attributes -age, -weight, -height]
• Four types of associations represented by different arrowhead-links:
• Inheritance/generalization
(empty arrowhead)
• Aggregation
(empty diamond)
• Composition
(black diamond)
• Generic association
(plain link / directional arrow symbol)
CLASS DIAGRAM & ASSOCIATIONS
(Cioffi-Revilla 2014.)
(Image from: http://www.javacodegeeks.com/2013/01/quick-summary-object-associations.html)
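The four association types have rough counterparts in code. The sketch below is one possible reading, with invented classes (Student, Household, Survey, Question); Python itself does not enforce the difference between aggregation and composition, so that distinction lives in how the objects are created and owned.

```python
class Person:
    def __init__(self, name):
        self.name = name
        self.friends = []    # generic association: a plain link to other persons


class Student(Person):       # inheritance/generalization: a Student is a Person
    pass


class Household:
    def __init__(self, members):
        # Aggregation: members are created elsewhere and can exist
        # without the household
        self.members = list(members)


class Question:
    def __init__(self, text):
        self.text = text


class Survey:
    def __init__(self, question_texts):
        # Composition: the questions are created by and belong to the survey
        self.questions = [Question(text) for text in question_texts]


anna = Student("Anna")
ben = Person("Ben")
anna.friends.append(ben)
home = Household([anna, ben])
survey = Survey(["How old are you?"])

print(isinstance(anna, Person))  # True
print(len(home.members))         # 2
print(survey.questions[0].text)  # How old are you?
```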
ASSOCIATIONS
[Diagram: classes Family and Person (attributes -age, -weight, -height) joined by the association "belongs to"]
• Multiples (multiplicities) represent the quantities involved in an association
• E.g. how many children a parent has in the particular model
• There are many different range options
• 0..1 = between 0 and 1
• 1 = exactly 1
• 0..* or * = between 0 and unspecified many
• 1..* = between 1 and unspecified many
• 0..N or N = between 0 and unspecified many
• 1..N = between 1 and unspecified many
CLASS DIAGRAM & MULTIPLES
(Cioffi-Revilla 2014.)
MULTIPLES
[Diagram: classes Family and Person (attributes -age, -weight, -height) joined by the association "belongs to", with multiples 1..* and 0..1]
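Multiples can be enforced in code as simple rules. The sketch below assumes one reading of the Family and Person example: a family has one or more persons (1..*) and a person belongs to at most one family (0..1). The add method and the example names are hypothetical.

```python
class Person:
    def __init__(self, name):
        self.name = name
        self.family = None       # 0..1: no family, or exactly one


class Family:
    def __init__(self, first_member):
        self.members = []        # 1..*: a family starts with one member
        self.add(first_member)

    def add(self, person):
        # Keep the 0..1 rule: a person cannot join a second family
        if person.family is not None:
            raise ValueError(f"{person.name} already belongs to a family")
        person.family = self
        self.members.append(person)


anna = Person("Anna")
ben = Person("Ben")
family = Family(anna)
family.add(ben)
cleo = Person("Cleo")            # belongs to no family, which 0..1 allows

print(len(family.members))  # 2
print(cleo.family)          # None
```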
HOW TO MODEL THIS IN UML CLASS DIAGRAM?
This Image is Public Domain. From: http://www.publicdomainpictures.net.
• Sketch a UML Class Diagram model that represents elections
• What are the main classes, objects and relationships between the
classes?
• Do you find the model useful?
ASSIGNMENT
• Gentzkow, M.; Shapiro, J. M. 2014. Code and Data for the Social Sciences: A Practitioner’s Guide. University of Chicago mimeo, http://faculty.chicagobooth.edu/matthew.gentzkow/research/CodeAndData.pdf
• Granger, C. 2015. Coding is not the new literacy. http://www.chris-granger.com/2015/01/26/coding-is-not-the-new-literacy/
• Epstein, J. M. 2008. Why Model? Keynote address to the Second World Congress on Social Simulation. George Mason University.
• Page, S. E. 2012. The Model Thinker: Prologue, Introduction and Chapter 1. Link provided by University of Michigan & Coursera:
• http://vserver1.cscs.lsa.umich.edu/~spage/ONLINECOURSE/R1Page.pdf
• Stanford Encyclopedia of Philosophy, 2012. Models in Science.
• http://plato.stanford.edu/entries/models-science/
• Bell, D. 2003. UML basics: An introduction to the Unified Modeling Language. The Rational Edge. https://www.ibm.com/developerworks/rational/library/content/RationalEdge/sep03/f_umlbasics_db.pdf
LECTURE 2 READING
• Cioffi-Revilla, C. 2014. Introduction to Computational Social Science. Springer-Verlag, London
• Gentzkow, M.; Shapiro, J. M. 2014. Code and Data for the Social Sciences: A Practitioner’s Guide. University of Chicago mimeo, http://faculty.chicagobooth.edu/matthew.gentzkow/research/CodeAndData.pdf
• Hennessy, J. L.; Patterson, D. A. 2013. Computer Organization and Design. Elsevier, Waltham.
• Stanford Encyclopedia of Philosophy, 2012. Models in Science.
• http://plato.stanford.edu/entries/models-science/
• Bell, D. 2003. UML basics: An introduction to the Unified Modeling Language. The Rational Edge. https://www.ibm.com/developerworks/rational/library/content/RationalEdge/sep03/f_umlbasics_db.pdf
• Wikipedia 2015, Scientific Modeling. http://en.wikipedia.org/wiki/Scientific_modelling
• Wikipedia 2015, Ontology. http://en.wikipedia.org/wiki/Ontology
• Wikipedia 2015, Ontology (information science) http://en.wikipedia.org/wiki/Ontology_(information_science)
REFERENCES
Thank You!
Questions and comments?
twitter: @laurieloranta