
Scientific Computing (w2)

An Introduction to Scientific Computing

workshop 2


R Programming Workshop 2

An introduction to computer programming using the R computing environment

Solving problems using a computer

Using computers as a tool for:

– Plotting graphs and pictures
– Analysis of data
– Learning maths and physics…

Getting started with R

Log on to a UoL computer.

Start → All Programs → RStudio [click]

Type commands at the ‘prompt’ in the R ‘console’.

Getting Help

• From RStudio – click “Help” menu (top) – select “R Help”

• From the console – type ?plot for help with the “plot” command

• From the web – http://www.r-project.org/ or http://stackoverflow.com/questions/tagged/r



Refresher: variables and objects

Variables, Objects and Assignment

> x <- 1.2
> x
[1] 1.2

In R, the things you manipulate and store are objects. There are different classes of object, and R has methods for treating an object in a manner appropriate to its class.

“1.2” is just a number, a scalar. In fact, it’s a floating point number (not an integer). When we assign an object this value, the object instantly takes the right form. In this case ‘x’ becomes an object storing our single floating point number.


> x <- c(1.2, 1.3, 1.4, 2.0)
> x
[1] 1.2 1.3 1.4 2.0

We use the c(…) function to combine several elements (in this case numbers) into one object, often called a vector (a 1D list) or more generally an array (which can have more than 1 dimension). The old ‘x’ has been replaced with a new one, which is now an array storing 4 values.


> my.name <- "spider man"
> my.name
[1] "spider man"

“spider man” is a string of characters. There are 10 characters (including the space) stored as a single string. The object ‘my.name’ is created as a string object and assigned this value.


> my.name <- c("spider man", "cat woman")
> my.name
[1] "spider man" "cat woman"

You can have an array of strings too…


> x <- array(1:20, dim=c(4,5))

array(…): create an array. We need to define the values going into it, and its dimensions.

1:20 produces the numbers 1, 2, …, 20 (20 of them).

dim=c(4,5): here we define a 2-dimensional array with size 4x5 (4 rows, 5 columns). There are 20 elements.


> x <- array(1:20, dim=c(4,5))
> x
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20

Here is the 4x5 array. Notice the order in which the values are placed.

> x[2,3]

What does this do?
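The fill order and the [row, column] indexing can be explored on a smaller array. This is only a sketch: the 2x3 array and the names m, first.column and one.element are made up for illustration and are not part of the workshop.

```r
# a small 2x3 array (the name 'm' is just for illustration)
m <- array(1:6, dim=c(2,3))
print(m)

# R fills arrays down each column in turn,
# so column 1 holds the first two values
first.column <- m[, 1]
print(first.column)

# a single element: row 1, column 2
one.element <- m[1, 2]
print(one.element)
```

Leaving the row index empty, as in m[, 1], selects every row of that column.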


> 4 > 3
[1] TRUE

Evaluate whether 4 is greater than 3. The answer is TRUE.

This is a logical object. Possible values are TRUE and FALSE.


> 4 >= 3
[1] TRUE

Evaluate whether 4 is greater than or equal to 3. The answer is still TRUE.


> 4 <= 3
[1] FALSE

Evaluate whether 4 is less than or equal to 3. The answer is FALSE.


> x <- c(1.0, 2.3, 4.5, 10.0)
> x > 4.0
[1] FALSE FALSE TRUE TRUE

For each element in the array ‘x’, evaluate whether it is greater than 4.

The result is an array of logical values. The array has the same size/shape as ‘x’.

This is an example of an ‘element-wise’ operation. The operation is simply repeated for each and every element of the arrays involved.
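Arithmetic works element-wise in the same way as comparison. The sketch below is an illustration only; the vectors a and b and the result names are assumptions, not workshop objects.

```r
# two vectors of the same length (names are illustrative)
a <- c(1.0, 2.3, 4.5, 10.0)
b <- c(2, 2, 2, 2)

# addition is repeated for each pair of elements
total <- a + b
print(total)

# multiplying by a single number scales every element
doubled <- a * 2
print(doubled)

# comparing two vectors gives one logical value per pair
a.bigger <- a > b
print(a.bigger)
```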


Some simple data analysis

Means, variance & all that

> print(morley)
    Expt Run Speed
001    1   1   850
002    1   2   740
003    1   3   900
004    1   4  1070
005    1   5   930
006    1   6   850
...

R has several datasets built in and ready to play with, to help you practice data analysis at the computer. Here we use 100 speed of light measurements.

The morley data have 100 measurements in three columns:
Expt = experiment no. (1-5)
Run = run no. (1-20)
Speed = measured speed minus 299000 km/s


> print(morley$Speed)

If we consider just the ‘Speed’ column of the morley array we have 100 numbers in the form of a 1D array (a “vector”).




> mean(morley$Speed)
[1] 852.4

The mean(…) function will compute the mean value of whatever you feed it. In this case it uses the 100 values stored in morley$Speed.

The output (or returned value) is just one number, the mean. As usual, R treats this as a 1x1 array (a vector of length 1).


> y <- c(3.4,4.5,2.3,-0.01)
> mean(y)

Try entering your own list of values. Remember to combine them into a single object – using c(…) – which can be processed by the mean(…) function.

R comes with a lot of functions for data analysis and manipulation, and you can add more by installing packages. There are 5000+ packages you can install to add new commands; some packages add dozens of new commands.


> var(y)
> length(y)
> sqrt(var(y))
> sd(y)

The var(…) function will compute the variance of whatever you feed it. In this case it uses the values you assigned to ‘y’.

The length(…) function returns the number of elements in ‘y’.

sqrt(var(y)): work from the inside out. The var(…) function returns the variance of ‘y’. This is then used as input to the sqrt(…) function.

The sd(…) function returns the standard deviation (sqrt of variance). This is a ready-made function that does the same calculation as the previous line.


> sd(y)/sqrt(length(y))

This computes the “standard error” on the mean of y:

$$\mathrm{SE} = \frac{S}{\sqrt{n}}, \qquad S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2$$


> mean(morley$Speed)
[1] 852.4
> sd(morley$Speed)
[1] 79.01055
> sd(morley$Speed) / sqrt(length(morley$Speed))
[1] 7.901055

This computes the “standard error” on the mean of the speed: 299,852 ± 8 km/s
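To see what sd(…) and the standard error are doing, the same numbers can be built up step by step from the definition of the variance. This is a sketch for checking your understanding; the names n, s2 and se.speed are chosen here for illustration.

```r
# number of measurements
n <- length(morley$Speed)

# sample variance from its definition (note the n - 1)
s2 <- sum((morley$Speed - mean(morley$Speed))^2) / (n - 1)

# standard error on the mean: sqrt(variance / n)
se.speed <- sqrt(s2 / n)
print(se.speed)

# compare with the ready-made functions
print(sd(morley$Speed) / sqrt(n))
```

Both printed values should agree with the 7.901055 shown above.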

Plot the speed of light data

> plot(morley$Speed)
> hist(morley$Speed)

Try plotting the data.

The hist(…) function will generate a histogram (and plot it).

> hist(morley$Speed, col="steelblue", border="white", main="my histogram")
> hist(morley$Speed, col="steelblue", border="white", breaks=seq(500,1200,by=100))

breaks=…: define exactly where you want the breaks (between histogram bins) to go.


Extracting subsets of an array


> morley$Speed > 900
[1] FALSE FALSE FALSE TRUE...
> mask <- (morley$Speed > 900)
> print(morley$Speed[mask])

mask <- …: store the result of the test in the (logical) array ‘mask’.

morley$Speed[mask]: only keeps elements of morley$Speed for which the corresponding element of mask is TRUE, i.e. this is the subset of speeds > 900.


> mean(mask)

The logical value TRUE is treated as a numerical value 1, and FALSE is treated as 0. So mean(mask) gives the fraction of TRUEs, i.e. the fraction of speeds > 900.


> morley[mask, ]

morley is a 100x3 array (100 rows, 3 columns). Here we use mask to select the matching rows only. Leaving the column index empty means it will use all columns.
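The same masking steps can be tried on a short vector where the answer is easy to check by eye. This sketch uses the first six morley speeds printed earlier; the name speeds and the use of the ! (“not”) operator to flip the mask are additions for illustration.

```r
# the first six morley speeds, typed in by hand
speeds <- c(850, 740, 900, 1070, 930, 850)

# logical mask: TRUE where the speed is above 900
mask <- (speeds > 900)
print(mask)

# keep only the elements where mask is TRUE
fast <- speeds[mask]
print(fast)

# TRUE counts as 1, so sum() counts and mean() gives the fraction
n.fast <- sum(mask)
frac.fast <- mean(mask)

# !mask flips TRUE and FALSE, selecting everything else
slow <- speeds[!mask]
print(slow)
```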


A simple physics simulation

Brownian motion & random walks

Einstein (1905) developed the physical theory of Brownian motion.

A 1-dimensional random walk

[Figure: the position y marked on an axis running from -4 to 4]

start at “time” x=0, position y=0

move ahead to x=1, randomly jump +1 or -1 in y: now at y=1

move ahead to x=2, randomly jump +1 or -1 in y: now at y=0

move ahead to x=3, randomly jump +1 or -1 in y: now at y=-1

move ahead to x=4, randomly jump +1 or -1 in y: now at y=-2

what happens to y as we keep advancing x…?


Writing your first script

The RStudio window:

Editor – write scripts here
The console – type new commands here
Help files, history, memory manager here
Graphics – plots appear here

R Script Files

Edit a script file and save (“mysource.R”)

– to begin a new script: File → New File → R Script

– to save the script: File → Save / Save As (in working directory)

– to open/edit an existing script: File → Open

Source the script file (run the script)

> source("mysource.R")

– or click on the Source button in the editor window


We are going to write a short script

Click in the “editor” window (top left of RStudio)

Start typing the commands on the next slide

NOTE: Anything starting with a # symbol is a comment – the computer ignores it

Comments are helpful to the reader/writer

Remember to Save the script periodically (e.g. Save As… “random_walk.R”)

# generate n.step steps
n.step <- 100
jump <- 1
x <- 0:n.step

# compute the y steps
dy <- jump * sample(c(-1, 1), size=n.step,
                    replace=TRUE)
plot(dy)

# compute the cumulative y position
y <- c(0, cumsum(dy))

# plot the walk
plot(x, y, type="s")

A command may run over more than one line.

When you have finished typing, save and source (run) the script.

A (short) random walk


How does the script work?


Randomly choose a jump of +1 or -1

# generate n.step steps
n.step <- 100
jump <- 1
x <- 0:n.step





Try this at the console to see how sample(…) works

> sample( c(-1, 1), size=10, replace=TRUE)

then try it again (use the up arrow key)







Add up all the jumps: a cumulative sum

Try this at the console to see how cumsum(…) works

> x <- 1:10
> sum(x)
> cumsum(x)






# plot the walk
plot(x, y, type="s")

now change number of steps to 10000 and Source again

A (longer) random walk

run (Source) again, and again, …
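Each run gives a different walk because sample(…) draws new random numbers every time. If you want to repeat exactly the same walk (for example while debugging), one option is to fix the random number seed with set.seed(…) first. This is a sketch; the seed value 42 is arbitrary and the names dy.first and dy.second are for illustration.

```r
# fix the seed, then draw 10 random jumps
set.seed(42)     # 42 is an arbitrary choice
dy.first <- sample(c(-1, 1), size=10, replace=TRUE)

# reset to the same seed and draw again
set.seed(42)
dy.second <- sample(c(-1, 1), size=10, replace=TRUE)

# the two sets of jumps are identical
print(identical(dy.first, dy.second))

# cumulative position, starting from y=0
y <- c(0, cumsum(dy.first))
print(y)
```

Remove the set.seed(…) lines to get a fresh walk on every run again.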

Now that it works, we are going to add some more lines to the bottom of our script.

Click in the “editor” window (top left)

Start typing the following commands at the end of your existing script

# plot a line at y=0
abline(h=0, lty=2)

# plot the start and end points
points(n.step, y[1], pch=16, cex=1.5)
points(n.step, y[n.step+1], pch=16, cex=1.5)

# join the start and end points with a line
lines(c(n.step, n.step), c(0, y[n.step+1]))

# compute the distance travelled
r <- y[n.step+1] - y[1]

# print on the plot
label <- paste("distance=", signif(r, 3))
text(max(x), max(y), labels=label, pos=2)


Brownian motion & random walks

Now let’s try in 2 dimensions

Pick a random direction to move in, call it theta

Select this from a uniform distribution from 0 to 2π radians (equal probabilities)

Step a distance (jump=1) in direction theta

Resolve into components of the motion in x and in y

Begin a new script file: File → New File → R Script

Remember to save it periodically

# generate n.step random directions (angles)
n.step <- 100
theta <- runif(n.step, 0, 2*pi)
jump <- 1

# compute the x and y step sizes
dx <- jump * cos(theta)
dy <- jump * sin(theta)

# compute the cumulative x and y positions
x <- c(0, cumsum(dx))
y <- c(0, cumsum(dy))

# plot the walk
plot(x, y, type="l", bty="n", col="red")


Add more lines to the new script

# plot the start and end points
points(x[1], y[1], pch=16, cex=1.5)
points(x[n.step+1], y[n.step+1], pch=16,
       cex=1.5)

# join the start and end points
lines(x[c(1, n.step+1)], y[c(1, n.step+1)],
      lwd=3)

# compute the distance from start to end
r <- sqrt(x[n.step+1]^2 + y[n.step+1]^2)



Interlude: loops

> 1:10
> for (i in 1:10) { print(i) }

> data.array <- array(0, dim=100)
> for (i in 2:length(data.array)) {
+   data.array[i] <- data.array[i-1]+1
+   print(i)
+ }
> data.array

Try this at the console. What happens between the curly brackets is ‘looped’.

You can use many lines inside a loop.
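Many simple loops can also be written without a loop, using functions that work on whole arrays. The sketch below repeats the loop above (without the print) and compares it with a cumsum(…) version; the name no.loop and the use of rep(…) to make 99 copies of the number 1 are additions for illustration.

```r
# the loop version: each element is the previous one plus 1
data.array <- array(0, dim=100)
for (i in 2:length(data.array)) {
  data.array[i] <- data.array[i-1] + 1
}

# a version with no loop: start at 0, then add 1 ninety-nine times
no.loop <- cumsum(c(0, rep(1, 99)))

# check that the two give the same values
print(all(data.array == no.loop))
```

This is the same trick the random walk script uses: cumsum(…) replaces a loop that adds up the jumps one at a time.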


Now that your script works, we are going to make some changes to it.

There is no need to replace the entire script. The new script reuses some of the lines from our last script. Keep these lines, add to and edit the rest. Recycle code!

Save this new one with a different file name.

Remember: Anything starting with a # symbol is a comment – the computer ignores it

n.step <- 100
n.walks <- 1000
jump <- 1

# prepare two arrays to store the outputs
x <- array(0, dim=n.walks)
y <- array(0, dim=n.walks)

# this is a loop: for each of i = 1,2,3,…,n.walks do whatever is {inside}
for (i in 1:n.walks) {
  theta <- runif(n.step, 0, 2*pi)
  dx <- jump * cos(theta)
  dy <- jump * sin(theta)
  x[i] <- sum(dx)
  y[i] <- sum(dy)
}

# compute the radial distances (Pythagoras’ theorem)
r <- sqrt(x^2 + y^2)

# plot the end point of each walk in (x,y) coordinates
plot(x, y)
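A possible follow-up check, not part of the workshop script: for a walk made of n.step independent jumps of length 1 in random directions, the average of r^2 over many walks is expected to be close to n.step, so the root-mean-square distance should be near sqrt(n.step). The sketch below tests this; the seed, the choice of 2000 walks and the name rms are assumptions made for illustration.

```r
set.seed(1)        # arbitrary seed so the check is repeatable
n.step <- 100
n.walks <- 2000
jump <- 1

# array to store the end-to-end distance of each walk
r <- array(0, dim=n.walks)

for (i in 1:n.walks) {
  theta <- runif(n.step, 0, 2*pi)
  x.end <- sum(jump * cos(theta))
  y.end <- sum(jump * sin(theta))
  r[i] <- sqrt(x.end^2 + y.end^2)
}

# root-mean-square distance, to compare with sqrt(n.step)
rms <- sqrt(mean(r^2))
print(rms)
print(sqrt(n.step))
```

Try changing n.step and see whether the two printed numbers stay close to each other.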


Still want more…? Add this to your 2D script.

# plot histogram of x's
hist(x, breaks=20, col="blue",
     border="white", prob=TRUE)

# compare to the Normal curve
x.norm <- seq(-50, 50, by=0.2)
y.norm <- dnorm(x.norm, mean=0,
                sd=sqrt(n.step)*2/pi)
lines(x.norm, y.norm, lwd=4)

$$\langle |dx| \rangle = \frac{1}{2\pi} \int_0^{2\pi} |\sin(x)| \, dx = \frac{2}{\pi}$$
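The width of the Normal curve can be checked against the simulation itself. Since cos(theta) has mean 0 and variance 1/2 when theta is uniform between 0 and 2π, the standard deviation of the end points is expected to be near sqrt(n.step/2). The sketch below prints the simulated spread next to that value and next to the width used in the script above, so the three can be compared. The seed and the larger n.walks are assumptions made here to keep the check stable.

```r
set.seed(2)        # arbitrary seed so the check is repeatable
n.step <- 100
n.walks <- 5000

# end point in x for each walk (jump length 1)
x <- array(0, dim=n.walks)
for (i in 1:n.walks) {
  theta <- runif(n.step, 0, 2*pi)
  x[i] <- sum(cos(theta))
}

# spread of the simulated end points
print(sd(x))

# expected spread from the variance of cos(theta)
print(sqrt(n.step/2))

# width used for the Normal curve in the script
print(sqrt(n.step)*2/pi)
```

If the curve looks slightly too narrow against your histogram, try sd=sqrt(n.step/2) in the dnorm(…) call and compare.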

What have we learnt?

• some different kinds of object and data in R
• how to select subsets of arrays, use logical operators
• how to compute mean, variance, std. deviation, std. error
• how to make and plot a histogram
• how to start, test, edit, load and save a simple script

Now you know how to do this, try using these to
• improve your data analysis and presentation in lab work
• perform repetitive/iterative tasks in maths/physics
• explore (map out) new mathematical functions
