1
Statistical Computing in MATLAB
AMS597 Spring 2011
Hao Han
April 05, 2011
2
Introduction to MATLAB
The name MATLAB stands for MATrix LABoratory.
Typical uses include:
- Math and computation
- Algorithm development
- Data acquisition
- Modeling, simulation, and prototyping
- Data analysis, exploration, and visualization
- Scientific and engineering graphics
- Application development, including graphical user interface (GUI) building
We will focus on statistical computing in MATLAB.
3
Desktop Tools & Development Environment
Workspace Browser: view and change the contents of the workspace.
Command Window: run MATLAB statements (commands).
Hotkey: Ctrl+C interrupts execution while the status is busy.
M-file Editor: create, edit, debug, and run files.
4
MATLAB Variables
Variable names are case sensitive. Variable names must start with a letter, which can be followed by letters, digits, and underscores.
MATLAB does not require any type declarations or dimension statements. When it encounters a new variable name, it automatically creates the variable and allocates the appropriate amount of storage.
For example:
    New_student = 25;
To view the matrix assigned to any variable, simply enter the variable name.
Special variables:
    pi        value of π
    eps       floating-point relative accuracy (distance from 1.0 to the next larger number)
    inf       infinity
    NaN       not a number
    realmin   the smallest usable positive real number
    realmax   the largest usable positive real number
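The naming rules and special variables above can be tried directly in the Command Window. The following is a minimal sketch; the variable names other than New_student are made up for illustration.

```matlab
% Variable names are case sensitive: these are two different variables.
New_student = 25;
new_student = 30;

% isvarname checks whether a string is a valid variable name.
ok_name  = isvarname('New_student');   % valid: starts with a letter
bad_name = isvarname('2nd_student');   % invalid: starts with a digit

% Special values in action.
next_after_one = 1 + eps;       % the next double after 1
not_a_number   = 0/0;           % NaN
overflow       = realmax * 2;   % Inf
```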
5
MATLAB Matrices
MATLAB treats all variables as rectangular matrices.
Separate the elements of a row with blanks or commas. Use a semicolon ';' to indicate the end of each row. Surround the entire list of elements with square brackets '[ ]'.
Define a matrix:
    a = [1 2 3; 4 5 6; 7 8 9]
    a =
         1     2     3
         4     5     6
         7     8     9
Subscripts: the element in row i and column j of A is denoted by A(i,j).
    a(3,2) returns 8, and so does a(6), because a single index counts down the columns.
Define a scalar:
    x = 2;
Define a row vector:
    r = [1 2 3]
    r = [1,2,3]
Define a column vector:
    c = [1;2;3]
    c = [1 2 3]'
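The relation between subscripts and a single (linear) index can be checked with sub2ind and ind2sub. This is a small sketch using the same 3-by-3 matrix; the result variable names are illustrative.

```matlab
a = [1 2 3; 4 5 6; 7 8 9];

by_subscript = a(3,2);            % row 3, column 2
by_linear    = a(6);              % sixth element, counting down the columns

k      = sub2ind(size(a), 3, 2);  % linear index of element (3,2)
[i, j] = ind2sub(size(a), 6);     % subscripts of linear index 6

% A transposed row vector equals the column vector with the same entries.
r = [1 2 3];
c = [1; 2; 3];
same_shape = isequal(r', c);
```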
6
Matrix Manipulations
The Colon Operator: 1:5 is a row vector containing the integers from 1 to 5.
To obtain non-unit spacing, specify an increment. For example, 100:-7:50
Extracting a sub-matrix: Sub_matrix = matrix(r1:r2,c1:c2);
    sub_a = a(2:3,1:2)
    sub_a =
         4     5
         7     8
Replication:
    b = [1 2; 3 4];
    b_rep = repmat(b,1,2)
    b_rep =
         1     2     1     2
         3     4     3     4
Concatenation:
    c = ones(2,2);
    c_cat = [c 2*c; 3*c 4*c]
    c_cat =
         1     1     2     2
         1     1     2     2
         3     3     4     4
         3     3     4     4
    c_cat = cat(DIM,A,B);
Deleting rows or columns:
    c_cat(:,2)=[];
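As a rough illustration of the operations on this slide, the sketch below builds the same block matrix with bracket syntax and with cat, then deletes a column. The names v and same are introduced here only for the example.

```matlab
% Colon operator with a negative increment.
v = 100:-7:50;                 % 100 93 86 79 72 65 58 51

% Concatenation with brackets and with cat (dimension 1 = rows, 2 = columns).
c     = ones(2,2);
c_cat = [c 2*c; 3*c 4*c];
same  = cat(1, cat(2, c, 2*c), cat(2, 3*c, 4*c));
equal_results = isequal(c_cat, same);

% Deleting the second column leaves a 4-by-3 matrix.
c_cat(:,2) = [];
new_size = size(c_cat);
```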
7
Structures and Cell Arrays
Structure
A way of organizing related data.
Create a structure, s, with fields x, y, and name:
    s.y = 1;
    s.x = [1 1];
    s.name = 'foo';
    % or equivalently
    s2 = struct('y',1,'x',[1 1],'name','foo');
Test for equality:
    % works for any s1, s2
    isequal(s1,s2);
Cell Array
Cell arrays can have entries of arbitrary datatype.
    % create 3 by 2 cell array
    a = cell(3,2);
    a{1,1} = 1;
    a{3,1} = 'hello';
    a{2,2} = randn(100,100);
Using cell arrays with other datatypes can be tricky.
    % create 1 by 2 cell array
    a = {[1 2], 3};
    y = a{1};                 % y is a 1 by 2 numeric array
    ycell = a(1);             % ycell is a 1 by 1 cell array
    x = y+1;                  % allowed
    xcell = ycell+1;          % not allowed
    onetwothree = [a{1:2}];   % = [1 2 3]
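The difference between brace indexing (contents) and parenthesis indexing (a smaller cell array) can be seen in a short sketch. The try/catch and the cellfun call are additions for illustration; they are one possible way to work around the "not allowed" case, not the only one.

```matlab
a = {[1 2], 3};

y     = a{1};    % contents of the first cell: numeric array [1 2]
ycell = a(1);    % 1-by-1 cell array that still wraps [1 2]

% Arithmetic on a cell array raises an error, so catch it here.
try
    xcell  = ycell + 1;
    failed = false;
catch
    failed = true;
end

% cellfun applies a function to the contents of every cell.
plus_one = cellfun(@(v) v + 1, a, 'UniformOutput', false);
```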
8
MATLAB Operators
Relational operators:
    <     Less than
    <=    Less than or equal
    >     Greater than
    >=    Greater than or equal
    ==    Equal to
    ~=    Not equal to
Logical operators:
    ~     not   % highest precedence
    &     and   % higher precedence than or
    |     or    % lowest precedence
Matrix computations: + - * / ^
    A';       % complex conjugate transpose (plain transpose for real A)
    A \ b;    % returns x s.t. A*x=b
    b / A;    % returns x s.t. x*A=b
Element-wise operators:
    +     Addition
    -     Subtraction
    .*    Element-by-element multiplication
    ./    Element-by-element division
    .\    Element-by-element left division
    .^    Element-by-element power
    .'    Unconjugated array transpose
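A small numeric check may help separate matrix operators from element-wise ones. The matrix A and the vectors b and r below are arbitrary values chosen for this sketch.

```matlab
A = [4 1; 2 3];
b = [1; 2];      % column vector for the left-division example
r = [1 2];       % row vector for the right-division example

x = A \ b;       % solves A*x = b
z = r / A;       % solves z*A = r

matrix_product = A * A;    % rows times columns
elementwise    = A .* A;   % each entry squared

% & binds more tightly than |, so this reads as true | (true & false).
p = true | true & false;
```

Here matrix_product is [18 7; 14 11] while elementwise is [16 1; 4 9], which shows why the dot matters.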
9
MATLAB Functions
MATLAB provides a large number of standard elementary mathematical functions, including abs, sqrt, exp, and sin.
For a list of the elementary mathematical functions, type:
    help elfun
For a list of more advanced mathematical and matrix functions, type:
    help specfun
    help elmat
To look up the reference for a MATLAB function, type:
    help somefun
or, for more detail,
    doc somefun
10
Flow Control (‘if’ statement)
The general form of the ‘if’ statement is
    if expression
        …
    elseif expression
        …
    else
        …
    end
Example 1:
    if i == j
        a(i,j) = 2;
    elseif i >= j
        a(i,j) = 1;
    else
        a(i,j) = 0;
    end
Example 2:
    if (common>60)&&(area>60)
        pass = 1;
    end
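Example 1 assumes that i, j, and a already exist. One way to make it runnable is to wrap it in two loops over a small matrix; the size n = 3 is an arbitrary choice for this sketch.

```matlab
n = 3;
a = zeros(n);
for i = 1:n
    for j = 1:n
        if i == j
            a(i,j) = 2;      % diagonal
        elseif i >= j
            a(i,j) = 1;      % below the diagonal
        else
            a(i,j) = 0;      % above the diagonal
        end
    end
end
```

The result is a lower-triangular matrix with 2 on the diagonal, the same as tril(ones(n)) + eye(n).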
11
Flow Control (‘switch’ statement)
switch: Switch among several cases based on an expression.
The general form of the switch statement is:
    switch switch_expr
        case case_expr1
            …
        case case_expr2
            …
        otherwise
            …
    end
A case is executed when its expression equals the value of switch_expr, so each case should give a value to match rather than a logical test.
Example:
    x = 2; y = 3;
    switch sign(x - y)
        case 0
            disp('x and y are equal');
        case 1
            disp('x is greater than y');
        otherwise
            disp('x is less than y');
    end
    % x is less than y
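switch also works with strings, and a case can list several alternatives in a cell array. The sketch below picks a summary statistic by name; the variable names method, data, and value are made up for the example.

```matlab
method = 'median';
data   = [3 1 4 1 5];

switch lower(method)
    case {'mean', 'average'}     % either string selects this branch
        value = mean(data);
    case 'median'
        value = median(data);
    otherwise
        error('Unknown method: %s', method);
end
```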
12
Flow Control (‘for’ loop)
for: Repeat statements a specific number of times.
The general form of a for statement is
    for variable=expression
        …
        …
    end
Example 1:
    for x = 0:0.05:1
        fprintf('%3.2f\n',x);
    end
Example 2:
    a = zeros(3,4);
    for i = 1:3
        for j = 1:4
            a(i,j) = 1/(i+j);
        end
    end
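Nested loops such as Example 2 can often be replaced by array operations. The following sketch builds the same matrix with ndgrid and checks that both versions agree; a_vec, I, and J are names introduced here.

```matlab
% Loop version, as in Example 2.
a = zeros(3,4);
for i = 1:3
    for j = 1:4
        a(i,j) = 1/(i+j);
    end
end

% Vectorized version: I holds the row index and J the column index
% of every entry, so 1./(I+J) fills the whole matrix at once.
[I, J] = ndgrid(1:3, 1:4);
a_vec  = 1 ./ (I + J);

max_diff = max(abs(a(:) - a_vec(:)));
```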
13
Flow Control (‘while’ loop)
while: Repeat statements an indefinite number of times.
The general form of a while statement is
    while expression
        …
        …
    end
Example 1:
    n = 1;
    y = zeros(1,10);
    while n <= 10
        y(n) = 2*n/(n+1);
        n = n+1;
    end
Example 2:
    x = 1;
    while x
        % execute statements; the loop ends once x becomes 0 (false)
    end
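A while loop is the natural choice when the number of iterations is not known in advance. As a sketch, the code below searches for the first n at which the sequence 2n/(n+1) from Example 1 exceeds a threshold; the threshold 1.85 is an arbitrary value chosen for illustration.

```matlab
threshold = 1.85;
n = 1;
while 2*n/(n+1) <= threshold
    n = n + 1;     % keep going until the term exceeds the threshold
end
first_n = n;
```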
14
Flow Control (‘break’ statement)
break terminates the execution of for and while loops.
In nested loops, break exits from the innermost loop only.
Example:
    y = 3;
    for x = 1:10
        fprintf('%d\n',x);
        if (x>y)
            break;
        end
    end
    % Question: what is the output?
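The statement about nested loops can be verified with a counter. In this sketch, which uses made-up loop bounds, break leaves the inner loop but the outer loop keeps running.

```matlab
count = 0;
for i = 1:3
    for j = 1:5
        if j > 2
            break;           % leaves the j-loop only
        end
        count = count + 1;   % reached for j = 1 and j = 2
    end
end
```

The outer loop still runs three times, so count ends at 3 * 2 = 6.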
15
Graphics: 2-D plot
Basic commands:
    plot(x, 's')
    plot(x,y, 's')
    plot(x1, y1, 's1', x2,y2, 's2', …)
    title('…')
    xlabel('…')
    ylabel('…')
    legend('…', '…')
Example 1 [plot(vector)]:
    x=0:pi/10:2*pi;
    x=[sin(x)' cos(x)'];
    figure;
    plot(x)
16
Graphics: 2-D plot (cont’d)
Example 2:
    x = 0:0.01:2*pi;
    y = sin(x);
    z = cos(x);
    hold on;
    plot(x,y, 'b');
    plot(x,z, 'g');
    hold off;
Example 3 [plot(vector,matrix)]:
    t=(0:pi/50:2*pi)';
    k=0.4:0.1:1;
    Y=cos(t)*k;
    plot(t,Y)
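The basic commands (title, xlabel, ylabel, legend) combine with plot as in the following sketch. The label texts are placeholders, and the figure is created invisibly only so that the code also runs without a display; use 'Visible', 'on' or a plain figure call to see the result.

```matlab
x = 0:0.01:2*pi;

fig = figure('Visible', 'off');               % off-screen figure for this sketch
h = plot(x, sin(x), 'b', x, cos(x), 'g--');   % two curves in one call

title('Sine and cosine');
xlabel('x');
ylabel('value');
legend('sin(x)', 'cos(x)');

n_curves = numel(h);                          % one handle per curve
close(fig);
```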
17
Graphics: 2-D plot (cont’d)
• plot(x1, y1, 's1', x2, y2, 's2', …)
t=(0:pi/100:pi)';
y1=sin(t)*[1,-1];
y2=sin(t).*sin(9*t);
t3=pi*(0:9)/9;
y3=sin(t3).*sin(9*t3);
plot(t,y1,'r:',t,y2,'b',t3,y3,'bo')
axis([0,pi,-1,1])
• Linetype - : -- -.
• Color b g r c m y k w
• Markertype . + * ^ < > v d h o p s x
plot(t,y1,'.r',t,y2, 'b+',t3,y3,'ob:')
18
Subplots
>> subplot(2,2,1)
>> …
>> subplot(2,2,2)
>> …
>> subplot(2,2,3)
>> …
>> subplot(2,2,4)
>> …
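Filling in the four panels might look like the sketch below. The plotted functions are arbitrary choices, and the figure is kept invisible only so that the code also runs without a display.

```matlab
x = 0:0.01:2*pi;

fig = figure('Visible', 'off');   % off-screen figure for this sketch

% subplot(m,n,p) selects panel p of an m-by-n grid, numbered row by row.
subplot(2,2,1); plot(x, sin(x));   title('sin(x)');
subplot(2,2,2); plot(x, cos(x));   title('cos(x)');
subplot(2,2,3); plot(x, sin(2*x)); title('sin(2x)');
subplot(2,2,4); plot(x, cos(2*x)); title('cos(2x)');

n_axes = numel(findobj(fig, 'Type', 'axes'));
close(fig);
```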
19
Graphics: 3-D plot
• plot3(x,y,z)
t=(0:0.02:2)*pi;
x=sin(t);
y=cos(t);
z=cos(2*t);
plot3(x,y,z,'b-',x,y,z,'bd');
view([-82,58]);
box on;
legend('Chain','Gemstone')
20
Data Analysis and Statistics
21
Basic Data Analysis
Import/Export data:
Use the system import wizard:
    File -> Import Data -> find and open files -> Finish
Use commands as follows:
1. help load & help save
2. help xlsread & help xlswrite
    % Reading from a text file
    fid = fopen('filename.txt','r');
    X = fscanf(fid,'%5d');    % or fread
    fclose(fid);
    % Writing to a text file
    fid = fopen('filename.txt','w');
    count = fwrite(fid,x);    % or fprintf
    fclose(fid);
Scatter plot
Statistics Toolbox:
    help stats
Basic Data Analysis Functions (help datafun)
Function Description
cumprod Cumulative product of elements.
cumsum Cumulative sum of elements.
cumtrapz Cumulative trapezoidal numerical integration.
diff Difference function and approximate derivative.
max Largest component.
mean Average or mean value.
median Median value.
min Smallest component.
prod Product of elements.
sort Sort array elements in ascending or descending order.
sortrows Sort rows in ascending order.
std Standard deviation.
sum Sum of elements.
trapz Trapezoidal numerical integration.
cov Covariance matrix
corrcoef Correlation coefficients
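A short round trip ties the file commands to a few of the functions in the table. This is a sketch under stated assumptions: the data values are invented, and the file is a temporary one created with tempname and deleted at the end.

```matlab
x = [3 1 4 1 5 9 2 6];
fname = [tempname '.txt'];       % temporary file name for this sketch

% Write one integer per line as text.
fid = fopen(fname, 'w');
fprintf(fid, '%d\n', x);
fclose(fid);

% Read the integers back; fscanf returns a column vector.
fid = fopen(fname, 'r');
X = fscanf(fid, '%d');
fclose(fid);
delete(fname);

% A few functions from the table above.
m   = mean(X);
md  = median(X);
cs  = cumsum(X);
srt = sort(X, 'descend');
```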
22
Data Preprocessing
Missing values:
You should remove NaNs from the data before performing statistical computations.
Removing outliers:
You can remove outliers or misplaced data points from a data set in much the same manner as NaNs.
1. Calculate the mean and standard deviation of the data set.
2. Find the points that lie more than 3 standard deviations from the mean (3σ-rule).
3. Remove these points.
Code                             Description
i = find(~isnan(x)); x = x(i)    Find indices of elements in vector that are not NaNs, then keep only the non-NaN elements.
x = x(find(~isnan(x)))           Remove NaNs from vector.
x = x(~isnan(x));                Remove NaNs from vector (faster).
x(isnan(x)) = [];                Remove NaNs from vector.
X(any(isnan(X)'),:) = [];        Remove any rows of matrix X containing NaNs.
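The three outlier steps can be sketched with logical indexing, in the same style as the NaN removal above. The data vector is synthetic (30 values near 10, one NaN, and one obvious outlier at 50), and the names mu, sigma, outliers, and x_clean are introduced only for this example.

```matlab
% Synthetic data: 30 points near 10, one missing value, one outlier.
x = [10 + 0.1*sin(1:30), NaN, 50];

% Remove NaNs first, otherwise mean and std would return NaN.
x = x(~isnan(x));

% Step 1: mean and standard deviation.
mu    = mean(x);
sigma = std(x);

% Step 2: flag points more than 3 standard deviations from the mean.
outliers = abs(x - mu) > 3*sigma;

% Step 3: keep everything else.
x_clean = x(~outliers);
```

Note that a single extreme value inflates the standard deviation, so the 3σ-rule works best when the sample is reasonably large.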
23
Regression and Curve Fitting
The easiest way to find estimated regression coefficients efficiently is to use the MATLAB backslash operator.
Note that we should avoid matrix inversion (from slow to fast…):
    % Fit x*b = y
    xx = x'*x; xy = x'*y;
    tic; bhat1 = (xx)^(-1)*xy; toc;
    tic; bhat2 = inv(xx)*xy; toc;
    tic; bhat3 = xx \ xy; toc;
Other ways use built-in functions: regress() or glmfit()
Multiple linear regression model: y = b0 + b1x1 + b2x2 + …
Example: Suppose you measure a quantity y at several values of time t.
    t=[0 .3 .8 1.1 1.6 2.3]';
    y=[0.5 0.82 1.14 1.25 1.35 1.40]';
    plot(t,y,'o')
    grid on
[Figure: plot of the data points, y against t, for t from 0 to 2.5]
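As a rough check that the different routes give the same coefficients, the sketch below fits a straight line y = b0 + b1*t to the example data in three ways. The straight-line model and the variable names are assumptions made for this illustration.

```matlab
t = [0 .3 .8 1.1 1.6 2.3]';
y = [0.5 0.82 1.14 1.25 1.35 1.40]';

X = [ones(size(t)) t];           % design matrix for y = b0 + b1*t

b_backslash = X \ y;             % least squares directly on X
b_normal    = (X'*X) \ (X'*y);   % normal equations, solved without inv
b_inv       = inv(X'*X) * (X'*y);% explicit inverse, shown for comparison only

max_diff = max(abs([b_backslash - b_normal; b_backslash - b_inv]));
```

On a small, well-conditioned problem like this one the results agree to rounding error; the advice to avoid inv matters more for large or ill-conditioned systems.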
24
Regression Example (cont’d)
Polynomial regression: fit the model y = a0 + a1*t + a2*t^2.
There are six equations in three unknowns, represented by the 6-by-3 matrix
    X = [ones(size(t)) t t.^2]
The solution is found with the backslash operator.
    a = X\y
    a =
        0.5318
        0.9191
       -0.2387
Now evaluate the model at regularly spaced points and overlay the original data in a plot.
    T=(0:0.1:2.5)';
    Y=[ones(size(T)) T T.^2]*a;
    plot(T,Y,'-',t,y,'o')
    grid on
[Figure: fitted quadratic curve overlaid on the data points, for t from 0 to 2.5]
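The same quadratic fit can also be obtained with the built-in polyfit, which returns the coefficients in descending powers of t. The sketch below compares the two; a_from_polyfit is a name introduced here.

```matlab
t = [0 .3 .8 1.1 1.6 2.3]';
y = [0.5 0.82 1.14 1.25 1.35 1.40]';

% Backslash solution: coefficients ordered [constant; t; t^2].
X = [ones(size(t)) t t.^2];
a = X \ y;

% polyfit orders the coefficients [t^2, t, constant], so flip them.
p = polyfit(t, y, 2);
a_from_polyfit = flipud(p(:));

% polyval evaluates the fitted polynomial at new points.
T = (0:0.1:2.5)';
Y = polyval(p, T);
```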
25
Regression Example (cont’d)
Linear-in-the-parameters regression, e.g. an exponential model y = a0 + a1*exp(-t) + a2*t*exp(-t):
    X = [ones(size(t)) exp(-t) t.*exp(-t)];
    a = X\y
    a =
        1.3974
       -0.8988
        0.4097
    T=(0:0.1:2.5)';
    Y=[ones(size(T)) exp(-T) T.*exp(-T)]*a;
    plot(T,Y,'-',t,y,'o')
    grid on
[Figure: fitted exponential model overlaid on the data points, for t from 0 to 2.5]
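One way to compare the quadratic and exponential models is to look at the size of their residuals on the measured points. This is a sketch, not a full model comparison; the names Xq, Xe, res_q, and res_e are introduced for the example.

```matlab
t = [0 .3 .8 1.1 1.6 2.3]';
y = [0.5 0.82 1.14 1.25 1.35 1.40]';

% Design matrices for the two models.
Xq = [ones(size(t)) t t.^2];                 % quadratic
Xe = [ones(size(t)) exp(-t) t.*exp(-t)];     % exponential terms

aq = Xq \ y;
ae = Xe \ y;

% Euclidean norm of the residuals for each fit (smaller is better).
res_q = norm(y - Xq*aq);
res_e = norm(y - Xe*ae);
```

Both models have three parameters, so the smaller residual of the exponential model on this data is a fair like-for-like comparison.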
26
Thank You!
Questions or comments?