Introduction to Matlab & Data Analysis
Tutorials 8 and 9: Cell Arrays
Advanced Text Processing And File Handling
Please change to the directory E:\Matlab (cd E:\Matlab;)
From the course website
(http://www.carine.co.il/htmls/page_1176.aspx?c0=13889&bsp=14333&bssearch=4,0,5,3,41,0)
download t89.zip and unzip it.
Weizmann 2010 ©
Outline
Cell arrays:
    Creating and indexing
    Useful functions for lists of strings
Structures
Advanced string manipulation
    Regular expressions
File handling
    Reading files
    Writing to files
    High-level file handling functions
Final example – P53
Cell Arrays – Lecture Reminder
Cell arrays:
    Used for keeping different types of data in the same array
    For example: A{1} = 2; A{2} = 4:2:44; A{3} = 'hello';
    Extremely useful for handling lists of strings
    Notice the curly brackets
[Figure: a cell array made of three cells holding 2, 4:2:44 and 'hello']
Creating Cell Arrays – Lecture Reminder
A(1) = {3};
A{2} = 3;
A{3} = 'radio blabla';
A{4} = 2:2:66;
B(1:3) = {3, [1, 2], 'abc'};
C = {'george clooney'; ...
     'richard gere'};
% Initializing an empty cell array:
D = cell(4,2);
>> A'
ans =
    [            3]
    [            3]
    'radio blabla'
    [1x33 double]
C =
    'george clooney'
    'richard gere'
D =
    [] []
    [] []
    [] []
    [] []
Indexing Cell Arrays
Define a cell array:
>> A(1) = {3};
>> A{2} = 3;
>> A{3} = 'radio blabla';
>> A{4} = 2:2:66;
(or load A.mat;)
What is the difference between A(1) and A{1}?
Try:
>> x = A(1)
x = [3]
>> class(x)
cell
>> x = A{1}
x = 3
>> class(x)
double
>> x = A(3)
x = 'radio blabla'
>> class(x)
cell
>> x = A{3}
x = radio blabla
>> class(x)
char
[Figure: a cell array made of three cells holding 3, [1,2,7] and 'Str']
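The difference between the two kinds of indexing can be checked directly. The following is a minimal sketch that rebuilds the cell array A from this slide in one statement (an assumption made here so that A.mat is not needed) and inspects what each kind of indexing returns:

```matlab
% Rebuild the cell array from the slide in one statement (A.mat is not needed).
A = {3, 3, 'radio blabla', 2:2:66};

c = A(3);            % parentheses: a 1x1 cell array that still wraps the string
s = A{3};            % curly brackets: the content of the cell, a char array

disp(class(c));      % class of the parentheses result
disp(class(s));      % class of the curly-bracket result
disp(numel(A{4}));   % number of elements stored in the fourth cell
```

Parentheses are the right choice when a sub-list of cells is needed; curly brackets are the right choice when the stored value itself is needed.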
Manipulating Cell Arrays
Just like numerical arrays…
Examples:
x([1,3,5]) = {'aaa','bbb','ccc'}
x = repmat(x,2,3)
x(:,4)
x(1:2,3:5)
% Notice:
% Using curly brackets with several indices returns the contents of several cells
[a, b] = x{1:2}
The default value of an unassigned element is zero in a numerical array and [] in a cell array
Cell Arrays Are Very Useful For Keeping Lists of Strings
Cell arrays of strings can be treated similarly to numerical arrays.
Many functions work on both numerical and cell arrays
Many functions that work on strings can handle cell arrays
load fruit.mat;
% fruit = {'mango','banana','melon','apple','kiwi','orange'};
% fruit_prices = [30 15 10 5 35 8];
What is the price of melon?
ind = find(strcmp(fruit,'melon'));
fruit_prices(ind)
ans = 10
Sort the fruits from cheapest to most expensive:
[sorted_p,y] = sort(fruit_prices);
fruit(y)
ans = {'apple','orange','melon','banana','mango','kiwi'}
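The lookup and the sort can be run without fruit.mat. This is a small sketch that defines the two lists inline, using the values given in the slide's comments, and uses logical indexing in place of find (a stylistic choice made here, not the slide's):

```matlab
% Lists as given in the slide's comments (fruit.mat is not needed).
fruit        = {'mango','banana','melon','apple','kiwi','orange'};
fruit_prices = [30 15 10 5 35 8];

% strcmp on a cell array returns a logical mask, which can index directly.
melon_price = fruit_prices(strcmp(fruit, 'melon'));

% The second output of sort is the permutation; apply it to the names.
[sorted_p, order] = sort(fruit_prices);
sorted_fruit = fruit(order);

disp(melon_price);
disp(sorted_fruit);
```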
Manipulating Cell Arrays That Hold Lists Of Strings
unique
intersect
setdiff
union
Manipulating Cell Arrays That Hold Lists Of Strings - Example
% fruit = {'mango','banana','melon','apple','qiwi','orange'};
% fruit_sales = {'mango','banana','melon', ...
%                'mango','mango','qiwi','banana','mango'};
Which fruits were not sold today?
setdiff(fruit, unique(fruit_sales))   % unique is used for efficiency
ans = {'apple','orange'}
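The four set functions listed earlier can be tried on the same two lists. A minimal sketch, with the lists typed in from the slide's comments (the variable names sold, not_sold, sold_known and all_names are introduced here for illustration):

```matlab
fruit       = {'mango','banana','melon','apple','qiwi','orange'};
fruit_sales = {'mango','banana','melon', ...
               'mango','mango','qiwi','banana','mango'};

sold       = unique(fruit_sales);            % each sold fruit once, sorted
not_sold   = setdiff(fruit, sold);           % in fruit but not in sold
sold_known = intersect(fruit, fruit_sales);  % in both lists
all_names  = union(fruit, fruit_sales);      % in either list

disp(not_sold);
```

All four functions return their result sorted and without repetitions, so the original order of fruit is not preserved.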
ismember Function Is Useful For Mapping One List To Another
Finds if an element exists in a list:
>> b = {'z','y','x','w'};
>> a = ismember('x',b)
a = 1
If it does, ismember can tell you where it is:
>> [a,map] = ismember('x',b)
a = 1, map = 3
ismember is good for mapping one list to another, when order is important!
>> [a,map] = ismember({'x','y','c'},b);
a = [1 1 0], map = [3 2 0]
Comparing Two Lists of Strings: ismember, find and intersect
Which function to use?
    I want to find the order of elements of one list in another list: ismember
    I want to find which elements of a list are also in another list: intersect
    I want to find all the occurrences of an element in a list: find
When the element appears in the list more than once, ismember returns only one position (the last one in the MATLAB version used for this tutorial; releases from R2013a on return the first)
Using ismember - Example
>> a = ismember('banana', fruit_sales)
a = 1
>> a = ismember('orange', fruit_sales)
a = 0
>> a = ismember(fruit, fruit_sales);
a = [1 1 1 0 1 0]
% Reminder: fruit_prices = 30 15 10 5 35 8
Example: calculate the amount of money made by each fruit sale
>> [a,b] = ismember(fruit_sales, fruit);
a = [1, 1, 1, 1, 1, 1, 1, 1]
b = [1, 2, 3, 1, 1, 5, 2, 1]
>> sales_money = fruit_kilos .* fruit_prices(b)
sales_money = [90, 30, 10, 60, 240, 17.5, 45, 150]
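The mapping step can be reproduced as a standalone script. The slide never shows fruit_kilos, so the values below are an assumption, back-computed from the sales_money output and the prices; everything else is taken from the slides:

```matlab
fruit        = {'mango','banana','melon','apple','qiwi','orange'};
fruit_prices = [30 15 10 5 35 8];
fruit_sales  = {'mango','banana','melon', ...
                'mango','mango','qiwi','banana','mango'};

% ASSUMPTION: kilos per sale, inferred from the slide's sales_money output.
fruit_kilos  = [3 2 1 2 8 0.5 3 5];

% b(i) is the position in fruit of the i-th sold fruit.
[a, b] = ismember(fruit_sales, fruit);

% Price per sale is looked up through the map, then scaled by the weight.
sales_money = fruit_kilos .* fruit_prices(b);
disp(sales_money);
```

Because fruit holds each name once, the position returned for every sale is unambiguous regardless of MATLAB version.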
Structures
Lecture Reminder - Structures Creation
>> dogs.name = 'rufus';
>> dogs.breed = 'Bulldog';
>> dogs.age = 1.5; % in years
>> dogs.special_food = 'none';
>> dogs
dogs =
            name: 'rufus'
           breed: 'Bulldog'
             age: 1.5000
    special_food: 'none'
Lecture Reminder - Structures creation
Adding more dogs…
>> dogs(2).name = 'king-kong';
>> dogs(2).breed = 'Chihuahua';
>> dogs(2).age = 5;
>> dogs(2).special_food = 'filet mignon';
>> dogs(3).name = 'wong';
>> dogs(3).breed = 'pekingese';
>> dogs(3).age = 20;
>> dogs(3).special_food = 'sushi';
>> dogs
dogs =
1x3 struct array with fields:
    name
    breed
    age
    special_food
Structures – Short Example
Define a "fruits" structure array that has the fields name, price and color, and contains two fruits of your choice
Get:
    Cell array of the names
    Array of the prices
    The first fruit
>> fruits(1).name = 'Lemon';
>> fruits(1).color = 'Yellow';
>> fruits(1).price = 20;
>> fruits(2).name = 'Apple';
>> fruits(2).color = 'Green';
>> fruits(2).price = 10;
>> {fruits.name}
'Lemon' 'Apple'
>> [fruits.price]
20 10
>> a = fruits(1)
a =
     name: 'Lemon'
    color: 'Yellow'
    price: 20
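The same structure array can also be built in one call. The sketch below uses struct with cell-array field values, which is an alternative to the field-by-field assignment on the slide, and then collects the fields the same way; the cheapest-fruit lookup is an extra step added here for illustration:

```matlab
% struct() with 1x2 cell arrays as values creates a 1x2 structure array.
fruits = struct('name',  {'Lemon', 'Apple'}, ...
                'color', {'Yellow', 'Green'}, ...
                'price', {20, 10});

names  = {fruits.name};    % cell array of the names
prices = [fruits.price];   % numeric array of the prices
first  = fruits(1);        % a single 1x1 structure

% Extra step (not on the slide): use the price array to pick an element.
[~, cheapest] = min(prices);
disp(fruits(cheapest).name);
```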
Structure Advertisement
Although this tutorial focuses on cells:
Using Structures to aggregate variables that belong to the same entity makes the program easier to design, more readable and easier to debug.
Advanced Text processing (String Manipulation)
1. Review of useful functions:
   1. findstr, strfind, strtok, strtrim
   2. sprintf
2. Regular expressions
Review of Useful Functions For String Manipulation
So far we learned simple string manipulations:
    str2num, num2str
    strcmp, strncmp, strcmpi, strncmpi
More advanced string manipulation functions (used in text processing):
    findstr, strfind
    strtok
    strtrim
    sprintf (related functions: fprintf, sscanf)
Finding One String Inside Another - findstr and strfind
findstr(str1,str2): searches the longer of the two input strings for any occurrences of the shorter string (input order does not matter!):
>> k = findstr('beauty is in the eyes of the beholder','be')
k = [1, 30]
strfind(str1,str2): the order matters; it finds str2 inside str1.
str1 can be a cell array of strings!!!
Example – Parsing a Line Using strtok
Consider the line 'this is an example'.
How do we write a program that breaks it into a cell array of single words?
rem = 'this is an example';
words = cell(0);
while 1
    [tok,rem] = strtok(rem);
    if isempty(tok)
        break;
    end
    words{end+1} = tok;
end
words'
ans =
    'this'
    'is'
    'an'
    'example'
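The strtok loop can be run as a standalone script. This sketch follows the slide's logic but stores the unparsed remainder in a variable called remain (a name chosen here) so that the built-in function rem is not shadowed:

```matlab
remain = 'this is an example';   % the part of the line not yet parsed
words  = {};                     % grows by one cell per token

while true
    % strtok returns the first whitespace-delimited token and the rest.
    [tok, remain] = strtok(remain);
    if isempty(tok)
        break;                   % nothing left to parse
    end
    words{end+1} = tok;          %#ok<AGROW> small list, growth is acceptable
end

disp(words);
```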
sprintf – Write Formatted Data Into Strings
sprintf(format,…): write formatted data into strings
Good for creating messages for disp
Related functions: fprintf, sscanf
format special characters:
    %s – a string
    %d – an integer
    %f – a floating-point number (fixed-point notation)
load fruit.mat;
for i=1:length(fruit)
    s = sprintf('Fruit number %d: %s', i, fruit{i});   % %d takes the number, %s takes the string
    disp(s);
end
Fruit number 1: mango
Fruit number 2: banana
Fruit number 3: melon
Fruit number 4: apple
Fruit number 5: qiwi
Fruit number 6: orange
sprintf - Example
Consider the cell array
names = {'Danny', 'Noa', 'Moti'};
Write a script that prints:
Number:1, Name:Danny.
Number:2, Name:Noa.
Number:3, Name:Moti.
Answer:
for i=1:length(names)
    s = sprintf('Number:%d, Name:%s.', ...
        i, names{i});
    disp(s);
end
See also: sscanf & fprintf
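Since sprintf returns the string instead of printing it, the result can be stored and inspected. The sketch below keeps the messages in a cell array and adds one %f example; the variable names msgs and price_msg and the value 17.5 are illustrative choices made here:

```matlab
names = {'Danny', 'Noa', 'Moti'};

% Collect the formatted messages instead of displaying them immediately.
msgs = cell(size(names));
for i = 1:length(names)
    msgs{i} = sprintf('Number:%d, Name:%s.', i, names{i});
end

% %.2f prints a floating-point value with two digits after the point.
price_msg = sprintf('Price: %.2f', 17.5);

disp(msgs{2});
disp(price_msg);
```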
More Useful String Manipulation Functions
strtrim(str): removes all leading and trailing white-space
>> strtrim('  do not blink  ')
'do not blink'
strtok(str,delim): breaks a string into "tokens"
>> [tok,rem] = strtok('this is an example', ' ')
tok = 'this'
rem = ' is an example'
strfind(str1,str2): searches for str2 in str1. str1 can be a cell array of strings!
>> k = strfind('beauty is in the eyes of the beholder','be')
k = [1, 30]
findstr(str1,str2): searches the longer of the two input strings for any occurrences of the shorter string
More useful functions at:
Help -> Matlab -> Functions by category -> Strings functions
Regular expressions
Regular Expression - Definition
Wikipedia: Regular expressions provide a concise and flexible means for identifying strings of text of interest, such as particular characters, words, or patterns of characters.
ind = regexp(long_str,'\w+ain')   % '\w+ain' is the regular expression
We need to learn the syntax of the regular expressions "language"
Regular Expressions Syntax
Defining a pattern:
[] is like OR
    Any character out of a, b, c or d: [abcd]
    Anything other than a, b, c or d: [^abcd]
    Character range (all characters a to z): [a-z]
Special characters used in defining a pattern:
    Any character: .
    Whitespace: \s
    Newline: \n
    Tab: \t
    Any alphanumeric character or underscore: \w (same as [a-zA-Z_0-9])
    Any digit: \d (same as [0-9])
Regular Expressions Syntax
Pattern definition - expression quantifiers:
    One or more: exp+ (Example: '[\w]+')
    Zero or more: exp*
    Between n and m times: exp{n,m}
Examples:
    '\w\s+\w' – two alphanumeric characters with one or more whitespace characters between them
    '[SRM]amy' – Ramy, Samy or Mamy
Function: loc = regexp(str, pattern)
Read more about "regular expressions" in the MATLAB help! (search "regular expressions")
Using Regular Expressions to Search For Pattern occurrences In a Long String
Example:
prof_higgins = 'The rain in Spain falls mainly on the plain.';
We would like to find all the words that rhyme with 'ain'
1. Defining the pattern: one or more alphanumeric characters, then 'ain', then the end of the word (whitespace or a period):
pattern = '\w+ain[\s\.]'
OR
pattern = '[a-zA-Z]+ain[\s\.]'
Using Regular Expressions to Search For Pattern Occurrences In a Long String
>> prof_higgins = ...
   'The rain in Spain falls mainly on the plain.';
Find occurrences indices:
>> loc = regexp(prof_higgins,'\w+ain')
loc = [5 13 25 39]
Get pattern occurrences:
>> words = regexp(prof_higgins,'\w+ain','match')
words = {'rain','Spain','main','plain'}
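Both calls can be run as a short script. The last line is an addition made here, not part of the slide: it appends the end-of-word anchor \> so that the partial match 'main' (from 'mainly') is excluded.

```matlab
prof_higgins = 'The rain in Spain falls mainly on the plain.';

% Default output: the start index of every match.
loc = regexp(prof_higgins, '\w+ain');

% 'match' option: the matched substrings themselves.
words = regexp(prof_higgins, '\w+ain', 'match');

% Addition for illustration: \> requires the match to end at a word end.
whole_words = regexp(prof_higgins, '\w+ain\>', 'match');

disp(loc);
disp(words);
disp(whole_words);
```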
Using Regular Expressions to Replace Pattern Occurrences In a Long String
Replace all pattern occurrences:
>> eliza_doolittle = regexprep(prof_higgins,'ain','yne')
eliza_doolittle = 'The ryne in Spyne falls mynely on the plyne.'
Split a line into words (good for parsing lines of an input file):
>> words = regexp(prof_higgins, '\s', 'split');
words = {'The','rain','in','Spain','falls','mainly','on','the','plain.'}
Using Regular Expression to Parse a line (see strtok for another option)
>> no_rhymes = regexp(prof_higgins, 'ain\w*\s', 'split')
no_rhymes =
    {'The r' 'in Sp' 'falls m' 'on the plain.'}
Error: the last word is followed by a period rather than a space, so it is not matched
Fixing it:
>> no_rhymes = regexp(prof_higgins, '\w+ain[\s\.]', 'split')
no_rhymes =
    {'The ' 'in ' 'falls mainly on the ' ''}
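The replace and split calls from these slides can be checked together in one script. A minimal sketch using the same sentence and the same patterns as the slides:

```matlab
prof_higgins = 'The rain in Spain falls mainly on the plain.';

% regexprep replaces every occurrence of the pattern.
eliza_doolittle = regexprep(prof_higgins, 'ain', 'yne');

% 'split' returns the text between the matches; here the matches are spaces.
words = regexp(prof_higgins, '\s', 'split');

% Splitting on whole rhyming words leaves only the text around them.
no_rhymes = regexp(prof_higgins, '\w+ain[\s\.]', 'split');

disp(eliza_doolittle);
disp(numel(words));
disp(no_rhymes);
```

Note that 'split' always returns one more piece than there are matches, which is why the last element of no_rhymes is an empty string.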
Running Example – Finding Bomb Threats
You are a CIA agent in charge of identifying potential bombing threats against cities by going over the emails of terrorists.
Using Regular Expression to Identify Significant Lines
Assume an email is stored as a cell array of strings (each line in a cell), called "email"
Using a regular expression:
    Identify lines that contain the expression "bomb".
    When you find such a line, print: "HELP!!!"
load email.mat;
for i=1:length(email)
    line = email{i};
    if ( ~isempty(regexp(line,'bomb')) )
        disp('HELP!!!');
    end
end
Using Regular Expression to Identify Significant Lines
Notice there is a "bug" in the code:
load email.mat;
for i=1:length(email)
    line = email{i};
    if (~isempty(regexp(line,'bomb')))
        disp(['HELP!!!:' line]);
    end
end
HELP!!!:thinking of bombing rehovot
HELP!!!:thinking of bombing sderot
HELP!!!:thinking of going to the bombamella festival next week
How do we fix the bug?
Hint: | is "or" inside a group: 'smil(e|ed|ing)'
Using Regular Expression to Identify Significant Lines
Here is a fix for the bug:
load email.mat;
for i=1:length(email)
    line = email{i};
    % | is "or": an optional ending ed, ing or s, followed by whitespace
    if (~isempty(regexp(line,'[Bb]omb(ed|ing|s)?\s')))
        disp(['HELP!!!:' line]);
    end
end
HELP!!!:thinking of bombing rehovot
HELP!!!:thinking of bombing sderot
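The "or" operator needs parentheses to form a group; inside square brackets, | is just another character of the set. The sketch below contrasts the two on a few sample lines. The first two lines come from the slides' output; the third line and the test string 'bombdigs away' are made up here to expose the difference:

```matlab
% Sample lines; the third is made up for this sketch.
email = {'thinking of bombing rehovot', ...
         'thinking of going to the bombamella festival next week', ...
         'they bombed it'};

% Group with alternation: an optional ending ed, ing or s, then whitespace.
pattern = '[Bb]omb(ed|ing|s)?\s';
is_threat = ~cellfun('isempty', regexp(email, pattern, 'once'));

% A character class treats e, d, |, i, n, g, s as single allowed characters,
% so it also accepts nonsense endings such as 'digs'.
loose  = ~isempty(regexp('bombdigs away', '[Bb]omb[ed|ing|s]*\s', 'once'));
strict = ~isempty(regexp('bombdigs away', pattern, 'once'));

disp(is_threat);
disp([loose strict]);
```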
Regular Expression Tokens Are Used to Retrieve Specific Part of the Pattern Occurrences
tokens = regexp( ...
    'bla bla ami@weizmann.ac.il bli bli tami@tau.ac.il ya', ...
    '(\w+)@(\w+)\.ac\.il', 'tokens')
% the first (\w+) is token 1, the second (\w+) is token 2
tokens =
    { {'ami', 'weizmann'} {'tami', 'tau'} }
% one inner cell per occurrence; each inner cell holds token 1 and token 2
tokens{1}{1} = 'ami'
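The nesting of the tokens output (occurrence first, then token) is easiest to see by indexing into it. In this sketch the two addresses in the input string are reconstructed from the token output shown on the slide, since the extracted text of the slide hides them:

```matlab
% Input string; the addresses are reconstructed from the slide's output.
str = 'bla bla ami@weizmann.ac.il bli bli tami@tau.ac.il ya';

% Each pair of parentheses in the pattern defines one token.
tokens = regexp(str, '(\w+)@(\w+)\.ac\.il', 'tokens');

% tokens{k} is the k-th occurrence; tokens{k}{t} is token t within it.
for k = 1:numel(tokens)
    fprintf('user: %s, institute: %s\n', tokens{k}{1}, tokens{k}{2});
end
```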
Using Tokens to Retrieve Specific Parts of the Pattern Occurrences
Now that you have identified the suspicious email, extract the threatened city.
Hint: Use regexp(line, <some expression>, 'tokens').
for i=1:length(email)
    line = email{i};
    if (~isempty(regexp(line,'[Bb]omb(ed|ing|s)?\s')))
        % (?:...) groups without creating a token, so the city is token 1
        city = regexp(line, ...
            '[Bb]omb(?:ed|ing|s)?\s+(\w+)', ...
            'tokens');
        disp(['HELP!!! Bomb threat on:' city{1}{1}]);
    end
end
HELP!!! Bomb threat on:rehovot
HELP!!! Bomb threat on:sderot
Using Tokens to Retrieve Specific Parts of the Pattern Occurrences
Here is a loop-less version (regexp can handle a cell array):
load email.mat;
cities = regexp(email, '[Bb]omb(?:ed|ing|s)?\s+(\w+)', 'tokens')
is_threat = ~cellfun('isempty',cities);
cities = cities(is_threat);
cities = [cities{:}];   % one cell per occurrence
cities = [cities{:}];   % one string per city
warnings = strcat('HELP!!! Bomb threat on:', cities)
disp(strvcat(warnings))
HELP!!! Bomb threat on:rehovot
HELP!!! Bomb threat on:sderot
Handling Files
Lecture Reminder – Opening and Closing Files
Opening a file for reading:
fid = fopen('filename','r');
Opening a file for writing:
fid = fopen('filename','w');
fid is a scalar MATLAB integer, called a file identifier.
You use the fid as the first argument to other file input/output routines
Always close your file!!!
fclose(fid);
Permissions: 'a' – append, 'r+' – read and write. More in the HELP…
Lecture Reminder – Reading a File Line by Line
Reading line by line:
line = fgetl(fid);
How can we read the entire file?
fid = fopen('names.txt');        % open
while feof(fid)==0               % feof: has the file reached its end?
    tline = fgetl(fid);          % fgetl: "file get line"
    if ~ischar(tline)            % break if not char
        break;
    end
    tline = strtrim(tline);
    %<do whatever you want>
end
fclose(fid);                     % close
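The reading loop can be tried without names.txt. This sketch first writes a small temporary file (its name and contents are made up here), then reads it back line by line into a cell array, following the loop structure of the slide:

```matlab
% Create a small temporary input file (contents are made up for this sketch).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'Danny\n  Noa  \nMoti\n');
fclose(fid);

% Read it back, one line per cell.
fid = fopen(fname, 'r');
if fid < 0
    error('Could not open %s', fname);
end
lines = {};
while ~feof(fid)
    tline = fgetl(fid);
    if ~ischar(tline)            % fgetl returns -1 when no line is left
        break;
    end
    lines{end+1} = strtrim(tline);   %#ok<AGROW>
end
fclose(fid);
delete(fname);                   % remove the temporary file

disp(lines);
```

Checking the fid returned by fopen before reading gives a clear error message when the file is missing.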
Lecture Reminder – Writing to a File
Open the file with writing permission
Write, line by line, using:
fprintf(fid,format,…); % similar to sprintf!!!
Format is a string with special characters:
    %s – a string, %d – an integer, %f – a floating-point number
Close the file
Example:
fid = fopen('tmp.txt', 'w');
for i=1:length(lines)
    fprintf(fid,'this is a line: %s\n',lines{i});
end
fclose(fid);
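To confirm what the writing loop produces, the file can be read back in one call. A minimal sketch: the two strings in lines are sample data chosen here, a temporary file name replaces tmp.txt, and fileread is used only to check the result:

```matlab
lines = {'first', 'second'};     % sample data for this sketch
fname = [tempname '.txt'];       % temporary file instead of tmp.txt

fid = fopen(fname, 'w');
if fid < 0
    error('Could not open %s for writing', fname);
end
for i = 1:length(lines)
    fprintf(fid, 'this is a line: %s\n', lines{i});
end
fclose(fid);

% Read the whole file back as one string to check what was written.
written = fileread(fname);
delete(fname);

disp(written);
```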
File handling - Example
Open the file names.txt for read
Display it with line numbers:
Line number 1: <line1>
Line number 2: <line2>
…
Close the file
fid = fopen('names.txt', 'r');
l_cnt = 0;
while feof(fid)==0
    line = fgetl(fid);
    if ~ischar(line)
        break;
    end
    l_cnt = l_cnt + 1;
    disp(['Line number ' num2str(l_cnt) ': ' line]);
end
fclose(fid);
File Handling - Example
Congratulations! You were just promoted to a senior spy.
You have a directory full of email text files.
Now you need to read all the email files, identify the bomb threats, and write them into a summary file, threat_report.txt.
File Handling - Example
Solution strategy:
1. Open the output threats file
2. Go over all the emails in a given directory:
   1. Open an input email file
   2. Read it, line by line
   3. Identify threats. When a threat is identified, print the line
   4. Close the input email file
3. Close the output threats file
File Handling – Example: Programs Design
searchEmailsDirForThreats:
    Open report output file
    Open a directory and get all the file names
    For each file run searchEmailForThreats
searchEmailForThreats (inputs: 1. email file name, 2. report output fid):
    Open email input file
    Search line by line for a threat
    If a threat is found, write the threat to the output file
File Handling – Example: Main Function Design
function threats_found = searchEmailsDirForThreats(in_emails_dir, out_report_fname)
% <getting all files names>
% <opening report output file>
% <going over the files>
% <closing report output file>
File Handling – Example: Main Function Design
function threats_found = searchEmailsDirForThreats(in_emails_dir, out_report_fname)
% <getting all files names>
if (~isdir(in_emails_dir))
    error([in_emails_dir ' is not a directory']);
end
% getting file names
fs = dir(in_emails_dir);
file_names = {fs.name};
Directory management:
dir, pwd, cd, copyfile, delete, movefile, mkdir, rmdir, …
File Handling – Example: Main Function Design
function threats_found = searchEmailsDirForThreats(in_emails_dir, out_report_fname)
% <getting all files names>
% <opening report output file>
out_report_fid = fopen(out_report_fname, 'w');
if out_report_fid < 0
    error(['File ' ,out_report_fname ,' could not be opened']);
end
threats_found = 0;
![Page 51: 1 Introduction to Matlab & Data Analysis Tutorials 8 and 9: Cell Arrays Advanced Text Processing And File Handling Please change directory to directory](https://reader030.vdocuments.net/reader030/viewer/2022032313/56649e695503460f94b6610b/html5/thumbnails/51.jpg)
54
File Handling – Example:Main Function Design
function threats_found = searchEmailsDirForThreats(in_emails_dir, out_report_fname)
% <going over the files>
for i=1:length(file_names)
    % build the full path first, so that isdir tests the entry inside in_emails_dir
    email_fname = [in_emails_dir '/' file_names{i}];
    if (~isdir(email_fname))
        threats_found = threats_found + ...
            searchEmailForThreats(out_report_fid, email_fname);
    end
end
% <closing report output file>
fclose(out_report_fid);
File Handling – Example: Looking for Threats In an Email
function threats_found = searchEmailForThreats(out_report_fid,email_fname)
% <opening email input file>
% <going over the file line by line>
while feof(in_fid) == 0
    % <read line>
    if % <is found threat>
        % <get the threatened city>
        % <adding to the report>
    end
end
% <closing input file>
File Handling – Example: Opening File For Read
function threats_found = searchEmailForThreats(out_report_fid,email_fname)
% <opening email input file>
in_fid = fopen(email_fname, 'rt');
if in_fid < 0
    error(['File ' , email_fname ,' was not found.']);
end
threats_found = 0;
l_cnt = 0;
File Handling – Example: Reading a File Line by Line
function threats_found = searchEmailForThreats(out_report_fid, email_fname)
    % <opening email input file>
    % <going over the file line by line>
    while feof(in_fid) == 0
        % <read line>
        line = fgetl(in_fid);
        if ~ischar(line)
            break;
        end
        l_cnt = l_cnt + 1;
        line = strtrim(line);
        if % <is found threat>
            …
        end
    end
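The read loop above can be tried on its own. The sketch below first writes a small temporary file (its contents are made up for illustration) and then reads it back with the same feof / fgetl / strtrim pattern.

```matlab
% Create a small temporary file; the two lines are illustrative only.
fname = [tempname() '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, '  first line  \nsecond line\n');
fclose(fid);

% Read it back line by line, as in the slide.
in_fid = fopen(fname, 'rt');
l_cnt = 0;
lines = {};
while feof(in_fid) == 0
    line = fgetl(in_fid);
    if ~ischar(line)       % fgetl returns -1 at end of file
        break;
    end
    l_cnt = l_cnt + 1;
    lines{end+1} = strtrim(line);   % remove leading/trailing whitespace
end
fclose(in_fid);
delete(fname);
```

The ~ischar(line) test guards against fgetl returning -1 if the loop is entered once more at the end of the file.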
File Handling – Example: Using Regular Expressions to Find and Retrieve Pattern Occurrences
    while feof(in_fid) == 0
        % <read line>
        % <is found threat>
        if (~isempty(regexp(line, '.*bomb.*')))
            city = regexp(line, '.*bomb\w*\s([\w-]+).*', 'tokens');
            % <adding to the report>
            fprintf(out_report_fid, 'File: %s, Line number:%d, Threat on %s - %s\n', ...
                    email_fname, l_cnt, city{1}{1}, line);
            threats_found = threats_found + 1;
        end
    end
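To see what the 'tokens' option returns, the two patterns from this slide can be run on a few lines held in a cell array instead of a file. The first two lines are taken from the email example used later in this tutorial; the third is made up.

```matlab
lines = {'thinking of bombing rehovot', ...
         'thinking of going to the bombamella festival next week', ...
         'nothing to see here'};

cities = {};
for k = 1:length(lines)
    line = lines{k};
    if ~isempty(regexp(line, '.*bomb.*'))
        % The token is the word captured by the parentheses:
        % the first word after the word containing "bomb".
        city = regexp(line, '.*bomb\w*\s([\w-]+).*', 'tokens');
        cities{end+1} = city{1}{1};
    end
end
% cities is {'rehovot', 'festival'}
```

Note that the second line is also reported, because the pattern accepts any word that contains "bomb". A stricter pattern would be needed to exclude it.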
File Handling – Example: Looking for Threats in an Email
function threats_found = searchEmailForThreats(out_report_fid, email_fname)
    % <opening email input file>
    % <going over the file line by line>
    while feof(in_fid) == 0
        % <read line>
        if % <is found threat>
            % <get the threatened city>
            % <adding to the report>
        end
    end
    % <closing input file>
    fclose(in_fid);
High-Level File Handling Functions
Matlab Has a Collection of High-Level Read / Write Functions
These functions can save you from reading or writing a file line by line.
Examples: dlmread, dlmwrite, textread, textscan, xlsread, importdata
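As a small illustration of two of the functions listed above, the sketch below writes a numeric matrix to a tab-delimited file with dlmwrite and reads it back with dlmread. The matrix and the temporary file name are made up. Newer MATLAB releases recommend writematrix and readmatrix for this task, but the functions named on the slide are used here.

```matlab
M = [1 2 3; 4 5 6];            % illustrative data
fname = [tempname() '.txt'];   % temporary file name

dlmwrite(fname, M, '\t');      % one call writes the whole matrix
M_read = dlmread(fname, '\t'); % one call reads it back

delete(fname);
```

No fopen, loop, or fclose is needed in either direction.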
High-level File Reading Function Example: textread
Reading an entire text file with one line of code:
    lines = textread(filename, format, parameters)
Example: when reading a file containing a single word in every line:
    names = textread('names.txt', '%s');
If there are more words in a line, each word will be read separately.
Example 1:
    email = textread('email.txt', '%s');
What happens?
    email = {'thinking' 'of' 'bombing' 'rehovot' 'thinking' …}
High-level File Reading Function Example: textread
Example 2: Reading a text file, line by line
Try:
    email = textread('email.txt', '%s', 'delimiter', '\n');
What happens?
    email = {'thinking of bombing rehovot'
             'thinking of bombing sderot'
             'thinking of going to the bombamella festival next week'}
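textscan, also listed among the high-level functions, can read whole lines in a similar way. The sketch below is an assumption about how one might do it, not part of the original slides: it writes a two-line temporary file (the contents are copied from the example above) and reads it with a newline delimiter.

```matlab
% Write a small temporary file to read back.
fname = [tempname() '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, 'thinking of bombing rehovot\n');
fprintf(fid, 'thinking of bombing sderot\n');
fclose(fid);

% textscan works on a file identifier, so the file is opened first.
fid = fopen(fname, 'rt');
c = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);
delete(fname);

% textscan returns one cell per format specifier; c{1} holds the lines.
email = c{1};
```

Unlike textread, textscan needs an explicit fopen and fclose, and its result is wrapped in an outer cell.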
MATLAB functions for High-level file reading
Reading an entire Excel file with one line of code:
    [nums, t] = xlsread(filename, options…)
Will create a numerical array nums and a cell array t.
Try:
    [n, t] = xlsread('rt_example3.xls')
What happens?
Textual cells are set to NaN in n
Numerical cells are set to '' (empty strings) in t
Note: xlsread can also read a specific sheet (see help xlsread)
MATLAB functions for High-level file reading
Reading an entire Excel file, tab-delimited text file, or other preformatted file:
    A = importdata(filename, options…)
Will create a structure A, which contains:
    A.data - a numerical array
    A.textdata - a cell array
Try:
    A = importdata('rt_example3.xls')
What happens?
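If the Excel file is not at hand, the same idea can be tried on a tab-delimited text file. The sketch below is an illustration under assumptions: the file contents and column names are made up, and the delimiter and the number of header lines are passed explicitly.

```matlab
% Write a small tab-delimited file with one header line (made-up data).
fname = [tempname() '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, 'trial\trt\n');
fprintf(fid, '1\t350\n');
fprintf(fid, '2\t420\n');
fclose(fid);

% importdata(filename, delimiter, number_of_header_lines)
A = importdata(fname, '\t', 1);
delete(fname);

disp(A.data);       % the numeric part of the file
disp(A.textdata);   % the header text
```

The numeric block ends up in A.data and the header text in A.textdata, mirroring the split described above.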
Summary – File Handling
Matlab has diverse and powerful functions for text processing
Before you start coding with low-level I/O functions, check whether one of the high-level functions already solves the problem.
Final example: Looking for p53 TFBS
(Transcription Factor Binding Sites) in human promoters
Looking for p53 TFBS in human promoters
A TF can recognize a variable site
Some positions are fixed
Some positions allow alternatives, e.g. A/T are acceptable, but not G/C
Consensus sequence: the pattern representing all possible recognized sites
Looking for p53 TFBS in human promoters
Let's define a consensus for a p53 half-site:
1. Pos #1: G/A/T
2. Pos #2: G/A
3. Pos #3: A/G/C
4. Pos #4: C
5. Pos #5: A/T
6. Pos #6: A/T
7. Pos #7: G
8. Pos #8: N (any base)
9. Pos #9: T/C/G
10. Pos #10: T/C
A full site: Half-site, variable space (0-13), Half-site
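The position list above translates directly into a regular expression: a fixed base is written as is, a set of alternatives becomes a character class in square brackets, N becomes a dot, and the variable space becomes .{0,13}. A minimal sketch of that translation follows; the variable names half_site_positions and half_site are made up for this example.

```matlab
% Allowed bases per position, copied from the list above ('N' = any base).
half_site_positions = {'GAT', 'GA', 'AGC', 'C', 'AT', 'AT', 'G', 'N', 'TCG', 'TC'};

half_site = '';
for k = 1:length(half_site_positions)
    p = half_site_positions{k};
    if strcmp(p, 'N')
        half_site = [half_site '.'];          % any character
    elseif length(p) == 1
        half_site = [half_site p];            % fixed base
    else
        half_site = [half_site '[' p ']'];    % one of several bases
    end
end

% Two half-sites separated by 0 to 13 arbitrary characters.
p53_consensus = [half_site '.{0,13}' half_site];
disp(p53_consensus);
```

Building the pattern from a table like this makes it easier to check against the definition than typing the full expression by hand.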
Looking for p53 TFBS in human promoters
How do we even start???
1. Read the promoter file into a cell array.
2. Go through the promoters:
    Look for the p53 consensus (need to define it as a regular expression)
    When we find it, store the data on the hit
3. Open a result file
4. Go through all the hits you found
    Print them into the results file
Looking for p53 TFBS in human promoters
1. Reading the promoter file:
The file name: masked_promoters.some.txt
The file format: FASTA
    >gene1 header line
    Sequence…
    Sequence…
    >gene2 header line
    Sequence…
    Sequence…
For example:
    >GENE=ENSG00000001036 Transcript=1 LLid=2519 orgDBsym=FUCA2 other details…
    CCATGTTCTAAACGACTTCATAGATTTATTTCTTTCAGTCAT…
Looking for p53 TFBS in human promoters
1. Reading the promoter file:
    promoters = {};
    ensID = {};
    symb = {};
    fid = fopen('masked_promoters.all.txt');
    while feof(fid) == 0
        tline = fgetl(fid);
        % <process the data>
    end
    fclose(fid);
Looking for p53 TFBS in human promoters
1. Reading the promoter file:
    while feof(fid) == 0
        % <from previous slide…>
        if (tline(1) == '>') % it is a header
            tmp = regexp(tline, ...
                '.*GENE=(\w+)\s.*orgDBsym=(\w+)', ...
                'tokens');
            ensID{end+1} = tmp{1}{1};
            symb{end+1} = tmp{1}{2};
        else % it is a promoter
            promoters{end+1} = tline;
        end
    end
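The header pattern can be checked on a single line before running it on the whole file. The sketch below uses the example header shown earlier (without its trailing details) and extracts the two tokens.

```matlab
% Example header line, taken from the FASTA example above.
tline = '>GENE=ENSG00000001036 Transcript=1 LLid=2519 orgDBsym=FUCA2';

% Two pairs of parentheses give two tokens: the Ensembl ID and the gene symbol.
tmp = regexp(tline, '.*GENE=(\w+)\s.*orgDBsym=(\w+)', 'tokens');

ens = tmp{1}{1};   % 'ENSG00000001036'
sym = tmp{1}{2};   % 'FUCA2'
```

tmp{1} is the first (and here only) match, and tmp{1}{k} is the k-th token inside that match.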
Looking for p53 TFBS in human promoters
2. Go through the promoters:
    hit_seq = {};
    hit_gene = [];
    hit_pos = [];
    p53_consensus = ['[GAT][GA][AGC]C[AT][AT]G.[TCG][TC].{0,13}' ...
                     '[GAT][GA][AGC]C[AT][AT]G.[TCG][TC]'];
    for i = 1:length(promoters)
        [m, s, e] = regexp(promoters{i}, p53_consensus, 'match', ...
                           'start', 'end');
        % let's ignore that DNA is double stranded…
        if (~isempty(m))
            hit_seq(end+1:end+length(m)) = m;
            hit_gene(end+1:end+length(m)) = repmat(i, 1, length(m));
            hit_pos(end+1:end+length(m)) = s;
        end
    end
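The search loop can be tested on a tiny made-up input before running it on the real promoter file. In the sketch below both sequences are invented for illustration: the first contains two copies of GGACATGCCC separated by AA, the second contains no site.

```matlab
% Two invented promoter sequences (not real data).
promoters = {'TTTTGGACATGCCCAAGGACATGCCCTTTT', ...
             'ACGTACGTACGTACGTACGTACGT'};

p53_consensus = ['[GAT][GA][AGC]C[AT][AT]G.[TCG][TC].{0,13}' ...
                 '[GAT][GA][AGC]C[AT][AT]G.[TCG][TC]'];

hit_seq = {};
hit_gene = [];
hit_pos = [];
for i = 1:length(promoters)
    % m holds the matched strings, s their start positions.
    [m, s] = regexp(promoters{i}, p53_consensus, 'match', 'start');
    if ~isempty(m)
        hit_seq(end+1:end+length(m)) = m;
        hit_gene(end+1:end+length(m)) = repmat(i, 1, length(m));
        hit_pos(end+1:end+length(m)) = s;
    end
end
% One hit: promoter 1, position 5, 'GGACATGCCCAAGGACATGCCC'
```

hit_gene stores the index of the promoter, so it can later be used to look up ensID and symb for each hit.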
Looking for p53 TFBS in human promoters
3&4. Open a result file, print all the hits
    fid = fopen('p53_TFBS.txt', 'w');
    % printing a header line
    fprintf(fid, 'gene ID\tgene name\tsite\tpos\n');
    for i = 1:length(hit_gene)
        fprintf(fid, '%s\t%s\t%s\t%d\n', ...
            ensID{hit_gene(i)}, ...
            symb{hit_gene(i)}, ...
            hit_seq{i}, ...
            hit_pos(i));
    end
    fclose(fid);
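A quick way to check the output format is to write a single hit to a temporary file and read the lines back. In the sketch below the hit values are hypothetical (they are not results from the real promoter file), and a temporary file name is used instead of p53_TFBS.txt.

```matlab
% Hypothetical data for one hit (illustration only, not a real result).
ensID = {'ENSG00000001036'};
symb = {'FUCA2'};
hit_gene = 1;
hit_seq = {'GGACATGCCCAAGGACATGCCC'};
hit_pos = 5;

out_fname = [tempname() '.txt'];
fid = fopen(out_fname, 'w');
fprintf(fid, 'gene ID\tgene name\tsite\tpos\n');   % header line
for i = 1:length(hit_gene)
    fprintf(fid, '%s\t%s\t%s\t%d\n', ...
        ensID{hit_gene(i)}, symb{hit_gene(i)}, hit_seq{i}, hit_pos(i));
end
fclose(fid);

% Read the two lines back to inspect them.
fid = fopen(out_fname, 'r');
header = fgetl(fid);
row = fgetl(fid);
fclose(fid);
delete(out_fname);
```

Each row is tab-delimited, so the result file can be opened directly in Excel or read back with one of the high-level functions.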