cis52 – file manipulation


Upload: trula

Post on 12-Jan-2016

Page 1: CIS52 – File Manipulation

1© 2001 John Urrutia. All rights reserved.

CIS52 – File Manipulation

File Manipulation Utilities Regular Expressions

sed, awk

Page 2: CIS52 – File Manipulation

Overview

comm – comparison of sorted files
cut – output sections of lines in a file
find – find files that match a pattern
paste – merges records in files
pr – paginate files into pages
tr – translate or delete characters

Page 3: CIS52 – File Manipulation

Overview

regular expressions
sed – Stream Editor (batch file editor)
awk – Aho, Weinberger, Kernighan (pattern match)

Page 4: CIS52 – File Manipulation

The comm before the storm

Compares 2 sorted files
Results reported in 3 columns
  1st – records found only in file 1
  2nd – records found only in file 2
  3rd – records that match in both files
Options remove the corresponding columns – [-1] [-2] [-3]

Page 5: CIS52 – File Manipulation

comm – cont.

Either file name can be replaced with - to read standard input

Example:
File1   File2
aa      bb
dd      cc
ee      dd
gg      ee
hh      ff

Page 6: CIS52 – File Manipulation

comm results

File1   File2   Both
aa
        bb
        cc
                dd
                ee
        ff
gg
hh

option -1 (File2, Both)
bb
cc
        dd
        ee
ff

option -2 (File1, Both)
aa
        dd
        ee
gg
hh

option -12 (Both)
dd
ee
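The results above can be reproduced with a short script. This is a minimal sketch: the scratch directory and the variable names are assumptions made for illustration, and the data is File1 and File2 from the example slide.

```shell
#!/bin/sh
# Recreate File1 and File2 from the slide in a scratch directory (hypothetical location).
dir=$(mktemp -d)
printf '%s\n' aa dd ee gg hh > "$dir/file1"
printf '%s\n' bb cc dd ee ff > "$dir/file2"

# Full report: column 2 is indented by one tab, column 3 by two tabs.
comm "$dir/file1" "$dir/file2"

# -12 removes columns 1 and 2, leaving the records found in both files.
both=$(comm -12 "$dir/file1" "$dir/file2")
# -23 removes columns 2 and 3, leaving the records found only in file1.
only_file1=$(comm -23 "$dir/file1" "$dir/file2")

printf 'both:\n%s\n' "$both"
printf 'only in file1:\n%s\n' "$only_file1"
rm -rf "$dir"
```

Both inputs must already be sorted, which is why the sample data is written in alphabetical order.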

Page 7: CIS52 – File Manipulation

cut to the chase

Allows you to extract portions of each record in a file.
Delimits data in the file into fields or columns.
  Default delimiter is the tab character
  Can be changed by the -d option

Page 8: CIS52 – File Manipulation

cut cont.

cut {-b list | -c list | -f list [-d char] [-s]} [--output-delimiter=string] [file-list]

  b – bytes
  c – characters (same as bytes)
  f – fields
  d – delimiter character
  s – display only records that contain delimiters

Page 9: CIS52 – File Manipulation

cut ! print

char – single byte used to delimit fields in a record
list – list of ranges of characters to display
  Ranges are comma separated.
  1-7 first 7 characters in record
  1,7 first and seventh characters
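The two list forms can be tried on a single record. A minimal sketch, assuming a made-up ten-character record; the variable names are illustrative only.

```shell
#!/bin/sh
record='abcdefghij'   # hypothetical sample record

# A range: characters 1 through 7.
first_seven=$(printf '%s\n' "$record" | cut -c1-7)
# A comma-separated list: character 1 and character 7 only.
first_and_seventh=$(printf '%s\n' "$record" | cut -c1,7)
# Ranges and single positions can be mixed; an open range such as 9- runs to the end of the record.
mixed=$(printf '%s\n' "$record" | cut -c1-3,9-)

printf '%s\n' "$first_seven" "$first_and_seventh" "$mixed"
```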

Page 10: CIS52 – File Manipulation

cut ! print again

string – characters written in place of the delimiters in the output.

Page 11: CIS52 – File Manipulation

cut - Example

[/@linux2 uid]$ cat file1
The quick brown fox eyed the jactitating dog
[/@linux2 uid]$ cut -f1,3,5,8 -d' ' file1
The brown eyed dog
[/@linux2 uid]$ cut -f1,4-6,8 -d' ' file1
The fox eyed the dog

Page 12: CIS52 – File Manipulation

find that pot of gold

find – selects all files that meet the selection criteria in the expression
  Older versions take no action unless one is specified; POSIX and GNU find assume -print
  Sub-directories are scanned automatically
  The expression can be simple or complex

Page 13: CIS52 – File Manipulation

find me something

The criteria expression:
  ANDs operands separated by a space
  ORs operands separated by -o
  Is processed left to right sequentially

Page 14: CIS52 – File Manipulation

find criteria continued

Actions
  -print  prints the path of all files that meet the selection criteria
  -exec cmds \;  executes the commands that come before the \;
  -ok  same as -exec, but each command must be confirmed with a y from stdin
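The AND, OR and action rules can be seen together in one script. This is a sketch under stated assumptions: the directory layout and file names are invented for the example, and the escaped parentheses (standard find syntax, not shown on the slides) are used so that -type f applies to both names.

```shell
#!/bin/sh
# Build a small hypothetical tree to search.
dir=$(mktemp -d)
mkdir "$dir/sub"
printf 'one\n'   > "$dir/a.txt"
printf 'two\n'   > "$dir/sub/b.log"
printf 'three\n' > "$dir/sub/c.md"

# Operands side by side are ANDed; -o ORs them; \( \) groups the OR.
matches=$(cd "$dir" && find . -type f \( -name '*.txt' -o -name '*.log' \) -print | sort)

# -exec runs a command per selected file: {} stands for the path, \; ends the command.
contents=$(cd "$dir" && find . -type f -name '*.log' -exec cat {} \;)

printf '%s\n' "$matches" "$contents"
rm -rf "$dir"
```

The sub-directory is scanned without being asked for, which is why b.log is found.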

Page 15: CIS52 – File Manipulation

find criteria continued again

Evaluations
  -type  specify a type of file (e.g. d for directory)
  -atime ±n  accessed more than (+n) or less than (-n) n days ago
  -mtime ±n  modified more than (+n) or less than (-n) n days ago
  -user uid  owner of the file
  -nouser  the uid that owns the file is not known to the system

Page 16: CIS52 – File Manipulation

paste tastes good

paste [options] [filelist]

Corresponding records from each file are merged into 1 record
  -s  process filelist sequentially. All records of one file are merged before going to the next file
  -d [delimiter list]  each character in turn delimits the merged records

Page 17: CIS52 – File Manipulation

paste continued

[/@linux2 uid]$ cat file1
A
B
C
[/@linux2 uid]$ cat file2
1
2
3
[/@linux2 uid]$ cat file3
x
y
z

Page 18: CIS52 – File Manipulation

paste continued

[/@linux2 uid]$ paste file1 file2 file3

Output file
A 1 x
B 2 y
C 3 z

[/@linux2 uid]$ paste -s file1 file2 file3

Output file
A B C
1 2 3
x y z
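The -d option is not shown in the output above, so here is a small sketch of it using the same three files. The scratch directory, the delimiter choices and the variable names are assumptions made for illustration.

```shell
#!/bin/sh
# Recreate file1, file2 and file3 from the slides in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' A B C > "$dir/file1"
printf '%s\n' 1 2 3 > "$dir/file2"
printf '%s\n' x y z > "$dir/file3"

# The delimiter list is used in turn: ':' joins file1 to file2, ',' joins file2 to file3.
merged=$(paste -d ':,' "$dir/file1" "$dir/file2" "$dir/file3")

# With -s each file becomes one output record; here ',' replaces the default tab.
serial=$(paste -s -d ',' "$dir/file1" "$dir/file2")

printf '%s\n' "$merged" "$serial"
rm -rf "$dir"
```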

Page 19: CIS52 – File Manipulation

pr – public relations -- NOT

pr  paginate file(s) for printing
Can specify page attributes
  Lines per page changed through the -l option
For multiple files each starts a new page

Page 20: CIS52 – File Manipulation

pr – continued

pr  paginate a file for printing
Creates a header and trailer
  Changed through the -h option
  Suppressed through the -t option
Can create columns of data
  -nbr  Number of columns per line
  -Sx  Character used to separate columns

Page 21: CIS52 – File Manipulation

pr – continued

Can create numbers for each line
  -nck
    c – character that separates the number from the data (default is the tab character)
    k – number of digits

Page 22: CIS52 – File Manipulation

Regular Expressions

A set of characters that define the criteria used to identify a string within a record.
Used by vi, grep, sed, awk, and others.

Page 23: CIS52 – File Manipulation

tr – Translate this

tr [-c] [-d] [-s] [-t] set1 [set2]

Translate from set1 to set2
  c – complement of set1
  d – delete characters found in set1
  s – squeeze runs of a repeated character down to one
  t – truncate set1 to length of set2
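A few one-liners show the options in action. This is a minimal sketch with invented sample strings; -t is left out because it is not available in every tr implementation.

```shell
#!/bin/sh
# Translate set1 to set2: lowercase letters to uppercase.
upper=$(printf 'hello world\n' | tr 'a-z' 'A-Z')

# -d deletes every character found in set1.
no_vowels=$(printf 'hello world\n' | tr -d 'aeiou')

# -s squeezes each run of the listed character down to a single one.
squeezed=$(printf 'aaa   bbb\n' | tr -s ' ')

# -c complements set1: everything that is NOT a lowercase letter or newline becomes '_'.
masked=$(printf 'ab 12\n' | tr -c 'a-z\n' '_')

printf '%s\n' "$upper" "$no_vowels" "$squeezed" "$masked"
```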

Page 24: CIS52 – File Manipulation

Regular Expressions

Simple strings
  Bound by / … /
  Interpreted literally
  e.g. /e D/ – matches exactly e D
    Taste Dee – OK
    Taste don't – not OK

Page 25: CIS52 – File Manipulation

Regular Expressions

The . (period) special single substitution character
  Matches any single character
  e.g. /.eny/ matches Aeny Beny Ceny
The [ char-range ] defines a character class
The [^ char-range ] defines the not-in-character class
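These three forms can be tried with grep, which takes the expression without the surrounding slashes. A minimal sketch: the word list (including the bare word eny) is an assumption added to show a non-match.

```shell
#!/bin/sh
words=$(printf '%s\n' Aeny Beny Ceny eny)   # hypothetical test words

# . must match exactly one character, so the bare "eny" is not selected.
any_char=$(printf '%s\n' "$words" | grep '.eny')

# A character class: A or B in front of eny.
in_class=$(printf '%s\n' "$words" | grep '[AB]eny')

# A negated class: any one character except A or B in front of eny.
not_in_class=$(printf '%s\n' "$words" | grep '[^AB]eny')

printf '%s\n' "$any_char" "$in_class" "$not_in_class"
```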

Page 26: CIS52 – File Manipulation

Regular Expressions

The * (asterisk)
  Matches 0 or more of the preceding character.

What's this?
  /.*/
  /[a-zA-Z]*/
  /([^)]*)/

Page 27: CIS52 – File Manipulation

Regular Expressions

The /^ (caret, for the rabbit) character
  In the beginning … matches at the start of a line
The $/ (dollar sign, for the teacher) character
  At the end … matches at the end of a line
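The two anchors are easy to compare with grep. A minimal sketch, assuming three invented sample lines.

```shell
#!/bin/sh
lines=$(printf '%s\n' 'the end' 'end of the line' 'weekend')   # hypothetical sample lines

# ^ anchors the match to the beginning of the line.
at_start=$(printf '%s\n' "$lines" | grep '^end')

# $ anchors the match to the end of the line.
at_end=$(printf '%s\n' "$lines" | grep 'end$')

# Without an anchor the string may appear anywhere; -c counts the matching lines.
anywhere=$(printf '%s\n' "$lines" | grep -c 'end')

printf '%s\n' "$at_start" "$at_end" "$anywhere"
```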

Page 28: CIS52 – File Manipulation

Regular Expressions

Quote the raven – backslash
  \.  This yields .
  \\  This yields \
  \*  This yields *
  \[  This yields [
  \]  This yields ]
  \/  This yields /

Page 29: CIS52 – File Manipulation

sed – the old Stream EDitor

sed [-n] [-f script-file] [file-list]

Copies and edits to standard output
Edits file(s) in a non-interactive mode
Gets its instructions from a script
  -f filename – the file that contains the sed instructions
  No -f option – the 1st command argument is used as the script
  -n – suppress stdout unless printing is specified

Page 30: CIS52 – File Manipulation

sed – the old mill stream

Record processing
  1. Read record from file list
  2. Read instruction from script (or cmd line)
  3. Apply selection criteria
  4. If selected perform instruction, and repeat steps 2 to 4 until no more script
  5. Repeat steps 1 to 5 until no more file list.

Page 31: CIS52 – File Manipulation

He sed what!!??

Instruction format
  [addr1 [,addr2]] inst [arg-list]
Address
  A line number
  Regular expression
  Addr1 – start
  Addr2 – stop

Page 32: CIS52 – File Manipulation

Address line numbers

$ designates the last line of the last file
1st address line number
  Starts selecting records based on their position in the input file list, counting from 1.
2nd address line number
  Stops selecting records when the position in the input file list is greater than the line number.
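Line-number addresses can be checked on a five-line input. A minimal sketch: the input text is invented, and -n with p is used so that only the selected records are printed.

```shell
#!/bin/sh
input=$(printf '%s\n' one two three four five)   # hypothetical five-record input

# addr1,addr2 as line numbers: start at record 2, stop after record 4.
middle=$(printf '%s\n' "$input" | sed -n '2,4p')

# $ addresses the last record of the input.
last=$(printf '%s\n' "$input" | sed -n '$p')

# An address can also be a regular expression; here the range runs from the match to the end.
from_match=$(printf '%s\n' "$input" | sed -n '/three/,$p')

printf '%s\n' "$middle" "$last" "$from_match"
```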

Page 33: CIS52 – File Manipulation

He sed some more

Instructions
  ! – Not: negates the address selection; it is written after the address
    sed '/line/!p' file.list
  {…} – Groups the instructions for the address selection
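Negation and grouping can be sketched on a three-line input. The sample text and variable names are assumptions; -n is added so that only explicitly printed records appear.

```shell
#!/bin/sh
input=$(printf '%s\n' 'keep line 1' 'drop this' 'keep line 2')   # hypothetical input

# ! after the address selects the records that do NOT match it.
negated=$(printf '%s\n' "$input" | sed -n '/line/!p')

# { } applies several instructions to the same address: substitute, then print.
grouped=$(printf '%s\n' "$input" | sed -n '/line/{
s/keep/KEEP/
p
}')

printf '%s\n' "$negated" "$grouped"
```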

Page 34: CIS52 – File Manipulation

sed Instructions

p – Print now and continue
d – Delete and get the next record
q – Quit processing; Stop; Go Away
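The difference between d and q shows up on a short input. A minimal sketch with invented data.

```shell
#!/bin/sh
input=$(printf '%s\n' one two three)   # hypothetical input

# d deletes record 2; every other record is still copied to standard output.
deleted=$(printf '%s\n' "$input" | sed '2d')

# q prints record 2 and then stops, so record 3 is never read.
quit_early=$(printf '%s\n' "$input" | sed '2q')

printf '%s\n' "$deleted" "$quit_early"
```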

Page 35: CIS52 – File Manipulation

sed Instructions

c – Change
  [addr1 [,addr2]] c\
  yada yada yada
  all selected records are replaced as a group by the change value
a – Append
  [addr1] a\
  …
  add the text after each selected record

Page 36: CIS52 – File Manipulation

sed Instructions

i – Insert
  [addr1] i\
  …
  add the text before each selected record
n – Next
  [addr1] n
  writes the current record, gets the next and continues the script
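Insert and append are easiest to follow side by side. This sketch uses the traditional form in which the text starts on the line after the backslash; the input and the marker text are invented for the example.

```shell
#!/bin/sh
input=$(printf '%s\n' first second third)   # hypothetical input

# 2i\ puts its text before record 2; 2a\ puts its text after record 2.
result=$(printf '%s\n' "$input" | sed '2i\
--- before second
2a\
--- after second')

printf '%s\n' "$result"
```

The script is quoted across several lines because each text argument ends at the first newline that is not preceded by a backslash.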

Page 37: CIS52 – File Manipulation

sed Instructions

w – Write
  [addr1 [,addr2]] w filename
  writes the selected records to a file
r – Read
  [addr1] r filename
  reads records from the filename and appends them after the selected record

Page 38: CIS52 – File Manipulation

sed Instructions

s – Substitute
  [addr1 [,addr2]] s/ptrn/repl/[g][p][w file]
  for each selected record, match the pattern and replace it
  g – Replace all non-overlapping occurrences
  p – Print the record
  w – Write the record to the filename
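The g and p flags change what a substitution produces. A minimal sketch with made-up records.

```shell
#!/bin/sh
# Without g only the first occurrence in each record is replaced.
first_only=$(printf 'a-b-c\n' | sed 's/-/+/')

# With g every non-overlapping occurrence is replaced.
all=$(printf 'a-b-c\n' | sed 's/-/+/g')

# With -n and the p flag only the records that were actually changed are printed.
changed=$(printf '%s\n' 'a-b' 'xyz' | sed -n 's/-/+/p')

printf '%s\n' "$first_only" "$all" "$changed"
```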

Page 39: CIS52 – File Manipulation

Hawk – Squawk – awk

The programmable utility that does everything.
Aho – Weinberger – Kernighan
Provides:
  Conditional execution
  Looping
Handles:
  Numeric & string variables
  Regular expressions
  C print facilities

Page 40: CIS52 – File Manipulation

awk

awk [-Fc] {-f program-file | program} [file-list]
  F – field delimiter character
  f – name of the awk program file
  program – in-stream instructions given on the command line
  file-list – list of files to process

Page 41: CIS52 – File Manipulation

awk – program lines

pattern { action }
  Like sed, the pattern selects records
  Record processing is the same as sed

Page 42: CIS52 – File Manipulation

awk – pattern

Patterns follow regular expression format.
  ~   Tests for match to regular expression
  !~  Tests for NO match to regular expression
  ,   Establishes a pattern range; all records are processed inclusively within the range
  BEGIN  executes before the first record is processed
  END    executes after the last record is processed
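The pattern forms above can be combined in one small program. A minimal sketch: the fruit data and the output labels are assumptions made for illustration.

```shell
#!/bin/sh
data=$(printf '%s\n' 'apple 3' 'banana 5' 'cherry 7' 'date 9')   # hypothetical records

report=$(printf '%s\n' "$data" | awk '
BEGIN             { print "start" }            # runs before the first record
$1 ~ /an/         { print "match:", $1 }       # field 1 matches the expression
/banana/,/cherry/ { print "range:", $1 }       # from the first match through the second
END               { print "records:", NR }     # runs after the last record
')

printf '%s\n' "$report"
```

Each record is tested against every pattern in order, so banana produces both a match line and a range line.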

Page 43: CIS52 – File Manipulation

awk – relational operators

  <   – less than
  <=  – less than or equal to
  ==  – equal to
  !=  – not equal to
  >=  – greater than or equal to
  >   – greater than

Page 44: CIS52 – File Manipulation

awk – operators

Arithmetic
  +  – addition
  -  – subtraction
  *  – multiplication
  /  – division
Assignment
  =   – assigns the value on the right to the variable on the left
  +=  – adds the value on the right to the variable on the left

Page 45: CIS52 – File Manipulation

awk – boolean operators

  &&  – and
  ||  – or
  !   – not
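Relational, arithmetic and boolean operators usually appear together in a pattern. A minimal sketch using invented records; the variable names are illustrative.

```shell
#!/bin/sh
data=$(printf '%s\n' 'apple 3' 'banana 5' 'cherry 7' 'date 9')   # hypothetical records

# && requires both tests to hold; += accumulates field 2 of the selected records.
total=$(printf '%s\n' "$data" | awk '$2 >= 5 && $1 != "date" { total += $2 } END { print total }')

# || selects a record when either test holds.
either=$(printf '%s\n' "$data" | awk '$2 < 5 || $1 == "date" { print $1 }')

# ! negates a test; the END action mixes multiplication and division.
scaled=$(printf '%s\n' "$data" | awk '!($1 ~ /^b/) { n = n + 1 } END { print n * 2 / 3 }')

printf '%s\n' "$total" "$either" "$scaled"
```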

Page 46: CIS52 – File Manipulation

awk – actions

# – Comment to the right on any line
Default action is print to stdout
Multiple actions can be taken
  Use {…} to enclose multiple actions
  Separate actions with ;

Page 47: CIS52 – File Manipulation

awk – actions

print variable …
  Var , Var2 , Var3
    Prints variables separated by the output field separator
  Var Var2 Var3
    NO separators
  "literal value "
    Prints exactly everything between the " "
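The effect of the comma is easiest to see when the output separator is not a space. A minimal sketch with a made-up record; setting OFS to a hyphen is an assumption chosen to make the separator visible.

```shell
#!/bin/sh
# Commas between variables insert the output field separator (OFS).
with_commas=$(printf 'a b c\n' | awk 'BEGIN { OFS = "-" } { print $1, $2, $3 }')

# Variables written side by side are joined with no separator at all.
no_separator=$(printf 'a b c\n' | awk '{ print $1 $2 $3 }')

# A quoted literal is printed exactly as written.
literal=$(printf 'a b c\n' | awk '{ print "field one is " $1 }')

printf '%s\n' "$with_commas" "$no_separator" "$literal"
```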

Page 48: CIS52 – File Manipulation

awk – actions

printf "cntl string", variable …
  Control String
    \n – new line
    \t – tab
    %[-] [n] [.d] conv char
      -   left justification
      n   number of characters
      .d  decimal positions

Page 49: CIS52 – File Manipulation

awk – actions

%[-] [n] [.d] conv char
  -   left justification
  n   number of characters
  .d  decimal positions
  conv char – conversion character
    d – decimal, e – exponent, f – floating-point
    o – octal, x – hexadecimal
    s – string
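One printf call can exercise several parts of the control string. A minimal sketch: the values and the vertical bars are invented so that the field widths are visible.

```shell
#!/bin/sh
# %-6s  left-justified string in 6 characters
# %5d   decimal in 5 characters
# %7.2f floating-point in 7 characters with 2 decimal positions
# %x    hexadecimal
formatted=$(awk 'BEGIN { printf "%-6s|%5d|%7.2f|%x\n", "ab", 42, 3.14159, 255 }')

printf '%s\n' "$formatted"
```

Unlike print, printf adds no newline of its own, so the control string ends with \n.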

Page 50: CIS52 – File Manipulation

awk – variables

awk provided variables
  NF – total number of fields
  $1…$n – each field in the current record
  FS – input field separator (default space or tab)
  OFS – output field separator (default space)

Page 51: CIS52 – File Manipulation

awk – variables

awk provided variables
  NR – current record number
  $0 – entire current record
  RS – record separator (default newline)
  ORS – output record separator (default newline)
  FILENAME – name of current input file
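Several of the provided variables can be seen in a single action. A minimal sketch: the colon-separated records and the separator values are assumptions made for illustration.

```shell
#!/bin/sh
# FS splits each record on ':'; OFS is placed between the items of print.
summary=$(printf '%s\n' 'root:x:0' 'daemon:x:1' |
  awk 'BEGIN { FS = ":"; OFS = " | " } { print NR, NF, $1 }')

# $0 is the whole record, untouched by field splitting.
whole=$(printf '%s\n' 'root:x:0' | awk -F: '{ print $0 }')

printf '%s\n' "$summary" "$whole"
```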

Page 52: CIS52 – File Manipulation

awk - variables

Associative Arrays
  array_name [ string ]
    The array name should be meaningful
    The index of the array is a string
    Elements are automatically created
  for ( element in array ) actions
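A tally is the classic use of an associative array. A minimal sketch with invented input; the output is piped through sort because the order in which for ( element in array ) visits the elements is not defined.

```shell
#!/bin/sh
# count[$1]++ creates the element on first use and increments it afterwards.
counts=$(printf '%s\n' red blue red green blue red | awk '
{ count[$1]++ }
END { for (color in count) print color, count[color] }
' | sort)

printf '%s\n' "$counts"
```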

Page 53: CIS52 – File Manipulation

awk - functions

length(string) – returns the number of characters in string
int(num) – returns the integer portion
index(str1,str2) – returns the position of str2 within str1, or 0 if not present
split(str,arr,del) – populates arr[ ] from fields in str delimited by del; returns the count of elements.

Page 54: CIS52 – File Manipulation

awk - functions

sprintf(fmt , args) – formats args using the fmt and returns the formatted string.
substr(str , pos , len) – returns a substring of str starting with position pos for a length of len.
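The functions from the last two slides can be tried in a BEGIN block, which needs no input file. A minimal sketch: the sample string and variable names are assumptions made for illustration.

```shell
#!/bin/sh
result=$(awk 'BEGIN {
  s = "file manipulation"
  print length(s)                  # number of characters in s
  print int(7.9)                   # integer portion
  print index(s, "man")            # position of "man" within s
  print substr(s, 6, 3)            # 3 characters starting at position 6
  n = split("a:b:c", parts, ":")   # fills parts[1] through parts[n]
  print n, parts[1], parts[3]
  print sprintf("%03d", 7)         # formatted string is returned, not printed
}')

printf '%s\n' "$result"
```

Positions in index and substr count from 1, so the two calls agree with each other.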