Chapter Four: UNIX File Processing

Upload: kathleen-daniel

Post on 26-Dec-2015

Lesson A

Extracting Information from Files

Objectives

• Explain the UNIX approach to file processing

• Use basic file manipulation commands

• Extract characters and fields from a file using the cut command

• Rearrange fields inside a record using the paste command

• Merge files using the sort command

• Create a new file by combining cut, paste, and sort

UNIX Approach to File Processing

• UNIX treats files as nothing more than sequences of characters

• Because you can directly access each character, you can perform a wide range of editing tasks, which makes file manipulation flexible

Understanding UNIX File Types

• Regular files, also known as ordinary files
– Contain information that you create, maintain, and manipulate; include ASCII and binary files
– Represented by a - in the 1st position of the file permissions

• Directories
– System files for maintaining the file system structure
– Represented by a d in the 1st position of the file permissions

• Special files
– Character special files relate to serial I/O devices
• Communicate one character at a time; represented by a c in the 1st position of the file permissions
– Block special files relate to devices such as disks
• Communicate using blocks of data; represented by a b in the 1st position of the file permissions
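The type character can be read straight from a long listing. The following is a minimal sketch, assuming a scratch directory; the names notes.txt and work are made up for illustration.

```shell
demo=$(mktemp -d)
touch "$demo/notes.txt"
mkdir "$demo/work"
# ls -l prints the type character first, then the nine permission characters
ls -l "$demo"
# -d lists the directory entry itself; cut -c1 keeps only the first character
file_type=$(ls -ld "$demo/notes.txt" | cut -c1)
dir_type=$(ls -ld "$demo/work" | cut -c1)
echo "notes.txt: $file_type  work: $dir_type"
```

On most systems, ls -l /dev/null shows a c in the same position, since /dev/null is usually a character special file.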

File Structures

• Files can be structured in many ways depending on the kind of data they store

• UNIX stores data, such as letters and product records, as flat ASCII files

• Three kinds of regular files are
– Unstructured ASCII characters
– Unstructured ASCII records
– Unstructured ASCII trees

Processing Files

• When you run UNIX commands, UNIX receives input from a standard input device (e.g. keyboard) and sends output to a standard output device (e.g. monitor)

• System administrators and programmers refer to standard input as stdin and standard output as stdout

• A third standard device is called standard error, or stderr. When UNIX detects errors, it directs the error messages to stderr, which by default is the monitor

Using Input and Error Redirection

• You can use redirection operators to retrieve input from something other than the standard input device and send output to something other than the standard output device

• Examples of redirection:
– Redirect the ls command output to a file, instead of to the monitor (or screen)
– Redirect a program that receives input from the keyboard to receive input from a file instead
– Redirect error messages to a file, instead of to the screen, which is their default destination
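The three redirection examples can be sketched with the >, <, and 2> operators. This is an illustrative sketch only; the file names listing.txt, errors.txt, and no_such_file are assumptions, not names from the lesson.

```shell
demo=$(mktemp -d)
cd "$demo"
touch file1 file2
# > sends stdout to a file instead of the screen
ls > listing.txt
# < makes a command read stdin from a file instead of the keyboard
line_count=$(wc -l < listing.txt)
# 2> sends stderr to a file; listing a missing name produces an error message
ls no_such_file 2> errors.txt || echo "ls failed; its message is in errors.txt"
cat errors.txt
```

Note that listing.txt appears in its own listing, because the shell creates the output file before ls runs.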

Create a file by typing in all the commands, or by redirecting the cat command output to a file

Creating Files – cat and touch

• When you manipulate files, you work with the files themselves, as well as their contents

• Create files using output redirection
– cat command - concatenates text; with output redirection it creates a file and enters text into the file
• cat >file1
– Each line that you type is entered into the file. To end file entry, press <CTRL>d at the start of a new line (<CTRL>c interrupts the command instead of signaling end of input).
– touch command - used to create empty files or to change the timestamp on a file
• touch file1
– Creates an empty file if file1 does not already exist
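A non-interactive sketch of both commands follows. A here-document stands in for text typed at the keyboard, and the name file2 is an assumption added for the example.

```shell
demo=$(mktemp -d)
cd "$demo"
# The here-document supplies the lines that would otherwise be typed, then ends the input
cat > file1 <<'EOF'
first line
second line
EOF
touch file2    # file2 does not exist yet, so touch creates it empty
touch file1    # file1 exists, so only its timestamp changes
wc -c file1 file2
```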

Deleting Files – rm

• To delete files
– The rm command permanently removes a file (use rmdir to remove an empty directory)
• rm file1 (will remove the specified file from the dir)
• rm f* (will remove all files beginning with an f in the working dir)

The -r option of the rm command will remove a directory and everything it contains as well as any directory beneath. Be very careful with this command. You can remove an entire branch of the directory tree!

In the example directory structure, rm -r work will remove the work directory, file3, file4, the projects directory, and the file spec!
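The commands can be tried safely in a scratch directory. The sketch below assumes a layout in which work holds file3, file4, and a projects directory containing spec; the original diagram is not available, so that nesting is a guess, and fa and fb are made-up names.

```shell
demo=$(mktemp -d)
cd "$demo"
# Assumed layout: work/ holds file3, file4, and projects/, which holds spec
mkdir -p work/projects
touch file1 fa fb work/file3 work/file4 work/projects/spec
rm file1
rm f*         # removes fa and fb, the remaining names that start with f
rm -r work    # removes work and everything beneath it
remaining=$(ls -A | wc -l)
echo "entries left: $remaining"
```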

Copying files - cp

• Copy files as a means of back-up or as a means to assist with new file creation
– cp command - copies the file(s) specified by the source path to the location specified by the destination path
• cp file1 file2 (creates a copy of file1 named file2)
• cp file1 newdir/file2 (creates a copy of file1, named file2, in the directory newdir)
• cp file1 file2 file3 newdir (copies all three files to the directory newdir)
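The three forms of cp can be sketched as follows, using a scratch directory and throwaway file contents chosen for the example.

```shell
demo=$(mktemp -d)
cd "$demo"
echo "sample" > file1
echo "other" > file3
mkdir newdir
cp file1 file2                 # copy within the current directory
cp file1 newdir/file2          # copy into newdir under the name file2
cp file1 file2 file3 newdir    # last argument is a directory: copy all three into it
ls newdir
```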

Moving Files – mv

• The mv command moves a file from one directory to another directory.
– mv file1 work
• This command will remove file1 from the jdoe directory and move it to the work directory (work must already exist as a directory; otherwise file1 is simply renamed to work).

• The mv command can also be used to rename a file within the current directory without moving it.
– mv file1 myfile
• This command will simply rename file1 to myfile. It will remain in the jdoe directory.
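Whether mv moves or renames depends on whether the last argument is an existing directory. A small sketch, with file2 added as an assumed second file so both cases can run:

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir work
touch file1 file2
mv file1 work      # work is an existing directory, so file1 moves into it
mv file2 myfile    # myfile is not a directory, so file2 is renamed in place
ls . work
```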

Finding files - find

• The find command helps you locate a file in the directory structure by name, size, date last modified, etc.
– The first parameter specifies the directory from which the search will begin. You may search by filename using the -name parameter, by last access time using the -atime parameter, by group name using the -group parameter, or by last modification time using the -mtime parameter. See 'man find' for a full list of parameters.

• To search for file1 from your current directory:
– find . -name file1 (. indicates the current directory)

• To search for file1 from the / directory:
– find / -name file1

• To search for all files beginning with an 'f' from your current dir:
– find . -name "f*" (Quote a name that contains a wildcard so that the shell passes it to find unexpanded)

• To search for all files from / belonging to the group Acct:
– find / -group Acct

• To search for all files modified within the last 5 days:
– find . -mtime -5
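A few of these searches can be tried against a small scratch tree. The directory layout is made up for the example, and -type f (restrict matches to regular files) is an extra find option not covered on the slide.

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir -p work/projects
touch file1 work/file3 work/projects/notes
by_name=$(find . -name file1)
# The quotes stop the shell from expanding f* before find sees it
f_count=$(find . -name "f*" | wc -l)
# -mtime -5 matches files modified less than 5 days ago
recent=$(find . -type f -mtime -5 | wc -l)
echo "$by_name, $f_count names starting with f, $recent recent files"
```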

Manipulating Files – Combining Multiple Files

• Combining files using output redirection
– cat command - concatenates the text of two different files via output redirection
• cat product1 product2 >combinedprods
– combinedprods will consist of all of the records in product1 followed by all of the records in product2
– paste command - joins the text of different files in side by side fashion
• paste product1 product2 >sidebyside
– sidebyside will consist of the records in product1 and the records in product2 in 2 columns.

• Extracting fields of a file using output redirection
– cut command - extracts specific columns or fields from a file (the original file is not changed)
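The three commands can be compared on two tiny files. The colon-delimited record layout and the output names names and codes are assumptions made for this sketch.

```shell
demo=$(mktemp -d)
cd "$demo"
# Hypothetical records: code:name
printf '101:hammer\n102:saw\n' > product1
printf '201:nails\n202:glue\n' > product2
cat product1 product2 > combinedprods    # product1 records, then product2 records
paste product1 product2 > sidebyside     # matching lines joined by a tab
# -d sets the field delimiter and -f selects fields; -c selects character positions
cut -d: -f2 product1 > names
cut -c1-3 product1 > codes
cat sidebyside names codes
```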

Manipulating Files - sort

• Re-arranging the contents of a file
– sort command - sorts a file's contents alphabetically or numerically
– The sort command offers many options:
• You can sort the contents of a file and redirect the output to another file
• You can use a sort key, which sorts on a field position within each line
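Both options can be sketched on a small colon-delimited file. The products file and its code:name:price layout are assumptions for illustration.

```shell
demo=$(mktemp -d)
cd "$demo"
# Hypothetical records: code:name:price
printf '102:saw:9\n101:hammer:12\n103:drill:100\n' > products
sort products > by_code
# -t sets the field separator; -k2,2 sorts on field 2 only
sort -t: -k2,2 products > by_name
# the n modifier compares field 3 as a number, so 9 < 12 < 100
sort -t: -k3,3n products > by_price
cat by_code by_name by_price
```

Without the n modifier, field 3 would be compared as text and 100 would sort before 12 and 9.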

Lesson B

Assembling Extracted Information

Objectives

• Create a script file

• Use the join command to link files using a common field

• Use the awk command to create a professional-looking report

Using Script Files

• UNIX users create shell script files to contain commands that can be run sequentially as a set, which makes it possible to automate and re-use command sequences

• UNIX users use a text editor such as vi to create script files, then make the script executable by granting the x (execute) permission with the chmod command

Type out the script and then make it executable using the chmod command.

Scripts

• Scripts can be used to give a short name to a complex command or to combine multiple UNIX commands into a single command.

• Place your scripts in the bin directory beneath your home directory. You may have to create this directory if you do not have one already. (mkdir bin)

• You also need to check to make sure that UNIX will find your script.
– To do this, type in
• echo $PATH
• View the directories that are listed.
• Do you see youruserid/bin? For example, my username is marty. I look for /home/fac/marty/bin.
– If you see it, you're fine; skip the next step. Your bin directory will be searched any time you issue a command.
– If you DON'T see it, type in
» PATH=$PATH:$HOME/bin
» (Use the full path. A relative entry such as bin is found only while your current directory is your home directory.)

• Now we're ready to create our first script….

Scripts cont.

• Let's write a script called home to change our current directory to the home directory.
– vi bin/home
• # This script will take you to your home directory from any location
• cd
• <Escape>:wq
– The # symbol at the beginning of a line makes the line a comment
– cd with no argument changes to your home directory (cd .. would only move up one level)
– Now, we need to make the script executable.
• chmod u+x bin/home …. Or….. chmod 700 bin/home
• We can now run the script by simply typing in the script name.
– home
• A script run by name executes in a child shell, so its cd does not change the directory of the shell you typed it in. To change the directory of your current shell, run the script with the dot command:
– . home
– We will discuss script files in much more detail in Chapters 6 and 7.
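The whole sequence (create the script, make it executable, extend PATH, run it) can be sketched without an editor. This sketch uses a scratch directory in place of the real home directory, adds a #!/bin/sh first line that the slide does not show, and records where the shell ends up after each way of running the script.

```shell
demo=$(mktemp -d)
mkdir "$demo/bin"
# The here-document replaces the vi editing session
cat > "$demo/bin/home" <<'EOF'
#!/bin/sh
# This script will take you to your home directory from any location
cd
EOF
chmod u+x "$demo/bin/home"
PATH=$PATH:$demo/bin
cd "$demo"
start=$(pwd)
home                  # runs in a child shell; this shell stays where it was
after_run=$(pwd)
. home                # the dot command runs the lines in this shell, so cd takes effect
after_dot=$(pwd)
echo "after home: $after_run   after . home: $after_dot"
```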

Page 28: Chapter Four UNIX File Processing. 2 Lesson A Extracting Information from Files

28

Using the Join Command

• The join command is used in relational database processing

• Relational databases consider files as tables and records as rows

• Relational databases also consider fields as columns that can be joined to create new records

• The UNIX join command lets you extract information from files sharing a common field. You can use this command to associate lines in two files on the basis of a common field that they both share.
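A minimal sketch of joining two files on a shared first field follows. The vendors and products files and their contents are hypothetical; join expects both inputs to be sorted on the join field.

```shell
demo=$(mktemp -d)
cd "$demo"
# Hypothetical data; both files are already sorted on field 1
printf '101:Acme\n102:Globex\n' > vendors
printf '101:widget\n101:gadget\n102:sprocket\n' > products
# -t: sets the field separator; by default join matches on the first field of each file
join -t: vendors products > vendor_report
cat vendor_report
```

Each output line holds the common field, then the remaining fields from the first file, then those from the second, so a vendor with two products appears twice.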

Using the Join Command to Create the Vendor Report

Use the join command to create reports showing the relationship between two files

A Brief Introduction to the Awk Program

• Awk, a pattern-scanning and processing language, helps to produce professional-looking reports

• The awk command lets you do the same things as the cat command (in conjunction with the join command), but more quickly and easily

Awk uses printf, a print formatting function borrowed from the C programming language, to achieve a more professional-looking report
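As a small illustration of printf-style formatting in awk, the sketch below lines up a name column and a price column. The products file and its code:name:price layout are assumptions made for the example.

```shell
demo=$(mktemp -d)
cd "$demo"
# Hypothetical records: code:name:price
printf '101:hammer:12.5\n102:saw:9\n' > products
# -F: splits each line on colons; %-10s left-justifies the name in 10 columns,
# %8.2f right-justifies the price in 8 columns with 2 decimal places
awk -F: '{ printf "%-10s %8.2f\n", $2, $3 }' products > report
cat report
```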

Using the awk Command to Refine the Vendor Report

• To refine and automate the vendor report, create a shell script that includes only the awk command, not a series of separate commands. To have awk perform the automation properly, redirect its input to come from a disk file and not from the keyboard.
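One possible shape for such a script is sketched below. The script name vendor_report, the data file vendor_data, and the column layout are all hypothetical; the lesson's actual report is not reproduced here.

```shell
demo=$(mktemp -d)
cd "$demo"
# Hypothetical joined data: code:vendor:product
printf '101:Acme:widget\n102:Globex:sprocket\n' > vendor_data
cat > vendor_report <<'EOF'
#!/bin/sh
# The script holds a single awk command; < makes awk read the disk file, not the keyboard
awk -F: 'BEGIN { printf "%-10s %s\n", "Vendor", "Product" }
         { printf "%-10s %s\n", $2, $3 }' < vendor_data
EOF
chmod u+x vendor_report
./vendor_report > report
cat report
```

The BEGIN block runs once before any input is read, which makes it a convenient place for column headings.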

Awk has many features that let you manage your report output to your specification

Chapter Summary

• UNIX supports regular files, directories, and character and block special files

• File structures depend on the data being stored; the three kinds of regular files are unstructured ASCII characters, records, and trees

• When running, UNIX receives input from the standard input device (keyboard), also known as stdin, and sends output to the standard output device (monitor), also known as stdout. Another standard device, stderr, receives error messages and defaults to the monitor

• The touch command updates a file’s time and date stamps and creates empty files

• The rmdir command removes empty directories

• The cut command extracts specific columns or fields from a file

• To combine two or more files side by side, use the paste command; to append one file after another, use the cat command

• Use the sort command to sort a file’s contents alphabetically or numerically

• To automate command processing, include commands in a script file that you can later execute as a program

• Use the join command to extract data from two files sharing a common field and use this field to join the two files

• Awk is a pattern-scanning and processing language useful for creating a formatted report with a professional look
