Chapter Five: Advanced File Processing. Lesson A: Selecting, Manipulating, and Formatting Information

Upload: merry-moody

Post on 26-Dec-2015

Page 1: Chapter Five Advanced File Processing. 2 Lesson A Selecting, Manipulating, and Formatting Information

Chapter Five

Advanced File Processing

Lesson A

Selecting, Manipulating, and Formatting Information

Objectives

Use the pipe operator to redirect the output of one command to another command

Use the grep command to search for a specified pattern in a file

Use the uniq command to remove duplicate lines from a file

Objectives

Use the comm and diff commands to compare two files

Use the wc command to count words, characters, and lines in a file

Use the manipulation and formatting commands: sed, tr, and pr

Advancing Your File Processing Skills

The select commands extract data

Advancing Your File Processing Skills

The manipulation and transformation commands alter data and transform it into useful and appealing formats

Using the Select Commands

Select commands: grep, diff, uniq, comm, wc

Using Pipes
– The pipe operator (|) redirects the output of one command to the input of another command
– An example would be to redirect the output of the ls command to the more command
– The pipe operator can connect several commands on the same command line
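Here is a minimal sketch of pipes. It assumes a POSIX shell (sh or Bash) with standard utilities, and the scratch directory and file names are hypothetical sample data. The textbook case, `ls -l | more`, pages a long listing one screen at a time. The pipelines below use wc, sort, and head instead, so their output is predictable.

```shell
#!/bin/sh
# Hypothetical scratch directory holding a few empty files to list.
demo=$(mktemp -d)
touch "$demo/alpha.txt" "$demo/beta.txt" "$demo/gamma.txt" "$demo/delta.txt"

# One pipe: ls writes the file names, and wc -l counts them.
count=$(ls "$demo" | wc -l)
echo "files: $count"

# Several pipes on one command line: list, sort in reverse order, keep the first two.
top=$(ls "$demo" | sort -r | head -n 2)
echo "$top"

rm -r "$demo"
```

Each | hands one command's standard output to the next command's standard input, so no intermediate files are needed.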

Using Pipes

Using pipe operators and connecting commands is useful when viewing directory information
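For example, `ls -l` marks each directory entry with a leading `d`, so piping its output through grep shows only the subdirectories. This is a sketch assuming a POSIX shell, and the directory and file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical directory with two subdirectories and one regular file.
demo=$(mktemp -d)
mkdir "$demo/src" "$demo/docs"
touch "$demo/readme.txt"

# Keep only the long-listing lines that describe directories.
ls -l "$demo" | grep '^d'

# grep -c counts the matching lines instead of printing them.
dirs=$(ls -l "$demo" | grep -c '^d')
echo "directories: $dirs"

rm -r "$demo"
```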

Using the grep Command

Used to search for a specific pattern in a file, such as a word or phrase

grep's options and regular-expression (wildcard) support allow for powerful search operations

You can increase grep's usefulness by combining it with other commands, such as head or tail
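The sketch below shows a few common grep options, a regular expression, and grep combined with head. It assumes a POSIX shell with grep, and the log file and its contents are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample log file in a scratch directory.
demo=$(mktemp -d)
printf '%s\n' 'Error: disk full' 'info: started' 'error: timeout' 'info: stopped' > "$demo/log.txt"

# -i ignores case; -n prefixes each matching line with its line number.
grep -in 'error' "$demo/log.txt"

# -c prints a count of matching lines; adding -v counts the non-matching ones.
errors=$(grep -ic 'error' "$demo/log.txt")
others=$(grep -vic 'error' "$demo/log.txt")

# A regular expression: ^ anchors the match to the start of the line.
starts_info=$(grep -c '^info' "$demo/log.txt")

# Pipe grep into head to keep only the first match.
first=$(grep -i 'error' "$demo/log.txt" | head -n 1)

echo "errors=$errors others=$others info=$starts_info first=$first"
rm -r "$demo"
```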

Using the grep Command

grep can take input from other commands and also be directed to provide input for other commands
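Here grep sits in the middle of a pipeline, reading from one command and feeding another. This is a minimal sketch assuming a POSIX shell, and the word list is hypothetical.

```shell
#!/bin/sh
# With no file name, grep reads standard input (here, the output of printf).
# Its own output is then piped on to sort.
result=$(printf '%s\n' pear apple plum apricot banana | grep '^a' | sort)
echo "$result"
```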

Using the uniq Command

Removes duplicate lines from a file

It compares only consecutive lines, so uniq needs sorted input to remove every duplicate

uniq has an option (-d) that generates output containing one copy of each line that has a duplicate
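The sketch below shows why uniq needs sorted input and what -d does. It assumes a POSIX shell, and the fruit list is hypothetical sample data.

```shell
#!/bin/sh
# Hypothetical file whose repeated lines are not adjacent.
demo=$(mktemp -d)
printf '%s\n' cherry apple cherry banana apple cherry > "$demo/fruit.txt"

# uniq alone collapses only adjacent repeats, so this input comes out unchanged.
as_is=$(uniq "$demo/fruit.txt")

# Sort first so identical lines become adjacent, then remove the duplicates.
unique=$(sort "$demo/fruit.txt" | uniq)

# -d prints one copy of each line that occurs more than once.
dups=$(sort "$demo/fruit.txt" | uniq -d)

echo "$unique"
echo "$dups"
rm -r "$demo"
```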

Using the comm Command

Used to identify lines that two sorted files have in common

Unlike uniq, it does not remove duplicates, and it works with two files rather than one

It compares lines common to file1 and file2 and produces three-column output
– Column one contains lines found only in file1
– Column two contains lines found only in file2
– Column three contains lines found in both files
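Here is a minimal comm sketch, assuming a POSIX shell. The two sorted sample files are hypothetical. Options -1, -2, and -3 suppress the corresponding output columns.

```shell
#!/bin/sh
# Two hypothetical files, both already sorted.
demo=$(mktemp -d)
printf '%s\n' apple banana cherry > "$demo/file1"
printf '%s\n' banana cherry date > "$demo/file2"

# Three columns: only in file1, only in file2 (one tab in), in both (two tabs in).
comm "$demo/file1" "$demo/file2"

# Suppress columns 1 and 2 to print only the lines common to both files.
common=$(comm -12 "$demo/file1" "$demo/file2")

# Suppress columns 2 and 3 to print only the lines unique to file1.
only1=$(comm -23 "$demo/file1" "$demo/file2")

echo "$common"
echo "$only1"
rm -r "$demo"
```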

Using the diff Command

Attempts to determine the minimal changes needed to convert file1 to file2

The output displays the line(s) that differ

The associated codes in the output (a for add, d for delete, c for change) indicate which lines must be added, deleted, or changed for the files to match
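The sketch below shows diff's default output for two small files. It assumes a POSIX shell and GNU or POSIX diff, and the file contents are hypothetical. In `2d1`, line 2 of file1 is deleted, and the files then line up after line 1 of file2. In `3a3`, after line 3 of file1, line 3 of file2 is added. Lines marked `<` come from file1 and lines marked `>` come from file2. A `c` code would appear if a line had to be replaced rather than added or deleted.

```shell
#!/bin/sh
# Two hypothetical files that differ by one deleted and one added line.
demo=$(mktemp -d)
printf '%s\n' one two three > "$demo/file1"
printf '%s\n' one three four > "$demo/file2"

# diff exits with status 1 when the files differ; capture output and status.
changes=$(diff "$demo/file1" "$demo/file2")
status=$?

echo "$changes"
rm -r "$demo"
```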

Using the wc Command

Used to count the number of lines, words, and bytes or characters in text files

You can specify all three options in a single invocation of the command

If you don't specify any options, you see counts of lines, words, and bytes (in that order)

Using the wc Command

The options for the wc command:

-l for lines

-w for words

-c for bytes (-m counts characters, which differ from bytes only in multibyte text)
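Here is a minimal wc sketch, assuming a POSIX shell. The two-line sample file is hypothetical. Reading the file through `<` keeps the file name out of the output, which makes each count easy to capture.

```shell
#!/bin/sh
# Hypothetical file: 2 lines, 6 words, 31 bytes (including both newlines).
demo=$(mktemp -d)
printf 'the quick brown fox\njumps over\n' > "$demo/sample.txt"

# All three options in one invocation; with no options wc prints the same three counts.
wc -l -w -c "$demo/sample.txt"

# One option at a time, reading standard input so only the number is printed.
lines=$(wc -l < "$demo/sample.txt")
words=$(wc -w < "$demo/sample.txt")
bytes=$(wc -c < "$demo/sample.txt")

echo "$lines $words $bytes"
rm -r "$demo"
```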

Using the Manipulate and Format Commands

These commands are sed, tr, and pr

Used to edit and transform the appearance of data before it is displayed or printed

Introducing sed

sed is a UNIX stream editor that allows you to make global changes to large files

The minimum requirements are an input file and a command that tells sed what actions to apply to the file

sed commands have two general forms:
– Specify an editing command on the command line
– Specify a script file containing sed commands
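The sketch below shows both general forms of a sed command. It assumes a POSIX shell, and the input file and script file names are hypothetical. sed writes the edited text to standard output and leaves the input file unchanged.

```shell
#!/bin/sh
# Hypothetical input file.
demo=$(mktemp -d)
printf '%s\n' 'red apple' 'green pear' 'red cherry' > "$demo/fruit.txt"

# Form 1: the editing command is given on the command line.
# s/red/ripe/g substitutes every "red" on every line (a global change).
sed 's/red/ripe/g' "$demo/fruit.txt"

# Form 2: the commands live in a script file that sed reads with -f.
# This hypothetical script substitutes, then deletes lines containing "pear".
printf '%s\n' 's/red/ripe/g' '/pear/d' > "$demo/edits.sed"
edited=$(sed -f "$demo/edits.sed" "$demo/fruit.txt")

echo "$edited"
rm -r "$demo"
```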

Introducing sed

sed's many options let you create new files that contain only the data you select
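One way to build a new file from selected data is to suppress sed's automatic printing with `-n`, print only the wanted lines with `p`, and redirect the result. This is a sketch assuming a POSIX shell, and the staff records are hypothetical.

```shell
#!/bin/sh
# Hypothetical colon-delimited records: name:department.
demo=$(mktemp -d)
printf '%s\n' 'Jones:Sales' 'Smith:IT' 'Brown:Sales' > "$demo/staff.txt"

# -n turns off automatic printing; /Sales/p prints only matching lines.
# Redirecting the output creates a new file; the input file is untouched.
sed -n '/Sales/p' "$demo/staff.txt" > "$demo/sales.txt"
sales=$(cat "$demo/sales.txt")

# Line-number addresses also work: print only lines 2 through 3.
range=$(sed -n '2,3p' "$demo/staff.txt")

echo "$sales"
echo "$range"
rm -r "$demo"
```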

Translating Characters Using the tr Command

tr copies data from the standard input to the standard output, substituting or deleting characters specified by options and patterns

The patterns are strings and the strings are sets of characters

A popular use of tr is converting lowercase characters to uppercase
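Here is a minimal tr sketch, assuming a POSIX shell. tr takes no file name, so input arrives through a pipe. Each character in the first set is replaced by the character in the same position of the second set. The `'[:lower:]' '[:upper:]'` form is a locale-aware alternative to the ranges used here.

```shell
#!/bin/sh
# Convert lowercase to uppercase by mapping the set a-z onto A-Z.
upper=$(echo 'hello, world' | tr 'a-z' 'A-Z')

# -d deletes every character in the set instead of substituting.
nospace=$(echo 'a b  c' | tr -d ' ')

echo "$upper"
echo "$nospace"
```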

Using the pr Command to Format Your Output

pr prints specified files on the standard output in paginated form

By default, pr formats the specified files into single-column pages of 66 lines

Each page has a five-line header containing the file's latest modification date, its name, and the current page number, plus a five-line trailer consisting of blank lines
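The sketch below shows pr's default header and two common options. It assumes a POSIX shell with GNU coreutils pr, whose header layout is two blank lines, the header line, then two blank lines. The number file is hypothetical.

```shell
#!/bin/sh
# Hypothetical six-line input file.
demo=$(mktemp -d)
printf '%s\n' 1 2 3 4 5 6 > "$demo/numbers.txt"

# Default format: line 3 of the five-line header holds the date,
# the file name, and the page number.
header=$(pr "$demo/numbers.txt" | sed -n '3p')

# -h replaces the file name in the header with a title of your choice.
titled=$(pr -h 'Number Report' "$demo/numbers.txt" | sed -n '3p')

# -2 prints the text in two columns; -t omits the header and trailer.
twocol=$(pr -2 -t "$demo/numbers.txt")

echo "$header"
echo "$titled"
echo "$twocol"
rm -r "$demo"
```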

Lesson B

Using UNIX File-Processing Tools to Create an Application

Objectives

Design a new file-processing application

Design and create files to implement the application

Use awk to generate formatted output

Objectives

Use cut, sort, and join to organize and transform selected file information

Develop customized shell scripts to extract and combine file data

Test individual shell scripts and combine all scripts into a final shell program

Designing a New File-Processing Application

The most important phase in developing a new application is the design

The design defines the information an application needs to produce

The design also defines how to organize this information into files, records, and fields, which are called logical structures

Designing Records

The first task is to define the fields in the records and produce a record layout

A record layout identifies each field by name and data type (numeric or nonnumeric)

Design the file record to store only those fields relevant to the record’s primary purpose

Linking Files with Keys

Multiple files are joined by a key: a common field that each of the linked files shares

Another important task in the design phase is to plan a way to join the files

The flexibility to gather information from multiple files composed of simple, short records is the essence of a relational database system. UNIX provides several commands that offer this flexibility
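The join command is one of those UNIX commands. It merges records from two files that share a key field. This is a minimal sketch assuming a POSIX shell. Both files, their colon-delimited layout, and the use of field 1 as the key are hypothetical.

```shell
#!/bin/sh
# Two hypothetical files sharing a key in field 1 (a programmer ID).
demo=$(mktemp -d)
printf '%s\n' '101:Smith' '102:Jones' '103:Lee' > "$demo/programmers"
printf '%s\n' '101:Billing' '103:Payroll' > "$demo/assignments"

# join needs both inputs sorted on the key; -t: sets the field separator.
# Each output line is the key, then file1's other fields, then file2's.
joined=$(join -t: "$demo/programmers" "$demo/assignments")

echo "$joined"
rm -r "$demo"
```

Records without a match in the other file (102 here) are left out by default.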

Creating the Programmer and Project Files

With the basic design complete, you now implement your application design

UNIX file processing predominantly uses flat files. Working with these files is easy, because you can create and manipulate them with text editors like vi and Emacs
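A flat file is plain text with one record per line. The sketch below writes one with a here-document instead of an editor, then uses cut and sort on it. It assumes a POSIX shell, and the record layout (ID:last name:first name:hourly rate) is hypothetical.

```shell
#!/bin/sh
demo=$(mktemp -d)

# Hypothetical programmer file: one record per line, fields separated by colons.
cat > "$demo/programmer" <<'EOF'
103:Lee:Ann:32.50
101:Smith:Tom:28.00
102:Jones:Kim:30.75
EOF

# cut -d: -f2,4 extracts the last name and hourly rate from every record.
names=$(cut -d: -f2,4 "$demo/programmer")

# sort -t: -k4,4n orders records numerically by field 4; cut keeps the name.
by_rate=$(sort -t: -k4,4n "$demo/programmer" | cut -d: -f2)

echo "$names"
echo "$by_rate"
rm -r "$demo"
```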

Formatting Output

The awk command is used to prepare formatted output

For the purposes of developing a new file-processing application, we will focus primarily on the printf action of the awk command
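Here is a minimal sketch of awk's printf action, assuming a POSIX shell and awk. The records and column widths are hypothetical. The format specifiers work as in C: `%-8s` left-justifies a string in 8 columns, and `%7.2f` right-justifies a number in 7 columns with two decimal places.

```shell
#!/bin/sh
# Hypothetical colon-delimited records: id:name:hourly rate.
report=$(printf '%s\n' '101:Smith:28' '102:Jones:30.75' |
    awk -F: '
        # BEGIN runs once, before any input is read, to print column headings.
        BEGIN { printf "%-8s %7s\n", "Name", "Rate" }
        # For every record: name left-justified, rate with two decimals.
        { printf "%-8s %7.2f\n", $2, $3 }')

echo "$report"
```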

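Formatting Output

awk provides a shortcut to other UNIX commands: one awk program can often select, rearrange, and format fields that would otherwise take several commands connected by pipes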
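As an illustration, the sketch below produces the same result twice: once with a grep-and-cut pipeline and once with a single awk program. It assumes a POSIX shell, and the records are hypothetical.

```shell
#!/bin/sh
# Hypothetical colon-delimited records: id:name:department.
data=$(printf '%s\n' '101:Smith:Sales' '102:Jones:IT' '103:Lee:Sales')

# Two commands: grep selects the Sales records, cut extracts the name field.
piped=$(echo "$data" | grep ':Sales$' | cut -d: -f2)

# One awk program: the pattern selects the records, the action prints field 2.
short=$(echo "$data" | awk -F: '$3 == "Sales" { print $2 }')

echo "$piped"
echo "$short"
```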

Using a Shell Script to Implement the Application

Shell scripts should contain:
– The commands to execute
– Comments to identify and explain the script so that users or programmers other than the author can understand how it works

Use the pound (#) character to mark comments in a script file
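The sketch below writes a short commented script and runs it. It assumes a POSIX shell, and the script name `countlines` and its purpose are hypothetical. The first line, `#!/bin/sh`, starts with # like a comment, but it tells the system which interpreter should run the script.

```shell
#!/bin/sh
demo=$(mktemp -d)

# Write the hypothetical script; the quoted 'EOF' keeps $ signs literal.
cat > "$demo/countlines" <<'EOF'
#!/bin/sh
# countlines: report how many lines each named file contains.
# Usage: sh countlines file...
for f in "$@"; do
    # Reading from standard input keeps the file name out of wc's output.
    echo "$f: $(wc -l < "$f")"
done
EOF

# Try the script on a two-line sample file.
printf 'first\nsecond\n' > "$demo/two.txt"
out=$(sh "$demo/countlines" "$demo/two.txt")

echo "$out"
rm -r "$demo"
```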

Running a Shell Script

You can run a shell script in virtually any shell that you have on your system

The Bash shell accepts more variations in command structures than other shells

Run the script by typing sh followed by the name of the script, or make the script executable and type ./ before the script name
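The sketch below runs one script both ways. It assumes a POSIX shell, and the script name `hello` is hypothetical. The script prints `$0`, the name it was invoked by, which shows how each method started it.

```shell
#!/bin/sh
demo=$(mktemp -d)

# A hypothetical one-line script; single quotes keep $0 literal in the file.
printf '%s\n' '#!/bin/sh' 'echo "Hello from $0"' > "$demo/hello"

# Way 1: hand the script to sh; no execute permission is needed.
viash=$(sh "$demo/hello")

# Way 2: make it executable, then run it from its directory with the ./ prefix.
chmod +x "$demo/hello"
direct=$(cd "$demo" && ./hello)

echo "$viash"
echo "$direct"
rm -r "$demo"
```

The `./` prefix is needed because the current directory is usually not in the shell's search path (`PATH`).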

Putting It All Together to Produce the Report

An effective way to develop applications is to combine many small scripts in a larger script file

Have the last script added to the larger script print a report indicating script functions and results
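The sketch below follows this approach. Two small scripts, each testable on its own, are called by a larger script that ends by printing a short report. It assumes a POSIX shell, and all script names, the data layout, and the report wording are hypothetical.

```shell
#!/bin/sh
demo=$(mktemp -d)

# Small script 1 (hypothetical): list the sorted names in field 2 of a data file.
cat > "$demo/names.sh" <<'EOF'
#!/bin/sh
cut -d: -f2 "$1" | sort
EOF

# Small script 2 (hypothetical): count the records in a data file.
cat > "$demo/count.sh" <<'EOF'
#!/bin/sh
wc -l < "$1"
EOF

# The larger script runs the small ones from its own directory and
# finishes by printing a report of what each part produced.
cat > "$demo/report.sh" <<'EOF'
#!/bin/sh
dir=$(dirname "$0")
echo "Names:"
sh "$dir/names.sh" "$1"
echo "Records: $(sh "$dir/count.sh" "$1")"
EOF

printf '%s\n' '102:Jones' '101:Smith' > "$demo/data"
report=$(sh "$demo/report.sh" "$demo/data")

echo "$report"
rm -r "$demo"
```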

Chapter Summary

The UNIX file-processing commands can be organized into two categories: (1) select and (2) manipulation and transformation

The uniq command removes duplicate lines from a sorted file

The comm command compares lines common to file1 and file2, and produces output that shows the variances between the two

The diff command attempts to determine the minimal set of changes needed to convert file1 into file2

Chapter Summary

The tr command copies data read from the standard input to the standard output, substituting or deleting the characters specified

The sed command is a file editor designed to make global changes to large files

The pr command prints files on the standard output in pages

The design of a file-processing application reflects what the application needs to produce

Use a record layout to identify each field by name and data type

Chapter Summary

Shell programs should contain commands to execute programs and comments to identify and explain the programs. The pound (#) character denotes comments

Write shell scripts in stages so that you can test each part before combining them into one script. Using small shell scripts and combining them in a final shell script file is an effective way to develop applications
