grep (Global Regular Expression Print)
• Operation
– Search a group of files
– Find all lines that contain a particular regular expression pattern
– Write the matching lines to standard output (redirect to save them in a file)
– grep returns to the prompt with no extra output when it is done
• Syntax: grep [-cilLnrsvwx] pattern [list of files]
• Examples
– Find information about the user harley
>grep harley /etc/passwd
– Find all lines in the files of the current directory containing the string xxx
>grep xxx *
grep Flags
1. -c Count the number of matching lines
2. -i Ignore case when searching for matches
3. -l List the file names containing matches
4. -L List the files that do not have a match
5. -n Write the line number in front of each matching line
6. -r Perform a recursive directory search
7. -s Suppress warning and error messages
8. -v Search for lines that do not match the pattern
9. -w Search only for complete words
10. -x Select only lines that exactly match the pattern
Regular Expressions
• Industry standard way to specify patterns
– In Java: string.matches("pattern");
– In Java: string.replaceAll("pattern", string)
• Meta-characters/operators (some need to be escaped)
^ beginning of a line, $ end of a line
* match 0 or more of the previous group
+ match 1 or more of the previous group
? match 0 or 1 of the previous group
{n} match n of the previous group
{m,n} match m to n of the previous group
{n,} match n or more of the previous group
| match either the group before or the group after
. match any character except for new line
\ literally interpret the following meta-character or operator
Note: Many UNIX programs use these (vi, sed, more, grep, awk)
Regular Expression Examples
Regular Expression       String        Match
[a-z](12){3}[c-e]{3}     a121212cde    Yes
a.*e+                    abc12cde      Yes
a.*f                     abc12cde      No
^a.*e$                   abc12cde      Yes
^b*e$                    abc12cde      No
^a*e$                    abc12cde      No
\^.*\$                   ^ab12cd$      Yes
^.*$                     ^ab12cd$      Yes
^*$                      ^ab12cd$      No
Note: To use ( ) { } or + as operators in grep, use the -E (extended) switch or precede each with \
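The note above can be checked with a short sketch. It runs the first pattern from the table three ways: with escaped operators, with -E, and unescaped without -E. The variable names (sample, basic, extended, literal) are made up for illustration.

```shell
sample='a121212cde'

# Basic syntax: parentheses and braces must be escaped to act as operators.
basic=$(printf '%s\n' "$sample" | grep -c '[a-z]\(12\)\{3\}[c-e]\{3\}')

# Extended syntax (-E): the same operators are written without backslashes.
extended=$(printf '%s\n' "$sample" | grep -Ec '[a-z](12){3}[c-e]{3}')

# Without -E, unescaped ( ) { } are ordinary characters, so nothing matches.
# grep exits non-zero when there is no match, hence the "|| true".
literal=$(printf '%s\n' "$sample" | grep -c '[a-z](12){3}[c-e]{3}' || true)

echo "basic=$basic extended=$extended literal=$literal"
```

The first two counts should be 1 and the last 0, since the unescaped form only matches the literal characters ( ) { }.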
More grep Examples
Contents of a file called homework
Math: problems 12-10 to 12-33, due Monday
BasketWeaving: make a 6-inch basket, DONE
Psychology: essay on Animal Existentialism, due end of term
Surfing:catch at least 10
grep commands
>grep -v DONE homework              displays all but line 2
>grep -c DONE homework              displays 1
>grep -wi ".*a.*" homework          displays all lines
>grep -w "m.*e" homework            displays line 2
>grep -i "d.*e" homework            displays lines 1, 2 and 3
>grep '\(Ma\|DO\).*' homework       displays lines 1 and 2
Note: the last example escapes the parentheses and the vertical bar
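A minimal sketch for trying a few of the flags on the homework file. It recreates the file in a temporary directory; the variable names (workdir, done_count, not_done, due_lines) are illustrative only.

```shell
# Recreate the sample homework file in a temporary directory.
workdir=$(mktemp -d)
cat > "$workdir/homework" <<'EOF'
Math: problems 12-10 to 12-33, due Monday
BasketWeaving: make a 6-inch basket, DONE
Psychology: essay on Animal Existentialism, due end of term
Surfing:catch at least 10
EOF

# -c counts matching lines; adding -v counts the lines that do not match.
done_count=$(grep -c DONE "$workdir/homework")
not_done=$(grep -vc DONE "$workdir/homework")

# -n prefixes each matching line with its line number; keep only the numbers.
due_lines=$(grep -n 'due' "$workdir/homework" | cut -d: -f1 | paste -sd' ' -)

echo "DONE lines: $done_count, other lines: $not_done, 'due' on lines: $due_lines"
```

The search for due is case sensitive, so line 2 (which only contains DONE) is not reported.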
Sorting Data
• Background
– Each line in a file is a record
– Each line is a series of fields separated by spaces and/or tabs
• Commands
>sort fileName                    sorts fileName on the 1st field of each line
>sort -k 6 fileName               sorts on the 6th field of each line
>sort -n -k 5 fileName            sorts on the 5th field numerically
>sort -t ':' -k4r -k3 fileName    sorts descending on the 4th field, and then ascending on the 3rd, with ':' as a delimiter
>sort -t ':' fileName             sorts using ':' as a separator character
>sort -u -k2r fileName            sorts in reverse on the 2nd field and removes duplicates (output must be unique)
>sort -k 3,4                      (in a pipe) sorts by the key, from field 3 through field 4
>sort -k5n -k8                    sorts numerically by the 5th field and alphabetically by the 8th
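The sort options above can be tried on a few made-up records. This is only a sketch: the data (name:department:score) and the variable names are hypothetical.

```shell
# Hypothetical colon-separated records: name:department:score
records='carol:ops:7
alice:dev:12
bob:dev:9
alice:dev:12'

# Numeric sort on field 3 only, using ':' as the separator.
by_score=$(printf '%s\n' "$records" | sort -t ':' -k3,3n)

# Reverse sort on field 1; -u keeps one line per distinct key.
unique_desc=$(printf '%s\n' "$records" | sort -t ':' -u -k1,1r)

printf '%s\n---\n%s\n' "$by_score" "$unique_desc"
```

Writing the key as -k3,3 limits it to field 3 alone; a bare -k3 would extend the key from field 3 to the end of the line.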
SED (Stream Editor)
• SED is a filter
– Input from stdin or a file
– Output to stdout or a file
– Modifies the input to produce the output
– Non-interactive
• Processing
– Read from an input stream
– Perform line-oriented commands
– Write to an output stream
• Syntax: >sed [-i] command | [-e command] … [file]
Search and Replace
• Search, change and redirect to newFile
>sed 's/cat/dog/g' file > newFile
• Search, change, and edit the file in place
>sed -i 's/cat/dog/g' file
• Specific range of lines
>sed '5,10s/cat/dog/g' file
• Apply the search only to lines containing OK
>sed '/OK/s/cat/dog/g' names
• Apply the search only to lines having 2 consecutive numeric characters
>sed '/[0-9]\{2\}/s/cat/dog/g' names
• Delete a range of lines
>sed '5,10d' file
Note: single quotes suppress the shell's interpretation of special characters
Note: The /pattern/ search syntax also works in vi, more, and awk
Note: You must escape the characters +, { and } for them to work as operators
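A small sketch of the three ways of restricting a substitution (line range, matching pattern, escaped repetition count). The five input lines and the variable names are made up for illustration.

```shell
# Hypothetical five-line input.
input='cat one
cat two OK
cat 42
cat three
cat OK 7'

# Substitute only on lines 2 through 3.
range=$(printf '%s\n' "$input" | sed '2,3s/cat/dog/g')

# Substitute only on lines containing OK.
ok_only=$(printf '%s\n' "$input" | sed '/OK/s/cat/dog/g')

# Substitute only on lines with two consecutive digits (note the escaped braces).
two_digits=$(printf '%s\n' "$input" | sed '/[0-9]\{2\}/s/cat/dog/g')

printf '%s\n---\n%s\n---\n%s\n' "$range" "$ok_only" "$two_digits"
```

Only the line containing 42 changes in the last case; the line ending in a single 7 is left alone.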
Complex Commands
>sed -i \
-e 's/mon/Monday/g' \
-e 's/tue/Tuesday/g' \
-e 's/wed/Wednesday/g' \
-e 's/thu/Thursday/g' \
-e 's/fri/Friday/g' \
-e 's/sat/Saturday/g' \
-e 's/sun/Sunday/g' \
calendar
• The backslash is a continuation character
• The -e specifies another command (expression)
• The trailing g (as in /g) means change every occurrence on each line, not just the first
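The effect of the g flag and of chaining -e commands can be seen on a single line of input. A minimal sketch; the sample text and variable names are illustrative.

```shell
line='cat cat cat'

# Without g, only the first occurrence on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/cat/dog/')

# With g, every occurrence on the line is replaced.
every=$(printf '%s\n' "$line" | sed 's/cat/dog/g')

# Several -e commands are applied in order to each line.
two_steps=$(printf '%s\n' "$line" | sed -e 's/cat/dog/' -e 's/cat/cow/')

printf '%s\n%s\n%s\n' "$first_only" "$every" "$two_steps"
```

In the last case the second command sees the output of the first, so it replaces what is now the first remaining cat.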
AWK
• AWK (Aho, Weinberger, Kernighan)
• Special purpose programming language
– Interpreted
– Useful for UNIX scripts
• Purposes
– Filter text files based on supplied patterns
– Produce reports
– Callable from "vi"
– Create simple databases
– Simple mathematical operations
– Creating scripts
• Not good for large complicated tasks
• Other interpreted languages: perl, php
General Syntax
• The single quote causes the shell to ignore special characters
• The various clauses are optional
• Much of the syntax for <action> clauses resembles C and Java
• The patterns utilize regular expressions
BEGIN {<initialization>}
<pattern> {<action>}
<pattern> {<action>}
...
<pattern> {<action>}
END {<final actions>}
>awk '<awk program>'
AWK General Operation
• Each file consists of a series of records
• Each record is a series of fields
• Defaults
– Record separator: new line character
– Field separator: white space characters
• Flow of Operation
– Read the input file line by line
– If a pattern matches the line, perform its action
– Otherwise skip the line
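The flow of operation can be traced with a tiny program: BEGIN runs before any input, the pattern clause runs for each matching record, and END runs after the last record. A sketch only; the data and the variable names are hypothetical.

```shell
# Hypothetical whitespace-separated records: item, weight
data='gold 3.5
silver 2.25
gold 1.0'

report=$(printf '%s\n' "$data" | awk '
    BEGIN  { print "start" }              # runs once, before any input
    /gold/ { print NR ": " $1, $2 }       # runs for each matching record
    END    { print NR " records read" }   # runs once, after all input
')

printf '%s\n' "$report"
```

The silver record is read (it still counts toward NR) but skipped, because no pattern matches it.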
Some AWK Simple Examples
1. Print fields of records in a file
>awk '{print $5, $6, $7, $8}' fileName
2. Print lines with a search string
>awk '/gold/ {print}' fileName
3. Print the number of records
>awk 'END {print NR, "records"}' fileName
4. Print records using a condition
>awk '{if ($3 < 1980) print $3}' fileName
or
>awk '$2 > max {print $2}' fileName
5. Comparing a field to a regular expression
>awk '$2 ~ /[0-9]+/ {print $2}' fileName
6. Using variables
>awk '/gold/ {sum += $2} END {print "value = " sum}' \
fileName
A Longer AWK command
awk -F ';' \
'BEGIN { num_gold=0; wt_gold=0; }
 /[Gg]old/ { num_gold++; wt_gold += $2; }
 END { printf("\n Gold Pieces: %2d %5.2f\n", num_gold, wt_gold); }' \
goldFile

Contents of goldFile
Gold;3.5
Silver;2.25
Bronze;5.31
Gold;23.22
gold;0.22

Output
 Gold Pieces:  3 26.94

Note: The backslashes at the ends of lines are continuation characters
Note: Semicolons delimit the fields in the file
Execute Program in a file
# awk program summarizing a coin collection
BEGIN { num_gold=0; wt_gold=0; }
/[Gg]old/ { num_gold++; wt_gold += $2 }
END {
  val_gold = 485 * wt_gold;
  printf("\n Gold Pieces: %2d", num_gold);
  printf("\n Gold Weight: %5.2f", wt_gold);
  printf("\n Gold Value: %7.2f\n", val_gold);
}

awk -F ';' -f <program> <fileName>

Output
 Gold Pieces:  3
 Gold Weight: 26.94
 Gold Value: 13065.90
Invoking AWK
>awk [-F<ch>] [<program>] [-f <programFile>] [<vars>] [- | <dataFile>]
• <ch> is a field separator (default: space, tab)
• <program> is an AWK program
• <programFile> is a file containing an AWK program
• <vars> is a series of variables to initialize
>awk -f program f1=file2 f2=file1 > output
• - means accept AWK input from STDIN
• <dataFile> is a file containing data to process
Note: AWK is often invoked repeatedly in shell scripts
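A sketch of an invocation that combines an inline program, a variable initialized on the command line, and "-" for standard input. The variable name threshold and the sample data are assumptions made for illustration.

```shell
# threshold is a hypothetical variable, initialized on the command line.
# The trailing "-" tells awk to read its data from standard input.
high=$(printf '%s\n' 'ann 95' 'bob 72' 'cy 88' |
    awk '$2 > threshold { print $1 }' threshold=80 -)

printf '%s\n' "$high"
```

The assignment is processed when awk reaches it in the argument list, so it must come before the data source it should apply to.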
Search Patterns
• An exact string: /The/
• A string starting a line: /^The/
• A string ending a line: /The$/
• A string ignoring case of first letter: /[Tt]he/
• Decimal: /[0-9]*\.[0-9]*/
• Alphanumeric: /[a-zA-Z0-9]*/
• Choice between two strings: /(da|De).*/
• Numeric: /[+-]?[0-9]+/
• Any Boolean expression: $4>90 or $4>$5
Note: Some utilities require \(, \) and \| if you use ()| regular expression characters
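A few of these patterns can be exercised on made-up input. This is a sketch; the sample lines and the variable names are hypothetical.

```shell
matches=$(printf '%s\n' 'The end' 'the start' 'See The' 'other' | awk '
    /^[Tt]he/ { print "starts: " $0 }   # line begins with The or the
    /The$/    { print "ends: " $0 }     # line ends with The
')

# A Boolean expression can be used as a pattern in place of a regular expression.
passed=$(printf '%s\n' 'ann 95' 'bob 72' | awk '$2 > 90 { print $1 }')

printf '%s\n%s\n' "$matches" "$passed"
```

The line other contains the letters the, but not at the start, so the anchored pattern skips it.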
Built in Variables
• NR: Number of records read so far (the total when used in END)
• NF: Number of fields in the current record
• FILENAME: The current input file
• FS: Field separator character
• RS: Record separator character
• OFS: Output field separator character
• ORS: Output record separator character
• OFMT: The default output format for numbers in print
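A minimal sketch showing NR, NF, FS and OFS together. The two input records loosely imitate /etc/passwd entries and are made up, as are the variable names.

```shell
# Hypothetical colon-separated records.
passwd_like='root:x:0
harley:x:502'

summary=$(printf '%s\n' "$passwd_like" | awk '
    BEGIN { FS = ":"; OFS = "-" }   # split input on ":", join output with "-"
    { print NR, NF, $1 }            # record number, field count, first field
')

printf '%s\n' "$summary"
```

Setting FS in a BEGIN clause has the same effect here as passing -F: on the command line.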
Arrays and control structures
• Indexed and associative arrays
– By index: months[3] = "March";
– Associative: debts["Kim"] = 1000;
– Note: by convention arrays index from one, not zero
• Counter controlled: for (i=1; i<100; i++) data[i] = i;
• Iterator: for (i in myArray) print i, myArray[i];
• Pre test: i=0; while (i<20) { data[i] = i; i++ }
• Condition:
if (i==1) print debts["Kim"]; else print debts["Joe"];
print ((i==1) ? debts["Kim"] : debts["Joe"]);
• Unconditional control statements
– break: jump out of a loop
– continue: next iteration
– next: get next line of input
– exit: exit the AWK program
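An associative array and the iterator form of for can be combined to total values per key. A sketch under the assumption of made-up "name amount" records; the variable name totals is illustrative.

```shell
totals=$(printf '%s\n' 'Kim 400' 'Joe 150' 'Kim 600' | awk '
    { debts[$1] += $2 }            # associative array keyed by name
    END {
        for (name in debts)        # iteration order is unspecified
            print name, debts[name]
    }
' | sort)

printf '%s\n' "$totals"
```

The output is piped through sort because awk does not promise any particular order when iterating over array keys.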
Built-in functions
• Square root: print sqrt(3.6)
• Integer portion: print int(3.2)
• Substring: print substr("abcde", 3,2);
• Split: n = split("a;b;c;d;e", letters, ";");
• Position: print index("gorbachev", "bach");
Note: if a substring doesn't exist, 0 is returned
Note: Strings index from one, not zero
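The built-in functions can be tried in a BEGIN clause, which needs no input file. A minimal sketch; the array name letters and the variable names are illustrative, and split is called in its three-argument form (string, array, separator).

```shell
result=$(awk 'BEGIN {
    n = split("a;b;c;d;e", letters, ";")   # fills letters[1..n] and returns n
    print n, letters[1], letters[n]
    print substr("abcde", 3, 2)            # two characters starting at position 3
    print index("gorbachev", "bach")       # 1-based position of the substring
    print index("gorbachev", "xyz")        # 0 when the substring is absent
    print int(3.2), sqrt(16)
}')

printf '%s\n' "$result"
```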
printf
• printf(<template>, <arguments>);
– printf applies the template to the arguments
– Formats are specified in the templates
%d for integer output
%o for octal
%x for hexadecimal
%s for string
%e for exponential format
%f for floating point format
– Greater control
%5.2f means 5 characters wide, print two digits after the decimal point
%-8.4s means left justify, 8 wide, print 4 characters
%08d means output leading zeroes, print 8 characters
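A short sketch of the width, precision and padding controls. Square brackets are added around each field only to make the padding visible; the values are arbitrary.

```shell
formatted=$(awk 'BEGIN {
    printf("[%5.2f]\n", 3.14159)       # 5 wide, two digits after the point
    printf("[%-8.4s]\n", "abcdefgh")   # left-justified, 8 wide, first 4 characters
    printf("[%08d]\n", 42)             # zero-padded to 8 characters
    printf("[%x] [%o]\n", 255, 8)      # hexadecimal and octal
}')

printf '%s\n' "$formatted"
```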
Escape Characters
• New line: \n
• Carriage return: \r
• Backspace: \b
• Horizontal tab: \t
• Form feed: \f
• A quote: \"
• A backslash: \\
AWK redirection and pipes
• Create a file with the first field
>awk '{print $1 >> "file" }'
• Pipe output to another utility
>ls -l | awk '{print $9}' | tr 'a-z' 'A-Z'
Pipe to a utility to translate from lower to upper case
• Sort the grades file numerically on the 5th field and print the first field
>sort -k5n grades | awk '{print $1}'
• List .txt files < 2000 bytes, print sorted descending by size
>ls -l | grep '\.txt$' | awk '$5 < 2000 {print $9, $5}' | sort -nr -k2
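Both ideas can be tried without depending on the column layout of ls -l. This sketch uses made-up input; the awk variable out (set with -v) and the shell variable names are assumptions for illustration.

```shell
# Pipe awk output to tr to translate lower case to upper case.
shout=$(printf '%s\n' 'alpha 1' 'beta 2' | awk '{ print $1 }' | tr 'a-z' 'A-Z')

# Redirect from inside awk: append the first field of each record to a file.
outfile=$(mktemp)
printf '%s\n' 'alpha 1' 'beta 2' | awk -v out="$outfile" '{ print $1 >> out }'
first_fields=$(cat "$outfile")

printf '%s\n---\n%s\n' "$shout" "$first_fields"
```

Passing the file name with -v keeps the path out of the awk program text, which avoids quoting problems when the path is generated at run time.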
More Examples
• Print Bush's grades
>awk '/Bush/ {print $3, $4}' grades
• Print first name, last name, and quiz 3 grade for everyone who got more than a 90 on quiz 1 and 2
>awk '{if ($4>90 && $5>90) print $3, $2, $6}' grades
>awk '$4>90 && $5>90 {print $3, $2, $6}' grades
• Print username for user with userid 502
>awk -F: '{if ($3==502) print $1}' /etc/passwd
>awk -F: '$3==502 {print $1}' /etc/passwd