![Page 1: GREP. Whats Grep? Grep is a popular unix program that supports a special programming language for doing regular expressions The grammar in use for software](https://reader036.vdocuments.net/reader036/viewer/2022081506/56649f325503460f94c4d7f5/html5/thumbnails/1.jpg)
GREP
What's Grep?
Grep is a popular Unix program that searches text using regular expressions, a small special-purpose pattern language
The grammar used by most regular-expression software is based on grep's; Perl extends it further.
[Figure: a Regular Expression and a Search String (label: ANY)]
Compiles
Engine parses your search string
produces a state machine
[Figure: state machine, state = FALSE]
Searches
Input sent into State Machine
Conceptually, 1 shape/letter at a time
[Figure: state machine, state = TRUE]
Found: the state machine object changes state (in this example, it is set to true)
User checks machine state when it completes running
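The compile, search, and check steps above can be sketched in JavaScript, whose built-in `RegExp` object plays the role of the state machine here. This is only an illustration under that assumption; the variable names are made up for the example, and other engines expose the same steps through different calls.

```javascript
// Step 1: compile. The engine parses the pattern and builds its matcher.
const johnMachine = new RegExp("john.*");

// Step 2: search. Input is fed to the compiled matcher.
// Step 3: the caller checks the result (true = found, false = not found).
const foundJohn = johnMachine.test("john was here");
const foundMary = johnMachine.test("mary was here");

console.log(foundJohn); // true
console.log(foundMary); // false
```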
Grep Expressions
The “grep” language is a pattern language for processing text
“Grep pattern” is another name
for what is usually called a “Regular Expression”
Grep Expressions
A string of text to match, mixed with special characters
“john.*”
would return True on a search of: “john was here”
Grep Expressions
“.*\.txt”
.* is anything (.) of any length (*)
\. is literally a . (the \ before it means the next character is literal; that is, not special)
txt is just letter matching
This would pick out names containing “.txt”, such as txt files (add $ at the end to require that the name ends in .txt)
It's similar to what you see in Windows, but it's not the same: it's more powerful than the simple “wildcards” (*) you often see.
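A short JavaScript sketch of using this pattern as a file-name filter. The file names are invented for the example. It also shows why the unanchored pattern is looser than a Windows-style `*.txt` wildcard: it matches “.txt” anywhere in the name unless `$` is added.

```javascript
// Hypothetical file names, just for illustration.
const fileNames = ["notes.txt", "photo.jpg", "readme.txt", "notes.txt.bak", "mytxt"];

const containsTxt = /.*\.txt/;  // ".txt" anywhere in the name
const endsWithTxt = /.*\.txt$/; // ".txt" must be at the end of the name

const looseMatches = fileNames.filter((name) => containsTxt.test(name));
const strictMatches = fileNames.filter((name) => endsWithTxt.test(name));

console.log(looseMatches);  // [ 'notes.txt', 'readme.txt', 'notes.txt.bak' ]
console.log(strictMatches); // [ 'notes.txt', 'readme.txt' ]
```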
Special Chars
. = any single character
^ = beginning of a line
$ = end of line
\w = word characters (letters, digits, and underscore)
\d = decimal digits (0 to 9)
\ = escape char
Backslash \ (leans to the left)
most popular escape character
Uses:
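sneak past illegal characters (use a special character as a plain literal, e.g. \.)
make secret code characters (give an ordinary letter a special meaning, e.g. \d or \n)
Nearly every data encoding has an escape character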
Examples
... = three of ANYTHING (three dots, each matching any single character)
\d\d\d = three digits
remember the \ is the escape code
\w\w\w = three word characters (letters, digits, or underscore; no other symbols)
good: abc, a34
bad: ab!, a 4
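These three patterns can be checked quickly in JavaScript. The `^` and `$` anchors are added here so that each test covers the whole input rather than any three-character stretch inside it. Note that `\w` accepts digits as well as letters, so `a34` passes while `ab!` does not.

```javascript
const threeAny = /^...$/;       // any three characters
const threeDigits = /^\d\d\d$/; // exactly three digits
const threeWord = /^\w\w\w$/;   // three letters, digits, or underscores

const exampleResults = {
  anyMixed: threeAny.test("a!7"),     // true
  digitsOk: threeDigits.test("123"),  // true
  digitsBad: threeDigits.test("12a"), // false
  wordAbc: threeWord.test("abc"),     // true
  wordA34: threeWord.test("a34"),     // true: digits count as word characters
  wordSymbol: threeWord.test("ab!"),  // false: ! is not a word character
};

console.log(exampleResults);
```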
Approach
searching for “john” or “joan”
What is the difference between them?
jo_n
what symbol works?
jo\wn
jo.n
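Both candidates find “john” and “joan”, but they differ in how much else they let through. A small JavaScript comparison, with invented test names, also includes a bracket list (`[ha]`, covered under Quantity Chars) as the tightest option:

```javascript
// Made-up inputs: two wanted names and three near misses.
const candidateNames = ["john", "joan", "jo3n", "jo n", "jon"];

const withDot = /^jo.n$/;      // any character in the third position
const withWord = /^jo\wn$/;    // any letter, digit, or underscore there
const withList = /^jo[ha]n$/;  // only h or a there

const dotHits = candidateNames.filter((name) => withDot.test(name));
const wordHits = candidateNames.filter((name) => withWord.test(name));
const listHits = candidateNames.filter((name) => withList.test(name));

console.log(dotHits);  // [ 'john', 'joan', 'jo3n', 'jo n' ]
console.log(wordHits); // [ 'john', 'joan', 'jo3n' ]
console.log(listHits); // [ 'john', 'joan' ]
```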
Special Chars
\D = non-digits (anything except 0 to 9)
\W = non-word characters
\s = white space
\S = non white space
\n = new line (return/enter key)
\t = tab
\s\s\s = three whitespace characters
tabs, spaces, possibly newlines
\D\s\W = non-digit, whitespace, non-word character
Examples:
x !, ! !, = ?, A<tab>#
(not “x 4”: the final 4 is a word character, so \W rejects it)
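A quick JavaScript check of these two patterns. The sample strings are chosen for illustration. The last position is the one that trips people up: `\W` rejects letters, digits, and underscore, so an input ending in a digit does not match `\D\s\W`.

```javascript
const threeWhitespace = /^\s\s\s$/;
const nonDigitSpaceNonWord = /^\D\s\W$/;

const whitespaceMix = threeWhitespace.test(" \t\n");  // true: space, tab, newline
const bangBang = nonDigitSpaceNonWord.test("! !");    // true
const letterTabHash = nonDigitSpaceNonWord.test("A\t#"); // true
const endsInDigit = nonDigitSpaceNonWord.test("x 4"); // false: 4 is a word character
const startsWithDigit = nonDigitSpaceNonWord.test("4 !"); // false: 4 is a digit

console.log(whitespaceMix, bangBang, letterTabHash, endsInDigit, startsWithDigit);
```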
Quantity Chars
* = 0 or more
? = 0 or 1
+ = 1 or more
[abc] = any one of the chars inside the []
[^abc] = any one char NOT inside the []
[a-zA-Z] = ranges of chars
Examples
X+ = 1 or more X
X, XXX
[XYZ] = exactly one of these chars
X, Y, Z
[XYZxyz]+ = 1+ of any of these
y, XYz, zYZZyX, ZZzzzzz
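The quantifiers and bracket lists above, tried out in JavaScript with whole-input anchors. The `colou?r` and `ab*` patterns are extra examples added here to exercise `?` and `*`; they are not from the slides.

```javascript
const oneOrMoreX = /^X+$/;
const singleFromList = /^[XYZ]$/;
const manyFromList = /^[XYZxyz]+$/;
const optionalU = /^colou?r$/; // ? makes the u optional
const zeroOrMoreB = /^ab*$/;   // * allows no b at all

const quantifierResults = [
  oneOrMoreX.test("XXX"),        // true
  oneOrMoreX.test(""),           // false: + needs at least one X
  singleFromList.test("Y"),      // true
  singleFromList.test("XY"),     // false: [] matches a single character
  manyFromList.test("zYZZyX"),   // true
  manyFromList.test("XYa"),      // false: a is not in the list
  optionalU.test("color"),       // true
  optionalU.test("colour"),      // true
  zeroOrMoreB.test("a"),         // true
  zeroOrMoreB.test("abbb"),      // true
];

console.log(quantifierResults);
```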
EXAMPLES
[a-zA-Z0-9] = any single letter or digit, but no spaces or symbols
\.?$ = maybe ends with a .
remember: $ is end of line
.* = 0 to ∞ of any character
[^abc]* = 0 to ∞ of anything but lowercase a, b, or c
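A JavaScript sketch of these patterns in use. One caveat worth seeing in practice: because the dot in `\.?$` is optional, that fragment on its own matches every line, so it only becomes useful as the tail of a longer pattern. The word “here” in front of it is an assumption made for this example.

```javascript
const lettersAndDigits = /^[a-zA-Z0-9]+$/;
const optionalDotAlone = /\.?$/;      // matches any input: the dot may be absent
const hereThenMaybeDot = /here\.?$/;  // "here" at the end, optionally followed by .
const noLowercaseAbc = /^[^abc]*$/;

const patternChecks = [
  lettersAndDigits.test("abc123"),           // true
  lettersAndDigits.test("abc 123"),          // false: contains a space
  optionalDotAlone.test("anything at all!"), // true
  hereThenMaybeDot.test("john was here."),   // true
  hereThenMaybeDot.test("john was here"),    // true
  hereThenMaybeDot.test("john was here!"),   // false
  noLowercaseAbc.test("xyz 123 ABC"),        // true: uppercase ABC is allowed
  noLowercaseAbc.test("cab"),                // false
  noLowercaseAbc.test(""),                   // true: * allows zero characters
];

console.log(patternChecks);
```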
Problems
Unicode vs ASCII
The regular-expression language is older than Unicode
Many newer engines support Unicode
Minor extensions to the language are required for full Unicode support
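As one illustration of such an extension, JavaScript adds a `u` flag and `\p{…}` property classes. The sketch below uses only that engine; other engines handle Unicode differently, so treat it as an example rather than a general rule.

```javascript
const accentedWord = "caf\u00e9"; // "café", with é as a single code point
const emoji = "\u{1F600}";        // one character outside the 16-bit range

const asciiWordOnly = /^\w+$/;       // \w is limited to A-Z, a-z, 0-9, and _
const unicodeLetters = /^\p{L}+$/u;  // \p{L} = any Unicode letter (needs the u flag)

const asciiResult = asciiWordOnly.test(accentedWord);    // false
const unicodeResult = unicodeLetters.test(accentedWord); // true

// Without u, "." sees the emoji as two 16-bit units; with u, as one character.
const dotWithoutU = /^.$/.test(emoji); // false
const dotWithU = /^.$/u.test(emoji);   // true

console.log(asciiResult, unicodeResult, dotWithoutU, dotWithU);
```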
Options
RegExp Engines typically have options
ignoreCase
saves you from doing [Aa] for each
global
keeps searching after each match until the end of the input; by default, the engine stops at the 1st match (useful for replace-all)
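In JavaScript these two options are written as the `i` and `g` flags after the pattern. A short sketch of how they change a replace; the sentence and the replacement name are invented for the example.

```javascript
const sentence = "John met john and JOHN.";

// No flags: case-sensitive, stops after the first match.
const firstOnly = sentence.replace(/john/, "Joan");

// i (ignoreCase): matches "John" too, but still only the first match.
const firstAnyCase = sentence.replace(/john/i, "Joan");

// g + i (global, ignoreCase): every match in the input, any case.
const allAnyCase = sentence.replace(/john/gi, "Joan");

console.log(firstOnly);    // John met Joan and JOHN.
console.log(firstAnyCase); // Joan met john and JOHN.
console.log(allAnyCase);   // Joan met Joan and Joan.
```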
Options
multiline
Most grep-style tools break up the input into lines:
At the end of each line, the matcher resets for the next line
With multiline, the input is searched as a whole instead of line by line, and ^ and $ still refer to the beginning and end of each line inside it
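How this plays out depends on the engine. As one concrete case, JavaScript takes the whole input as a single string, and its `m` flag decides whether `^` and `$` apply to every line or only to the very start and end of the input. The sample text is made up.

```javascript
const threeLines = "first line\njohn was here\nlast line";

// Without m, ^ only matches at the very start of the whole input.
const startWithoutM = /^john/.test(threeLines); // false

// With m, ^ also matches right after each line break.
const startWithM = /^john/m.test(threeLines);   // true

// With m, $ matches at the end of every line, not just the end of the input.
const lineEndCount = threeLines.match(/line$/gm).length; // 2

// The grep-style alternative: split into lines and test each one separately.
const matchingLines = threeLines.split("\n").filter((line) => /^john/.test(line));

console.log(startWithoutM, startWithM, lineEndCount, matchingLines);
```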
/Common Use/
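/pattern/ marks a regular expression, similar to “quotes” on strings
if you write the pattern as a “string” instead, you must escape each backslash:
/\d\d/ (2-digit pattern written between slashes)
vs
“\\d\\d” (the same 2-digit pattern written as a string)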
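In JavaScript, for example, the slash form is a regular-expression literal and the quoted form goes through the `RegExp` constructor. The sketch below shows that both spellings produce the same pattern, and what happens if the backslashes are not doubled inside the string.

```javascript
const fromLiteral = /\d\d/;               // written between slashes
const fromString = new RegExp("\\d\\d");  // same pattern, backslashes doubled
const forgotEscape = new RegExp("\d\d");  // the string "\d\d" is just "dd"

console.log(fromLiteral.source);  // \d\d
console.log(fromString.source);   // \d\d
console.log(forgotEscape.source); // dd

const literalFindsDigits = fromLiteral.test("room 42");   // true
const stringFindsDigits = fromString.test("room 42");     // true
const brokenFindsDigits = forgotEscape.test("room 42");   // false
const brokenFindsLetters = forgotEscape.test("add");      // true: it looks for "dd"
```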