sed regular expression

Posted on 15-Apr-2016


Contents

To execute sed from a file

Sed regular expressions

Using the file u3, do the following using sed, displaying the result on the screen:

1. Output only the lines that contain cow. Answer: sed -n '/cow/p' u3

2. Delete any line that contains cow. Answer: sed '/cow/d' u3

3. Change the first instance of * on each line to "!". Answer: sed 's/\*/!/' u3

4. Change all occurrences of * on each line to "!". Answer: sed 's/\*/!/g' u3

5. Output only the lines that contain either cow or calf. Answer: sed -n -E '/cow|calf/p' u3 (the form sed -n -e '/cow/p' -e '/calf/p' u3 prints a line twice when it contains both words)

6. Output the file after changing cow to COW on lines 10-20. Answer: sed '10,20s/cow/COW/g' u3

7. Output the entire file except lines 1-20. Answer: sed '1,20d' u3

8. Delete any lines containing the string "news". Answer: sed '/news/d' u3
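The answers above can be tried without the real u3 file, whose contents the slides do not show. The sketch below assumes a made-up four-line stand-in (the function name u3_sample and its lines are hypothetical) and also shows why the two -e form of exercise 5 prints a line twice.

```shell
# Hypothetical stand-in for u3: the real file's contents are not shown in the slides.
u3_sample() {
  printf '%s\n' 'the cow jumped *over* the moon' \
                'a calf and a cow' \
                'no animals here' \
                'read the news * today'
}

u3_sample | sed -n '/cow/p'                   # 1. only lines containing cow
u3_sample | sed '/cow/d'                      # 2. every line except those with cow
u3_sample | sed 's/\*/!/'                     # 3. first * on each line becomes !
u3_sample | sed 's/\*/!/g'                    # 4. every * becomes !
u3_sample | sed -n -E '/cow|calf/p'           # 5. each matching line printed once
u3_sample | sed -n -e '/cow/p' -e '/calf/p'   # 5'. "a calf and a cow" comes out twice
u3_sample | sed '/news/d'                     # 8. drop lines containing news
```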

Given a file named file that contains:

line 1 (one)
line 2 (two)
line 3 (three)

9. Command: sed -e '1,2s/line/LINE/' file

Output:
LINE 1 (one)
LINE 2 (two)
line 3 (three)

10. Command: sed -e '1,2d' file

Output: line 3 (three)

11. Command: sed -e '3d' file

Output:
line 1 (one)
line 2 (two)

12. Write a script to insert

13. Write a script to change
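To reproduce the three command examples, the sketch below recreates the three-line sample in a temporary file. It assumes mktemp is available (the text simply calls the file file); using a temporary file avoids clobbering a real file of that name.

```shell
# Recreate the three-line sample in a temporary file.
sample=$(mktemp)
printf '%s\n' 'line 1 (one)' 'line 2 (two)' 'line 3 (three)' > "$sample"

sed -e '1,2s/line/LINE/' "$sample"   # substitute only on lines 1-2
sed -e '1,2d' "$sample"              # delete lines 1-2, leaving line 3
sed -e '3d' "$sample"                # delete line 3, leaving lines 1-2
```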

Sed from a file

If your sed script is getting long, you can put it into a file, like so:

# This file is named "sample.sed"
# (some older seds accept comments only at the start of the script)
s/color/colour/g
s/flavor/flavour/g
s/theater/theatre/g

Then call sed with the "-f" flag:

sed -f sample.sed filename

Or, you can make an executable sed script:

#!/usr/bin/sed -f
# This file is named "sample2.sed"
s/color/colour/g
s/flavor/flavour/g
s/theater/theatre/g

Then give it execute permission (the #! line assumes sed is installed as /usr/bin/sed; adjust it if not): chmod u+x sample2.sed

And then call it like so: ./sample2.sed filename
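A quick way to check a script file: the sketch below writes the three substitutions into a temporary file (standing in for sample.sed; mktemp is assumed to be available) and runs it with -f on piped input instead of a named file.

```shell
# Write the script from the text into a temporary file standing in for sample.sed.
script=$(mktemp)
cat > "$script" <<'EOF'
# sample.sed: American to British spellings
s/color/colour/g
s/flavor/flavour/g
s/theater/theatre/g
EOF

# -f reads the commands from the file; the input text comes from a pipe here.
echo 'What color is the theater? Any flavor.' | sed -f "$script"
```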

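In sed's default basic regular expressions (BRE), several operators only work when written with a backslash; without it, the character matches itself:

curly braces \{ \} (intervals),

round brackets \( \) (grouping),

plus \+ and question mark \? (GNU extensions).

The star is the exception: * is an operator on its own, and \* matches a literal asterisk. With sed -E (extended regular expressions), { } ( ) + and ? are operators without backslashes.

Special characters   Usage

^   Matches the beginning of the line

$   Matches the end of the line

.   Matches any single character

*   Matches zero or more occurrences of the preceding character

\+   Matches one or more occurrences of the preceding character (GNU)

\?   Matches zero or one occurrence of the preceding character (GNU)

[ ]   Matches any one character enclosed in [ ]

[^ ]   Matches any one character not enclosed in [ ]

(character)\{m,n\}   Matches m to n repetitions of (character)

(character)\{m,\}   Matches m or more repetitions of (character)

(character)\{,n\}   Matches n or fewer (possibly 0) repetitions of (character) (GNU)

(character)\{n\}   Matches exactly n repetitions of (character)

\(expression\)   Group operator. Also stores the matched text for back-references \1 \2 ... \9

\n   Back-reference: matches the text captured by the nth group

&   In the replacement part of s, stands for the whole matched text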
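The grouping, back-reference, & and interval entries are easiest to see on one-line inputs. The sketch below uses made-up sample strings; every command is plain BRE syntax.

```shell
# \( \) captures text; \1 and \2 replay it in the replacement (swap two words).
echo 'hello world' | sed 's/\(hello\) \(world\)/\2 \1/'   # -> world hello
# & in the replacement stands for the whole match.
echo 'cow' | sed 's/cow/[&]/'                              # -> [cow]
# \{2\}: exactly two a's are replaced; the third is left alone.
echo 'aaa' | sed 's/a\{2\}/X/'                             # -> Xa
# \1 inside the pattern is a back-reference: here it matches a repeated word.
echo 'the the cat' | sed 's/\([a-z]*\) \1/\1/'             # -> the cat
```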

Regular expression in sed

Regular Expressions (character classes)

The following POSIX character classes are shorthand for common sets of characters. They are used inside a bracket expression, for example [[:digit:]] or [^[:space:]].

[:alnum:] Alphanumeric characters

[:alpha:] Alphabetic characters

[:blank:] Space and tab characters

[:cntrl:] Control characters

[:digit:] Numeric characters

[:graph:] Printable and visible (non-space) characters

[:lower:] Lowercase characters

[:print:] Printable characters (includes the space character)

[:punct:] Punctuation characters

[:space:] Whitespace characters

[:upper:] Uppercase characters

[:xdigit:] Hexadecimal digits
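A small illustration of the classes in use, assuming an arbitrary sample string. Note the double brackets: the outer pair is the bracket expression, the inner [:name:] is the class.

```shell
# Each class is written inside a bracket expression.
echo 'Room 101, Floor 3' | sed 's/[[:digit:]]/#/g'    # -> Room ###, Floor #
echo 'Room 101, Floor 3' | sed 's/[^[:alnum:]]//g'    # -> Room101Floor3
echo 'Room 101, Floor 3' | sed 's/[[:upper:]]/_/g'    # -> _oom 101, _loor 3
```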

The '^' character means the beginning of the line. Example: sed 's/^Thu /Thursday/' filename

will turn "Thu " into "Thursday", but only at the beginning of the line.

Example: sed -e '/^#/d' filename deletes every line that begins with #, such as comment lines.

Example: /[Uu]nix/!d deletes lines that do not contain "unix" or "Unix".

6d deletes line 6

/^$/d deletes all blank lines

1,10d deletes lines 1 through 10

1,/^$/d deletes from line 1 through the first blank line

/^$/,$d deletes from the first blank line through the last line of the file (/^$/,/$/d would stop at the very next line, because /$/ matches every line)

/^$/,10d deletes from the first blank line through line 10

/^Co*t/,/[0-9]$/d deletes from the first line that begins with Ct, Cot, Coot, Cooot, etc. through the next line that ends with a digit
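The address forms above can be compared side by side on a small input. The sketch assumes a hypothetical six-line document (the function name doc is illustrative) and includes the /^$/,/$/ pitfall for contrast.

```shell
# Hypothetical input: a two-line header, a blank line, then body text.
doc() { printf '%s\n' 'Header one' 'Header two' '' 'body: Unix' 'body: linux' 'body: unix 2'; }

doc | sed '1,/^$/d'       # drop the header through the first blank line
doc | sed '/^$/,$d'       # drop from the blank line to the end (keeps the header)
doc | sed '/^$/,/$/d'     # pitfall: /$/ matches the very next line, so only 2 lines go
doc | sed '/[Uu]nix/!d'   # keep only lines containing unix or Unix
doc | sed '/^$/d'         # drop blank lines
```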

`[a-zA-Z0-9]'

This matches any single letter or digit.

`[^a-z A-Z]' This matches any single character that is not a letter or a space.

Repetition using *

* means 0 or more of the previous single-character pattern.

[abc]* matches "aaaaa" or "acbca"

Hi Dave.* matches "Hi Dave" or "Hi Daveisgoofy"

0*10 matches "010" or "0000010" or "10"

Repetition using +

+ means 1 or more of the previous single-character pattern. The examples below use extended syntax (sed -E); in sed's default basic syntax, GNU sed writes this operator as \+.

[abc]+ matches "aaaaa" or "acbca"

Hi Dave.+ matches "Hi Dave." or "Hi Dave...."

0+10 matches "010" or "0000010" but does not match "10"

a\+b\+ matches one or more `a's followed by one or more `b's: `ab' is the shortest possible match, but other examples are `aaaab' or `abbbbb' or `aaaaaabbbbbbb'.

? Repetition Operator

? means 0 or 1 of the previous single-character pattern (sed -E syntax; GNU basic syntax: \?).

x[abc]?x matches "xax" or "xx"

A[0-9]?B matches "A1B" or "AB" but does not match "a1b" or "A123B"
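The repetition examples can be checked directly. This sketch uses -E for + and ? (accepted by GNU and BSD sed); the last line uses the GNU-only basic-syntax spelling and is labeled as such.

```shell
# * is an operator in sed's default basic syntax (BRE).
echo '0000010' | sed 's/0*10/X/'           # -> X
# + and ? are extended syntax: -E makes them operators.
echo 'acbca' | sed -E 's/[abc]+/X/'        # -> X
echo 'xx xax' | sed -E 's/x[abc]?x/Y/g'    # -> Y Y
echo '10' | sed -n -E '/^0+10$/p'          # prints nothing: + needs at least one 0
echo 'A123B' | sed -n -E '/A[0-9]?B/p'     # prints nothing: at most one digit allowed
# GNU sed only: the same operators in basic syntax are written \+ and \?.
echo 'aaaab' | sed 's/a\+b\+/Z/'           # -> Z with GNU sed
```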

`.\{9\}A$'

This matches an A that is the last character on line, with at least nine preceding characters.

`^.\{15\}A'

This matches an A that is the 16th character on a line.

sed G myfile.txt > newfile.txt

In the above example, the G command double-spaces the file myfile.txt (G appends an empty line after every line), and the output is redirected to newfile.txt.

sed = myfile.txt | sed 'N;s/\n/. /'

The above example outputs each line of myfile.txt preceded by its line number, a period, and a space: the first sed prints each line number on a line of its own, and the second (N) joins each number with the line that follows it. As with the first example, the output can be redirected to another file using > and a file name.

sed 's/test/example/g' myfile.txt > newfile.txt

Reads myfile.txt, replaces every occurrence of the string "test" (including inside longer words such as "testing") with "example", and writes the result to newfile.txt.

sed -n '$=' myfile.txt

This command counts the number of lines in myfile.txt and prints the result.
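All four commands can be tried on a tiny hypothetical input (the function name input is illustrative) instead of myfile.txt; piping replaces the file argument.

```shell
# Hypothetical three-line input.
input() { printf '%s\n' 'alpha' 'beta' 'gamma'; }

input | sed G                                  # a blank line after every line (6 lines)
input | sed = | sed 'N;s/\n/. /'               # 1. alpha / 2. beta / 3. gamma
echo 'test testing' | sed 's/test/example/g'   # -> example exampleing
input | sed -n '$='                            # -> 3, like wc -l
```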

Regular Expressions (cont…)

/^M.*/   Line begins with capital M, 0 or more chars follow

/..*/   At least 1 character long (/.+/ with sed -E, or /.\+/ in GNU sed, means the same thing)

/^$/   The empty line

ab|cd   Either 'ab' or 'cd' (sed -E syntax; GNU basic syntax: ab\|cd)

a(b*|c*)d   Matches a string beginning with the letter a, followed by either zero or more of the letter b or zero or more of the letter c, followed by the letter d (sed -E syntax)

[[:space:][:alnum:]]   Matches any character that is either a whitespace character or alphanumeric.
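Each row of the table can be exercised on a few hypothetical sample lines (the function words and its lines are made up). Rows that use | and ( ) as operators need -E here.

```shell
# Hypothetical sample lines; the last one is empty.
words() { printf '%s\n' 'Monday' 'ad' 'abbd' 'accd' 'abcd' ''; }

words | sed -n '/^M.*/p'                          # -> Monday
words | sed -n '/..*/p' | sed -n '$='             # -> 5 (the non-empty lines)
words | sed -n '/^$/='                            # -> 6 (line number of the empty line)
words | sed -n -E '/^a(b*|c*)d$/p'                # -> ad, abbd, accd
echo 'xabx xcdx' | sed -E 's/ab|cd/_/g'           # -> x_x x_x
echo 'a b-c' | sed 's/[[:space:][:alnum:]]/./g'   # -> ...-.
```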

Note:

Sed's matching is greedy: of the matches that start at the leftmost possible position, sed always takes the longest. How would you match a tag in an HTML document?
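One common answer to the question is to stop the match at the first closing bracket by using a negated bracket expression instead of .*. The sketch below assumes a one-line sample; it does not handle tags split across lines.

```shell
html='<b>bold</b> text'
echo "$html" | sed 's/<.*>//'        # greedy: removes "<b>bold</b>", leaves " text"
echo "$html" | sed 's/<[^>]*>//g'    # one tag at a time: leaves "bold text"
```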

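Grouping with parens

• If you put a subpattern inside parens, you can apply + * and ? to the entire subpattern. The example below uses sed -E syntax; in sed's basic syntax the parens are written \( \), as in a\(bc\)*d.

a(bc)*d matches "ad" and "abcbcd" but does not match "abcxd" or "bcbcd"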

9. Append three exclamation points to the end of each line in u3 that contains student.

10. Repeat the previous command, but only output the lines that you change.

11. If you wanted to actually change the original file for questions #3, 4, 6, 7, and 9, how would you do it?

Answers:

9. sed '/student/s/$/!!!/' u3

10. sed -n '/student/s/$/!!!/p' u3

11. Save the output of the sed command in a temporary file and then use the mv command to rename it to the original. Never redirect output to the same file you are using for input within the same command or pipeline: the shell truncates an output file such as u3 before sed starts reading it. Example (#9):

sed '/student/s/$/!!!/' u3 > xxx
mv xxx u3
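The temporary-file approach from answer 11 can be wrapped in a small helper. The function name append_bangs is hypothetical, and mktemp is assumed to be available; since mv replaces the file, the result may take on the temporary file's permissions (often owner-only) rather than the original's.

```shell
# Hypothetical helper: apply exercise 9 to a file "in place" via a temporary file.
append_bangs() {
  tmp=$(mktemp) || return 1
  # Read from "$1", write to the temp file; only replace "$1" if sed succeeded.
  sed '/student/s/$/!!!/' "$1" > "$tmp" && mv "$tmp" "$1"
}

demo=$(mktemp)
printf '%s\n' 'student1 ok' 'teacher' > "$demo"
append_bangs "$demo"
cat "$demo"
```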

6. Change all occurrences of cow to "cows and cows" using the parenthesis operators and \1 substitution. Answer: sed 's/\(cow\)/\1s and \1s/g' u3

Using the file u3, do the following using sed, displaying the result on the screen

1. Output only the lines that contain MCIS. Answer: sed -n '/MCIS/p' u3

2. Delete any line that contains mcis. Answer: sed '/mcis/d' u3

3. Change the first instance of * on each line to "!". Answer: sed 's/\*/!/' u3

4. Change all occurrences of * on each line to "!". Answer: sed 's/\*/!/g' u3

5. Output only the lines that contain either MCIS or VLSI.

Answer: sed -n -E '/MCIS|VLSI/p' u3

6. Output the file after changing mcis to MCIS on lines 10-20.

Answer: sed '10,20s/mcis/MCIS/g' u3

7. Output the entire file except lines 1-20.

Answer: sed '1,20d' u3

8. Delete any lines containing the string "news".

Answer: sed '/news/d' u3


11. Write a sed script that takes two words and a file name as input from the user. Let the inputs be word1, word2, and filename. Write scripts to do the following:

To insert word2 at every place word1 is present in the file "u3".

Answer:

#!/bin/sh
printf 'Enter the string to which the new string is to be appended: '
read string1
printf 'Enter the string to append: '
read string2
printf 'Enter the filename: '
read filename
# i\ inserts string2 as a new line before every line that contains string1
sed "/$string1/i\\
$string2" "$filename"
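For testing without prompts, the two possible readings of the exercise can be written as functions that take word1, word2 and the file name as arguments. Both function names are hypothetical, and the sketch assumes neither word contains "/" or other regex or replacement special characters.

```shell
# Hypothetical non-interactive versions: $1 = word1, $2 = word2, $3 = file name.
insert_line_before() {  # the answer's approach: i\ adds word2 as a new line before each match
  sed "/$1/i\\
$2" "$3"
}
append_after_word() {   # alternative reading: put word2 right after each occurrence of word1
  sed "s/$1/&$2/g" "$3"
}

sample=$(mktemp)
printf '%s\n' 'a cow here' 'nothing' > "$sample"
insert_line_before cow NEW "$sample"
append_after_word cow s "$sample"
```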


Print only lines of 65 characters or longer:

sed -n '/^.\{65\}/p'

Print only lines of less than 65 characters:

sed -n '/^.\{65\}/!p'   # method 1, corresponds to above

sed '/^.\{65\}/d'   # method 2, simpler syntax

Print line number 52

sed -n '52p' # method 1

sed '52!d' # method 2

Print the section of a file between two regular expressions (inclusive):

sed -n '/Iowa/,/Montana/p' # case sensitive

Print all of a file EXCEPT the section between two regular expressions:

sed '/Iowa/,/Montana/d'
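These one-liners read standard input when no file is given, so they can be checked on hypothetical sample lines (the function states and the 70-character line are made up for the test).

```shell
# Hypothetical input for the range and line-number commands.
states() { printf '%s\n' 'Ohio' 'Iowa start' 'middle' 'Montana end' 'Texas'; }

states | sed -n '/Iowa/,/Montana/p'   # Iowa start, middle, Montana end
states | sed '/Iowa/,/Montana/d'      # Ohio, Texas
states | sed -n '3p'                  # middle (method 1)
states | sed '3!d'                    # middle (method 2)

# One 70-character line and one short line for the length tests.
long=$(printf '%070d' 0)
printf '%s\n' "$long" short | sed -n '/^.\{65\}/p'   # only the 70-character line
printf '%s\n' "$long" short | sed '/^.\{65\}/d'      # only "short"
```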

The q or quit command

There is one more simple command that can restrict the changes to a set of lines: the "q" command, quit.

Another way to duplicate the head command (print the first 10 lines) is: sed '10 q'

which prints line 10 and then quits. Note that q prints the current line before quitting (unless -n is given), so sed '11 q' prints 11 lines.

This command is most useful when you wish to abort the editing after some condition is reached.

The "q" command accepts at most one address; it does not take a range of addresses.

Relationships between d, p, and !

As you may have noticed, there are often several ways to solve the same problem with sed. This is because print and delete are opposite functions, and it appears that "!p" is similar to "d", while "!d" is similar to "p". I wanted to test this, so I created a 20-line file and tried every combination. The following table, which shows the results, demonstrates the difference:

Relations between d, p, and !

Sed      Range   Command   Results
--------------------------------------------------------
sed -n   1,10    p         Print first 10 lines
sed -n   11,$    !p        Print first 10 lines
sed      1,10    !d        Print first 10 lines
sed      11,$    d         Print first 10 lines
--------------------------------------------------------
sed -n   1,10    !p        Print last 10 lines
sed -n   11,$    p         Print last 10 lines
sed      1,10    d         Print last 10 lines
sed      11,$    !d        Print last 10 lines
--------------------------------------------------------
sed -n   1,10    d         Nothing printed
sed -n   1,10    !d        Nothing printed
sed -n   11,$    d         Nothing printed
sed -n   11,$    !d        Nothing printed
--------------------------------------------------------
sed      1,10    p         Print first 10 lines twice, then next 10 lines once
sed      11,$    !p        Print first 10 lines twice, then last 10 lines once
--------------------------------------------------------
sed      1,10    !p        Print first 10 lines once, then last 10 lines twice
sed      11,$    p         Print first 10 lines once, then last 10 lines twice
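Several rows of the table can be re-checked on a 20-line input. This sketch assumes the seq utility is available and writes each range and command together in one quoted script (for example '11,$!p'), which is how they are typed on a command line.

```shell
# A 20-line input of the numbers 1-20 (seq is assumed to be available).
seq 20 | sed -n '1,10p'             # first 10 lines
seq 20 | sed -n '11,$!p'            # first 10 lines
seq 20 | sed '1,10!d'               # first 10 lines
seq 20 | sed '11,$d'                # first 10 lines
seq 20 | sed -n '11,$p'             # last 10 lines
seq 20 | sed -n '1,10d'             # nothing: -n suppresses the automatic print
seq 20 | sed '1,10p' | sed -n '$='  # 30: lines 1-10 twice, 11-20 once
```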

Obviously the command

sed '1,10 q'

cannot quit 10 times. Instead

sed '1 q' or sed '10 q'

is correct.
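The behavior of q with a single address can be confirmed on numbered lines (seq is assumed to be available). The third line shows that the output q produces is the normal automatic print, which -n suppresses.

```shell
seq 20 | sed 10q       # lines 1-10, like head
seq 20 | sed '11 q'    # lines 1-11: q prints line 11 before quitting
seq 20 | sed -n 10q    # nothing: -n suppresses the print that q would do
```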

1. Delete lines that contain "O" at the beginning of the line.

Answer: sed '/^O/d' list.txt

2. Translate capital C,R,O into small c,r,o

Answer: sed 'y/CRO/cro/' list.txt

3. Delete empty lines

Answer: sed '/^$/d' list.txt

4. Remove lines containing anything other than letters, numbers, or spaces

Answer: sed '/[^0-9a-zA-Z ]/d' list.txt
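Since list.txt itself is not shown, the sketch below assumes a hypothetical five-line stand-in (the function name list_txt is illustrative) to show what each answer removes or changes.

```shell
# Hypothetical stand-in for list.txt.
list_txt() { printf '%s\n' 'Oranges' 'CROW 42' '' 'no-dash?' 'plain text 7'; }

list_txt | sed '/^O/d'              # drops "Oranges"
list_txt | sed 'y/CRO/cro/'         # "Oranges" -> "oranges", "CROW 42" -> "croW 42"
list_txt | sed '/^$/d'              # drops the empty line
list_txt | sed '/[^0-9a-zA-Z ]/d'   # drops "no-dash?"; the empty line has no bad character
```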

Specifying a Range of Characters with [...]

If you want to match specific characters,

you can use the square brackets to identify the exact characters you are searching for.

The pattern that will match any line of text that consists of exactly one digit is

^[0123456789]$

This is verbose.

You can use the hyphen between two characters to specify a range: ^[0-9]$

You can intermix explicit characters with character ranges.

This pattern will match a single character that is a letter, number, or underscore:

[A-Za-z0-9_]

If you wanted to search for a word that

started with a capital letter "T",
was the first word on a line,
had a lowercase second letter,
was exactly three letters long,
and had a vowel as its third letter,

the regular expression would be "^T[a-z][aeiou] " (the trailing space marks the end of the three-letter word).

Delete all lines NOT beginning with 'a', 'e', 'E', or 'I'.

Answer: sed '/^[^aeEI]/d' list.txt (empty lines are kept, since they have no first character to test)

You can easily search for all characters except those in square brackets by putting a "^" as the first character after the "[".

To match all characters except vowels, use "[^aeiou]".
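The bracket-expression examples from this section, run on made-up inputs: a single-digit line, the "T" word pattern (note the trailing space inside the pattern), negation for deleting lines, and negation inside a substitution.

```shell
printf '%s\n' 5 42 x 7 | sed -n '/^[0-9]$/p'                              # 5 and 7
printf '%s\n' 'Tea time' 'To go' 'Tiara x' | sed -n '/^T[a-z][aeiou] /p'  # Tea time
printf '%s\n' apple cat Eagle dog '' | sed '/^[^aeEI]/d'                  # apple, Eagle, empty line
echo 'education' | sed 's/[^aeiou]//g'                                    # -> euaio
```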


Repetition using *


Let's look at another example: /a*bc[e-g]*[0-9]*/ matches:

aaaaabcfgh19919234
bc
abcefg123456789
abc45
Aabcggg87310

d*avid matches avid, david, ddavid, dddavid, and any other word with zero or more d's followed by avid

Compress all consecutive sequences of zeroes into a single zero.

Answer: s/00*/0/g
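The * examples and the zero-compression answer, checked on made-up inputs. 00* is the BRE way to say "one or more zeros" without relying on \+ or -E.

```shell
printf '%s\n' bc abc45 ac | sed -n '/a*bc[e-g]*[0-9]*/p'     # bc and abc45 (ac has no "bc")
printf '%s\n' avid david ddavid vid | sed -n '/^d*avid$/p'   # avid, david, ddavid
echo '100020003' | sed 's/00*/0/g'                           # -> 10203
```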


? Repetition Operator


`a\?b' Matches `b' or `ab'.

Match any character with .

The character "." is one of those special meta-characters.

By itself it will match any character, except the end-of-line character.

The pattern that will match a line with a single character is ^.$

• Any character (except a metacharacter!) matches itself.

• The "." character matches any character except newline. (GNU sed's "." also matches a newline embedded in the pattern space, for example after the N command.)

"F." matches an 'F' followed by any character.

"a.b" matches 'a' followed by any one character followed by 'b'.

If you really want to match '.', you can use "\.":

a\.b matches "a.b" but not "axb"
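A quick comparison of the unescaped and escaped dot on the same three made-up lines, plus the single-character line pattern.

```shell
printf '%s\n' 'a.b' 'axb' 'ab' | sed -n '/a.b/p'    # a.b and axb
printf '%s\n' 'a.b' 'axb' 'ab' | sed -n '/a\.b/p'   # a.b only
printf '%s\n' 'x' 'xy' '' | sed -n '/^.$/p'         # x: exactly one character
```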

Matching a specified number of the pattern using the curly brackets {}

Using {n}, we match exactly that number of the previous expression.

If we want to match 'aaaa', we could use a\{4\} (written a{4} with sed -E). This matches exactly four a's.

If we want to match the pattern 1999 in our file bazaar.txt, we would do: sed -n '/19\{3\}/p' bazaar.txt. This prints all lines containing the pattern 1999 in the bazaar.txt file. (Without -n, sed would print every line, and the matching lines twice.)

The following expression matches a minimum of four a's and a maximum of 10 a's: a\{4,10\} Let's say we wanted to match any character a minimum of 3 times, but a maximum of 7 times; then we could use a regular expression like: .\{3,7\}

`\{I\}' As `*', but matches exactly I sequences (I is a decimal integer; for portability, keep it between 0 and 255 inclusive).

`\{I,J\}' Matches between I and J sequences, inclusive.

`\{I,\}' Matches I or more sequences.

`.\{9\}A$'

This matches nine characters followed by an `A' at the end of the line.

`^.\{15\}A'

This matches the start of a string that contains 16 characters, the last of which is an `A'.
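The interval examples can be checked with made-up inputs; printf's zero padding is used here only to build strings of a known length.

```shell
printf '%s\n' 'born 1999' 'born 1990' | sed -n '/19\{3\}/p'   # born 1999
echo 'aaaaa' | sed 's/a\{4\}/X/'                             # -> Xa
echo 'abcdefghijA' | sed -n '/.\{9\}A$/p'                    # printed: A ends the line after 10 chars
printf '%015dA\n' 0 | sed -n '/^.\{15\}A/p'                  # printed: A is the 16th character
printf '%014dA\n' 0 | sed -n '/^.\{15\}A/p'                  # nothing: A is only the 15th character
echo 'aaaa' | sed -E 's/a{4}/X/'                             # -> X (no backslashes with -E)
```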

`\(REGEXP\)'

Groups the inner REGEXP as a whole. This is used to apply postfix operators, like `\(abcd\)*': this will search for zero or more whole sequences of `abcd', while `abcd*' would search for `abc' followed by zero or more occurrences of `d'. Note that support for `\(abcd\)*' is required by POSIX 1003.1-2001, but many non-GNU implementations do not support it and hence it is not universally portable.

`REGEXP1\|REGEXP2'

Matches either REGEXP1 or REGEXP2 (a GNU extension; with sed -E, write REGEXP1|REGEXP2).

Use parentheses to build complex alternative regular expressions.

The matching process tries each alternative in turn, from left to right, and the first one that succeeds is used.

`N'

Add a newline to the pattern space, then append the next line of input to the pattern space.

If there is no more input, sed exits without processing any more commands (GNU sed prints the pattern space before exiting, unless -n is given).
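A minimal illustration of N joining pairs of lines. An even number of input lines is used so the result does not depend on how a particular sed handles N at the end of input.

```shell
# N appends the next line; s then replaces the embedded newline with a space.
printf '%s\n' a b c d | sed 'N;s/\n/ /'   # -> "a b" then "c d"
```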

File spacing:

Double-space a file:

sed G filename

Insert a blank line below every line that matches "regex":

sed '/regex/G'

Count lines (emulates "wc -l"):

sed -n '$='
