Introducing UNIX and Linux


Overview
Using filters
      Collating sequence
      Character classes
Character-to-character transformation
Selecting lines by content
      Regular expressions
      Basic regular expressions
      Extended regular expressions
      Grep
Stream editor
      Sed addresses
Splitting a file according to context
Choosing between the three filters
More on Vi
Summary
Exercises

Stream editor

Whereas Grep selects lines from input and copies those lines to standard output, Sed will in addition change those lines if required. Just as with Grep, Sed takes a script either as an argument, or from a file by using option -f, and filters its input according to the instructions in that script. For Grep, the script consists simply of one or more BREs, and the output is formed of those lines of input matching one or more of those BREs. For Sed, the behaviour is more complex. Each Sed instruction is of the form

address command arguments

where the address and the arguments are optional. The address indicates which lines of the input the command is to be performed on; an instruction with no address applies to every line.

Actually, we need to be slightly more precise than this. Each time a line of input is read, it is first of all stored in an area called the pattern space. The instructions forming the script are then examined one by one, in order, and each instruction whose address matches the current input line has its command applied to whatever is currently in the pattern space. When all the instructions in the script have been examined, the contents of the pattern space are copied to the standard output, and the pattern space is emptied ready for the next input line. This is repeated for each line of input until the input is exhausted.

The simplest Sed script is the script containing nothing; since there are no instructions, an input line is copied to the pattern space which is then immediately copied to the standard output. Try it:
sed ''
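To try the empty script without typing at the terminal, you can pipe a few lines into it instead; the sample text below is made up. Every line comes out unchanged:

```shell
# An empty script: each line passes through the pattern space untouched.
printf 'first line\nsecond line\n' | sed ''
# prints:
# first line
# second line
```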
Addresses for Sed come in several forms, as itemised in the section on Sed addresses.

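In addresses, both ^ and $ can be used: inside a /BRE/ address they anchor the match to the start or end of the line, as in any BRE, while $ standing alone as an address denotes the last line of the input. To try some of these addresses, the easiest command we can use is probably d, which deletes the contents of the pattern space. So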

sed '1,4d'

deletes lines 1 to 4 inclusive from the input. Try the following using the standard input:

sed 'd'       deletes all the input
sed '3d'      deletes line 3 only
sed '2,$d'    deletes all lines except the first
sed '/^A/d'   deletes all lines beginning with A
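The same four scripts can also be tried non-interactively. This sketch feeds each of them a small, made-up input of four lines, two of which begin with A:

```shell
# Hypothetical sample input: four lines.
input='Apple
banana
Avocado
cherry'

printf '%s\n' "$input" | sed 'd'       # prints nothing
printf '%s\n' "$input" | sed '3d'      # prints Apple, banana and cherry, one per line
printf '%s\n' "$input" | sed '2,$d'    # prints Apple only
printf '%s\n' "$input" | sed '/^A/d'   # prints banana and cherry, one per line
```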

An often used command is s, which is used to exchange part of a line (specified by a BRE) with another string. This command is used as follows:

s/BRE/replacement/

The pattern space is searched from left to right to find an instance of the BRE. If none is found, no change is made, otherwise the string that matches the BRE is replaced by the replacement. Normally only the first occurrence of the BRE is altered, but if you follow the command with g ('global') then all matches for the BRE in the pattern space are changed. Note that after the change, the altered string stays in the pattern space and can then be changed by later Sed commands in the same script. So, for example,

sed 's/Chris/Sam/g'

changes all occurrences of Chris to Sam,

sed 's/^=/?/'

changes an equals sign at the start of a line to a question mark, and

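sed 's/[[:punct:]]//g'

removes all punctuation (equivalent to tr -d "[:punct:]"). In a BRE a character class such as [:punct:] is only recognised inside a bracket expression, hence the doubled brackets; tr takes [:punct:] on its own.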
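The difference made by g, and the fact that a later instruction sees the result of an earlier one, can be checked on a single made-up line. The last command uses option -e, which adds one instruction to the script each time it is given:

```shell
line='Chris met Chris.'

printf '%s\n' "$line" | sed 's/Chris/Sam/'      # Sam met Chris.   (first match only)
printf '%s\n' "$line" | sed 's/Chris/Sam/g'     # Sam met Sam.     (every match)
printf '%s\n' "$line" | sed 's/[[:punct:]]//g'  # Chris met Chris  (full stop removed)

# Two instructions on the same pattern space: the second
# substitution operates on the text left by the first.
printf '%s\n' "$line" | sed -e 's/Chris/Sam/g' -e 's/Sam met/Sam and/'   # Sam and Sam.
```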

Worked example

Write a Sed command to remove all whitespace that terminates lines.
Solution: The BRE [[:blank:]] matches a single blank character (a space or a tab), [[:blank:]]* matches any number of them, and [[:blank:]]*$ matches them only when they occur at the end of a line. To delete them we replace them by nothing.

sed 's/[[:blank:]]*$//'
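To check the command, you can feed it a made-up line ending in a space, a tab and another space, and use od -c to make the invisible characters visible:

```shell
# 'abc' followed by a space, a tab and a space.
printf 'abc \t \n' | sed 's/[[:blank:]]*$//' | od -c
# od -c lists only a, b, c and \n: the trailing blanks have gone.
```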

Although simple Sed commands are most commonly applied to every line of the input, you should also be comfortable restricting a command to particular lines by giving an address. Sometimes an editing problem can be solved either by a complex edit on every line of input or by a simple edit on only some of the input lines; the latter approach is preferable.

Worked example

Write a filter to precede each word in /usr/dict/words containing a capital letter by an asterisk.
Solution: Using Sed we can match lines containing such words by the BRE [A-Z]. Using this BRE to specify addresses, on those lines we can use s to substitute the start of each line (^) by a *:

sed '/[A-Z]/s/^/*/'

It is usual for Sed to be an element of a pipeline but, unlike tr, Sed can take a filename as argument, in which case the input will come from that file. So another solution would be:

sed '/[A-Z]/s/^/*/' /usr/dict/words
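A small made-up word list stands in for /usr/dict/words here. Setting LC_ALL=C is a precaution: depending on the locale's collating sequence, some implementations let a range such as [A-Z] match some lower-case letters as well, whereas in the C locale it means exactly the 26 letters A to Z:

```shell
# Hypothetical word list; the C locale keeps [A-Z] to upper-case ASCII letters.
printf 'apple\nBoston\nzebra\nMcIntosh\n' | LC_ALL=C sed '/[A-Z]/s/^/*/'
# prints:
# apple
# *Boston
# zebra
# *McIntosh
```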

If an ampersand (&) occurs in the replacement string, it is replaced by the string that has been matched; the following will enclose each capital letter in the input in square brackets:

sed 's/[A-Z]/[&]/g'

If you want an actual ampersand to occur in the replacement string, it must be escaped by preceding it with a backslash.
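Both uses of & can be seen on sample lines (the text is arbitrary; LC_ALL=C simply makes [A-Z] mean exactly the upper-case letters A to Z):

```shell
# Unescaped &: replaced by whatever the BRE matched.
printf 'Hi Bob\n' | LC_ALL=C sed 's/[A-Z]/[&]/g'   # [H]i [B]ob

# Escaped \&: a literal ampersand appears in the output.
printf 'salt pepper\n' | sed 's/ / \& /'           # salt & pepper
```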

If you give sed option -n ('noprint') then the pattern space will not automatically be sent to standard output; so sed -n '' will not give any output at all. We can use command p ('print') to copy the pattern space explicitly to standard output; so the following two commands are equivalent:

sed ''
sed -n 'p'
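The equivalence is easy to confirm by running both, plus sed -n '' for contrast, on the same made-up input:

```shell
printf 'one\ntwo\n' | sed ''        # one and two, one per line
printf 'one\ntwo\n' | sed -n 'p'    # exactly the same output
printf 'one\ntwo\n' | sed -n ''     # no output at all
```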

See what happens if you have just:

sed 'p'

We can use p to good effect if we wish to select only part of the input, so

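sed -n '15p'

will display line 15 of the input only, and

sed -n '1,10s/[[:alpha:]]//g;1,10p'

will display the first ten lines only, with all letters deleted; this script consists of two instructions separated by a semicolon, each with the address 1,10. Without -n, every other line of the input would be displayed as well. By using option -n, we can simulate simple use of Grep using Sed, since the following are equivalent: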

grep 'BRE'
sed -n '/BRE/p'
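With a short made-up input, you can check that -n combined with a line-number address or a /BRE/ address selects just the lines wanted, and that the Sed form gives the same result as Grep:

```shell
input='apple
berry
apricot
cherry'

printf '%s\n' "$input" | sed -n '2p'      # berry
printf '%s\n' "$input" | grep 'ap'        # apple and apricot
printf '%s\n' "$input" | sed -n '/ap/p'   # apple and apricot, as with grep
```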

Worked example

Write a filter to display the last line of the input, preceded by the words The last line is.
Solution: Use $ to match the last line of input, option -n of sed to ignore other lines in the input, and command p to print it out after substituting The last line is for the beginning of the line:

sed -n '$s/^/The last line is /p'
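Trying the solution on three sample lines: here p is a flag of the s command, which writes the pattern space to standard output only when a substitution has been made; since ^ always matches, that happens on every addressed line, which with the address $ is just the last one.

```shell
printf 'first\nsecond\nthird\n' | sed -n '$s/^/The last line is /p'
# prints: The last line is third
```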


Copyright © 2002 Mike Joy, Stephen Jarvis and Michael Luck