Introducing UNIX and Linux


Character-to-character transformation

Translating a file so that specific characters are replaced by others can be accomplished with tr. This command takes as arguments two strings, which may consist of any number of individual characters, ranges and character classes. If both strings are the same length, each instance of a character in the first string is replaced by the corresponding character in the second. The command tr can only be used as a filter: it reads standard input and cannot take a filename as an argument. For example, to capitalise all the lower-case letters in the input we would have:

tr "a-z" "A-Z"

or alternatively

tr "[:lower:]" "[:upper:]"

Try this out using just standard input and standard output. To capitalise all the words in /usr/dict/words you would have:

tr "[:lower:]" "[:upper:]" < /usr/dict/words
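As a quick check of the points above, the following sketch runs tr once at the end of a pipe and once with its input redirected from a file. The sample text, the variable names and the use of mktemp to create a scratch file are assumptions made for illustration, chosen so that the example does not depend on a word list being installed.

```shell
# Translate text arriving on a pipe.
shout=$(printf 'hello, world\n' | tr "[:lower:]" "[:upper:]")
echo "$shout"

# tr takes no filename argument, so a file is supplied by redirection.
# (mktemp is assumed to be available; any scratch file would do.)
tmpfile=$(mktemp)
printf 'alpha\nbeta\n' > "$tmpfile"
from_file=$(tr "a-z" "A-Z" < "$tmpfile")
rm -f "$tmpfile"
echo "$from_file"
```

The first echo should show the sample line in capitals, and the second the two lines of the scratch file in capitals.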

Worked example

Write a filter to replace all digits by blank spaces.
Solution: Use [:digit:] to represent the digits as the first argument to tr, and a string of ten blank spaces, one for each digit, as the second.

tr "[:digit:]" "          "

The second argument to tr must not be shorter than the first. If the second argument is longer than the first, the excess characters in the second argument are disregarded, so that in the filter

tr "a-z" "A-Z123"

the characters 1, 2 and 3 in the second string are never used. The two arguments to tr are strings; as usual, if the strings contain whitespace they must be quoted, and the standard conventions for quoted strings are used. So for a filter to replace all blanks in the input with a B, you could have:

tr ' ' 'B'
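The two filters just shown can be tried on a line of sample input. This is only a sketch: the input strings and variable names are made up for the purpose.

```shell
# The excess characters 1, 2 and 3 in the second string have no partner
# in the first string, so digits in the input pass through unchanged.
upper=$(printf 'abc 123\n' | tr "a-z" "A-Z123")
echo "$upper"

# A quoted blank as the first string; each blank becomes a B.
blanks=$(printf 'one two three\n' | tr ' ' 'B')
echo "$blanks"
```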

Remember that between double quotes the characters $, ` and \ keep their special meanings, so they must be escaped if they are to be passed to tr literally. If neither string argument to tr includes characters requiring quoting, then the quotes are not needed. The following three filters are equivalent:

tr a-z A-Z
tr "a-z" "A-Z"
tr 'a-z' 'A-Z'

The strings given to tr as arguments do not need quoting when they contain no characters that the shell would interpret. It may be helpful to quote them anyway, and from now on we will always quote strings. This has two benefits: firstly, it reminds you that some characters in the strings may need to be escaped, and secondly it makes it easier to see where the two strings start and finish.
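A short sketch may help to confirm this. It runs the three equivalent filters on the same input, and then shows a case where quoting is needed because the string contains a character the shell would otherwise expand. The sample strings and variable names are assumptions for illustration only.

```shell
sample='quoting test'

# No characters here are special to the shell, so all three forms agree.
one=$(printf '%s\n' "$sample" | tr a-z A-Z)
two=$(printf '%s\n' "$sample" | tr "a-z" "A-Z")
three=$(printf '%s\n' "$sample" | tr 'a-z' 'A-Z')
echo "$one"
echo "$two"
echo "$three"

# An unquoted * would be expanded by the shell into filenames,
# so here the quotes are required.
times=$(printf '2*3*4\n' | tr '*' 'x')
echo "$times"
```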

Worked example

Write a filter to replace all double quotes by single quotes.
Solution: The tricky part of this example is to specify the strings correctly. The first string is a double quote, but in order for it not to be interpreted by the shell, it must either be preceded by a \ or enclosed by single quotes. The second string, a single quote, must likewise either be escaped with a \ or enclosed in double quotes. Either of the following two filters will solve the problem.

tr '"' "'"
tr \" \'
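To see that the two filters behave identically, they can be run on the same line of input. The sample text and variable names below are chosen only for illustration.

```shell
# Each string is protected by the other kind of quote.
swapped=$(printf 'say "hello"\n' | tr '"' "'")
echo "$swapped"

# Each string is protected by a backslash instead.
escaped=$(printf 'say "hello"\n' | tr \" \')
echo "$escaped"
```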

We can specify a string comprising a number of instances of a single character: "[X*5]" is the same as "XXXXX". The notation "[X*]", used as a component of the second string, expands to as many copies of the character X as are needed to make the second string as long as the first. For instance, to replace all digits with a question mark, you could use either of the following:

tr "0-9" "[?*10]"
tr "0-9" "[?*]"
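Both forms can be compared on a sample line, as in the sketch below. The input text and variable names are invented for the example.

```shell
# Explicit repeat count: ten question marks, one for each digit.
masked_fixed=$(printf 'tel 555-0199\n' | tr "0-9" "[?*10]")
echo "$masked_fixed"

# No count: tr supplies as many question marks as the first string needs.
masked_auto=$(printf 'tel 555-0199\n' | tr "0-9" "[?*]")
echo "$masked_auto"
```

Note that the double quotes also stop the shell from treating ?, * and the square brackets as filename patterns.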

Worked example

Write a filter to replace all letters in the first half of the alphabet by A and all in the second half by Z.
Solution: Use tr, and note that there are 13 letters in the first half of the alphabet, each having an upper-case and a lower-case character. Thus the first half of the alphabet is represented by a set of 26 characters.

tr "A-Ma-mN-Zn-z" "[A*26][Z*26]"
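As an illustration, the filter can be applied to a short line of mixed-case text. The sample input and the variable name are assumptions; characters that are not letters do not appear in the first string and so are left alone.

```shell
# A-M and a-m (26 characters) map to A; N-Z and n-z (26 characters) map to Z.
halves=$(printf 'Hello, World\n' | tr "A-Ma-mN-Zn-z" "[A*26][Z*26]")
echo "$halves"
```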

There are also options available to tr; with option -d ('delete') and only one string as argument, all occurrences of characters specified by that string are deleted. With option -c ('complement') as well as -d all characters not occurring within the string are deleted.

Worked example

Write a filter to delete all non-letter characters from the input.
Solution: Use tr with option -c to specify all non-alphabetic characters, and -d to delete them.

tr -cd "A-Za-z"

Alternatively, use character classes:

tr -cd "[:alpha:]"
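One consequence worth seeing in practice is that the newline character is not a letter, so it is deleted as well and the output arrives as a single unbroken line. The sketch below shows this, together with a variant that keeps newlines by adding \n to the string. The sample input and variable names are made up for the example.

```shell
# Everything except letters is deleted, including blanks and newlines.
letters=$(printf 'R2-D2 & C-3PO\n' | tr -cd "A-Za-z")
echo "$letters"

# Adding \n to the string keeps the line structure of the input.
kept=$(printf 'a1\nb2\n' | tr -cd "A-Za-z\n")
echo "$kept"
```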

After all other changes have been performed, repeated instances of a character specified in the final string argument can be replaced by single instances of the same character using option -s ('squeeze'). When -s is used with a single string, that string specifies the characters on which this operation is performed. So to replace multiple spaces by single ones:

echo "hello   there    Chris" | tr -s " "
hello there Chris
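When -s is given two strings, the translation is done first and the squeezing is then applied to the characters of the second string. The following sketch, with sample input and variable names chosen only for illustration, shows both steps happening in one filter.

```shell
# Translate to upper case, then squeeze repeated upper-case letters.
squeezed=$(printf 'aabbccdd\n' | tr -s "a-z" "A-Z")
echo "$squeezed"

# Turn each run of blanks into a single newline: one word per line.
words=$(echo "hello   there    Chris" | tr -s " " "\n")
echo "$words"
```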


Copyright © 2002 Mike Joy, Stephen Jarvis and Michael Luck