Introducing UNIX and Linux


Awk

Formatted output

You will recall the use of printf as a shell utility for displaying information on standard output in a format you specify. For instance

printf "Hello %s!\n" $LOGNAME

will print on your screen

Hello chris!

printf never displays a Newline unless explicitly instructed to do so. The shell utility printf takes a number of arguments: the first is a string specifying the format of the output, the second and subsequent arguments are data (such as values of variables) to be displayed according to the specification given by the format string.

In Awk there is also a command called printf, which is almost identical to that used by the shell. The only major difference is that the arguments are separated by commas, not by whitespace. Try this script:

{ printf "The first field is %s\n", $1 }
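One way to try the script without creating a file is to pipe a single line of text into awk. This is only a sketch: the input line and the shell variable name result are made up for illustration.

```shell
# Hypothetical one-line input; $1 is its first whitespace-separated field.
result=$(echo "alpha beta gamma" | awk '{ printf "The first field is %s\n", $1 }')
echo "$result"
```

Note the comma between the format string and $1, which the shell version of printf would not use.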

Worked example

Write an Awk script which, when given input in two columns representing a person's first name followed by their family name, such as

Abraham Lincoln
John Kennedy
Benjamin Disraeli

will reverse the order of the names, and separate them with a comma:

Lincoln, Abraham
Kennedy, John
Disraeli, Benjamin

Solution: Using $1 and $2 to represent the first name and the family name of each person, display them using printf thus:

{ printf "%s, %s\n", $2, $1 }

Alternatively, using print, this would be

{ print $2 ", " $1 }
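To check that the two solutions agree, they can be run side by side on the sample names. In the sketch below the variable names (names, with_printf, with_print) are assumptions chosen for illustration.

```shell
# The three-line sample input from the worked example.
names='Abraham Lincoln
John Kennedy
Benjamin Disraeli'

# Version 1: printf with a format string.
with_printf=$(echo "$names" | awk '{ printf "%s, %s\n", $2, $1 }')

# Version 2: print, joining the strings by writing them next to each other.
with_print=$(echo "$names" | awk '{ print $2 ", " $1 }')

echo "$with_printf"
```

Both variables should hold the same three lines, beginning with Lincoln, Abraham.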

Before we can experiment much further with Awk, we need some data. Consider the problem of a grocery bill - you have purchased some vegetables that are priced per kilogram, and you buy a number of kilograms of various different vegetables. Create a file containing in column 1 the names of vegetables, in column 2 the price per kilogram, and in column 3 the number of kilograms purchased, something like:

potatoes 0.50 5
carrots 0.80 2.5
peas 2.20 1
beans 2.10 2
artichokes 8.50 0.5
sweetcorn 0.90 3

Name this file vegetables. We shall use this file, and Awk, to perform tasks such as totalling the cost for each vegetable, and evaluating the total bill.

Recall that when using printf to format an integer you use the format specifier %d; for a floating-point number the specifier is %f. You can also require a floating-point number to be displayed to a specific precision: if you include a dot followed by a number between the % symbol and the f, the floating-point number will be displayed with that number of digits after the decimal point. So we could copy the file vegetables using

{ printf "%s %.2f %.1f\n", $1, $2, $3 }

Try this, using

awk '{ printf "%s %.2f %.1f\n", $1, $2, $3 }' < vegetables

Note what happens when you have a whole number as one of the last two columns - it is printed with the relevant number of decimal places containing zeros:

potatoes 0.50 5.0
carrots 0.80 2.5
peas 2.20 1.0
beans 2.10 2.0
artichokes 8.50 0.5
sweetcorn 0.90 3.0
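The effect of each specifier can be seen by formatting the same value three ways. This is a small sketch using a made-up input value of 2.5; the variable name formats is an assumption.

```shell
# %d keeps only the whole-number part, %f defaults to six decimal
# places, and %.1f asks for exactly one.
formats=$(echo "2.5" | awk '{ printf "%d %f %.1f\n", $1, $1, $1 }')
echo "$formats"
```

This should display 2 2.500000 2.5.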

Awk can perform simple arithmetic, using the same operators and conventions as bc; the operators are listed under 'Operators used by Awk'. To evaluate the number of seconds in a day and print it out, the following would suffice:

awk '{ print 24*60*60 }'

Try it - but remember that this will be done for each line of input, so if you pipe the contents of a file to this command, the output will have the same number of lines as the input, each line being the number 86400. If you just wish to do an arithmetic calculation, use bc.
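The once-per-line behaviour is easy to see with a few lines of made-up input. The sketch below also shows a BEGIN block, a standard Awk pattern not described in this section, whose action runs once before any input is read; the variable names are assumptions.

```shell
# Three lines of hypothetical input produce three copies of the result.
per_line=$(printf 'one\ntwo\nthree\n' | awk '{ print 24*60*60 }')
echo "$per_line"

# A script containing only a BEGIN block runs once and reads no input.
once=$(awk 'BEGIN { print 24*60*60 }')
echo "$once"
```

The first command prints 86400 three times, the second prints it once.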

Worked example

Write an Awk script to reformat the data in vegetables as follows:

I bought 5.0 kilos of potatoes at 50p per kilo
I bought 2.5 kilos of carrots at 80p per kilo
 ...

Solution: Use printf with the %f specifier to display the 'number of kilos' field to one decimal place, and calculate the number of pence per kilo as 100 times the price in pounds. Since the pence per kilo is an integer, use the %d format specifier.

{ printf "I bought %.1f kilos of %s at %dp per kilo\n", $3, $1, 100*$2 }
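As a quick check, the script can be run on the first two lines of the data without using the file. This is a sketch only: the input is supplied inline, and the variable name bought is an assumption.

```shell
# First two lines of the vegetables data, supplied inline.
bought=$(printf 'potatoes 0.50 5\ncarrots 0.80 2.5\n' |
  awk '{ printf "I bought %.1f kilos of %s at %dp per kilo\n", $3, $1, 100*$2 }')
echo "$bought"
```

The two lines displayed should match the sample output given in the question.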

If you wish to do floating-point arithmetic in Awk, and your script contains some whole numbers, then Awk will automatically convert those integers to floating-point numbers when it is sensible to do so. Thus 1/2 will evaluate to 0.5. Similarly, if Awk is expecting a field to be a string, and receives a number as input instead, that number will be treated as a string of digits (together with decimal point or minus sign, if appropriate).
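Both conversions can be seen with the same pair of fields. In this sketch the input 1 2 is made up, and the variable name converted is an assumption: dividing the fields treats them as numbers, while writing them next to each other treats them as strings and joins them.

```shell
# $1/$2 is numeric division; $1 $2 is string concatenation.
converted=$(echo "1 2" | awk '{ print $1/$2; print $1 $2 }')
echo "$converted"
```

This displays 0.5 on the first line and 12 on the second.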

Worked example

Write an Awk script which uses the data in vegetables to calculate the total amount of money spent on each vegetable, displaying it in the following format:

potatoes cost 2.50
carrots cost 2.00
 ...

Solution: We can calculate the total cost for each vegetable by multiplying the second and third fields together.

{ printf "%s cost %.2f\n", $1, $2*$3 }
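A self-contained way to run this solution is to supply the data in a here-document rather than in the file vegetables. This is a sketch; the variable name costs is an assumption.

```shell
# The vegetables data, supplied inline so no file is needed.
costs=$(awk '{ printf "%s cost %.2f\n", $1, $2*$3 }' <<'EOF'
potatoes 0.50 5
carrots 0.80 2.5
peas 2.20 1
beans 2.10 2
artichokes 8.50 0.5
sweetcorn 0.90 3
EOF
)
echo "$costs"
```

The first two lines displayed are potatoes cost 2.50 and carrots cost 2.00, as in the question.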

Earlier on, we used cut to extract fields from lines of input. You may find it easier to use Awk in some instances.

Worked example

Display the current year.

Solution: We could use date and pipe the output to cut, as before, or we could use a format argument with date. Another method is to pipe the output of date to awk, using awk to print out the sixth field.

date | awk '{ print $6 }'

For comparison, the other two methods would be written:

date | cut -d' ' -f6
date +"%Y"

It is up to you to decide which one you think is clearest.
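One practical difference is worth seeing. By default awk treats any run of spaces as a single separator, whereas cut -d' ' counts every space. The sketch below uses a made-up line in the style of date output (the exact format of date varies between systems, so this layout is an assumption), with a single-digit day padded by two spaces.

```shell
# Hypothetical date output with a single-digit day, padded with two spaces.
sample='Tue Mar  5 14:07:09 GMT 2002'

year_awk=$(echo "$sample" | awk '{ print $6 }')
field_cut=$(echo "$sample" | cut -d' ' -f6)

echo "awk: $year_awk"
echo "cut: $field_cut"
```

On this sample awk prints 2002, but cut prints GMT, because the two adjacent spaces create an empty field that shifts the numbering.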


Copyright © 2002 Mike Joy, Stephen Jarvis and Michael Luck