Introducing UNIX and Linux


Awk

Arrays

Most high-level languages include arrays. In Awk, every array is an associative array: a named collection of variables, each of which is identified by an index. An array index can be a string or a number. For example, we might have an array called daysin consisting of 12 variables, indexed by the names of the months of the year. These 12 variables would have names daysin["January"], daysin["February"], and so on up to daysin["December"].
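As a small illustration of the two kinds of index, the following sketch stores one element under a string index and one under a numeric index. The function name index_demo and the array quarter are invented for this example. A numeric index is converted to a string when it is used, so quarter[1] and quarter["1"] refer to the same element.

```sh
# index_demo: hypothetical name, used only for this illustration
index_demo() {
  awk 'BEGIN {
    daysin["March"] = 31        # string index
    quarter[1] = "January"      # numeric index (quarter is a made-up array)
    # 1 and "1" name the same element, so this prints: 31 January
    print daysin["March"], quarter["1"]
  }'
}

index_demo
```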

Worked example

Write an Awk script to read as input a sequence of lines, each containing the name of a month. Output should be the name of the month read in followed by the number of days in it. For instance, for input

March
November

we would have as output

March has 31 days
November has 30 days

Solution: Use an array indexed by the names of the months, so that each array element has as its value the number of days in the month that is its index. At the start of the script, the array must be initialised.

BEGIN {   # Initialise the array daysin
   daysin["January"] = 31;   daysin["February"] = 28
   daysin["March"] = 31;     daysin["April"] = 30
   daysin["May"] = 31;       daysin["June"] = 30
   daysin["July"] = 31;      daysin["August"] = 31
   daysin["September"] = 30; daysin["October"] = 31
   daysin["November"] = 30;  daysin["December"] = 31
   }
# For each input line, output month name and no. of days
   { printf "%s has %d days\n", $1, daysin[$1] }

Note that we can place multiple Awk statements on a single line by separating them with semicolons. Try this example. If you enter a month name that is incorrectly spelled, Awk will find that the element of the array with that index has not been assigned a value, and will treat it as 0 (or as the empty string where a string is expected), so the script reports 0 days for that month.
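A misspelled month is easier to spot if the script checks whether the index exists before using it. The sketch below is one possible variation, not part of the original solution: it wraps the script in a shell function (the name days_in is only illustrative) and uses Awk's in operator, which tests for an index without creating the element. The wording of the error message is an assumption.

```sh
# days_in: hypothetical wrapper name, not part of the original script
days_in() {
  awk '
    BEGIN {   # Initialise the array daysin
      daysin["January"] = 31;   daysin["February"] = 28
      daysin["March"] = 31;     daysin["April"] = 30
      daysin["May"] = 31;       daysin["June"] = 30
      daysin["July"] = 31;      daysin["August"] = 31
      daysin["September"] = 30; daysin["October"] = 31
      daysin["November"] = 30;  daysin["December"] = 31
    }
    # Known month: output month name and no. of days, then skip to next line
    ($1 in daysin) { printf "%s has %d days\n", $1, daysin[$1]; next }
    # Anything else: report it (message wording is illustrative)
    { printf "%s is not a known month\n", $1 }
  '
}

printf 'March\nMarhc\n' | days_in
```

With this input the first line is reported as having 31 days and the second is flagged as unknown rather than shown with 0 days.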

Returning to our shopping expedition, we may wish to store the data on each vegetable to be used later on. For example, if we purchased several bags of potatoes at different shops, we would need to enter several lines starting with potatoes. The scripts we have written already cannot total the costs for potatoes; they just total the cost of each item on each line of input, that is, for each separate purchase. What we could do is to have an array costs indexed by the names of the vegetables, which we can update each time a new line of data is read in:

{ costs[$1] += $2*$3 }

The symbol += indicates that the variable on the left of the symbol has its value updated by adding to it the number on the right of the symbol. At the start of the script, we would not initialise costs, since we do not at that point know the names of the vegetables to be mentioned in the input. When the first line of vegetables is read in, which is

potatoes 0.50 5

the following action is performed:

costs["potatoes"] += 0.50*5

The value of costs["potatoes"] starts off at 0, since it begins uninitialised, and its value is increased by 2.50.
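To watch the total build up, the following sketch prints the running value of costs[$1] after each line. The function name running_totals and the second input line (2 kilos at 0.60) are made up for the illustration; only the first line comes from the example above.

```sh
# running_totals: hypothetical name, used only for this illustration
running_totals() {
  awk '
    { costs[$1] += $2*$3      # an unset element counts as 0 on first use
      printf "%s running total %.2f\n", $1, costs[$1] }
  '
}

# The second purchase is invented sample data
printf 'potatoes 0.50 5\npotatoes 0.60 2\n' | running_totals
```

The first line takes the total from 0 to 2.50, and the second adds 1.20 to give 3.70.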

Just as in the shell, Awk contains for loops. In fact, Awk allows several types of for loop. One of these allows you to loop through an array and pick out those indices that have been used. The order in which the indices are visited is not specified, so do not rely on it. The for statement looks like:

for (variable in array) statement

So, we could examine the values of the elements of costs for all indices by using

for (veg in costs) printf "%s costs %.2f\n",
            veg, costs[veg]

A complete Awk script for totalling the costs for all vegetables would then be

{ costs[$1] += $2*$3 }
END  { for (veg in costs)
             printf "%s costs %.2f\n", veg, costs[veg] }
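Because a for ... in loop visits the indices in no particular order, the lines of output may appear in any order. One simple way to get a stable listing, sketched here under the assumption that alphabetical order is wanted, is to pipe the output through sort. The function name total_costs and the carrots and second potatoes lines are invented sample data.

```sh
# total_costs: hypothetical wrapper name, used only for this illustration
total_costs() {
  awk '
    { costs[$1] += $2*$3 }
    END { for (veg in costs)
            printf "%s costs %.2f\n", veg, costs[veg] }
  ' | sort    # sort gives a predictable, alphabetical order
}

printf 'potatoes 0.50 5\ncarrots 0.30 2\npotatoes 0.60 2\n' | total_costs
```

For this input the listing shows carrots at 0.60 followed by potatoes at 3.70.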

Worked example

Calculate the average cost per kilo for each vegetable.
Solution: The total cost and the total weight for each vegetable must be calculated.

# Use arrays costs and weights to store the total costs
#   and total weight for each vegetable.
{ costs[$1] += $2*$3; weights[$1] += $3 }

# At the end, for each vegetable, divide its total costs
#   by the total weight, and output the value
END { for (veg in costs)
        printf "%s: %.2f pence per kilo\n",
           veg, costs[veg]/weights[veg] }
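If a vegetable's total weight happens to be 0, the division above fails, since Awk treats division by zero as an error. A defensive variant might test the weight first, as in this sketch. The function name average_costs, the sample data, and the wording of the messages are all assumptions, and the currency unit is left out of the output because it depends on the input data.

```sh
# average_costs: hypothetical wrapper name, used only for this illustration
average_costs() {
  awk '
    # Accumulate total cost and total weight for each vegetable
    { costs[$1] += $2*$3; weights[$1] += $3 }

    END { for (veg in costs) {
            if (weights[veg] > 0)
              printf "%s: %.2f per kilo\n", veg, costs[veg]/weights[veg]
            else    # avoid dividing by zero
              printf "%s: no weight recorded\n", veg
          }
        }
  ' | sort    # for ... in order is unspecified, so sort the result
}

# Invented sample data; the leeks line has a weight of 0
printf 'potatoes 0.50 5\nleeks 0.40 0\ncarrots 0.30 2\npotatoes 0.60 2\n' |
  average_costs
```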

There is a special array ENVIRON which contains all the (exported) shell environment variables. To display the value of your PATH, the following Awk statement could be used:

printf "%s\n", ENVIRON["PATH"];
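The effect of exporting can be seen with a short experiment. In this sketch the variable names SHOP and LOCAL_ONLY are made up: the first is exported and so appears in ENVIRON, while the second is an ordinary shell variable and does not. The statement is placed in a BEGIN action so that Awk runs it without waiting for input.

```sh
# SHOP and LOCAL_ONLY are made-up names used only for this demonstration
SHOP=market
export SHOP               # exported: visible to awk through ENVIRON
LOCAL_ONLY=hidden         # not exported: awk never sees it

awk 'BEGIN {
  printf "[%s]\n", ENVIRON["SHOP"]
  printf "[%s]\n", ENVIRON["LOCAL_ONLY"]
}'
```

The first line of output shows the exported value between the brackets; the second shows empty brackets.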

Copyright © 2002 Mike Joy, Stephen Jarvis and Michael Luck