Introducing UNIX and Linux


Files

Non-text files

We must address the question of what a file contains if it is not a text file. Clearly we cannot use the text utilities described above: not only will the file not be neatly split up into lines, but the characters contained within it will in general not be printable. The file command gives us a rough indication of what sort of data (binary or text) a file contains, but no more. If we need to know exactly which bytes are contained in a file that is not printable because it is in, for example, binary format, od will give us precisely that information. Thinking of a file as a sequence of bytes, od lists each byte in a representation that can be printed. The name stands for octal dump, and by default it lists the bytes by their octal (base 8) codes, grouped into short words (typically 2 bytes each).
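As a minimal sketch of these two commands, we can build a tiny binary file ourselves and inspect it. The file name sample.bin and the four bytes written to it are chosen purely for illustration, and the file command is skipped if it is not installed.

```shell
workdir=$(mktemp -d)
sample="$workdir/sample.bin"            # hypothetical file name

# Write four bytes, given by their octal codes (\n is octal 012).
printf '\201\003\n\013' > "$sample"

# file gives only a rough classification of the contents.
if command -v file >/dev/null 2>&1; then
    file "$sample"
fi

od "$sample"                            # default format: octal words
bytes=$(od -An -t o1 "$sample")         # one octal code per byte, no offset column
echo $bytes                             # unquoted, so the spacing is normalised

rm -rf "$workdir"
```

The -t o1 form asks for one octal code per byte, which is often easier to read than the default grouping; -An suppresses the offset column.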

Computers use binary code internally, and in the past, when it was necessary to examine data, it was often not possible to display that data in any way other than as a representation of binary numbers. One of the simplest ways of doing this was to group the bits (binary digits) together in sequences of 3, consider each 3-bit sequence as representing a digit in base 8, and print out the data as a string of octal digits. Hence we get the phrase octal dump.
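The grouping can be checked by hand for a single value. The following sketch uses the number 129 as an arbitrary example and relies only on printf and shell arithmetic.

```shell
# 129 in binary is 10000001; grouped in threes from the right this is
# 10 000 001, which reads as the octal digits 2, 0 and 1.
octal=$(printf '%o' 129)     # convert decimal to octal
decimal=$((0201))            # a leading 0 marks an octal constant in shell arithmetic

echo "129 decimal is $octal octal"
echo "201 octal is $decimal decimal"
```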

A more useful way to generate output is with the option -t c, whereby each byte is printed in one of three ways: as the character itself (if it is printable), as a backslash followed by a character (if a standard escape sequence is known, such as \n for the newline character), or otherwise as a 3-digit octal number that is the code for that byte. For instance,

od -t c bintest
0000000 201 003  \n 013  \0 001 200  \0
0000010  \0  \0   @  \0  \0  \0 251 230
0000020  \0  \0  \0  \0  \0  \0        
0000030  \0  \0  \0  \0  \0  \0  \0  \0
0000040 274 020      \0 320 003 240   @
0000050 222 003 240   D 225   *     002
0000060 224 002 240 004 224 002   @  \n
0000070 027  \0  \0   h 324   " 343 240
0000100 003  \0  \0  \b 302  \0   b  \b
  ...

We see that the first byte in the file has code 201 in octal (which is 129 in decimal). The third byte is a newline character. Just for comparison, a file called hellotest, containing one line that is simply the word Hello, would be displayed thus:

od -t c hellotest
0000000   H   e   l   l   o  \n
0000006
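To try this for ourselves, the following sketch recreates hellotest in a temporary directory (the directory is an assumption made here so that no existing file is overwritten) and dumps it.

```shell
workdir=$(mktemp -d)
printf 'Hello\n' > "$workdir/hellotest"

od -t c "$workdir/hellotest"                        # character dump with offsets

count=$(wc -c < "$workdir/hellotest" | tr -d ' ')   # size of the file in bytes
chars=$(od -An -t c "$workdir/hellotest")           # same dump, no offset column
echo "bytes in file: $count"

rm -rf "$workdir"
```

The final line of the od output is the offset just past the last byte, written in octal, so it also tells us the size of the file; here the file holds 6 bytes.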

The od command has several other options, which we do not list here.

If you just want to examine a binary file quickly, and see what printable strings exist within it, use the strings command. This can be useful if you have compiled a program, such as one written in C, and that program contains strings whose values are of interest to you (filenames, for instance). Going through the binary code with od would be tedious.
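As a small illustration, we can fake a binary file by surrounding two filenames with unprintable bytes. The name demo.bin and its contents are invented for this sketch, and the example does nothing if strings is not installed.

```shell
workdir=$(mktemp -d)

# Unprintable bytes around two pieces of text.
printf '\001\002\003/etc/passwd\000\377\376config.dat\000\004' > "$workdir/demo.bin"

if command -v strings >/dev/null 2>&1; then
    found=$(strings "$workdir/demo.bin")    # printable runs, one per line
    printf '%s\n' "$found"
else
    found=
    echo "strings is not installed on this system" >&2
fi

rm -rf "$workdir"
```

By default strings reports only runs of at least 4 printable characters, so both filenames should be listed while the isolated bytes around them are ignored.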

A useful command we introduce at this point is touch. This command changes the date and time at which its file arguments were last modified, setting them to the current date and time. If any of the files named as arguments do not currently exist, they are created with size 0; touch is therefore useful for creating a file if you haven't yet decided what to put in it, but want it to be there. This might happen during the development phase of a program. It is also useful for indicating that a file is in some sense 'up-to-date'.
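Both effects of touch can be seen in a short sketch. The filename notes.txt is hypothetical, and a temporary directory is used so that nothing existing is disturbed.

```shell
workdir=$(mktemp -d)
placeholder="$workdir/notes.txt"     # hypothetical file name

touch "$placeholder"                 # does not exist yet, so it is created empty
size=$(wc -c < "$placeholder" | tr -d ' ')
echo "size after first touch: $size"

echo "some text" > "$placeholder"
touch "$placeholder"                 # exists now: only the modification time changes
contents=$(cat "$placeholder")
ls -l "$placeholder"                 # shows the updated date and time

rm -rf "$workdir"
```

Note that touching an existing file leaves its contents alone; only the timestamp is altered.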


Copyright © 2002 Mike Joy, Stephen Jarvis and Michael Luck