Introducing UNIX and Linux


Files


File archives and file compression

It will often be necessary to take a copy of a complete directory, either for the purpose of storing it in a safe place (a 'backup') in case the computer system 'crashes', or to send it to a different computer system. There are two particular problems that utilities such as cp are unable to address. First, different machines and operating systems store data in different formats, and it may be necessary to convert the format in which the files in the directory are stored. Second, cp does not preserve links between files.

Historically, two commands have been used for this: tar ('tape archive') and cpio ('copy in out'). Both work by copying all the files in the directory, together with data describing the structure of the directory, into a single file known as an archive. Unfortunately, tar and cpio work differently and produce archives in different formats. Although tar was used much more widely than cpio, it was decided to create a completely new command that would perform the functions of both, rather than to update tar so that it could also do everything cpio does.

Neither tar nor cpio became part of POSIX; instead, a new command, pax ('portable archive exchange'), was introduced. We give a couple of examples illustrating both pax and tar, but note that pax is not found on all Linux systems.

To create a new archive, give pax the argument -w ('write') or tar the argument -c ('create'). The archive is written to standard output (although some versions of tar write to a default tape device instead unless they are also given -f -). So to archive the contents of the current directory to the tape drive /dev/rst8, either of the following will work:

tar -cf - . >/dev/rst8
pax -w . >/dev/rst8

Alternatively, you can redirect the output to a file. To extract the contents of an archive, redirect the standard input of pax or tar from the archive and give pax the argument -r ('read') or tar the argument -x ('extract'). When unpacking an archive you will naturally not want to overwrite any files or directories that you have already created, so it is a good idea to check the contents of the archive first. The -t option to tar simply lists the names of the files in an archive; pax does the same when it is given neither -r nor -w.
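The create, list and extract steps can be tried out safely on an ordinary file rather than a tape. The following is a minimal sketch, assuming a POSIX shell, mktemp -d, and a tar that accepts -f - to mean standard input or output. The scratch directory and the names src, restore, restore2, a.txt, mydir.tar and mydir.pax are hypothetical, and the pax lines run only if pax is installed.

```shell
#!/bin/sh
# Sketch: archive a directory to a file, list the archive, then extract it
# into an empty directory. All names below are hypothetical examples.
set -e
work=$(mktemp -d)                 # scratch area, so nothing real is touched
mkdir "$work/src" "$work/restore"
echo hello >"$work/src/a.txt"

cd "$work/src"
tar -cf - . >"$work/mydir.tar"    # create: the archive goes to standard output
tar -tf - <"$work/mydir.tar"      # list: prints the names, extracts nothing

cd "$work/restore"                # an empty directory, so nothing is overwritten
tar -xf - <"$work/mydir.tar"      # extract: the archive comes from standard input

# The same three steps with pax, if it is available
if command -v pax >/dev/null 2>&1; then
    cd "$work/src"
    pax -w . >"$work/mydir.pax"   # -w: write an archive
    pax <"$work/mydir.pax"        # neither -r nor -w: list the contents
    mkdir "$work/restore2"
    cd "$work/restore2"
    pax -r <"$work/mydir.pax"     # -r: read (extract) the archive
fi
```

Extracting into a freshly made, empty directory is the simplest way to be sure that nothing you already have is overwritten.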

Having multiple copies of directories, whether 'real' or archived, is bound to take up space. If you have created an archive (mydir.pax, say), you can reduce its size by means of the command compress (not a POSIX command). This creates a file mydir.pax.Z (note the .Z suffix) and deletes mydir.pax. How much smaller mydir.pax.Z is depends on what the original file contains, but typically the compressed file is between a half and a fifth of the original size. For example:

ls
mydir.pax
wc -c mydir.pax
206336 mydir.pax
compress mydir.pax
ls
mydir.pax.Z
wc -c mydir.pax.Z
89473 mydir.pax.Z
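As a quick check of the figures in the transcript against the range quoted above, divide the compressed size by the original size. A one-line sketch using awk (the variable names orig and comp are only for illustration):

```shell
orig=206336    # wc -c figure for mydir.pax in the transcript
comp=89473     # wc -c figure for mydir.pax.Z
awk -v o="$orig" -v c="$comp" 'BEGIN { printf "%.3f\n", c / o }'    # prints 0.434
```

So in this example the compressed archive is a little under half the size of the original.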

To reverse the compression, use the command uncompress: uncompress mydir.pax.Z recreates mydir.pax and deletes mydir.pax.Z. If you have stored any large files that you do not use on a regular basis, you may wish to compress them.
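A compress and uncompress round trip can be sketched as follows. This assumes both commands are installed; on many Linux systems they come from an optional package, so the sketch exits quietly if compress is missing. The scratch directory and the file names big.txt and orig.txt are hypothetical, and the repetitive test data is chosen because it compresses well.

```shell
#!/bin/sh
# Sketch: compress a file, then restore it with uncompress.
set -e
command -v compress >/dev/null 2>&1 || { echo "compress not installed" >&2; exit 0; }
work=$(mktemp -d)
cd "$work"

# Build a highly repetitive (and so very compressible) test file
i=0
while [ "$i" -lt 2000 ]; do
    echo "the same line of text, over and over"
    i=$((i + 1))
done >big.txt
cp big.txt orig.txt               # keep a copy to compare against later

compress big.txt                  # replaces big.txt with big.txt.Z
wc -c orig.txt big.txt.Z          # compare the sizes
uncompress big.txt.Z              # recreates big.txt and removes big.txt.Z
cmp big.txt orig.txt && echo "round trip OK"
```

Because compress deletes the original, the sketch keeps a separate copy only so that cmp can confirm the restored file is identical.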

Worked example

Copy the contents of your current directory to /tmp/backup, preserving all links.
Solution: Use pax -w to create a new archive and store it in a temporary file, then create /tmp/backup, change directory to /tmp/backup, and read the archive with pax -r.

pax -w . >/tmp/backup.pax
mkdir /tmp/backup
cd /tmp/backup
pax -r </tmp/backup.pax
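To see that links really do survive, the worked example can be repeated on a small scratch directory containing a hard link and a symbolic link. This is a minimal sketch under stated assumptions: the directory and file names (src, backup, a, b, c) are hypothetical, it falls back to tar if pax is not installed, and it relies on the -ef test (true when two names refer to the same file), which common shells such as bash and dash support.

```shell
#!/bin/sh
# Sketch: check that archiving and extracting preserves hard and symbolic links.
set -e
work=$(mktemp -d)
mkdir "$work/src"
cd "$work/src"
echo data >a
ln a b            # hard link: b is another name for the same file as a
ln -s a c         # symbolic link pointing at a

# Archive with pax if available, otherwise fall back to tar
if command -v pax >/dev/null 2>&1; then
    pax -w . >"$work/backup.pax"
    mkdir "$work/backup"
    cd "$work/backup"
    pax -r <"$work/backup.pax"
else
    tar -cf - . >"$work/backup.tar"
    mkdir "$work/backup"
    cd "$work/backup"
    tar -xf - <"$work/backup.tar"
fi

# a and b should still be one file; c should still be a symbolic link
[ a -ef b ] && echo "hard link preserved"
[ -L c ] && echo "symbolic link preserved"
```

A plain cp of the same directory would, by contrast, produce two separate files for a and b.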


Copyright © 2002 Mike Joy, Stephen Jarvis and Michael Luck