Command-line Basics: Compressing Files and Directories

joshtronic

Modern operating systems usually offer compression capabilities as part of their file manager. This wasn’t always the case, so it’s nice to see that graphical user interfaces have made strides to catch up with the command-line. You see, compression tools have long been available via the command-line, covering a wide variety of compression algorithms.

Getting started

We’re going to discuss how to compress files and directories using the command-line tools zip, gzip, and tar.

These commands tend to come standard issue on most Unix-like systems, such as macOS and Linux. If you are missing any of the tools discussed, consult the documentation for your favorite package manager to see if they are available.

Also, if you’d prefer to work with some dummy files instead of the files already available on your file system, you can run the following commands to create a few files and directories that will be referenced below in the examples:

$ mkdir /tmp/dir-of-files
$ touch /tmp/dir-of-files/{one,two,three,four,five}

The files are empty, so there won’t be much benefit to compressing them, but for the sake of example that should be just fine.

Tools of the trade

As mentioned, we’re going to use three different commands: zip, gzip, and tar. Each command is associated with its own file extension or extensions:

  • zip - *.zip files
  • gzip - *.gz files
  • tar - *.tar, *.tar.gz, *.tpz and *.tgz files

You may have noticed that *.gz showed up in association with tar. This is because both gzip and tar follow the Unix philosophy of doing one thing and doing it well.

For gzip that one thing is compressing a single file. tar on the other hand creates a single archive file out of multiple files (called a tape archive or “tarball”).

Because gzip compresses single files, and tar can create single files out of multiple files, it’s like they were meant to be. So much so that tar accepts arguments that allow it to then compress an archive with gzip resulting in a *.tar.gz or “gzipped tarball”.

Compressing a file

Compressing a single file is about as easy as it comes with zip and gzip.

For zip simply pass it the archive name, and the file you’d like to add to it:

$ zip one.zip /tmp/dir-of-files/one

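For gzip, you only specify the file. There is no archive name to pass, because gzip writes the compressed copy next to the original with .gz appended to the name. Also, be sure to include the -k or --keep argument, or your original file(s) will be replaced by the compressed version:

$ gzip -k /tmp/dir-of-files/one

This creates /tmp/dir-of-files/one.gz and leaves the original in place.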
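If you’d rather choose where the compressed copy lands, one option is gzip’s -c (or --stdout) argument, which writes the compressed data to standard output and leaves the original untouched. Here’s a minimal sketch. The workdir scratch directory, the sample text, and the one-copy.gz name are all made up for illustration:

```shell
# Hypothetical scratch directory so nothing else on your system is touched
workdir=$(mktemp -d)
printf 'hello, compression\n' > "$workdir/one"

# -c writes the compressed data to stdout; redirect it to any name you like
gzip -c "$workdir/one" > "$workdir/one-copy.gz"

# The original is still in place next to the compressed copy
ls "$workdir"

# -d decompresses, and -c again sends the result to stdout
gzip -dc "$workdir/one-copy.gz"
```

Because the output name comes from the shell redirect rather than from gzip itself, this form works the same way no matter where the source file lives.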

Compressing multiple files

Compressing multiple files with zip is nearly identical to compressing an individual file. All we need to do is pass in the multiple file names:

$ zip two-three.zip /tmp/dir-of-files/two /tmp/dir-of-files/three

# Or slightly more concise:
$ zip two-three.zip /tmp/dir-of-files/{two,three}

For gzip, things aren’t so easy. gzip will accept several file names, but it compresses each one into its own *.gz file. It has no way of bundling multiple files into a single archive.
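To see what gzip does when handed more than one file, here’s a quick sketch. It assumes a throwaway directory created with mktemp (the workdir variable is just an illustrative name) rather than the files in /tmp/dir-of-files:

```shell
# Hypothetical scratch directory with two empty files
workdir=$(mktemp -d)
touch "$workdir/two" "$workdir/three"

# Several names are accepted, but each input gets its own .gz file
gzip -k "$workdir/two" "$workdir/three"

# Lists the two originals plus two.gz and three.gz, not one combined archive
ls "$workdir"
```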

This is where our friend tar comes in. Because it takes multiple files and creates a single “tape archive”, we can then use gzip to compress the single file that is created.

We could delve into piping the output from tar into gzip to create a compressed tarball, but tar makes things easy by allowing us to pass in an argument to tell it to use gzip for compression.
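For the curious, the piped version might look something like the sketch below. It assumes a scratch directory from mktemp, and the names workdir, piped.tar.gz and flagged.tar.gz are invented for the example. Passing - as the file name to -f tells tar to write the archive to standard output instead of a file:

```shell
# Hypothetical scratch directory with a couple of empty files
workdir=$(mktemp -d)
mkdir "$workdir/dir-of-files"
touch "$workdir/dir-of-files/two" "$workdir/dir-of-files/three"
cd "$workdir"

# "-f -" sends the uncompressed archive to stdout; gzip compresses the stream
tar -cf - dir-of-files/two dir-of-files/three | gzip > piped.tar.gz

# The same job in one step, letting tar call on gzip via -z
tar -czf flagged.tar.gz dir-of-files/two dir-of-files/three

# -t lists an archive's members; both archives should list the same two files
tar -tzf piped.tar.gz
tar -tzf flagged.tar.gz
```

Both routes should produce a gzipped tarball with the same contents, which is why the single-command form is usually the more convenient choice.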

The syntax for tar is a bit more verbose than what we’ve experienced thus far. In fact, you have to pass in arguments to explicitly tell it to do each of the following:

  1. Create an archive - -c or --create
  2. Compress the archive with gzip - -z or --gzip
  3. Output to a file - -f or --file=ARCHIVE

Fortunately we can condense the arguments like so:

$ tar -czf two-three.tgz /tmp/dir-of-files/two /tmp/dir-of-files/three

# Or a bit less typing
$ tar -czf two-three.tgz /tmp/dir-of-files/{two,three}
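If the condensed -czf feels a bit cryptic, it may help to see it next to the long-form arguments listed above. The sketch below assumes a scratch directory from mktemp, and the archive names short.tgz and long.tgz are made up. It also uses -t (or --list), which hasn’t been covered here, to print what ended up inside each archive:

```shell
# Hypothetical scratch directory with a couple of empty files
workdir=$(mktemp -d)
mkdir "$workdir/dir-of-files"
touch "$workdir/dir-of-files/two" "$workdir/dir-of-files/three"
cd "$workdir"

# Short, bundled form; f goes last because the archive name must follow it
tar -czf short.tgz dir-of-files/two dir-of-files/three

# The same request spelled out with the long options
tar --create --gzip --file=long.tgz dir-of-files/two dir-of-files/three

# -t lists the member names instead of creating an archive
tar -tzf short.tgz
tar -tzf long.tgz
```

One thing to watch with the bundled form: whatever follows f is taken as the archive name, so reordering the letters to something like -cfz would create an archive named z.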

Compressing a directory of files

Compressing a couple of files is all well and good, but often we need to compress a bunch of files that are all in a single directory.

Sure, you could right-click and then click “compress” from your favorite GUI file manager, but what’s the fun in that?

To compress an entire directory with zip simply include the -r or --recurse-paths argument:

$ zip -r files.zip /tmp/dir-of-files

Similar to compressing multiple files with gzip, we will need to leverage tar to take the directory and create an archive that we can then compress:

$ tar -czf files.tgz /tmp/dir-of-files

Unlike zip, tar doesn’t need any special arguments as recursing directories is default behavior.

Also worth noting: tar stores each file under the path you gave it on the command line. Most implementations strip the leading / and print a notice about it, so the entries in this example are stored as tmp/dir-of-files/one and so on.

This is usually fine, but the deeper the directory, the more noticeable it becomes. When you extract the archive, that whole path is re-created inside whichever directory you are in.

To avoid this and force a shallow path in the archive, you can pass in the -C or --directory=DIR argument and then give a single . as the path to archive. The -C argument tells tar to change to the specified directory before adding any files.

The trailing . tells tar to archive the current directory, which is the directory we just changed to:

$ tar -czf files.tgz -C /tmp/dir-of-files .
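To make the difference concrete, here’s a small sketch that builds the archive both ways and lists what ends up inside. It assumes a scratch directory from mktemp, with invented archive names with-prefix.tgz and shallow.tgz, and relies on -t (or --list) to print the stored paths:

```shell
# Hypothetical scratch directory with a couple of empty files
workdir=$(mktemp -d)
mkdir "$workdir/dir-of-files"
touch "$workdir/dir-of-files/one" "$workdir/dir-of-files/two"
cd "$workdir"

# Archive the directory by name: members carry the dir-of-files/ prefix
tar -czf with-prefix.tgz dir-of-files

# Change into the directory first and archive ".": members are ./one and ./two
tar -czf shallow.tgz -C dir-of-files .

# Compare the stored paths
tar -tzf with-prefix.tgz
tar -tzf shallow.tgz
```

Note that the archive itself is still written to the directory you ran the command from. The -C argument only affects where tar looks for the files named after it.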

Conclusion

The tools we’ve discussed in this article are actually just the tip of the iceberg. While they do cover the majority of the most common compression types, you could venture into a more arcane realm by using commands like rar, bzip2, and 7z, the last of which often achieves some of the highest compression ratios.

Similar to how tar can leverage gzip, it can also be combined with other compression tools to create a wide variety of *.tar.* files.
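As a rough sketch of what that looks like, the common tar implementations accept -j for bzip2 and -J for xz in place of -z. The example assumes a scratch directory from mktemp, invents the archive names, and skips a format when its compression tool doesn’t appear to be installed:

```shell
# Hypothetical scratch directory with a single empty file
workdir=$(mktemp -d)
mkdir "$workdir/dir-of-files"
touch "$workdir/dir-of-files/one"
cd "$workdir"

# -j swaps gzip for bzip2 (skipped if bzip2 is not installed)
if command -v bzip2 >/dev/null 2>&1; then
  tar -cjf files.tar.bz2 -C dir-of-files .
  tar -tjf files.tar.bz2
fi

# -J swaps gzip for xz (skipped if xz is not installed)
if command -v xz >/dev/null 2>&1; then
  tar -cJf files.tar.xz -C dir-of-files .
  tar -tJf files.tar.xz
fi
```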

Ready to go beyond the basics? Anything you could ever want to know about zip, gzip, and tar can be found in their man pages!
