Dr. Dobb’s Journal December, 1997
by Mark Nelson
Sun has given Java developers a new set of library components that provide support for reading and writing Zip files. The addition of powerful compression and archiving abilities to Java is a boon to developers, who have traditionally had to rely on either third party libraries or proprietary tools for these functions. This article takes a quick look at how to use Sun’s java.util.zip package, and how to avoid a few common mistakes that the library overlooks.
The New Arrival
As I’m writing this, Sun has just released version 1.1 of the Java Developer’s Kit, or JDK. For a point release, JDK 1.1 has quite a few major changes. One of those changes was the addition of JAR files. JAR files are simply compressed archives that contain various components of a Java applet, including class files, images, sound clips, etc. The JAR format should speed up the loading of applet components across the web, first by compressing data, and second by reducing the number of separate transactions required during the download process.
Sun has pitched its tent squarely in the middle of the kingdom of openness, so naturally the JAR file format needs to adhere to an accepted industry standard. Sun chose to use the ZIP file format, which had a couple of big advantages. First, commercial and free ZIP tools are widely available, including the deservedly revered zlib and InfoZip products. Just as importantly, the adoption of an open format like Zip allows Sun to taunt Microsoft, whose ActiveX technology requires that developers use the proprietary CAB format for compression and archiving.
To implement support for JAR files, Sun’s Java developers first ported most of zlib to pure Java. (Some critical code has been implemented using native methods.) This took care of implementing the deflate compression algorithm used in PKZip 2.x. They then created a fairly thin set of wrapper classes that are used to create the archive structure around the deflated data. The result is the java.util.zip package.
The venerable and inscrutable Zip format
Before we look at how java.util.zip deals with Zip processing, it helps to know a little bit about the structure of a Zip file. Figure 1 shows the layout of the Zip format. You can think of a Zip file as a stream oriented format, which means you can create an archive by writing sequentially without ever having to seek backwards in the output file.
Figure 1 – The Zip file format
The Zip file starts with a sequence of files, each of which can be compressed or stored in raw format. Each file has a local header immediately before its data, which contains most of the information about the file, including timestamps, compression method and file name. The compressed file contents immediately follow, and are terminated by an optional data descriptor. The data descriptor contains the file’s CRC along with its compressed and uncompressed sizes, which are frequently not available when writing the local file header. (If they are, the data descriptor can be skipped.)
Each file in the archive is laid down sequentially in this format, followed by a central directory at the end of the Zip archive. The central directory is a contiguous set of directory entries, each of which contains all the information in the local file header, plus extras such as file comments and attributes. Most importantly, the central directory contains pointers to the position of each file in the archive, which makes navigation of the Zip file quick and easy.
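To make the layout concrete, here is a minimal sketch that builds a one-entry archive in memory and reads a few fields back out of the first local file header. The class name LocalHeaderPeek and its helper methods are hypothetical names chosen for illustration. The field offsets used (signature at 0, flags at 6, method at 8, name length at 26, name at 30) come from the published Zip format description, and the entry name is assumed to be plain ASCII.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not part of java.util.zip.
class LocalHeaderPeek {
    // Read a little-endian unsigned 16-bit value at the given offset.
    static int u16(byte[] b, int off) {
        return (b[off] & 0xFF) | ((b[off + 1] & 0xFF) << 8);
    }

    // Read a little-endian unsigned 32-bit value, widened to long.
    static long u32(byte[] b, int off) {
        return u16(b, off) | ((long) u16(b, off + 2) << 16);
    }

    // Name of the first entry, taken straight from its local file header.
    static String firstEntryName(byte[] zip) {
        if (u32(zip, 0) != 0x04034b50L) {
            throw new IllegalArgumentException("no local file header at offset 0");
        }
        int nameLen = u16(zip, 26);
        return new String(zip, 30, nameLen, StandardCharsets.UTF_8);
    }

    // Build a tiny one-entry archive in memory, written strictly front to back.
    static byte[] tinyZip(String name, byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry(name));
            out.write(data);
            out.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = tinyZip("hello.txt", "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println("signature  : 0x" + Long.toHexString(u32(zip, 0)));
        System.out.println("flags      : 0x" + Integer.toHexString(u16(zip, 6)));
        System.out.println("method     : " + u16(zip, 8));
        System.out.println("first name : " + firstEntryName(zip));
    }
}
```

Because the archive is produced through a plain output stream, the sketch also illustrates the point above that a Zip file can be written without ever seeking backwards.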
The complete roster of package java.util.zip includes 14 classes, one interface, and two exception classes. While that may sound like a lot, most users will be able to skip over the bulk of the package. By concentrating on the four core classes of the package, you can perform the three most common operations on Zip files: creation, extraction, and directory reads. Those classes are:
write both the local file header and the central directory record for a specific file, such as file sizes, time stamps, comments, attributes, etc.
This input stream behaves like a standard
This output stream looks like an
Putting the classes to work
The first sample program, ZipList, is shown in Listing 1. This program has fewer than 50 lines of Java code, and it manages to print out the entire contents of a Zip file, including nicely formatted date and time stamps. To see the contents of foo.zip, ZipList.class can be executed using the following command:
java ZipList foo.zip
ZipList produces the following output for a sample Zip file:
Listing of : temp.zip
Raw Size       Size  Date       Time        Name
--------  ---------  ---------  ----------  --------------------------
 1981717    1718555  17-Mar-97  9:05:02 PM  temp/Tjava.pdf
     547        272  20-Apr-97  1:36:40 PM  TEMP.DAT
   92870      41915  11-Jul-95  9:50:00 AM  COMMAND.COM
    1536        110  03-Mar-97  4:55:50 AM  ~OLEAPP.DOC
Coming up with this listing couldn’t be much easier. It involves just two steps. First, I create a new ZipFile object by calling the standard constructor with a filename:
ZipFile z = new ZipFile( args[ 0 ] );
Once the ZipFile object has been created, I call the entries() method, which returns an Enumeration object. The Enumeration object successively returns ZipEntry objects, one for each file in the Zip file. The methods in the ZipEntry object used to read various attributes include:
String getComment()
long getCompressedSize()
long getCrc()
byte[] getExtra()
int getMethod()
String getName()
long getSize()
long getTime()
boolean isDirectory()
With these methods, you should be able to follow the code in Listing 1 quite easily. Java makes the job even easier by providing built in classes to deal with dates, times, and strings.
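For readers without Listing 1 at hand, the two steps can be sketched roughly as follows. This is not the original ZipList program. The class name ZipListSketch, the listing() helper, and the tab-separated output are assumptions made for illustration, and the try-with-resources and generics syntax comes from later Java releases than JDK 1.1.

```java
import java.io.IOException;
import java.text.DateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical stand-in for ZipList, written against a current JDK.
class ZipListSketch {
    // One line per entry: raw size, compressed size, timestamp, name.
    static List<String> listing(String zipPath) throws IOException {
        DateFormat fmt = DateFormat.getDateTimeInstance();
        List<String> lines = new ArrayList<>();
        // Step 1: open the archive by file name.
        try (ZipFile z = new ZipFile(zipPath)) {
            // Step 2: walk the enumeration of ZipEntry objects.
            Enumeration<? extends ZipEntry> entries = z.entries();
            while (entries.hasMoreElements()) {
                ZipEntry e = entries.nextElement();
                lines.add(e.getSize() + "\t" + e.getCompressedSize() + "\t"
                        + fmt.format(new Date(e.getTime())) + "\t" + e.getName());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ZipListSketch file.zip");
            return;
        }
        System.out.println("Listing of : " + args[0]);
        for (String line : listing(args[0])) {
            System.out.println(line);
        }
    }
}
```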
The first sample program accompanying this article shows you how to list the contents of a Zip file. I built on that knowledge to create the program shown in Listing 2, ZipExtract.java. This GUI program lets the user enter a zip file name in a simple text box, then list the files in the Zip file in a list box. The user can then select files, and extract them in a batch process. The program is shown in action in Figure 2.
Figure 2 – ZipExtract.java at work
The method used to load up the list box in ZipExtract is a stripped down variation of the code in ZipList. Since the list box only contains the file name, the enumeration loop only has to call
The majority of the non-UI code in ZipExtract is found in the two methods that are called when the Extract Files button is pressed. The extractFiles() method consists of a loop that is run through once per selected file name in the list. The routine calls ZipFile.getEntry() for each file to get the ZipEntry object associated with the given file name. It would be nice if at that point there was a method called ZipEntry.extract(), but unfortunately things aren’t that simple.
The designers of java.util.zip decided to leave the actual hard work of extraction and insertion of files entirely in the hands of the package user. The ZipFile object provides you with a ZipInputStream object (upcast to InputStream) via the getInputStream() method, and the rest is up to you. You read bytes from the stream (which transparently decompresses), and write the output wherever you wish. I do this in the ZipExtract sample program in method extractOneFile(). I read in 100K bytes at a time, and write them to the specified output file. Since the extraction process from a ZipFile is exceptionally fast, I can supply a progress update only once every 100K bytes and still seem fairly responsive.
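The read-and-write loop described above might look something like the sketch below. It is not the extractOneFile() method from Listing 2. The class name ZipExtractSketch, the extractOne() signature, and the choice to return the byte count are assumptions, and the destination file is assumed to be chosen by the caller rather than derived from the entry name.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical extraction helper, written against a current JDK.
class ZipExtractSketch {
    // Copy one named entry to dest and return the number of bytes written.
    static long extractOne(ZipFile zip, String entryName, File dest) throws IOException {
        ZipEntry entry = zip.getEntry(entryName);
        if (entry == null) {
            throw new FileNotFoundException("no such entry: " + entryName);
        }
        byte[] buf = new byte[100 * 1024]; // 100K chunks, as in the article
        long total = 0;
        try (InputStream in = zip.getInputStream(entry);
             OutputStream out = new FileOutputStream(dest)) {
            int n;
            // The stream decompresses transparently; read() may return
            // fewer bytes than the buffer holds.
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
                // A GUI progress update could be issued here.
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: java ZipExtractSketch file.zip entryName outputFile");
            return;
        }
        try (ZipFile zip = new ZipFile(args[0])) {
            long n = extractOne(zip, args[1], new File(args[2]));
            System.out.println("wrote " + n + " bytes to " + args[2]);
        }
    }
}
```

Note that nothing in this loop touches the timestamp or protection bits of the output file, which is the shortcoming discussed next.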
A careful examination of the code in ZipExtract.java reveals a shortcoming of the java.util.zip package. When extracting files using unzip.exe, we normally expect file timestamps and protection bits to be set to the values stored in the Zip file. This doesn’t happen anywhere in ZipExtract, and it couldn’t even if I wanted it to. The java.util.zip package ducked its responsibilities in this area by not providing any sort of extract() method. Worse yet, the entire Java library leaves out the functions needed to do this on my own. If you want to set timestamps or protection bits from Java, you are going to have to resort to native methods, a decidedly un-PC proposition.
Creating Zip archives
The final sample program I wrote to illustrate this article is ZipCreate.java, shown in Listing 3. This is a simple command line program that is called with a zip file name as a command line argument, followed by a list of wild card filespecs. ZipCreate expands the wild card file specs into a list of files, removes any duplicates, and creates a Zip file with the resulting list. All this is done using a worker class I created called
Oddly enough, creating a Zip archive doesn’t involve the
Zipper.create() in Listing 3 shows the process, which is fairly simple. A new ZipOutputStream object is created on top of an ordinary file output stream for the archive. Files are then added to the output stream one at a time, using a ZipEntry object to control the process. The data that precedes the file in the Zip archive is written by ZipOutputStream.putNextEntry(). The file data is then written using a series of standard write() calls, which compress transparently. Once again, the lack of a standard insert() function means we have to do all the hard work ourselves. After all of the file data is written, a call to ZipOutputStream.closeEntry() writes the data that follows the compressed file.
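The putNextEntry(), write(), closeEntry() sequence can be sketched as follows. This is not the Zipper class from Listing 3. The class name ZipCreateSketch and its create() signature are assumptions, and each entry is named after the bare file name, so two inputs with the same name in different directories would be rejected as duplicates.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical archive builder, written against a current JDK.
class ZipCreateSketch {
    static void create(File zipFile, List<File> inputs) throws IOException {
        byte[] buf = new byte[8192];
        try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream(zipFile))) {
            for (File f : inputs) {
                // Writes the local file header that precedes the data.
                out.putNextEntry(new ZipEntry(f.getName()));
                try (InputStream in = new FileInputStream(f)) {
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        out.write(buf, 0, n); // compressed transparently
                    }
                }
                // Finishes the entry, writing whatever follows the data.
                out.closeEntry();
            }
        } // closing the stream writes the central directory
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java ZipCreateSketch out.zip file...");
            return;
        }
        List<File> inputs = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            inputs.add(new File(args[i]));
        }
        create(new File(args[0]), inputs);
    }
}
```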
Listing 3 really highlights a few blank spots in the Java library. First of all, it would be great to be able to use the Java API to expand wild card file specifications. The File.list() method in Java provides the hooks to do just that, but the lack of a regular expression parser means that third party solutions are required for implementation. I poked around on the net and found a shareware package called pat 1.1 by Steve R. Brandt, which you can find at http://www.javaregex.com/. It plugged in quickly and easily, and required only a small amount of code to integrate with my app.
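As a rough illustration of how a pattern matcher plugs into File.list(), the sketch below swaps in java.util.regex, which was added to the standard library in later Java releases, in place of the pat package used by the article. The class name GlobSketch and both method names are hypothetical, and only the '*' and '?' wild cards are handled.

```java
import java.io.File;
import java.util.regex.Pattern;

// Hypothetical wild card expander; java.util.regex stands in for pat 1.1.
class GlobSketch {
    // Translate a simple wild card spec into a regular expression.
    static Pattern globToRegex(String glob) {
        StringBuilder sb = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') {
                sb.append(".*");   // any run of characters
            } else if (c == '?') {
                sb.append('.');    // exactly one character
            } else {
                sb.append(Pattern.quote(String.valueOf(c))); // literal
            }
        }
        return Pattern.compile(sb.toString());
    }

    // Names in dir that match the wild card spec (empty if dir is unreadable).
    static String[] expand(File dir, String glob) {
        Pattern p = globToRegex(glob);
        // File.list() accepts a FilenameFilter; this is the hook mentioned above.
        String[] names = dir.list((d, name) -> p.matcher(name).matches());
        return names == null ? new String[0] : names;
    }

    public static void main(String[] args) {
        String spec = args.length > 0 ? args[0] : "*.java";
        for (String name : expand(new File("."), spec)) {
            System.out.println(name);
        }
    }
}
```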
Another glaring oversight in the Java API is found in the ZipEntry class definition. Although ZipEntry is used to carry around information about a file, such as its length, timestamps, and protection bits, none of this data is created automatically! If you want your Zip file to contain accurate timestamps and protection settings for your file, you are going to have to enter them yourself. And even worse, it appears that you will have to resort to native methods, because the Java library doesn’t have the functions you need to do this in a platform independent fashion.
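Entering the data yourself means calling the ZipEntry setters before handing the entry to putNextEntry(). The sketch below covers only the timestamp and a comment. It assumes a current JDK, where File.lastModified() returns milliseconds since the epoch, the same unit ZipEntry.setTime() expects. Protection bits are not addressed, and the class and method names are hypothetical.

```java
import java.io.File;
import java.util.Date;
import java.util.zip.ZipEntry;

// Hypothetical helper that fills in ZipEntry fields by hand.
class EntryMetadataSketch {
    // Build an entry for f carrying the file's modification time.
    static ZipEntry entryFor(File f) {
        ZipEntry e = new ZipEntry(f.getName());
        // Assumption: lastModified() is milliseconds since the epoch,
        // which holds on current JDKs.
        e.setTime(f.lastModified());
        e.setComment("added by EntryMetadataSketch");
        return e;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java EntryMetadataSketch file");
            return;
        }
        ZipEntry e = entryFor(new File(args[0]));
        System.out.println(e.getName() + " stamped " + new Date(e.getTime()));
    }
}
```

Zip timestamps have traditionally been stored with two-second resolution, so the value read back from an archive may differ slightly from the one that went in.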
Throughout these examples I’ve only glossed over the topic of errors. Of course, one of the really nice things about Java is that you can get away with a casual treatment of errors. All of my programs feature a try/catch block at a high level, which means any fatal errors thrown by java.util.zip will be caught and printed out when they occur. So rather than constantly checking flags and status bits after every library call, I can proceed as if everything works perfectly, knowing full well that errors will be caught somewhere else if and when they occur.
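A minimal sketch of that pattern: the worker method makes no attempt to check for failure, and a single try/catch in main() reports whatever goes wrong. The class name TopLevelCatchSketch and the countEntries() helper are hypothetical, and the exact exception thrown for a damaged archive is left to the library.

```java
import java.io.IOException;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;

// Hypothetical example of handling errors once, at the top level.
class TopLevelCatchSketch {
    // No status checks here; any failure simply propagates upward.
    static int countEntries(String path) throws IOException {
        try (ZipFile z = new ZipFile(path)) {
            return z.size();
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java TopLevelCatchSketch file.zip");
            return;
        }
        try {
            System.out.println(args[0] + " holds " + countEntries(args[0]) + " entries");
        } catch (ZipException e) {
            // Thrown by java.util.zip for format-level problems.
            System.err.println("Zip error: " + e.getMessage());
        } catch (IOException e) {
            // Missing file, unreadable file, and similar I/O failures.
            System.err.println("I/O error: " + e.getMessage());
        }
    }
}
```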
When I compare this sort of error handling with what I needed when writing demo programs for my C++ Zip library at Greenleaf Software, it’s easy to see why Java is a really great language for applications. Handling errors at a high level makes the rest of your code quite a bit easier to read, write, and maintain.
The creation and inclusion in Java of packages such as java.util.zip is a good move by Sun. This sort of utility is bound to help convince people that Java is more than just a toy language for demo applets on the World Wide Web. Personally, the idea that I can write utility and demo programs that have a high level of platform independence is really exciting. I’ve had pretty good luck writing command line C and C++ programs that port between various platforms and compilers, but never this easily. And I never even considered the idea of trying to write portable GUI programs. All this is much more feasible now.
On the downside, java.util.zip is presently a fairly shallow package. First of all, it is far too easy to create Zip files that are going to be unusable by other programs. Sun’s classes don’t check for validity of things such as file names, extra data, time stamps, and so on. Second, Sun doesn’t provide support for low level file attribute manipulation, which is really needed for a good package. And finally, it would be a good idea to make java.util.zip a little friendlier by adding functions to actually perform the insertion and extraction of files.
Compiler vendors have always had trouble deciding whether they wanted to do full scale library development. Every vendor has a few half-hearted library components, such as MS-DOS graphics libraries, complex number libraries, or container classes. Some library efforts turn into real products, such as Microsoft’s MFC. Right now Sun has only dipped a toe into the water; time will tell whether they decide to dive in or not.