Stripper: A Self-Contained and Minimal Code-Distribution Generator
I have been thinking about an interesting problem for some time: how to generate code-distributions that are both self-contained and minimal.
- By code-distribution I mean a Java application distributed as one or more JAR files, one of whose classes is the designated “main class” (that is, it contains a “public static void main(String[])” method).
- By self-contained I mean that the code-distribution contains all the classes and resources needed by the main class to run.
- By minimal I mean that the code-distribution does not contain any classes that are not used, directly or indirectly, by the main class.
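The two definitions can be made concrete with a small model. The sketch below is illustrative only and is not Stripper’s code. It assumes a hypothetical map from each class name to the names of the classes it uses directly, with JDK classes left out. “Self-contained” then means “contains everything reachable from the main class”, and “minimal” means “contains nothing else”.

```java
import java.util.*;

/**
 * Illustrative model (not Stripper's code): "uses" maps each class in the input
 * distribution to the distribution classes it uses directly. JDK classes are assumed
 * to be left out of the map.
 */
final class DistributionCheck {

    /** The main class plus every class it uses, directly or indirectly. */
    static Set<String> requiredBy(String mainClass, Map<String, Set<String>> uses) {
        Set<String> required = new HashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(mainClass);
        while (!work.isEmpty()) {
            String c = work.pop();
            if (required.add(c)) {                        // first visit: follow its uses
                work.addAll(uses.getOrDefault(c, Collections.emptySet()));
            }
        }
        return required;
    }

    /** Self-contained: nothing the main class needs is missing. */
    static boolean isSelfContained(Set<String> dist, String mainClass, Map<String, Set<String>> uses) {
        return dist.containsAll(requiredBy(mainClass, uses));
    }

    /** Minimal: nothing in the distribution is unneeded. */
    static boolean isMinimal(Set<String> dist, String mainClass, Map<String, Set<String>> uses) {
        return requiredBy(mainClass, uses).containsAll(dist);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> uses = Map.of("Main", Set.of("A"), "A", Set.of("B"), "Unused", Set.of());
        Set<String> dist = Set.of("Main", "A", "B", "Unused");
        System.out.println(isSelfContained(dist, "Main", uses)); // true
        System.out.println(isMinimal(dist, "Main", uses));       // false: "Unused" is never reached
    }
}
```

In this model an MSD is exactly the set returned by `requiredBy`.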
Minimal and self-contained code-distributions such as these (which I will call “MSD’s” in this posting) are generally useful for application deployments, since they minimize storage requirements, load times and, where applicable, download times. They are also ideal for “sample code” provided by bloggers who do not want their readers bogged down in a “morass” of source code they do not need for the task at hand.
MSD’s may not always be feasible, especially if third-party licenses prevent classes in third-party JAR’s from being extracted and re-packaged. For example, a third-party product’s license might require that its classes be distributed only in the form in which they were originally provided.
Having defined MSD’s, I present an MSD generator named “Stripper” (with complete source code) for creating them.
Stripper requires the following inputs:
- A list of JAR files representing a self-contained distribution for a “main” class. Typically, this is the end result of your build process (for example, after running an Ant or Maven build script).
- The name of the main-class.
- Optionally, a list of source-code locations (directories or JAR files containing “.java” files for some or all of the classes in the distribution).
and produces the following outputs:
- An executable JAR that is the MSD, with the “Main-Class” attribute in its manifest set to the name of the designated main class.
- If requested, a separate JAR file containing the source-code for your classes in the MSD.
Download details (a ready-built executable JAR file, a source JAR, and instructions for use) are provided below.
APPROACH
Stripper uses the following approach to generate MSD’s:
- Starting with the main class, use static analysis to find all classes it uses, then all classes those classes use, and so on transitively until no new classes are found.
- For all interfaces found above, find their super-interfaces and sub-interfaces. For each of these, find all implementation classes, and the classes used by those classes, recursively.
- For all classes found above, include their super classes and sub-classes, and the classes they use, recursively.
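These three steps can be sketched as a single worklist (fixpoint) computation, assuming the class graph has already been extracted. The hypothetical maps `uses` (references found in step 1) and `superTypes` (direct superclass and interfaces) stand in for Stripper’s real analysis, and the `includeSubtypes` flag plays the role of the “-i” option in the options table. As a simplification, sub-interfaces and implementation classes are both treated as sub-types.

```java
import java.util.*;

/** Hypothetical sketch of the closure computation; not Stripper's actual code. */
final class ClosureSketch {

    static Set<String> closure(String mainClass,
                               Map<String, Set<String>> uses,
                               Map<String, Set<String>> superTypes,
                               boolean includeSubtypes) {
        // Invert superTypes once, so direct subclasses, sub-interfaces and implementations can be looked up.
        Map<String, Set<String>> subTypes = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : superTypes.entrySet()) {
            for (String sup : e.getValue()) {
                subTypes.computeIfAbsent(sup, k -> new HashSet<>()).add(e.getKey());
            }
        }
        Set<String> result = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(mainClass);
        while (!work.isEmpty()) {
            String c = work.pop();
            if (!result.add(c)) continue;                                      // already processed
            work.addAll(uses.getOrDefault(c, Collections.emptySet()));         // step 1: classes it uses
            work.addAll(superTypes.getOrDefault(c, Collections.emptySet()));   // steps 2-3: super-types
            if (includeSubtypes) {                                             // steps 2-3: sub-types
                work.addAll(subTypes.getOrDefault(c, Collections.emptySet()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> uses = Map.of("Main", Set.of("InputStream"), "CustomStream", Set.of("Helper"));
        Map<String, Set<String>> superTypes = Map.of("CustomStream", Set.of("InputStream"));
        System.out.println(closure("Main", uses, superTypes, false)); // [InputStream, Main]
        System.out.println(closure("Main", uses, superTypes, true));  // [CustomStream, Helper, InputStream, Main]
    }
}
```

The usage example shows the trade-off discussed below. Without sub-types, `CustomStream` (and the `Helper` it needs) is missed. With sub-types, every subclass of `InputStream` in the input is pulled in, whether or not it is ever instantiated.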
Of these steps, the first (static analysis) is the obvious one, since we want a self-contained distribution. The second and third steps are needed because classes are sometimes instantiated dynamically, by a name taken from external configuration (such as properties or XML configuration files); static analysis (step 1) alone cannot recognize these dynamically named classes.
For example, the following source code shows an InputStream being created dynamically from a custom class named “my.custom.InputStream”:
InputStream in = (InputStream) Class.forName("my.custom.InputStream").getConstructor(byte[].class).newInstance(bytes);
Static analysis of this code shows that it uses InputStream, but not “my.custom.InputStream”. We still need to include “my.custom.InputStream” in the output MSD, however, since omitting it would make Class.forName throw a “ClassNotFoundException” at runtime.
To address this form of dynamic instantiation, we must include all subclasses of used classes (and implementations of used interfaces) in the generated MSD. Unfortunately, the MSD then potentially ends up containing classes that are never used.
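The example above hard-codes the class name for brevity. In practice the name usually comes from configuration, which is what hides it from static analysis. The sketch below makes that explicit. It assumes a hypothetical property key `stream.class` with a default of `java.io.ByteArrayInputStream`, a JDK class that has a `byte[]` constructor.

```java
import java.io.InputStream;
import java.util.Properties;

/** Hypothetical example: the stream implementation is named in configuration, not in code. */
final class ConfiguredStreamFactory {

    // Hypothetical configuration key naming the InputStream implementation to use.
    static final String KEY = "stream.class";

    static InputStream open(Properties config, byte[] bytes) throws ReflectiveOperationException {
        String className = config.getProperty(KEY, "java.io.ByteArrayInputStream");
        // A class named only in the configuration never appears in this method's bytecode.
        Class<?> cls = Class.forName(className);  // ClassNotFoundException if it was stripped out
        return (InputStream) cls.getConstructor(byte[].class).newInstance(bytes);
    }

    public static void main(String[] args) throws Exception {
        InputStream in = open(new Properties(), new byte[] {42});
        System.out.println(in.read()); // 42

        Properties config = new Properties();
        config.setProperty(KEY, "my.custom.InputStream");
        try {
            open(config, new byte[0]);
        } catch (ClassNotFoundException e) {
            System.out.println("missing class: " + e.getMessage());
        }
    }
}
```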
FINDING USED CLASSES
Stripper finds all classes a given class uses by the following steps:
- Include all interfaces implemented by the class and its super-classes.
- Include all super-classes of the class.
- Include the types (and their super-types) of all declared fields of the class and its super-classes.
- Include the return types (and their super-types), exception types (and their super-types) and parameter types of all methods declared by the class and its super-classes.
- Finally, scan the class’s constant pool and include every class whose name appears in a string entry that “looks like” a class name and for which a “.class” resource exists in the input distribution.
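The constant-pool scan in the last step can be sketched by reading the class-file format directly. This is an assumption-laden sketch, not Stripper’s code. It collects every CONSTANT_Utf8 entry, which covers both string literals and class names. It treats anything matching a dotted or slashed identifier pattern as a candidate. A candidate is kept only if a matching “.class” resource appears in a hypothetical set of distribution entries. The regular expression is a guess at what “looks like” a class name means.

```java
import java.io.*;
import java.util.*;
import java.util.regex.Pattern;

/** Hypothetical sketch of the constant-pool scan; not Stripper's actual code. */
final class ConstantPoolScan {

    // Two or more identifier segments joined by '.' or '/', e.g. "my.custom.InputStream" or "java/io/InputStream".
    static final Pattern CLASS_NAME = Pattern.compile("[A-Za-z_$][\\w$]*(?:[./][A-Za-z_$][\\w$]*)+");

    /** Reads a class file's constant pool and returns all of its CONSTANT_Utf8 strings. */
    static List<String> utf8Entries(InputStream classFile) throws IOException {
        DataInputStream in = new DataInputStream(classFile);
        if (in.readInt() != 0xCAFEBABE) throw new IOException("not a class file");
        in.readUnsignedShort();              // minor_version
        in.readUnsignedShort();              // major_version
        int count = in.readUnsignedShort();  // valid pool indices are 1 .. count-1
        List<String> strings = new ArrayList<>();
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1:                                    // Utf8: u2 length + modified UTF-8
                    strings.add(in.readUTF());
                    break;
                case 7: case 8: case 16: case 19: case 20: // Class, String, MethodType, Module, Package
                    skip(in, 2);
                    break;
                case 15:                                   // MethodHandle
                    skip(in, 3);
                    break;
                case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                    skip(in, 4);                           // Integer, Float, refs, NameAndType, (Invoke)Dynamic
                    break;
                case 5: case 6:                            // Long and Double occupy two pool slots
                    skip(in, 8);
                    i++;
                    break;
                default:
                    throw new IOException("unknown constant pool tag " + tag);
            }
        }
        return strings;
    }

    private static void skip(DataInputStream in, int n) throws IOException {
        in.readFully(new byte[n]); // unlike skipBytes, never skips fewer bytes than asked
    }

    /** Keeps strings that look like class names and have a ".class" resource in the distribution. */
    static Set<String> referencedClasses(List<String> utf8, Set<String> classResources) {
        Set<String> found = new TreeSet<>();
        for (String s : utf8) {
            if (!CLASS_NAME.matcher(s).matches()) continue;
            if (classResources.contains(s.replace('.', '/') + ".class")) {
                found.add(s.replace('/', '.'));
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ConstantPoolScan Some.class ... (prints class-name-like strings)
        for (String path : args) {
            try (InputStream in = new BufferedInputStream(new FileInputStream(path))) {
                for (String s : utf8Entries(in)) {
                    if (CLASS_NAME.matcher(s).matches()) System.out.println(s);
                }
            }
        }
    }
}
```

Scanning Utf8 entries rather than only String constants also catches class names held in CONSTANT_Class entries. Descriptors such as `Ljava/lang/String;` are not matched by this pattern.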
MANIFESTS
Stripper computes the manifest (the “META-INF/MANIFEST.MF” entry) of the generated MSD by first computing an “effective manifest” and then setting its “Main-Class” attribute to the given main class. If the user specifies a custom manifest, that becomes the effective manifest; otherwise, the manifests found in all the input JAR files are merged to create it. The updated effective manifest is then saved into the generated MSD JAR file.
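The effective-manifest rule can be sketched with `java.util.jar.Manifest`. The text does not say how conflicting attributes from different input manifests are resolved, so this sketch assumes the first value seen wins. `ManifestSketch` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.util.List;
import java.util.Map;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

/** Hypothetical sketch of the effective-manifest rule; the "first value wins" merge is an assumption. */
final class ManifestSketch {

    static Manifest effectiveManifest(Manifest custom, List<Manifest> inputManifests, String mainClass) {
        Manifest result;
        if (custom != null) {
            result = new Manifest(custom);          // a user-supplied manifest is used as the base
        } else {
            result = new Manifest();
            for (Manifest m : inputManifests) {     // otherwise merge the manifests of all input JARs
                copyMissing(m.getMainAttributes(), result.getMainAttributes());
                for (Map.Entry<String, Attributes> e : m.getEntries().entrySet()) {
                    copyMissing(e.getValue(),
                            result.getEntries().computeIfAbsent(e.getKey(), k -> new Attributes()));
                }
            }
        }
        Attributes main = result.getMainAttributes();
        main.putIfAbsent(Attributes.Name.MANIFEST_VERSION, "1.0"); // Manifest.write expects this to be set
        main.put(Attributes.Name.MAIN_CLASS, mainClass);           // always point at the given main class
        return result;
    }

    /** Copies attributes from "from" that "to" does not already have. */
    private static void copyMissing(Attributes from, Attributes to) {
        for (Map.Entry<Object, Object> e : from.entrySet()) {
            to.putIfAbsent(e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) throws IOException {
        Manifest m = effectiveManifest(null, List.of(new Manifest()), "my.Main");
        m.write(System.out); // prints the Manifest-Version and Main-Class headers
    }
}
```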
USING STRIPPER
The MSD generator presented here, called “Stripper”, is a command-line application that accepts the following options:
| Option | Description |
|---|---|
| -d | A comma-separated list of directories containing the JAR files that make up the input distribution. |
| -f | A comma-separated list of names of JAR files that make up the input distribution. |
| -o | The output file (the name of the generated MSD), which must be a JAR file. |
| -m | The name of the main class. |
| -s | A comma-separated list of directories or ZIP/JAR files containing source code. |
| -i | Optional. If “true”, Stripper includes sub-classes of all classes in the output; any other value causes sub-classes not to be included. |
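For illustration, here is one way a program could parse and validate these options. Several details are assumptions, not Stripper’s documented behavior: the syntax (each flag followed by its value as a separate argument, e.g. `-m my.Main`), the class name `StripperOptions`, and the rule that at least one of `-d` and `-f` must be given.

```java
import java.util.*;

/** Hypothetical parser for the options in the table; not Stripper's actual command-line code. */
final class StripperOptions {
    final List<String> jarDirs, jarFiles, sourceLocations;
    final String output, mainClass;
    final boolean includeSubclasses;

    private StripperOptions(Map<String, String> o) {
        jarDirs = split(o.get("-d"));
        jarFiles = split(o.get("-f"));
        sourceLocations = split(o.get("-s"));
        output = o.get("-o");
        mainClass = o.get("-m");
        includeSubclasses = "true".equals(o.get("-i")); // any other value, or none, means false
    }

    static StripperOptions parse(String... args) {
        if (args.length % 2 != 0) throw new IllegalArgumentException("every option needs a value");
        Set<String> known = Set.of("-d", "-f", "-o", "-m", "-s", "-i");
        Map<String, String> o = new HashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            if (!known.contains(args[i])) throw new IllegalArgumentException("unknown option " + args[i]);
            o.put(args[i], args[i + 1]);
        }
        if (o.get("-m") == null) throw new IllegalArgumentException("-m (main class) is required");
        if (o.get("-o") == null || !o.get("-o").endsWith(".jar"))
            throw new IllegalArgumentException("-o must name a .jar file");
        if (o.get("-d") == null && o.get("-f") == null)
            throw new IllegalArgumentException("give -d and/or -f for the input distribution");
        return new StripperOptions(o);
    }

    /** Splits a comma-separated value into trimmed, non-empty parts (empty list if absent). */
    private static List<String> split(String commaList) {
        List<String> parts = new ArrayList<>();
        if (commaList == null) return parts;
        for (String p : commaList.split(",")) {
            if (!p.trim().isEmpty()) parts.add(p.trim());
        }
        return parts;
    }

    public static void main(String[] args) {
        StripperOptions o = parse(args.length > 0 ? args
                : new String[] {"-f", "app.jar", "-o", "out.jar", "-m", "my.Main"});
        System.out.println(o.jarFiles + " -> " + o.output + " (main class " + o.mainClass + ")");
    }
}
```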
I used Stripper to “strip” itself: I first generated the build output of my project (using Maven’s “jar-with-dependencies” assembly), copied the resulting JAR to another location, and then ran Stripper on that JAR:
Note that I did not specify the “-s” option (list of source-code locations), yet Stripper still generated a source JAR (“src.zip”). Stripper implicitly uses all input-distribution JAR’s as source-code locations and picks up the source entries it finds there for the classes in its output. In my case, I configured my Maven project to include source code in its build output, so I did not need to specify any explicit source-code locations.
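The implicit source lookup amounts to mapping each output class entry to the “.java” entry it was probably compiled from. The sketch below assumes the common layout in which `a/b/Outer.java` compiles to `a/b/Outer.class` plus `a/b/Outer$Inner.class`-style entries for nested and anonymous classes. The names `SourceLookup`, `sourceEntryFor` and `sourceEntries` are hypothetical.

```java
import java.util.*;

/** Hypothetical sketch of matching output classes to source entries; not Stripper's actual code. */
final class SourceLookup {

    /** Maps "a/b/Outer$Inner.class" (or "a/b/Outer.class") to "a/b/Outer.java". */
    static String sourceEntryFor(String classEntry) {
        if (!classEntry.endsWith(".class")) throw new IllegalArgumentException(classEntry);
        String base = classEntry.substring(0, classEntry.length() - ".class".length());
        int dollar = base.indexOf('$', base.lastIndexOf('/') + 1);
        if (dollar >= 0) base = base.substring(0, dollar); // nested/anonymous classes share the top-level file
        return base + ".java";
    }

    /** Returns the source entries, found among the input JAR entries, for the given output classes. */
    static Set<String> sourceEntries(Collection<String> outputClassEntries, Set<String> inputEntries) {
        Set<String> result = new TreeSet<>();
        for (String classEntry : outputClassEntries) {
            String source = sourceEntryFor(classEntry);
            if (inputEntries.contains(source)) result.add(source);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> inputEntries = Set.of("a/b/Outer.java", "a/b/Outer.class", "a/b/Outer$1.class", "c/D.class");
        List<String> output = List.of("a/b/Outer.class", "a/b/Outer$1.class", "c/D.class");
        System.out.println(sourceEntries(output, inputEntries)); // [a/b/Outer.java]
    }
}
```

This name-based mapping misses package-private top-level classes declared in a file named after another class, as well as top-level class names that themselves contain `$`. Reading the `SourceFile` attribute from each class file would be more robust.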
SOURCE CODE
The Stripper code (in executable JAR form) can be downloaded from here. The MSD generated by Stripper (the “stripped” Stripper) can be downloaded here.
