Stripper revisited: the final version
This is the final update (for now) to the Stripper project, which I have posted about lately. In this update, I present not only updated code but also a detailed description of how the Stripper works.
The Stripper is an application that produces minimal, self-contained distributions of Java applications. Java applications are typically deployed as one or more JAR files. They might be distributed as a single ZIP file or via Java Web Start, but after installation they usually end up as a set of JAR files in some directory structure.
The constituent JAR files of an application are usually not minimal: more often than not, they contain classes that are never used. Application JARs are generally produced by a build process, and most build processes focus on producing self-contained distributions, not minimal ones. As a result, application JARs tend to get bloated with all manner of “baggage” files that serve no purpose (except, of course, to fatten the distribution).
The Stripper helps to slim down application JAR files after they have been generated by a build process. It does so by statically analyzing a given main class to transitively find all classes that are (or may be) used by this class, and including all such classes (and only such classes) in a “stripped” JAR file. From the point of view of the main class, the stripped JAR file is equivalent to the input JAR files; the only difference is that it contains only classes that can ever be used by the main class. (As a bonus, the stripped JAR file is an executable JAR file: running “java -jar stripped_jar_file.jar” launches the main class.)
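The transitive step can be pictured as a worklist traversal. The following is a minimal sketch, not the Stripper's actual code: it assumes a precomputed map from each class name to the classes it references directly, and the names ReachabilitySketch, reachableFrom and directReferences are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: computes the set of classes reachable from a main class.
final class ReachabilitySketch {

    /**
     * @param mainClass        the class to start from
     * @param directReferences assumed input: class name -> classes it uses directly
     * @return mainClass plus every class reachable from it, in discovery order
     */
    static Set<String> reachableFrom(String mainClass,
                                     Map<String, Set<String>> directReferences) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> worklist = new ArrayDeque<>();
        worklist.add(mainClass);
        while (!worklist.isEmpty()) {
            String cls = worklist.remove();
            if (!visited.add(cls)) {
                continue; // already processed; this also makes cycles harmless
            }
            Set<String> refs = directReferences.getOrDefault(cls,
                    Collections.<String>emptySet());
            for (String ref : refs) {
                if (!visited.contains(ref)) {
                    worklist.add(ref);
                }
            }
        }
        return visited;
    }
}
```

A class that nothing reachable refers to never enters the worklist, so it is left out of the result. That is the whole point of stripping.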
Designing the Stripper
The Stripper performs two functions:
- Identify all classes that can ever be loaded by the main class.
- Package these classes into a given output JAR file.
Let us look at each of these in turn.
Identifying reachable classes
The problem of identifying all classes that can be directly or indirectly reached by a given main class cannot be solved completely using static analysis alone. Static analysis enables us to examine a class file to find out all classes it uses (and further examine each used class, transitively, to find out all classes they use). However, it cannot account for the following:
- Classes sometimes dynamically instantiate other classes (such as via Class.forName(“otherClass”)).
- Classes sometimes use abstract classes and interfaces, using factories to create actual instances of these at runtime (such as through Spring XML configuration). Static analysis can identify the abstract classes and interfaces, but not the actual instance classes.
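To make the direct-reference step concrete, here is a minimal sketch that lists the class entries in a class file's constant pool, using only the JDK. It is an assumption about how such a scan could be done, not the Stripper's implementation (the post does not say which analysis library it uses), and ConstantPoolScanner is a hypothetical name. The sketch ignores array types and does not parse field or method descriptors, so it can miss classes that appear only in signatures.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: lists the classes named by CONSTANT_Class entries.
final class ConstantPoolScanner {

    static Set<String> referencedClasses(InputStream classFile) throws IOException {
        DataInputStream in = new DataInputStream(classFile);
        if (in.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        in.readUnsignedShort(); // minor version
        in.readUnsignedShort(); // major version
        int count = in.readUnsignedShort();
        String[] utf8 = new String[count];
        int[] classNameIndex = new int[count]; // 0 means "not a class entry"
        byte[] scratch = new byte[8];
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1: // Utf8: same encoding that readUTF expects
                    utf8[i] = in.readUTF();
                    break;
                case 7: // Class: index of the Utf8 entry holding its name
                    classNameIndex[i] = in.readUnsignedShort();
                    break;
                case 8: case 16: case 19: case 20: // String, MethodType, Module, Package
                    in.readFully(scratch, 0, 2);
                    break;
                case 15: // MethodHandle
                    in.readFully(scratch, 0, 3);
                    break;
                case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                    in.readFully(scratch, 0, 4);
                    break;
                case 5: case 6: // Long and Double occupy two pool slots
                    in.readFully(scratch, 0, 8);
                    i++;
                    break;
                default:
                    throw new IOException("unknown constant pool tag " + tag);
            }
        }
        Set<String> result = new TreeSet<>();
        for (int nameIndex : classNameIndex) {
            if (nameIndex != 0 && !utf8[nameIndex].startsWith("[")) { // skip array types
                result.add(utf8[nameIndex].replace('/', '.'));
            }
        }
        return result;
    }
}
```

The result includes the scanned class itself and its superclass. A name passed to Class.forName is stored as a plain string constant, not as a class entry, and a class wired in through an external configuration file does not appear in the class file at all. Both kinds of classes are therefore invisible to a scan like this one.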
So, how do we identify these classes? We use the following heuristics to address the possibility of factory classes (the second item above). For each class obtained by static analysis:
- Find all descendant classes (those that extend the class or a sub-class of the class). If this is an interface, find all classes implementing the interface.
- Find all super-classes, and their descendant classes.
- Repeat the above process iteratively for all classes found.
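The steps above amount to a fixed-point computation over the type hierarchy. Here is a minimal sketch under stated assumptions: the hierarchy is given as a map from each class to its direct supertypes (superclass and interfaces), and java.lang.Object is skipped, because otherwise its descendants would be every class on the classpath. Whether the Stripper makes the same exclusion is not stated in this post; HierarchyExpansion and its parameters are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: expands a set of classes along the type hierarchy.
final class HierarchyExpansion {

    private static final String ROOT = "java.lang.Object"; // assumption: never expanded

    /**
     * @param seeds            classes found by static analysis
     * @param directSupertypes assumed input: class -> its superclass and interfaces
     */
    static Set<String> expand(Set<String> seeds,
                              Map<String, Set<String>> directSupertypes) {
        // Invert the map so that direct subtypes and implementors can be looked up.
        Map<String, Set<String>> directSubtypes = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : directSupertypes.entrySet()) {
            for (String supertype : e.getValue()) {
                directSubtypes.computeIfAbsent(supertype, k -> new HashSet<>())
                        .add(e.getKey());
            }
        }
        Set<String> result = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>(seeds);
        while (!pending.isEmpty()) {
            String cls = pending.pop();
            if (ROOT.equals(cls) || !result.add(cls)) {
                continue; // skip the root and anything already handled
            }
            // Descendants and implementors, then supertypes; the loop repeats
            // the same treatment for every class that gets added.
            pending.addAll(directSubtypes.getOrDefault(cls,
                    Collections.<String>emptySet()));
            pending.addAll(directSupertypes.getOrDefault(cls,
                    Collections.<String>emptySet()));
        }
        return result;
    }
}
```

Starting from a single class Circle that extends Shape, the sketch also returns Shape and every other class that extends Shape. This over-approximates on purpose: it is what lets a factory-created sibling class end up in the output.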
We address dynamic instantiation by allowing the user to specify a list of package names of classes that must be included in their entirety. All classes belonging to any such package (or sub-packages thereof) are included in the output. For example, specifying “org.apache.log4j” causes all classes belonging to any package beginning with “org.apache.log4j” to be included in the output.
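The package rule can be sketched as a simple name test. The helper below is hypothetical (PackageFilter is not a class from the Stripper source). It matches on a package boundary, which is an assumption on my part: a plain prefix test would also accept an unrelated package such as org.apache.log4jx.

```java
// Hypothetical helper: decides whether a class falls under a user-specified package.
final class PackageFilter {

    /**
     * Returns true if className belongs to packageName or to one of its
     * sub-packages. Classes in the default package never match.
     */
    static boolean inPackageOrSubpackage(String className, String packageName) {
        int lastDot = className.lastIndexOf('.');
        String pkg = lastDot < 0 ? "" : className.substring(0, lastDot);
        return pkg.equals(packageName) || pkg.startsWith(packageName + ".");
    }
}
```

With “org.apache.log4j” as the package, both org.apache.log4j.Logger and org.apache.log4j.spi.Filter match.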
Finally, all non-class resources are included in the output.
Packaging class and non-class resources
Now that we have a list of resources to include in the output, let us see how to create a JAR file containing these resources. The “easy” way to create a JAR file is to simply use code like this:
ZipOutputStream out = null;
try {
    out = new ZipOutputStream(new BufferedOutputStream(
            new FileOutputStream(jarFile)));
    for (String entryName : entryNames) {
        ZipEntry zipEntry = new ZipEntry(entryName);
        out.putNextEntry(zipEntry);
        out.write(contents.get(entryName));
        out.closeEntry();
    }
    out.finish();
    out.flush();
} finally {
    if (out != null) {
        out.close();
    }
}
The problem is that while the above code produces a JAR file containing all the entries, the Java launcher cannot use it to load the main class. (Why it fails to do so is a mystery to me at the present time.) What does work is the following pre-processing code, which inserts a directory entry for every directory on the path of each file entry and then sorts the resulting entries by name:
private List<String> updateEntryNamesWithDirectoryEntries(
        List<String> entryNames) {
    // If there is a manifest entry, move it to the beginning of entryNames.
    if (entryNames.contains("META-INF/MANIFEST.MF")) {
        entryNames.remove("META-INF/MANIFEST.MF");
        entryNames.add(0, "META-INF/MANIFEST.MF");
    }
    List<String> ret = new ArrayList<String>();
    StringBuilder str = new StringBuilder();
    String lastDir = null;
    for (String entryName : entryNames) {
        if (!entryName.contains("/")) {
            if (!ret.contains(entryName)) {
                ret.add(entryName);
            }
        } else if (!entryName.endsWith("/")) {
            String dir = entryName.substring(0, entryName.lastIndexOf("/"));
            if (!dir.equals(lastDir)) {
                lastDir = dir;
                // Prepare to insert this as a directory-entry.
                String directoryEntryName = dir;
                String[] tokens = StrUtils.parse(directoryEntryName, "/");
                str.setLength(0);
                for (String token : tokens) {
                    if (str.length() != 0) {
                        str.append("/");
                    }
                    str.append(token);
                    if (!ret.contains(str.toString() + "/")) {
                        ret.add(str.toString() + "/");
                    }
                }
            }
            if (!ret.contains(entryName)) {
                ret.add(entryName);
            }
        }
    }
    Collections.sort(ret);
    return ret;
}
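To see what this pre-processing produces, here is a compact, self-contained sketch of the same idea using only the JDK. It is not the method above: DirectoryEntriesSketch is a hypothetical name, and it relies on a TreeSet for both de-duplication and the final ordering.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper: adds the directory entries implied by a set of file entries.
final class DirectoryEntriesSketch {

    static List<String> withDirectoryEntries(Collection<String> fileEntries) {
        TreeSet<String> all = new TreeSet<>(); // sorted by name, no duplicates
        for (String entry : fileEntries) {
            all.add(entry);
            // Add every ancestor directory of the entry, each ending in "/".
            for (int slash = entry.indexOf('/'); slash >= 0;
                    slash = entry.indexOf('/', slash + 1)) {
                all.add(entry.substring(0, slash + 1));
            }
        }
        return new ArrayList<>(all);
    }
}
```

Given the entries com/foo/A.class, META-INF/MANIFEST.MF and log4j.properties, the result is META-INF/, META-INF/MANIFEST.MF, com/, com/foo/, com/foo/A.class, log4j.properties. Each directory entry comes before the files it contains.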
The following code then writes the processed entries (which now contain directory-entries interspersed with file-entries) to the output JAR file:
ZipOutputStream out = null;
try {
    out = new ZipOutputStream(new BufferedOutputStream(
            new FileOutputStream(jarFile)));
    for (String entryName : entryNames) {
        ZipEntry zipEntry = new ZipEntry(entryName);
        out.putNextEntry(zipEntry);
        if (entryName.endsWith("/")) {
            // Directory entry
        } else {
            // File entry
            out.write(contents.get(entryName));
        }
        out.closeEntry();
    }
    out.finish();
    out.flush();
} finally {
    if (out != null) {
        out.close();
    }
}
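For a JAR to be executable, its manifest must name the main class in a Main-Class attribute. The code above does not show how that manifest is produced, so the following is only a sketch of one way to do it with the standard java.util.jar classes. ExecutableJarSketch and buildJar are hypothetical names, and the sketch writes to memory instead of to a file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical helper: builds an executable JAR in memory.
final class ExecutableJarSketch {

    static byte[] buildJar(String mainClass, Map<String, byte[]> contents)
            throws IOException {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // Manifest-Version is required; without it the manifest is written empty.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.MAIN_CLASS, mainClass);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // This constructor writes META-INF/MANIFEST.MF as the first entry.
        try (JarOutputStream out = new JarOutputStream(bytes, manifest)) {
            for (Map.Entry<String, byte[]> e : contents.entrySet()) {
                if ("META-INF/MANIFEST.MF".equals(e.getKey())) {
                    continue; // already written; a second copy would be a duplicate entry
                }
                out.putNextEntry(new JarEntry(e.getKey()));
                if (!e.getKey().endsWith("/")) {
                    out.write(e.getValue()); // directory entries carry no data
                }
                out.closeEntry();
            }
        }
        return bytes.toByteArray();
    }
}
```

Writing the returned bytes to a file gives a JAR whose manifest names the main class, which is what “java -jar” looks for. Whether this alone avoids the launcher problem described above is something I have not verified.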
Using the Stripper programmatically
While the Stripper is useful as a build tool or a command-line application, it may also be used programmatically. To see how, look at the “com.subhajit.codeanalysis.distribution.DistributionManager” class in the provided source code and the “StripperTest” file shown below:
package com.subhajit.stripper.test;

import java.io.File;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutionException;

import org.junit.Test;

import com.subhajit.codeanalysis.distribution.DistributionManager;
import com.subhajit.common.classloaders.ParentLastURLClassLoaderX;
import com.subhajit.common.classloaders.URLClassLoaderX;
import com.subhajit.common.util.StrUtils;
import com.subhajit.common.util.streams.FileUtils;

public class StripperTest {
    private URLClassLoaderX getJavaClassPathBasedClassLoader()
            throws MalformedURLException, IOException, ClassNotFoundException {
        String[] classpathElements = StrUtils.parse(System
                .getProperty("java.class.path"), File.pathSeparator);
        List<URL> urls = new ArrayList<URL>();
        for (String classpathElement : classpathElements) {
            File classpathFile = new File(classpathElement);
            // This is a hack. Maven includes the current-directory in the
            // classpath, leading to badly named classes being found by the code
            // analysis utils.
            if (!classpathFile.getAbsolutePath().equals(
                    System.getProperty("user.dir"))) {
                urls.add(classpathFile.toURI().toURL());
            }
        }
        URLClassLoaderX classLoader = new URLClassLoaderX(urls
                .toArray(new URL[0]));
        return classLoader;
    }

    @Test
    public void testDistributionManager3() throws MalformedURLException,
            IOException, ClassNotFoundException, InterruptedException,
            ExecutionException {
        URLClassLoader classLoader1 = getJavaClassPathBasedClassLoader();
        final File tempFile1 = File.createTempFile("tf1", ".jar");
        try {
            final File tempFile2 = File.createTempFile("tf2", ".jar");
            try {
                // Create a distribution of the Stripper in a temporary file.
                Set<String> classes1 = DistributionManager
                        .createDistributionWithAdditionalPackages(classLoader1,
                                tempFile1, null,
                                "com.subhajit.stripper.Stripper");
                assert classes1
                        .contains("com.subhajit.common.listingprovider.ResourceManagerImpl");
                assert classes1
                        .contains("com.subhajit.common.listingprovider.ListingProvider");
                assert classes1
                        .contains("com.subhajit.common.classloaders.ByteMapListingProvider");
                assert classes1
                        .contains("com.subhajit.common.contenturl.MemoryContent");
                // Now, use tempFile1 as the input and generate tempFile2 by
                // stripping tempFile1.
                ParentLastURLClassLoaderX classLoader2 = new ParentLastURLClassLoaderX(
                        new URL[] { tempFile1.toURI().toURL() });
                // The output file is tempFile2, as the comment above describes.
                Set<String> classes2 = DistributionManager
                        .createDistributionWithAdditionalPackages(classLoader2,
                                tempFile2, null,
                                "com.subhajit.stripper.Stripper");
                assert classes2
                        .contains("com.subhajit.common.listingprovider.ResourceManagerImpl");
                assert classes2
                        .contains("com.subhajit.common.listingprovider.ListingProvider");
                assert classes2
                        .contains("com.subhajit.common.classloaders.ByteMapListingProvider");
                assert classes2
                        .contains("com.subhajit.common.contenturl.MemoryContent");
                assert classes1.equals(classes2);
            } finally {
                if (tempFile2.exists()) {
                    FileUtils.deleteFile(tempFile2);
                }
            }
        } finally {
            if (tempFile1.exists()) {
                FileUtils.deleteFile(tempFile1);
            }
        }
    }
}
Download the Stripper
I am providing the following files for download:
| File | Description |
| --- | --- |
| stripper-all.jar | The Maven build output of the Stripper project. This is an “unstripped” JAR in the context of this blog post. You can get it here. |
| stripper.jar | The stripped output of the Stripper, obtained by running the Stripper on “stripper-all.jar”. You can get it here. |
| src.zip | ZIP file containing the source code for all classes in the “com.subhajit” packages in stripper.jar. You can get it here. |
