More on fast directory scanning


After posting the "DirScanner" class for fast directory scanning this morning (see my earlier post), some possible improvements to DirScanner occured to me.

One "low hanging fruit" was the common concurrent map shared between all the threads in DirScanner, and its potential to serve as a "point of contention". Note that since all threads write to this map, it is (internally, and partially) locked each time it is updated. The concurrent hash map uses lock striping to alleviate contention, but does not eliminate it altogether.

In the improved version shown below, each thread (of the fixed thread pool) is a custom thread (ThreadWithList) containing a list of files. (Josef, a reader, also mentioned this in his comment to my earlier posting). The "scan0" method, which is called within a thread, can thus safely add to the particular thread's list of files without any synchronization.

There is an instance method named "listOfLists", and each ThreadWithList adds a reference to its instance member "list" to listOfLists.

Once all the work is done, the "scan" method simply iterates through the list and accumulates and returns all of the entries in each list.

Ok, so here is the source code of the DirScanner3 class:

public class DirScanner3 {
    private final ExecutorService executor;
    private final Semaphore sem = new Semaphore(1);
    private final AtomicInteger count = new AtomicInteger();

    private final List<List<File>> listOfLists = new ArrayList<List<File>>();

    /**
     * Custom {@link Thread} returned by the {@link ThreadFactory} used by the
     * {@link ExecutorService} data member <tt>executor</tt>.
     */
    private final class ThreadWithList extends Thread {
        private final List<File> files;

        public ThreadWithList(Runnable r) {
            super(r);
            files = new ArrayList<File>();
            DirScanner3.this.listOfLists.add(files);
        }

        public List<File> files() {
            return this.files;
        }
    }

    private DirScanner3(int threads) {
        super();
        executor = Executors.newFixedThreadPool(threads, new ThreadFactory() {
            public Thread newThread(Runnable r) {
                return new ThreadWithList(r);
            }
        });
    }

    public Collection<File> scan(final File dir) throws InterruptedException,
            ExecutionException {
        sem.acquire();
        executor.submit(new Runnable() {
            public void run() {
                scan0(dir);
            }
        });
        sem.acquire();
        List<File> ret = new ArrayList<File>();
        for (List<File> files : listOfLists) {
            ret.addAll(files);
        }
        return ret;
    }

    private void scan0(File dir) {
        for (final File file : dir.listFiles()) {
            ((ThreadWithList) Thread.currentThread()).files.add(file);
            if (file.isDirectory()) {
                count.incrementAndGet();
                executor.submit(new Runnable() {
                    public void run() {
                        DirScanner3.this.scan0(file);
                    }
                });
            }
        }
        if (count.decrementAndGet() < 0) {
            sem.release();
        }
    }

    public void close() {
        executor.shutdown();
    }

    public static Collection<File> listAllContentsUnder(File dir, int threads)
            throws InterruptedException, ExecutionException {
        DirScanner3 scanner = new DirScanner3(threads);
        try {
            return scanner.scan(dir);
        } finally {
            scanner.close();
        }
    }
}

It is important to note that DirScanner3 cannot be instantiated. Instead, it must be used via the listAllContentsUnder static method. I am doing this to save time clearing each element of listOfLists, and resetting the counter to zero, in the “scan” method. Perhaps this is a bad idea, and I will have to think about this some more.

In the meantime, please enjoy and keep the comments coming.


 

What did you think of this article?




Trackbacks
  • No trackbacks exist for this post.
Comments
  • No comments exist for this post.
Leave a comment

Submitted comments are subject to moderation before being displayed.

 Enter the above security code (required)

 Name

 Email (will not be published)

 Website

Your comment is 0 characters limited to 3000 characters.