Iterating over many files in Java with minimal memory usage

I need to iterate over a directory hierarchy containing about 20 million files in Java. I am currently using FileUtils.iterateFiles from Apache Commons-IO. It seems to work by loading the entire list into memory, which is slow (it delays application startup) and a huge memory hog (about 8 GB). Previously I used my own recursive file iterator, which had the same problem.

I only need to process one file at a time (or, later on, a handful from the front of the list in parallel), so it seems unnecessary to spend all that time and memory loading the complete list.

Java's Iterator interface allows for the kind of minimal-memory iterator I need, but since the built-in methods of the java.io.File class only return eagerly initialized arrays, it seems oddly difficult to take advantage of this.

Does anyone have any suggestions on how I can traverse a file hierarchy without loading all of it into memory in advance?

Thanks to this answer, I now know about the new Java 7 API, which I think will solve my problem, but Java 7 is actually not an option for me at this point.
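For readers who are able to use Java 7, here is a minimal sketch of the kind of traversal that API allows, using Files.walkFileTree with a visitor callback so that files are handed over one at a time instead of being collected into a list first. The class name Nio2WalkSketch, the FileHandler interface and the walk method are hypothetical names made up for this illustration.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper class, not part of any library.
class Nio2WalkSketch {

    // Hypothetical callback invoked once per regular file.
    interface FileHandler {
        void handle(Path file) throws IOException;
    }

    // Walks the tree under root and passes each regular file to the handler.
    // Returns the number of files handled.
    static long walk(Path root, final FileHandler handler) throws IOException {
        final AtomicLong count = new AtomicLong();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                if (attrs.isRegularFile()) {
                    handler.handle(file);
                    count.incrementAndGet();
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return count.get();
    }
}
```

The default SimpleFileVisitor rethrows I/O errors for unreadable entries; overriding visitFileFailed to return FileVisitResult.CONTINUE would skip them instead.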

3 answers

OK, I ended up implementing my own iterator to do this (as Amir suggested). It wasn't quite trivial (although, fortunately, someone had already written the code to flatten iterators), but it is reasonably straightforward.

It still holds the complete listing of a single directory (without descendants) in memory, so it is not suitable for a flat directory layout (in that case, I think you're out of luck with pure Java before Java 7), but so far it works much better for my use case.

RecursiveFileIterable.java

    import java.io.File;
    import java.io.FileFilter;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Iterator;
    import java.util.List;

    public class RecursiveFileIterable implements Iterable<File> {
        private final File file;

        public RecursiveFileIterable(File f) {
            file = f;
        }

        public RecursiveFileIterable(String filename) {
            this(new File(filename));
        }

        private static class DirectoriesOnlyFilter implements FileFilter {
            @Override
            public boolean accept(File pathname) {
                return pathname.isDirectory();
            }
        }

        private static class NoDirectoriesFilter implements FileFilter {
            @Override
            public boolean accept(File pathname) {
                return !pathname.isDirectory();
            }
        }

        @Override
        public Iterator<File> iterator() {
            List<Iterable<File>> pendingIterables = new ArrayList<Iterable<File>>();

            // listFiles() returns null if the path is not a directory or cannot be read
            File[] normFiles = file.listFiles(new NoDirectoriesFilter());
            if (normFiles != null) {
                pendingIterables.add(Arrays.asList(normFiles));
            }

            // Each subdirectory is only listed when iteration actually reaches it
            File[] subdirs = file.listFiles(new DirectoriesOnlyFilter());
            if (subdirs != null) {
                for (File sd : subdirs) {
                    pendingIterables.add(new RecursiveFileIterable(sd));
                }
            }
            return new FlattenIterable<File>(pendingIterables).iterator();
        }
    }

FlattenIterable.java

    // from http://langexplr.blogspot.com.au/2007/12/combining-iterators-in-java.html
    import java.util.Iterator;
    import java.util.NoSuchElementException;

    public class FlattenIterable<T> implements Iterable<T> {
        private final Iterable<Iterable<T>> iterable;

        public FlattenIterable(Iterable<Iterable<T>> iterable) {
            this.iterable = iterable;
        }

        public Iterator<T> iterator() {
            return new FlattenIterator<T>(iterable.iterator());
        }

        static class FlattenIterator<T> implements Iterator<T> {
            private final Iterator<Iterable<T>> iterator;
            private Iterator<T> currentIterator;

            public FlattenIterator(Iterator<Iterable<T>> iterator) {
                this.iterator = iterator;
                currentIterator = null;
            }

            public boolean hasNext() {
                if (currentIterator == null) {
                    if (iterator.hasNext()) {
                        currentIterator = iterator.next().iterator();
                    } else {
                        return false;
                    }
                }
                // Skip over exhausted or empty inner iterators
                while (!currentIterator.hasNext() && iterator.hasNext()) {
                    currentIterator = iterator.next().iterator();
                }
                return currentIterator.hasNext();
            }

            public T next() {
                // Calling hasNext() first makes next() safe without a prior hasNext() call
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return currentIterator.next();
            }

            public void remove() {
                throw new UnsupportedOperationException();
            }
        }
    }
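The same idea can also be sketched in a single class with an explicit stack instead of nested iterables. This is only an illustrative alternative that sticks to Java 6 APIs, not the code I actually run; the class name LazyFileWalker is made up. It holds one directory listing per level of the path currently being traversed, lists each directory once, and does not guard against symbolic-link cycles.

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical single-class variant: depth-first, files only, Java 6 compatible.
class LazyFileWalker implements Iterator<File> {
    // One listing iterator for each directory on the current path
    private final Deque<Iterator<File>> stack = new ArrayDeque<Iterator<File>>();
    private File next; // next file to return, or null if not yet located

    LazyFileWalker(File root) {
        push(root);
    }

    private void push(File dir) {
        File[] entries = dir.listFiles(); // null if not a directory or unreadable
        if (entries != null) {
            stack.push(Arrays.asList(entries).iterator());
        }
    }

    public boolean hasNext() {
        while (next == null && !stack.isEmpty()) {
            Iterator<File> top = stack.peek();
            if (!top.hasNext()) {
                stack.pop();          // this directory is finished
            } else {
                File f = top.next();
                if (f.isDirectory()) {
                    push(f);          // descend; listed only now
                } else {
                    next = f;
                }
            }
        }
        return next != null;
    }

    public File next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        File result = next;
        next = null;
        return result;
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }
}
```

Usage would look like `Iterator<File> it = new LazyFileWalker(new File(path));` followed by the usual hasNext()/next() loop.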

Since Java 7 NIO is not an option, you could run "dir /B /A-D" (on Windows) and read the file names from its output. If necessary, you can redirect the output to a temp file and read the file names from there.
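A rough sketch of that approach, reading the command's output line by line so that only one name is in memory at a time. The names DirOutputReader, LineHandler, readNames and listWithDir are hypothetical, and the flags are an assumption about what is intended: /B prints bare names and /A-D restricts the listing to non-directory entries. Adding /S would be needed to recurse into subdirectories.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

// Hypothetical helper class, Java 6 compatible.
class DirOutputReader {

    // Hypothetical callback invoked once per file name.
    interface LineHandler {
        void handle(String fileName);
    }

    // Reads one name per line from the stream and passes it to the handler.
    // Blank lines are skipped. Returns the number of names handled.
    // Uses the platform default charset, which may not match the console code page.
    static int readNames(InputStream in, LineHandler handler) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in));
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.length() == 0) {
                continue;
            }
            handler.handle(line);
            count++;
        }
        return count;
    }

    // Windows only: runs "dir /B /A-D" in the given directory.
    // stderr is left unread, on the assumption that dir writes very little to it.
    static int listWithDir(File directory, LineHandler handler)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("cmd", "/c", "dir", "/B", "/A-D");
        pb.directory(directory);
        Process process = pb.start();
        InputStream out = process.getInputStream();
        try {
            return readNames(out, handler);
        } finally {
            out.close();
            process.waitFor();
        }
    }
}
```

Redirecting to a temp file instead would only change where the InputStream comes from; readNames can be reused with a FileInputStream.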


I know this is not strictly the answer to your question, but can you reorganize the directory tree to use more directories so that each directory contains fewer files?
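One possible way to do that, purely as an illustration: derive two levels of subdirectory from a hash of the file name, so files spread over 256 x 256 = 65,536 directories. With 20 million files that is roughly 300 files per directory, assuming the hash spreads names evenly. The class ShardedLayout and the method shardedPath are hypothetical names.

```java
import java.io.File;

// Hypothetical helper: maps a file name to root/xx/yy/fileName,
// where xx and yy are two hex digits each, taken from the name's hash.
class ShardedLayout {

    static File shardedPath(File root, String fileName) {
        int h = fileName.hashCode() & 0xFFFF;               // keep the low 16 bits
        String level1 = String.format("%02x", (h >> 8) & 0xFF);
        String level2 = String.format("%02x", h & 0xFF);
        return new File(new File(new File(root, level1), level2), fileName);
    }
}
```

Because the location is computed from the name alone, a file can be found again without scanning any directory.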
