Try using the java-diff-utils library
Example
I use groovy for quick demos of java libraries:
The following are the differences between the two sample files:
$ groovy diff [ChangeDelta, position: 0, lines: [1,11,21,31,41,51] to [1,11,99,31,41,51]] [DeleteDelta, position: 2, lines: [3,13,23,33,43,53]] [InsertDelta, position: 5, lines: [6,16,26,36,46,56]]
files1.csv
1,11,21,31,41,51 2,12,22,32,42,52 3,13,23,33,43,53 4,14,24,34,44,54 5,15,25,35,45,55
file2.csv
1,11,99,31,41,51 2,12,22,32,42,52 4,14,24,34,44,54 5,15,25,35,45,55 6,16,26,36,46,56
diff.groovy
// // Dependencies // ============ import difflib.* @Grapes([ @Grab(group='com.googlecode.java-diff-utils', module='diffutils', version='1.2.1'), ]) // // Main program // ============ def original = new File("file1.csv").readLines() def revised = new File("file2.csv").readLines() Patch patch = DiffUtils.diff(original, revised) patch.getDeltas().each { println it }
Update
According to dbunit, frequently asked questions, the performance of this solution can be improved for very large datasets using the thread revision of the ResultSetTableFactory interface. This is included in the ANT task as follows:
ant.dbunit(driver:driver, url:url, userid:user, password:pass) { compare(src:"dbunit.xml", format:"flat") dbconfig { property(name:"datatypeFactory", value:"org.dbunit.ext.h2.H2DataTypeFactory") property(name:"resultSetTableFactory", value:"org.dbunit.database.ForwardOnlyResultSetTableFactory") } }