How to calculate the MD5 checksum of a directory using Java or Groovy?

I want to use Java or Groovy to get the MD5 checksum of an entire directory.

I need to copy directories from a source to a target, checksum both the source and the target to verify the copy, and only delete the source directories afterwards.

I found this script for single files, but how do I do the same for directories?

    import java.security.MessageDigest

    def generateMD5(final file) {
        MessageDigest digest = MessageDigest.getInstance("MD5")
        file.withInputStream() { is ->
            byte[] buffer = new byte[8192]
            int read = 0
            while ((read = is.read(buffer)) > 0) {
                digest.update(buffer, 0, read);
            }
        }
        byte[] md5sum = digest.digest()
        BigInteger bigInt = new BigInteger(1, md5sum)
        return bigInt.toString(16).padLeft(32, '0')
    }

Is there a better approach?

5 answers

I wrote a function to calculate the MD5 checksum of a directory.

Firstly, I use FastMD5: http://www.twmacinta.com/myjava/fast_md5.php

Here is my code:

    import com.twmacinta.util.MD5

    def MD5HashDirectory(String fileDir) {
        MD5 md5 = new MD5()
        new File(fileDir).eachFileRecurse { file ->
            if (file.isFile()) {
                // Hash each file with FastMD5, then feed the hex string into the folder-level MD5
                String hashFile = MD5.asHex(MD5.getHash(new File(file.path)))
                md5.Update(hashFile, null)
            }
        }
        String hashFolder = md5.asHex()
        return hashFolder
    }
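For anyone calling FastMD5 from plain Java rather than Groovy, here is a minimal per-file sketch. It assumes the FastMD5 jar is on the classpath (the MD5 class ships in the com.twmacinta.util package of that distribution); the demo class and argument handling are placeholders of my own:

    import com.twmacinta.util.MD5;

    import java.io.File;
    import java.io.IOException;

    public class FastMd5FileDemo {
        public static void main(String[] args) throws IOException {
            // Hash a single file with FastMD5; the Groovy function above combines
            // these per-file hex strings into a second MD5 to get the folder hash.
            String hex = MD5.asHex(MD5.getHash(new File(args[0])));
            System.out.println(args[0] + " -> " + hex);
        }
    }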

I had the same requirement, and I chose to define my directory hash as the MD5 hash of the combined streams of all the (non-directory) files in the directory. As crozin mentioned in the comments on a similar question, you can use SequenceInputStream to act as a single stream that concatenates the contents of the other streams. I am using Apache Commons Codec for the MD5 algorithm.

Basically, you recurse through the directory tree, adding FileInputStream instances to a Vector for the non-directory files. Vector is then convenient because its elements() method provides the Enumeration that the SequenceInputStream constructor needs to loop through. To the MD5 algorithm, this looks like a single InputStream.

Naturally, the files need to be presented in the same order every time so that the hash is identical for the same inputs. The listFiles() method of File does not guarantee an ordering, so I sort by file name.

I did this for files managed by SVN and wanted to avoid hashing the hidden SVN files, so I added a flag to skip hidden files.

The relevant basic code is below. (Obviously, it could be "hardened" further.)

    import org.apache.commons.codec.digest.DigestUtils;

    import java.io.*;
    import java.util.*;

    public String calcMD5HashForDir(File dirToHash, boolean includeHiddenFiles) {
        assert (dirToHash.isDirectory());
        Vector<FileInputStream> fileStreams = new Vector<FileInputStream>();

        System.out.println("Found files for hashing:");
        collectInputStreams(dirToHash, fileStreams, includeHiddenFiles);

        SequenceInputStream seqStream = new SequenceInputStream(fileStreams.elements());

        try {
            String md5Hash = DigestUtils.md5Hex(seqStream);
            seqStream.close();
            return md5Hash;
        } catch (IOException e) {
            throw new RuntimeException("Error reading files to hash in " + dirToHash.getAbsolutePath(), e);
        }
    }

    private void collectInputStreams(File dir, List<FileInputStream> foundStreams, boolean includeHiddenFiles) {
        File[] fileList = dir.listFiles();
        Arrays.sort(fileList,               // Need in reproducible order
                    new Comparator<File>() {
                        public int compare(File f1, File f2) {
                            return f1.getName().compareTo(f2.getName());
                        }
                    });

        for (File f : fileList) {
            if (!includeHiddenFiles && f.getName().startsWith(".")) {
                // Skip it
            } else if (f.isDirectory()) {
                collectInputStreams(f, foundStreams, includeHiddenFiles);
            } else {
                try {
                    System.out.println("\t" + f.getAbsolutePath());
                    foundStreams.add(new FileInputStream(f));
                } catch (FileNotFoundException e) {
                    throw new AssertionError(e.getMessage() + ": file should never not be found!");
                }
            }
        }
    }
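As a usage sketch, assuming the two methods above are placed in a class named DirHasher (a name invented here, not from the answer) and that commons-codec is on the classpath:

    import java.io.File;

    public class DirHashDemo {
        public static void main(String[] args) {
            // DirHasher is a hypothetical wrapper class holding calcMD5HashForDir()
            // and collectInputStreams() from the answer above.
            DirHasher hasher = new DirHasher();
            String hash = hasher.calcMD5HashForDir(new File(args[0]), false);
            System.out.println("MD5 for " + args[0] + ": " + hash);
        }
    }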

HashCopy is a Java application. It can generate and validate MD5 and SHA hashes for a single file or for a directory recursively. I'm not sure whether it has an API. It can be downloaded from www.jdxsoftware.org.


Based on Stuart Rossiter's answer, but with cleaner code and correct handling of hidden files:

    import org.apache.commons.codec.digest.DigestUtils;

    import java.io.*;
    import java.nio.file.Files;
    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.List;
    import java.util.Vector;

    public class Hashing {

        public static String hashDirectory(String directoryPath, boolean includeHiddenFiles) throws IOException {
            File directory = new File(directoryPath);

            if (!directory.isDirectory()) {
                throw new IllegalArgumentException("Not a directory");
            }

            Vector<FileInputStream> fileStreams = new Vector<>();
            collectFiles(directory, fileStreams, includeHiddenFiles);

            try (SequenceInputStream sequenceInputStream = new SequenceInputStream(fileStreams.elements())) {
                return DigestUtils.md5Hex(sequenceInputStream);
            }
        }

        private static void collectFiles(File directory, List<FileInputStream> fileInputStreams, boolean includeHiddenFiles) throws IOException {
            File[] files = directory.listFiles();

            if (files != null) {
                Arrays.sort(files, Comparator.comparing(File::getName));

                for (File file : files) {
                    if (includeHiddenFiles || !Files.isHidden(file.toPath())) {
                        if (file.isDirectory()) {
                            collectFiles(file, fileInputStreams, includeHiddenFiles);
                        } else {
                            fileInputStreams.add(new FileInputStream(file));
                        }
                    }
                }
            }
        }
    }
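A short usage example (the Hashing class is the one above; commons-codec must be on the classpath, and the command-line handling here is just for illustration):

    import java.io.IOException;

    public class HashingDemo {
        public static void main(String[] args) throws IOException {
            // Hash the directory given on the command line, skipping hidden files.
            String md5 = Hashing.hashDirectory(args[0], false);
            System.out.println(md5);
        }
    }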

It is unclear what it means to take the md5sum of a directory. You may want a checksum over just the list of files; you may want a checksum over the file list plus the contents of every file. If you are already checksumming the file data yourself, I would suggest defining an unambiguous representation of the directory listing (watch out for awkward characters in file names) and computing the hash over that representation each time. You also need to think about how you will handle special files (sockets, pipes, devices and symbolic links in the Unix world; NTFS has file streams and, I believe, something like symbolic links).
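To make that concrete, here is a minimal JDK-only sketch (my own illustration, not part of the answer) that folds both the sorted relative file names and the file contents into a single MD5 digest; special files and symlinks are ignored because only regular files are visited:

    import java.io.IOException;
    import java.io.InputStream;
    import java.math.BigInteger;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    public class DirDigest {

        public static String md5OfTree(Path root) throws IOException, NoSuchAlgorithmException {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            List<Path> files;
            try (Stream<Path> walk = Files.walk(root)) {
                files = walk.filter(Files::isRegularFile)
                            .sorted()                      // reproducible order
                            .collect(Collectors.toList());
            }
            for (Path file : files) {
                // Fold the relative path into the digest so that renames change the checksum
                // (note: path separators differ between operating systems).
                digest.update(root.relativize(file).toString().getBytes(StandardCharsets.UTF_8));
                try (InputStream in = Files.newInputStream(file)) {
                    byte[] buffer = new byte[8192];
                    int read;
                    while ((read = in.read(buffer)) > 0) {
                        digest.update(buffer, 0, read);
                    }
                }
            }
            return String.format("%032x", new BigInteger(1, digest.digest()));
        }
    }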


Source: https://habr.com/ru/post/1312372/

