Using hash functions to store files?

A common technique for storing a large number of files / blocks in the file system is to use a hash function to determine the path to the file; for example, hash(identifier) → "o238455789" → o23/8455/789 (often combined with a strategy for handling hash collisions).
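To make the idea concrete, here is a minimal sketch in Python. The function name `sharded_path`, the choice of SHA-256, and the 2/2 directory split are assumptions made for illustration only; the example above uses a different hash and a different split, and collision handling is left out.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_path(identifier: str, widths=(2, 2)) -> PurePosixPath:
    """Map an identifier to a nested relative path derived from its hash.

    Hypothetical helper: SHA-256 and the (2, 2) split are illustrative choices.
    """
    digest = hashlib.sha256(identifier.encode("utf-8")).hexdigest()
    parts = []
    pos = 0
    # Peel off one directory level per requested width.
    for width in widths:
        parts.append(digest[pos:pos + width])
        pos += width
    # Whatever is left of the digest becomes the file name.
    parts.append(digest[pos:])
    return PurePosixPath(*parts)


if __name__ == "__main__":
    print(sharded_path("report.pdf"))
```

With two hex characters per level, each directory holds at most 256 subdirectories, which keeps any single directory from growing too large.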

Does this technique have a name (is it a "pattern"?), so that I can search for it in the ACM Digital Library or a similar online database of computing literature?

Are there any books / documents that explore the problem / solution?

P.S. Thanks for the helpful comments, but none of them address the technique above.

+7
design-patterns
4 answers

I think this is what Microsoft did in SQL Server 2008 with FILESTREAM storage. It lets you store BLOB data inside SQL Server while still accessing the files directly from disk, which gives you kick-ass performance.

Microsoft has published a white paper on managing unstructured data that might interest you. There is also an MSDN article describing FILESTREAM, as well as the pros and cons of storing files as BLOBs or not.

+3

US Pat. No. 5,742,807 relates to this.
http://www.freepatentsonline.com/5742807.html

Systems and methods for managing a plurality of electronically stored documents in an open document repository use a one-way hash function to calculate a hash for each stored document as an indexing link. The document management index maps an attribute of the original document stored in the repository to the hash and the document. The hash-to-location index maps the hash to the address location of the document in the repository file system. An attribute points to a hash, which in turn points to a location, thereby binding the attribute to the location.
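The two-index arrangement described in that abstract can be loosely sketched as follows. This is only an illustration of the attribute → hash → location idea, not the patented method: the class name `DocumentRepository`, its methods, the use of SHA-256, and the in-memory dictionary standing in for the file system are all assumptions.

```python
import hashlib


class DocumentRepository:
    """Toy two-index store: attribute -> hash, and hash -> location."""

    def __init__(self):
        self.attribute_to_hash = {}   # document management index
        self.hash_to_location = {}    # hash-to-location index
        self.storage = {}             # stands in for the repository file system

    def store(self, attribute: str, content: bytes) -> str:
        # Hash the document content with a one-way hash function.
        digest = hashlib.sha256(content).hexdigest()
        # Derive a nested location from the hash (illustrative 2/2 split).
        location = f"{digest[:2]}/{digest[2:4]}/{digest[4:]}"
        # Identical content hashes to the same location, so it is kept once.
        self.storage.setdefault(location, content)
        self.hash_to_location[digest] = location
        self.attribute_to_hash[attribute] = digest
        return digest

    def load(self, attribute: str) -> bytes:
        # Follow attribute -> hash -> location to reach the document.
        digest = self.attribute_to_hash[attribute]
        location = self.hash_to_location[digest]
        return self.storage[location]


if __name__ == "__main__":
    repo = DocumentRepository()
    repo.store("invoice-2008.txt", b"hello")
    print(repo.load("invoice-2008.txt"))
```

Because the location is derived from the content hash, two attributes that refer to identical content end up pointing at the same stored copy.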

+2

@Chris Kimpton

This is called indexing. Sharding or partitioning is more about how you split up a file.

+1

Sounds like sharding, but I am probably missing some subtleties.

Equally, I do not see many articles on it; there are a few on highscalability.com.

0
