I have a web server that saves cache files and keeps them for 7 days. The file names are md5 hashes, i.e. exactly 32 hexadecimal characters, and they are kept in a tree structure that looks like this:
00/
  00/
    00000ae9355e59a3d8a314a5470753d8
    ...
  01/
    ...
You get the idea.
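For reference, a lookup maps a hash to its path by using the first two pairs of hex characters as directory levels. A minimal sketch in bash, assuming the two-level layout shown above:

```shell
#!/bin/bash
# Resolve an md5 cache key to its on-disk path, assuming the
# two-level layout above: the first two hex pairs of the hash
# become the directory names.
hash=00000ae9355e59a3d8a314a5470753d8
path="cache/${hash:0:2}/${hash:2:2}/${hash}"
echo "$path"   # cache/00/00/00000ae9355e59a3d8a314a5470753d8
```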
My problem is that deleting old files takes a lot of time. I have a daily cron job doing
find cache/ -mtime +7 -type f -delete
which runs for more than half a day. I worry about scalability and about the impact this has on server performance. Additionally, the cache directory has become a black hole in my system, trapping the occasional innocent du or find.
The standard answer for an LRU cache is some sort of heap. Is there a way to scale that to the filesystem level? Is there some other way to implement this that would make it easier to manage?
Here are the ideas I reviewed:
- Create 7 top-level directories, one for each day of the week, and empty one whole directory every day. This multiplies the lookup time for a cache file by 7, makes things really complicated when a file is overwritten, and I'm not sure what it would do to the deletion time.
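(If I went with the rotating-directories idea, the daily cron job might look something like this sketch; the day.0 … day.6 directory names are my own invention, not something I have running:)

```shell
#!/bin/bash
# Rotate 7 top-level cache directories: each day, wipe the
# directory for today's weekday slot, so every directory only
# ever holds files written in the last 7 days. Writers would
# put new files into today's slot; readers would have to
# probe all 7 slots on lookup.
slot=$(( $(date +%u) % 7 ))   # 0..6, advances once per day
rm -rf "cache/day.$slot"
mkdir -p "cache/day.$slot"
```

The win is that deletion becomes a single rm -rf of an entire subtree instead of a stat of every file, at the cost of the 7-way lookup mentioned above.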
- Save the files as blobs in a MySQL table with indexes on name and date. This seemed promising, but in practice it was always much slower than the filesystem. Maybe I'm doing it wrong.
Any ideas?
linux filesystems caching
itsadok