Actually, there is a reason you got lectured: it's the correct answer to your problem. Here's some background, so that perhaps you can make some changes to your live environment.
First: directories are stored on the filesystem; think of them as files, because that's exactly what they are. When you iterate through a directory, you have to read its blocks from disk. Each directory entry needs enough space to store the file name and permissions, plus information about where that file's data lives on disk.
Second: directories aren't stored with any internal ordering (at least not in the filesystems where I've worked with directory files). If you have 150,000 entries and 2 subdirectories, those 2 subdirectory links can be anywhere within the 150,000. You have to iterate to find them; there's no way around it.
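To make that concrete, here is a minimal Python sketch (not from the original answer) showing that locating the subdirectories means walking every entry, since the directory has no index:

```python
import os
import tempfile

def find_subdirs(path):
    """Return subdirectory names; every entry must be examined,
    because directory entries carry no ordering or index."""
    subdirs = []
    for entry in os.scandir(path):  # reads the directory's blocks entry by entry
        if entry.is_dir(follow_symlinks=False):
            subdirs.append(entry.name)
    return sorted(subdirs)

# Small demonstration with a throwaway directory:
root = tempfile.mkdtemp()
for i in range(100):  # stand-ins for the many data files
    open(os.path.join(root, f"file{i}.dat"), "w").close()
os.mkdir(os.path.join(root, "a"))
os.mkdir(os.path.join(root, "b"))
print(find_subdirs(root))  # -> ['a', 'b']
```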
So, let's say you can't avoid a big directory. Your only real option is to try to keep the blocks holding the directory file in the in-memory cache, so that you don't hit the disk every time you access them. You could achieve this by regularly re-reading the directory in a background thread, but that will put extra load on your disks and interfere with other processes. Alternatively, you can scan once and keep track of the results.
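A minimal sketch of the "scan once and keep the results" idea, assuming Python; the class name and the mtime-based staleness check are illustrative choices, not part of the original answer:

```python
import os

class CachedDirListing:
    """Scan a large directory once and answer lookups from memory,
    re-scanning only when the directory's mtime changes (a heuristic:
    mtime updates when entries are added or removed)."""

    def __init__(self, path):
        self.path = path
        self._mtime = None
        self._entries = set()

    def _refresh_if_stale(self):
        mtime = os.stat(self.path).st_mtime
        if mtime != self._mtime:
            # The one expensive full scan; subsequent lookups stay in memory.
            self._entries = {e.name for e in os.scandir(self.path)}
            self._mtime = mtime

    def contains(self, name):
        self._refresh_if_stale()
        return name in self._entries
```

Lookups then cost a set membership test instead of a directory walk, at the price of holding all entry names in memory.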
The alternative is to create a multi-level directory structure. If you look at commercial websites, you'll see URLs like /1/150/15023.html — this keeps the number of files per directory small. Think of it as a BTree index in a database.
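A sketch of that path scheme, assuming Python and numeric file ids; the function name and the split points (hundreds and ten-thousands) are my illustrative choices, picked to reproduce the /1/150/15023.html example:

```python
def tiered_path(file_id):
    """Map a numeric id to a nested path like '1/150/15023.html',
    so each directory level holds at most ~100 entries."""
    level2 = file_id // 100     # 15023 -> 150
    level1 = file_id // 10000   # 15023 -> 1
    return f"{level1}/{level2}/{file_id}.html"

print(tiered_path(15023))  # -> 1/150/15023.html
```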
Of course, you can hide this structure: you can create a filesystem abstraction layer that takes plain file names and automatically generates the directory tree in which those files can be found.
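One way such an abstraction layer might look, assuming Python; the class name, the SHA-1 hashing, and the two-level layout are all my assumptions for the sketch, not something the answer prescribes:

```python
import hashlib
import os

class ShardedStore:
    """Callers pass plain file names; the store derives a two-level
    subdirectory from a hash of the name, so no single directory
    grows huge. Layout: <root>/<hh>/<hh>/<name>."""

    def __init__(self, root):
        self.root = root

    def _path_for(self, name):
        digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
        return os.path.join(self.root, digest[:2], digest[2:4], name)

    def write(self, name, data):
        path = self._path_for(name)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)

    def read(self, name):
        with open(self._path_for(name), "rb") as f:
            return f.read()
```

Hashing spreads names evenly across the shards regardless of how callers name their files, which is the property you want when the names themselves cluster (timestamps, sequential ids, and so on).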
kdgregory Jun 23 '09 at 20:44