I am looking for a time-efficient way to parse a list of file paths into a tree. The list may contain hundreds of millions of paths.
A brute-force solution would split each path on the directory separator and walk the tree from the root, adding directory and file entries as it goes, doing string comparisons at every level, but this will be exceptionally slow.
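For concreteness, this is roughly the brute-force version I mean (a Python sketch, assuming the tree is just nested dicts keyed by component name):

def build_tree_brute_force(paths):
    # Split every path on the separator and walk the tree from the root,
    # hashing/comparing every component of every path.
    root = {}
    for path in paths:
        node = root
        for part in path.split("\\"):
            node = node.setdefault(part, {})
    return root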
The input is usually sorted alphabetically, so the list looks something like this:
C:\Users\Aaron\AppData\Amarok\AFile
C:\Users\Aaron\AppData\Amarok\Afile2
C:\Users\Aaron\AppData\Amarok\Afile3
C:\Users\Aaron\AppData\Blender\alibrary.dll
C:\Users\Aaron\AppData\Blender\and_so_on.txt
Given that ordering, my instinct is to split the path list into groups ... somehow ... before doing the slow string comparisons. I'm really not sure. I would be grateful for any ideas.
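To make the direction I'm thinking in a bit more concrete, here is a rough sketch (Python, same nested-dict assumption as above): because the input is sorted, each path is compared only against the previous one to find the shared prefix, and only the part that changed is walked and inserted.

def build_tree_sorted(paths):
    # Assumes paths arrive in sorted order, so consecutive paths share long prefixes.
    root = {}
    prev_parts = []
    node_stack = [root]  # node_stack[i] is the node for prev_parts[:i]
    for path in paths:
        parts = path.split("\\")
        # Count leading components shared with the previous path.
        common = 0
        limit = min(len(parts), len(prev_parts))
        while common < limit and parts[common] == prev_parts[common]:
            common += 1
        # Drop back to the deepest shared node, then insert only the new suffix.
        del node_stack[common + 1:]
        node = node_stack[common]
        for part in parts[common:]:
            node = node.setdefault(part, {})
            node_stack.append(node)
        prev_parts = parts
    return root

I don't know whether this is the right way to "group" things, though, or whether there is a better-known approach.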
Edit: Ideally this tree would be loaded lazily, from the top down, if possible.