At the core of one of our applications, we need to merge lists of key-value pairs. Since this merge function is called all the time, it should be as fast as possible. Spending extra resources for more speed is acceptable.
Our application is written in Delphi, so I will refer to some Delphi routines, but I believe that this problem may be of interest regardless of the language used to solve it.
Requirements
- Two input lists of key-value pairs ("original" and "update") are passed as pointers to character arrays, for example 'Key1=Value1'#13#10'Key2=Value2'#10'Key3=Value3'#13#10#10'Key4=Value4'. Note that the key and value are separated by "=", and key-value pairs can be separated by any combination of the characters #13 and #10.
- In the output, key-value pairs are always separated by #13#10.
- The order of the key-value pairs in the output does not matter.
- If one of the inputs contains a duplicate key, keep the duplicate anyway. However, keeping only one of the duplicates is also acceptable, since duplicates should not be there in the first place. If the original and the update contain the same key, the value from the update must be kept.
- I deal only with ASCII characters.
My solution
At the heart of my solution is a dictionary that maps a key (string) to a pointer and the length of the memory block containing the value. This map is kept sorted by key. It can be reset before use and shared between several calls to the merge procedure, so we save allocating and freeing memory for the map and its entries. For each input list of key-value pairs, I perform the following steps:
- Iterate over each character of the input.
- When a key-value separator is encountered, extract the key and scan forward to the end of the value.
- If the key already exists in the map, update the pointer and length of its value to what the forward scan found; otherwise, add a new entry.
- Skip all #13 and #10 characters after the value to get to the start of the next key.
- Repeat until the end of the input.
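The steps above can be sketched roughly as follows. This is a minimal sketch, not my actual code: all names (TKVEntry, TKVMap, ResetMap, FindKey, PutValue, AddInput) are made up for illustration, and it assumes that the input is null-terminated, that a line without "=" is treated as a key with an empty value, and that keeping only the last value of a duplicate key is fine.

```delphi
type
  // Hypothetical entry: the value is a pointer/length pair into the input
  // buffer, so the buffer must stay alive while the map is in use.
  TKVEntry = record
    Key: AnsiString;
    Value: PAnsiChar;
    ValueLen: Integer;
  end;

  TKVMap = record
    Items: array of TKVEntry;  // Items[0..Count-1] is sorted by Key
    Count: Integer;
  end;

// Reset before each merge (Count of a local record is not zeroed for us).
// The array keeps its capacity, so later merges reuse it.
procedure ResetMap(var Map: TKVMap);
begin
  Map.Count := 0;
end;

// Binary search; Index receives the position of Key or its insertion point.
function FindKey(const Map: TKVMap; const Key: AnsiString;
  out Index: Integer): Boolean;
var
  Lo, Hi, Mid: Integer;
begin
  Lo := 0;
  Hi := Map.Count - 1;
  while Lo <= Hi do
  begin
    Mid := (Lo + Hi) shr 1;
    if Map.Items[Mid].Key < Key then
      Lo := Mid + 1
    else
      Hi := Mid - 1;
  end;
  Index := Lo;
  Result := (Lo < Map.Count) and (Map.Items[Lo].Key = Key);
end;

// Insert a new entry at its sorted position, or overwrite an existing value.
procedure PutValue(var Map: TKVMap; const Key: AnsiString;
  Value: PAnsiChar; ValueLen: Integer);
var
  Index, I: Integer;
begin
  if not FindKey(Map, Key, Index) then
  begin
    if Map.Count = Length(Map.Items) then
      SetLength(Map.Items, Map.Count * 2 + 8);
    for I := Map.Count downto Index + 1 do
      Map.Items[I] := Map.Items[I - 1];
    Map.Items[Index].Key := Key;
    Inc(Map.Count);
  end;
  Map.Items[Index].Value := Value;
  Map.Items[Index].ValueLen := ValueLen;
end;

// Scan one null-terminated input and put every pair into the map.
procedure AddInput(var Map: TKVMap; P: PAnsiChar);
var
  Start: PAnsiChar;
  Key: AnsiString;
begin
  while True do
  begin
    while (P^ = #13) or (P^ = #10) do
      Inc(P);                              // skip any run of line breaks
    if P^ = #0 then
      Exit;                                // end of the input
    Start := P;
    while not (P^ in [#0, #10, #13, '=']) do
      Inc(P);
    SetString(Key, Start, P - Start);      // this copies the key
    if P^ = '=' then
      Inc(P);
    Start := P;                            // the value starts here
    while not (P^ in [#0, #10, #13]) do
      Inc(P);
    PutValue(Map, Key, Start, P - Start);  // the value is not copied
  end;
end;
```

Calling ResetMap once, then AddInput for the original and AddInput for the update, leaves the update's value in place for any key that both inputs contain.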
With the map filled, build the output string by iterating over the map and, for each entry, appending the key, the key-value separator, a copy of the value (taken from its stored position and length), and #13#10. Do not forget the final null terminator.
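A minimal sketch of this output step, assuming entries of the hypothetical form below (TKVEntry and BuildOutput are illustrative names, not existing routines). It computes the total length first so the result is allocated only once. Because an AnsiString is always null-terminated, the terminator comes for free here; with a raw character buffer it would have to be written explicitly.

```delphi
type
  // Hypothetical entry layout: key as a string, value as pointer + length.
  TKVEntry = record
    Key: AnsiString;
    Value: PAnsiChar;
    ValueLen: Integer;
  end;

// Concatenates Key + '=' + Value + #13#10 for the first Count entries.
function BuildOutput(const Items: array of TKVEntry;
  Count: Integer): AnsiString;
var
  I, Total, KeyLen: Integer;
  Dst: PAnsiChar;
begin
  // First pass: total size, so that a single allocation is enough.
  Total := 0;
  for I := 0 to Count - 1 do
    Inc(Total, Length(Items[I].Key) + 1 + Items[I].ValueLen + 2);
  SetLength(Result, Total);
  if Total = 0 then
    Exit;

  // Second pass: copy the pieces into place.
  Dst := PAnsiChar(Result);
  for I := 0 to Count - 1 do
  begin
    KeyLen := Length(Items[I].Key);
    if KeyLen > 0 then
    begin
      Move(Items[I].Key[1], Dst^, KeyLen);
      Inc(Dst, KeyLen);
    end;
    Dst^ := '=';
    Inc(Dst);
    if Items[I].ValueLen > 0 then
    begin
      Move(Items[I].Value^, Dst^, Items[I].ValueLen);
      Inc(Dst, Items[I].ValueLen);
    end;
    Dst^ := #13;
    Inc(Dst);
    Dst^ := #10;
    Inc(Dst);
  end;
end;
```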
Ideas for optimization
I tried the following things, measuring performance using the QueryPerformanceCounter Windows API function.
- I initially thought that keeping the map sorted would be too much overhead when the number of keys is small. However, as it turned out, even with two or three keys, keeping the map sorted gave almost the same performance.
- The map stores the key as a string, that is, I must extract the key from the character array and create a string from it using the Delphi SetString procedure. As I understand Delphi strings, this involves a memory copy that I would like to avoid. However, storing only a pointer and length for the key and then comparing keys with the CompareString function from the Windows unit was much slower than extracting the keys as strings and comparing them with CompareStr from SysUtils. I assume this is because the implementation of CompareString is slower. Perhaps there is another routine for comparing strings that takes pointers and lengths as input? I have not found one.
- To sort the map, I use the sorting algorithm from Classes.TStringList, which is quicksort if I'm not mistaken. Maybe a different sorting algorithm is better suited to this scenario?
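To make the second point concrete, this is the kind of routine I mean: an ordinal, case-sensitive comparison that works directly on pointer/length pairs. It is only a hand-written sketch under my own assumptions (CompareBuf is a made-up name, not an RTL function), and I have not measured whether it beats CompareStr.

```delphi
// Hypothetical helper: byte-wise comparison of two pointer/length pairs.
// Returns < 0, 0 or > 0, like CompareStr; neither buffer is copied and
// neither needs to be null-terminated.
function CompareBuf(A: PAnsiChar; ALen: Integer;
  B: PAnsiChar; BLen: Integer): Integer;
var
  I, N: Integer;
begin
  // Compare only the part that both buffers have in common.
  if ALen < BLen then
    N := ALen
  else
    N := BLen;
  for I := 0 to N - 1 do
  begin
    Result := Ord(A[I]) - Ord(B[I]);
    if Result <> 0 then
      Exit;
  end;
  // The common prefix is equal, so the shorter buffer sorts first.
  Result := ALen - BLen;
end;
```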
What other optimizations, or even completely different algorithms, would you suggest?