I need to write a program that compares 10,000,000+ entities against each other. The objects are essentially flat rows in a database / CSV file.
The comparison algorithm has to be quite flexible: it is based on a rule engine in which the end user enters the rules, and any object can be matched against any other object.
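To make the setup concrete, a rule in my engine is essentially a predicate over two flat records; a minimal sketch (names are simplified placeholders, not my real types) looks like this:

```csharp
// Hypothetical shape of the rule engine: a rule is just a
// predicate over two flat records read from the DB / CSV.
public class Record
{
    public long Id { get; set; }
    public string[] Fields { get; set; }   // one flat row
}

public interface IMatchRule
{
    // Returns true if the two records are considered a match
    // according to this user-defined rule.
    bool Matches(Record left, Record right);
}
```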
I have been thinking about how to split this task into smaller workloads, but I haven't come up with anything yet. Because the rules are entered by the end user, pre-sorting the DataSet doesn't seem possible.
What I do now is load the whole DataSet into memory and compare each element against all the others. This is not very efficient and needs roughly 20 GB of memory (compressed).
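In simplified pseudo-C#, the current approach is the naive pairwise loop below; `LoadAllRecords`, `LoadUserRules` and `ReportMatch` are stand-ins for my actual code:

```csharp
// Simplified version of the current approach: everything is held
// in memory and every record is compared against every other one.
var records = LoadAllRecords();   // ~10,000,000+ flat records (placeholder)
var rules   = LoadUserRules();    // rules entered by the end user (placeholder)

foreach (var left in records)
{
    foreach (var right in records)
    {
        if (ReferenceEquals(left, right))
            continue;             // don't compare a record with itself

        foreach (var rule in rules)
        {
            if (rule.Matches(left, right))
            {
                ReportMatch(left, right, rule);   // placeholder for my output step
                break;            // first matching rule wins
            }
        }
    }
}
```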
Do you have any ideas on how I could split the load or reduce the memory footprint?
thanks
Tags: c#, algorithm, matching