Over the past few days, I tried to accomplish the following task of analyzing a set of objects, and the solutions I came up with rely heavily on memory (in some cases they get OutOfMemory exceptions) or take an incredibly long time to process. Now I think it is a good idea to post it here, as I have no ideas. I will talk in detail about the problem and provide the logic that I have followed so far.
Scenario:
First, we have an object, which we will call Individual , which contains the following properties:
- date of
- Longitude Latitude
Secondly, we have another object, which we will call Group , the definition of which is: A set of persons who, together, meet the following conditions:
All people in the set have a date that within each other does not exceed 10 days. This means that all Persons, when compared with each other, do not differ from each other within 10 days.
The distance between each object is less than Y meters.
There can be N> 1 people in a group if each of the individuals corresponds to the conditions within each other. All people are stored in a database. All groups will also be stored in the database.
A task:
Now consider a new person. The system should check if there is a new person:
Notes:
A new user can be in several existing groups or can create several new groups.
Subgroups of individuals are not allowed, for example, if we have a group containing the person {A, B, C}, there cannot be a group containing {A, B}, {A, C} or {B, C}.
Solution (limited by processing time and memory)
First, we filter the database with all Individuals that match the initial conditions. This will lead to the FilteredIndividuals enumeration containing all the Personalities that, as we know, form a group (of 2) with a new one.
In short, a Powerset is a collection containing all possible subsets of a specific collection. For example, the set of permissions {A, B, C} would be: {[empty], A, B, C, AB, AC, BC, ABC}
Note. The power set will produce a new set with 2 ^ N combinations, where N is the length of the original set.
The idea of ββusing power plants is as follows:
First we create a set of FilteredIndividuals list options. This will give all possible combinations of groups in the FilteredIndividuals list. For the purposes of analysis and by definition, we can omit all combinations in which there are less than two people.
We check whether each combination of conditions matches each other. If they coincide, this means that all Persons in this combination form a group with a new Individual. Then, to avoid SubGroups, we can eliminate all subsets containing combinations of the Proven combination. I do this by creating a power circuit for a proven combination and then removing the new power set from the original one.
At this stage, we have a list of sets that match the conditions for creating the group.
Before formally creating a group, I compare the database with other existing groups that contain the same elements as the new sets: If I find a match, I delete the newly created set and add a new person to the old group. If I do not find a match, then these are new groups. So I add a new Individual to the set and finally create new groups.
This solution works well when the FilteredIndividuals enumeration has less than 52 individuals. After that, memory exceptions are thrown (I know that this is due to the maximum size allowed for data types, but an increment of this size does not help with very large sets. For your consideration, the maximum number of individuals that meet the conditions I'm found: 345).
Note. I have access to the definition of both objects. If there is a new property that would reduce processing time, we can add it.
I use the .NET framework with C #, but if the language is what needs to be changed, we can accept this if we can later convert the results to an object that our main system understands.