It looks like clustering on top of a smart matching algorithm, more precisely an Apriori-like one. Something like this:
- Collect all the possible associations between actions that you can find in your data: Brush → Prep Breakfast, Prep Breakfast → Eat Breakfast, ..., Brush → Prep Breakfast → Eat Breakfast, etc. That is, every pair, triplet, quadruplet, and so on.
- Turn each such sequence into a separate attribute. For better results, give each attribute a boost (weight): 2 for pairs, 3 for triplets, and so on.
- At this point you should have an attribute vector and a matching boost vector. Now compute a feature vector for each user: at each position, put 1 * boost if that sequence occurs in the user's actions, and 0 otherwise. This gives you a vector representation of each user.
- Run whichever clustering algorithm best suits your needs on these vectors. Each cluster it finds is one of the user groups you are looking for.
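The first three steps can be sketched in Python as follows. This is a minimal sketch under a few assumptions, not the only way to do it: each user's log is a list of action names in the order performed; "the sequence exists in the user's actions" is read as a run of consecutive actions (one plausible reading of the example, where a → c scores 0); only sequences actually observed in the data are enumerated; the boost equals the sequence length; and the cap `max_len` as well as the names `action_sequences`, `build_attributes`, and `feature_vector` are hypothetical.

```python
from itertools import chain

# Assumption: a "sequence" is a run of consecutive actions in a user's log,
# with lengths 2..max_len (max_len=4 is an arbitrary, hypothetical cap).
def action_sequences(actions, min_len=2, max_len=4):
    """Every contiguous run of min_len..max_len consecutive actions, as tuples."""
    return {
        tuple(actions[start:start + n])
        for n in range(min_len, max_len + 1)
        for start in range(len(actions) - n + 1)
    }


def build_attributes(user_logs, max_len=4):
    """Steps 1-2: every sequence observed in the data becomes one attribute."""
    observed = set(chain.from_iterable(
        action_sequences(log, max_len=max_len) for log in user_logs.values()
    ))
    # Order by length, then alphabetically, so attribute positions are stable.
    return sorted(observed, key=lambda seq: (len(seq), seq))


def feature_vector(actions, attributes, max_len=4):
    """Step 3: 1 * boost where the user has the sequence, 0 otherwise.

    Assumption: boost equals the sequence length (2 for pairs, 3 for triplets, ...).
    """
    present = action_sequences(actions, max_len=max_len)
    return [len(seq) if seq in present else 0 for seq in attributes]


# Hypothetical logs for two users.
logs = {
    "user1": ["Brush", "Prep Breakfast", "Eat Breakfast"],
    "user2": ["Prep Breakfast", "Eat Breakfast", "Brush"],
}
attributes = build_attributes(logs, max_len=3)
vectors = {user: feature_vector(log, attributes, max_len=3) for user, log in logs.items()}
```

Because only observed sequences are enumerated, the number of attributes grows with the data rather than with every possible permutation of the actions.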
Example:
Label each action with a letter:
a - Brush
b - Prep Breakfast
c - Eat Breakfast
d - Take a bath ...
Your attributes will then look like this:
a1: a -> b
a2: a -> c
a3: a -> d
...
a10: b -> a
a11: b -> c
a12: b -> d
...
a30: a -> b -> c -> d
a31: a -> b -> d -> c
...
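To produce a numbered list like this from real logs, one option is to encode each action as its letter and number the observed sequences. A sketch, assuming a, b, c, d stand for Brush, Prep Breakfast, Eat Breakfast, and Take a bath; `LETTERS`, `runs`, and `named_attributes` are hypothetical names. Since only sequences that actually occur get a number, the indices will generally differ from the a1 to a31 numbering in the example.

```python
# Assumption: letter codes for the actions; adjust to your own action set.
LETTERS = {"Brush": "a", "Prep Breakfast": "b", "Eat Breakfast": "c", "Take a bath": "d"}


def runs(seq, min_len=2, max_len=4):
    """All contiguous sub-sequences of seq with length min_len..max_len."""
    return {
        tuple(seq[i:i + n])
        for n in range(min_len, max_len + 1)
        for i in range(len(seq) - n + 1)
    }


def named_attributes(logs, max_len=4):
    """Encode actions as letters and name each observed sequence a1, a2, ..."""
    encoded = [[LETTERS[action] for action in log] for log in logs]
    observed = set().union(*(runs(e, max_len=max_len) for e in encoded))
    # Shorter sequences first, then alphabetical order within each length.
    ordered = sorted(observed, key=lambda s: (len(s), s))
    return {f"a{i}": " -> ".join(s) for i, s in enumerate(ordered, start=1)}


# Hypothetical log of a single user.
logs = [["Brush", "Prep Breakfast", "Eat Breakfast", "Take a bath"]]
for name, sequence in named_attributes(logs).items():
    print(f"{name}: {sequence}")
```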
The users' feature vectors would then be:
attributes = a1, a2, a3, a4, ..., a10, a11, a12, ..., a30, a31, ...
user1 = 2, 0, 0, 0, ..., 0, 2, 0, ..., 4, 0, ...
user2 = 2, 0, 0, 0, ..., 0, 2, 0, ..., 4, 0, ...
user3 = 0, 0, 0, 0, ..., 0, 0, 0, ..., 0, 0, ...
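A sketch that computes rows like these for the attributes listed explicitly in the example (the "..." entries are left out). The logs are hypothetical: user1 and user2 are assumed to have done a, b, c, d in that order, user3 only d. "Present" is read as a contiguous run, the boost is assumed to equal the sequence length, and `ATTRIBUTES`, `contains_run`, and `feature_vector` are illustrative names.

```python
# Subset of the example attributes; the "..." entries are omitted.
ATTRIBUTES = {
    "a1": ("a", "b"), "a2": ("a", "c"), "a3": ("a", "d"),
    "a10": ("b", "a"), "a11": ("b", "c"), "a12": ("b", "d"),
    "a30": ("a", "b", "c", "d"), "a31": ("a", "b", "d", "c"),
}


def contains_run(actions, pattern):
    """True if pattern appears as a run of consecutive actions."""
    n = len(pattern)
    return any(tuple(actions[i:i + n]) == pattern for i in range(len(actions) - n + 1))


def feature_vector(actions):
    """1 * boost per attribute present (assumed boost = sequence length), else 0."""
    return {name: len(p) if contains_run(actions, p) else 0 for name, p in ATTRIBUTES.items()}


# Hypothetical action logs.
users = {"user1": list("abcd"), "user2": list("abcd"), "user3": ["d"]}
vectors = {user: feature_vector(actions) for user, actions in users.items()}
```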
To compare two users you need a similarity (or distance) measure. The simplest is cosine similarity, i.e., the cosine of the angle between the two feature vectors (the corresponding cosine distance is 1 minus this value). If two users have exactly the same sequences of actions, their similarity is 1; if they have nothing in common, it is 0.
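Cosine similarity is easy to write directly. This sketch uses only the standard library; returning 0 for an all-zero vector (such as user3's) is an assumption on my part, since the cosine is undefined there.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        # Assumption: an all-zero vector is treated as "nothing in common".
        return 0.0
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """1 - cosine similarity: 0 for identical direction, 1 for no overlap."""
    return 1.0 - cosine_similarity(u, v)
```

With non-negative feature vectors like these, the similarity always lies between 0 and 1.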
With this measure in place, run a clustering algorithm (for example, k-means) to form the user groups.
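Plain k-means minimizes Euclidean distance, so it does not take a cosine measure directly. One common workaround, swapped in here, is spherical k-means: scale every vector to unit length, assign each one to the centroid with the highest cosine, and re-normalize the centroids after averaging. For unit vectors, ||u - v||^2 = 2 - 2 cos(u, v), so ranking by cosine and ranking by Euclidean distance agree. A minimal pure-Python sketch; `spherical_kmeans` and its `k`, `iterations`, and `seed` parameters are hypothetical.

```python
import math
import random


def normalize(v):
    """Scale v to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def spherical_kmeans(vectors, k, iterations=20, seed=0):
    """k-means variant that groups vectors by cosine similarity; returns labels."""
    points = [normalize(v) for v in vectors]
    rng = random.Random(seed)
    # Initialize centroids with k distinct input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: on unit vectors, the dot product is the cosine.
        labels = [
            max(range(k), key=lambda c: sum(a * b for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: mean of each cluster's members, scaled back to unit length.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = normalize([sum(col) / len(members) for col in zip(*members)])
    return labels
```

In practice a library implementation, for example scikit-learn's `KMeans` run on vectors scaled with `sklearn.preprocessing.normalize`, gives a close approximation with better initialization.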