You know the Excel feature where you enter three rows that follow a specific pattern and then drag the column all the way down? Excel tries to continue the pattern for you.
For example:
type of...
Excel will continue it with:
The same works for some other patterns, such as dates.
I am trying to accomplish a similar thing, but I also want to handle more exceptional cases, such as:
- test-blue-somethingelse
- test-yellow-somethingelse
- test-red-somethingelse
Now, based on these entries, I want to infer that the template is test-[DYNAMIC]-somethingelse.
Continuing the [DYNAMIC] part with other colors is a separate matter, and it doesn't really matter to me right now. What interests me most is detecting the [DYNAMIC] parts of the template.
I need to detect this across a large pool of entries. Suppose you get 10,000 lines with these kinds of patterns and want to group the lines by similarity, and also determine which part of the text keeps changing ([DYNAMIC]).
Document classification can be useful in this scenario, but I'm not sure where to start.
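To make the goal concrete, here is a rough sketch of the kind of result I am after, not a definitive algorithm. It is in Python rather than .NET for brevity. The names `tokenize`, `infer_template` and `group_and_infer` are hypothetical, and it assumes that entries are delimited by `-`, `_` or whitespace and that entries of the same template share their token count and first token:

```python
import re
from collections import defaultdict

DYNAMIC = "[DYNAMIC]"

def tokenize(line):
    # Split on '-', '_' and whitespace, keeping the delimiters as tokens
    # so the template can be rebuilt by simple concatenation.
    return [t for t in re.split(r"([-_\s])", line) if t]

def infer_template(token_rows):
    # A position is constant when every row has the same token there;
    # otherwise it is marked as dynamic.
    template = []
    for column in zip(*token_rows):
        template.append(column[0] if len(set(column)) == 1 else DYNAMIC)
    return "".join(template)

def group_and_infer(lines):
    # Naive grouping key (an assumption): token count plus first token.
    groups = defaultdict(list)
    for line in lines:
        tokens = tokenize(line)
        if tokens:
            groups[(len(tokens), tokens[0])].append(tokens)
    return {key: infer_template(rows) for key, rows in groups.items()}

entries = [
    "test-blue-somethingelse",
    "test-yellow-somethingelse",
    "test-red-somethingelse",
]
print(group_and_infer(entries))
```

The grouping key is the weak point: it breaks as soon as the first token is itself dynamic or the dynamic part contains a delimiter, which is exactly where a proper similarity measure would be needed.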
UPDATE:
I forgot to mention that a template can also contain multiple [DYNAMIC] parts.
For example:
- test_[DYNAMIC]12[DYNAMIC2]
I don't think it matters, but I plan to implement this in .NET. Any hints about which algorithms to use would be very helpful.
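For the case with several dynamic parts and no clean delimiter between them, a character-level comparison might be one place to start. The sketch below is again Python, not .NET, and only compares two entries. It uses the standard library's `difflib.SequenceMatcher`; the function name `pair_template` and the sample strings are made up for illustration:

```python
from difflib import SequenceMatcher

def pair_template(a, b):
    # Align the two strings; runs that match are kept as constants,
    # every differing run becomes a numbered dynamic placeholder.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    parts, count = [], 0
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(a[i1:i2])
        else:
            count += 1
            parts.append("[DYNAMIC]" if count == 1 else f"[DYNAMIC{count}]")
    return "".join(parts)

# Hypothetical sample entries whose dynamic parts share no characters.
print(pair_template("test_aaa12bbb", "test_cc12dddd"))
```

A known weakness of this approach: dynamic values that happen to share characters (for example the "e" in "blue" and "red") show up as spurious constant fragments, so some minimum match length or a token-level comparison would probably be needed on real data.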
dr. evil