I have many providers in a database, and each provider's data follows its own format. I would like to build a validation rule for new entries based on the data already stored for that provider.
Example:
A: XZ-4, XZ-23, XZ-217
B: 1276, 1899, 22711
C: 12-4, 12-75, 12
Goal: if a user enters the string "XZ-217" for provider B, the algorithm should compare it with B's previous data and report that this string does not resemble it.
Is there a good way or an existing tool for this kind of comparison? Either a general algorithm or a Perl module would be fine as an answer.
Edit: Agreed, "similarity" is hard to define. What I am looking for is an algorithm that analyzes roughly the last 100 samples and then compares new data against the result of that analysis. Similarity could be based on length, on the mix of letters and digits, on patterns at the beginning, middle, or end of the string, and on the separators used.
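To make these criteria concrete, here is a rough sketch of the kind of check I have in mind. It is written in Python only for brevity (the same regular expressions should carry over to Perl). The names `shape`, `build_profile`, and `looks_similar` are made up for illustration, and both the shape encoding and the strict length range are assumptions that may well be too crude.

```python
import re
from collections import Counter

def shape(value):
    """Reduce a string to its character-class pattern, e.g. 'XZ-217' -> 'A-9'.

    Runs of letters become 'A', runs of digits become '9';
    separators and other characters are kept as they are.
    """
    s = re.sub(r"[A-Za-z]+", "A", value)
    return re.sub(r"[0-9]+", "9", s)

def build_profile(samples):
    """Summarize previous values of one provider (hypothetical profile format)."""
    return {
        "shapes": Counter(shape(s) for s in samples),
        "min_len": min(len(s) for s in samples),
        "max_len": max(len(s) for s in samples),
    }

def looks_similar(profile, value):
    """True if the value has a previously seen shape and a previously seen length range."""
    if shape(value) not in profile["shapes"]:
        return False
    return profile["min_len"] <= len(value) <= profile["max_len"]

# The example data from above
provider_a = build_profile(["XZ-4", "XZ-23", "XZ-217"])
provider_b = build_profile(["1276", "1899", "22711"])
provider_c = build_profile(["12-4", "12-75", "12"])

print(looks_similar(provider_b, "XZ-217"))  # letters and a separator never seen for B
print(looks_similar(provider_a, "XZ-217"))
print(looks_similar(provider_c, "12-9"))
```

With only three samples per provider the length range is obviously too narrow; with about 100 samples one could instead flag shapes or lengths that occur in fewer than some threshold share of the history.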
I realize this is not an easy task, but it seems like a very common need, so I was hoping that some existing approaches or hints are available.