User Recognition Algorithm

Let's say you have a large chat log from IRC and you want to know which users are using multiple accounts. As input you have the times at which each user connects to the server, plus some kind of text analysis (word frequencies and so on); as output you want the probability that two accounts belong to the same user.

Can this be done with an artificial neural network (ANN)? Are there better algorithms for this task?

PS: using IP addresses is not an option :)

+6
2 answers

The problem with using neural networks is that you need a reliable set of training data: many examples of people using multiple accounts where you already know which accounts belong to the same person. Also, if the people you are trying to identify have ever played a role-playing game, they will probably be able to make themselves look a little different if they want to.

So if people behave naturally (that is, act like themselves) and you have a fairly good training dataset, you have a chance. You should probably start with the methods used in forensic linguistics.
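One simple feature borrowed from stylometry, a staple of forensic linguistics, is the relative frequency of common function words, which writers tend to use without thinking. The following is a minimal sketch, assuming each account's messages have already been joined into one string. The word list, the names `style_vector` and `cosine_similarity`, and the sample text are all illustrative and do not come from any particular library.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately short list of English function words.
FUNCTION_WORDS = ["the", "a", "of", "and", "to", "in",
                  "i", "you", "it", "that", "is", "but"]

def style_vector(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Made-up sample text for three accounts.
alice = style_vector("i think that the server is down but it is back in a minute")
alt = style_vector("i guess that the bot is slow but it is fine in a while")
other = style_vector("you and you and you to the of of of")

print(cosine_similarity(alice, alt))    # similar function-word usage
print(cosine_similarity(alice, other))  # very different usage
```

The score is a similarity, not a probability. Turning it into a probability would still need labeled examples, which is the training-data problem described above.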

But I suspect that what you will probably end up doing is identifying people who are similar to each other. That might be useful for a dating site, but not so great for most other things. (For example, I would think it would be a completely terrible way to try to find members of Anonymous in other guises.)

+2

This problem is known as “authorship detection” (or sometimes, in a specific domain, “plagiarism detection”). It can be solved with a variety of statistical algorithms, of which neural networks are not the simplest.

Take a look at the Cavnar and Trenkle algorithm for text classification. It can serve as a useful baseline for this task. Implementations in various languages are available on the Internet. You may want to turn it into a clustering algorithm instead of a classifier.
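The core idea of the Cavnar and Trenkle approach, as it is usually described, is to rank the most frequent character n-grams of each text and compare two rankings with an "out-of-place" distance. Below is a rough sketch of that idea, not a reference implementation. The `_` padding character, the `top_k=300` cutoff, n-grams up to length 5, the function names, and the sample logs are assumptions made for this example.

```python
import itertools
import re
from collections import Counter

def ngram_profile(text, max_n=5, top_k=300):
    """Ranked list of the top_k most frequent character n-grams (n = 1..max_n)."""
    counts = Counter()
    for token in re.findall(r"[a-z']+", text.lower()):
        padded = "_" + token + "_"  # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]

def out_of_place(profile_a, profile_b):
    """Sum of rank differences; n-grams absent from profile_b get a maximum penalty."""
    rank_b = {gram: rank for rank, gram in enumerate(profile_b)}
    max_penalty = len(profile_b)
    return sum(
        abs(rank - rank_b[gram]) if gram in rank_b else max_penalty
        for rank, gram in enumerate(profile_a)
    )

# Made-up sample logs, one string of messages per nickname.
logs = {
    "nick_a": "lol that server is sooo slow today, gonna restart it later",
    "nick_b": "lol this bot is sooo slow again, gonna restart it tonight",
    "nick_c": "Good evening. Has anyone verified the configuration file yet?",
}
profiles = {nick: ngram_profile(text) for nick, text in logs.items()}

# Pairwise distances, smallest first: candidate pairs for "same person".
pairs = sorted(
    (out_of_place(profiles[a], profiles[b]), a, b)
    for a, b in itertools.combinations(sorted(profiles), 2)
)
for dist, a, b in pairs:
    print(a, b, dist)
```

A lower distance means more similar writing habits. The pairwise distances could then feed a clustering step, which is one way to turn the classifier into the clustering algorithm suggested above.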

+2
