We have a client who is looking for a way to import and categorize large amounts of text data. Each entry needs to be classified, and it was suggested that the easiest way to do this would be to look at the description field and try to match the words it contains to work out whether a category can be assigned to that entry.
The thinking was that the best way to do this would be to match the words against the keywords held against each category and, if that was unsuccessful, to use some kind of synonym lookup to see whether a synonym matches instead. So, for example, if a synonym of “car” (such as “automobile”) appeared in a particular record, the synonym search could map that word to “car”, which would be held against the category “car”.
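To make the idea concrete, here is a minimal sketch of the two-pass lookup described above: direct keyword match first, synonym fallback second. The keyword and synonym tables are hypothetical placeholders made up for illustration; in practice the synonyms would come from whatever dictionary service or thesaurus ends up being used.

```python
import re

# Hypothetical data: each category and the keywords held against it.
CATEGORY_KEYWORDS = {
    "car": {"car"},
    "property": {"house", "flat"},
}

# Hypothetical synonym table mapping a word to a keyword it stands for.
# A real system would fill this from a thesaurus or synonym service.
SYNONYMS = {
    "automobile": "car",
    "apartment": "flat",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def find_category(word):
    """Return the category whose keyword set contains the word, or None."""
    for category, keywords in CATEGORY_KEYWORDS.items():
        if word in keywords:
            return category
    return None


def categorize(description):
    """Try a direct keyword match, then fall back to synonyms."""
    words = tokenize(description)

    # Pass 1: a word in the description is itself a category keyword.
    for word in words:
        category = find_category(word)
        if category is not None:
            return category

    # Pass 2: a word in the description is a synonym of a keyword.
    for word in words:
        keyword = SYNONYMS.get(word)
        if keyword is not None:
            category = find_category(keyword)
            if category is not None:
                return category

    # No category could be determined for this entry.
    return None


if __name__ == "__main__":
    print(categorize("Used automobile, low mileage"))
    print(categorize("Garden gnome"))
```

Entries that come back as `None` could be queued for manual review, which also gives a natural source of new keywords and synonyms.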
Does anyone know of a web service or other dictionary lookup tool for finding synonyms of a specific word? The project manager offered to buy a Google Enterprise Search license for this, but from what I can make out, it does not offer what the client is looking for.
Any suggestions for getting the client what they are looking for would be greatly appreciated.
Thanks! I will look into WordNet.
Are you aware of any other kinds of text classification software products? I see there is some discussion of using Bayesian algorithms for this, but I do not see any real-world examples of it.
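For reference, the Bayesian approach mentioned above can be sketched in a few lines. This is only an illustrative multinomial naive Bayes classifier with add-one smoothing, not any particular product's implementation; the class name, method names, and training sentences are all made up for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Illustrative multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of documents
        self.vocabulary = set()                  # every word seen in training

    def train(self, text, category):
        """Record one labelled description."""
        self.doc_counts[category] += 1
        for word in tokenize(text):
            self.word_counts[category][word] += 1
            self.vocabulary.add(word)

    def classify(self, text):
        """Return the most probable category, or None if untrained."""
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_category, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for word in words:
                score += math.log(
                    (counts[word] + 1) / (total_words + len(self.vocabulary))
                )
            if score > best_score:
                best_category, best_score = category, score
        return best_category


if __name__ == "__main__":
    # Made-up training data, purely for illustration.
    classifier = NaiveBayesClassifier()
    classifier.train("red car with low mileage", "car")
    classifier.train("automobile for sale, new tyres", "car")
    classifier.train("two bedroom flat near station", "property")
    classifier.train("house with large garden", "property")
    print(classifier.classify("cheap automobile, low mileage"))
```

Unlike the keyword table, this needs a set of already-categorized entries to learn from, so it suits the case where some of the client's data has been labelled by hand.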