Data structure for storing word associations

I am trying to implement prediction by analyzing sentences. Consider the following (rather boring) sentences:

Call ABC
Call ABC again
Call DEF

I would like to have a data structure for the above sentences as follows:

Call: (ABC, 2), (again, 1), (DEF, 1)
ABC: (Call, 2), (again, 1)
again: (Call, 1), (ABC, 1)
DEF: (Call, 1)

In general, each entry has the form Word: (Word_it_appears_with, Frequency), ...

Note the inherent redundancy in this kind of data. Obviously, if the frequency of ABC is 2 under Call, then the frequency of Call is 2 under ABC. How can this be optimized?

The idea is to use this data while a new sentence is being typed. For example, once Call has been entered, it is easy to tell from the data that ABC is the word most likely to appear in the sentence, so it can be offered as the first suggestion, followed by again and DEF.


Map<String, Map<String, Long>>
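
A minimal sketch of how such a nested map could be filled and queried, assuming that two words co-occur when they share a sentence and that each word is counted once per sentence. The class name CoOccurrenceIndex and its methods are hypothetical, chosen only for illustration:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; not part of any library.
class CoOccurrenceIndex {
    // word -> (word it appears with -> number of shared sentences)
    private final Map<String, Map<String, Long>> counts = new LinkedHashMap<>();

    // Count every pair of distinct words in the sentence, in both directions.
    void addSentence(String sentence) {
        Set<String> words = new LinkedHashSet<>();
        Collections.addAll(words, sentence.trim().split("\\s+"));
        for (String a : words) {
            for (String b : words) {
                if (!a.equals(b)) {
                    counts.computeIfAbsent(a, k -> new LinkedHashMap<>())
                          .merge(b, 1L, Long::sum);
                }
            }
        }
    }

    // Frequency of the pair, or 0 if the words never appeared together.
    long frequency(String a, String b) {
        return counts.getOrDefault(a, Collections.<String, Long>emptyMap())
                     .getOrDefault(b, 0L);
    }

    // Words seen with the given word, most frequent first.
    // Ties keep first-seen order because List.sort is stable.
    List<String> suggest(String word) {
        Map<String, Long> row =
                counts.getOrDefault(word, Collections.<String, Long>emptyMap());
        List<Map.Entry<String, Long>> entries = new ArrayList<>(row.entrySet());
        entries.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : entries) {
            result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        CoOccurrenceIndex index = new CoOccurrenceIndex();
        index.addSentence("Call ABC");
        index.addSentence("Call ABC again");
        index.addSentence("Call DEF");
        System.out.println(index.suggest("Call")); // prints [ABC, again, DEF]
    }
}
```

This version keeps the redundancy from the question (each pair is stored under both words), which costs memory but makes the lookup for any typed word a single map access.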

I would consider one of two options:

Option 1:

class Freq {
    String otherWord; // the word this one appears with
    int freq;         // how many times the two words appear together
}

Multimap<String, Freq> mymap;

or maybe a table

Table<String, String, Integer>

Building on the Freq class above, you can make the mapping bi-directional:

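class Freq {
    String thisWord; // the word this entry belongs to
    int otherFreq;   // frequency of the pair
    Freq otherWord;  // link to the mirrored entry held by the other word
}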

This lets you update both entries of a pair very quickly.
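
One way to read the bi-directional idea, sketched here under assumptions: instead of two linked Freq objects, both directions of a pair point at the same mutable counter, so a single increment updates Call -> ABC and ABC -> Call together. The names SharedPairCounts and Counter are hypothetical and do not come from the answer above:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; one interpretation of the bi-directional mapping.
class SharedPairCounts {
    // Mutable counter shared by both directions of a word pair.
    static final class Counter {
        int value;
    }

    // word -> (other word -> counter shared with the mirrored entry)
    private final Map<String, Map<String, Counter>> byWord = new HashMap<>();

    // Increment the pair count; the argument order does not matter.
    void increment(String a, String b) {
        Map<String, Counter> rowA = byWord.computeIfAbsent(a, k -> new HashMap<>());
        Counter counter = rowA.get(b);
        if (counter == null) {
            // First time this pair is seen: register one counter under both words.
            counter = new Counter();
            rowA.put(b, counter);
            byWord.computeIfAbsent(b, k -> new HashMap<>()).put(a, counter);
        }
        counter.value++;
    }

    // Frequency of the pair, or 0 if the words never appeared together.
    int frequency(String a, String b) {
        Map<String, Counter> row = byWord.get(a);
        Counter counter = (row == null) ? null : row.get(b);
        return (counter == null) ? 0 : counter.value;
    }
}
```

The map entries are still stored twice, but the count itself exists only once per pair, so the two directions cannot drift apart.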
