I am trying to implement word prediction by analyzing sentences. Consider the following [rather boring] sentences:
Call ABC
Call ABC again
Call DEF
I would like to have a data structure for the above sentences as follows:
Call: (ABC, 2), (again, 1), (DEF, 1)
ABC: (Call, 2), (again, 1)
again: (Call, 1), (ABC, 1)
DEF: (Call, 1)
In general, each entry has the form Word: (Word_it_appears_with, Frequency), ...
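A minimal sketch of building this table directly, assuming words are separated by whitespace and that "appears with" means "occurs in the same sentence" (each word counted once per sentence). The function name `build_cooccurrence` is hypothetical. Note that it writes every count twice, which is exactly the redundancy discussed next.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(sentences):
    """Map each word to a Counter of the words it shares a sentence with."""
    table = defaultdict(Counter)
    for sentence in sentences:
        # Assumption: whitespace tokenization, duplicates within a sentence ignored.
        words = sorted(set(sentence.split()))
        for a, b in combinations(words, 2):
            table[a][b] += 1
            table[b][a] += 1  # mirror entry: the same count stored a second time
    return table


table = build_cooccurrence(["Call ABC", "Call ABC again", "Call DEF"])
print(dict(table["Call"]))
```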
Note the internal redundancy in this kind of data: obviously, if ABC has frequency 2 in Call's list, then Call has frequency 2 in ABC's list. How can this be optimized?
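One common way to remove the duplication, sketched here under the same tokenization assumptions as above, is to key each frequency by the unordered pair, normalized so the smaller word comes first. Each count is then stored exactly once. The class and method names (`CoOccurrence`, `count`, `row`) are hypothetical. The `neighbors` index is optional: it still records each link in both directions, but only as membership, not as a second copy of the frequency, and it lets you rebuild one word's "row" without scanning all pairs.

```python
from collections import Counter, defaultdict
from itertools import combinations


class CoOccurrence:
    """Stores each unordered word pair once, so counts are never duplicated."""

    def __init__(self):
        self.pair_counts = Counter()        # (a, b) with a < b -> frequency
        self.neighbors = defaultdict(set)   # word -> co-occurring words (no counts)

    @staticmethod
    def _key(a, b):
        # Normalize the pair so (x, y) and (y, x) map to the same key.
        return (a, b) if a < b else (b, a)

    def add_sentence(self, sentence):
        words = sorted(set(sentence.split()))
        for a, b in combinations(words, 2):  # sorted input, so a < b already
            self.pair_counts[(a, b)] += 1
            self.neighbors[a].add(b)
            self.neighbors[b].add(a)

    def count(self, a, b):
        # Counter returns 0 for unseen pairs without inserting them.
        return self.pair_counts[self._key(a, b)]

    def row(self, word):
        """Reconstruct the 'Word: (other, freq), ...' view on demand."""
        return {other: self.count(word, other) for other in self.neighbors[word]}


co = CoOccurrence()
for s in ["Call ABC", "Call ABC again", "Call DEF"]:
    co.add_sentence(s)
print(co.row("Call"))
```

If you only ever need the frequency of a specific pair, the `neighbors` index can be dropped entirely.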
The idea is to use this data while a new sentence is being entered. For example, if Call has been entered, the data clearly shows that ABC is the most likely word to appear in the sentence, so it can be suggested first, followed by again and DEF.
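A rough sketch of the suggestion step, assuming a candidate's score is simply the sum of its co-occurrence counts with all words entered so far (other scoring schemes are possible). It uses the pair-keyed storage so each count exists once. `build_pairs` and `suggest` are hypothetical names. `Counter.most_common` orders equal scores by first insertion, so ties follow the order in which pairs were first seen.

```python
from collections import Counter
from itertools import combinations


def build_pairs(sentences):
    """Count each unordered pair of words that share a sentence, stored once."""
    pair_counts = Counter()
    for sentence in sentences:
        for a, b in combinations(sorted(set(sentence.split())), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def suggest(pair_counts, entered, limit=5):
    """Rank words by total co-occurrence with the words already entered."""
    entered = set(entered)
    scores = Counter()
    for (a, b), freq in pair_counts.items():
        if a in entered and b not in entered:
            scores[b] += freq
        elif b in entered and a not in entered:
            scores[a] += freq
    return [word for word, _ in scores.most_common(limit)]


pairs = build_pairs(["Call ABC", "Call ABC again", "Call DEF"])
print(suggest(pairs, ["Call"]))
print(suggest(pairs, ["Call", "ABC"]))
```

Scanning every pair is fine for small vocabularies; for larger ones, a per-word neighbor index avoids touching unrelated pairs.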