Extract frequency data from a sorted list of phrases

After searching the documentation and past questions on list manipulation, I've found that most of them deal with numbers, while I work with a lot of text.

I have a sorted list of common three-word phrases (trigrams) that occur in a large body of text, built with Mathematica's Partition[], Tally[], and Sort[] commands. Here is an example of the kind of data I'm working with (I have hundreds of these files):

{{{wa, wa, wa}, 66}, {{i, love, you}, 62}, {{la, la, la}, 50}, {{meaning, of, life}, 42}, {{on, come, on}, 40}, {{come, on, come}, 40}, {{yeah, yeah, yeah}, 38}, {{no, no, no}, 36}, {{we, re, gonna}, 36}, {{you, love, me}, 35}, {{in, love, with}, 32}, {{the, way, you}, 30}, {{i, want, to}, 30}, {{back, to, me}, 29}, <<38211>>, {{of, an, xke}, 1}}
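For reference, a list of this shape can be produced roughly as follows. This is a minimal sketch, not the asker's actual code: it assumes the text has already been lowercased and split into words, and that the trigrams overlap (offset 1 in Partition). The sample sentence is made up.

```mathematica
(* Hypothetical input: the text, lowercased and split on whitespace *)
words = StringSplit[ToLowerCase["Come on come on come"]];

(* Overlapping three-word windows: offset 1 means every word starts a new trigram *)
windows = Partition[words, 3, 1];

(* Count each distinct trigram, then sort by count, most frequent first *)
trigramCounts = Sort[Tally[windows], #1[[2]] > #2[[2]] &]
(* {{{"come", "on", "come"}, 2}, {{"on", "come", "on"}, 1}} *)
```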

I'd like to be able to search this file so that if the input is "meaning of life", it returns "42". I feel like I must be missing something obvious, but after tinkering for a while I've hit a brick wall. Mathematica's documentation is heavy on numbers, which ... well, is no surprise.

4 answers

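Here is one way to do it in Mathematica: build a hash table out of function definitions (DownValues). First, your data with the words turned into strings: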

trigrams = {{{"wa", "wa", "wa"}, 66}, {{"i", "love", "you"}, 62}, 
 {{"la", "la", "la"}, 50}, {{"meaning", "of", "life"}, 42}, 
 {{"on", "come", "on"}, 40}, {{"come", "on", "come"}, 40}, 
 {{"yeah", "yeah", "yeah"}, 38}, {{"no", "no", "no"}, 36}, 
 {{"we", "re", "gonna"}, 36}, {{"you", "love", "me"}, 35}, 
 {{"in", "love", "with"}, 32}, {{"the", "way", "you"}, 30}, 
 {{"i", "want", "to"}, 30}, {{"back", "to", "me"}, 29}, 
 {{"of", "an", "xke"}, 1}};

Now build the hash table, with one definition per trigram:

Clear[trigramHash];
(trigramHash[Sequence @@ #1] = #2) & @@@ trigrams;

In[16]:= trigramHash["meaning","of","life"]
Out[16]= 42
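A trigram that is not in the table comes back unevaluated, e.g. trigramHash["foo", "bar", "baz"]. If a count of 0 is more convenient, one option is a catch-all definition, sketched here on a shortened copy of the data. Mathematica tries the specific (literal) definitions before the general pattern, so the stored counts still take precedence.

```mathematica
trigrams = {{{"meaning", "of", "life"}, 42}, {{"come", "on", "come"}, 40}};

Clear[trigramHash];
(* One definition per trigram, exactly as above *)
(trigramHash[Sequence @@ #1] = #2) & @@@ trigrams;

(* Catch-all fallback: used only when no specific definition matches *)
trigramHash[___] := 0;

trigramHash["meaning", "of", "life"]  (* 42 *)
trigramHash["foo", "bar", "baz"]      (* 0 *)
```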


If you need the hash table in later Mathematica sessions, you can save it to a .mx file, which loads quickly. For example:

In[20]:= DumpSave["C:\\Temp\\trigrams.mx",trigramHash]
Out[20]= {trigramHash}

In[21]:= Quit[]

In[1]:= Get["C:\\Temp\\trigrams.mx"]
In[2]:= trigramHash["meaning","of","life"]
Out[2]= 42
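The example above uses a hard-coded Windows path. A variant that should work on any platform builds the file name from $TemporaryDirectory (the name trigrams.mx is only an example), and here Clear stands in for quitting the kernel:

```mathematica
trigrams = {{{"meaning", "of", "life"}, 42}, {{"come", "on", "come"}, 40}};
Clear[trigramHash];
(trigramHash[Sequence @@ #1] = #2) & @@@ trigrams;

(* Platform-independent location for the dump file (file name is illustrative) *)
file = FileNameJoin[{$TemporaryDirectory, "trigrams.mx"}];
DumpSave[file, trigramHash];

(* Simulate a fresh session: drop the definitions, then reload them from disk *)
Clear[trigramHash];
Get[file];

trigramHash["meaning", "of", "life"]  (* 42 *)
```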

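DumpSave writes the definitions to the .mx file in Mathematica's internal binary format, so reading them back with Get is fast. Since the whole hash table consists of definitions attached to trigramHash (DownValues here; a table stored in SubValues would be saved the same way), saving that one symbol is enough. Keep in mind that .mx files are generally meant to be read on the same kind of system and Mathematica version that wrote them.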


Another way is to turn the list into replacement rules. With the same data:

In[1]:= trigrams = {{{"wa", "wa", "wa"}, 66}, {{"i", "love", "you"}, 
    62}, {{"la", "la", "la"}, 50}, {{"meaning", "of", "life"}, 
    42}, {{"on", "come", "on"}, 40}, {{"come", "on", "come"}, 
    40}, {{"yeah", "yeah", "yeah"}, 38}, {{"no", "no", "no"}, 
    36}, {{"we", "re", "gonna"}, 36}, {{"you", "love", "me"}, 
    35}, {{"in", "love", "with"}, 32}, {{"the", "way", "you"}, 
    30}, {{"i", "want", "to"}, 30}, {{"back", "to", "me"}, 
    29}, {{"of", "an", "xke"}, 1}};

In[2]:= trigramRules = Rule @@@ trigrams;

Then define a function that collects its arguments into a list and applies the rules to it:

In[3]:= trigram[seq__String] := {seq} /. trigramRules

In[4]:= trigram["meaning", "of", "life"]

Out[4]= 42
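With /. an unknown trigram comes back as the unchanged list: trigram["foo", "bar", "baz"] gives {"foo", "bar", "baz"}. A small variant, assuming a default of 0 is wanted, uses Replace (which only tries to match the whole expression) with a catch-all rule appended. The name trigramLookup is hypothetical.

```mathematica
trigrams = {{{"meaning", "of", "life"}, 42}, {{"come", "on", "come"}, 40}};
trigramRules = Rule @@@ trigrams;

(* Replace works at level 0 only, so it never touches the individual words;
   the final _ -> 0 rule catches any trigram that is not in the list *)
trigramLookup[seq__String] := Replace[{seq}, Append[trigramRules, _ -> 0]]

trigramLookup["meaning", "of", "life"]  (* 42 *)
trigramLookup["foo", "bar", "baz"]      (* 0 *)
```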

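If the list is long, as yours is, it is much faster to use Dispatch, which builds an optimized dispatch table from the rules. To do that, just define trigramRules as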

trigramRules = Dispatch[Rule @@@ trigrams]
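The lookup function itself does not need to change. A self-contained version, again on a shortened copy of the data:

```mathematica
trigrams = {{{"meaning", "of", "life"}, 42}, {{"come", "on", "come"}, 40}};

(* Dispatch turns the rule list into an optimized dispatch table *)
trigramRules = Dispatch[Rule @@@ trigrams];

(* Same definition as before; ReplaceAll accepts a Dispatch object directly *)
trigram[seq__String] := {seq} /. trigramRules

trigram["come", "on", "come"]  (* 40 *)
```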

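If the input comes as a single string, it first has to be split into words, for example with ReadList: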

In[262]:= str = "meaning, of, life"; ReadList[
 StringToStream[str], Word, WordSeparators -> {",", " "}]

Out[262]= {"meaning", "of", "life"}

Once the phrase is a list of words, the lookup returns 42 as in the other answers.
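Putting the two steps together, a helper along these lines splits the input with ReadList and passes the words to the trigramHash table from the first answer (redefined here on shortened data so the snippet runs on its own). phraseWords and phraseCount are hypothetical names; closing the stream afterwards avoids leaving it open.

```mathematica
trigrams = {{{"meaning", "of", "life"}, 42}, {{"come", "on", "come"}, 40}};
Clear[trigramHash];
(trigramHash[Sequence @@ #1] = #2) & @@@ trigrams;

(* Split a phrase such as "meaning, of, life" into words, then close the stream *)
phraseWords[str_String] := Module[{stream = StringToStream[str], words},
  words = ReadList[stream, Word, WordSeparators -> {",", " "}];
  Close[stream];
  words]

(* Look up the count for a phrase given as one string *)
phraseCount[str_String] := trigramHash @@ phraseWords[str]

phraseCount["meaning, of, life"]  (* 42 *)
```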


--- edit 2 ---

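It is also possible to split the string without ReadList, by locating the ", " separators with StringPosition and taking the substrings between them. This is more convoluted, but it works: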

str = "meaning, of, life";
commaposns = StringPosition[str, ", "];
substrposns = 
  Partition[
   Join[{1}, 
    Riffle[commaposns[[All, 1]] - 1, commaposns[[All, 2]] + 1], {-1}],
    2];
substrs = Map[StringTake[str, #] &, substrposns]

Out[259]= {"meaning", "of", "life"}
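Not part of the original answer, but for this particular format StringSplit with the literal separator ", " gives the same result in one call; a regular expression also covers input separated only by spaces:

```mathematica
str = "meaning, of, life";

(* Split at every literal ", " *)
words = StringSplit[str, ", "]
(* {"meaning", "of", "life"} *)

(* Split on any run of commas and/or whitespace, so plain spaces work too *)
words2 = StringSplit["meaning of life", RegularExpression["[,\\s]+"]]
(* {"meaning", "of", "life"} *)
```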

Bottom line (almost literally): I can find convoluted approaches with the best of them.

--- end edit ---

Daniel Lichtblau


Quite an old question, but now we have Association:

lookup = Association[Rule @@@ trigrams];
lookup[{"come", "on", "come"}]

40

or even

lookup = Association[
   Rule[StringJoin@Riffle[#1, " "], #2] & @@@ trigrams]

lookup["meaning of life"]

42
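Looking up a missing key this way returns Missing["KeyAbsent", ...]. If a default is preferred, Lookup takes one as its third argument, and KeyExistsQ tests for a key; a short sketch on shortened data:

```mathematica
trigrams = {{{"meaning", "of", "life"}, 42}, {{"come", "on", "come"}, 40}};

(* Keys are the words joined with single spaces, as in the answer above *)
lookup = Association[Rule[StringJoin@Riffle[#1, " "], #2] & @@@ trigrams];

(* The third argument of Lookup is returned when the key is absent *)
Lookup[lookup, "meaning of life", 0]     (* 42 *)
Lookup[lookup, "meaning of nothing", 0]  (* 0 *)
KeyExistsQ[lookup, "come on come"]       (* True *)
```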

