I set out to write a very simple natural language parser and pattern matcher, and I want to do it in JavaScript. I got a degree in artificial intelligence almost 20 years ago, and I remember Prolog, Lisp, ELIZA, recursion, noun and verb phrases ... a little refresher, I thought, and I'll be fine.
A few days later I realized two things:
- I wasn't really after NLP, just tokenizing a sentence and matching it against a pattern.
- It was going to be much more complicated than I had imagined.
I found several resources on the web, some for Python, etc., but they seem to work the opposite way round from what I want: for example, they take a template and fill in the blanks, or you build a model and then query it in natural language.
I want to be able to check something that the user enters, see if it matches a specific pattern and extract the corresponding bits. For example, here is a simple match tree:
var match = [ "&&", [ "||", "my", "the" ], "%%item", [ "||", [ "&&", "isnt", "%%what" ], [ "&&", "is", [ "||", "broken?", [ "&&", "not", "%%what" ] ] ], [ "&&", [ "||", "doesnt", [ "&&", "does", "not" ] ], "%%what" ] ] ];
So I want this to match the things the user enters, for example:
- my computer does not work
- my keyboard is not working
- the latch is broken
- printer does not print
and return an array of the key phrases I want to extract from each of the above:
- [ {"item": "computer", "what": "work"} ]
- [ {"item": "keyboard", "what": "working"} ]
- [ {"item": "latch"} ]
- [ {"item": "printer", "what": "print"} ]
So I thought about traversing the tree, but then got stuck on how to do it in JavaScript. The simplest, though not the most elegant, way would be to generate every possible sentence from the match tree and compare each one with the input. This would also let me capture multi-word items such as "mobile phone".
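The brute-force idea above can be sketched roughly as follows. This is only an illustration under a few assumptions: "&&" means "all children in order", "||" means "exactly one child", and a "%%name" entry captures one or more words. The names `expand`, `matchFlat`, `matchAny` and `tree` are hypothetical and are not taken from the fiddle.

```javascript
// Expand a match tree into every flat word sequence it can produce.
// "&&" concatenates its children in order; "||" picks exactly one child.
function expand(node) {
  if (typeof node === "string") return [[node]];
  const [op, ...children] = node;
  if (op === "||") return children.flatMap(expand);
  // "&&": cross product of the children's alternatives, order preserved
  return children.reduce(
    (acc, child) =>
      acc.flatMap(prefix => expand(child).map(seq => [...prefix, ...seq])),
    [[]]
  );
}

// Compare one flat pattern with the input words.
// Assumption: a "%%name" entry captures one or more words.
function matchFlat(pattern, words, captured = {}) {
  if (pattern.length === 0) return words.length === 0 ? captured : null;
  const [head, ...rest] = pattern;
  if (head.startsWith("%%")) {
    for (let n = 1; n <= words.length; n++) {
      const next = { ...captured, [head.slice(2)]: words.slice(0, n).join(" ") };
      const result = matchFlat(rest, words.slice(n), next);
      if (result) return result;
    }
    return null;
  }
  return words[0] === head ? matchFlat(rest, words.slice(1), captured) : null;
}

// Try every expanded pattern until one matches the whole input.
function matchAny(tree, input) {
  const words = input.trim().split(/\s+/);
  for (const pattern of expand(tree)) {
    const result = matchFlat(pattern, words);
    if (result) return result;
  }
  return null;
}

const tree = ["&&", ["||", "my", "the"], "%%item",
  ["||",
    ["&&", "isnt", "%%what"],
    ["&&", "is", ["||", "broken?", ["&&", "not", "%%what"]]],
    ["&&", ["||", "doesnt", ["&&", "does", "not"]], "%%what"]]];

console.log(JSON.stringify(matchAny(tree, "my computer does not work")));
// {"item":"computer","what":"work"}
console.log(JSON.stringify(matchAny(tree, "my mobile phone is not ringing")));
// {"item":"mobile phone","what":"ringing"}
```

The number of flat patterns grows multiplicatively with every "||" inside an "&&", so this is probably only practical for small trees.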
So to my question:
Does anyone know of any good resources for doing what I need, or can anyone help with the recursion and tree traversal?
edit
It turns out that this is a reverse Polish notation calculator on steroids. After a lot of hard work I have something that works pretty well, as long as each token matches exactly one word, but now I'm stuck again.
http://jsfiddle.net/54jCq/
Passing by reference seems to be really hard work. Now I need to extend the token handling to allow multi-word tokens, so that I can use the same notation as above but match: my [mobile phone] is not [ringing]
Initially, I thought that I could just pull the token branch out of the tree and then pass the word list back into the same function, along with a new tree containing the RPN function and the rest of the tree (with the token removed). This would repeat until either it matches (in which case all the words consumed so far, if any, are the token, and this branch is finished) or we run out of words (in which case all the remaining words are the token for this branch).
If you start by not taking any words, you can match 0 to n words, but I just can't get it to work.
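For reference, the "take a few words for the token, try the rest of the tree, and take one more word if that fails" idea can be sketched with generator functions, which return new capture objects instead of mutating shared state. This is a hedged sketch and not the code in the fiddle: `matchNode`, `matchSeq`, `parse` and `phoneTree` are hypothetical names, and a token is assumed to take at least one word (starting the loop at `end = pos` would allow zero).

```javascript
// Yield every way `node` can match `words` starting at index `pos`.
// Each result is { pos, caps }: the index after the match and the captures so far.
function* matchNode(node, words, pos, caps) {
  if (typeof node === "string") {
    if (node.startsWith("%%")) {
      // Token: offer 1 word, then 2, ... and let the rest of the tree
      // decide which length leads to a full match.
      for (let end = pos + 1; end <= words.length; end++) {
        const value = words.slice(pos, end).join(" ");
        yield { pos: end, caps: { ...caps, [node.slice(2)]: value } };
      }
    } else if (words[pos] === node) {
      yield { pos: pos + 1, caps };
    }
    return;
  }
  const [op, ...children] = node;
  if (op === "||") {
    // Any one child may match.
    for (const child of children) yield* matchNode(child, words, pos, caps);
  } else {
    // "&&": children must match one after another.
    yield* matchSeq(children, 0, words, pos, caps);
  }
}

// Match children[i..] in sequence, backtracking over each child's alternatives.
function* matchSeq(children, i, words, pos, caps) {
  if (i === children.length) {
    yield { pos, caps };
    return;
  }
  for (const m of matchNode(children[i], words, pos, caps)) {
    yield* matchSeq(children, i + 1, words, m.pos, m.caps);
  }
}

// Return the captures of the first match that consumes the whole input.
function parse(tree, input) {
  const words = input.trim().split(/\s+/);
  for (const m of matchNode(tree, words, 0, {})) {
    if (m.pos === words.length) return m.caps;
  }
  return null;
}

const phoneTree = ["&&", ["||", "my", "the"], "%%item",
  ["||",
    ["&&", "isnt", "%%what"],
    ["&&", "is", ["||", "broken?", ["&&", "not", "%%what"]]],
    ["&&", ["||", "doesnt", ["&&", "does", "not"]], "%%what"]]];

console.log(JSON.stringify(parse(phoneTree, "my mobile phone is not ringing")));
// {"item":"mobile phone","what":"ringing"}
```

Because every alternative is produced lazily, the matcher stops at the first complete match and never needs to enumerate all possible sentences up front.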
It seems so easy in my head :)
Can someone please take a look and see if they can work out how to do this?