I am currently working on a poker hand history parser as part of my bachelor project. I spent a couple of days on research and came across some good parser generators, of which I chose JavaCC, since the project itself will be written in Java.
Although the grammar of the hand history is fairly simple and straightforward, the problem of ambiguity arises due to the valid character set in the player’s alias.
Suppose we have a string in the following format:
Seat 5: myNickname (1500 in chips)
The myNickname token can contain any character, including spaces. This means that both "(1500 in chip" and "Seat 5:" are valid nicknames, which is what makes the grammar ambiguous. There are no restrictions on the player’s alias other than its length (4-12 characters).
I need to parse and save some data along with the player’s alias (in this particular case, the seat position and the number of chips), so my question is: what are my options here?
I would like to do this using JavaCC, something like this:
    SeatRecord seat() :
    { Token seatPos, nickname, chipStack; }
    {
        "Seat" seatPos=<INTEGER> ":" nickname=<NICKNAME> "(" chipStack=<INTEGER> "in chips)"
        { return new SeatRecord(seatPos.image, nickname.image, chipStack.image); }
    }
This does not work at the moment, because of the ambiguity described above.
I also looked at GLR parsers (which apparently handle ambiguous grammars), but most of them seem abandoned or poorly documented. The exception is Bison, but it does not support GLR parsers for Java, and it may be overkill anyway (aside from the ambiguity problem, the grammar itself is quite simple, as I mentioned).
Or should I just split the string up by hand, using indexOf(), lastIndexOf(), etc. to pull out the data I need? I would go for that only if it were the only option left, since it would be too ugly IMHO, and I could miss some cases (which would lead to incorrect parsing).
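For reference, the manual approach I have in mind would look roughly like the sketch below. It assumes every seat line starts with "Seat ", that the seat number is followed by ": ", and that the line ends with " (<digits> in chips)". The class name SeatLineSketch and the fields of its SeatRecord holder are made up for illustration. The idea is to take the first ": " and the last " (", so that whatever sits between them is the nickname, even if it contains those characters itself.

```java
class SeatLineSketch {

    /** Hypothetical holder for the three parsed fields. */
    static final class SeatRecord {
        final int seatPos;
        final String nickname;
        final int chipStack;

        SeatRecord(int seatPos, String nickname, int chipStack) {
            this.seatPos = seatPos;
            this.nickname = nickname;
            this.chipStack = chipStack;
        }
    }

    // Assumed fixed parts of a seat line.
    private static final String PREFIX = "Seat ";
    private static final String SUFFIX = " in chips)";

    static SeatRecord parse(String line) {
        if (!line.startsWith(PREFIX) || !line.endsWith(SUFFIX)) {
            throw new IllegalArgumentException("Not a seat line: " + line);
        }
        // The seat number ends at the FIRST ": " (digits cannot contain a colon).
        int colon = line.indexOf(": ");
        // The chip count starts after the LAST " (", so a "(" inside the
        // nickname does not confuse the split.
        int paren = line.lastIndexOf(" (");
        if (colon < 0 || paren < colon + 2) {
            throw new IllegalArgumentException("Malformed seat line: " + line);
        }
        int seatPos = Integer.parseInt(line.substring(PREFIX.length(), colon));
        String nickname = line.substring(colon + 2, paren);
        int chipStack = Integer.parseInt(
                line.substring(paren + 2, line.length() - SUFFIX.length()));
        return new SeatRecord(seatPos, nickname, chipStack);
    }

    public static void main(String[] args) {
        // A nickname that itself looks like the start of the chip count.
        SeatRecord r = parse("Seat 5: (1500 in chip (1500 in chips)");
        // prints: 5 | (1500 in chip | 1500
        System.out.println(r.seatPos + " | " + r.nickname + " | " + r.chipStack);
    }
}
```

This only covers the seat line, and it relies on the prefix and suffix being fixed, which is exactly the kind of hand-written special-casing I would rather avoid if a parser generator can do the same thing.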