Need to parse a sentence using regex in Java

I would like to create a regex for the following:

<action>::=Action(<entity><entity><Asset>) 

I would like to have tokens such as:

 Action( <entity> <entity> <Asset> ) 

The entity and the asset have <> around them, and the action is followed by "(". However, ")" is an independent token. I am using the following:

 ([a-zA-Z]+\\()|((<.*?>)|([a-zA-Z]*))|(\\))? 

but it does not show ")" as a token. What am I doing wrong?

3 answers

Try this regex:

 ([a-zA-Z]*\\()|(<[a-zA-Z]*>)|(\\)) 
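For illustration, here is a minimal sketch of how that pattern could be used with `Matcher.find()` to collect the tokens. The class name `ActionTokenizer`, the method `tokenize`, and the input string are assumptions made for the example; they do not come from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this example only.
class ActionTokenizer {
    // One alternative per token kind: "Name(", "<name>", or ")".
    private static final Pattern TOKEN =
            Pattern.compile("([a-zA-Z]*\\()|(<[a-zA-Z]*>)|(\\))");

    // Collects every match of TOKEN, scanning the input left to right.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Assumed input: the right-hand side of the grammar rule.
        System.out.println(tokenize("Action(<entity><entity><Asset>)"));
        // prints [Action(, <entity>, <entity>, <Asset>, )]
    }
}
```

Because none of the three alternatives can match an empty string, `find()` only ever returns real tokens, and characters that fit no alternative are skipped.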

This should work for your example:

 (\\w+\\()(<\\w+?>)(<\\w+?>)(<\\w+?>)(\\)) 

Demo on fiddle.re
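Unlike a token-by-token pattern, this one describes the whole sentence, so it fits `Matcher.matches()` with one capture group per token. A rough sketch, assuming the input is exactly one action with three bracketed arguments (the class name `ActionMatcher` and the method `parse` are made up for the example):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this example only.
class ActionMatcher {
    // Five groups: "Name(", three "<name>" arguments, and ")".
    private static final Pattern ACTION =
            Pattern.compile("(\\w+\\()(<\\w+?>)(<\\w+?>)(<\\w+?>)(\\))");

    // Returns the five tokens, or an empty list if the input does not
    // match the whole pattern.
    static List<String> parse(String input) {
        Matcher m = ACTION.matcher(input);
        if (!m.matches()) {
            return Collections.emptyList();
        }
        List<String> tokens = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            tokens.add(m.group(i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(parse("Action(<entity><entity><Asset>)"));
        // prints [Action(, <entity>, <entity>, <Asset>, )]
    }
}
```

The trade-off is that the number of arguments is fixed at three; an input with two or four bracketed names simply fails to match.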


There is in fact a problem with your regular expression, or at least it makes the expression behave in a way that is hard to predict (for me).

The expression can be broken down as follows:

 ([a-zA-Z]+\\()|   (?# matches alphabetical characters and an opening round-bracket)
 ((<.*?>)|         (?# non-greedily matches anything between brackets)
 ([a-zA-Z]*))|     (?# 3rd pattern: may match an empty string)
 (\\))?            (?# 4th pattern: optionally matches a closing round bracket)

Alternation with | is not greedy: the alternatives are tried from left to right and the first one that succeeds wins. The 3rd pattern can match an empty string, so it always succeeds before the 4th pattern, the one you actually need, is ever tried.

You can see this in the tokens your regular expression actually produces:

 '' '' '' 'Action(' '<entity>' '<entity>' '<Asset>' '' '' 
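The empty matches can be reproduced with a short sketch. The class name `EmptyMatchDemo` and the input string are my assumptions; with this particular input the empty tokens appear only at the end, where ")" should have been.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, named for this example only.
class EmptyMatchDemo {
    // The regex from the question, unchanged.
    private static final Pattern ORIGINAL = Pattern.compile(
            "([a-zA-Z]+\\()|((<.*?>)|([a-zA-Z]*))|(\\))?");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = ORIGINAL.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("Action(<entity><entity><Asset>)");
        for (String t : tokens) {
            System.out.println("'" + t + "'");
        }
        // prints 'Action(', '<entity>', '<entity>', '<Asset>', '', ''
        // (one per line); ")" never appears because [a-zA-Z]* matches
        // the empty string at that position first.
    }
}
```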

So you probably want something like this:

 ([a-zA-Z]+\\()|   (?# matches alphabetical characters and an opening round-bracket)
 (<.*?>)|          (?# non-greedily matches anything between brackets)
 (\\))             (?# matches a closing round bracket)

Note that I removed the ? quantifier from the 4th pattern, which was oddly placed outside the group, and also dropped the pattern that could match an empty string.
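One way to keep such annotations in Java source is the `Pattern.COMMENTS` flag, which ignores whitespace in the pattern and treats `#` to end of line as a comment. The sketch below applies it to the corrected expression; the class name `CommentedTokenizer` and the input string are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this example only.
class CommentedTokenizer {
    // With Pattern.COMMENTS, spaces are ignored and "#" starts a comment
    // that runs to the end of the line.
    private static final Pattern TOKEN = Pattern.compile(
            "  ([a-zA-Z]+\\()  # letters followed by an opening parenthesis\n"
          + "| (<.*?>)         # anything between angle brackets, non-greedy\n"
          + "| (\\))           # a closing parenthesis\n",
            Pattern.COMMENTS);

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Action(<entity><entity><Asset>)"));
        // prints [Action(, <entity>, <entity>, <Asset>, )]
    }
}
```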
