Creating a regular expression to parse a file - Java

I have a log file in the following format that I want to parse:

225:org.powertac.common.Competition::0::new::game-0
287:org.powertac.common.Competition::0::withSimulationBaseTime::1255132800000
288:org.powertac.common.Competition::0::withTimezoneOffset::-6
288:org.powertac.common.Competition::0::withLatitude::45
289:org.powertac.common.Competition::0::withBootstrapTimeslotCount::336
289:org.powertac.common.Competition::0::withBootstrapDiscardedTimeslots::24
290:org.powertac.common.Competition::0::withMinimumTimeslotCount::1400
290:org.powertac.common.Competition::0::withExpectedTimeslotCount::1440
291:org.powertac.common.Competition::0::withTimeslotLength::60
291:org.powertac.common.Competition::0::withSimulationRate::720
292:org.powertac.common.Competition::0::withTimeslotsOpen::24
292:org.powertac.common.Competition::0::withDeactivateTimeslotsAhead::1
300:org.powertac.du.DefaultBrokerService$LocalBroker::1::new::default broker
300:org.powertac.du.DefaultBrokerService$LocalBroker::1::setLocal::true
2074:org.powertac.common.RandomSeed::2::init::CompetitionControlService::0::game-setup::5354386935242895562
2157:org.powertac.common.TimeService::null::setCurrentTime::2009-10-10T00:00:00.000Z
2197:org.powertac.common.RandomSeed::3::init::AccountingService::0::interest::-8975848432442556652
2206:org.powertac.common.RandomSeed::4::init::TariffMarket::0::fees::-6239716112490883981
2213:org.powertac.common.msg.BrokerAccept::null::new::1
2214:org.powertac.common.msg.BrokerAccept::null::new::1::null
2216:org.powertac.common.RandomSeed::5::init::org.powertac.du.DefaultBrokerService::0::pricing::8741252857248937781
2226:org.powertac.common.TariffSpecification::6::new::1::CONSUMPTION
2231:org.powertac.common.Rate::7::new
2231:org.powertac.common.Rate::7::withValue::-0.5
2232:org.powertac.common.Rate::7::setTariffId::6

The template is as follows. For a new object:

 <id>:<classname>::<order_of_execution>::<new>::<args> 

For a method call:

  <id>:<classname>::<order_of_execution>::<method_name>::<args> 

For an inner class:

  <id>:<classname$innerclass>::<order_of_execution>::<method_name or new>::<args> 

For a call to init:

  <id>:<classname>::<order_of_execution>::<init>::<args> 

I need a regular expression that handles all of these cases and lets me extract each value as shown in the templates. When a line creates a new object, I will instantiate it using the Java Reflection API. For example:

 2231:org.powertac.common.Rate::7::new 

should be parsed into "2231", "org.powertac.common.Rate", "7", "new", and args = {}. How can I come up with such a regular expression?

3 answers

Use Matcher with capture groups:

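 import java.util.regex.Matcher;
 import java.util.regex.Pattern;

 String s = "225:org.powertac.common.Competition::0::new::game-0";
 // The order field can be "null" in the log and the arguments are optional,
 // so group 3 accepts any colon-free text and the trailing group may be absent.
 Pattern p = Pattern.compile("(\\d+):([^:]+)::([^:]+)::([^:]+)(?:::(.+))?");
 Matcher m = p.matcher(s);
 if (m.matches()) {
     String id = m.group(1);
     String className = m.group(2);
     String orderOfExecution = m.group(3); // a number, or the text "null"
     String methodNameOrNew = m.group(4);
     // group(5) is null when the line has no arguments
     String[] arguments = m.group(5) == null ? new String[0] : m.group(5).split("::");
 }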
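
To try the capture-group approach on several lines from the question at once, a self-contained sketch might look like the following. The class `LogEntryParser`, the holder `LogEntry`, and the method `parse` are hypothetical names chosen for illustration; the pattern assumes that the first four fields never contain a colon and that the order field may be the literal text "null", as in the sample log.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogEntryParser {

    /** Hypothetical value holder for one parsed log line (not from the original post). */
    static final class LogEntry {
        final long id;
        final String className;
        final String order;      // kept as text because the log contains "null" here
        final String method;     // "new", "init", or a method name
        final List<String> args; // empty when the line has no arguments

        LogEntry(long id, String className, String order, String method, List<String> args) {
            this.id = id;
            this.className = className;
            this.order = order;
            this.method = method;
            this.args = args;
        }
    }

    // id ':' class '::' order '::' method, then an optional '::'-separated argument tail.
    private static final Pattern LINE =
            Pattern.compile("(\\d+):([^:]+)::([^:]+)::([^:]+)(?:::(.*))?");

    /** Returns the parsed entry, or null if the line does not fit the template. */
    static LogEntry parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String tail = m.group(5); // null when the optional argument group is absent
        List<String> args = (tail == null)
                ? Collections.<String>emptyList()
                : Arrays.asList(tail.split("::"));
        return new LogEntry(Long.parseLong(m.group(1)), m.group(2), m.group(3), m.group(4), args);
    }

    public static void main(String[] unused) {
        String[] samples = {
            "225:org.powertac.common.Competition::0::new::game-0",
            "300:org.powertac.du.DefaultBrokerService$LocalBroker::1::new::default broker",
            "2157:org.powertac.common.TimeService::null::setCurrentTime::2009-10-10T00:00:00.000Z",
            "2231:org.powertac.common.Rate::7::new"
        };
        for (String s : samples) {
            LogEntry e = parse(s);
            System.out.println(e.id + " | " + e.className + " | " + e.order
                    + " | " + e.method + " | " + e.args);
        }
    }
}
```

Keeping the order field as a string is a deliberate choice here: converting it with Integer.valueOf would throw on the lines where it is "null".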

Alternatively, it may be simpler to use java.util.Scanner with the delimiter set to the pattern ::? (one or two colons):

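 Scanner scanner = new Scanner(s);
 scanner.useDelimiter("::?");              // fields are separated by ":" or "::"
 int id = scanner.nextInt();
 String className = scanner.next();
 int orderOfExecution = scanner.nextInt(); // throws InputMismatchException if the field is "null"
 String methodNameOrNew = scanner.next();
 // Read the rest of the line as the argument list.
 // skip("::") throws NoSuchElementException when the line has no arguments.
 scanner.useDelimiter("$").skip("::");
 String[] arguments = scanner.next().split("::");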

Do not try to cram all of this into one regular expression. Write one regex per pattern and, for each line, try each regex in turn until one matches. Then you can parse the line accordingly.

Pseudocode:

 for line in file:
     if re.match(patNew, line):
         parseNew(line)
     elif re.match(patMethod, line):
         parseMethod(line)
     ...

The regular expression matching <id>:<classname>::<order_of_execution>::<new>::<args> would look something like this:

 ([0-9]+):(.*?)::([0-9]+|null)::(new)(?:::(.*))?
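
The one-pattern-per-template idea can be sketched in Java roughly as follows. The class `LineClassifier`, the method `classify`, and the "init" and "method" patterns are assumptions made for illustration; only the "new" pattern is modelled on the expression given in this answer, and the order field is allowed to be "null" because the sample log contains such lines.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class LineClassifier {

    // One pattern per template, tried in insertion order (hence LinkedHashMap).
    // The keys and the exact patterns are illustrative assumptions.
    private static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();
    static {
        String prefix = "([0-9]+):(.*?)::([0-9]+|null)::";
        PATTERNS.put("new", Pattern.compile(prefix + "(new)(?:::(.*))?"));
        PATTERNS.put("init", Pattern.compile(prefix + "(init)(?:::(.*))?"));
        // Catch-all for ordinary method calls; it must come last.
        PATTERNS.put("method", Pattern.compile(prefix + "([^:]+)(?:::(.*))?"));
    }

    /** Returns the key of the first pattern matching the whole line, or "unknown". */
    static String classify(String line) {
        for (Map.Entry<String, Pattern> e : PATTERNS.entrySet()) {
            if (e.getValue().matcher(line).matches()) {
                return e.getKey();
            }
        }
        return "unknown";
    }

    public static void main(String[] unused) {
        String[] samples = {
            "225:org.powertac.common.Competition::0::new::game-0",
            "2074:org.powertac.common.RandomSeed::2::init::CompetitionControlService::0::game-setup::5354386935242895562",
            "2231:org.powertac.common.Rate::7::withValue::-0.5",
            "2213:org.powertac.common.msg.BrokerAccept::null::new::1"
        };
        for (String s : samples) {
            System.out.println(classify(s) + " <- " + s);
        }
    }
}
```

In a real parser each key would map to its own handler (the parseNew and parseMethod of the pseudocode) instead of returning a label.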

Since the first four values are separated by colons and cannot contain colons themselves, there is no need for escaping or quoting. Match each of them with [^:]* rather than a greedy .*, which can swallow separators when a line carries several arguments, so all you need is

 ([^:]*):([^:]*)::([^:]*)::([^:]*)::(.*)

If the arguments should be optional, use

 ([^:]*):([^:]*)::([^:]*)::([^:]*)(?:::(.*))?

The values are in groups 1 through 5. For example, to find out whether a log entry is a constructor call, check whether group 4 equals "new".
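
Whether the groups use a greedy .* or a negated class [^:]* matters once the argument list itself contains ::. The sketch below compares the two forms on lines from the question's log; the class `GreedyGroupsDemo` and the helper `methodField` are hypothetical names, and this is a quick check rather than a full parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyGroupsDemo {

    // Every group is a greedy ".*", so the early groups can absorb "::" separators.
    static final Pattern GREEDY =
            Pattern.compile("(.*):(.*)::(.*)::(.*)::(.*)");

    // The first four groups cannot cross a colon; the argument tail is optional.
    static final Pattern STRICT =
            Pattern.compile("([^:]*):([^:]*)::([^:]*)::([^:]*)(?:::(.*))?");

    /** Returns group 4 (method name or "new"), or null if the whole line does not match. */
    static String methodField(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.matches() ? m.group(4) : null;
    }

    public static void main(String[] unused) {
        String multiArg = "2074:org.powertac.common.RandomSeed::2::init"
                + "::CompetitionControlService::0::game-setup::5354386935242895562";
        String noArgs = "2231:org.powertac.common.Rate::7::new";

        System.out.println(methodField(GREEDY, multiArg)); // game-setup (an argument, not the method)
        System.out.println(methodField(STRICT, multiArg)); // init
        System.out.println(methodField(GREEDY, noArgs));   // null (the pattern requires an argument)
        System.out.println(methodField(STRICT, noArgs));   // new
    }
}
```

With the negated-class form, checking group 4 against "new" identifies constructor calls on lines both with and without arguments.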
