Efficiently extract numbers from java string (already tested guava and regular expressions)

Trying to efficiently extract some numbers from a string and tried

  • java.util.regex.Matcher
  • com.google.common.base.Splitter

Results:

  • through regex: 24417 ms
  • via Google Splitter: 17730 ms

Is there an even faster way you can recommend?

I know similar questions asked before, for example, How to extract multiple integers from String in Java? but my emphasis is on making it fast (but supported / simple), as it happens a lot.


EDIT: Here are my final results that are related to those from Andrea Ligios below:

  • Regular Expression (without brackets): 18857
  • Google delimiter (without superflous trimResults () method): 15329
  • Martijn Courteaux answer below: 4073

import org.junit.Test; import com.google.common.base.CharMatcher; import com.google.common.base.Splitter; import java.util.ArrayList; import java.util.Iterator; import java.util.List; import java.util.regex.Matcher; import java.util.regex.Pattern; public class Sample { final static int COUNT = 50000000; public static final String INPUT = "FOO-1-9-BAR1"; // I want 1, 9, 1 @Test public void extractNumbers() { long startTime = System.currentTimeMillis(); for (int i = 0; i < COUNT; i++) { // Output is list of 1, 9, 1 Demo.extractNumbersViaGoogleSplitter(INPUT); } System.out.println("Total execution time (ms) via Google Splitter: " + (System.currentTimeMillis() - startTime)); startTime = System.currentTimeMillis(); for (int i = 0; i < COUNT; i++) { // Output is list of 1, 9, 1 Demo.extractNumbersViaRegEx(INPUT); } System.out.println("Total execution time (ms) Regular Expression: " + (System.currentTimeMillis() - startTime)); } } class Demo { static List<Integer> extractNumbersViaGoogleSplitter(final String text) { Iterator<String> iter = Splitter.on(CharMatcher.JAVA_DIGIT.negate()).trimResults().omitEmptyStrings().split(text).iterator(); final List<Integer> result = new ArrayList<Integer>(); while (iter.hasNext()) { result.add(Integer.parseInt(iter.next())); } return result; } /** * Matches all the numbers in a string, as individual groups. eg * FOO-1-BAR1-1-12 matches 1,1,1,12. */ private static final Pattern NUMBERS = Pattern.compile("(\\d+)"); static List<Integer> extractNumbersViaRegEx(final String source) { final Matcher matcher = NUMBERS.matcher(source); final List<Integer> result = new ArrayList<Integer>(); if (matcher.find()) { do { result.add(Integer.parseInt(matcher.group(0))); } while (matcher.find()); return result; } return result; } } 
+4
source share
2 answers

This is a very fast algorithm:

 public List<Integer> extractIntegers(String input) { List<Integer> result = new ArrayList<Integer>(); int index = 0; int v = 0; int l = 0; while (index < input.length()) { char c = input.charAt(index); if (Character.isDigit(c)) { v *= 10; v += c - '0'; l++; } else if (l > 0) { result.add(v); l = 0; v = 0; } index++; } if (l > 0) { result.add(v); } return result; } 

This code took my car 3672 milliseconds, for runs "FOO-1-9-BAR1" and 50,000,000. I'm on the 2.3 GHz core.

+9
source

EDIT: for the sake of knowledge, I ran different solutions on the same (old) machine, with 5,000,000 iterations (one zero removed from the OP question), here are the results:

Total execution time (ms) via the Martijn Courteaux algorithm: 2562

Total execution time (ms) via Char comparison: 6891

Total lead time (ms) Regular expression (with parenthesis): 12937

Total lead time (ms) Regular expression (without brackets): 12297


This is about twice as fast as the regular expression:

  startTime = System.currentTimeMillis(); for (int i = 0; i < COUNT; i++) { // Output is list of 1, 9, 1 Demo.extractNumbersViaCharComparison(INPUT); } System.out.println("Total execution time (ms) via Char comparison: " + (System.currentTimeMillis() - startTime)); 

[...]

  static List<Integer> extractNumbersViaCharComparison(final String text) { final List<Integer> result = new ArrayList<Integer>(); char[] chars = text.toCharArray(); StringBuilder sB = new StringBuilder(); boolean previousWasDigit = false; for (int i = 0; i < chars.length; i++) { if (Character.isDigit(chars[i])){ previousWasDigit = true; sB.append(chars[i]); } else { if (previousWasDigit){ result.add(Integer.valueOf(sB.toString())); previousWasDigit = false; sB = new StringBuilder(); } } } if (previousWasDigit) result.add(Integer.valueOf(sB.toString())); return result; } 

By the way, another solution is much more elegant +1

+1
source

All Articles