Here's a complete solution that uses no third-party libraries such as ANTLR or JavaCC. Note that while it is extensible, its capabilities are still limited. If you need much more complex expressions, you are better off using a parser generator.
First write a tokenizer that splits the input string into tokens. Here are the types of tokens:
    private static enum TokenType {
        WHITESPACE, AND, OR, EQUALS, LEFT_PAREN, RIGHT_PAREN, IDENTIFIER, LITERAL, EOF
    }
The Token class itself:
    private static class Token {
        final TokenType type;
        final int start; // start position in input (for error reporting)
        final String data; // payload

        public Token(TokenType type, int start, String data) {
            this.type = type;
            this.start = start;
            this.data = data;
        }

        @Override
        public String toString() {
            return type + "[" + data + "]";
        }
    }
To simplify tokenization, create a regular expression that matches the next token in the input string:
    private static final Pattern TOKENS =
        Pattern.compile("(\\s+)|(AND)|(OR)|(=)|(\\()|(\\))|(\\w+)|'([^']+)'");
Note that it contains one capturing group per TokenType, in the same declaration order (first comes WHITESPACE, then AND, and so on). Finally, the tokenizer method:
    private static TokenStream tokenize(String input) throws ParseException {
        Matcher matcher = TOKENS.matcher(input);
        List<Token> tokens = new ArrayList<>();
        int offset = 0;
        TokenType[] types = TokenType.values();
        while (offset != input.length()) {
            if (!matcher.find() || matcher.start() != offset) {
                throw new ParseException("Unexpected token at " + offset, offset);
            }
            for (int i = 0; i < types.length; i++) {
                if (matcher.group(i + 1) != null) {
                    if (types[i] != TokenType.WHITESPACE)
                        tokens.add(new Token(types[i], offset, matcher.group(i + 1)));
                    break;
                }
            }
            offset = matcher.end();
        }
        tokens.add(new Token(TokenType.EOF, input.length(), ""));
        return new TokenStream(tokens);
    }
I am using java.text.ParseException. Here we apply the regex Matcher repeatedly until we reach the end of the input. If it does not match at the current position, we throw an exception. Otherwise, we find the group that matched and create a token from it, skipping WHITESPACE tokens. Finally, we add an EOF token that marks the end of the input. The result is returned as a special TokenStream object. Here is the TokenStream class that will help us with parsing:
    private static class TokenStream {
        final List<Token> tokens;
        int offset = 0;

        public TokenStream(List<Token> tokens) {
            this.tokens = tokens;
        }

        // consume next token of given type (throw exception if type differs)
        public Token consume(TokenType type) throws ParseException {
            Token token = tokens.get(offset++);
            if (token.type != type) {
                throw new ParseException("Unexpected token at " + token.start + ": " + token
                        + " (was looking for " + type + ")", token.start);
            }
            return token;
        }

        // consume token of given type (return null and don't advance if type differs)
        public Token consumeIf(TokenType type) {
            Token token = tokens.get(offset);
            if (token.type == type) {
                offset++;
                return token;
            }
            return null;
        }

        @Override
        public String toString() {
            return tokens.toString();
        }
    }
So, we have a tokenizer, hurrah. You can check it right now using:

    System.out.println(tokenize("Acct1 = 'Y' AND (Acct2 = 'N' OR Acct3 = 'N')"));
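The group-to-TokenType correspondence can also be checked in isolation. Here is a standalone sketch (the class name GroupDemo is made up for illustration) that applies the same pattern and prints which token type fired for each match:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Same enum and pattern as in the answer: group i+1 corresponds to types[i]
    private enum TokenType { WHITESPACE, AND, OR, EQUALS, LEFT_PAREN, RIGHT_PAREN, IDENTIFIER, LITERAL, EOF }

    private static final Pattern TOKENS =
        Pattern.compile("(\\s+)|(AND)|(OR)|(=)|(\\()|(\\))|(\\w+)|'([^']+)'");

    public static void main(String[] args) {
        Matcher matcher = TOKENS.matcher("Acct1 = 'Y'");
        TokenType[] types = TokenType.values();
        while (matcher.find()) {
            // Exactly one group matches per find(), so we always break before
            // running past the last group
            for (int i = 0; i < types.length; i++) {
                if (matcher.group(i + 1) != null) {
                    System.out.println(types[i] + " -> \"" + matcher.group(i + 1) + "\"");
                    break;
                }
            }
        }
    }
}
```

Note that the quotes around the literal are stripped by the capturing group, so 'Y' is reported as LITERAL with payload Y.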
Now write a parser that will create a tree representation of our expression. First, the Expr interface for all tree nodes:
    public interface Expr {
        public boolean evaluate(Map<String, String> data);
    }
It has a single method that evaluates the expression against a given data set and returns true if the data set matches.
The most basic expression is EqualsExpr, which covers expressions like Acct1 = 'Y' or 'Y' = Acct1:
    private static class EqualsExpr implements Expr {
        private final String identifier, literal;

        public EqualsExpr(TokenStream stream) throws ParseException {
            Token token = stream.consumeIf(TokenType.IDENTIFIER);
            if (token != null) {
                this.identifier = token.data;
                stream.consume(TokenType.EQUALS);
                this.literal = stream.consume(TokenType.LITERAL).data;
            } else {
                this.literal = stream.consume(TokenType.LITERAL).data;
                stream.consume(TokenType.EQUALS);
                this.identifier = stream.consume(TokenType.IDENTIFIER).data;
            }
        }

        @Override
        public String toString() {
            return identifier + "='" + literal + "'";
        }

        @Override
        public boolean evaluate(Map<String, String> data) {
            return literal.equals(data.get(identifier));
        }
    }
The toString() method is for information only; it can be deleted.
Next, we will define the SubExpr class, which is either EqualsExpr or something more complicated in parentheses (if we see parentheses):
    private static class SubExpr implements Expr {
        private final Expr child;

        public SubExpr(TokenStream stream) throws ParseException {
            if (stream.consumeIf(TokenType.LEFT_PAREN) != null) {
                child = new OrExpr(stream);
                stream.consume(TokenType.RIGHT_PAREN);
            } else {
                child = new EqualsExpr(stream);
            }
        }

        @Override
        public String toString() {
            return "(" + child + ")";
        }

        @Override
        public boolean evaluate(Map<String, String> data) {
            return child.evaluate(data);
        }
    }
Next is AndExpr , which is a collection of SubExpr expressions joined by an AND operator:
    private static class AndExpr implements Expr {
        private final List<Expr> children = new ArrayList<>();

        public AndExpr(TokenStream stream) throws ParseException {
            do {
                children.add(new SubExpr(stream));
            } while (stream.consumeIf(TokenType.AND) != null);
        }

        @Override
        public String toString() {
            return children.stream().map(Object::toString).collect(Collectors.joining(" AND "));
        }

        @Override
        public boolean evaluate(Map<String, String> data) {
            for (Expr child : children) {
                if (!child.evaluate(data)) return false;
            }
            return true;
        }
    }
I use the Java 8 Stream API in toString for brevity. If you cannot use Java 8, you can rewrite it with a for loop or remove toString completely.
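For example, the stream-based join is equivalent to a plain loop with a StringBuilder; this standalone sketch (the class name JoinDemo is hypothetical) shows the pre-Java-8 version:

```java
import java.util.Arrays;
import java.util.List;

public class JoinDemo {
    public static void main(String[] args) {
        // Stand-ins for the child expressions' toString() results
        List<String> children = Arrays.asList("Acct1='Y'", "Acct2='N'");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < children.size(); i++) {
            if (i > 0) sb.append(" AND "); // separator between children only
            sb.append(children.get(i));
        }
        System.out.println(sb); // same result as Collectors.joining(" AND ")
    }
}
```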
Finally, we define OrExpr, which is a collection of AndExpr expressions joined by the OR operator (conventionally, OR has lower precedence than AND). It is very similar to AndExpr:
    private static class OrExpr implements Expr {
        private final List<Expr> children = new ArrayList<>();

        public OrExpr(TokenStream stream) throws ParseException {
            do {
                children.add(new AndExpr(stream));
            } while (stream.consumeIf(TokenType.OR) != null);
        }

        @Override
        public String toString() {
            return children.stream().map(Object::toString).collect(Collectors.joining(" OR "));
        }

        @Override
        public boolean evaluate(Map<String, String> data) {
            for (Expr child : children) {
                if (child.evaluate(data)) return true;
            }
            return false;
        }
    }
And finally, the top-level parse method:
    public static Expr parse(TokenStream stream) throws ParseException {
        OrExpr expr = new OrExpr(stream);
        stream.consume(TokenType.EOF); // make sure the whole input was consumed
        return expr;
    }
So, you can parse your expressions to get Expr objects, then evaluate them against the rows of your CSV file. I assume that you can parse a CSV row into a Map<String, String>. Here is a usage example:
    Map<String, String> data = new HashMap<>();
    data.put("Acct1", "Y");
    data.put("Acct2", "N");
    data.put("Acct3", "Y");
    data.put("Acct4", "N");
    Expr expr = parse(tokenize("Acct1 = 'Y' AND (Acct2 = 'Y' OR Acct3 = 'Y')"));
    System.out.println(expr.evaluate(data));
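In case it helps, here is a minimal sketch of the CSV-row-to-Map step that the example above assumes. The class name CsvRow, the helper name toMap, and the plain comma-separated format with no quoting are all my assumptions, not part of the original answer:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CsvRow {
    // Builds the Map<String, String> that Expr.evaluate expects
    // from a CSV header row and one data row
    static Map<String, String> toMap(String headerLine, String dataLine) {
        String[] headers = headerLine.split(",");
        String[] values = dataLine.split(",");
        Map<String, String> row = new LinkedHashMap<>(); // preserves column order
        for (int i = 0; i < headers.length && i < values.length; i++) {
            row.put(headers[i].trim(), values[i].trim());
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, String> row = toMap("Acct1,Acct2,Acct3", "Y,N,Y");
        System.out.println(row);
    }
}
```

For real-world CSV with quoting and embedded commas you would want a proper CSV library instead of String.split.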