C# Tokenizer - keeping delimiters

Question


I am porting code from Java to C#. Part of the Java code uses a tokenizer, and as I understand it, the tokens produced by Java's StringTokenizer also include the delimiters (in this case +, -, /, *, (, )) as tokens. (Java's StringTokenizer does this when it is constructed with its returnDelims argument set to true.) I tried C#'s Split() method, but it drops the delimiters. Ultimately the code parses the string and evaluates it as a calculation. I did a lot of research and found nothing on this topic.

Does anyone know how to get the delimiters themselves into the split array, in the order in which they were found?

Tokenizer code:

    public CalcLexer(String s)
    {
        char[] seps = { '\t', '\n', '\r', '+', '-', '*', '/', '(', ')' };
        tokens = s.Split(seps);
        advance();
    }

Testing:

    static void Main(string[] args)
    {
        CalcLexer myCalc = new CalcLexer("24+3");
        Console.ReadLine();
    }

For "24+3" this produces "24", "3". I am looking for a way to get "24", "+", "3".

In the interest of full disclosure, this project is part of a class assignment and uses the following complete source code:

http://www.webber-labs.com/mpl/source%20code/Chapter%20Seventeen/CalcParser.java.txt
http://www.webber-labs.com/mpl/source%20code/Chapter%20Seventeen/CalcLexer.java.txt


c# stringtokenizer

Ipster Jul 15 '09 at 21:52


3 answers

This prints each token, including the delimiters:

    string s = "24+3";
    string seps = @"(\t)|(\n)|(\+)|(-)|(\*)|(/)|(\()|(\))";
    string[] tokens = System.Text.RegularExpressions.Regex.Split(s, seps);
    foreach (string token in tokens)
        Console.WriteLine(token);
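The trick here is that Regex.Split adds the text of any capturing group that matched to the result array, so each parenthesized delimiter comes back as its own element. A more compact variant puts the whole delimiter set in one character class inside a single group. One caveat: two adjacent delimiters (as in "(24+3)*2"), or a delimiter at the start or end of the input, leave empty strings in the result, so a filter is usually needed before parsing. The sketch below assumes whitespace should separate tokens but not be returned; the class and method names (`CaptureSplit`, `Tokenize`) are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CaptureSplit
{
    // One capturing group around the operators and parentheses: Regex.Split
    // includes captured text in its result, so each delimiter is kept.
    // \s+ is outside the group, so whitespace separates tokens but is discarded.
    private static readonly Regex Delimiters = new Regex(@"([-+*/()])|\s+");

    public static string[] Tokenize(string input)
    {
        return Delimiters.Split(input)
            // Adjacent delimiters, or a delimiter at either end of the input,
            // leave empty strings between the pieces; drop them.
            .Where(t => t.Length > 0)
            .ToArray();
    }

    public static void Main()
    {
        foreach (string token in Tokenize("(24 + 3) * 2"))
            Console.WriteLine(token);
    }
}
```

For "(24 + 3) * 2" this yields "(", "24", "+", "3", ")", "*", "2"; without the `Where` filter there would be several extra empty strings in between.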


Shane Castle Jul 15 '09 at 22:08


If you need a very flexible, powerful, reliable and extensible solution, you can use the C# port of ANTLR. There is some initial setup overhead (the link is setup information for VS2008), which is probably overkill for such a tiny project. Here is an example of a calculator with variable support.

It is probably more than your class needs, but if you are interested in how this kind of problem is solved in "real" projects, take a look. I even have a Visual Studio grammar package, or you can use ANTLRWorks on its own.
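Between a one-line Regex.Split and a full ANTLR grammar there is also the option of a small hand-written scanner. This is roughly what Java's StringTokenizer does when constructed with returnDelims set to true: it walks the string once and emits each delimiter character as a separate token. The following is a minimal sketch, not the approach from the course code; it assumes the question's delimiter set, treats any whitespace as a separator that is skipped rather than returned, and uses hypothetical names (`ManualLexer`, `Tokenize`).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ManualLexer
{
    // Operators and parentheses from the question. Whitespace (\t, \n, \r,
    // space) is handled separately via char.IsWhiteSpace.
    private const string Operators = "+-*/()";

    public static List<string> Tokenize(string input)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();

        foreach (char c in input)
        {
            bool isOperator = Operators.IndexOf(c) >= 0;
            if (isOperator || char.IsWhiteSpace(c))
            {
                // A delimiter ends any run of ordinary characters before it.
                if (current.Length > 0)
                {
                    tokens.Add(current.ToString());
                    current.Clear();
                }
                // Operators and parentheses become tokens; whitespace is dropped.
                if (isOperator)
                    tokens.Add(c.ToString());
            }
            else
            {
                current.Append(c);
            }
        }

        // Flush the final run, e.g. the "3" in "24+3".
        if (current.Length > 0)
            tokens.Add(current.ToString());

        return tokens;
    }

    public static void Main()
    {
        // Prints: ( | 24 | + | 3 | ) | * | 2
        Console.WriteLine(string.Join(" | ", Tokenize("(24 + 3)*2")));
    }
}
```

Unlike the regex versions, this never produces empty tokens, and it is easy to extend later (for example, to recognize multi-character operators) without rewriting a pattern.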


Sam Harwell Jul 15 '09 at 22:02


Pavel Minaev · Accepted Answer · 2009-07-15 22:04

You can use Regex.Split with zero-width assertions (lookahead and lookbehind). For example, the following splits on any of +-*/ :

 Regex.Split(str, @"(?=[-+*/])|(?<=[-+*/])");

This effectively says: "Split at this point if any of -+*/ follows or precedes it." The match itself has zero length, so no part of the input string is lost.
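Extending the same lookaround pattern to the question's full delimiter set means adding the parentheses and whitespace to both character classes. A zero-width match at the very start of the input still produces a leading empty string (for example, "(1+2)*3" starts with "(", "("...), and whitespace comes back as separate pieces, so a filter is useful. The sketch below assumes whitespace should be discarded rather than returned; `LookaroundSplit` and `Tokenize` are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaroundSplit
{
    // Zero-width split points just before or just after any operator,
    // parenthesis, or whitespace character. Nothing is consumed, so every
    // delimiter survives as its own piece.
    private static readonly Regex SplitPoints =
        new Regex(@"(?=[-+*/()\s])|(?<=[-+*/()\s])");

    public static string[] Tokenize(string input)
    {
        return SplitPoints.Split(input)
            // Drop the empty piece from a split at position 0 and the
            // whitespace pieces, keeping numbers, operators and parentheses.
            .Where(t => !string.IsNullOrWhiteSpace(t))
            .ToArray();
    }

    public static void Main()
    {
        foreach (string token in Tokenize("(1+2) * 3"))
            Console.WriteLine(token);
    }
}
```

Compared with the capture-group version, the pattern here describes where to cut rather than what to remove, which is why the delimiters need no extra grouping to be kept.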

