Java source code

For my systems software development course, I am working on a complete assembler for the assembly language developed by the instructor. I am currently working on the tokenizer. While searching, I came across the Java StringTokenizer class, but I can see that it is essentially deprecated. However, it seems much easier to use than the String.split method with regular expressions.

Is there any reason why I should avoid using StringTokenizer? Is there anything else in the standard Java libraries that is well suited to this task and that I don't know about?

EDIT: More details.

The reason I consider String.split complicated is that my knowledge of regular expressions amounts to little more than knowing that they exist. Although learning them would be useful for my general knowledge as a software developer, I'm not sure I want to spend the time right now, especially if there is a simpler alternative.

As for how I will use the tokenizer: it will go through a text file containing assembler code and break it into tokens, passing the text and token type to the parser. Separators include whitespace (spaces, tabs, newlines), the comment start character '|' (which may occur on a line of its own or after other text), and the comma that separates operands in an instruction.

I would write it more mathematically, but my knowledge of formal languages is a little rusty.

EDIT 2: Stating the question more clearly

I have read the documentation for the StringTokenizer class. It would work well for my purposes, but its use is discouraged. Besides String.split, is there anything in the standard Java libraries that would be useful?

+4
5 answers

Don't be afraid of regex. Get a regex editor such as the following Eclipse plugin,
http://brosinski.com/regex/update, and you can test expressions without compiling, or even before writing your program.

If you need a reference, here are some very useful sites:

That said, I think the suggestion in another answer to use JavaCC sounds like the right approach.
Another option would be ANTLR.

Here is a post comparing ANTLR vs JavaCC experience.

+1

I believe the java.util.Scanner class has replaced StringTokenizer. A Scanner lets you process tokens one at a time, whereas String.split() splits the entire string at once (and that string can be large if you are parsing a source code file). With a Scanner, you can examine each token, decide what action to take, and then discard that token.
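A minimal sketch of that approach for the separators described in the question might look like the following. The class name, the `tokenizeLine` helper, and the sample instruction are hypothetical; the delimiter pattern `[\\s,]+` and the handling of the `|` comment character are assumptions about the course's assembly syntax, not something the question specifies exactly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerTokenizerSketch {

    // Hypothetical helper: returns the tokens of one source line,
    // ignoring everything from the comment character '|' onward.
    static List<String> tokenizeLine(String line) {
        int commentStart = line.indexOf('|');
        String code = (commentStart >= 0) ? line.substring(0, commentStart) : line;

        List<String> tokens = new ArrayList<>();
        // Assumption: any run of whitespace and/or commas is one separator.
        try (Scanner scanner = new Scanner(code).useDelimiter("[\\s,]+")) {
            while (scanner.hasNext()) {
                // Each token is handled as soon as it is read.
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "ADD r1, r2, r3" is a made-up instruction for illustration.
        System.out.println(tokenizeLine("  ADD r1, r2, r3 | sum the registers"));
        // prints [ADD, r1, r2, r3]
    }
}
```

In a real tokenizer the loop body would probably hand each token to the parser instead of collecting it in a list; the list is only there to make the sketch easy to try out.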

+3

From the documentation:

StringTokenizer is a legacy class that is retained for compatibility reasons, although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.

The following example illustrates how the String.split method can be used to break up a string into its basic tokens:

  String[] result = "this is a test".split("\\s");
  for (int x = 0; x < result.length; x++)
      System.out.println(result[x]);

prints the following output:

  this
  is
  a
  test
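Applied to the separators described in the question, the same method could be used roughly as follows. This is only a sketch: the class name, the `tokenizeLine` helper, and the sample instruction are hypothetical, and it assumes that everything after `|` on a line is a comment and that a line never begins with a comma.

```java
import java.util.Arrays;

public class SplitTokenizerSketch {

    // Hypothetical helper: splits one line of assembler source into tokens.
    static String[] tokenizeLine(String line) {
        // Keep only the text before the first '|'. The backslashes are needed
        // because an unescaped '|' means "or" in a regular expression.
        String code = line.split("\\|", 2)[0].trim();
        if (code.isEmpty()) {
            return new String[0]; // blank or comment-only line
        }
        // Assumption: any run of whitespace and/or commas is one separator.
        return code.split("[\\s,]+");
    }

    public static void main(String[] args) {
        // "ADD r1, r2, r3" is a made-up instruction for illustration.
        String[] tokens = tokenizeLine("ADD r1, r2, r3 | sum the registers");
        System.out.println(Arrays.toString(tokens));
        // prints [ADD, r1, r2, r3]
    }
}
```

The only regex features involved are a character class (`[...]`), the whitespace shorthand `\s`, the `+` quantifier, and one escaped literal, so this stays close to what StringTokenizer does with a delimiter string.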
+2

If you are building an assembler, I would use JavaCC to create a parser / compiler.

+2

Something is deprecated when there is a better alternative, or when its methods are dangerous in some situations. So the answer is: yes, you can use it, but there is a better way to achieve what you need.

By the way, what makes split complicated for you?

0
