What is the correct regular expression used for numbers and strings?

I am trying to create a simple IDE and colorize my JTextPane based on:

  • Strings ("")
  • Comments (// and /* */)
  • Keywords (public, int ...)
  • Numbers (integers such as 69 and floating-point values such as 1.5)

I tokenize the source code by overriding the insertString and removeString methods inside StyledDocument.

After much testing, I got comments and keywords working.

Q1: Regarding coloring the strings, I am breaking the strings based on this regular expression:

    Pattern strings = Pattern.compile("\"[^\"]*\"");
    Matcher matcherS = strings.matcher(text);
    while (matcherS.find()) {
        setCharacterAttributes(matcherS.start(), matcherS.end() - matcherS.start(), red, false);
    }

This works 99% of the time, except when a string contains an escaped quote, as in "\" inside the string". That ruins the coloring for everything after it. Can someone correct my regex to handle this case?

Q2: Regarding integers and decimal coloring, numbers are determined based on this regular expression:

    Pattern numbers = Pattern.compile("\\d+");
    Matcher matcherN = numbers.matcher(text);
    while (matcherN.find()) {
        setCharacterAttributes(matcherN.start(), matcherN.end() - matcherN.start(), magenta, false);
    }

Using the regular expression "\\d+", I only handle integers, not floats. Also, integers that are part of another string are matched, which is not what I want in an IDE. What is the correct expression for coloring numbers?

Below is a screenshot: [image not available]

Thanks for any help in advance!

+6
source share
5 answers

For strings, this is probably the fastest regular expression -

"\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\""

formatted:

  " [^"\\]* (?: \\ . [^"\\]* )* " 

For integers and decimal numbers, the only reliable expression I know is this -

"(?:\\d+(?:\\.\\d*)?|\\.\\d+)"

Formatted:

  (?: \d+ (?: \. \d* )? | \. \d+ ) 

As a note, if you run each pattern independently from the start of the text, the highlights could overlap.
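One way to sidestep the overlap is to put both patterns into a single alternation and scan the text once, so that a digit inside a string literal is consumed by the string branch and never reaches the number branch. The following is only a sketch under that assumption; the class name TokenScanDemo and the method scan are made up for illustration, and in the real document you would call setCharacterAttributes instead of collecting text labels.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative and not from the question.
class TokenScanDemo {
    // Group 1: string literal that allows backslash escapes.
    // Group 2: integer or decimal number.
    static final Pattern TOKEN = Pattern.compile(
        "(\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")|(\\d+(?:\\.\\d*)?|\\.\\d+)");

    // Returns labels such as "STRING:4-15" (the end index is exclusive).
    static List<String> scan(String text) {
        List<String> spans = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            String kind = (m.group(1) != null) ? "STRING" : "NUMBER";
            spans.add(kind + ":" + m.start() + "-" + m.end());
        }
        return spans;
    }

    public static void main(String[] args) {
        // The source text is: x = "a \"7\" b" + 1.5;
        System.out.println(scan("x = \"a \\\"7\\\" b\" + 1.5;"));
    }
}
```

Because the matcher resumes after each complete token, the 7 inside the string literal is never reported as a number.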

+3
source

Try:

  • \\b\\d+(\\.\\d+)?\\b for int, float and double
  • "(?<=[{(,=\\s+]+)".+?"(?=[,;)+ }]+)" for strings
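To see what the word boundaries in the number pattern buy you, here is a small sketch. The class name BoundaryNumberDemo and the sample input are assumptions made for illustration. The \b anchors keep digits that are glued to letters, such as the 1 in var1, from being reported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original answer.
class BoundaryNumberDemo {
    // Digits with an optional fraction, anchored by word boundaries on both sides.
    static final Pattern NUMBER = Pattern.compile("\\b\\d+(\\.\\d+)?\\b");

    // Collects the text of every number match.
    static List<String> numbers(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = NUMBER.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(numbers("var1 = 42 + 3.14;"));
    }
}
```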
+2
source
  1. Match a string while ignoring \" situations

    ".*?(?<!\\)"

The above begins matching when it sees a " and keeps matching anything until it reaches the next " that is not preceded by a \ . This is achieved with a lookbehind, which is described very well at http://www.regular-expressions.info/lookaround.html

  2. Match all numbers with and without decimal points

(\d+)(\.\d+)? matches at least one digit, optionally followed by a period and one or more further digits.

  3. Matching numbers inside strings can be handled in two ways:

    • a. Modify the above so that a number must have a non-word character on both sides: \W(\d+)(\.\d+)?\W , which I think will not be satisfactory in mathematical situations (e.g. 10+10) or at the end of an expression (e.g. 10;).

    • b. Make it a matter of priority. If the string pattern is applied after the number pattern, that part of the text is first colored pink and then immediately overwritten in red. The string color takes precedence.
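The priority idea in option b can be sketched without Swing by painting into a character array, where a later pass overwrites an earlier one. Everything here is an assumption made for illustration: the class name PriorityColoringDemo, the one-letter style codes, and the sample input. In the real document each paint call would be a setCharacterAttributes call.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; style codes: '.' = default, 'N' = number, 'S' = string.
class PriorityColoringDemo {
    static final Pattern NUMBER = Pattern.compile("(\\d+)(\\.\\d+)?");
    // String pattern from this answer: closing quote must not follow a backslash.
    static final Pattern STRING = Pattern.compile("\".*?(?<!\\\\)\"");

    // Marks every match of the pattern with the given style code.
    static void paint(char[] styles, Pattern p, String text, char style) {
        Matcher m = p.matcher(text);
        while (m.find()) {
            Arrays.fill(styles, m.start(), m.end(), style);
        }
    }

    // Numbers are painted first, strings last, so strings win where both apply.
    static String styleMap(String text) {
        char[] styles = new char[text.length()];
        Arrays.fill(styles, '.');
        paint(styles, NUMBER, text, 'N');
        paint(styles, STRING, text, 'S');
        return new String(styles);
    }

    public static void main(String[] args) {
        String code = "print(\"n=10\", 20);";
        System.out.println(code);
        System.out.println(styleMap(code));
    }
}
```

In the sample, the 10 inside the string is first marked as a number and then overwritten by the string pass, while the 20 outside the string stays a number.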

+1
source

For integers, go with:

 (?<!(\\^|\\d|\\.))[+-]?(\\d+(\\.\\d+)?)(?!(x|\\d|\\.)) 
+1
source

R1: I believe there is no regex-based answer for unescaped " characters in the middle of a string. You will need to actively process the text to eliminate or bypass false positives for characters that are not meant to match, based on your specific syntax rules (which you did not specify).

However: if you just want to ignore the escaped \" as Java does, then I believe you can include the escaped-quote pair as a group alternative in the middle, and the greedy * will take care of the rest: \"((\\\\\")|[^\"])*\"
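A quick way to check this pattern is to run it over a line that contains one literal with escaped quotes and one without. This is only a sketch; the class name EscapedQuoteDemo and the sample line are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original answer.
class EscapedQuoteDemo {
    // Opening quote, then any run of escaped quotes or non-quote characters, then closing quote.
    static final Pattern STRING = Pattern.compile("\"((\\\\\")|[^\"])*\"");

    // Collects the text of every string literal found in the source line.
    static List<String> literals(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = STRING.matcher(source);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // The source line is: s = "say \"hi\""; t = "x";
        for (String literal : literals("s = \"say \\\"hi\\\"\"; t = \"x\";")) {
            System.out.println(literal);
        }
    }
}
```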

R2: I believe that the following regular expression will work to find both integers and fractions: \\d+(\\.\\d+)?

You can expand it to find other kinds of numbers. For example, \\d+([./]\\d+)? will additionally match numbers like "1/4".
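To compare the two number patterns side by side, here is a small sketch. The class name FractionNumberDemo, the helper findAll and the sample text are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original answer.
class FractionNumberDemo {
    // Integers and decimals only.
    static final Pattern DECIMAL = Pattern.compile("\\d+(\\.\\d+)?");
    // Same, but a slash is also accepted as the separator, so 1/4 is one match.
    static final Pattern DECIMAL_OR_FRACTION = Pattern.compile("\\d+([./]\\d+)?");

    // Collects the text of every match of the given pattern.
    static List<String> findAll(Pattern p, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "1/4 cup, 2.5 l, 3";
        System.out.println(findAll(DECIMAL, text));
        System.out.println(findAll(DECIMAL_OR_FRACTION, text));
    }
}
```

With the first pattern, 1/4 is split into two separate matches; with the second it is reported as a single token.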

0
source
