How to detect a floating point number using a regular expression

What is a good regular expression for matching a floating point number (e.g. a Java Float)?

The answer should match the following targets:

  1) 1.
  2) .2
  3) 3.14
  4) 5e6
  5) 5e-6
  6) 5E+6
  7) 7.e8
  8) 9.0E-10
  9) .11e12

So it should

  • ignore preceding characters
  • require the first character to the left of the decimal point to be non-zero
  • allow 0 or more digits on either side of the decimal point
  • allow a number without a decimal point
  • allow scientific notation
  • allow an upper- or lowercase 'e'
  • allow positive or negative signs

For those who are wondering, yes, this is a homework problem. We got it as an assignment in my graduate CS class on compilers. I have already turned in my answer for the class and will post it as an answer to this question.

[Afterword] My solution did not receive full credit because it did not handle more than 1 digit to the left of the decimal point. The assignment mentioned handling Java floats, although none of the examples had more than 1 digit to the left of the decimal point. I will post the accepted answer in its own post.

+12
floating-point regex
Feb 19 '10 at 2:48
7 answers

[This is a response from the professor]

Definition:

N = [1-9]
D = 0 | N
E = [eE] [+-]? D+
L = 0 | (N D*)

Then floating point numbers can be matched with:

((L . D* | . D+) E?) | (L E)

It was also acceptable to use D+ rather than L, and to add [+-]?.

A common mistake was to write D* . D*, but this also matches a lone ".".
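
The definitions above can be transcribed into a concrete pattern by substituting each name. The following is a minimal Python sketch, assuming Python's `re` syntax and reading the bare `.` in the notation as a literal decimal point; the variable names and the `TARGETS` list are illustrative, not part of the professor's response.

```python
import re

# Building blocks transcribed from the definitions (names mirror the answer).
N = r"[1-9]"
D = r"[0-9]"                      # D = 0 | N
E = rf"[eE][+-]?{D}+"             # exponent with optional sign
L = rf"(?:0|{N}{D}*)"             # L = 0 | (N D*)

# ((L . D* | . D+) E?) | (L E), with "." read as a literal decimal point.
FLOAT = re.compile(rf"(?:(?:{L}\.{D}*|\.{D}+)(?:{E})?)|(?:{L}{E})")

TARGETS = ["1.", ".2", "3.14", "5e6", "5e-6", "5E+6", "7.e8", "9.0E-10", ".11e12"]
all_match = all(FLOAT.fullmatch(t) for t in TARGETS)

# The "common mistake" pattern D* . D* also accepts a lone ".".
MISTAKE = re.compile(rf"{D}*\.{D}*")
lone_dot_accepted = MISTAKE.fullmatch(".") is not None
lone_dot_rejected = FLOAT.fullmatch(".") is None

print(all_match, lone_dot_accepted, lone_dot_rejected)
```

Under this grammar a bare integer such as `5` is not matched: without a decimal point, the exponent is required.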

[Edit]
Someone asked about a leading sign; I should have asked the professor why it was excluded, but I have not had the chance. Since this was part of a lecture on grammars, my guess is that either it made the problem simpler (unlikely), or there is a small detail in parsing where the problem is divided so that the floating point value, regardless of sign, is the focus (possible).

If you are parsing an expression, for example

-5.04e-10 + 3.14159E10

the sign of a floating point value is part of the operation applied to the value, and not an attribute of the number itself. In other words,

subtract (5.04e-10)
add (3.14159E10)

to form the result of the expression. Although I'm sure mathematicians can argue about this, remember that this was from a parsing lecture.

+7
Mar 24 '10 at 17:04

Just make both the decimal point and the E-then-exponent part optional:

 [1-9][0-9]*\.?[0-9]*([Ee][+-]?[0-9]+)? 

I do not understand why you do not want a leading [+-]? to capture a possible sign, but whatever! ;-)

Edit: in fact, there may be no digits to the left of the decimal point (in which case, I believe, there must be a decimal point and 1+ digits after it!), so a vertical bar (alternation) is needed:

 (([1-9][0-9]*\.?[0-9]*)|(\.[0-9]+))([Ee][+-]?[0-9]+)? 
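
A quick way to check this pattern against the nine targets from the question is sketched below in Python. The `TARGETS` list and variable names are illustrative; the pattern itself is copied unchanged from the answer.

```python
import re

# The final pattern from the answer above, compiled unchanged.
PATTERN = re.compile(r"(([1-9][0-9]*\.?[0-9]*)|(\.[0-9]+))([Ee][+-]?[0-9]+)?")

TARGETS = ["1.", ".2", "3.14", "5e6", "5e-6", "5E+6", "7.e8", "9.0E-10", ".11e12"]

# fullmatch requires the whole string to be a float.
results = {t: PATTERN.fullmatch(t) is not None for t in TARGETS}

# search finds a float embedded in other text, skipping preceding characters.
embedded = PATTERN.search("x = 12.5e3;").group(0)

# A leading zero before the decimal point is rejected, as the question asks.
leading_zero_rejected = PATTERN.fullmatch("0.5") is None

print(results, embedded, leading_zero_rejected)
```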
+23
Feb 19 '10 at 2:53
+4
Feb 19 '10 at 4:27

Here is what I turned in.

 (([1-9]+\.[0-9]*)|([1-9]*\.[0-9]+)|([1-9]+))([eE][-+]?[0-9]+)? 

To simplify the discussion, I will name the sections

 ( ([1-9]+ \. [0-9]* ) | ( [1-9]* \. [0-9]+ ) | ([1-9]+)) ( [eE] [-+]? [0-9]+ )?
 -------------------------------------------------------- ----------------------
                            A                                       B

A: matches everything up to the e/E
B: matches the scientific notation

Breaking down A, we get three parts

  ( ([1-9]+ \. [0-9]* ) | ( [1-9]* \. [0-9]+ ) | ([1-9]+) )
    ---------1---------   ---------2----------   ---3----

Part 1: allows 1 or more digits from 1 to 9, a decimal point, and 0 or more digits after the decimal point (target 1)
Part 2: allows 0 or more digits from 1 to 9, a decimal point, and 1 or more digits after the decimal point (target 2)
Part 3: allows 1 or more digits from 1 to 9 without a decimal point (target 4)




Breaking down B, we get 4 basic parts

  ( [eE] [-+]? [0-9]+ )?
  ..--1- --2-- --3--- -4- ..

Part 1: requires an upper- or lowercase "e" for scientific notation (e.g. targets 8 and 9)
Part 2: allows an optional positive or negative sign for the exponent (e.g. targets 4, 5 and 6)
Part 3: allows 1 or more digits for the exponent (target 8)
Part 4: allows the scientific notation to be optional as a group (target 3)
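
The A/B breakdown above maps naturally onto named groups. The sketch below, in Python, lays the same pattern out with `re.VERBOSE`; the group names `A` and `B` are borrowed from the discussion, and the extra inputs `10.5` and `12.5` are assumptions chosen to probe the multi-digit limitation mentioned in the question's afterword (it may or may not be the exact case the grader had in mind).

```python
import re

# The submitted pattern, laid out with re.VERBOSE so A and B are visible.
PATTERN = re.compile(r"""
    (?P<A>                      # A: everything before the exponent
        [1-9]+ \. [0-9]*        #   part 1: digits, point, optional digits
      | [1-9]* \. [0-9]+        #   part 2: optional digits, point, digits
      | [1-9]+                  #   part 3: digits only
    )
    (?P<B> [eE] [-+]? [0-9]+ )? # B: optional exponent
""", re.VERBOSE)

m = PATTERN.fullmatch("9.0E-10")
mantissa, exponent = m.group("A"), m.group("B")

# [1-9]+ cannot match a 0, so a zero to the left of the point breaks the match.
zero_on_left_rejected = PATTERN.fullmatch("10.5") is None
nonzero_digits_accepted = PATTERN.fullmatch("12.5") is not None

print(mantissa, exponent, zero_on_left_rejected, nonzero_digits_accepted)
```

Replacing `[1-9]+` with `[1-9][0-9]*` in the left-hand digit positions would be one way to accept inputs such as `10.5`.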

+2
Feb 19 '10 at 2:57
 '([-+])?\d*(\.)?\d+(([eE]([-+])?)?\d+)?' 

This is a regular expression that I came up with when trying to solve this problem in Matlab. In fact, it will not correctly match numbers like (1.), but some additional changes may solve the problem ... well, maybe the following will fix it:

 '([-+])?(\d+(\.)?\d*|\d*(\.)?\d+)(([eE]([-+])?)?\d+)?' 
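
The difference between the two patterns can be checked directly. This is a small sketch in Python rather than Matlab, on the assumption that the constructs used here (`\d`, groups, `?`, `*`, `+`) behave the same way in both engines; the names `FIRST` and `SECOND` are illustrative.

```python
import re

# Both patterns from the answer above, without the Matlab string quotes.
FIRST = re.compile(r"([-+])?\d*(\.)?\d+(([eE]([-+])?)?\d+)?")
SECOND = re.compile(r"([-+])?(\d+(\.)?\d*|\d*(\.)?\d+)(([eE]([-+])?)?\d+)?")

# "1." has no digit after the point, which the first pattern requires.
first_accepts = FIRST.fullmatch("1.") is not None
second_accepts = SECOND.fullmatch("1.") is not None

# Both accept a signed number with a signed exponent.
both_accept_signed = all(p.fullmatch("-5.04e-10") for p in (FIRST, SECOND))

print(first_accepts, second_accepts, both_accept_signed)
```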
+1
Nov 27 '13 at 14:43

@Kelly S. French: there is no sign because it is handled by the unary minus (negation) operator in the parser, so it does not need to be detected as part of the float.

+1
Apr 15 '14 at 20:03

@Kelly S. French, this regex matches all your test cases.

 ^[+-]?(\d+\.\d+|\d+\.|\.\d+|\d+)([eE][+-]?\d+)?$ 

Source: perldoc perlretut
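
As a sanity check, the sketch below runs this pattern over the nine targets in Python and parses every accepted string with the built-in `float()`. Python's `float()` is only a stand-in here and is not the same parser as Java's; the `TARGETS` list and variable names are illustrative.

```python
import re

# The anchored pattern from the answer above, compiled unchanged.
PATTERN = re.compile(r"^[+-]?(\d+\.\d+|\d+\.|\.\d+|\d+)([eE][+-]?\d+)?$")

TARGETS = ["1.", ".2", "3.14", "5e6", "5e-6", "5E+6", "7.e8", "9.0E-10", ".11e12"]

# Parse every target the pattern accepts; float() raises on malformed input.
parsed = {t: float(t) for t in TARGETS if PATTERN.match(t)}

# Unlike the grammar in the question, a leading sign is accepted here,
# and the ^ anchor means preceding characters are not skipped.
signed_ok = PATTERN.match("-3.14") is not None
prefix_rejected = PATTERN.match("x3.14") is None

print(parsed, signed_ok, prefix_rejected)
```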

+1
05 Oct '17 at 10:41
