Any way to improve this regex?

I’m fairly new to regular expressions, so I would really appreciate a little feedback on this. It will be used heavily on my site, so any strange edge cases could wreak havoc. The idea is to enter the quantity of a recipe ingredient as whole units or fractions. Because of my autocomplete mechanism (it shows a pop-up drop-down menu), a number on its own is enough. These lines are valid:

 1
 1/2
 1 1/2
 4 cups
 4 1/2 cups
 10 3/4 cups sliced

The numeric part of the string should be its own group, so I can parse it with the fractions parser. Everything after the numerical part should be the second group. First I tried this:

 ^\s*(\d+|\d+\/\d+|\d+\s*\d+\/\d+)\s*(.*)$ 

It almost works, but "1 1/2 cups" is parsed as (1) and (1/2 cups) instead of (1 1/2) and (cups). After scratching my head a little, I decided this was due to the ordering of my OR clause: 1 satisfies \d+, and (.*) matches the rest. So I changed it to:

 ^\s*(\d+\/\d+|\d+\s*\d+\/\d+|\d+)\s*(.*)$ 
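The ordering effect is easy to confirm. Alternation in a JavaScript regex is ordered: the engine takes the first alternative that lets the whole pattern succeed, not the longest one. A minimal sketch using RegExp.prototype.exec, with both patterns written out in full (each followed by a plain (.*) second group):

```javascript
const orderingSample = "1 1/2 cups";

// First attempt: the bare \d+ alternative is tried first, and it succeeds.
const wholeFirst = /^\s*(\d+|\d+\/\d+|\d+\s*\d+\/\d+)\s*(.*)$/;
// Reordered: the fraction alternatives are tried before the bare \d+.
const fractionsFirst = /^\s*(\d+\/\d+|\d+\s*\d+\/\d+|\d+)\s*(.*)$/;

const wf = wholeFirst.exec(orderingSample);
const ff = fractionsFirst.exec(orderingSample);
console.log(JSON.stringify(wf.slice(1))); // ["1","1/2 cups"]
console.log(JSON.stringify(ff.slice(1))); // ["1 1/2","cups"]
```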

It almost works, but it allows oddities such as “1 1/2/4 cups” or “1/2 3 cups”. So I decided to require a letter (or the end of the string) as the first character after the numeric expression:

 ^\s*(\d+\/\d+|\d+\s*\d+\/\d+|\d+)\s*($|[a-z].*)$ 
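A small sketch checking this final pattern (with the i flag) against the valid lines listed at the top and the two oddities, next to the same alternatives followed by a catch-all (.*). The test strings are just the examples from the question; this is not an exhaustive test.

```javascript
// Same alternatives, but the second group accepts anything (lets oddities through).
const loose = /^\s*(\d+\/\d+|\d+\s*\d+\/\d+|\d+)\s*(.*)$/i;
// Final pattern: after the number, either the end of the string or a letter.
const strict = /^\s*(\d+\/\d+|\d+\s*\d+\/\d+|\d+)\s*($|[a-z].*)$/i;

const validInputs = ["1", "1/2", "1 1/2", "4 cups", "4 1/2 cups", "10 3/4 cups sliced"];
const oddInputs = ["1 1/2/4 cups", "1/2 3 cups"];

for (const s of validInputs) {
  const m = strict.exec(s);
  console.log(JSON.stringify([m[1], m[2]]));
}
// prints, one per line:
// ["1",""] ["1/2",""] ["1 1/2",""] ["4","cups"] ["4 1/2","cups"] ["10 3/4","cups sliced"]

for (const s of oddInputs) {
  console.log(JSON.stringify(loose.exec(s).slice(1)), strict.exec(s));
}
// ["1 1/2","/4 cups"] null
// ["1/2","3 cups"] null
```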

Note: I run this in case-insensitive mode. Here are my questions:

  • Can the expression be improved? I don't like the OR list for whole number, fraction, and mixed fraction, but I couldn't think of another way to handle whole numbers, fractions, or mixed fractions.

  • It would be nice if I could get back a group for each word after the numeric component: for example, a group for (10 3/4), a group for (cups), and a group for (sliced). Any number of words can follow the number. Is that possible?

Thanks!

javascript regex
2 answers

Well, it seems to me that you don't need OR conditions at all (but see below).

For the numeric part, you could get away with:

 (\d+\s)?(\d+/\d+)? 

which would handle all these fractional values.

I would still keep decimals in a separate OR branch, since folding them in complicates things. So I think you could get away with something like:

 ^\s*((\d+\s)?(\d+/\d+)?|\d+(\.\d+)?)\s*([a-z].*)?$ 

where:

  • ^\s* is the optional starting space
  • (\d+\s)?(\d+/\d+)? is the fractional part ([nn ][nn/nn])
  • \d+(\.\d+)? is the decimal part (nn[.nn])
  • \s* is optional white space
  • ([a-z].*)? is the start of the alpha section

although this allows an empty number (the fractional branch can match nothing at all), so you might be better off with what you have (whole number, fraction, and decimal as separate OR branches).
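A sketch exercising the combined pattern in JavaScript, assuming the i flag as in the question (the / in the fraction is escaped as \/ for a regex literal). It shows the empty-number case, and also that the (\d+\s) branch keeps a trailing space in "4 ", so you would want to trim before parsing. The (?=\d) lookahead in the last pattern is an extra, not part of the answer above: just one possible way to reject input that doesn't start with a digit.

```javascript
// Combined pattern: fraction branch OR decimal branch.
// Group 1 is the number, group 5 the text after it.
const paxPattern = /^\s*((\d+\s)?(\d+\/\d+)?|\d+(\.\d+)?)\s*([a-z].*)?$/i;

for (const s of ["1 1/2 cups", "2.5 cups", "4 cups", "cups"]) {
  const m = paxPattern.exec(s);
  console.log(JSON.stringify([m[1], m[5]]));
}
// ["1 1/2","cups"]
// ["2.5","cups"]
// ["4 ","cups"]   <- trailing space captured by (\d+\s)
// ["","cups"]     <- the empty-number case

// Assumed extra guard (not from the answer): require a digit after leading space.
const guardedPattern = /^\s*(?=\d)((\d+\s)?(\d+\/\d+)?|\d+(\.\d+)?)\s*([a-z].*)?$/i;
console.log(guardedPattern.exec("cups")); // null
```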

I prefer the ([a-z].*)?$ construct to ($|[a-z].*)$ myself, but that may just be my dislike of having multiple end-of-line markers in a regex :-)
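The two constructs accept exactly the same strings. In JavaScript, the visible difference is the captured value when nothing follows the number: an optional group that didn't take part in the match comes back as undefined, while ($|...) captures an empty string. A minimal sketch, with the numeric part simplified to \d+ for brevity:

```javascript
// Simplified numeric part (\d+) so only the trailing construct differs.
const optionalGroup = /^\s*(\d+)\s*([a-z].*)?$/i; // ([a-z].*)?$
const dollarAlt = /^\s*(\d+)\s*($|[a-z].*)$/i;    // ($|[a-z].*)$

console.log(optionalGroup.exec("4")[2] === undefined); // true
console.log(dollarAlt.exec("4")[2] === "");            // true

// With text after the number, both capture the same thing.
console.log(optionalGroup.exec("4 cups")[2], dollarAlt.exec("4 cups")[2]); // cups cups
```

If you go with the optional group, checking for undefined (or writing m[2] ?? "") avoids surprises later.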


But to be honest, I think you're probably trying to kill a fly with a thermonuclear warhead here.

Do you really need to restrict your input that much? I've seen recipes calling for a pinch of salt and a handful of sultanas, so I personally think you may be limiting yourself too much in what you allow. I would have a free-form field for the quantity and a drop-down list for the type of food (in fact, I'd probably just allow free form for the whole thing if I weren't offering the ability to find recipes based on what's in the fridge).

I believe this regex should do what you want:

 /^\s*(\d+ \d+\/\d+|\d+\/\d+|\d+)\s*(.*)/ 

To get the individual words, just split the remainder on whitespace after the match. There are some things you just don't want to do with regular expressions ;)
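A JavaScript regex has a fixed number of capture groups, and a repeated group such as (\s+\w+)* only keeps its last match, so one group per word isn't really available. A minimal sketch of the match-then-split approach follows. splitIngredient and parseQuantity are hypothetical names; parseQuantity is only a stand-in for the asker's fractions parser, which the question doesn't show, and it assumes the regex has already guaranteed the "N", "N/D", or "N N/D" shape.

```javascript
// Regex from this answer; group 1 is the number, group 2 everything after it.
const answerPattern = /^\s*(\d+ \d+\/\d+|\d+\/\d+|\d+)\s*(.*)/;

// Hypothetical stand-in for the fractions parser: "10 3/4" -> 10.75.
function parseQuantity(text) {
  let total = 0;
  // Each whitespace-separated piece is either a whole number or N/D.
  for (const part of text.trim().split(/\s+/)) {
    const [num, den] = part.split("/");
    total += den === undefined ? Number(num) : Number(num) / Number(den);
  }
  return total;
}

// Hypothetical helper: one regex match, then a plain whitespace split for the words.
function splitIngredient(line) {
  const m = answerPattern.exec(line);
  if (m === null) return null;
  const words = m[2].split(/\s+/).filter((w) => w !== "");
  return { quantity: m[1], amount: parseQuantity(m[1]), words };
}

console.log(JSON.stringify(splitIngredient("10 3/4 cups sliced")));
// {"quantity":"10 3/4","amount":10.75,"words":["cups","sliced"]}
```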
