Split string through regex in javascript?

I have this text structure:

 1.6.1 Members................................................................ 12
 1.6.2 Accessibility.......................................................... 13
 1.6.3 Type parameters........................................................ 13
 1.6.4 The T generic type aka <T>............................................. 13

I need to create JS objects:

 { num:"1.6.1", txt:"Members" }, { num:"1.6.2", txt:"Accessibility" } ... 

That part is not a problem.

The problem is that I want to extract the values with a regex split that uses a positive lookahead:

Split the first time the next character is a letter.


What I tried:

 '1.6.1 Members........... 12'.split(/\s(?=(?:[\w\. ])+$)/i) 

This works fine:

 ["1.6.1", "Members...........", "12"] // I don't care about the 12. 

But if the title has two or more words:

 '1.6.3 Type parameters................ 13'.split(/\s(?=(?:[\w\. ])+$)/i) 

Result:

 ["1.6.3", "Type", "parameters................", "13"] // again, I don't care about the 13.

Of course, I could join them afterwards, but I want the words to stay together in the split result.

Question:

How can I improve the regular expression so that it does not split between words?

Desired Result:

 ["1.6.3", "Type parameters"]

or

 ["1.6.3", "Type parameters........"] // I will remove the extra dots later

or

 ["1.6.3", "Type parameters........13"] // I will remove the extra characters later

N.B.

I know that I could split on "" or use another, simpler solution, but (purely for the sake of knowledge) I am looking for an improvement to my own solution, which uses a positive lookahead.


N.B. 2:

The text may also contain a capital letter in the middle.

javascript regex
3 answers

You can use this regex:

 /^(\d+(?:\.\d+)*) (\w+(?: \w+)*)/gm 

Then read the desired values from capture groups #1 and #2.
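A minimal sketch of applying this regex to the whole text and building the { num, txt } objects from the question. It assumes an engine with String.prototype.matchAll (ES2020 or later); the variable names are only illustrative.

```javascript
// Illustrative input: the question's table of contents, one entry per line.
const toc = [
  "1.6.1 Members................................................................ 12",
  "1.6.2 Accessibility.......................................................... 13",
  "1.6.3 Type parameters........................................................ 13",
  "1.6.4 The T generic type aka <T>............................................. 13",
].join("\n");

// Group 1: the section number; group 2: words separated by single spaces.
// The m flag lets ^ match at every line start; matchAll requires the g flag.
const re = /^(\d+(?:\.\d+)*) (\w+(?: \w+)*)/gm;

// One match per line; map each match to the object shape from the question.
const entries = Array.from(toc.matchAll(re), (m) => ({ num: m[1], txt: m[2] }));
console.log(entries);
```

Because \w does not match < or >, the fourth title comes back as "The T generic type aka" rather than "The T generic type aka <T>".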


Update: For String#split you can use this regex:

 / +(?=[A-Z\d])/g 
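A quick check of this split regex on a multi-word title. The second sample line is a made-up case with a capital letter inside the title; it shows the limitation that Update 2 addresses.

```javascript
// Split on one or more spaces followed by an uppercase letter or a digit.
const splitter = / +(?=[A-Z\d])/g;

// Lowercase "parameters" does not trigger a split, so the title stays whole:
// ["1.6.3", "Type parameters................", "13"]
const twoWords = "1.6.3 Type parameters................ 13".split(splitter);

// A capital letter inside the title does trigger a split:
// ["1.6.4", "The", "T generic type aka <T>........", "13"]
const capital = "1.6.4 The T generic type aka <T>........ 13".split(splitter);

console.log(twoWords, capital);
```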


Update 2: If section names can also contain uppercase letters, a more complex regular expression is needed:

 var re = /(\D +(?=[a-z]))| +(?=[a-z\d])/gmi;
 var str = '1.6.3 Type Foo Bar........................................................ 13';
 var m = str.split( re );
 console.log(m[0], ',', m.slice(1, -1).join(''), ',', m.pop() );
 //=> 1.6.3 , Type Foo Bar........................................................ , 13
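The same idea wrapped in a small helper, as a sketch (the name splitEntry is hypothetical). The first alternative captures the last character of a word plus the following spaces whenever a letter comes next, so split() keeps that text in its output. The second alternative matches the remaining spaces that precede a letter or digit (in the sample lines, the space after the section number and the space before the page number), and its capture is undefined.

```javascript
// i flag: [a-z] also matches uppercase letters.
const re = /(\D +(?=[a-z]))| +(?=[a-z\d])/gi;

// Hypothetical helper: returns [number, title with leader dots, page].
function splitEntry(line) {
  const parts = line.split(re);
  // Everything between the first and last piece belongs to the title.
  // join("") turns the undefined captures from the second alternative into "".
  return [parts[0], parts.slice(1, -1).join(""), parts[parts.length - 1]];
}

console.log(splitEntry("1.6.4 The T generic type aka <T>........ 13"));
// ["1.6.4", "The T generic type aka <T>........", "13"]
```

One caveat, from tracing the regex by hand: a title word that starts with a digit, such as 4.5 in "The .net 4.5 framework", is also split by the second alternative without capturing the spaces, so that title comes back as "The .net4.5framework........".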

EDIT: Since you added 1.6.1 The .net 4.5 framework.... to the requirements, the answer can be adjusted like this:

 ^([\d.]+) ((?:[^.]|\.(?!\.))+) 

And if you want to allow runs of up to three dots in the title, as in 1.6.1 She said... Boo!..........., that is easy to set up with the {3} quantifier:

 ^([\d.]+) ((?:[^.]|\.(?!\.{3}))+) 

Original:

 ^([\d.]+) ([^.]+) 
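To compare the three patterns side by side, here is a small sketch that prints group 2 (the title) for two sample lines. Both lines are made up for illustration, modeled on the titles mentioned above.

```javascript
// The three patterns from this answer, without the g flag so exec() has no state.
const patterns = {
  edit: /^([\d.]+) ((?:[^.]|\.(?!\.))+)/,
  upToThreeDots: /^([\d.]+) ((?:[^.]|\.(?!\.{3}))+)/,
  original: /^([\d.]+) ([^.]+)/,
};

const lines = [
  "1.6.1 The .net 4.5 framework........ 13",
  "1.6.1 She said... Boo!........... 12",
];

// Print the captured title for every pattern/line combination.
for (const [name, re] of Object.entries(patterns)) {
  const titles = lines.map((line) => re.exec(line)[2]);
  console.log(name, titles);
}
// edit          -> ["The .net 4.5 framework", "She said"]
// upToThreeDots -> ["The .net 4.5 framework", "She said... Boo!"]
// original      -> ["The ", "She said"]
```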


To get groups 1 and 2 in JavaScript, loop over the matches:

 var myregex = /^([\d.]+) ((?:[^.]|\.(?!\.))+)/mg;
 var theMatchObject = myregex.exec(yourString);
 while (theMatchObject != null) {
   // the numbers: theMatchObject[1]
   // the title: theMatchObject[2]
   theMatchObject = myregex.exec(yourString);
 }

OUTPUT

 Group 1   Group 2
 1.6.1     Members
 1.6.2     Accessibility
 1.6.3     Type parameters
 1.6.4     The T generic type aka <T>
 1.6.1     The .net 4.5 framework

Explanation

  • ^ asserts that we are at the beginning of the line
  • The parentheses in ([\d.]+) capture the digits and dots into group 1
  • The outer parentheses in ((?:[^.]|\.(?!\.))+) capture into group 2 ...
  • [^.] one character that is not a dot, | OR...
  • \.(?!\.) a dot that is not followed by another dot ...
  • + one or more times
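Since the question ultimately wants { num, txt } objects, here is a sketch that builds them with the EDIT pattern and String.prototype.matchAll (ES2020 or later) instead of the exec loop. The sample text is illustrative; in particular, the page number on the extra .net line is made up.

```javascript
// Illustrative input: the question's entries plus the line from the EDIT.
const toc = [
  "1.6.1 Members................................................................ 12",
  "1.6.2 Accessibility.......................................................... 13",
  "1.6.3 Type parameters........................................................ 13",
  "1.6.4 The T generic type aka <T>............................................. 13",
  "1.6.1 The .net 4.5 framework.................................................. 14",
].join("\n");

// m: ^ matches at each line start; g: required by matchAll.
const re = /^([\d.]+) ((?:[^.]|\.(?!\.))+)/gm;

// Group 1 becomes num, group 2 (the title without leader dots) becomes txt.
const entries = Array.from(toc.matchAll(re), (m) => ({ num: m[1], txt: m[2] }));
console.log(entries);
```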

You can also use this pattern:

 var myStr = "1.6.1 Members................................................................ 12\n1.6.2 Accessibility.......................................................... 13\n1.6.3 Type parameters........................................................ 13\n1.6.4 The T generic type aka <T>............................................. 13"; console.log(myStr.split(/ (.+?)\.{2,} ?\d+$\n?/m)); 
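The split result alternates number and title and ends with an empty string (the last match reaches the end of the input), so pairing it into the question's objects is a short loop. A sketch with a shortened copy of the input (fewer dots, same structure):

```javascript
const myStr = [
  "1.6.1 Members........ 12",
  "1.6.2 Accessibility........ 13",
  "1.6.3 Type parameters........ 13",
  "1.6.4 The T generic type aka <T>........ 13",
].join("\n");

// Same regex as above: the captured title is kept in the split output.
const parts = myStr.split(/ (.+?)\.{2,} ?\d+$\n?/m);

// Walk the array two items at a time; the trailing "" has no partner and is skipped.
const entries = [];
for (let i = 0; i + 1 < parts.length; i += 2) {
  entries.push({ num: parts[i], txt: parts[i + 1] });
}
console.log(entries);
```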

About the lookahead approach:

I do not think that is possible. The only way to skip a character (here, the space between two words) is to consume it as part of the match at the previous occurrence of a space (the one between the number and the first word). In other words, you would be relying on the fact that a character cannot be matched more than once.

But since the whole pattern, except for the place where you want to split, is enclosed in a lookahead, and the substring matched inside a lookahead is not part of the match result (it is only a check; the regex engine does not consume those characters), you cannot skip the following spaces, and the engine simply moves on to the next space.
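To illustrate the point, here is a sketch that evaluates the body of the question's lookahead, (?=(?:[\w\. ])+$), separately at each space in a sample line. The condition is true after every space, so nothing in it can tell the space after the number apart from the space between two words.

```javascript
const line = "1.6.3 Type parameters................ 13";

// Body of the question's lookahead, anchored so it tests "the rest of the string".
const rest = /^(?:[\w\. ])+$/;

const checks = [];
for (let i = 0; i < line.length; i++) {
  if (line[i] === " ") {
    // The lookahead only inspects what follows the space; nothing is consumed.
    checks.push({ index: i, passes: rest.test(line.slice(i + 1)) });
  }
}
console.log(checks); // every space passes, which is why split() breaks at all of them
```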

