Can one regular expression capture group capture a phrase without some of its middle characters?

Question

I am working on XML that contains lists of regular expressions that will be used as capture groups. Why this is so is a long story, not something that I can change.

I just came across a situation where I want to match a name that spans two lines, i.e. Bob\nJones. Is there a way in Perl to capture this whole name into one capture group, without using any other capture groups? Basically, I want $1 = "Bob Jones", with the \n replaced by a space.

I think this is not possible, and that the correct way would be to use separate capture groups for the first and last name (which I cannot do in my case), but I decided to ask anyway before giving up. Any ideas?

regex perl

Eli Apr 29 '11 at 15:47

2 answers

No.
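To illustrate why: a capture group always holds one contiguous piece of the input, so the newline can only be swapped for a space after the match. A minimal sketch, assuming the name is two words separated by a newline (the `Name: ` prefix and the variable names are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical input: a name split across two lines.
my $text = "Name: Bob\nJones\n";

# The group captures a contiguous substring, so $1 still
# contains the newline right after the match.
my $name;
if ($text =~ /Name: (\w+\n\w+)/) {
    $name = $1;          # "Bob\nJones"
    $name =~ tr/\n/ /;   # post-process: turn the newline into a space
}

print "$name\n";   # prints "Bob Jones"
```

This needs one extra step outside the regex, which may not fit the constraint described in the question.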

Svante Apr 29 '11 at 15:55

David W. · Answer 1 · 2011-04-29T16:17:03+0000

You might want to take a look at some of the XML parser modules. XML::Simple is pretty ... well ... simple, and it can parse an XML file better than you can with regular expressions. As you have discovered, sooner or later you reach the point where regular expressions become quite confusing as you try to handle every possible combination.

I wish a standard Perl installation shipped with XML, HTML, and LWP modules. A significant number of my Perl scripts need HTML access or XML parsing, and sometimes it is not possible to download and compile the modules you need from CPAN. I believe that XML::Simple needs several other XML modules to work (XML::SAX comes to mind), but no C code compilation is involved.

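This means that you can put the XML::Simple module in a directory with your Perl script. By default, the @INC array contains the current directory. (Or you can use the use lib pragma, which is the safer choice, since Perl 5.26 and later no longer include the current directory in @INC.)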
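As a rough sketch of the use lib approach, the script below adds a directory next to itself to the module search path and then tries to load XML::Simple from it. The `lib` subdirectory is an assumed layout, not something from the original answer; FindBin and lib are both core modules.

```perl
use strict;
use warnings;
use FindBin qw($Bin);   # $Bin is the directory containing this script
use lib "$Bin/lib";     # assumed layout: modules copied into ./lib

# Try to load XML::Simple from @INC, which now includes "$Bin/lib".
# The eval keeps the script alive if the module is not there.
my $have_xml_simple = eval { require XML::Simple; 1 } ? 1 : 0;

print "Search path includes $Bin/lib\n"
    if grep { $_ eq "$Bin/lib" } @INC;
print $have_xml_simple
    ? "XML::Simple loaded\n"
    : "XML::Simple not found\n";
```

Anchoring the path to `$Bin` rather than the current directory means the script finds its modules no matter where it is started from.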
