How can I get multiple matches from a Perl regex?

Question

The goal of the regex search is to find all instances of a template class in C++ header files. Class instances can look like this, for example:

 CMyClass<int> myClassInstance;
 CMyClass2< int, int > myClass2Instacen;

The search is performed after loading the entire file into a single string:

 open(FILE, $file);
 $string = join('', <FILE>);
 close(FILE);

The following expression is used to find class instances, even when an instance declaration spans more than one line:

 $search_string = "\s*\w[^typename].*<(\s*\w\s*,?\n?)*)>\s*\w+.*";
 $string =~ m/$search_string/;

The problem is that the search returns only one match, even when the file contains more instances of the class.

Is it possible, with this approach, to get all the matches from one of the regex capture variables?

regex perl multiline

Smoller May 04, '09 at 13:27

3 answers

Sinan Ünür · Answer 1 · 2009-05-04T14:01:12+0000

First, if you are going to slurp entire files, you should use File::Slurp. Then you can do:

 use File::Slurp;
 my $contents = read_file $file;

read_file will croak on error.
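File::Slurp is a CPAN module rather than part of core Perl. If it is not installed, a minimal core-only sketch of the same idea looks like the following; the helper name `slurp_file` and the temporary-file demo are illustrative assumptions, not part of the original answer:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file into one string using only core Perl.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open '$path': $!";
    local $/;              # undefined record separator: read everything at once
    my $contents = <$fh>;
    close $fh;
    return $contents;
}

# Demo: write the two sample declarations to a temporary file, then read them back.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "CMyClass<int> myClassInstance;\n",
                "CMyClass2< int, int > myClass2Instacen;\n";
close $tmp_fh;

my $contents = slurp_file($tmp_path);
print length($contents), " characters read\n";
```

Unlike `read_file`, this helper reports failure with a plain `die`, so the error message format is up to you.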

Second, [^typename] does not exclude the string 'typename'. It is a negated character class that matches any single character other than t, y, p, e, n, a or m. Other than that, it's not obvious to me that the pattern you use will consistently match the things you want to match, but I cannot comment on it right now.
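A small sketch that makes the character-class point concrete. The negative lookahead `(?!typename\b)` is a swapped-in technique for rejecting a whole word; it is not part of the original pattern, and where it would belong in the real pattern is left open:

```perl
use strict;
use warnings;

# [^typename] is a negated character class: it matches exactly ONE character
# that is not t, y, p, e, n, a or m. It says nothing about the word "typename".
my @class_hits = grep { /^[^typename]$/ } qw(x t typename q);

# To reject a whole word at a given position, a negative lookahead is the
# usual tool. Here: accept an identifier unless it is the word "typename".
my @lookahead_hits = grep { /^(?!typename\b)\w+$/ } qw(typename CMyClass int);

print "class: @class_hits\n";
print "lookahead: @lookahead_hits\n";
```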

Finally, to get all the matches in the file one by one, use the /g modifier in a loop:

 my $source = '3 5 7';
 while ( $source =~ /([0-9])/g ) {
     print "$1\n";
 }
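The loop above evaluates m//g in scalar context. A sketch contrasting it with list context, where a single m//g returns every capture at once (the array names are only illustrative):

```perl
use strict;
use warnings;

my $source = '3 5 7';

# Scalar context: each evaluation of m//g resumes at pos($source),
# so the loop body runs once per match.
my @from_loop;
while ( $source =~ /([0-9])/g ) {
    push @from_loop, $1;
    # pos() reports where the next /g search will start
    print "matched $1, next search starts at offset ", pos($source), "\n";
}

# List context: one m//g returns all captures (or all whole matches
# when the pattern has no capture groups).
my @from_list = $source =~ /([0-9])/g;

print "@from_list\n";
```

The loop form is handy when each match needs its own processing; the list form is shorter when you just want everything collected.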

Now that I have had a chance to look at your pattern, I'm still not sure what to make of [^typename], but here is an example program that captures the part between the angle brackets (as that seems to be the only thing you capture above):

 use strict;
 use warnings;
 use File::Slurp;

 my $pattern = qr{
     ^ \w+ <\s*((?:\w+(?:,\s*)?)+)\s*> \s* \w+\s*;
 }mx;

 my $source = read_file \*DATA;

 while ( $source =~ /$pattern/g ) {
     my $match = $1;
     $match =~ s/\s+/ /g;
     print "$match\n";
 }

 __DATA__
 CMyClass<int> myClassInstance;
 CMyClass2< int, int > myClass2Instacen;

 C:\Temp> t.pl
 int
 int, int

Now I suspect you would prefer the following:

 my $pattern = qr{ ^ ( \w+ <\s*(?:\w+(?:,\s*)?)+\s*> \s* \w+ ) \s*; }mx;

which gives:

 C:\Temp> t.pl
 CMyClass<int> myClassInstance
 CMyClass2< int, int > myClass2Instacen

Gavin Miller · Answer 2 · 2009-05-04T13:49:34+0000

What you need is the \G assertion. Used together with the /g modifier, it anchors the next match of your string at the point where the last match ended.

Here is the relevant entry in perlfaq6 (SO has trouble with the link, so you have to copy and paste it):

http://perldoc.perl.org/perlfaq6.html#What-good-is-'%5cG'-in-a-regular-expression%3f
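Strictly speaking, /g alone is enough to collect every match; \G only matters when each match must begin exactly where the previous one stopped. A minimal tokenizer sketch of that behaviour, assuming a single declaration string and an illustrative token format (`WORD:`, `PUNCT:`) that is not from the original answer:

```perl
use strict;
use warnings;

my $decl = 'CMyClass2< int, int > myClass2Instacen;';

# \G anchors each match at pos($decl), i.e. exactly where the previous
# successful /g match ended. /c keeps pos() unchanged when a match fails,
# so the next alternative is tried from the same spot.
my @tokens;
pos($decl) = 0;
while ( pos($decl) < length $decl ) {
    if    ( $decl =~ /\G\s+/gc )      { next }                    # skip whitespace
    elsif ( $decl =~ /\G(\w+)/gc )    { push @tokens, "WORD:$1" }
    elsif ( $decl =~ /\G([<>,;])/gc ) { push @tokens, "PUNCT:$1" }
    else  { die "Unexpected input at offset ", pos($decl), "\n" }
}
print join(' ', @tokens), "\n";
```

Without \G, each pattern would be free to skip ahead and match later in the string, so unexpected characters would be silently jumped over instead of reported.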

si28719e · Answer 3 · 2009-05-04T16:06:16+0000

I would do something like this,

 #!/usr/bin/perl -w
 use strict;
 use warnings;

 local (*F);
 open(F, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
 my $text = do { local $/; <F> };    # slurp the whole file
 my (@hits) = $text =~ m/([a-z]{3})/gsi;
 print "@hits\n";

Assuming you have a text file like

 /home/user$ more a.txt
 a bb dkl jidij lksj lai suj ldifk kjdfkj bb
 bb kdjfkal idjksdj fbb kjd fkjd fbb kadfjl bbb
 bb bb bbd i

this will output all the matches from the regex:

 /home/user$ ./a.pl a.txt
 dkl jid lks lai suj ldi kjd fkj kdj fka idj ksd fbb kjd fkj fbb kad fjl bbb bbd

and a specific solution to your problem using the same approach might look like this:

 #!/usr/bin/perl -w
 use strict;
 use warnings;

 my $text = <<ENDTEXT;
 CMyClass<int> myClassInstance;
 CMyClass2< int, int > myClass2Instacen;
 CMyClass35< int, int > myClass35Instacen;
 ENDTEXT

 my $basename = "MyClass";
 my (@instances) = $text =~ m/\s*(${basename}[0-9]*\s*\<.*? (?=\>\s*${basename}) \>\s*${basename}.*?;)/xgsi;
 for (my $i = 0; $i < @instances; $i++) {
     print $i . "\t" . $instances[$i] . "\n\n";
 }

Of course, you probably have to tweak the regex a bit to fit all the edge cases in your data, but that should be a good start.
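As one possible variation on the same approach (not part of the original answer), named captures can pull the class name, template arguments and instance name out of each declaration separately. This sketch assumes Perl 5.10 or later, assumes template arguments contain no nested `<...>`, and uses a hypothetical pattern name `$instance_re`:

```perl
use strict;
use warnings;
use 5.010;   # named captures and %+ need Perl 5.10 or later

my $text = <<'END';
CMyClass<int> myClassInstance;
CMyClass2< int,
           int > myClass2Instacen;
END

# Hypothetical pattern: class name, template arguments (may span lines),
# and instance name. /x allows the whitespace and the comment below.
my $instance_re = qr{
    (?<class> \w+ ) \s*
    <  \s* (?<args> [^<>]*? ) \s* >   # nested templates are not handled
    \s* (?<name> \w+ ) \s* ;
}x;

my @found;
while ( $text =~ /$instance_re/g ) {
    # Copy the named captures first: a later successful s/// replaces %+.
    my $class = $+{class};
    my $args  = $+{args};
    my $name  = $+{name};
    $args =~ s/\s+/ /g;   # collapse line breaks inside the argument list
    push @found, "$class|$args|$name";
}
print "$_\n" for @found;
```

Copying `$+{...}` into lexicals before the `s///` matters: once the substitution succeeds, `%+` refers to that substitution's match, which has no named groups.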
