Regexp-replace: multiple substitutions in a match

Question


I am converting our MVC3 project to use T4MVC, and I would like to update the JavaScript references to work with T4MVC. Therefore I need to replace

"~/Scripts/DataTables/TableTools/TableTools.min.js"
"~/Scripts/jquery-ui-1.8.24.min.js"

with

 Scripts.DataTables.TableTools.TableTools_min_js
 Scripts.jquery_ui_1_8_24_min_js

I am using Notepad++ as my regexp tool at the moment, and it uses POSIX regular expressions. I can find the name of the script and replace it with these regular expressions:

Find: \("~/Scripts/(.*)"\)

Replace with: \(Scripts.\1\)
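To make the gap concrete, here is a rough Python translation of that Find/Replace pair. It uses Python's `re` rather than the Notepad++ engine, so treat it as an approximation, and the input string is a hypothetical `(...)` call built from one of the paths above. Python keeps a backslash that precedes `(` in a replacement string, so the parentheses are written bare. The single replacement only adds the `Scripts.` prefix and leaves the periods, dashes and slashes untouched.

```python
import re

# Rough Python equivalent of the question's Find/Replace pair.
FIND = r'\("~/Scripts/(.*)"\)'
# A backslash before "(" would be kept literally in a Python replacement,
# so the parentheses are written without one here.
REPLACE = r'(Scripts.\1)'


def prefix_only(text):
    """Apply the single Find/Replace: only the Scripts. prefix changes."""
    return re.sub(FIND, REPLACE, text)


# Hypothetical input built from one of the question's paths.
print(prefix_only('("~/Scripts/jquery-ui-1.8.24.min.js")'))
# (Scripts.jquery-ui-1.8.24.min.js)
```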

But I can't figure out how to replace the periods and dashes in the file names with underscores, and the slashes with periods.

I can check whether the JS file name contains a period or a dash using this:

  \("~/Scripts/(?=\.*)(?=\-*).*"\)
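One caveat worth checking: `\.*` and `\-*` can match zero characters, so both lookaheads always succeed and the pattern also matches names without any period or dash. The Python sketch below (Python `re` standing in for the Notepad++ engine) demonstrates this; the `STRICT` pattern is a hypothetical alternative, not from the question, that requires at least one `.` or `-` before the closing quote.

```python
import re

# The question's pattern. "\.*" and "\-*" can match zero characters,
# so both lookaheads succeed for any file name.
ORIGINAL = r'\("~/Scripts/(?=\.*)(?=\-*).*"\)'

# Hypothetical stricter check: at least one "." or "-" must appear
# before the closing quote.
STRICT = r'\("~/Scripts/(?=[^"]*[.-]).*"\)'

no_dots = '("~/Scripts/nodots")'
dotted = '("~/Scripts/jquery-ui-1.8.24.min.js")'

print(bool(re.search(ORIGINAL, no_dots)))  # True: matches despite no . or -
print(bool(re.search(STRICT, no_dots)))    # False
print(bool(re.search(STRICT, dotted)))     # True
```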

But how do I replace groups within a group?

You would need a nested replacement inside the group, and those replacements would have to be performed in order, so that slashes converted to periods are not subsequently converted to underscores.

This is not a critical problem; I have already done all the replacements manually. But I thought I was good at regexps, so this problem bugs me!

P.S. The preferred tool is Notepad++, but any POSIX regexp solution will do :-)

P.P.S. Here you can get a sample of the replacement material. And here is the target text.

+6

regex replace notepad++ t4mvc

trailmax Oct 08 '12 at 12:00


4 answers

Here is a vanilla Notepad++ solution, though it is certainly not the most elegant one. I was able to perform the conversion with several passes over the file.

First pass

Replace . and - with _.

Find: ("~/Scripts[^"]*?)[.-]

Replace with: \1_

Unfortunately, I could not find a way to match only the . or - itself, because that would require lookbehind, which Notepad++ apparently does not support. As a result, each time you perform the replacement, only the first . or - in each script name is replaced (since matches cannot overlap). Therefore, you have to run this replacement repeatedly until no more replacements are made (for your example input, that is 8 times).
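The repeat-until-done loop can be sketched in Python with `re.subn`, which, like Replace All, makes non-overlapping replacements and reports how many it made. This is an illustration that assumes Python's `re` treats these patterns the same way Notepad++ does; the sample text is only the two paths from the question, so it needs 6 runs rather than the 8 quoted for the full sample file.

```python
import re

# First pass: each run turns only the first "." or "-" after "~/Scripts
# in every quoted path into "_", so the run is repeated until nothing changes.
FIRST_PASS = (r'("~/Scripts[^"]*?)[.-]', r'\1_')


def repeat_until_stable(pattern, repl, text):
    """Re-run one Replace All until it makes no replacements.

    Returns the final text and the number of runs that changed something.
    """
    runs = 0
    while True:
        text, count = re.subn(pattern, repl, text)
        if count == 0:
            return text, runs
        runs += 1


# Hypothetical input: the two paths from the question on one line.
sample = ('"~/Scripts/DataTables/TableTools/TableTools.min.js" '
          '"~/Scripts/jquery-ui-1.8.24.min.js"')
result, runs = repeat_until_stable(*FIRST_PASS, sample)
print(result)
print(runs)  # 6: jquery-ui-1.8.24.min.js contains six "." or "-"
```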

Second pass

Replace / with . (a period).

Find: ("~/Scripts[^"]*?)/

Replace with: \1.

This is basically the same as the first pass, only with different characters (you will have to run it 3 times for the example file). Performing the passes in this order ensures that no slash ends up as an underscore.
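A small Python sketch shows why the order matters (the helper names are hypothetical, and Python `re` stands in for Notepad++): running the slash pass first turns the slashes into periods, which the underscore pass then turns into underscores.

```python
import re

UNDERSCORE_PASS = (r'("~/Scripts[^"]*?)[.-]', r'\1_')  # first pass
DOT_PASS = (r'("~/Scripts[^"]*?)/', r'\1.')           # second pass


def run_until_stable(pattern, repl, text):
    """Repeat one Replace All until it stops changing the text."""
    while True:
        text, count = re.subn(pattern, repl, text)
        if count == 0:
            return text


def convert(text, passes):
    for pattern, repl in passes:
        text = run_until_stable(pattern, repl, text)
    return text


sample = '"~/Scripts/DataTables/TableTools/TableTools.min.js"'
# Answer's order: dashes/periods first, then slashes.
print(convert(sample, [UNDERSCORE_PASS, DOT_PASS]))
# "~/Scripts.DataTables.TableTools.TableTools_min_js"
# Reversed order: the new periods are turned into underscores as well.
print(convert(sample, [DOT_PASS, UNDERSCORE_PASS]))
# "~/Scripts_DataTables_TableTools_TableTools_min_js"
```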

Third pass

Remove surrounding characters.

Find: "~/(Scripts[^"]*?)"

Replace with: \1

This matches all script names that are still surrounded by "~/ and ", and replaces each one with just the part in between.

Note that including these surrounding characters in the search patterns of the first two passes avoids converting the periods in strings that are already in the new format.
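Putting the three passes together, here is a sketch in Python under the same assumptions (Python `re` standing in for Notepad++). The second input line is a hypothetical reference that is already in the new format; because every pattern requires the leading "~/, it is left untouched.

```python
import re

# The three passes from the answer, applied in order. Each pattern requires
# the leading "~/ so text already in the new format is not touched.
PASSES = [
    (r'("~/Scripts[^"]*?)[.-]', r'\1_'),  # 1: "." and "-" -> "_"
    (r'("~/Scripts[^"]*?)/', r'\1.'),     # 2: "/" -> "."
    (r'"~/(Scripts[^"]*?)"', r'\1'),      # 3: drop "~/ and the closing "
]


def three_pass_convert(text):
    for pattern, repl in PASSES:
        # Like pressing Replace All until it reports no more replacements.
        while True:
            text, count = re.subn(pattern, repl, text)
            if count == 0:
                break
    return text


# Hypothetical input: one old-style reference, one already converted.
text = ('Url.Content("~/Scripts/DataTables/TableTools/TableTools.min.js")\n'
        'Url.Content(Scripts.jquery_ui_1_8_24_min_js)')
print(three_pass_convert(text))
```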

As I said, this is not the most convenient way to do it. Moreover, passes one and two have to be run manually several times. But it still saves a lot of time on large files, and I can't think of a way to do all of the replacements (and only in the right lines) in a single pass without lookbehind. Of course, suggestions for improving this solution are very welcome :). Hopefully I could at least give you (and anyone with a similar problem) a starting point.

+3

Martin Ender Oct 10 '12 at 21:28


If, as your question indicates, you want to use N++, then use the N++ PythonScript plugin. Install the script and assign it a keyboard shortcut, and you have a one-pass solution that only requires opening the file, running the script, and saving... it couldn't be much easier.

I think part of the problem is that N++ is not a regex tool, and using a dedicated regex tool, or even a dedicated search/replace tool, is sometimes justified. You may be better off, in both speed and effort, using a tool designed for text processing and editing.

[Script change] :: Changed to match the expected changes/output.

    # Substitute & Replace within matched group.
    from Npp import *
    import re

    def repl(m):
        return "(Scripts." + re.sub( "[-.]", "_", m.group(1) ).replace( "/", "." ) + ")"

    editor.pyreplace( '(?:[(].*?Scripts.)(.*?)(?:"?[)])', repl )

Install :: Plugins → Plugin Manager → Python Script
New Script :: Plugins → Python Script → script-name.py
Select the target tab.
Run :: Plugins → Python Script → Scripts → script-name
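The `Npp` module and the `editor` object exist only inside the PythonScript plugin. To try the same pattern and callback outside Notepad++, `re.sub` can stand in for `editor.pyreplace` on a single string; this is a sketch that assumes the two behave the same for this pattern, and the sample input is hypothetical.

```python
import re

# Pattern and callback copied from the PythonScript answer; re.sub stands in
# for editor.pyreplace, which only exists inside Notepad++.
PATTERN = r'(?:[(].*?Scripts.)(.*?)(?:"?[)])'


def repl(m):
    # "." and "-" become "_" first, then "/" becomes ".", so no slash
    # ends up as an underscore.
    return "(Scripts." + re.sub("[-.]", "_", m.group(1)).replace("/", ".") + ")"


def convert(text):
    return re.sub(PATTERN, repl, text)


# Hypothetical input line.
print(convert('Url.Content("~/Scripts/jquery-ui-1.8.24.min.js")'))
# Url.Content(Scripts.jquery_ui_1_8_24_min_js)
```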

[Edit: extended single-line PythonScript command]

Wanting to try out the new regex module for Python (which I hope will eventually replace re), I compiled it for use with the N++ PythonScript plugin and decided to test it on your sample set.

Two commands in the console produced the correct results in the editor.

    import regex as re
    editor.setText( (re.compile( r'(?<=.*Content[(].*)((?<omit>["~]+?([~])[/]|["])|(?<toUnderscore>[-.]+)|(?<toDot>[/]+))+(?=.*[)]".*)' ) ).sub(lambda m: {'omit':'','toDot':'.','toUnderscore':'_'}[[ key for key, value in m.groupdict().items() if value != None ][0]], editor.getText() ) )

Very sweet!

Another advantage of using regex instead of re is that I was able to build the expression in Expresso and use it as is! To get a detailed explanation, just copy the contents of the r'' string into Expresso.

Abbreviated explanation from Expresso:

    Match a prefix but exclude it from the capture. [.*Content[(].*]
    [1]: A numbered capture group. [(?<omit>["~]+?([~])[/]|["])|(?<toUnderscore>[-.]+)|(?<toDot>[/]+)], one or more repetitions
        Select from 3 alternatives
            [omit]: A named capture group. [["~]+?([~])[/]|["]]
                Select from 2 alternatives
                    ["~]+?([~])[/]
                    Any character in this class: ["]
            [toUnderscore]: A named capture group. [[-.]+]
            [toDot]: A named capture group. [[/]+]
    Match a suffix but exclude it from the capture. [.*[)]".*]

The breakdown of the command is pretty elegant: we tell Scintilla to set the full buffer contents to the result of the compiled regex substitution, essentially using a "switch" on the name of whichever group is not empty.

Hopefully Dave (the author of PythonScript) will add the regex module to the ExtraPythonLibs part of the project.
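The `regex` module is a third-party package, and its variable-length lookbehind `(?<=.*Content[(].*)` is not supported by the standard `re` module. Here is a rough stdlib-only sketch of the same "switch on the group name" idea: an outer pattern (an assumption, not from the answer) finds each "~/Scripts/..." string, and an inner pattern with named alternatives dispatches on `Match.lastgroup`. Unlike the original, it does not check for the surrounding `Content(...)` context.

```python
import re

# Stdlib-only sketch of the "switch on the non-empty group" idea.
# Hypothetical outer pattern: one quoted "~/Scripts/..." path.
OUTER = re.compile(r'"~/Scripts/[^"]*"')
# Inner alternatives; only one named group can match per match object.
INNER = re.compile(r'(?P<omit>"~/|")|(?P<toUnderscore>[-.]+)|(?P<toDot>/+)')
REPLACEMENTS = {"omit": "", "toDot": ".", "toUnderscore": "_"}


def convert_path(m):
    # Match.lastgroup names the alternative that matched.
    return INNER.sub(lambda g: REPLACEMENTS[g.lastgroup], m.group(0))


def convert(text):
    return OUTER.sub(convert_path, text)


print(convert('Url.Content("~/Scripts/DataTables/TableTools/TableTools.min.js")'))
# Url.Content(Scripts.DataTables.TableTools.TableTools_min_js)
```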

+3

Thell Oct 10 '12 at 23:24


Alternatively, you can use a script that does this and avoid the copy-pasting and the rest of the manual work altogether. Consider the following script:

    $_.gsub!(%r{(?:"~/)?Scripts/([a-z0-9./-]+)"?}i) do
      'Scripts.' + $1.split('/').map { |part| part.gsub(/[.-]/, '_') }.join('.')
    end

And run it like this:

 $ ruby -pi.bak script.rb *.ext

All files with the extension .ext will be edited in place, and the original files will be saved with the extension .ext.bak. If you use version control (and you should), you can easily review the changes with a visual diff tool, correct them if necessary, and then commit them.
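For comparison, here is a rough Python port of the Ruby script using the standard `fileinput` module, whose `inplace=True, backup='.bak'` options behave similarly to `ruby -pi.bak`. The function names are hypothetical; the regex is the one from the Ruby script.

```python
import fileinput
import re
import sys

# Same regex as the Ruby script: optional "~/ prefix, the path, optional quote.
SCRIPT_REF = re.compile(r'(?:"~/)?Scripts/([a-z0-9./-]+)"?', re.IGNORECASE)


def convert_match(m):
    # Underscore "." and "-" inside each path segment, then join with ".".
    parts = m.group(1).split("/")
    return "Scripts." + ".".join(re.sub(r"[.-]", "_", part) for part in parts)


def convert_line(line):
    return SCRIPT_REF.sub(convert_match, line)


def convert_files(paths, backup=".bak"):
    # With inplace=True, standard output is redirected into the file being
    # read, and the original is kept as <name><backup>, much like ruby -pi.bak.
    with fileinput.input(files=paths, inplace=True, backup=backup) as lines:
        for line in lines:
            sys.stdout.write(convert_line(line))


if __name__ == "__main__" and len(sys.argv) > 1:
    convert_files(sys.argv[1:])
```

Usage would mirror the Ruby version, e.g. `python convert_scripts.py *.ext` (the script name is hypothetical).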

+2

detunized Oct 12 '12 at 14:30


Nick · Accepted Answer · 2012-10-11T00:23:04+0000

I would just use a site like RegexHero.

1. Paste the code into the Target String field, then put (?<=(~/Script).*)[.-](?=(.*"[)]")) in the Regular Expression field and _ in the Replacement String field.
2. Once the replacement is done, click Final String at the bottom and select Move to target string and start a new expression.
3. From there, put (?<=(<script).*)("~/)(?=(.*[)]" ))|(?<=(Url.).*)(")(?=(.*(\)" ))) in the Regular Expression field and leave the Replacement String field blank.
4. Once the replacement is done, click Final String at the bottom and select Move to target string and start a new expression.
5. From there, paste (?<=(Script).*)[/](?=(.*[)]")) into the Regular Expression field and . into the Replacement String field.

After that, the Final String field will contain what you are looking for. I'm not sure whether there is an upper limit on how much text you can parse, but you can split the text up if that is a problem. I'm sure there are better ways to do this, but this is usually how I do such things. One reason I like this site is that I don't need to install anything, so I can do it quickly anywhere.

Edit 1: As discussed in the comments, I moved step 3 to step 5 and added new steps 3 and 4. I had to do it this way because the new step 5 would replace the / in "~/Scripts with a ., which would break the removal of "~/. I also had to change the code of step 5 to account for the changed beginning of Script.
