Replace some diacritics using a Perl regular expression

I want to replace some of the diacritics in a file with their ASCII equivalents. Please note that I do not want to remove all diacritics: only those that appear before the first "@" character of each line.

In the simplified version of the file below (a.glo), four "é" characters, all in the entry keys before the "@", should be replaced with "e". I use a (possibly ugly) regex:

(\\glossaryentry\{(\w|\s|\.)*)(é|è|ê|ë|É|È|Ê|Ë|ē)+ 

and it works in an online tester (for example, www.regex101.com) and in Notepad++.

But nothing changes when I run it from the Windows command prompt:

 perl -pi -i.bak -e "s/(\\glossaryentry\{(\w|\s|\.)*)(é|è|ê|ë|É|È|Ê|Ë|ē)+/$1e/g" a.glo 

(FWIW, my system has Perl v5.20.2.)

a.glo:

\glossaryentry{AHRF@{\memgloterm{AHRF}}{\memglodesc{Annales historiques de la Révolution française}}{\memgloref{}}|memjustarg}{1}

\glossaryentry{Ass. plén.@{\memgloterm{Ass. plén.}}{\memglodesc{Assemblée plénière}}{\memgloref{}}|memjustarg}{1}

\glossaryentry{Ch. réun.@{\memgloterm{Ch. réun.}}{\memglodesc{Chambres réunies}}{\memgloref{}}|memjustarg}{1}

\glossaryentry{chron.@{\memgloterm{chron.}}{\memglodesc{chronique}}{\memgloref{}}|memjustarg}{1}

\glossaryentry{Circ. min.@{\memgloterm{Circ. min.}}{\memglodesc{Circulaire ministérielle}}{\memgloref{}}|memjustarg}{1}

\glossaryentry{éd.@{\memgloterm{éd.}}{\memglodesc{édition, édité par}}{\memgloref{}}|memjustarg}{1}

\glossaryentry{Int J Semiot Law@{\memgloterm{Int J Semiot Law}}{\memglodesc{International Journal of Semiotics of Law - Revue internationale de sémiotique juridique}}{\memgloref{}}|memjustarg}{1}

\glossaryentry{Oxford J Legal Studies@{\memgloterm{Oxford J Legal Studies}}{\memglodesc{Oxford Journal of Legal Studies}}{\memgloref{}}|memjustarg}{1}

\glossaryentry{préc.@{\memgloterm{préc.}}{\memglodesc{précité}}{\memgloref{}}|memjustarg}{1}

\glossaryentry{Rev. adm.@{\memgloterm{Rev. adm.}}{\memglodesc{Revue Administrative}}{\memgloref{}}|memjustarg}{1}

1 answer

I tried this in a Windows command window, and it works.
I think the file has to be saved in the right encoding for the pattern to match.
I saved your sample text as ANSI text.
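To illustrate why the encoding could matter here, the sketch below prints the bytes that "é" (U+00E9) becomes in a single-byte Windows encoding and in UTF-8. It assumes that "ANSI" means code page 1252, which is common on Western European Windows systems but is an assumption about this particular machine; `hex_bytes` is a hypothetical helper name.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split(//, $bytes);
}

my $e_acute = "\x{E9}";                      # U+00E9, e with acute accent
my $ansi    = encode('cp1252', $e_acute);    # assumption: "ANSI" is cp1252
my $utf8    = encode('UTF-8',  $e_acute);

print 'cp1252: ', hex_bytes($ansi), "\n";    # a single byte, E9
print 'UTF-8:  ', hex_bytes($utf8), "\n";    # two bytes, C3 A9

# A byte-oriented pattern containing \x{E9} finds the cp1252 byte,
# but finds nothing in the UTF-8 byte sequence.
print "cp1252 bytes match\n"       if     $ansi =~ /\x{E9}/;
print "UTF-8 bytes do not match\n" unless $utf8 =~ /\x{E9}/;
```

If a.glo is stored as UTF-8 and read as raw bytes, a pattern written with `\x{E9}` would therefore never see a lone E9 byte, which is one plausible reason for a substitution that silently does nothing.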

perl -pi -i.bak -e "s/(\\glossaryentry\{[\w\s.]*)[\x{E9}\x{E8}\x{EA}\x{EB}\x{C9}\x{C8}\x{CA}\x{CB}\x{113}]+/$1e/g" a.glo

  # (\\glossaryentry\{[\w\s.]*)[\x{E9}\x{E8}\x{EA}\x{EB}\x{C9}\x{C8}\x{CA}\x{CB}\x{113}]+

  (                             # (1 start)
       \\ glossaryentry \{
       [\w\s.]*
  )                             # (1 end)
  [\x{E9}\x{E8}\x{EA}\x{EB}\x{C9}\x{C8}\x{CA}\x{CB}\x{113}]+
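As far as I can tell, the substitution above handles at most one run of accented characters per entry, because every match has to start again at `\glossaryentry{`. A different approach, sketched here as an alternative rather than as the method used above, is to split each line at the first "@" and transliterate only the part before it. The sketch assumes the text has already been decoded into Perl characters (for example, from a UTF-8 file); `fold_key` is a hypothetical name, and mapping the uppercase letters to "E" instead of "e" is my own choice.

```perl
use strict;
use warnings;
use utf8;                               # this source contains literal accented characters
binmode STDOUT, ':encoding(UTF-8)';     # so the demo line prints cleanly

# fold_key is a hypothetical helper, not part of the original answer.
# It replaces the listed e-variants only before the first "@" of a line.
sub fold_key {
    my ($line) = @_;
    my $at = index $line, '@';
    return $line if $at < 0;            # no "@": leave the line untouched
    my $head = substr $line, 0, $at;    # the sort key, up to the "@"
    my $tail = substr $line, $at;       # "@" and everything after it
    $head =~ tr/éèêëēÉÈÊË/eeeeeEEEE/;   # same characters as the question's regex
    return $head . $tail;
}

# Demo on one entry from a.glo: only the key before "@" loses its accent.
print fold_key('\glossaryentry{Ch. réun.@{\memgloterm{Ch. réun.}}{\memglodesc{Chambres réunies}}{\memgloref{}}|memjustarg}{1}'), "\n";
```

To run this over a real file, the lines would need to be read through a decoding layer such as `open my $fh, '<:encoding(UTF-8)', 'a.glo'` (or `cp1252` if the file is saved as ANSI) and written back through the matching encoding layer.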