Regex matching to remove certain uses of a period character

Question


I have Fortran 77 source files that I am trying to convert from the non-standard STRUCTURE and RECORD syntax to the standard Fortran 90 TYPE syntax. One tricky aspect of this is the different way structure members are accessed.

Non-standard:

 s.member = 1

Standard:

 s%member = 1

So I need to find all the periods used this way and replace them with % characters. That would not be so bad if it weren't for all the other ways periods are used (decimal points in numbers, file names in include statements, punctuation in comments, Fortran 77 relational operators, and maybe others). I did some preprocessing to convert the relational operators to their Fortran 90 symbols, and I don't really care about mangling the grammar of the comments, but I don't have a good approach for translating . into % in the cases above. It seems I should be able to do this with sed, but I'm not sure how to match the instances I need to fix. Here are the rules I was thinking about, step by step:

1. If a line starts with <whitespace>include, do nothing with it: pass it to the output unchanged, so the file name inside the include statement is not mangled.
2. The following are operators that have no symbolic equivalents, so they must be left alone: .not. .and. .or. .eqv. .neqv.
3. Otherwise, if a period is surrounded by two non-numeric characters (so it is not a decimal point), it should be the member-access operator I want to replace. Change that period to %.
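A minimal Python sketch of these rules, assuming one statement per line like the examples below (no fixed-form column handling, no continuation lines). It tightens rule 3 slightly: a period is only rewritten when it follows a name or a closing parenthesis and precedes a letter, which keeps literals such as 2.0 and 1.d0 intact. It also shields quoted strings, trailing ! comments, .true./.false. and the dotted relational operators, none of which the rules above mention. The name convert_line is just illustrative.

```python
import re

# Rule 1: lines that start (after optional whitespace) with INCLUDE pass through untouched.
INCLUDE = re.compile(r"\s*include\b", re.IGNORECASE)

# Spans that must never be rewritten: quoted strings, trailing '!' comments, and
# dotted operators/constants (rule 2, plus .true./.false. and relational operators).
PROTECT = re.compile(
    r"""'[^']*'|"[^"]*"|!.*|\.(?:not|and|or|eqv|neqv|true|false|eq|ne|lt|le|gt|ge)\.""",
    re.IGNORECASE,
)

# Rule 3 (tightened): a period that follows a name or ')' and precedes a letter.
MEMBER_DOT = re.compile(r"(\b[A-Za-z_]\w*|\))\.(?=[A-Za-z_])")

# Placeholders look like NUL + index + NUL, which cannot occur in normal source.
PLACEHOLDER = re.compile(r"\x00(\d+)\x00")


def convert_line(line):
    """Rewrite s.member as s%member on one line of Fortran source."""
    if INCLUDE.match(line):
        return line

    saved = []

    def stash(match):
        # Hide the protected text behind an index so MEMBER_DOT cannot see it.
        saved.append(match.group(0))
        return f"\x00{len(saved) - 1}\x00"

    masked = PROTECT.sub(stash, line)
    converted = MEMBER_DOT.sub(r"\1%", masked)
    # Put every protected span back exactly as it was.
    return PLACEHOLDER.sub(lambda m: saved[int(m.group(1))], converted)
```

Applied to the examples below, it should leave the include line and the .or. line alone and turn a.member and b.othermember into a%member and b%othermember. Subscripted records such as p(1).x are handled through the ')' alternative.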

For those who don't speak Fortran, here are a few examples:

 include 'file.inc'      ! We don't want to do anything here. The line can
                         ! begin with some amount of whitespace

 if x == 1 .or. y > 2.0  ! In this case, we don't want to touch the periods that
                         ! are part of the logical operator ".or.". We also don't
                         ! want to touch the period that is the decimal point
                         ! in "2.0".

 if a.member < 4.0 .and. b.othermember == 1.0  ! We don't want to touch the periods
                                               ! inside the numbers, but we need to
                                               ! change the "a." and "b." to "a%"
                                               ! and "b%".

What is a good way to solve this problem?

Edit: I actually found some additional operators that contain periods and have no symbolic equivalents. I updated the list of rules above.

regex fortran sed

Jason r Oct 28 '11 at 17:36

6 answers

Unless the codebase is really HUUGE (and think very hard about whether it really is), I would just take an editor, for example Vim (vertical and block selection are your friends), set aside an afternoon, and do it by hand. I think that by the end of the day you will have done most of it, if not all. An afternoon is a lot of time. Just imagine how many cases you could have covered in just these 2 hours.

Just trying to write a parser for something like this will take you a lot more time.

Of course, the obvious question is: if the code is F77, which all compilers still support, and it works, why are you so keen to change it?

Roook Oct 28 '11 at 20:29

I don't know much about regular expressions, so I would probably try to attack this problem from the other side. If you grep for the STRUCTURE keyword, you get a list of all STRUCTUREs used in the code. Once you have that, for each STRUCTURE S you can simply replace all instances of S. with S%.

This way you don't have to worry about things like .true., .and., .neq. and their relatives. The main problem then is parsing the STRUCTURE declarations.
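A rough Python sketch of this idea. One assumption differs from the wording above: with the DEC extension, the name in front of the period is a variable declared with RECORD /structname/ var, not the STRUCTURE name itself, so the sketch collects RECORD variables. String literals and comments are not protected, and in a nested access such as p.inner.x only the first period is converted. The function names record_variables and replace_members are hypothetical.

```python
import re

# DEC-extension declaration, e.g.  RECORD /point/ p1, p2(10), /line/ seg
RECORD_DECL = re.compile(r"^\s*record\s*(/.*)$", re.IGNORECASE)

# Dotted operators and constants that must not be mistaken for a member name.
DOTTED_WORDS = r"(?:not|and|or|eqv|neqv|true|false|eq|ne|lt|le|gt|ge)\."


def record_variables(lines):
    """Collect the lower-cased names of all variables declared with RECORD."""
    names = set()
    for line in lines:
        m = RECORD_DECL.match(line)
        if not m:
            continue
        decl = m.group(1).split("!")[0]           # drop a trailing comment
        decl = re.sub(r"/\s*\w+\s*/", ",", decl)   # drop the /structure/ names
        decl = re.sub(r"\([^)]*\)", "", decl)      # drop array dimensions
        names.update(n.strip().lower() for n in decl.split(",") if n.strip())
    return names


def replace_members(line, names):
    """Rewrite var.member as var%member, but only when var is a known RECORD variable."""
    if not names:
        return line
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, sorted(names))) + r")"  # a known variable
        r"(\([^)]*\))?"                                          # optional subscript
        r"\.(?!" + DOTTED_WORDS + r")(?=[A-Za-z_])",             # period before a member
        re.IGNORECASE,
    )
    # An unmatched subscript group is substituted as an empty string.
    return pattern.sub(r"\1\2%", line)
```

Because only declared record variables are touched, decimal points, include file names and operators like .and. never match, which is exactly the advantage the answer points out.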

ev-br Oct 29 '11 at 17:01

The regex below

 (?<!')\b([^.\s]+)(?<!\.(?:not|and|or|eqv|neqv))(?<=\D)\.(?=\D)(?!(?:not|and|or|eqv|neqv)\.)([^.\s]+)\b(?!')

(replace with $1%$2) works well for your examples, but I would not recommend using it for your actual task. It will certainly not cover all your cases. If you are happy with 80% coverage or so, you could use it, but you should definitely back up your sources first. Given the limited set of input cases I had to work with, I am sure there will be cases where the regex replaces something it should not.

Good luck :)

FailedDev Oct 28 '11 at 18:57

This sed one-liner might be a start:

 sed -r '/^\s*include/b;/^\s*! /b;G;:a;s/^(\.(not|and|or|eqv|neqv)\.)(.*\n.*)/\3\1/;ta;s/^\.([^0-9]{2,})(.*\n.*)/\2%\1/;ta;s/^(.)(.*\n.*)/\2\1/;ta;s/\n//'

potong Oct 28 '11 at 19:19

Based on your examples, I assume it will be enough to protect quoted strings and then replace periods that have letters on both sides.

 perl -pe '1 while s%(\x27[^\x27]+)\.([^\x27]+\x27)%$1\@\@::\@\@$2%; s/([a-z])\.([a-z])/$1%$2/g; s/\@\@::\@\@/./g' file.f

I suggest this Perl solution not because sed is not a good enough tool for the job, but because it sidesteps the minor but annoying differences between sed dialects. Being able to write the single quote as the hexadecimal escape \x27 is a nice bonus.
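For comparison, a Python sketch of the same three steps (hide periods inside quoted strings, convert periods between letters, restore the hidden periods). Two deliberate differences from the Perl one-liner: lookarounds are used so that chained members like a.b.c are fully converted, and matching is case-insensitive because Fortran is. No loop is needed, since each quoted string has all of its periods hidden in one pass. The function name convert_periods is hypothetical.

```python
import re

HIDE = "@@::@@"  # placeholder assumed never to occur in the source, as in the Perl version


def convert_periods(line):
    # Step 1: hide every period inside a single-quoted string.
    line = re.sub(r"'[^']*'", lambda m: m.group(0).replace(".", HIDE), line)
    # Step 2: a period with a letter on each side becomes '%'.
    line = re.sub(r"(?<=[a-z])\.(?=[a-z])", "%", line, flags=re.IGNORECASE)
    # Step 3: put the hidden periods back.
    return line.replace(HIDE, ".")
```

Like the original, this relies on operators being separated by spaces: x.and.y written without spaces would become x%and%y.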

tripleee Oct 29 '11 at 19:46

Stefano borini · Accepted Answer · 2011-10-28T18:42:57+0000

You cannot do this with a regular expression; it is not that simple.

If I had to do what you need, I would probably do it by hand, unless the codebase is huge. If it isn't, first replace every [a-zA-Z0-9]\.[a-zA-Z] with something very weird that is guaranteed never to compile, something like "@WHATEVER@", then search for all these markers and replace them by hand after inspecting each one.
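A small Python sketch of the marking step, assuming the marker replaces only the period so the names around it stay readable. It also records which lines it touched, so you can walk through them. Note that it deliberately over-matches (it flags literals like 1.e5 and unspaced operators like x.and.y), which is why each hit is meant to be inspected by hand. The names mark_candidates and MARKER are hypothetical.

```python
import re

# Candidate member accesses, as in the answer: [a-zA-Z0-9]\.[a-zA-Z]
CANDIDATE = re.compile(r"(?<=[a-zA-Z0-9])\.(?=[a-zA-Z])")
MARKER = "@WHATEVER@"  # deliberately not valid Fortran, so any leftover breaks the build


def mark_candidates(lines):
    """Replace candidate periods with MARKER; return (new_lines, [(line_no, original), ...])."""
    marked, hits = [], []
    for number, line in enumerate(lines, start=1):
        new = CANDIDATE.sub(MARKER, line)
        if new != line:
            hits.append((number, line.rstrip("\n")))
        marked.append(new)
    return marked, hits
```

Afterwards you would search for @WHATEVER@ in your editor and decide, case by case, whether it becomes % or goes back to a period.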

If the amount of code is huge, you need to write a parser. I would suggest using Python to tokenize the basic Fortran constructs, but remember that Fortran is not an easy language to parse. Work routine by routine and try to find all the variable names used, using them as a filter. If you come across something like a.whatever and know that a is in the list of local or global variables, apply the change.
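A sketch of the tokenizing approach in Python, under several assumptions: one statement per line, no fixed-form column rules or continuation lines, and a set known of lower-case variable names for the current routine, which you would build yourself from that routine's declarations. The tokenizer recognizes strings, ! comments, dotted operators and numeric literals (including 1.e5 and 2.d0, and 1.eq.2 in old-style code) before names, so only a period that follows a known variable, one of its members, or a subscript of one is rewritten. The names TOKEN and convert are hypothetical.

```python
import re

# Dotted operators and logical constants (matched case-insensitively).
OPS = r"(?:not|and|or|eqv|neqv|true|false|eq|ne|lt|le|gt|ge)"

# One alternative per token kind; earlier alternatives win at the same position.
TOKEN = re.compile(
    rf"""
      (?P<string>'[^']*'|"[^"]*")
    | (?P<comment>!.*)
    | (?P<op>\.{OPS}\.)
    | (?P<number>(?:\d+(?:\.(?!{OPS}\.)\d*)?|\.\d+)(?:[ed][+-]?\d+)?)
    | (?P<name>[a-z_]\w*)
    | (?P<other>.)
    """,
    re.IGNORECASE | re.VERBOSE | re.DOTALL,
)


def convert(line, known):
    """Rewrite var.member as var%member when var is in `known` (lower-case names)."""
    tokens = [(m.lastgroup, m.group()) for m in TOKEN.finditer(line)]
    out = []
    chain = False  # does the text just emitted denote a record or one of its members?
    stack = []     # values of `chain` saved at each "(" and restored at ")"
    for i, (kind, text) in enumerate(tokens):
        next_kind = tokens[i + 1][0] if i + 1 < len(tokens) else None
        if kind == "name":
            # A known variable starts a chain; a name right after "%" continues one.
            chain = (bool(out) and out[-1] == "%") or text.lower() in known
        elif text == "(":
            stack.append(chain)
            chain = False
        elif text == ")":
            chain = stack.pop() if stack else False
        elif text == "." and chain and next_kind == "name":
            out.append("%")
            continue
        else:
            chain = False
        out.append(text)
    return "".join(out)
```

Tracking a chain flag rather than a single name means nested accesses such as p2(3).inner.x are fully converted, while an unknown name like other.y is left alone for manual review.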
