For non-MATLAB readers: I'm not sure which regex family they belong to, but MATLAB's regular expressions are described in full here. MATLAB's comment character is % (percent), and its string delimiter is ' (apostrophe). A string delimiter inside a string is written as a doubled apostrophe ( 'this is how you write "it''s" in a string.' ). To complicate matters, the matrix transpose operators are also apostrophes ( A' (Hermitian) or A.' (regular)).
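For readers who have never seen the syntax, here is a small sketch of the three roles the apostrophe plays, next to the comment character. The variable names (`s`, `A`, `H`, `T`) are made up purely for illustration.

```matlab
s = 'it''s';             % doubled apostrophe inside a string; the value is: it's
A = [1+2i, 3; 4, 5i];    % a small complex matrix to transpose
H = A';                  % apostrophe as operator: conjugate (Hermitian) transpose
T = A.';                 % dot-apostrophe: plain transpose, no conjugation
same = isequal(H, conj(T));  % the two differ only by complex conjugation
```

Every `%` above starts a comment, while a `%` typed between two string delimiters would just be an ordinary character.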
Now, for dark reasons (which I will not elaborate on :), I am trying to interpret MATLAB code in MATLAB's own language.
I'm currently trying to remove all trailing comments in a cell array of strings, each of which contains one line of MATLAB code. At first glance, this may seem simple:
    >> str = 'simpleCommand(); % simple trailing comment';
    >> regexprep(str, '%.*$', '')
    ans =
    simpleCommand();
But of course, something like this might come along:

    >> str = ' fprintf(''%d%*c%3.0f\n'', value, args{:}); % Let''s do this! ';
    >> regexprep(str, '%.*$', '')
    ans =
     fprintf('
Obviously, we need to exclude from the match all comment characters that sit inside strings, while also taking into account that a single apostrophe (or dot-apostrophe) directly following a statement is an operator, not a string delimiter.
Based on the assumption that the number of string opening/closing characters in front of the comment character must be even (which I know to be incomplete because of the matrix transpose operator), I created the following dynamic regular expression to handle this type of case:
    >> str = {
        'myFun( {''test'' ''%''}); % let''s '
        'sprintf(str, ''%*8.0f%*s%c%3d\n''); % it''s '
        'sprintf(str, ''%*8.0f%*s%c%3d\n''); % let''s '
        'sprintf(str, ''%*8.0f%*s%c%3d\n''); '
        'A = A.'';%tight trailing comment'
        };
    >>
    >> C = regexprep(str, '(^.*)(?@mod(sum(\1==''''''''),2)==0;)(%.*$)', '$1')
but
    C =
        'myFun( {'test' '%'}); '                %// success
        'sprintf(str, '%*8.0f%*s%c%3d\n'); '    %// success
        'sprintf(str, '%*8.0f%*s%c%3d\n'); '    %// success
        'sprintf(str, '%*8.0f%*s%c'             %// FAIL
        'A = A.';'                              %// success (although I'm not sure why)
so I'm almost there, but not quite yet :)
Unfortunately, I have exhausted the amount of time I can spend thinking about this and need to move on to other things, so perhaps someone with more time on their hands is kind enough to think about these questions:
- Are comment characters inside strings the only exception I need to look out for?
- What is the correct and/or more efficient way to do this?
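For reference, one non-regex direction is a character-by-character scan that tracks whether the current position is inside a string. The following is only a minimal sketch, not a complete MATLAB lexer. The names `stripTrailingComment` and `isTransposeContext` are hypothetical, and the transpose rule is an assumption: an apostrophe is treated as a transpose operator only when it directly follows a letter, digit, underscore, closing bracket, dot, or another apostrophe. Double-quoted strings, `...` continuation lines, and `%{ ... %}` block comments are deliberately not handled.

```matlab
function out = stripTrailingComment(line)
% Hypothetical helper: drop a trailing % comment from one line of MATLAB code.
% Save as stripTrailingComment.m; LINE is a char row vector.
    out      = line;
    inString = false;
    n        = numel(line);
    k        = 1;
    while k <= n
        c = line(k);
        if inString
            if c == ''''
                if k < n && line(k+1) == ''''
                    k = k + 1;          % doubled apostrophe: escaped quote, still in string
                else
                    inString = false;   % single apostrophe: closing delimiter
                end
            end
        elseif c == '%'
            out = line(1:k-1);          % first % outside a string starts the comment
            return
        elseif c == ''''
            % Assumption: directly after an operand this is a transpose;
            % anywhere else it opens a string.
            if ~(k > 1 && isTransposeContext(line(k-1)))
                inString = true;
            end
        end
        k = k + 1;
    end
end

function tf = isTransposeContext(prev)
% True if PREV can end an operand, so a following apostrophe is a transpose.
    tf = isstrprop(prev, 'alphanum') || any(prev == '_)]}.''');
end
```

Applied to a cell array of lines, this could be called as `C = cellfun(@stripTrailingComment, str, 'UniformOutput', false);`. A line whose only `%` characters sit inside a string, such as a bare `sprintf` call with a format string, is returned unchanged because the scan never sees a `%` while outside a string.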