Finding function names and counting their lines of code

Just so you know up front, this is a project that I was assigned. I am not looking for an answer in code, but rather for direction.

I was told that I have to go through a file and count the actual lines of code, while recording the names of the functions and the number of lines of code in each function. The problem I am facing is how to read from a file and determine whether a string is the beginning of a function.

So far, the only approach I can think of is to keep a string array of data types (int, double, char, etc.), search each line for one of them, then look for the parentheses, and then check for the absence of a semicolon (so that I know this is not just a function declaration).

So my question is, how should I do this, or are there other methods that you would recommend?

The code I will be counting is in C++.

+4
6 answers

Three approaches come to mind.

  • Use regular expressions. This is fairly similar to what you have in mind: look for text that looks like a function definition. It is quick to do, but it can go wrong in several ways.

    char *s = "int main() {";

    is not a function definition, but certainly looks like one.

     char * /* eh? */ s ( int /* comment? // */ a ) // hello, world /* of confusion
     {

    is a function definition, but does not look like one.

    Good: quick to write, and it can work even in the presence of syntax errors. Bad: it can easily misfire on things that look (or fail to look) like the "normal" case.

    Variant: first run the code through a formatter such as GNU indent. This will take care of some (but not all) of the misfires.

  • Use a proper lexer and parser. This is a much more thorough approach, but you may be able to reuse an open source lexer/parser (e.g. from gcc).

    Good: it will be 100% accurate (it will never misfire). Bad: a single missing semicolon in the input and it spews errors.

  • See if your compiler has some debug output that might help. This is a variation on option (2), but using your compiler's lexer/parser instead of your own.

+7

Your idea can work in 99% (or more) of cases. Only a real C++ compiler can get to 100%, in which case I would compile with debug information ( g++ -g -S prog.cpp ) and get the function names and line numbers from the debug information in the assembly output ( prog.s ).

My thoughts for a 99% solution:

  • Ignore comments and strings.
  • Document that you are ignoring preprocessor directives (#include, #define, #if).
  • Anything between a top-level { and } is a function body, except after typedef, class, struct, union, namespace and enum.
  • If you have a class , struct or union , you should look for method bodies inside it.
  • The function name is sometimes tricky to find, e.g. in long (*f(int))(char); , where f is a function that returns a function pointer.
  • Make sure your parser works with template functions and template classes.
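
The brace-tracking idea in the list above could be sketched in C++ roughly as follows. The names Block, top_level_blocks and looks_like_function are hypothetical, and the sketch assumes comments and string/character literals have already been removed from the input. It only checks the first word of the text in front of each top-level brace, so template classes, methods inside classes, and brace initializers with parentheses are not handled correctly.

```cpp
#include <sstream>
#include <string>
#include <vector>

struct Block {
    std::string header;  // text between the previous ';' or '}' and the '{'
    int first_line;      // line of the opening brace (1-based)
    int last_line;       // line of the matching closing brace
};

// Assumption: `src` no longer contains comments or string/char literals,
// so every brace seen here belongs to real code.
std::vector<Block> top_level_blocks(const std::string& src) {
    std::vector<Block> blocks;
    std::string header;
    int depth = 0, line = 1, open_line = 0;
    for (char c : src) {
        if (c == '\n') ++line;
        if (c == '{') {
            if (depth == 0) open_line = line;
            ++depth;
        } else if (c == '}') {
            if (depth > 0 && --depth == 0) {
                blocks.push_back({header, open_line, line});
                header.clear();
            }
        } else if (depth == 0) {
            if (c == ';') header.clear();  // a declaration ended, start over
            else header += c;
        }
    }
    return blocks;
}

// Heuristic: a function header has a parameter list and does not start
// with one of the keywords that introduce a type or namespace.
bool looks_like_function(const std::string& header) {
    static const char* const keywords[] = {
        "typedef", "class", "struct", "union", "namespace", "enum"};
    std::istringstream words(header);
    std::string first;
    if (!(words >> first)) return false;
    for (const char* k : keywords)
        if (first == k) return false;
    return header.find('(') != std::string::npos;
}
```

For each block that passes looks_like_function, last_line - first_line + 1 gives the number of lines spanned by the body, braces included. Blank lines inside the body would still need to be subtracted for a true LOC figure.
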
+4

I use PCRE with the following regex to capture function names:

 "(?<=[\\s:~])(\\w+)\\s*\\([\\w\\s,<>\\[\\].=&':/*]*?\\)\\s*(const)?\\s*{" 

and then filter out names such as "if", "while", "do", "for", "switch". Note that the function name is (\w+), group 1.
Of course, this is not a perfect solution, but it is a good one.
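
For readers without PCRE, here is a hedged sketch of the same idea using std::regex from the C++ standard library. It is an adaptation, not the pattern above verbatim: the ECMAScript grammar used by std::regex has no lookbehind, so the leading (?<=[\s:~]) is replaced by a word boundary \b, and the final brace is escaped. The function name function_names is made up for this example, and the input is assumed to be free of comments and string literals.

```cpp
#include <regex>
#include <set>
#include <string>
#include <vector>

// Return candidate function names found in `src`, in order of appearance.
// Assumption: comments and string literals were removed beforehand.
std::vector<std::string> function_names(const std::string& src) {
    // Adapted pattern: \b stands in for the PCRE lookbehind.
    static const std::regex pattern(
        R"(\b(\w+)\s*\([\w\s,<>\[\].=&':/*]*?\)\s*(const)?\s*\{)");
    // Control-flow keywords that also match "name ( ... ) {".
    static const std::set<std::string> keywords = {
        "if", "while", "do", "for", "switch"};

    std::vector<std::string> names;
    for (std::sregex_iterator it(src.begin(), src.end(), pattern), end;
         it != end; ++it) {
        std::string name = (*it)[1].str();  // group 1 is the function name
        if (keywords.count(name) == 0) names.push_back(name);
    }
    return names;
}
```

As with the original pattern, this misses definitions whose parameter list contains characters outside the bracket expression, and the \b substitution makes it slightly more permissive than the lookbehind version.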

+3

Find a decent SLOC counting program, such as SLOCCounter. Not only can you count SLOC with it, but you also have something to compare your own results against. (Update: here is a long list of them.)

Interestingly, the number of non-comment semicolons in a C/C++ program is a decent SLOC count.
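
A minimal C++ sketch of the semicolon count, under the assumption that semicolons inside comments and inside string or character literals should not be counted. The function name count_code_semicolons is hypothetical, and raw string literals and digit separators are not handled.

```cpp
#include <string>

// Count ';' characters that are not inside comments or string/char literals.
// Assumption: no raw string literals or digit separators in `src`.
int count_code_semicolons(const std::string& src) {
    int count = 0;
    // 0 = code, '/' = line comment, '*' = block comment,
    // '"' or '\'' = inside that kind of literal.
    char mode = 0;
    for (std::size_t i = 0; i < src.size(); ++i) {
        char c = src[i];
        char next = (i + 1 < src.size()) ? src[i + 1] : '\0';
        if (mode == 0) {
            if (c == '/' && (next == '/' || next == '*')) { mode = next; ++i; }
            else if (c == '"' || c == '\'') mode = c;
            else if (c == ';') ++count;
        } else if (mode == '/') {
            if (c == '\n') mode = 0;
        } else if (mode == '*') {
            if (c == '*' && next == '/') { mode = 0; ++i; }
        } else {
            if (c == '\\') ++i;            // skip the escaped character
            else if (c == mode) mode = 0;  // closing quote
        }
    }
    return count;
}
```

This is only a rough proxy: a for (;;) header contributes two semicolons, while a function definition or an if block contributes none of its own.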

+2

I feel that doing the parsing by hand will be quite a difficult task. I would probably use an existing tool such as RSM, redirect its output to a CSV file (assuming you are on Windows), and then parse the CSV file to gather the required information.

+2

How about writing a shell script to do this? Perhaps an AWK program.

+2
