How to get the first #include statement in C++ files using Python regex?

I want to get the first #include statement from a .cpp file using Python regex as quickly as possible.

For instance,

    /*
    Copyright:
    This file is protected
    #include <bad.h>
    */

    // Include files:
    #undef A_MACRO
    #include <stddef.h> // defines NULL
    #include "logger.h"

    // Global static pointer used to ensure a single instance of the class.
    Logger* Logger::m_pInstance = NULL;

should return #include <stddef.h>

I know that one way is to delete all comments and then take the first line of the remaining text. But this does not seem very fast, as it has to go through the whole file. If I need only the first #include statement, is there an efficient way to do it with a Python regex?

[Update 1] Several people mentioned that this is not a good use case for regular expressions. I understand that this is not a typical use of a regular expression. But is there a better way to skip the leading comments than a regular expression? Any suggestions would be appreciated.

[Update 2] Thanks for the answers, but none of them quite satisfies my requirements, which are simple: (1) do not go through the whole file just to get the first line; (2) leading comments must be handled correctly.

+4
3 answers

You can use the library called CppHeaderParser as follows:

    import CppHeaderParser

    # Parse the file and list every include the parser found.
    cppHeader = CppHeaderParser.CppHeader("test.cpp")
    print("List of includes:")
    for incl in cppHeader.includes:
        print(" %s" % incl)

For this to work, first install the package:

    pip install cppheaderparser

It outputs:

    List of includes:
     <stddef.h> // defines NULL
     "logger.h"

Admittedly this is not the ideal result, but it is a start.
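
To reduce that output to just the first include, the entries could be post-processed. This is a minimal sketch, assuming each entry in `cppHeader.includes` is a string shaped like the ones printed above; `clean_include` is a hypothetical helper, not part of CppHeaderParser, and the sample list stands in for the parser's result.

```python
def clean_include(entry):
    """Strip a trailing // comment and surrounding whitespace from one entry."""
    return entry.split("//", 1)[0].strip()


# Sample entries copied from the output above (assumed format).
includes = ['<stddef.h> // defines NULL', '"logger.h"']

# Take the first entry only, if there is one.
first = clean_include(includes[0]) if includes else None
print("#include %s" % first)  # prints: #include <stddef.h>
```

Note that the library still has to parse the file before the list is available, so this does not address the early-exit requirement from the question.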

+4

How about using the C preprocessor itself?

If you run gcc -E foo.cpp (where foo.cpp is your sample input file), you will get:

    # 1 "foo.cpp"
    # 1 "<built-in>" 1
    # 1 "<built-in>" 3
    # 326 "<built-in>" 3
    # 1 "<command line>" 1
    # 1 "<built-in>" 2
    # 1 "foo.cpp" 2
    # 1 "/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/include/stddef.h" 1 3 4

The lines up to # 1 "foo.cpp" 2 are boilerplate and can be ignored. (Check what your own C preprocessor generates.)

When you get to # 1 some-other-file ... you know that you have hit an #include.

You will get the full path name (not the way it appears in the #include statement), but you can also work out where the #include appeared by looking back at the last line marker.

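In this case, the last line marker is # 1 "foo.cpp" 2, and it appears 9 lines back in the preprocessor output (the blank lines emitted in place of the comments and the #undef count too), so the #include for stddef.h is on line 9 of foo.cpp.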

So now you can go back to the original file and take line 9.
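
The line-marker bookkeeping described above can be sketched in Python. This is only an illustration under stated assumptions: `first_include_from_cpp_output` is a hypothetical helper, the marker layout `# <line> "<file>" <flags>` is the GCC/Clang one shown above, and the sample text uses a shortened, made-up header path plus eight blank lines standing in for the comment and #undef lines.

```python
import re

# GCC/Clang line marker: # <line> "<file>" <flags...>
LINE_MARKER = re.compile(r'^#\s+(\d+)\s+"([^"]*)"((?:\s+\d+)*)\s*$')


def first_include_from_cpp_output(preprocessed, source_name):
    """Return (line_in_source, included_path) for the first #include, or None."""
    current_file = None
    current_line = 0
    for out_line in preprocessed.splitlines():
        m = LINE_MARKER.match(out_line)
        if m is None:
            current_line += 1  # ordinary output line: advance in current file
            continue
        line_no, path, flags = int(m.group(1)), m.group(2), m.group(3).split()
        entering = "1" in flags  # flag 1 marks the start of a new file
        # Ignore pseudo-files such as <built-in> and <command line>.
        if entering and current_file == source_name and not path.startswith("<"):
            return current_line, path
        current_file, current_line = path, line_no
    return None


# Assumed sample output; the header path is shortened for illustration.
sample = "\n".join(
    ['# 1 "foo.cpp"', '# 1 "<built-in>" 1', '# 1 "<built-in>" 3',
     '# 326 "<built-in>" 3', '# 1 "<command line>" 1', '# 1 "<built-in>" 2',
     '# 1 "foo.cpp" 2']
    + [""] * 8
    + ['# 1 "/usr/include/stddef.h" 1 3 4']
)
print(first_include_from_cpp_output(sample, "foo.cpp"))
# prints: (9, '/usr/include/stddef.h')
```

In real use the text would come from the preprocessor itself, for example via `subprocess.run(["gcc", "-E", "foo.cpp"], capture_output=True, text=True).stdout`; marker details may differ between compilers, so check the output of yours first.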

+1

Does it have to be a regex? The code below stops at the first #include line, handles comments that span several lines, and does not break on a case like // /*This is a comment.

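    incomment = False
    with open(r'myheader.h') as f:
        for line in f:
            if not incomment:
                # Drop any trailing // comment before testing the line.
                line = line.split('//')[0]
                if line.lstrip().startswith('#include'):
                    print(line.strip())
                    break
            # Track /* ... */ block comments across lines.
            if '/*' in line:
                incomment = True
            if '*/' in line:
                incomment = False
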
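
If a regex is still preferred, one possible sketch is a single alternation that matches comments first, so an #include inside a comment is consumed before the directive branch can see it. The names `TOKEN` and `first_include` are made up for this example, the sample input is modelled on the question, and string literals containing `//` or `/*` before the first include are assumed not to occur.

```python
import re

# Comments are tried first, so an #include inside them is never reached
# by the last branch, which captures a real directive at the start of a line.
TOKEN = re.compile(
    r'//[^\n]*'        # line comment
    r'|/\*.*?\*/'      # block comment (non-greedy, may span lines)
    r'|^[ \t]*(#[ \t]*include[ \t]*(?:<[^>\n]+>|"[^"\n]+"))',
    re.DOTALL | re.MULTILINE,
)


def first_include(text):
    """Return the first #include directive outside comments, or None."""
    for m in TOKEN.finditer(text):  # lazy: scanning stops once we return
        if m.group(1):
            return m.group(1)
    return None


sample = (
    "/* Copyright: This file is protected\n"
    "#include <bad.h>\n"
    "*/\n"
    "// Include files:\n"
    "#undef A_MACRO\n"
    "#include <stddef.h> // defines NULL\n"
    '#include "logger.h"\n'
)
print(first_include(sample))  # prints: #include <stddef.h>
```

`finditer` is lazy, so nothing after the first real #include is scanned, although the text itself still has to be in memory. Reading only a leading chunk of the file would avoid that, at the cost of assuming the first include appears within that chunk.
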
0