Using libclang to parse C++ in Python

After some research and a few questions, I ended up exploring the libclang library to parse C++ source files in Python.

Given the following C++ source:

    int fac(int n) { return (n>1) ? n*fac(n-1) : 1; }

    for (int i = 0; i < linecount; i++) {
        sum += array[i];
    }
    double mean = sum/linecount;

I am trying to identify fac as a function name, n, i and mean as variable names, and the position of each one. Eventually I would like to tokenize them.

I have read some very useful articles (Eli's, Gaetan's), as well as some related questions (35113197, 13236500).

However, since I am new to Python and still struggling with the basics of libclang, I would really appreciate some sample code that does the above, so that I can study it and understand how it works.

Answer

It is not immediately obvious from the libclang API which way of extracting tokens is the right one. That said, you will rarely need (or want) to go down to that level; the cursor layer is usually much more useful.

If tokens really are what you need, though, a minimal example might look something like this:

    import clang.cindex

    s = '''
    int fac(int n) { return (n>1) ? n*fac(n-1) : 1; }
    '''

    idx = clang.cindex.Index.create()
    # Parse the in-memory string as if it were the file tmp.cpp.
    tu = idx.parse('tmp.cpp', args=['-std=c++11'],
                   unsaved_files=[('tmp.cpp', s)], options=0)

    # Walk every token in the translation unit and print its kind.
    for t in tu.get_tokens(extent=tu.cursor.extent):
        print(t.kind)

Which (with my version of clang) produces:

    TokenKind.KEYWORD
    TokenKind.IDENTIFIER
    TokenKind.PUNCTUATION
    TokenKind.KEYWORD
    TokenKind.IDENTIFIER
    TokenKind.PUNCTUATION
    TokenKind.PUNCTUATION
    TokenKind.KEYWORD
    TokenKind.PUNCTUATION
    TokenKind.IDENTIFIER
    TokenKind.PUNCTUATION
    TokenKind.LITERAL
    TokenKind.PUNCTUATION
    TokenKind.PUNCTUATION
    TokenKind.IDENTIFIER
    TokenKind.PUNCTUATION
    TokenKind.IDENTIFIER
    TokenKind.PUNCTUATION
    TokenKind.IDENTIFIER
    TokenKind.PUNCTUATION
    TokenKind.LITERAL
    TokenKind.PUNCTUATION
    TokenKind.PUNCTUATION
    TokenKind.LITERAL
    TokenKind.PUNCTUATION
    TokenKind.PUNCTUATION
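Since the cursor layer is usually the more useful one, here is a rough sketch of how the original goal (fac as a function, n, i and mean as variables, each with a position) could be approached with cursors instead of tokens: walk the AST depth-first and report every function and variable declaration with its line and column. It assumes the same `clang.cindex` bindings as above; the names `DECL_KINDS`, `walk_declarations` and `parse_source` are hypothetical helpers made up for this example.

```python
# Hypothetical table mapping libclang cursor-kind names to a role.
# Comparing on the kind's .name string keeps walk_declarations()
# usable without importing clang.cindex at module level.
DECL_KINDS = {
    "FUNCTION_DECL": "function",
    "PARM_DECL": "variable",
    "VAR_DECL": "variable",
}


def walk_declarations(cursor, filename):
    """Yield (role, name, line, column) for every function or variable
    declared in `filename`, visiting the cursor tree in preorder."""
    for node in cursor.walk_preorder():
        role = DECL_KINDS.get(node.kind.name)
        if role is None:
            continue  # not a declaration kind we care about
        loc = node.location
        # Skip cursors that come from included headers or have no file.
        if loc.file is None or loc.file.name != filename:
            continue
        yield role, node.spelling, loc.line, loc.column


def parse_source(code, filename="tmp.cpp"):
    """Parse an in-memory C++ string with libclang and return the
    root cursor of the resulting translation unit."""
    # Imported here so walk_declarations() can be reused without libclang.
    import clang.cindex

    index = clang.cindex.Index.create()
    tu = index.parse(filename, args=["-std=c++11"],
                     unsaved_files=[(filename, code)], options=0)
    return tu.cursor
```

With libclang installed, `for row in walk_declarations(parse_source(src), "tmp.cpp"): print(row)` should report fac as a function and n, i and mean as variables, each with its line and column, provided `src` is a complete translation unit (the loop from the question has to sit inside a function, and `sum`, `array` and `linecount` need declarations). To find where a name is used rather than declared, you could additionally look at `DECL_REF_EXPR` cursors; their `referenced` attribute points back at the declaring cursor.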
