Using libclang to parse C++ in Python

Question

After some research and a few questions, I ended up exploring the libclang library to parse C++ source files in Python.

Given this C++ source:

    int fac(int n) {
        return (n>1) ? n*fac(n-1) : 1;
    }

    for (int i = 0; i < linecount; i++) {
        sum += array[i];
    }

    double mean = sum/linecount;

I am trying to identify the token fac as a function name and n, i, and mean as variable names, along with the position of each. Eventually I would like to tokenize them.

I have read some very useful articles (eli, Gaetan's), as well as questions 35113197 and 13236500.

However, since I am new to Python and struggling to understand the basics of libclang, I would really appreciate some sample code that implements the above so that I can study and understand it.

c++ python parsing libclang

nk-fford Apr 23 '16 at 8:40

1 answer

Andrew Walker · Accepted Answer · 2016-04-24T10:31:06+0000

It is not immediately obvious from the libclang API which approach to extracting tokens is appropriate. However, it is rare that you will ever need (or want) to drop down to this level; the cursor layer is usually much more useful.

However, if this is what you need, a minimal example might look something like this:

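    import clang.cindex

    s = '''
    int fac(int n) {
        return (n>1) ? n*fac(n-1) : 1;
    }
    '''

    # Parse the in-memory source under the name tmp.cpp; nothing is read from disk.
    idx = clang.cindex.Index.create()
    tu = idx.parse('tmp.cpp', args=['-std=c++11'],
                   unsaved_files=[('tmp.cpp', s)], options=0)

    # Print the kind of every token in the translation unit.
    for t in tu.get_tokens(extent=tu.cursor.extent):
        print(t.kind)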

Which (for my clang version) produces:

    TokenKind.KEYWORD
    TokenKind.IDENTIFIER
    TokenKind.PUNCTUATION
    TokenKind.KEYWORD
    TokenKind.IDENTIFIER
    TokenKind.PUNCTUATION
    TokenKind.PUNCTUATION
    TokenKind.KEYWORD
    TokenKind.PUNCTUATION
    TokenKind.IDENTIFIER
    TokenKind.PUNCTUATION
    TokenKind.LITERAL
    TokenKind.PUNCTUATION
    TokenKind.PUNCTUATION
    TokenKind.IDENTIFIER
    TokenKind.PUNCTUATION
    TokenKind.IDENTIFIER
    TokenKind.PUNCTUATION
    TokenKind.IDENTIFIER
    TokenKind.PUNCTUATION
    TokenKind.LITERAL
    TokenKind.PUNCTUATION
    TokenKind.PUNCTUATION
    TokenKind.LITERAL
    TokenKind.PUNCTUATION
    TokenKind.PUNCTUATION
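Token kinds alone cannot tell a function name from a variable name, since both are IDENTIFIER tokens. As a rough sketch of the cursor layer mentioned above, the following walks the cursor tree and reports function, parameter, and variable declarations with their positions. The names ROLES, Decl, iter_declarations, and parse_snippet are made up for this illustration and are not part of libclang. Cursor kinds are matched through their name attribute, on the assumption that the installed bindings expose it, and the clang.cindex import is deferred into parse_snippet so the walker itself works on any cursor-like object.

```python
from collections import namedtuple

# Cursor kinds of interest, matched by name (illustrative mapping, not a libclang API).
ROLES = {
    "FUNCTION_DECL": "function",
    "PARM_DECL": "parameter",
    "VAR_DECL": "variable",
}

Decl = namedtuple("Decl", "role name line column")


def iter_declarations(cursor):
    """Yield a Decl for each function, parameter, and variable below `cursor`."""
    for child in cursor.get_children():
        role = ROLES.get(child.kind.name)
        if role is not None:
            yield Decl(role, child.spelling,
                       child.location.line, child.location.column)
        # Recurse: parameters and local variables are nested inside the
        # function cursor, loop counters inside the loop statement.
        yield from iter_declarations(child)


def parse_snippet(source, filename="tmp.cpp"):
    """Parse in-memory C++ and return the translation unit (requires libclang)."""
    import clang.cindex  # deferred so iter_declarations needs no libclang
    index = clang.cindex.Index.create()
    return index.parse(filename, args=["-std=c++11"],
                       unsaved_files=[(filename, source)], options=0)
```

With libclang installed, looping over iter_declarations(parse_snippet(s).cursor) and printing each item should report fac as a function and n as a parameter, each with its line and column. Note that the loop and the mean declaration from the question would have to sit inside a function body before clang can parse them as valid C++.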
