Fast PHP parsing in C#

I have a requirement to parse PHP files in C#. Developers in another country will upload PHP files, and after each upload we need to check the files and get a list of all classes, methods, functions, etc.

I was thinking about using regular expressions, but then I can't work out whether a function belongs to a class, etc., so I was wondering if there is already something "out there" that will parse PHP files and spit out their functions (I'm trying to avoid writing a full AST implementation).

Does anyone have any ideas? I looked at Coco/R but could not find a PHP grammar file. I am using .NET 2.0 and C#.

+7
c# php parsing
2 answers

Why do this in C#? In PHP, this is trivial. Use the token_get_all() function: it breaks a PHP file into a stream of tokens, from which you can build the list of classes and methods by writing a small state machine.
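A minimal sketch of such a state machine, for illustration only. The function name list_php_symbols and the shape of the returned array are hypothetical; token_get_all() and the T_* constants are standard PHP. It assumes PHP 7.1 or later and only tracks brace depth, so anonymous classes, functions declared inside methods, and methods named after reserved words are not handled correctly.

```php
<?php
// Hypothetical helper: returns ['classes' => [name => [methods]], 'functions' => [names]].
function list_php_symbols(string $source): array
{
    $result = ['classes' => [], 'functions' => []];
    $depth = 0;        // current brace nesting depth
    $class = null;     // class whose body we are inside, if any
    $classDepth = 0;   // depth we return to when that class body closes
    $expect = null;    // 'class' or 'function' while waiting for a name
    $prev = null;      // previous significant token (id or single character)

    foreach (token_get_all($source) as $token) {
        if (is_string($token)) {
            // Single-character tokens such as { } ( ) ;
            if ($token === '{') {
                $depth++;
                $expect = null;          // anonymous class: no name followed
            } elseif ($token === '}') {
                $depth--;
                if ($class !== null && $depth === $classDepth) {
                    $class = null;       // left the class body
                }
            } elseif ($token === '(') {
                $expect = null;          // closure: no name followed
            }
            $prev = $token;
            continue;
        }

        [$id, $text] = $token;
        if ($id === T_WHITESPACE || $id === T_COMMENT || $id === T_DOC_COMMENT) {
            continue;
        }

        if ($id === T_CURLY_OPEN || $id === T_DOLLAR_OPEN_CURLY_BRACES) {
            $depth++;                    // "{$x}" or "${x}" inside a string; closed by a plain }
        } elseif (in_array($id, [T_CLASS, T_INTERFACE, T_TRAIT], true) && $prev !== T_DOUBLE_COLON) {
            $expect = 'class';           // skip Foo::class constants
        } elseif ($id === T_FUNCTION && $prev !== T_USE) {
            $expect = 'function';        // skip "use function ..." imports
        } elseif ($id === T_STRING && $expect !== null) {
            if ($expect === 'class') {
                $class = $text;
                $classDepth = $depth;
                $result['classes'][$text] = [];
            } elseif ($class !== null) {
                $result['classes'][$class][] = $text;
            } else {
                $result['functions'][] = $text;
            }
            $expect = null;
        }
        $prev = $id;
    }

    return $result;
}
```

Calling list_php_symbols(file_get_contents($path)) on an uploaded file and printing the result with json_encode() would give output that a C# program could read.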

Whatever you do, do not try to do it with regular expressions. It will be incredibly tedious and error-prone.

Edit: There are four main options for this:

  • Do it in PHP. This will be the fastest (for development) and easiest option;
  • Run a PHP script from the command line to do this, or to emit a stream of tokens that the C# program can interpret. This is the next easiest way;
  • Use Phalanger, a PHP port to the .NET Framework. This may be more manageable since it is still .NET code; or
  • Use Quercus, a PHP port to the Java virtual machine.

Anything else will involve either writing a PHP parser (a lot of work) or using really flaky regular expressions, which will be an unreliable maintenance nightmare.

As for worries about the alleged "security flaws" of PHP, there are several points:

  • Any framework or technology stack can have security flaws. The fact that your administrator only allows .NET, as opposed to, say, Java, simply indicates an irrational prejudice. I say this as a longtime Java developer: Java, .NET and PHP can all have security flaws;
  • You can run PHP from the command line so that it does not serve HTTP requests, which reduces the risk from such flaws to essentially zero;
  • If you are worried about internal security risks (from someone who has access to the box), just restrict the PHP CLI executable so that it can be executed only by a group that contains only your program's account.
+18

You may be able to use ctags for your purpose. I'm not sure how you would integrate it with C#, since ctags is written in C.

Alternatively, if you know your parsers, you can take a look at the grammar files in the PHP source. In particular, zend_ini_parser.y and zend_language_parser.y.

Finally, although this is not the best solution, you could probably get away with a bunch of home-grown regular expressions. PHP grammar is pretty strict about classes and functions. You just need to keep track of a bit of state, so you know which class a function belongs to.
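A rough sketch of that regex-plus-state idea, to show what "a bit of state" could look like. The function name list_symbols_with_regex and the patterns are illustrative assumptions, not a tested grammar. It works line by line and counts braces naively, so unbalanced braces inside strings or comments, or several declarations on one line, will confuse it.

```php
<?php
// Hypothetical helper: line-based scan that remembers which class body it is in.
function list_symbols_with_regex(string $source): array
{
    $result = ['classes' => [], 'functions' => []];
    $depth = 0;        // naive brace depth, counted per line
    $class = null;     // class we believe we are inside, if any
    $classDepth = 0;   // depth at which that class was declared

    foreach (preg_split('/\R/', $source) as $line) {
        if (preg_match('/^\s*(?:abstract\s+|final\s+)*class\s+(\w+)/', $line, $m)) {
            // Class declaration: remember its name and the depth it started at.
            $class = $m[1];
            $classDepth = $depth;
            $result['classes'][$class] = [];
        } elseif (preg_match('/\bfunction\s+&?\s*(\w+)\s*\(/', $line, $m)) {
            // Named function: a method if we are inside a class, else a free function.
            if ($class !== null) {
                $result['classes'][$class][] = $m[1];
            } else {
                $result['functions'][] = $m[1];
            }
        }

        // Update the depth, then check whether this line closed the class body.
        $depth += substr_count($line, '{') - substr_count($line, '}');
        if ($class !== null && $depth <= $classDepth && strpos($line, '}') !== false) {
            $class = null;
        }
    }

    return $result;
}
```

The state here is just the current class name and a brace counter; a tokenizer-based approach avoids the string and comment pitfalls that this sketch ignores.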

+1