Best practice for HTML escaping user-provided data using PHP (and ZF)

Note: I use Zend Framework, but I think most of this applies to PHP coding in general.

I'm trying to choose a strategy for writing view scripts, possibly using a template engine. Motives: clarity and security. I'm just not happy with writing .phtml scripts. The syntax is horribly verbose for the most commonly needed thing, outputting a variable:

<?php echo $this->escape($this->myVariable); ?> 

Besides the code being too long (IMHO), the template author has to remember (and bother) to write the escape call every time he/she wants to output a variable. Forgetting the call will almost certainly lead to an XSS vulnerability.
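To make the risk concrete, here is a minimal sketch of what a forgotten escape call looks like. It uses plain htmlspecialchars rather than Zend_View's escape(), and the $comment value is invented for illustration.

```php
<?php
// Hypothetical user input containing markup.
$comment = '<script>alert("xss")</script>';

// Unescaped: a browser would run the script.
$unsafe = '<p>' . $comment . '</p>';

// Escaped: the markup is shown as harmless text.
$safe = '<p>' . htmlspecialchars($comment, ENT_QUOTES, 'UTF-8') . '</p>';

echo $safe, "\n";
// prints <p>&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;</p>
```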

I have two possible solutions to this problem:

Solution 1: A template engine with automatic escaping

I think that at least Smarty has an option to automatically escape HTML entities when outputting variables. There are arguments against Smarty, but perhaps at least some of them are addressed in the upcoming 3.0. I have not tested it yet.

XML-based engines such as PHPTAL also escape any data by default. However, they may look strange to a beginner. Maybe still worth a try?
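As a rough idea of what "escape by default" means, here is a toy renderer. It is a hypothetical sketch, not the Smarty or PHPTAL API: the function name render_template and the {{ name }} / {{ name|raw }} placeholder syntax are made up for this example.

```php
<?php
/**
 * Toy renderer (hypothetical): {{ name }} is escaped by default,
 * {{ name|raw }} is the explicit opt-out.
 */
function render_template(string $template, array $vars): string
{
    return preg_replace_callback(
        '/\{\{\s*(\w+)(\|raw)?\s*\}\}/',
        function (array $m) use ($vars): string {
            $value = (string) ($vars[$m[1]] ?? '');
            // Escape unless the author explicitly asked for raw output.
            return isset($m[2]) && $m[2] !== ''
                ? $value
                : htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
        },
        $template
    );
}

echo render_template('<p>Hello, {{ name }}!</p>', ['name' => 'Tom & <Jerry>']), "\n";
// prints <p>Hello, Tom &amp; &lt;Jerry&gt;!</p>
```

The point of the design is that forgetting something produces safe output; unsafe output requires typing |raw on purpose.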

Solution 2: Escape the data in the Model

Of course, another option would be to escape the necessary data already in the Model (or even the controller?). The Model should already know the content type (mostly plain text or HTML text) of each field, so it would be logical to escape the data there. The view could then treat all data as HTML-safe. This would allow, for example, changing a field's data type from plain text to HTML without touching the view script, only by changing the Model.

But then again, this does not seem like good MVC practice. In addition, there are problems with this approach:

  • Sometimes the view only wants to print the first n characters, and we don't want to end up truncating foo & bar to foo &am (because it was first escaped to foo &amp; bar).
  • Perhaps the view wants to build a URL with varName=$varName in the query string. Again, escaping in the Model would be bad.

(These problems can be solved by providing two versions of the data or by unescaping it in the template. Both seem bad to me.)
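Both problems are easy to reproduce. The sketch below assumes plain PHP functions and an invented /search URL; it only shows why the order of operations matters.

```php
<?php
$raw = 'foo & bar';

// Escaping first and truncating afterwards can cut an entity in half.
$escapedFirst = substr(htmlspecialchars($raw, ENT_QUOTES, 'UTF-8'), 0, 7);

// Truncating the raw value and escaping afterwards keeps entities intact.
// (substr counts bytes; multibyte text would need mb_substr instead.)
$escapedLast = htmlspecialchars(substr($raw, 0, 7), ENT_QUOTES, 'UTF-8');

// For a URL: encode for the query string first, then escape the whole
// attribute value for HTML. The /search path is hypothetical.
$url  = '/search?varName=' . rawurlencode($raw);
$link = '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">search</a>';

echo $escapedFirst, "\n";  // prints foo &am
echo $escapedLast, "\n";   // prints foo &amp; b
echo $link, "\n";          // prints <a href="/search?varName=foo%20%26%20bar">search</a>
```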

Ideas? Am I missing something? What do you consider "best practice"?

PS. This question is about finding a general solution for any user-provided text data, which may contain < or > or any other characters. Thus, filtering data before storing it in the database is not a solution.

Update:

Thanks for all the comments. I have done some more research and will be evaluating Twig and possibly Open Power Template (OPT). Both seem interesting: Twig looks very simple, but the project is young. On the XML side, the OPT syntax looks a little nicer than PHPTAL. Both Twig and OPT are well documented.

+4
4 answers
  • Filter as early as possible. You must ensure that all text input is valid UTF-8 so that your string-processing functions work predictably.

    But do not try to filter out “dangerous” characters or fragments! This does not work. Correct or reject invalid input data. There is nothing wrong with the < or ' characters.

  • Escape as late as possible. Do SQL escaping in the function that builds the SQL query (or better, use prepared statements). HTML-escape in your HTML templates. Quoted-Printable-escape in your email generation functions, shell-escape when running CLI commands, and so on.

    Do not let escaped data travel through your application: the longer escaped data lives, the greater the chance that you will mix it with unescaped data or break the escaping during processing.
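A minimal sketch of the two rules together, using only core PHP functions. The helper name is_valid_utf8 and the sample input are assumptions made for this example.

```php
<?php
// Filter early: reject input that is not valid UTF-8, but leave
// characters such as < or ' alone.
function is_valid_utf8(string $s): bool
{
    // With the /u modifier, preg_match fails on malformed UTF-8.
    return preg_match('//u', $s) === 1;
}

$name = "O'Reilly <admin>";   // hypothetical user input
if (!is_valid_utf8($name)) {
    throw new InvalidArgumentException('Input is not valid UTF-8');
}

// Escape late: each output context gets its own escaping at the point of use.
$html  = '<td>' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '</td>';
$shell = 'echo ' . escapeshellarg($name);
$query = 'name=' . rawurlencode($name);
// For SQL, bind $name as a prepared-statement parameter instead of escaping it.

echo $html, "\n";   // prints <td>O&#039;Reilly &lt;admin&gt;</td>
echo $query, "\n";  // prints name=O%27Reilly%20%3Cadmin%3E
```

Note that $name itself stays raw the whole time; only the strings built for a specific context contain escaped text.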

+10

This is not a general solution, but one very useful thing in this situation is Hungarian-style notation. Hungarian notation used everywhere is just annoying to me, but this is a case where metadata in the variable name is very valuable. It's good practice to give your variables a prefix that says what to expect from them, i.e. $rawUserInput, $escapedUserInput, etc.

This does not completely solve the problem, but it is good coding practice. Then when you see a piece of code that says

 'SELECT * from table where username = ' . $rawUserName 

it's immediately obvious that there is an injection vulnerability, because you know that the raw prefix means the value has not been escaped.
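The same convention applied to HTML output might look like the sketch below. The variable names follow the prefixes suggested above, and the input value is invented.

```php
<?php
// Hypothetical request value; the prefix records that nothing has been done to it.
$rawUserName = 'Tom & <Jerry>';

// The prefix changes only at the point where the value is actually escaped.
$escapedUserName = htmlspecialchars($rawUserName, ENT_QUOTES, 'UTF-8');

// Reads as correct: an escaped value going into HTML.
echo '<p>Welcome, ' . $escapedUserName . '</p>', "\n";

// Would read as wrong at a glance: a raw value going into HTML.
// echo '<p>Welcome, ' . $rawUserName . '</p>';
```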

+2

But then again, this does not seem like good MVC practice.

I completely agree that the Model is the wrong place for presentation concerns like this, and keeping both an HTML and a raw version of each variable would make them hard to keep in sync. Forget Solution 2.

That leaves you with either alternative template engines, or sticking with PHP and learning to live with the burden of calling htmlspecialchars all the time. I am open to the idea of alternative templating, but I have not really been satisfied with the ones I have tried so far.

(Many of them discard PHP syntax and implement their own language with limited expressiveness. That means you lose the advantage of a language you already know and get stuck with a toy language that makes more complex presentation logic impossible, so you end up doing that logic yourself in PHP with strings full of HTML, which is no win at all.)

So, for the moment I am proposing Solution 0a to add to the pile: define a global function with a short name to take the pain out of HTML escaping:

 <?php function h($s) { echo htmlspecialchars($s, ENT_QUOTES); } ?>
 ...
 My lovely variable is <?php h($this->myVariable); ?>.

I have no idea why PHP does not define a shortcut for what is, as you say, the most common use case. Now that they have dropped short tags in favour of XML-PI-style tags, why isn't there one with a different name that does the right thing, say <?phph ?

+2

There are a dozen ways to do this. Here are a few:

  • You can write your own View class, as described in the Zend Framework manual, and escape any variables when they are assigned to or requested from the view.
  • In the case of data sets, you can wrap them in a custom ArrayIterator that escapes the output when items are fetched from it, along with anything else you want to automate on output.
  • Or you can use the View Script approach.
  • Or, if you do not want your template authors to write any PHP or template syntax, you could ask them to write structured HTML and then insert the values through the DOM extension (DOMDocument).
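The first two options could be sketched roughly as follows. The classes EscapingView and EscapingIterator are hypothetical and written from scratch for this example; they do not reproduce Zend_View's actual API.

```php
<?php
/** Hypothetical view: escapes scalar values as they are read. */
class EscapingView
{
    private $vars = [];

    public function assign(string $name, $value): void
    {
        $this->vars[$name] = $value;   // stored raw
    }

    public function __get(string $name)
    {
        return htmlspecialchars((string) ($this->vars[$name] ?? ''), ENT_QUOTES, 'UTF-8');
    }

    /** Explicit opt-out for values that are already trusted HTML. */
    public function raw(string $name)
    {
        return $this->vars[$name] ?? null;
    }
}

/** Hypothetical iterator: escapes each item as it is pulled out. */
class EscapingIterator extends ArrayIterator
{
    #[\ReturnTypeWillChange]
    public function current()
    {
        return htmlspecialchars((string) parent::current(), ENT_QUOTES, 'UTF-8');
    }
}

$view = new EscapingView();
$view->assign('title', 'Tips & <Tricks>');
echo '<h1>', $view->title, "</h1>\n";   // prints <h1>Tips &amp; &lt;Tricks&gt;</h1>

foreach (new EscapingIterator(['a < b', 'c > d']) as $item) {
    echo '<li>', $item, "</li>\n";      // prints <li>a &lt; b</li> then <li>c &gt; d</li>
}
```

Both classes keep the raw value internally and escape only on the way out, so the template author gets escaped output without writing the call.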

As for PHP in the template being verbose, well ... it may not offer the shortest notation, but then again, it does provide a notation, and there is no overhead for it. Even for template authors who do not know PHP, it should be easier to recognize a few PHP method calls than an (often weird) template language that basically reimplements a subset of what PHP can do out of the box.

You can also use the alternative PHP syntax and Nowdoc or Heredoc in your templates to get rid of the <?php and echo calls, so that you get something like

 <?php
 // get some partial block done first
 // (note: values are interpolated as-is here, without escaping)
 $loopdata = '';
 foreach ($this->books as $book):
     $loopdata .= <<<LOOPDATA
 <li>{$book->title} - {$book->author} - {$book->publisher}</li>
 LOOPDATA;
 endforeach;
 // render entire template
 echo <<<HTML
 <h1>{$this->title}</h1>
 <ul>{$loopdata}</ul>
 HTML;

Personally, I don't find this too attractive, but as you can see, there are many ways to write your templates with PHP. Just pick one.

0