2010
07.15

Everyone who has worked with PHP should be familiar with the bare fundamentals of its syntax: an opening PHP tag followed by code and (optionally) followed by a closing PHP tag. Incredibly though, there isn’t just one set of tags that can be used to invoke the PHP interpreter on a block of code: there are a total of four separate sets! Each set of tags has a slightly different set of behaviors and awareness of those differences is crucial in preventing certain types of security vulnerabilities.

My goal here is to lay out the four different sets of tags, describe what makes them special, and finally to explain why knowing about them is so important.

“Standard” PHP

This is the syntax that most people are familiar with; accordingly, it’s what you see most often within PHP code. The opening tag is defined to be <?php and the closing tag is ?>. This set of tags is always usable within PHP.

<?php

// code goes here

?>

PHP Short Tags

This is another fairly well-known syntax. The opening tag is shortened to just <?, matching up nicely with the closing tag. There’s even an abbreviated syntax for echoing using the opening tag <?=. Unfortunately, these types of tags conflict with XML documents due to the documents’ use of <?xml.

PHP can be configured to accept or ignore these tags using the short_open_tag directive in php.ini. As a result, they’re considered non-portable; if you’re writing code that’s intended to be used by others or that may be used in environments where you can’t control PHP settings, you’re encouraged to forgo short tags in favor of standard tags.

<?

// code goes here

?>
<?= $var ?>

“ASP-style” Short Tags

This is a less well-known syntax. It behaves in a similar fashion to regular short tags; the only difference is that the opening and closing tags are <% and %>, respectively. This change avoids conflicting with XML documents.

PHP can be configured to accept or ignore these tags using the asp_tags directive in php.ini. As a result, they’re also considered non-portable. They are used much less frequently than regular PHP short tags.

<%

// code goes here

%>

<%= $var %>
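
Since both short_open_tag and asp_tags are php.ini directives, it can be handy to check how a given environment is configured before relying on either syntax. The following is only a minimal sketch: the helper name describeTagDirective is made up for illustration, and it assumes that ini_get() returns false when a directive is not known to the running PHP build.

```php
<?php

/**
 * Hypothetical helper (not part of PHP): report the state of a
 * boolean php.ini directive as a human-readable string.
 */
function describeTagDirective($name)
{
    $value = ini_get($name);

    // ini_get() returns false when the directive does not exist at all.
    if ($value === false) {
        return 'not available in this PHP build';
    }

    // Boolean directives may come back as "1", "", "0", "On", "Off", etc.
    return filter_var($value, FILTER_VALIDATE_BOOLEAN) ? 'enabled' : 'disabled';
}

foreach (array('short_open_tag', 'asp_tags') as $directive) {
    echo $directive, ': ', describeTagDirective($directive), "\n";
}
```

The output depends entirely on the local configuration, which is exactly why code that relies on either set of short tags is considered non-portable.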

PHP <script> Tags

This is possibly the least well known of the four tags: I’ve never used it personally and I’ve only seen it referenced in very old books on PHP. It behaves just like the normal PHP tags though. All PHP installations will parse PHP code written in this format; there is no way to disable that behavior.

<script language="php">

// code goes here

</script>

Why Different Tags Matter

This example is based on an actual security vulnerability I encountered in a live application.

Let’s say you write an application that wants to read in user-supplied data via include/require (generally a bad idea, but there are applications out there that do this). You, as a smart PHP programmer, realize that you have a potential security vulnerability on your hands: all a user needs to do is write an opening PHP tag and they can execute arbitrary code! So, you decide to filter their input: you reject their input if it contains <% or <? anywhere in it. Maybe you even turn asp_tags off in php.ini, since you never plan to use them anyway. That takes care of <%, <?, <?php, and any special echoing syntax that they might provide. You’re safe and secure now, right?

Oh, wait. There’s a fourth set of tags you forgot about.

<script language="php"> is not very well-known and it doesn’t look like the other sets of tags. So, it’s both harder to remember and harder to defend against than any of the other tags. A blacklist like the one above would still allow the <script> tags through, allowing for arbitrary PHP code execution.
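
To make the gap concrete, here is a rough sketch of the kind of blacklist described above, run against one payload per tag style. The function name containsBlacklistedTag and the sample payloads are invented for illustration; this is not the code from the application in question, only an approximation of the check as described.

```php
<?php

/**
 * Hypothetical blacklist filter: returns true when the input contains
 * "<%" or "<?" anywhere, mirroring the rule described in the text.
 */
function containsBlacklistedTag($input)
{
    foreach (array('<%', '<?') as $needle) {
        if (strpos($input, $needle) !== false) {
            return true;
        }
    }

    return false;
}

// One illustrative payload for each set of tags.
$payloads = array(
    'standard' => '<?php phpinfo(); ?>',
    'short'    => '<? phpinfo(); ?>',
    'echo'     => '<?= $var ?>',
    'asp'      => '<% phpinfo(); %>',
    'script'   => '<script language="php">phpinfo();</script>',
);

foreach ($payloads as $style => $payload) {
    $verdict = containsBlacklistedTag($payload) ? 'rejected' : 'ALLOWED';
    printf("%-9s %s\n", $style, $verdict);
}
```

The first four payloads are rejected, while the script payload is reported as ALLOWED because it contains neither <% nor <?.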

Now, although this is a very good argument for why developers shouldn’t rely on blacklists for security and why applications should not call include/require on files created from user input, both of those things do happen. Security-conscious developers need to be aware of all of these risks and loopholes so they can be properly mitigated.
