<?xml version="1.0"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>nikic's blog</title>
    <link>http://nikic.github.com/</link>
    <atom:link href="http://nikic.github.com/rss.xml" rel="self" type="application/rss+xml" />
    <description>NikiC's blog about PHP</description>
    <language>en-us</language>
    <pubDate>Wed, 26 Dec 2012 06:53:55 -0800</pubDate>
    <lastBuildDate>Wed, 26 Dec 2012 06:53:55 -0800</lastBuildDate>

    
    <item>
      <title>Cooperative multitasking using coroutines (in PHP!)</title>
      <link>http://nikic.github.com/2012/12/22/Cooperative-multitasking-using-coroutines-in-PHP.html</link>
      <pubDate>Sat, 22 Dec 2012 00:00:00 -0800</pubDate>
      <author>nikic@php.net (NikiC)</author>
      <guid>http://nikic.github.com/2012/12/22/Cooperative-multitasking-using-coroutines-in-PHP</guid>
      <description>&lt;p&gt;One of the large new features in PHP 5.5 will be support for generators and coroutines. Generators are already sufficiently covered by the &lt;a href='http://de2.php.net/manual/en/language.generators.overview.php'&gt;documentation&lt;/a&gt; and various other blog posts (like &lt;a href='http://blog.ircmaxell.com/2012/07/what-generators-can-do-for-you.html'&gt;this one&lt;/a&gt; or &lt;a href='http://sheriframadan.com/2012/10/test-drive-php-5-5-a-sneak-peek/#generators'&gt;this one&lt;/a&gt;). Coroutines, on the other hand, have received relatively little attention. The reason is that coroutines are both a lot more powerful and a lot harder to understand and explain.&lt;/p&gt;

&lt;p&gt;In this article I&amp;#8217;d like to guide you through an implementation of a task scheduler using coroutines, so you can get a feeling for the stuff that they allow you to do. I&amp;#8217;ll start off with a few introductory sections. If you feel like you already have a good grasp of the basics behind generators and coroutines, then you can jump straight to the &amp;#8220;Cooperative multitasking&amp;#8221; section.&lt;/p&gt;

&lt;h2 id='generators'&gt;Generators&lt;/h2&gt;

&lt;p&gt;The basic idea behind generators is that a function doesn&amp;#8217;t return a single value, but returns a &lt;em&gt;sequence of values&lt;/em&gt; instead, where every value is emitted one by one. Or in other words, generators allow you to implement iterators more easily. A very simple example of this concept is the &lt;code&gt;xrange()&lt;/code&gt; function:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;xrange&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$start&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$end&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$step&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;for&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$i&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$start&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='nv'&gt;$i&lt;/span&gt; &lt;span class='o'&gt;&amp;lt;=&lt;/span&gt; &lt;span class='nv'&gt;$end&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='nv'&gt;$i&lt;/span&gt; &lt;span class='o'&gt;+=&lt;/span&gt; &lt;span class='nv'&gt;$step&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nx'&gt;yield&lt;/span&gt; &lt;span class='nv'&gt;$i&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='k'&gt;foreach&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;xrange&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;1000000&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='k'&gt;as&lt;/span&gt; &lt;span class='nv'&gt;$num&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;echo&lt;/span&gt; &lt;span class='nv'&gt;$num&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='se'&gt;\n&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;xrange()&lt;/code&gt; function shown above provides the same functionality as the built-in &lt;code&gt;range()&lt;/code&gt; function. The only difference is that &lt;code&gt;range()&lt;/code&gt; will return an array with one million numbers in the above case, whereas &lt;code&gt;xrange()&lt;/code&gt; returns an iterator that will emit these numbers, but never actually compute an array with all of them.&lt;/p&gt;

&lt;p&gt;The advantages of this approach should be evident. It allows you to work with large datasets without loading them into memory all at once. You can even work with &lt;em&gt;infinite&lt;/em&gt; data-streams.&lt;/p&gt;
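
&lt;p&gt;As a minimal sketch of such an infinite stream (the &lt;code&gt;naturals()&lt;/code&gt; helper is a hypothetical name used only for illustration), the following generator never terminates on its own, so it is up to the consumer to stop iterating:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&amp;lt;?php

// Hypothetical helper: yields 1, 2, 3, ... and never finishes by itself.
function naturals() {
    $i = 1;
    while (true) {
        yield $i++;
    }
}

foreach (naturals() as $n) {
    if ($n &amp;gt; 5) {
        break; // the consumer decides when to stop
    }
    echo $n, &amp;quot;\n&amp;quot;;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Only the current number is held in memory at any point, which is why the unbounded loop is harmless here.&lt;/p&gt;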

&lt;p&gt;All this can also be done without generators, by manually implementing the &lt;code&gt;Iterator&lt;/code&gt; interface. Generators only make it (a lot) more convenient, because you no longer have to implement five different methods for every iterator.&lt;/p&gt;
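
&lt;p&gt;For comparison, here is roughly what a hand-written equivalent of &lt;code&gt;xrange()&lt;/code&gt; might look like. This is only a sketch: the class name &lt;code&gt;XRangeIterator&lt;/code&gt; is made up for this example and it assumes a positive &lt;code&gt;$step&lt;/code&gt;. The &lt;code&gt;#[\ReturnTypeWillChange]&lt;/code&gt; lines are plain comments before PHP 8 and merely silence return-type deprecation notices on PHP 8.1 and later.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&amp;lt;?php

// Hypothetical class: the five Iterator methods that xrange() gets for free.
class XRangeIterator implements Iterator {
    protected $start;
    protected $end;
    protected $step;
    protected $key = 0;
    protected $value;

    public function __construct($start, $end, $step = 1) {
        $this-&amp;gt;start = $start;
        $this-&amp;gt;end = $end;
        $this-&amp;gt;step = $step;
        $this-&amp;gt;value = $start;
    }

    #[\ReturnTypeWillChange]
    public function rewind() {
        // Start over from the first value.
        $this-&amp;gt;key = 0;
        $this-&amp;gt;value = $this-&amp;gt;start;
    }

    #[\ReturnTypeWillChange]
    public function valid() {
        // Same condition as the for loop in xrange().
        return $this-&amp;gt;value &amp;lt;= $this-&amp;gt;end;
    }

    #[\ReturnTypeWillChange]
    public function current() {
        return $this-&amp;gt;value;
    }

    #[\ReturnTypeWillChange]
    public function key() {
        return $this-&amp;gt;key;
    }

    #[\ReturnTypeWillChange]
    public function next() {
        $this-&amp;gt;value += $this-&amp;gt;step;
        ++$this-&amp;gt;key;
    }
}

foreach (new XRangeIterator(1, 5) as $num) {
    echo $num, &amp;quot;\n&amp;quot;;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The loop state that the generator keeps implicitly in &lt;code&gt;$i&lt;/code&gt; has to be stored in properties and advanced by hand here.&lt;/p&gt;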

&lt;h2 id='generators_as_interruptible_functions'&gt;Generators as interruptible functions&lt;/h2&gt;

&lt;p&gt;To go from generators to coroutines it&amp;#8217;s important to understand how they work internally: Generators are interruptible functions, where the &lt;code&gt;yield&lt;/code&gt; statements constitute the interruption points.&lt;/p&gt;

&lt;p&gt;Sticking to the above example, if you call &lt;code&gt;xrange(1, 1000000)&lt;/code&gt; no code in the &lt;code&gt;xrange()&lt;/code&gt; function is actually run. Instead PHP just returns an instance of the &lt;code&gt;Generator&lt;/code&gt; class which implements the &lt;code&gt;Iterator&lt;/code&gt; interface:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='nv'&gt;$range&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nx'&gt;xrange&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;1000000&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='nb'&gt;var_dump&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$range&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt; &lt;span class='c1'&gt;// object(Generator)#1&lt;/span&gt;
&lt;span class='nb'&gt;var_dump&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$range&lt;/span&gt; &lt;span class='nx'&gt;instanceof&lt;/span&gt; &lt;span class='nx'&gt;Iterator&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt; &lt;span class='c1'&gt;// bool(true)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The code is only run once you invoke one of the iterator methods on the object. E.g. if you call &lt;code&gt;$range-&amp;gt;rewind()&lt;/code&gt; the code in the &lt;code&gt;xrange()&lt;/code&gt; function will be run until the first occurrence of &lt;code&gt;yield&lt;/code&gt; in the control flow. In this case it means that &lt;code&gt;$i = $start&lt;/code&gt; and then &lt;code&gt;yield $i&lt;/code&gt; are run. Whatever was passed to the &lt;code&gt;yield&lt;/code&gt; statement can then be fetched using &lt;code&gt;$range-&amp;gt;current()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To continue executing the code in the generator you need to call the &lt;code&gt;$range-&amp;gt;next()&lt;/code&gt; method. This will again resume the generator until a &lt;code&gt;yield&lt;/code&gt; statement is hit. Thus, using a succession of &lt;code&gt;-&amp;gt;next()&lt;/code&gt; and &lt;code&gt;-&amp;gt;current()&lt;/code&gt; calls, you can get all values from the generator, until at some point no &lt;code&gt;yield&lt;/code&gt; is hit anymore. For &lt;code&gt;xrange()&lt;/code&gt; this happens once &lt;code&gt;$i&lt;/code&gt; exceeds &lt;code&gt;$end&lt;/code&gt;. In this case control flow will reach the end of the function, thus leaving no more code to run. Once this happens the &lt;code&gt;-&amp;gt;valid()&lt;/code&gt; method will return false and as such the iteration ends.&lt;/p&gt;
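
&lt;p&gt;Spelled out by hand, the sequence of calls described above looks roughly like the following sketch. It reuses the &lt;code&gt;xrange()&lt;/code&gt; function from the first example and is approximately what a &lt;code&gt;foreach&lt;/code&gt; loop does for you:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&amp;lt;?php

function xrange($start, $end, $step = 1) {
    for ($i = $start; $i &amp;lt;= $end; $i += $step) {
        yield $i;
    }
}

$range = xrange(1, 3);

$range-&amp;gt;rewind();                    // run until the first yield
while ($range-&amp;gt;valid()) {            // false once the function body has finished
    echo $range-&amp;gt;current(), &amp;quot;\n&amp;quot;; // the value passed to the last yield
    $range-&amp;gt;next();                  // resume until the next yield
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;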

&lt;h2 id='coroutines'&gt;Coroutines&lt;/h2&gt;

&lt;p&gt;The main thing that coroutines add to the above functionality is the ability to send values back to the generator. This turns the one-way communication from the generator to the caller into a two-way channel between the two.&lt;/p&gt;

&lt;p&gt;Values are passed into the coroutine by calling its &lt;code&gt;-&amp;gt;send()&lt;/code&gt; method instead of &lt;code&gt;-&amp;gt;next()&lt;/code&gt;. An example of how this works is the following &lt;code&gt;logger()&lt;/code&gt; coroutine:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;logger&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$fileName&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='nv'&gt;$fileHandle&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nb'&gt;fopen&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$fileName&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;a&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='k'&gt;while&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;true&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nb'&gt;fwrite&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$fileHandle&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nx'&gt;yield&lt;/span&gt; &lt;span class='o'&gt;.&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='se'&gt;\n&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='nv'&gt;$logger&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nx'&gt;logger&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;__DIR__&lt;/span&gt; &lt;span class='o'&gt;.&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;/log&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='nv'&gt;$logger&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;send&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;Foo&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='nv'&gt;$logger&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;send&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;Bar&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;As you can see &lt;code&gt;yield&lt;/code&gt; isn&amp;#8217;t used as a statement here, but as an expression, i.e. it has a return value. The return value of &lt;code&gt;yield&lt;/code&gt; is whatever was passed to &lt;code&gt;-&amp;gt;send()&lt;/code&gt;. In this example &lt;code&gt;yield&lt;/code&gt; will first return &lt;code&gt;&amp;#39;Foo&amp;#39;&lt;/code&gt; and then &lt;code&gt;&amp;#39;Bar&amp;#39;&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The above is an example where the &lt;code&gt;yield&lt;/code&gt; acts as a mere receiver. It is possible to combine both usages, i.e. to both send &lt;em&gt;and&lt;/em&gt; receive. Here is an example of how this works:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;gen&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='nv'&gt;$ret&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;yield&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;yield1&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='nb'&gt;var_dump&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$ret&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='nv'&gt;$ret&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;yield&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;yield2&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='nb'&gt;var_dump&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$ret&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='nv'&gt;$gen&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nx'&gt;gen&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
&lt;span class='nb'&gt;var_dump&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$gen&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;current&lt;/span&gt;&lt;span class='p'&gt;());&lt;/span&gt;    &lt;span class='c1'&gt;// string(6) &amp;quot;yield1&amp;quot;&lt;/span&gt;
&lt;span class='nb'&gt;var_dump&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$gen&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;send&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;ret1&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;));&lt;/span&gt; &lt;span class='c1'&gt;// string(4) &amp;quot;ret1&amp;quot;   (the first var_dump in gen)&lt;/span&gt;
                              &lt;span class='c1'&gt;// string(6) &amp;quot;yield2&amp;quot; (the var_dump of the -&amp;gt;send() return value)&lt;/span&gt;
&lt;span class='nb'&gt;var_dump&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$gen&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;send&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;ret2&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;));&lt;/span&gt; &lt;span class='c1'&gt;// string(4) &amp;quot;ret2&amp;quot;   (again from within gen)&lt;/span&gt;
                              &lt;span class='c1'&gt;// NULL               (the return value of -&amp;gt;send())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The exact order of the outputs can be a bit hard to understand at first, so make sure that you get why it comes out in exactly this way. There are two things I&amp;#8217;d like to especially point out: First, the use of parentheses around the &lt;code&gt;yield&lt;/code&gt; expression is no accident. These parentheses are required for technical reasons (though I have been considering adding an exception for assignments, just like it exists in Python). Secondly, you may have noticed that &lt;code&gt;-&amp;gt;current()&lt;/code&gt; is used without calling &lt;code&gt;-&amp;gt;rewind()&lt;/code&gt; first. If this is done then the rewind operation is performed implicitly.&lt;/p&gt;

&lt;h2 id='cooperative_multitasking'&gt;Cooperative multitasking&lt;/h2&gt;

&lt;p&gt;If, while reading the above &lt;code&gt;logger()&lt;/code&gt; example, you thought &amp;#8220;Why would I use a coroutine for this? Why can&amp;#8217;t I just use a normal class?&amp;#8221;, then you were totally right. The example demonstrates the basic usage, but there aren&amp;#8217;t really any advantages to using a coroutine in this context. This is the case for a lot of coroutine examples. As already mentioned in the introduction, coroutines are a very powerful concept, but their applications are rare and often fairly complicated, which makes it hard to come up with simple and non-contrived examples.&lt;/p&gt;

&lt;p&gt;What I decided to go for in this article is an implementation of cooperative multitasking using coroutines. The problem we&amp;#8217;re trying to solve is that you want to run multiple tasks (or &amp;#8220;programs&amp;#8221;) concurrently. But a processor can only run one task at a time (not considering multi-core for the purposes of this post). Thus the processor needs to switch between the different tasks and always let one run &amp;#8220;for a little while&amp;#8221;.&lt;/p&gt;

&lt;p&gt;The &amp;#8220;cooperative&amp;#8221; part of the term describes how this switching is done: It requires that the currently running task &lt;em&gt;voluntarily&lt;/em&gt; passes back control to the scheduler, so it can run another task. This is in contrast to &amp;#8220;preemptive&amp;#8221; multitasking where the scheduler can interrupt the task after some time whether it likes it or not. Cooperative multitasking was used in early versions of Windows (pre Win95) and Mac OS, but they later switched to using preemption. The reason should be fairly obvious: If you rely on a program to pass back control voluntarily, badly-behaving software can easily occupy the whole CPU for itself, not leaving a share for other tasks.&lt;/p&gt;

&lt;p&gt;At this point you should see the connection between coroutines and task scheduling: The &lt;code&gt;yield&lt;/code&gt; instruction provides a way for a task to interrupt itself and pass control back to the scheduler, so it can run some other task. Furthermore the &lt;code&gt;yield&lt;/code&gt; can be used for communication between the task and the scheduler.&lt;/p&gt;

&lt;p&gt;For our purposes a &amp;#8220;task&amp;#8221; will be a thin wrapper around the coroutine function:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;class&lt;/span&gt; &lt;span class='nc'&gt;Task&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;protected&lt;/span&gt; &lt;span class='nv'&gt;$taskId&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='k'&gt;protected&lt;/span&gt; &lt;span class='nv'&gt;$coroutine&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='k'&gt;protected&lt;/span&gt; &lt;span class='nv'&gt;$sendValue&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;null&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='k'&gt;protected&lt;/span&gt; &lt;span class='nv'&gt;$beforeFirstYield&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;true&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;__construct&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$taskId&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nx'&gt;Generator&lt;/span&gt; &lt;span class='nv'&gt;$coroutine&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;taskId&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$taskId&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;coroutine&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$coroutine&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;getTaskId&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;taskId&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;setSendValue&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$sendValue&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;sendValue&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$sendValue&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;run&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;beforeFirstYield&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
            &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;beforeFirstYield&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;false&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
            &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;coroutine&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;current&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
        &lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='k'&gt;else&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
            &lt;span class='nv'&gt;$retval&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;coroutine&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;send&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;sendValue&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
            &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;sendValue&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;null&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
            &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='nv'&gt;$retval&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;isFinished&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='o'&gt;!&lt;/span&gt;&lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;coroutine&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;valid&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;A task will be a coroutine tagged with a task ID. Using the &lt;code&gt;setSendValue()&lt;/code&gt; method you can specify which value will be sent into it on the next resume (you&amp;#8217;ll see what we need this for a bit later). The &lt;code&gt;run()&lt;/code&gt; function really does nothing more than call the &lt;code&gt;send()&lt;/code&gt; method on the coroutine. To understand why the additional &lt;code&gt;beforeFirstYield&lt;/code&gt; flag is needed consider the following snippet:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;gen&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='nx'&gt;yield&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;foo&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='nx'&gt;yield&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;bar&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='nv'&gt;$gen&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nx'&gt;gen&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
&lt;span class='nb'&gt;var_dump&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$gen&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;send&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;something&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;));&lt;/span&gt;

&lt;span class='c1'&gt;// As the send() happens before the first yield there is an implicit rewind() call,&lt;/span&gt;
&lt;span class='c1'&gt;// so what really happens is this:&lt;/span&gt;
&lt;span class='nv'&gt;$gen&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;rewind&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
&lt;span class='nb'&gt;var_dump&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$gen&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;send&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;something&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;));&lt;/span&gt;

&lt;span class='c1'&gt;// The rewind() will advance to the first yield (and ignore its value), the send() will&lt;/span&gt;
&lt;span class='c1'&gt;// advance to the second yield (and dump its value). Thus we lose the first yielded value!&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;By adding the additional &lt;code&gt;beforeFirstYield&lt;/code&gt; condition we can ensure that the value of the first yield is also returned.&lt;/p&gt;

&lt;p&gt;The scheduler now has to do little more than cycle through the tasks and run them:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;class&lt;/span&gt; &lt;span class='nc'&gt;Scheduler&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;protected&lt;/span&gt; &lt;span class='nv'&gt;$maxTaskId&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='k'&gt;protected&lt;/span&gt; &lt;span class='nv'&gt;$taskMap&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='p'&gt;[];&lt;/span&gt; &lt;span class='c1'&gt;// taskId =&amp;gt; task&lt;/span&gt;
    &lt;span class='k'&gt;protected&lt;/span&gt; &lt;span class='nv'&gt;$taskQueue&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;__construct&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;taskQueue&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;SplQueue&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;newTask&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;Generator&lt;/span&gt; &lt;span class='nv'&gt;$coroutine&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nv'&gt;$tid&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='o'&gt;++&lt;/span&gt;&lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;maxTaskId&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='nv'&gt;$task&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;Task&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$tid&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$coroutine&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
        &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;taskMap&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nv'&gt;$tid&lt;/span&gt;&lt;span class='p'&gt;]&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;schedule&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
        &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='nv'&gt;$tid&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;schedule&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;Task&lt;/span&gt; &lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;taskQueue&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;enqueue&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;run&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;while&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;!&lt;/span&gt;&lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;taskQueue&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;isEmpty&lt;/span&gt;&lt;span class='p'&gt;())&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
            &lt;span class='nv'&gt;$task&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;taskQueue&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;dequeue&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
            &lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;run&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;

            &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;isFinished&lt;/span&gt;&lt;span class='p'&gt;())&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
                &lt;span class='nb'&gt;unset&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;taskMap&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;getTaskId&lt;/span&gt;&lt;span class='p'&gt;()]);&lt;/span&gt;
            &lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='k'&gt;else&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
                &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;schedule&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
            &lt;span class='p'&gt;}&lt;/span&gt;
        &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;newTask()&lt;/code&gt; method creates a new task (using the next free task id) and puts it in the task map. Furthermore it schedules the task by putting it in the task queue. The &lt;code&gt;run()&lt;/code&gt; method then walks this task queue and runs the tasks. If a task is finished it is dropped, otherwise it is rescheduled at the end of the queue.&lt;/p&gt;

&lt;p&gt;Let&amp;#8217;s try out the scheduler with two simple (and very pointless) tasks:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;task1&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;for&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$i&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='nv'&gt;$i&lt;/span&gt; &lt;span class='o'&gt;&amp;lt;=&lt;/span&gt; &lt;span class='mi'&gt;10&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='o'&gt;++&lt;/span&gt;&lt;span class='nv'&gt;$i&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;echo&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;This is task 1 iteration &lt;/span&gt;&lt;span class='si'&gt;$i&lt;/span&gt;&lt;span class='s2'&gt;.&lt;/span&gt;&lt;span class='se'&gt;\n&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='nx'&gt;yield&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;task2&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;for&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$i&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='nv'&gt;$i&lt;/span&gt; &lt;span class='o'&gt;&amp;lt;=&lt;/span&gt; &lt;span class='mi'&gt;5&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='o'&gt;++&lt;/span&gt;&lt;span class='nv'&gt;$i&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;echo&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;This is task 2 iteration &lt;/span&gt;&lt;span class='si'&gt;$i&lt;/span&gt;&lt;span class='s2'&gt;.&lt;/span&gt;&lt;span class='se'&gt;\n&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='nx'&gt;yield&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='nv'&gt;$scheduler&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;Scheduler&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

&lt;span class='nv'&gt;$scheduler&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;newTask&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;task1&lt;/span&gt;&lt;span class='p'&gt;());&lt;/span&gt;
&lt;span class='nv'&gt;$scheduler&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;newTask&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;task2&lt;/span&gt;&lt;span class='p'&gt;());&lt;/span&gt;

&lt;span class='nv'&gt;$scheduler&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;run&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Both tasks will just &lt;code&gt;echo&lt;/code&gt; a message and then pass control back to the scheduler with &lt;code&gt;yield&lt;/code&gt;. This is the resulting output:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;This is task 1 iteration 1.
This is task 2 iteration 1.
This is task 1 iteration 2.
This is task 2 iteration 2.
This is task 1 iteration 3.
This is task 2 iteration 3.
This is task 1 iteration 4.
This is task 2 iteration 4.
This is task 1 iteration 5.
This is task 2 iteration 5.
This is task 1 iteration 6.
This is task 1 iteration 7.
This is task 1 iteration 8.
This is task 1 iteration 9.
This is task 1 iteration 10.&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The output is exactly as expected: For the first 5 iterations the tasks alternate, then the second task finishes and only the first task continues to run.&lt;/p&gt;

&lt;h2 id='communicating_with_the_scheduler'&gt;Communicating with the scheduler&lt;/h2&gt;

&lt;p&gt;Now that the scheduler works we can turn to the next point on the agenda: communication between the tasks and the scheduler. We will use the same method that processes use to talk to the operating system: system calls. The reason we need syscalls is that the operating system runs on a different privilege level than the processes. So in order to perform privileged actions (like killing another process) there has to be some way to pass control back to the kernel, so it can perform said actions. Internally this is once again implemented using interrupt instructions. Historically the generic &lt;code&gt;int&lt;/code&gt; instruction was used; nowadays there are the more specialized and faster &lt;code&gt;syscall&lt;/code&gt;/&lt;code&gt;sysenter&lt;/code&gt; instructions.&lt;/p&gt;

&lt;p&gt;Our task scheduling system will reflect this design: Instead of simply passing the scheduler into the task (and thus allowing it to do whatever it wants) we will communicate via system calls passed through the &lt;code&gt;yield&lt;/code&gt; expression. The &lt;code&gt;yield&lt;/code&gt; here will act both as an interrupt and as a way to pass information to (and from) the scheduler.&lt;/p&gt;
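
&lt;p&gt;To make the two-way nature of &lt;code&gt;yield&lt;/code&gt; concrete, here is a minimal sketch outside of the scheduler. The &lt;code&gt;greeter()&lt;/code&gt; coroutine and its strings are made up for illustration; only &lt;code&gt;current()&lt;/code&gt; and &lt;code&gt;send()&lt;/code&gt; belong to the generator API. It should print the question followed by &lt;code&gt;Hello World!&lt;/code&gt;:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&amp;lt;?php

// Hypothetical example, not part of the scheduler.
function greeter() {
    // yield hands a value out and evaluates to whatever is sent back in
    $name = (yield &amp;#39;What is your name?&amp;#39;);
    yield &amp;quot;Hello $name!&amp;quot;;
}

$gen = greeter();
$question = $gen-&amp;gt;current();             // runs the coroutine up to the first yield
$answer   = $gen-&amp;gt;send(&amp;#39;World&amp;#39;); // resumes it and returns the next yielded value

echo $question, &amp;quot;\n&amp;quot;, $answer, &amp;quot;\n&amp;quot;;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;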

&lt;p&gt;To represent a system call I&amp;#8217;ll use a small wrapper around a callable:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;class&lt;/span&gt; &lt;span class='nc'&gt;SystemCall&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;protected&lt;/span&gt; &lt;span class='nv'&gt;$callback&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;__construct&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;callable&lt;/span&gt; &lt;span class='nv'&gt;$callback&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;callback&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$callback&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;__invoke&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;Task&lt;/span&gt; &lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nx'&gt;Scheduler&lt;/span&gt; &lt;span class='nv'&gt;$scheduler&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nv'&gt;$callback&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;callback&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='c1'&gt;// Can&amp;#39;t call it directly in PHP :/&lt;/span&gt;
        &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='nv'&gt;$callback&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$scheduler&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;It will behave just like any callable (using &lt;code&gt;__invoke&lt;/code&gt;), but expects the scheduler to pass in the calling task and the scheduler itself as arguments. To handle it we have to slightly modify the scheduler&amp;#8217;s &lt;code&gt;run&lt;/code&gt; method:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;run&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;while&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;!&lt;/span&gt;&lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;taskQueue&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;isEmpty&lt;/span&gt;&lt;span class='p'&gt;())&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nv'&gt;$task&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;taskQueue&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;dequeue&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
        &lt;span class='nv'&gt;$retval&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;run&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;

        &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$retval&lt;/span&gt; &lt;span class='nx'&gt;instanceof&lt;/span&gt; &lt;span class='nx'&gt;SystemCall&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
            &lt;span class='nv'&gt;$retval&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
            &lt;span class='k'&gt;continue&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='p'&gt;}&lt;/span&gt;

        &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;isFinished&lt;/span&gt;&lt;span class='p'&gt;())&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
            &lt;span class='nb'&gt;unset&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;taskMap&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;getTaskId&lt;/span&gt;&lt;span class='p'&gt;()]);&lt;/span&gt;
        &lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='k'&gt;else&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
            &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;schedule&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
        &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
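&lt;p&gt;As an aside on the wrapper itself, the following sketch shows the &lt;code&gt;__invoke&lt;/code&gt; pattern in isolation. The &lt;code&gt;Wrapper&lt;/code&gt; class is a hypothetical stand-in for &lt;code&gt;SystemCall&lt;/code&gt; without the type hints, and it uses &lt;code&gt;call_user_func()&lt;/code&gt; as one possible alternative to copying the callback into a local variable:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&amp;lt;?php

// Hypothetical stand-in class, only here to illustrate the pattern.
class Wrapper {
    protected $callback;

    public function __construct(callable $callback) {
        $this-&amp;gt;callback = $callback;
    }

    public function __invoke($a, $b) {
        // $this-&amp;gt;callback($a, $b) would look for a *method* named callback(),
        // so the stored callable is invoked through call_user_func() instead.
        return call_user_func($this-&amp;gt;callback, $a, $b);
    }
}

$add = new Wrapper(function($a, $b) { return $a + $b; });

// An object with __invoke can be called like a function.
echo $add(2, 3), &amp;quot;\n&amp;quot;;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;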
&lt;p&gt;The first system call will do nothing more than return the task ID:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;getTaskId&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;SystemCall&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;function&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;Task&lt;/span&gt; &lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nx'&gt;Scheduler&lt;/span&gt; &lt;span class='nv'&gt;$scheduler&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;setSendValue&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;getTaskId&lt;/span&gt;&lt;span class='p'&gt;());&lt;/span&gt;
        &lt;span class='nv'&gt;$scheduler&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;schedule&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='p'&gt;});&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;It does so by setting the tid as the next send value and rescheduling the task. For system calls the scheduler does not automatically reschedule the task; we need to do it manually (you&amp;#8217;ll see why a bit later). Using this new syscall we can rewrite the previous example:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;task&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$max&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='nv'&gt;$tid&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;yield&lt;/span&gt; &lt;span class='nx'&gt;getTaskId&lt;/span&gt;&lt;span class='p'&gt;());&lt;/span&gt; &lt;span class='c1'&gt;// &amp;lt;-- here&amp;#39;s the syscall!&lt;/span&gt;
    &lt;span class='k'&gt;for&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$i&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='nv'&gt;$i&lt;/span&gt; &lt;span class='o'&gt;&amp;lt;=&lt;/span&gt; &lt;span class='nv'&gt;$max&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='o'&gt;++&lt;/span&gt;&lt;span class='nv'&gt;$i&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;echo&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;This is task &lt;/span&gt;&lt;span class='si'&gt;$tid&lt;/span&gt;&lt;span class='s2'&gt; iteration &lt;/span&gt;&lt;span class='si'&gt;$i&lt;/span&gt;&lt;span class='s2'&gt;.&lt;/span&gt;&lt;span class='se'&gt;\n&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='nx'&gt;yield&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='nv'&gt;$scheduler&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;Scheduler&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

&lt;span class='nv'&gt;$scheduler&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;newTask&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;task&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;10&lt;/span&gt;&lt;span class='p'&gt;));&lt;/span&gt;
&lt;span class='nv'&gt;$scheduler&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;newTask&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;task&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;5&lt;/span&gt;&lt;span class='p'&gt;));&lt;/span&gt;

&lt;span class='nv'&gt;$scheduler&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;run&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;This will give the same output as the previous example. Notice how the system call is basically done like any other call, but with a prepended &lt;code&gt;yield&lt;/code&gt;. Here are two more syscalls, for creating new tasks and killing them again:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;newTask&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;Generator&lt;/span&gt; &lt;span class='nv'&gt;$coroutine&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;SystemCall&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;
        &lt;span class='k'&gt;function&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;Task&lt;/span&gt; &lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nx'&gt;Scheduler&lt;/span&gt; &lt;span class='nv'&gt;$scheduler&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='k'&gt;use&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$coroutine&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
            &lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;setSendValue&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$scheduler&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;newTask&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$coroutine&lt;/span&gt;&lt;span class='p'&gt;));&lt;/span&gt;
            &lt;span class='nv'&gt;$scheduler&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;schedule&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
        &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;killTask&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$tid&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;SystemCall&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;
        &lt;span class='k'&gt;function&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;Task&lt;/span&gt; &lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nx'&gt;Scheduler&lt;/span&gt; &lt;span class='nv'&gt;$scheduler&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='k'&gt;use&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$tid&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
            &lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;setSendValue&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$scheduler&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;killTask&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$tid&lt;/span&gt;&lt;span class='p'&gt;));&lt;/span&gt;
            &lt;span class='nv'&gt;$scheduler&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;schedule&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
        &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;killTask&lt;/code&gt; function needs an additional method in the scheduler:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;killTask&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$tid&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;!&lt;/span&gt;&lt;span class='nb'&gt;isset&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;taskMap&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nv'&gt;$tid&lt;/span&gt;&lt;span class='p'&gt;]))&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='k'&gt;false&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='nb'&gt;unset&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;taskMap&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nv'&gt;$tid&lt;/span&gt;&lt;span class='p'&gt;]);&lt;/span&gt;

    &lt;span class='c1'&gt;// This is a bit ugly and could be optimized so it does not have to walk the queue,&lt;/span&gt;
    &lt;span class='c1'&gt;// but assuming that killing tasks is rather rare I won&amp;#39;t bother with it now&lt;/span&gt;
    &lt;span class='k'&gt;foreach&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;taskQueue&lt;/span&gt; &lt;span class='k'&gt;as&lt;/span&gt; &lt;span class='nv'&gt;$i&lt;/span&gt; &lt;span class='o'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;getTaskId&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='o'&gt;===&lt;/span&gt; &lt;span class='nv'&gt;$tid&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
            &lt;span class='nb'&gt;unset&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;taskQueue&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nv'&gt;$i&lt;/span&gt;&lt;span class='p'&gt;]);&lt;/span&gt;
            &lt;span class='k'&gt;break&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='k'&gt;true&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
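&lt;p&gt;The queue walk in &lt;code&gt;killTask()&lt;/code&gt; can be tried out on its own. This is only a sketch, with plain strings standing in for &lt;code&gt;Task&lt;/code&gt; objects; it removes the middle element and then prints what is left in the queue:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&amp;lt;?php

$queue = new SplQueue;
$queue-&amp;gt;enqueue(&amp;#39;a&amp;#39;);
$queue-&amp;gt;enqueue(&amp;#39;b&amp;#39;);
$queue-&amp;gt;enqueue(&amp;#39;c&amp;#39;);

// Walk the queue until the wanted element is found, then remove it by index.
foreach ($queue as $i =&amp;gt; $item) {
    if ($item === &amp;#39;b&amp;#39;) {
        unset($queue[$i]); // same kind of removal as in killTask()
        break;             // stop right away, the queue was just modified
    }
}

// Print the remaining elements in queue order.
foreach ($queue as $item) {
    echo $item, &amp;quot;\n&amp;quot;;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;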
&lt;p&gt;A small script to test the new functionality:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;childTask&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='nv'&gt;$tid&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;yield&lt;/span&gt; &lt;span class='nx'&gt;getTaskId&lt;/span&gt;&lt;span class='p'&gt;());&lt;/span&gt;
    &lt;span class='k'&gt;while&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;true&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;echo&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;Child task &lt;/span&gt;&lt;span class='si'&gt;$tid&lt;/span&gt;&lt;span class='s2'&gt; still alive!&lt;/span&gt;&lt;span class='se'&gt;\n&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='nx'&gt;yield&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;task&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='nv'&gt;$tid&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;yield&lt;/span&gt; &lt;span class='nx'&gt;getTaskId&lt;/span&gt;&lt;span class='p'&gt;());&lt;/span&gt;
    &lt;span class='nv'&gt;$childTid&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;yield&lt;/span&gt; &lt;span class='nx'&gt;newTask&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;childTask&lt;/span&gt;&lt;span class='p'&gt;()));&lt;/span&gt;

    &lt;span class='k'&gt;for&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$i&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='nv'&gt;$i&lt;/span&gt; &lt;span class='o'&gt;&amp;lt;=&lt;/span&gt; &lt;span class='mi'&gt;6&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='o'&gt;++&lt;/span&gt;&lt;span class='nv'&gt;$i&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;echo&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;Parent task &lt;/span&gt;&lt;span class='si'&gt;$tid&lt;/span&gt;&lt;span class='s2'&gt; iteration &lt;/span&gt;&lt;span class='si'&gt;$i&lt;/span&gt;&lt;span class='s2'&gt;.&lt;/span&gt;&lt;span class='se'&gt;\n&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='nx'&gt;yield&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

        &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$i&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='mi'&gt;3&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='nx'&gt;yield&lt;/span&gt; &lt;span class='nx'&gt;killTask&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$childTid&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='nv'&gt;$scheduler&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;Scheduler&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='nv'&gt;$scheduler&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;newTask&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;task&lt;/span&gt;&lt;span class='p'&gt;());&lt;/span&gt;
&lt;span class='nv'&gt;$scheduler&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;run&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;This will print the following:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Parent task 1 iteration 1.
Child task 2 still alive!
Parent task 1 iteration 2.
Child task 2 still alive!
Parent task 1 iteration 3.
Child task 2 still alive!
Parent task 1 iteration 4.
Parent task 1 iteration 5.
Parent task 1 iteration 6.&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The child is killed after three iterations, so that&amp;#8217;s when the &amp;#8220;Child task 2 still alive!&amp;#8221; messages end. One should probably point out that this is not a real parent/child relationship, because the child can continue running even after the parent has finished. Or the child could kill the parent. One could modify the scheduler to have a more hierarchical task structure, but I won&amp;#8217;t implement that in this article.&lt;/p&gt;

&lt;p&gt;There are many more process management calls one could implement, for example &lt;code&gt;wait&lt;/code&gt; (which waits until a task has finished running), &lt;code&gt;exec&lt;/code&gt; (which replaces the current task) and &lt;code&gt;fork&lt;/code&gt; (which creates a clone of the current task). Forking is pretty cool and you can actually implement it with PHP&amp;#8217;s coroutines, because they support cloning.&lt;/p&gt;

&lt;p&gt;But I&amp;#8217;ll leave these for the interested reader. Instead let&amp;#8217;s get to the next topic!&lt;/p&gt;

&lt;h2 id='nonblocking_io'&gt;Non-Blocking IO&lt;/h2&gt;

&lt;p&gt;A really cool application of our task management system obviously is &amp;#8230; a web server. There could be one task listening on a socket for new connections, and whenever a new connection is made it would create a new task handling that connection.&lt;/p&gt;

&lt;p&gt;The hard part about this is that normally socket operations like reading data are blocking, i.e. PHP will wait until the client has finished sending. For a web server that&amp;#8217;s obviously not good at all: It would mean that it can only handle a single connection at a time.&lt;/p&gt;

&lt;p&gt;The solution is to make sure that the socket is &amp;#8220;ready&amp;#8221; before actually reading from or writing to it. To find out which sockets are ready to read from or write to, the &lt;a href='http://de3.php.net/manual/en/function.stream-select.php'&gt;&lt;code&gt;stream_select&lt;/code&gt;&lt;/a&gt; function can be used.&lt;/p&gt;
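
&lt;p&gt;As a rough illustration of what &lt;code&gt;stream_select&lt;/code&gt; reports, the sketch below uses a connected socket pair in place of a real client connection. The variable names are made up, and &lt;code&gt;STREAM_PF_UNIX&lt;/code&gt; assumes a Unix-like system. Only the socket that has data waiting should remain in &lt;code&gt;$read&lt;/code&gt; after the call:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&amp;lt;?php

// A connected socket pair stands in for a real client connection.
// STREAM_PF_UNIX assumes a Unix-like system.
list($a, $b) = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);

fwrite($a, &amp;quot;ping\n&amp;quot;); // data written to $a becomes readable on $b

$read   = array($a, $b); // sockets we would like to read from
$write  = null;          // not interested in writability here
$except = null;

// Waits at most one second. The arrays are passed by reference and
// afterwards contain only the sockets that are actually ready.
$numReady = stream_select($read, $write, $except, 1);

foreach ($read as $socket) {
    echo &amp;quot;Ready to read: &amp;quot;, fgets($socket);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;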

&lt;p&gt;First, let&amp;#8217;s add two new syscalls, which will cause a task to wait until a certain socket is ready:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;waitForRead&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;SystemCall&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;
        &lt;span class='k'&gt;function&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;Task&lt;/span&gt; &lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nx'&gt;Scheduler&lt;/span&gt; &lt;span class='nv'&gt;$scheduler&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='k'&gt;use&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
            &lt;span class='nv'&gt;$scheduler&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;waitForRead&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
        &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;waitForWrite&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;SystemCall&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;
        &lt;span class='k'&gt;function&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;Task&lt;/span&gt; &lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nx'&gt;Scheduler&lt;/span&gt; &lt;span class='nv'&gt;$scheduler&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='k'&gt;use&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
            &lt;span class='nv'&gt;$scheduler&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;waitForWrite&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
        &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;These syscalls are just proxies to the respective methods in the scheduler:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='c1'&gt;// resourceID =&amp;gt; [socket, tasks]&lt;/span&gt;
&lt;span class='k'&gt;protected&lt;/span&gt; &lt;span class='nv'&gt;$waitingForRead&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='p'&gt;[];&lt;/span&gt;
&lt;span class='k'&gt;protected&lt;/span&gt; &lt;span class='nv'&gt;$waitingForWrite&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='p'&gt;[];&lt;/span&gt;

&lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;waitForRead&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nx'&gt;Task&lt;/span&gt; &lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;isset&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;waitingForRead&lt;/span&gt;&lt;span class='p'&gt;[(&lt;/span&gt;&lt;span class='nx'&gt;int&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;]))&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;waitingForRead&lt;/span&gt;&lt;span class='p'&gt;[(&lt;/span&gt;&lt;span class='nx'&gt;int&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;][&lt;/span&gt;&lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;][]&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='k'&gt;else&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;waitingForRead&lt;/span&gt;&lt;span class='p'&gt;[(&lt;/span&gt;&lt;span class='nx'&gt;int&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;]&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;]];&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;waitForWrite&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nx'&gt;Task&lt;/span&gt; &lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;isset&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;waitingForWrite&lt;/span&gt;&lt;span class='p'&gt;[(&lt;/span&gt;&lt;span class='nx'&gt;int&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;]))&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;waitingForWrite&lt;/span&gt;&lt;span class='p'&gt;[(&lt;/span&gt;&lt;span class='nx'&gt;int&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;][&lt;/span&gt;&lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;][]&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='k'&gt;else&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;waitingForWrite&lt;/span&gt;&lt;span class='p'&gt;[(&lt;/span&gt;&lt;span class='nx'&gt;int&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;]&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;]];&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
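&lt;p&gt;To make the shape of these arrays concrete, here is a small standalone sketch. The &lt;code&gt;addWaiter&lt;/code&gt; helper, the in-memory stream and the strings used in place of &lt;code&gt;Task&lt;/code&gt; objects are assumptions made for illustration; the scheduler methods above do the same bookkeeping on their own properties:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&amp;lt;?php

// Hypothetical helper mirroring waitForRead()/waitForWrite(), but on a plain array.
function addWaiter(array &amp;amp;$waiting, $socket, $task) {
    $id = (int) $socket; // casting a stream resource to int yields its resource ID
    if (isset($waiting[$id])) {
        $waiting[$id][1][] = $task;          // socket already known: append the task
    } else {
        $waiting[$id] = [$socket, [$task]];  // first waiter: store socket and task list
    }
}

$stream = fopen(&amp;#39;php://memory&amp;#39;, &amp;#39;r+&amp;#39;); // stands in for a socket
$waiting = [];
addWaiter($waiting, $stream, &amp;#39;task A&amp;#39;);
addWaiter($waiting, $stream, &amp;#39;task B&amp;#39;);

var_dump(count($waiting));                                                 // int(1)
var_dump($waiting[(int) $stream][1] === [&amp;#39;task A&amp;#39;, &amp;#39;task B&amp;#39;]); // bool(true)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;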
&lt;p&gt;The &lt;code&gt;waitingForRead&lt;/code&gt; and &lt;code&gt;waitingForWrite&lt;/code&gt; properties are just arrays containing the sockets to wait for and the tasks that are waiting for them. The interesting part is the following method, which actually checks whether the sockets are ready and reschedules the respective tasks:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;protected&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;ioPoll&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$timeout&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='nv'&gt;$rSocks&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='p'&gt;[];&lt;/span&gt;
    &lt;span class='k'&gt;foreach&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;waitingForRead&lt;/span&gt; &lt;span class='k'&gt;as&lt;/span&gt; &lt;span class='k'&gt;list&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nv'&gt;$rSocks&lt;/span&gt;&lt;span class='p'&gt;[]&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='nv'&gt;$wSocks&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='p'&gt;[];&lt;/span&gt;
    &lt;span class='k'&gt;foreach&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;waitingForWrite&lt;/span&gt; &lt;span class='k'&gt;as&lt;/span&gt; &lt;span class='k'&gt;list&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nv'&gt;$wSocks&lt;/span&gt;&lt;span class='p'&gt;[]&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

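    &lt;span class='nv'&gt;$eSocks&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='p'&gt;[];&lt;/span&gt; &lt;span class='c1'&gt;// dummy&lt;/span&gt;

    &lt;span class='c1'&gt;// stream_select() complains when it is given no streams at all&lt;/span&gt;
    &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;!&lt;/span&gt;&lt;span class='nv'&gt;$rSocks&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class='o'&gt;!&lt;/span&gt;&lt;span class='nv'&gt;$wSocks&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;return&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;!&lt;/span&gt;&lt;span class='nb'&gt;stream_select&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$rSocks&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$wSocks&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$eSocks&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$timeout&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;return&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;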

    &lt;span class='k'&gt;foreach&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$rSocks&lt;/span&gt; &lt;span class='k'&gt;as&lt;/span&gt; &lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;list&lt;/span&gt;&lt;span class='p'&gt;(,&lt;/span&gt; &lt;span class='nv'&gt;$tasks&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;waitingForRead&lt;/span&gt;&lt;span class='p'&gt;[(&lt;/span&gt;&lt;span class='nx'&gt;int&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;];&lt;/span&gt;
        &lt;span class='nb'&gt;unset&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;waitingForRead&lt;/span&gt;&lt;span class='p'&gt;[(&lt;/span&gt;&lt;span class='nx'&gt;int&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;]);&lt;/span&gt;

        &lt;span class='k'&gt;foreach&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$tasks&lt;/span&gt; &lt;span class='k'&gt;as&lt;/span&gt; &lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
            &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;schedule&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
        &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='k'&gt;foreach&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$wSocks&lt;/span&gt; &lt;span class='k'&gt;as&lt;/span&gt; &lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;list&lt;/span&gt;&lt;span class='p'&gt;(,&lt;/span&gt; &lt;span class='nv'&gt;$tasks&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;waitingForWrite&lt;/span&gt;&lt;span class='p'&gt;[(&lt;/span&gt;&lt;span class='nx'&gt;int&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;];&lt;/span&gt;
        &lt;span class='nb'&gt;unset&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;waitingForWrite&lt;/span&gt;&lt;span class='p'&gt;[(&lt;/span&gt;&lt;span class='nx'&gt;int&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;]);&lt;/span&gt;

        &lt;span class='k'&gt;foreach&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$tasks&lt;/span&gt; &lt;span class='k'&gt;as&lt;/span&gt; &lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
            &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;schedule&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
        &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;stream_select&lt;/code&gt; function takes arrays of read, write and except sockets to check (we&amp;#8217;ll ignore that last category). The arrays are passed by reference and the function leaves only those elements in them whose state changed, i.e. the sockets that are now ready. We can then walk over those arrays and reschedule all tasks associated with them.&lt;/p&gt;

&lt;p&gt;In order to regularly perform the above polling action we&amp;#8217;ll add a special task in the scheduler:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='k'&gt;protected&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;ioPollTask&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;while&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;true&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;taskQueue&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;isEmpty&lt;/span&gt;&lt;span class='p'&gt;())&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
            &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;ioPoll&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;null&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
        &lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='k'&gt;else&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
            &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;ioPoll&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
        &lt;span class='p'&gt;}&lt;/span&gt;
        &lt;span class='nx'&gt;yield&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;This task needs to be registered at some point, e.g. one could add &lt;code&gt;$this-&amp;gt;newTask($this-&amp;gt;ioPollTask())&lt;/code&gt; to the start of the &lt;code&gt;run()&lt;/code&gt; method. Then it will work just like any other task, performing the polling operation once every full task cycle (this isn&amp;#8217;t necessarily the best way to handle it). Normally the &lt;code&gt;ioPollTask&lt;/code&gt; calls &lt;code&gt;ioPoll&lt;/code&gt; with a &lt;code&gt;0&lt;/code&gt; second timeout, which means that &lt;code&gt;stream_select&lt;/code&gt; will return right away (rather than waiting).&lt;/p&gt;

&lt;p&gt;Only if the task queue is empty do we use a &lt;code&gt;null&lt;/code&gt; timeout, which means that it will wait until some socket becomes ready. If we didn&amp;#8217;t do this, the polling task would just run again and again and again until a new connection is made. This would result in 100% CPU usage. It&amp;#8217;s much more efficient to let the operating system do the waiting instead.&lt;/p&gt;
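
&lt;p&gt;The difference between polling and waiting can be observed in isolation. The sketch below is an illustration under stated assumptions rather than scheduler code: it uses a socket pair on which nothing is ever written (again assuming a Unix-like system for &lt;code&gt;STREAM_PF_UNIX&lt;/code&gt;), and a finite 0.2 second timeout stands in for &lt;code&gt;null&lt;/code&gt;, since a &lt;code&gt;null&lt;/code&gt; timeout would block forever here:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&amp;lt;?php

// Illustration only: no data is ever written to this pair, so $b never becomes readable.
list($a, $b) = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
$write = null; $except = null;

// Timeout 0: check once and return right away.
$read = [$b];
$start = microtime(true);
stream_select($read, $write, $except, 0);
$pollTime = microtime(true) - $start;

// Timeout of 0 seconds plus 200000 microseconds: the process sleeps inside
// stream_select until the socket is ready or the timeout expires.
$read = [$b];
$start = microtime(true);
stream_select($read, $write, $except, 0, 200000);
$waitTime = microtime(true) - $start;

printf(&amp;quot;poll: %.3fs, wait: %.3fs\n&amp;quot;, $pollTime, $waitTime);
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The first call should return almost instantly, while the second should take roughly 0.2 seconds without keeping the CPU busy.&lt;/p&gt;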

&lt;p&gt;Writing the server is relatively easy now:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;server&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$port&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;echo&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;Starting server at port &lt;/span&gt;&lt;span class='si'&gt;$port&lt;/span&gt;&lt;span class='s2'&gt;...&lt;/span&gt;&lt;span class='se'&gt;\n&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

    &lt;span class='nv'&gt;$socket&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='o'&gt;@&lt;/span&gt;&lt;span class='nx'&gt;stream_socket_server&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;tcp://localhost:&lt;/span&gt;&lt;span class='si'&gt;$port&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$errNo&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$errStr&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;!&lt;/span&gt;&lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='k'&gt;throw&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;Exception&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$errStr&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$errNo&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;

    &lt;span class='nx'&gt;stream_set_blocking&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;

    &lt;span class='k'&gt;while&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;true&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nx'&gt;yield&lt;/span&gt; &lt;span class='nx'&gt;waitForRead&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
        &lt;span class='nv'&gt;$clientSocket&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nx'&gt;stream_socket_accept&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
        &lt;span class='nx'&gt;yield&lt;/span&gt; &lt;span class='nx'&gt;newTask&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;handleClient&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$clientSocket&lt;/span&gt;&lt;span class='p'&gt;));&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;handleClient&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='nx'&gt;yield&lt;/span&gt; &lt;span class='nx'&gt;waitForRead&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='nv'&gt;$data&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nb'&gt;fread&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;8192&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;

    &lt;span class='nv'&gt;$msg&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;Received following request:&lt;/span&gt;&lt;span class='se'&gt;\n\n&lt;/span&gt;&lt;span class='si'&gt;$data&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='nv'&gt;$msgLength&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nb'&gt;strlen&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$msg&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;

    &lt;span class='nv'&gt;$response&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s'&gt;&amp;lt;&amp;lt;&amp;lt;RES&lt;/span&gt;
&lt;span class='s'&gt;HTTP/1.1 200 OK\r&lt;/span&gt;
&lt;span class='s'&gt;Content-Type: text/plain\r&lt;/span&gt;
&lt;span class='s'&gt;Content-Length: $msgLength\r&lt;/span&gt;
&lt;span class='s'&gt;Connection: close\r&lt;/span&gt;
&lt;span class='s'&gt;\r&lt;/span&gt;
&lt;span class='s'&gt;$msg&lt;/span&gt;
&lt;span class='s'&gt;RES;&lt;/span&gt;

    &lt;span class='nx'&gt;yield&lt;/span&gt; &lt;span class='nx'&gt;waitForWrite&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='nb'&gt;fwrite&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$response&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;

    &lt;span class='nb'&gt;fclose&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='nv'&gt;$scheduler&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;Scheduler&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='nv'&gt;$scheduler&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;newTask&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;server&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;8000&lt;/span&gt;&lt;span class='p'&gt;));&lt;/span&gt;
&lt;span class='nv'&gt;$scheduler&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;run&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;This will accept connections to &lt;code&gt;localhost:8000&lt;/code&gt; and just send back an HTTP response with whatever it was sent. Doing anything &amp;#8220;real&amp;#8221; would be a lot more complicated (properly handling HTTP requests is way outside the scope of this article). The above snippet just demos the general concept.&lt;/p&gt;

&lt;p&gt;You can try the server out using something like &lt;code&gt;ab -n 10000 -c 100 localhost:8000/&lt;/code&gt;. This will send 10000 requests to it with 100 of them arriving concurrently. Using these numbers I get a median response time of 10ms. But there is an issue with a few requests being handled &lt;em&gt;really&lt;/em&gt; slowly (like 5 seconds), which is why the total throughput is only 2000 reqs/s (with a 10ms response time it should be more like 10000 reqs/s). With a higher concurrency count (e.g. &lt;code&gt;-c 500&lt;/code&gt;) it mostly still works well, but some connections will throw a &amp;#8220;Connection reset by peer&amp;#8221; error. As I know very little about this low-level socket stuff I didn&amp;#8217;t try to figure out what the issue is.&lt;/p&gt;

&lt;h2 id='stacked_coroutines'&gt;Stacked coroutines&lt;/h2&gt;

&lt;p&gt;If you tried to build a larger system using our scheduler, you would soon run into a problem: we are used to breaking up code into smaller functions and calling them. But with coroutines this is no longer possible. E.g. consider the following code:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;echoTimes&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$msg&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$max&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;for&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$i&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='nv'&gt;$i&lt;/span&gt; &lt;span class='o'&gt;&amp;lt;=&lt;/span&gt; &lt;span class='nv'&gt;$max&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='o'&gt;++&lt;/span&gt;&lt;span class='nv'&gt;$i&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;echo&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='si'&gt;$msg&lt;/span&gt;&lt;span class='s2'&gt; iteration &lt;/span&gt;&lt;span class='si'&gt;$i&lt;/span&gt;&lt;span class='se'&gt;\n&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='nx'&gt;yield&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;task&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='nx'&gt;echoTimes&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;foo&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;10&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt; &lt;span class='c1'&gt;// print foo ten times&lt;/span&gt;
    &lt;span class='k'&gt;echo&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;---&lt;/span&gt;&lt;span class='se'&gt;\n&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='nx'&gt;echoTimes&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;bar&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;5&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt; &lt;span class='c1'&gt;// print bar five times&lt;/span&gt;
    &lt;span class='nx'&gt;yield&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='c1'&gt;// force it to be a coroutine&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='nv'&gt;$scheduler&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;Scheduler&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='nv'&gt;$scheduler&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;newTask&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;task&lt;/span&gt;&lt;span class='p'&gt;());&lt;/span&gt;
&lt;span class='nv'&gt;$scheduler&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;run&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;This code tries to put the recurring &amp;#8220;output n times&amp;#8221; code into a separate coroutine and then invoke it from the main task. But this won&amp;#8217;t work. As mentioned at the very beginning of this article, calling a generator (or coroutine) will not actually do anything; it will only return an object. This also happens in the above case. The &lt;code&gt;echoTimes&lt;/code&gt; calls won&amp;#8217;t do anything other than return an (unused) coroutine object.&lt;/p&gt;
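
&lt;p&gt;This behaviour is easy to check in isolation. The following sketch reuses &lt;code&gt;echoTimes&lt;/code&gt; from above; the output buffering and the variable names are only there for illustration:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&amp;lt;?php

function echoTimes($msg, $max) {
    for ($i = 1; $i &amp;lt;= $max; ++$i) {
        echo &amp;quot;$msg iteration $i\n&amp;quot;;
        yield;
    }
}

// Capture whatever the bare call prints (spoiler: nothing).
ob_start();
$gen = echoTimes(&amp;#39;foo&amp;#39;, 2); // only creates the generator object
$outputOfCall = ob_get_clean();

var_dump($gen instanceof Generator); // bool(true)
var_dump($outputOfCall);             // string(0) &amp;quot;&amp;quot;

// The body only runs once something drives the generator:
foreach ($gen as $ignored) {
}
// prints &amp;quot;foo iteration 1&amp;quot; and &amp;quot;foo iteration 2&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;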

&lt;p&gt;In order to still allow this we need to write a small wrapper around the bare coroutines. I&amp;#8217;ll call this a &amp;#8220;stacked coroutine&amp;#8221; because it will manage a stack of nested coroutine calls. It will be possible to call sub-coroutines by yielding them:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$retval = (yield someCoroutine($foo, $bar));&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The sub-coroutines will also be able to return a value, again by using &lt;code&gt;yield&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;yield retval(&amp;quot;I&amp;#39;m a return value!&amp;quot;);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;code&gt;retval&lt;/code&gt; function does nothing more than return a wrapper around the value, which signals that it&amp;#8217;s a return value:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;class&lt;/span&gt; &lt;span class='nc'&gt;CoroutineReturnValue&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;protected&lt;/span&gt; &lt;span class='nv'&gt;$value&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;__construct&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$value&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;value&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$value&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;getValue&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;value&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;retval&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$value&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;CoroutineReturnValue&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$value&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;In order to turn a coroutine into a stacked coroutine (which supports subcalls) we&amp;#8217;ll have to write another function (which is &lt;em&gt;obviously&lt;/em&gt; yet-another-coroutine):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;stackedCoroutine&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;Generator&lt;/span&gt; &lt;span class='nv'&gt;$gen&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='nv'&gt;$stack&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;SplStack&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

    &lt;span class='k'&gt;for&lt;/span&gt; &lt;span class='p'&gt;(;;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nv'&gt;$value&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$gen&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;current&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;

        &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$value&lt;/span&gt; &lt;span class='nx'&gt;instanceof&lt;/span&gt; &lt;span class='nx'&gt;Generator&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
            &lt;span class='nv'&gt;$stack&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;push&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$gen&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
            &lt;span class='nv'&gt;$gen&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$value&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
            &lt;span class='k'&gt;continue&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='p'&gt;}&lt;/span&gt;

        &lt;span class='nv'&gt;$isReturnValue&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$value&lt;/span&gt; &lt;span class='nx'&gt;instanceof&lt;/span&gt; &lt;span class='nx'&gt;CoroutineReturnValue&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;!&lt;/span&gt;&lt;span class='nv'&gt;$gen&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;valid&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='o'&gt;||&lt;/span&gt; &lt;span class='nv'&gt;$isReturnValue&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
            &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$stack&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;isEmpty&lt;/span&gt;&lt;span class='p'&gt;())&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
                &lt;span class='k'&gt;return&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
            &lt;span class='p'&gt;}&lt;/span&gt;

            &lt;span class='nv'&gt;$gen&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$stack&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;pop&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
            &lt;span class='nv'&gt;$gen&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;send&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$isReturnValue&lt;/span&gt; &lt;span class='o'&gt;?&lt;/span&gt; &lt;span class='nv'&gt;$value&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;getValue&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='o'&gt;:&lt;/span&gt; &lt;span class='k'&gt;NULL&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
            &lt;span class='k'&gt;continue&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='p'&gt;}&lt;/span&gt;

        &lt;span class='nv'&gt;$gen&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;send&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;yield&lt;/span&gt; &lt;span class='nv'&gt;$gen&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;key&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='o'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='nv'&gt;$value&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;This function acts as a simple proxy between the caller and the currently running subcoroutine; the proxying happens in the &lt;code&gt;$gen-&amp;gt;send(yield $gen-&amp;gt;key() =&amp;gt; $value);&lt;/code&gt; line. Additionally, it checks whether a yielded value is a generator, in which case it pushes the current coroutine onto the stack and starts running the new one. Once it gets a &lt;code&gt;CoroutineReturnValue&lt;/code&gt; (or the subcoroutine finishes without one), it pops the stack again and continues executing the previous coroutine, sending it the return value (or &lt;code&gt;null&lt;/code&gt;).&lt;/p&gt;
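
&lt;p&gt;To see the proxy in action, here is a minimal, self-contained sketch. It repeats the &lt;code&gt;retval&lt;/code&gt; and &lt;code&gt;stackedCoroutine&lt;/code&gt; definitions from above so that it can run on its own; the &lt;code&gt;inner&lt;/code&gt; and &lt;code&gt;outer&lt;/code&gt; coroutines and the &lt;code&gt;$seen&lt;/code&gt; array are hypothetical names made up for illustration:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;?php

class CoroutineReturnValue {
    protected $value;

    public function __construct($value) {
        $this-&amp;gt;value = $value;
    }

    public function getValue() {
        return $this-&amp;gt;value;
    }
}

function retval($value) {
    return new CoroutineReturnValue($value);
}

function stackedCoroutine(Generator $gen) {
    $stack = new SplStack;

    for (;;) {
        $value = $gen-&amp;gt;current();

        if ($value instanceof Generator) {
            // Subcall: remember the caller and descend into the callee.
            $stack-&amp;gt;push($gen);
            $gen = $value;
            continue;
        }

        $isReturnValue = $value instanceof CoroutineReturnValue;
        if (!$gen-&amp;gt;valid() || $isReturnValue) {
            if ($stack-&amp;gt;isEmpty()) {
                return;
            }

            // Callee is done: resume the caller with the result (or null).
            $gen = $stack-&amp;gt;pop();
            $gen-&amp;gt;send($isReturnValue ? $value-&amp;gt;getValue() : null);
            continue;
        }

        // Ordinary yield: pass it out and feed the reply back in.
        $gen-&amp;gt;send(yield $gen-&amp;gt;key() =&amp;gt; $value);
    }
}

// Hypothetical subcoroutine: yields once, then hands back a return value.
function inner($x) {
    yield &amp;#39;inner is working&amp;#39;;
    yield retval($x * 2);
}

// Hypothetical caller: receives the return value of the subcoroutine.
function outer() {
    $result = (yield inner(21));
    yield &amp;quot;inner returned $result&amp;quot;;
}

$seen = [];
foreach (stackedCoroutine(outer()) as $value) {
    $seen[] = $value;
}
// $seen is now [&amp;#39;inner is working&amp;#39;, &amp;#39;inner returned 42&amp;#39;]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Iterating the wrapped coroutine only ever shows the values meant for the outside world: the generator object yielded by &lt;code&gt;outer&lt;/code&gt; and the &lt;code&gt;CoroutineReturnValue&lt;/code&gt; yielded by &lt;code&gt;inner&lt;/code&gt; are both consumed by the proxy.&lt;/p&gt;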

&lt;p&gt;In order to make the stacked coroutines usable in tasks the &lt;code&gt;$this-&amp;gt;coroutine = $coroutine;&lt;/code&gt; line in the &lt;code&gt;Task&lt;/code&gt; constructor needs to be replaced with &lt;code&gt;$this-&amp;gt;coroutine = stackedCoroutine($coroutine);&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Now we can improve the webserver example from above a bit by grouping the wait+read (and wait+write and wait+accept) actions into functions. To group the related functionality I&amp;#8217;ll use a class:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;class&lt;/span&gt; &lt;span class='nc'&gt;CoSocket&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;protected&lt;/span&gt; &lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;__construct&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;socket&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;accept&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nx'&gt;yield&lt;/span&gt; &lt;span class='nx'&gt;waitForRead&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;socket&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
        &lt;span class='nx'&gt;yield&lt;/span&gt; &lt;span class='nx'&gt;retval&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;CoSocket&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;stream_socket_accept&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;socket&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;)));&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;read&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$size&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nx'&gt;yield&lt;/span&gt; &lt;span class='nx'&gt;waitForRead&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;socket&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
        &lt;span class='nx'&gt;yield&lt;/span&gt; &lt;span class='nx'&gt;retval&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;fread&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;socket&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$size&lt;/span&gt;&lt;span class='p'&gt;));&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;write&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$string&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nx'&gt;yield&lt;/span&gt; &lt;span class='nx'&gt;waitForWrite&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;socket&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
        &lt;span class='nb'&gt;fwrite&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;socket&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$string&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;close&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='o'&gt;@&lt;/span&gt;&lt;span class='nb'&gt;fclose&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;socket&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Now the server can be rewritten a bit more cleanly:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;server&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$port&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;echo&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;Starting server at port &lt;/span&gt;&lt;span class='si'&gt;$port&lt;/span&gt;&lt;span class='s2'&gt;...&lt;/span&gt;&lt;span class='se'&gt;\n&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

    &lt;span class='nv'&gt;$socket&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='o'&gt;@&lt;/span&gt;&lt;span class='nx'&gt;stream_socket_server&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;tcp://localhost:&lt;/span&gt;&lt;span class='si'&gt;$port&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$errNo&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$errStr&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;!&lt;/span&gt;&lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='k'&gt;throw&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;Exception&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$errStr&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$errNo&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;

    &lt;span class='nx'&gt;stream_set_blocking&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;

    &lt;span class='nv'&gt;$socket&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;CoSocket&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='k'&gt;while&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;true&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nx'&gt;yield&lt;/span&gt; &lt;span class='nx'&gt;newTask&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;
            &lt;span class='nx'&gt;handleClient&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;yield&lt;/span&gt; &lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;accept&lt;/span&gt;&lt;span class='p'&gt;())&lt;/span&gt;
        &lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;handleClient&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='nv'&gt;$data&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;yield&lt;/span&gt; &lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;read&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;8192&lt;/span&gt;&lt;span class='p'&gt;));&lt;/span&gt;

    &lt;span class='nv'&gt;$msg&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;Received following request:&lt;/span&gt;&lt;span class='se'&gt;\n\n&lt;/span&gt;&lt;span class='si'&gt;$data&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='nv'&gt;$msgLength&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nb'&gt;strlen&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$msg&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;

    &lt;span class='nv'&gt;$response&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s'&gt;&amp;lt;&amp;lt;&amp;lt;RES&lt;/span&gt;
&lt;span class='s'&gt;HTTP/1.1 200 OK\r&lt;/span&gt;
&lt;span class='s'&gt;Content-Type: text/plain\r&lt;/span&gt;
&lt;span class='s'&gt;Content-Length: $msgLength\r&lt;/span&gt;
&lt;span class='s'&gt;Connection: close\r&lt;/span&gt;
&lt;span class='s'&gt;\r&lt;/span&gt;
&lt;span class='s'&gt;$msg&lt;/span&gt;
&lt;span class='s'&gt;RES;&lt;/span&gt;

    &lt;span class='nx'&gt;yield&lt;/span&gt; &lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;write&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$response&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='nx'&gt;yield&lt;/span&gt; &lt;span class='nv'&gt;$socket&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;close&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;h2 id='error_handling'&gt;Error handling&lt;/h2&gt;

&lt;p&gt;As a good programmer you have obviously noticed that the above examples all lack error handling. Pretty much every socket operation is fallible and can produce errors. I left error handling out on purpose, because it is really tedious (especially for sockets!) and would easily blow up the code to several times its size.&lt;/p&gt;

&lt;p&gt;But still I&amp;#8217;d like to cover how error handling for coroutines works in general: Coroutines provide the ability to throw exceptions &lt;em&gt;inside&lt;/em&gt; them using the &lt;code&gt;throw()&lt;/code&gt; method. As of this writing this method does not yet exist in PHP&amp;#8217;s implementation, but I will commit it later today.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;throw()&lt;/code&gt; method takes an exception and throws it at the current suspension point in the coroutine. Consider this code:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;gen&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;echo&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;Foo&lt;/span&gt;&lt;span class='se'&gt;\n&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='k'&gt;try&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nx'&gt;yield&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='k'&gt;catch&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;Exception&lt;/span&gt; &lt;span class='nv'&gt;$e&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;echo&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;Exception: &lt;/span&gt;&lt;span class='si'&gt;{&lt;/span&gt;&lt;span class='nv'&gt;$e&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;getMessage&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt;&lt;span class='si'&gt;}&lt;/span&gt;&lt;span class='se'&gt;\n&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='k'&gt;echo&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;Bar&lt;/span&gt;&lt;span class='se'&gt;\n&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='nv'&gt;$gen&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nx'&gt;gen&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
&lt;span class='nv'&gt;$gen&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;rewind&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;                     &lt;span class='c1'&gt;// echoes &amp;quot;Foo&amp;quot;&lt;/span&gt;
&lt;span class='nv'&gt;$gen&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;throw&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;Exception&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;Test&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;));&lt;/span&gt; &lt;span class='c1'&gt;// echoes &amp;quot;Exception: Test&amp;quot;&lt;/span&gt;
                                    &lt;span class='c1'&gt;// and &amp;quot;Bar&amp;quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
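&lt;p&gt;Two further details of &lt;code&gt;throw()&lt;/code&gt; are worth knowing: it returns the next value the coroutine yields (just like &lt;code&gt;send()&lt;/code&gt;), and an exception that the coroutine does not catch propagates back out of the &lt;code&gt;throw()&lt;/code&gt; call. A short sketch of both, using hypothetical generator names:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;?php

// Hypothetical coroutine that catches the injected exception.
function catching() {
    try {
        yield &amp;#39;first&amp;#39;;
    } catch (Exception $e) {
        yield &amp;#39;recovered from &amp;#39; . $e-&amp;gt;getMessage();
    }
}

// Hypothetical coroutine without a try/catch around its yield.
function notCatching() {
    yield &amp;#39;first&amp;#39;;
}

$gen = catching();
$gen-&amp;gt;current();                             // run up to the first yield
$next = $gen-&amp;gt;throw(new Exception(&amp;#39;Test&amp;#39;));  // resumes inside the catch block
// $next is &amp;#39;recovered from Test&amp;#39;

$gen = notCatching();
$gen-&amp;gt;current();
$propagated = null;
try {
    $gen-&amp;gt;throw(new Exception(&amp;#39;Uncaught&amp;#39;));
} catch (Exception $e) {
    // The exception came straight back out of throw().
    $propagated = $e-&amp;gt;getMessage();
}
// $propagated is &amp;#39;Uncaught&amp;#39;, and $gen-&amp;gt;valid() is now false
&lt;/code&gt;&lt;/pre&gt;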
&lt;p&gt;This is really awesome for our purposes, because we can make system calls and subcoroutine calls throw exceptions. For the system calls the &lt;code&gt;Scheduler::run()&lt;/code&gt; method needs a small adjustment:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$retval&lt;/span&gt; &lt;span class='nx'&gt;instanceof&lt;/span&gt; &lt;span class='nx'&gt;SystemCall&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;try&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nv'&gt;$retval&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='k'&gt;catch&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;Exception&lt;/span&gt; &lt;span class='nv'&gt;$e&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;setException&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$e&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
        &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;schedule&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='k'&gt;continue&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;And the &lt;code&gt;Task&lt;/code&gt; class needs to handle &lt;code&gt;throw&lt;/code&gt; calls too:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;class&lt;/span&gt; &lt;span class='nc'&gt;Task&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='c1'&gt;// ...&lt;/span&gt;
    &lt;span class='k'&gt;protected&lt;/span&gt; &lt;span class='nv'&gt;$exception&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;null&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;setException&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$exception&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;exception&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$exception&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;run&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;beforeFirstYield&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
            &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;beforeFirstYield&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;false&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
            &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;coroutine&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;current&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
        &lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='k'&gt;elseif&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;exception&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
            &lt;span class='nv'&gt;$retval&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;coroutine&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;throw&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;exception&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
            &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;exception&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;null&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
            &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='nv'&gt;$retval&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='k'&gt;else&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
            &lt;span class='nv'&gt;$retval&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;coroutine&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;send&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;sendValue&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
            &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;sendValue&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;null&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
            &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='nv'&gt;$retval&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='c1'&gt;// ...&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
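&lt;p&gt;To see this deferred delivery in isolation, here is a sketch that drives a stripped-down task by hand, without a scheduler. &lt;code&gt;MiniTask&lt;/code&gt; is a hypothetical stand-in for the &lt;code&gt;Task&lt;/code&gt; class that keeps only the properties &lt;code&gt;run()&lt;/code&gt; touches; &lt;code&gt;worker&lt;/code&gt; is likewise made up for the example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;?php

// Hypothetical, reduced version of Task: just enough to show run().
class MiniTask {
    protected $coroutine;
    protected $sendValue = null;
    protected $exception = null;
    protected $beforeFirstYield = true;

    public function __construct(Generator $coroutine) {
        $this-&amp;gt;coroutine = $coroutine;
    }

    public function setException($exception) {
        $this-&amp;gt;exception = $exception;
    }

    public function run() {
        if ($this-&amp;gt;beforeFirstYield) {
            $this-&amp;gt;beforeFirstYield = false;
            return $this-&amp;gt;coroutine-&amp;gt;current();
        } elseif ($this-&amp;gt;exception) {
            // Deliver the stored exception at the suspension point.
            $retval = $this-&amp;gt;coroutine-&amp;gt;throw($this-&amp;gt;exception);
            $this-&amp;gt;exception = null;
            return $retval;
        } else {
            $retval = $this-&amp;gt;coroutine-&amp;gt;send($this-&amp;gt;sendValue);
            $this-&amp;gt;sendValue = null;
            return $retval;
        }
    }
}

// Hypothetical coroutine that survives a failed request.
function worker() {
    try {
        yield &amp;#39;request&amp;#39;;
    } catch (Exception $e) {
        yield &amp;#39;caught: &amp;#39; . $e-&amp;gt;getMessage();
    }
}

$task = new MiniTask(worker());
$first = $task-&amp;gt;run();                          // &amp;#39;request&amp;#39;
$task-&amp;gt;setException(new Exception(&amp;#39;denied&amp;#39;));   // nothing is thrown yet
$second = $task-&amp;gt;run();                         // &amp;#39;caught: denied&amp;#39;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The exception is only stored by &lt;code&gt;setException()&lt;/code&gt;; it is thrown into the coroutine the next time the task runs, which is why the scheduler has to reschedule the task after setting it.&lt;/p&gt;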
&lt;p&gt;Now we can start throwing exceptions from system calls! E.g. for the &lt;code&gt;killTask&lt;/code&gt; call, let&amp;#8217;s throw an exception if the passed task ID is invalid:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;killTask&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$tid&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;SystemCall&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;
        &lt;span class='k'&gt;function&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;Task&lt;/span&gt; &lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nx'&gt;Scheduler&lt;/span&gt; &lt;span class='nv'&gt;$scheduler&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='k'&gt;use&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$tid&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
            &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$scheduler&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;killTask&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$tid&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
                &lt;span class='nv'&gt;$scheduler&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;schedule&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$task&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
            &lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='k'&gt;else&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
                &lt;span class='k'&gt;throw&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;InvalidArgumentException&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;Invalid task ID!&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
            &lt;span class='p'&gt;}&lt;/span&gt;
        &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Try it out:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;task&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;try&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nx'&gt;yield&lt;/span&gt; &lt;span class='nx'&gt;killTask&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;500&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='k'&gt;catch&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;Exception&lt;/span&gt; &lt;span class='nv'&gt;$e&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;echo&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;Tried to kill task 500 but failed: &amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$e&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;getMessage&lt;/span&gt;&lt;span class='p'&gt;(),&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='se'&gt;\n&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Sadly this won&amp;#8217;t work properly yet, because the &lt;code&gt;stackedCoroutine&lt;/code&gt; function doesn&amp;#8217;t handle the exception correctly. To fix it the function needs some modifications:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;stackedCoroutine&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;Generator&lt;/span&gt; &lt;span class='nv'&gt;$gen&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='nv'&gt;$stack&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;SplStack&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='nv'&gt;$exception&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;null&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

    &lt;span class='k'&gt;for&lt;/span&gt; &lt;span class='p'&gt;(;;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;try&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
            &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$exception&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
                &lt;span class='nv'&gt;$gen&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;throw&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$exception&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
                &lt;span class='nv'&gt;$exception&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;null&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
                &lt;span class='k'&gt;continue&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
            &lt;span class='p'&gt;}&lt;/span&gt;

            &lt;span class='nv'&gt;$value&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$gen&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;current&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;

            &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$value&lt;/span&gt; &lt;span class='nx'&gt;instanceof&lt;/span&gt; &lt;span class='nx'&gt;Generator&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
                &lt;span class='nv'&gt;$stack&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;push&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$gen&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
                &lt;span class='nv'&gt;$gen&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$value&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
                &lt;span class='k'&gt;continue&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
            &lt;span class='p'&gt;}&lt;/span&gt;

            &lt;span class='nv'&gt;$isReturnValue&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$value&lt;/span&gt; &lt;span class='nx'&gt;instanceof&lt;/span&gt; &lt;span class='nx'&gt;CoroutineReturnValue&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
            &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;!&lt;/span&gt;&lt;span class='nv'&gt;$gen&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;valid&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='o'&gt;||&lt;/span&gt; &lt;span class='nv'&gt;$isReturnValue&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
                &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$stack&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;isEmpty&lt;/span&gt;&lt;span class='p'&gt;())&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
                    &lt;span class='k'&gt;return&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
                &lt;span class='p'&gt;}&lt;/span&gt;

                &lt;span class='nv'&gt;$gen&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$stack&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;pop&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
                &lt;span class='nv'&gt;$gen&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;send&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$isReturnValue&lt;/span&gt; &lt;span class='o'&gt;?&lt;/span&gt; &lt;span class='nv'&gt;$value&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;getValue&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='o'&gt;:&lt;/span&gt; &lt;span class='k'&gt;NULL&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
                &lt;span class='k'&gt;continue&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
            &lt;span class='p'&gt;}&lt;/span&gt;

            &lt;span class='k'&gt;try&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
                &lt;span class='nv'&gt;$sendValue&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;yield&lt;/span&gt; &lt;span class='nv'&gt;$gen&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;key&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='o'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='nv'&gt;$value&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
            &lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='k'&gt;catch&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;Exception&lt;/span&gt; &lt;span class='nv'&gt;$e&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
                &lt;span class='nv'&gt;$gen&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;throw&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$e&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
                &lt;span class='k'&gt;continue&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
            &lt;span class='p'&gt;}&lt;/span&gt;

            &lt;span class='nv'&gt;$gen&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;send&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$sendValue&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
        &lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='k'&gt;catch&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;Exception&lt;/span&gt; &lt;span class='nv'&gt;$e&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
            &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$stack&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;isEmpty&lt;/span&gt;&lt;span class='p'&gt;())&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
                &lt;span class='k'&gt;throw&lt;/span&gt; &lt;span class='nv'&gt;$e&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
            &lt;span class='p'&gt;}&lt;/span&gt;

            &lt;span class='nv'&gt;$gen&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$stack&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;pop&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
            &lt;span class='nv'&gt;$exception&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$e&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
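&lt;p&gt;As a standalone illustration of the mechanism this fix relies on, here is a minimal sketch of how &lt;code&gt;Generator::throw()&lt;/code&gt; behaves. The &lt;code&gt;worker&lt;/code&gt; generator and its messages are made up for this example and are not part of the scheduler:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&amp;lt;?php

// Hypothetical generator, only used for illustration.
function worker() {
    try {
        yield &amp;#39;working&amp;#39;;
    } catch (Exception $e) {
        // The exception surfaces at the yield where the generator is paused.
        echo &amp;#39;Caught inside generator: &amp;#39;, $e-&amp;gt;getMessage(), &amp;quot;\n&amp;quot;;
        yield &amp;#39;recovered&amp;#39;;
    }
}

$gen = worker();
var_dump($gen-&amp;gt;current()); // string(7) &amp;quot;working&amp;quot;

// throw() raises the exception at the current yield, resumes the generator
// and returns the next yielded value.
var_dump($gen-&amp;gt;throw(new Exception(&amp;#39;boom&amp;#39;))); // string(9) &amp;quot;recovered&amp;quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;If the generator does not catch the exception, it propagates out of the &lt;code&gt;throw()&lt;/code&gt; call instead. That is why &lt;code&gt;stackedCoroutine&lt;/code&gt; wraps its loop body in a &lt;code&gt;try&lt;/code&gt;/&lt;code&gt;catch&lt;/code&gt; and retries with the parent generator from the stack.&lt;/p&gt;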
&lt;h2 id='wrapping_up'&gt;Wrapping up&lt;/h2&gt;

&lt;p&gt;In this article we built a task scheduler using cooperative multitasking, including the ability to perform &amp;#8220;system calls&amp;#8221;, do non-blocking IO operations and handle errors. The really cool thing about all this is that the resulting code for the tasks looks totally synchronous, even though it is performing a lot of asynchronous operations. If you want to read data from a socket you don&amp;#8217;t have to pass some callback or register an event listener. Instead you write &lt;code&gt;yield $socket-&amp;gt;read()&lt;/code&gt;, which is basically what you would normally do too, just with a &lt;code&gt;yield&lt;/code&gt; in front of it.&lt;/p&gt;

&lt;p&gt;When I first heard about all this I found this concept &lt;em&gt;totally awesome&lt;/em&gt; and that&amp;#8217;s what motivated me to implement it in PHP. At the same time I find coroutines really scary. There is a thin line between awesome code and a total mess and I think coroutines sit exactly on that line. It&amp;#8217;s hard for me to say whether writing async code in the way outlined above is really beneficial.&lt;/p&gt;

&lt;p&gt;In any case, I think it&amp;#8217;s an interesting topic and I hope you found it interesting too. Comments welcome :)&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Are PHP developers functophobic?</title>
      <link>http://nikic.github.com/2012/08/10/Are-PHP-developers-functophobic.html</link>
      <pubDate>Fri, 10 Aug 2012 00:00:00 -0700</pubDate>
      <author>nikic@php.net (NikiC)</author>
      <guid>http://nikic.github.com/2012/08/10/Are-PHP-developers-functophobic</guid>
      <description>&lt;p&gt;There is this one thing that I noticed recently and that concerns me: PHP devs don&amp;#8217;t use functions.&lt;/p&gt;

&lt;p&gt;Now, that was overly general, so let me clarify: PHP developers who have reached a certain degree of sophistication basically stop using plain functions - instead everything goes all classes and methods. At least that&amp;#8217;s the observation I made when looking at various open-source libraries and frameworks. The only type of function you&amp;#8217;ll find in any of the &amp;#8220;high-quality&amp;#8221; libs are anonymous functions. But that&amp;#8217;s pretty much it.&lt;/p&gt;

&lt;p&gt;If you look at other languages, the landscape is different. For example Python code will to a large part be class definitions, but there will always be normal functions in between.&lt;/p&gt;

&lt;p&gt;So in PHP functions are basically used in two cases: If the programmer doesn&amp;#8217;t yet use OOP or if the script isn&amp;#8217;t worth using OOP for. The notion of &amp;#8220;helper function&amp;#8221; or something like that does not seem to exist.&lt;/p&gt;

&lt;p&gt;This somewhat concerns me because I think that sticking everything into classes isn&amp;#8217;t really the right thing to do. This kind of behavior usually leads to Class-Oriented Programming, where you are basically just using classes for the sake of using them. (Reminder: Using classes does &lt;em&gt;not&lt;/em&gt; mean your code is object-oriented!)&lt;/p&gt;

&lt;p&gt;I also think that the reluctance to define helper functions is what leads to the push for more and more functions in the PHP core. For example, I have recently seen a few people asking for some kind of &lt;code&gt;array_rand_value&lt;/code&gt; function. This function would basically be the same as &lt;code&gt;array_rand&lt;/code&gt;, but it would return a value instead of a key. So it could be defined in userland code in two lines:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;array_rand_value&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;array&lt;/span&gt; &lt;span class='nv'&gt;$array&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;empty&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$array&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt; &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='k'&gt;null&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='nv'&gt;$array&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nb'&gt;array_rand&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$array&lt;/span&gt;&lt;span class='p'&gt;)];&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;This is a tiny function that you can easily define yourself. But then arises what I think is the main problem: Where should you put it? How do you incorporate it with the autoloader?&lt;/p&gt;

&lt;p&gt;PHP largely inherited the Java OOP model, and one thing that came with it is a one-to-one mapping between classes and files. PHP does not enforce this, but it is a very common convention. PHP&amp;#8217;s autoloading support and the PSR-0 standard emphasize this further.&lt;/p&gt;

&lt;p&gt;Other languages similar to PHP do not have this convention. For example in Python it is very normal to define multiple related classes and functions in one file. In Python a file is rather a module, i.e. a group of related components. In such a setup it is obviously much easier to define small functions in between.&lt;/p&gt;

&lt;p&gt;Generally I think that the one-to-one file mapping so common in PHP is problematic. Apart from making it hard to use functions it also adds an additional cost to creating small classes:&lt;/p&gt;

&lt;p&gt;Object oriented programming done right usually leads to a large number of small classes. This is great for code reuse, maintainability and testing.&lt;/p&gt;

&lt;p&gt;But in PHP you have to create a new file for every one of those classes. And this is really getting in my way. I&amp;#8217;d have no problem batch-defining ten tiny classes in one file. But I feel like creating ten distinct files for them is counterproductive (and hampers maintainability).&lt;/p&gt;

&lt;p&gt;Another tangentially related &amp;#8220;standard&amp;#8221; practice I feel bad about is the excessive use of doccomments. In most cases phpdoc comments just state the obvious, while at the same time increasing the code size by a factor of two or three. I have no problem with doccomments &lt;em&gt;where necessary&lt;/em&gt;, but in most cases (at least with well designed code) the behavior should be obvious from the method and parameter names.&lt;/p&gt;

&lt;p&gt;So, basically, what I&amp;#8217;m thinking about here is being more &amp;#8220;compact&amp;#8221;: Instead of defining a file for every class, put related (smaller) classes into one file. Instead of defining a class for everything, just use a function. Instead of cluttering the code with lots of useless doccomments, just leave them out unless really necessary.&lt;/p&gt;

&lt;p&gt;But it&amp;#8217;s just a thought ;) Maybe I got it all wrong.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>How to add new (syntactic) features to PHP</title>
      <link>http://nikic.github.com/2012/07/27/How-to-add-new-syntactic-features-to-PHP.html</link>
      <pubDate>Fri, 27 Jul 2012 00:00:00 -0700</pubDate>
      <author>nikic@php.net (NikiC)</author>
      <guid>http://nikic.github.com/2012/07/27/How-to-add-new-syntactic-features-to-PHP</guid>
      <description>&lt;p&gt;Several people have recently asked me where you should start if you want to add some new (syntactic) feature to PHP. As I&amp;#8217;m not aware of any existing tutorials on that matter, I&amp;#8217;ll try to illustrate the whole process in the following. At the same time this is a general introduction to the workings of the Zend Engine. So upfront: I apologize for this overly long post.&lt;/p&gt;

&lt;p&gt;This post assumes that you already have some basic knowledge of C and also know the fundamental concepts of the PHP implementation (like zvals). If not, you should read up on them beforehand.&lt;/p&gt;

&lt;p&gt;As an example I&amp;#8217;ll use the addition of an &lt;code&gt;in&lt;/code&gt; operator which you might already know from other languages like Python. It works as follows:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$words = [&amp;#39;hello&amp;#39;, &amp;#39;world&amp;#39;, &amp;#39;foo&amp;#39;, &amp;#39;bar&amp;#39;];
var_dump(&amp;#39;hello&amp;#39; in $words); // true
var_dump(&amp;#39;foo&amp;#39; in $words);   // true
var_dump(&amp;#39;blub&amp;#39; in $words);  // false

$string = &amp;#39;PHP is fun!&amp;#39;;
var_dump(&amp;#39;PHP&amp;#39; in $string);    // true
var_dump(&amp;#39;Python&amp;#39; in $string); // false&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So basically, for arrays the &lt;code&gt;in&lt;/code&gt; operator is the same as the &lt;code&gt;in_array&lt;/code&gt; function (but without the needle/haystack problem) and for strings it&amp;#8217;s like doing a &lt;code&gt;false !== strpos($str2, $str1)&lt;/code&gt;.&lt;/p&gt;
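
&lt;p&gt;To make the intended semantics concrete, here is a rough userland sketch of the same behavior. The function name &lt;code&gt;in_emulated&lt;/code&gt; is made up for this illustration and is not part of PHP or of the patch developed below; treating every non-array haystack as a string is likewise just an assumption of this sketch:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;?php

// hypothetical helper that mimics the proposed `in` operator
function in_emulated($needle, $haystack) {
    if (is_array($haystack)) {
        return in_array($needle, $haystack);
    }

    // assumption: anything that is not an array is treated as a string
    return false !== strpos($haystack, $needle);
}

var_dump(in_emulated(&amp;#39;hello&amp;#39;, [&amp;#39;hello&amp;#39;, &amp;#39;world&amp;#39;])); // true
var_dump(in_emulated(&amp;#39;blub&amp;#39;, [&amp;#39;hello&amp;#39;, &amp;#39;world&amp;#39;]));  // false
var_dump(in_emulated(&amp;#39;PHP&amp;#39;, &amp;#39;PHP is fun!&amp;#39;));        // true&lt;/code&gt;&lt;/pre&gt;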

&lt;h2 id='prerequisites'&gt;Prerequisites&lt;/h2&gt;

&lt;p&gt;Before we can get going, you&amp;#8217;ll have to first check out and compile PHP. To do so, you need a few tools. Most of them are probably already installed on your system, but you may need to install &amp;#8220;re2c&amp;#8221; and &amp;#8220;bison&amp;#8221; using the package manager of your choice. On Ubuntu you&amp;#8217;d do this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo apt-get install re2c
$ sudo apt-get install bison&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Next, clone php-src from git and compile it:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# get source code
$ git clone http://git.php.net/repository/php-src.git
$ cd php-src
# create new branch for in operator
$ git checkout -b addInOperator
# build ./configure script
$ ./buildconf
# configure PHP in debug mode and with thread safety
$ ./configure --disable-all --enable-debug --enable-maintainer-zts
# compile (4 is the number of cores you have)
$ make -j4&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The PHP binary should now be available in &lt;code&gt;sapi/cli/php&lt;/code&gt;. You can try to do a few things:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sapi/cli/php -v
$ sapi/cli/php -r &amp;#39;echo &amp;quot;Hello World!&amp;quot;;&amp;#39;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now that you have a (hopefully) working PHP compile, we&amp;#8217;ll take a look at what PHP actually does when it runs a script.&lt;/p&gt;

&lt;h2 id='the_life_of_a_php_script'&gt;The life of a PHP script&lt;/h2&gt;

&lt;p&gt;To run a script PHP goes through three main phases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Tokenization&lt;/li&gt;

&lt;li&gt;Parsing &amp;amp; Compilation&lt;/li&gt;

&lt;li&gt;Execution&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In the following I&amp;#8217;ll explain what exactly is done in each phase, how it is implemented and what we need to change in order to get the &lt;code&gt;in&lt;/code&gt; operator working.&lt;/p&gt;

&lt;h2 id='tokenization'&gt;Tokenization&lt;/h2&gt;

&lt;p&gt;In the first phase PHP reads in the source code and breaks it down into smaller units called &amp;#8220;tokens&amp;#8221;. For example the PHP code &lt;code&gt;&amp;lt;?php echo &amp;quot;Hello World!&amp;quot;;&lt;/code&gt; would be broken down to the following tokens:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;T_OPEN_TAG (&amp;lt;?php )
T_ECHO (echo)
T_WHITESPACE ( )
T_CONSTANT_ENCAPSED_STRING (&amp;quot;Hello World!&amp;quot;)
&amp;#39;;&amp;#39;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;As you can see the raw source code was broken down into semantically meaningful tokens. The process of doing so is referred to as tokenization, lexing or scanning and is implemented in the &lt;a href='http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_language_scanner.l'&gt;&lt;code&gt;zend_language_scanner.l&lt;/code&gt;&lt;/a&gt; file of the &lt;code&gt;Zend/&lt;/code&gt; directory.&lt;/p&gt;
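
&lt;p&gt;You can produce a token listing like the one above from userland with the tokenizer extension&amp;#8217;s &lt;code&gt;token_get_all()&lt;/code&gt; and &lt;code&gt;token_name()&lt;/code&gt; functions. This is only a small sketch for experimenting; it assumes a PHP binary that has the tokenizer extension enabled, which may not be the case for a build configured with &lt;code&gt;--disable-all&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;?php

$code = &amp;#39;&amp;lt;?php echo &amp;quot;Hello World!&amp;quot;;&amp;#39;;

foreach (token_get_all($code) as $token) {
    if (is_array($token)) {
        // named tokens are arrays: [token id, text, line number]
        echo token_name($token[0]), &amp;#39; (&amp;#39;, $token[1], &amp;quot;)\n&amp;quot;;
    } else {
        // single-character tokens like ; are returned as plain strings
        echo &amp;quot;&amp;#39;&amp;quot;, $token, &amp;quot;&amp;#39;\n&amp;quot;;
    }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The script prints one line per token, in the same format as the listing above.&lt;/p&gt;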

&lt;p&gt;If you open the file and scroll down a bit (to somewhere around line 1000), you&amp;#8217;ll find a large number of token definitions that look like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;ST_IN_SCRIPTING&amp;gt;&amp;quot;exit&amp;quot; {
    return T_EXIT;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The meaning should be rather obvious: If &lt;code&gt;exit&lt;/code&gt; is encountered in the source code, the lexer should tag it as &lt;code&gt;T_EXIT&lt;/code&gt;. The content between &lt;code&gt;&amp;lt;&lt;/code&gt; and &lt;code&gt;&amp;gt;&lt;/code&gt; is the state that the text should be matched in. &lt;code&gt;ST_IN_SCRIPTING&lt;/code&gt; is the normal state for PHP code. Some examples of other states are &lt;code&gt;ST_DOUBLE_QUOTE&lt;/code&gt; (in double quoted string), &lt;code&gt;ST_HEREDOC&lt;/code&gt; (in heredoc string), etc.&lt;/p&gt;

&lt;p&gt;Another thing that can be done in the scanning routines is specifying a &amp;#8220;semantic&amp;#8221; value (also called &amp;#8220;lower value&amp;#8221; or &amp;#8220;lval&amp;#8221; for short). Here is an example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;ST_IN_SCRIPTING,ST_VAR_OFFSET&amp;gt;{LABEL} {
    zend_copy_value(zendlval, yytext, yyleng);
    zendlval-&amp;gt;type = IS_STRING;
    return T_STRING;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;code&gt;{LABEL}&lt;/code&gt; matches a PHP identifier (it is defined as &lt;code&gt;[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*&lt;/code&gt;) and the code then returns the token &lt;code&gt;T_STRING&lt;/code&gt;. Additionally it copies the text of the token into &lt;code&gt;zendlval&lt;/code&gt;. So if the lexer encounters an identifier like &lt;code&gt;FooBarClass&lt;/code&gt; it&amp;#8217;ll set &lt;code&gt;FooBarClass&lt;/code&gt; as the lval. The same is also done for strings, numbers, variable names, etc.&lt;/p&gt;
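
&lt;p&gt;If you want to get a feeling for what &lt;code&gt;{LABEL}&lt;/code&gt; accepts, you can try the same character classes with &lt;code&gt;preg_match&lt;/code&gt;. This is just an illustrative sketch: the &lt;code&gt;^&lt;/code&gt; and &lt;code&gt;$&lt;/code&gt; anchors and the sample strings are additions for this example, while the lexer itself matches the pattern at the current position in the source:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;?php

// the LABEL pattern from the lexer, anchored so that the whole string must match
$label = &amp;#39;/^[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*$/&amp;#39;;

var_dump(preg_match($label, &amp;#39;FooBarClass&amp;#39;)); // int(1)
var_dump(preg_match($label, &amp;#39;_private1&amp;#39;));   // int(1)
var_dump(preg_match($label, &amp;#39;1abc&amp;#39;));        // int(0), must not start with a digit&lt;/code&gt;&lt;/pre&gt;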

&lt;p&gt;Luckily the &lt;code&gt;in&lt;/code&gt; operator does not require in-depth knowledge of the lexer. We just have to add this code snippet somewhere in the file (analogous to the &lt;code&gt;exit&lt;/code&gt; token above):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;ST_IN_SCRIPTING&amp;gt;&amp;quot;in&amp;quot; {
    return T_IN;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Furthermore we have to let the engine know that we added a new token. To do so, open the &lt;a href='http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_language_parser.y'&gt;&lt;code&gt;zend_language_parser.y&lt;/code&gt;&lt;/a&gt; file and insert the following line somewhere among the other &lt;code&gt;%token&lt;/code&gt; declarations:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;%token T_IN &amp;quot;in (T_IN)&amp;quot;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now you should compile PHP again using &lt;code&gt;make -j4&lt;/code&gt; (you have to run it in the top level &lt;code&gt;php-src&lt;/code&gt; folder, not in &lt;code&gt;Zend/&lt;/code&gt;). This will generate a new lexer using re2c and compile it. To test that it worked, you can try running something like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sapi/cli/php -r &amp;#39;in&amp;#39;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This should give you a nice parse error:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Parse error: syntax error, unexpected &amp;#39;in&amp;#39; (T_IN) in Command line code on line 1&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A last thing we have to do is regenerate the data used by the &lt;a href='http://php.net/tokenizer'&gt;tokenizer extension&lt;/a&gt; (which exposes the internal lexer to userland PHP code). For this you have to &lt;code&gt;cd&lt;/code&gt; into the &lt;code&gt;ext/tokenizer&lt;/code&gt; folder and execute &lt;code&gt;./tokenizer_data_gen.sh&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If you run &lt;code&gt;git diff --stat&lt;/code&gt; now, you will see something like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Zend/zend_language_parser.y       |    1 +
Zend/zend_language_scanner.c      | 1765 +++++++++++++++++++------------------
Zend/zend_language_scanner.l      |    4 +
Zend/zend_language_scanner_defs.h |    2 +-
ext/tokenizer/tokenizer_data.c    |    4 +-
5 files changed, 904 insertions(+), 872 deletions(-)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;a href='http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_language_scanner.c'&gt;&lt;code&gt;zend_language_scanner.c&lt;/code&gt;&lt;/a&gt; file is the actual lexer generated using re2c. Because it contains line number information every change to the lexer will cause a huge diff. So don&amp;#8217;t worry about that ;)&lt;/p&gt;

&lt;h2 id='parsing__compilation'&gt;Parsing &amp;amp; Compilation&lt;/h2&gt;

&lt;p&gt;Now that the source code is broken down into meaningful tokens, PHP has to recognize larger structures like &amp;#8220;this is an &lt;code&gt;if&lt;/code&gt; block&amp;#8221; or &amp;#8220;you&amp;#8217;re defining a function there&amp;#8221;. This process is called parsing and is defined in the &lt;a href='http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_language_parser.y'&gt;&lt;code&gt;zend_language_parser.y&lt;/code&gt;&lt;/a&gt; file. Again this is just a definition file and the actual parser is generated using bison.&lt;/p&gt;

&lt;p&gt;To understand how the parser definitions work, let&amp;#8217;s look at an example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;class_statement:
        variable_modifiers { CG(access_type) = Z_LVAL($1.u.constant); } class_variable_declaration &amp;#39;;&amp;#39;
    |   class_constant_declaration &amp;#39;;&amp;#39;
    |   trait_use_statement
    |   method_modifiers function is_reference T_STRING { zend_do_begin_function_declaration(&amp;amp;$2, &amp;amp;$4, 1, $3.op_type, &amp;amp;$1 TSRMLS_CC); } &amp;#39;(&amp;#39;
           parameter_list &amp;#39;)&amp;#39; method_body { zend_do_abstract_method(&amp;amp;$4, &amp;amp;$1, &amp;amp;$9 TSRMLS_CC); zend_do_end_function_declaration(&amp;amp;$2 TSRMLS_CC); }
;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For now, let&amp;#8217;s leave the parts in curly braces out, so we&amp;#8217;re left with the following:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;class_statement:
        variable_modifiers class_variable_declaration &amp;#39;;&amp;#39;
    |   class_constant_declaration &amp;#39;;&amp;#39;
    |   trait_use_statement
    |   method_modifiers function is_reference T_STRING &amp;#39;(&amp;#39; parameter_list &amp;#39;)&amp;#39; method_body
;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You can read this as:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;A class statement is
        a variable declaration (with access modifier)
    or  a class constant declaration
    or  a trait use statement
    or  a method (with method modifier, optional return-by-ref, method name, parameter list and method body)
.&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;To find out what exactly a &amp;#8220;method modifier&amp;#8221; is you&amp;#8217;d then go to the &lt;code&gt;method_modifiers&lt;/code&gt; definition, and so on. This should be fairly straightforward.&lt;/p&gt;

&lt;p&gt;So in order to add &lt;code&gt;in&lt;/code&gt; support to the parser, all you have to do is add a new &lt;code&gt;expr T_IN expr&lt;/code&gt; rule to &lt;code&gt;expr_without_variable&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;expr_without_variable:
    ...
    |   expr T_IN expr
    ...
;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you now run &lt;code&gt;make -j4&lt;/code&gt; again, bison will attempt to rebuild the parser, but it will fail with the following rather obscure error message:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;conflicts: 87 shift/reduce
/some/path/php-src/Zend/zend_language_parser.y: expected 3 shift/reduce conflicts
make: *** [/some/path/php-src/Zend/zend_language_parser.c] Error 1&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A shift/reduce conflict basically means that the parser doesn&amp;#8217;t know what to do in some situation. The PHP grammar has three shift/reduce conflicts by itself (which is expected due to stuff like the dangling elseif/else ambiguity). The remaining 84 conflicts are caused by the new rule.&lt;/p&gt;

&lt;p&gt;The reason is that we didn&amp;#8217;t specify how &lt;code&gt;in&lt;/code&gt; should behave around other operators. An example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// if you write
$foo in $bar &amp;amp;&amp;amp; $someOtherCond
// should PHP interpret this as
($foo in $bar) &amp;amp;&amp;amp; $someOtherCond
// or as
$foo in ($bar &amp;amp;&amp;amp; $someOtherCond)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The above is called &amp;#8220;operator precedence&amp;#8221;. A related concept is &amp;#8220;operator associativity&amp;#8221;, which determines what happens when you write &lt;code&gt;$foo in $bar in $baz&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In order to fix the shift/reduce conflicts all you have to do is find the following line at the start of the parser and add &lt;code&gt;T_IN&lt;/code&gt; at the end of it:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;%nonassoc &amp;#39;&amp;lt;&amp;#39; T_IS_SMALLER_OR_EQUAL &amp;#39;&amp;gt;&amp;#39; T_IS_GREATER_OR_EQUAL&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;What this means is that &lt;code&gt;in&lt;/code&gt; has the same operator precedence as the &lt;code&gt;&amp;lt;&lt;/code&gt;-style comparison operators and that the operator is non-associative. Here are some examples of what this does:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$foo in $bar &amp;amp;&amp;amp; $someOtherCond
// is interpreted as
($foo in $bar) &amp;amp;&amp;amp; $someOtherCond
// because `&amp;amp;&amp;amp;` has lower precedence than `in`

$foo in [&amp;#39;abc&amp;#39;, &amp;#39;def&amp;#39;] + [&amp;#39;ghi&amp;#39;, &amp;#39;jkl&amp;#39;]
// is interpreted as
$foo in ([&amp;#39;abc&amp;#39;, &amp;#39;def&amp;#39;] + [&amp;#39;ghi&amp;#39;, &amp;#39;jkl&amp;#39;])
// because `+` has a higher precedence than `in`

$foo in $bar in $baz
// will throw a parse error, because `in` is non-associative&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you rerun &lt;code&gt;make -j4&lt;/code&gt; now, everything should work out fine. After that you can try running something like &lt;code&gt;sapi/cli/php -r &amp;#39;&amp;quot;foo&amp;quot; in &amp;quot;bar&amp;quot;;&amp;#39;&lt;/code&gt;. This should do nothing apart from printing a memory leak report:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;[Thu Jul 26 22:33:14 2012]  Script:  &amp;#39;-&amp;#39;
Zend/zend_language_scanner.l(876) :  Freeing 0xB777E7AC (4 bytes), script=-
=== Total 1 memory leaks detected ===&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is expected because we haven&amp;#8217;t told the parser yet what it should do when it matches an &lt;code&gt;in&lt;/code&gt; operator. And this is where the curly braces come in. What you have to do is replace the existing &lt;code&gt;expr T_IN expr&lt;/code&gt; rule with the following:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;expr T_IN expr { zend_do_binary_op(ZEND_IN, &amp;amp;$$, &amp;amp;$1, &amp;amp;$3 TSRMLS_CC); }&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The part in the curly braces is called a semantic action and is run whenever the parser matches a certain rule (or part of it). The strange looking &lt;code&gt;$$&lt;/code&gt;, &lt;code&gt;$1&lt;/code&gt; and &lt;code&gt;$3&lt;/code&gt; variables in there are nodes. For example &lt;code&gt;$1&lt;/code&gt; refers to the first &lt;code&gt;expr&lt;/code&gt;, &lt;code&gt;$3&lt;/code&gt; refers to the second &lt;code&gt;expr&lt;/code&gt; (&lt;code&gt;$3&lt;/code&gt; because it is the third element in the rule) and &lt;code&gt;$$&lt;/code&gt; is the result node.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;zend_do_binary_op&lt;/code&gt; is a compiler instruction. It tells the compiler to emit a &lt;code&gt;ZEND_IN&lt;/code&gt; opcode that will take &lt;code&gt;$1&lt;/code&gt; and &lt;code&gt;$3&lt;/code&gt; as operands and put the result into &lt;code&gt;$$&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Compiler instructions are defined in &lt;a href='http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_compile.c'&gt;&lt;code&gt;zend_compile.c&lt;/code&gt;&lt;/a&gt; (with a header entry in &lt;a href='http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_compile.h'&gt;&lt;code&gt;zend_compile.h&lt;/code&gt;&lt;/a&gt;). &lt;code&gt;zend_do_binary_op&lt;/code&gt; for example looks like this:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='c'&gt;&lt;span class='kt'&gt;void&lt;/span&gt; &lt;span class='nf'&gt;zend_do_binary_op&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;zend_uchar&lt;/span&gt; &lt;span class='n'&gt;op&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;znode&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;result&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='k'&gt;const&lt;/span&gt; &lt;span class='n'&gt;znode&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;op1&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='k'&gt;const&lt;/span&gt; &lt;span class='n'&gt;znode&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;op2&lt;/span&gt; &lt;span class='n'&gt;TSRMLS_DC&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='n'&gt;zend_op&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;opline&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;get_next_op&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;CG&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;active_op_array&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='n'&gt;TSRMLS_CC&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;

    &lt;span class='n'&gt;opline&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='n'&gt;opcode&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;op&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='n'&gt;opline&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='n'&gt;result_type&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;IS_TMP_VAR&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='n'&gt;opline&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='n'&gt;result&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;var&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;get_temporary_variable&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;CG&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;active_op_array&lt;/span&gt;&lt;span class='p'&gt;));&lt;/span&gt;
    &lt;span class='n'&gt;SET_NODE&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;opline&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='n'&gt;op1&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;op1&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='n'&gt;SET_NODE&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;opline&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='n'&gt;op2&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;op2&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='n'&gt;GET_NODE&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;result&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;opline&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='n'&gt;result&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The code should be easy to grasp, and the next section should help put it into context. One last thing to mention here is that in most cases you will have to add your own &lt;code&gt;zend_do_*&lt;/code&gt; function when adding new syntax. Adding a new binary operator is one of the few cases where you don&amp;#8217;t have to. If you do have to add a new &lt;code&gt;zend_do_*&lt;/code&gt; function, just have a look at the existing ones. Most of them are quite simple.&lt;/p&gt;

&lt;h2 id='execution'&gt;Execution&lt;/h2&gt;

&lt;p&gt;In the previous section I already mentioned that the compiler emits opcodes. Let&amp;#8217;s take a closer look at what those opcodes look like (see &lt;a href='http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_compile.h#106'&gt;&lt;code&gt;zend_compile.h&lt;/code&gt;&lt;/a&gt;):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='c'&gt;&lt;span class='k'&gt;struct&lt;/span&gt; &lt;span class='n'&gt;_zend_op&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='n'&gt;opcode_handler_t&lt;/span&gt; &lt;span class='n'&gt;handler&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='n'&gt;znode_op&lt;/span&gt; &lt;span class='n'&gt;op1&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='n'&gt;znode_op&lt;/span&gt; &lt;span class='n'&gt;op2&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='n'&gt;znode_op&lt;/span&gt; &lt;span class='n'&gt;result&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='n'&gt;ulong&lt;/span&gt; &lt;span class='n'&gt;extended_value&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='n'&gt;uint&lt;/span&gt; &lt;span class='n'&gt;lineno&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='n'&gt;zend_uchar&lt;/span&gt; &lt;span class='n'&gt;opcode&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='n'&gt;zend_uchar&lt;/span&gt; &lt;span class='n'&gt;op1_type&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='n'&gt;zend_uchar&lt;/span&gt; &lt;span class='n'&gt;op2_type&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='n'&gt;zend_uchar&lt;/span&gt; &lt;span class='n'&gt;result_type&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='p'&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;A short description of what the individual components mean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;opcode&lt;/code&gt;: This is the actual operation that should be executed. This could for example be &lt;code&gt;ZEND_ADD&lt;/code&gt; or &lt;code&gt;ZEND_SUB&lt;/code&gt;.&lt;/li&gt;

&lt;li&gt;&lt;code&gt;op1&lt;/code&gt;, &lt;code&gt;op2&lt;/code&gt;, &lt;code&gt;result&lt;/code&gt;: Every operation can have a maximum of two operands (it can also use just one of them, or none at all) and a result node. The type of each node is given by &lt;code&gt;op1_type&lt;/code&gt;, &lt;code&gt;op2_type&lt;/code&gt; and &lt;code&gt;result_type&lt;/code&gt; respectively. We&amp;#8217;ll cover what nodes look like and what types there are in a minute.&lt;/li&gt;

&lt;li&gt;&lt;code&gt;extended_value&lt;/code&gt;: The extended value is used to store flags or some other integer value. E.g. the variable fetching instruction uses it to store the variable kind (like &lt;code&gt;ZEND_FETCH_LOCAL&lt;/code&gt; or &lt;code&gt;ZEND_FETCH_GLOBAL&lt;/code&gt;).&lt;/li&gt;

&lt;li&gt;&lt;code&gt;handler&lt;/code&gt;: To optimize opcode execution this stores the handler function associated with the opcode and the operand types. This is determined automatically, so it doesn&amp;#8217;t have to be set in the compilation code.&lt;/li&gt;

&lt;li&gt;&lt;code&gt;lineno&lt;/code&gt;: You know what that means&amp;#8230;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are five basic types that can go into the &lt;code&gt;*_type&lt;/code&gt; properties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;IS_TMP_VAR&lt;/code&gt;: A temporary variable, usually the result of some expression like &lt;code&gt;$foo + $bar&lt;/code&gt;. Temporary variables cannot be shared, so they don&amp;#8217;t implement refcounting. They are usually very short lived, so one instruction creates them and the next one already frees them again. Temporary variables are usually written as &lt;code&gt;~n&lt;/code&gt;, so &lt;code&gt;~0&lt;/code&gt; would be the first temporary variable, &lt;code&gt;~1&lt;/code&gt; the second, etc.&lt;/li&gt;

&lt;li&gt;&lt;code&gt;IS_CV&lt;/code&gt;: A compiled variable. To save hash table lookups PHP caches the location of simple variables like &lt;code&gt;$foo&lt;/code&gt; in a plain C array. Furthermore, compiled variables allow PHP to optimize the hash table away altogether. CVs are denoted using &lt;code&gt;!n&lt;/code&gt;, where &lt;code&gt;n&lt;/code&gt; is the offset into the compiled-variable array.&lt;/li&gt;

&lt;li&gt;&lt;code&gt;IS_VAR&lt;/code&gt;: Only simple variables can be turned into CVs. All other kinds of variable accesses, like &lt;code&gt;$foo[&amp;#39;bar&amp;#39;]&lt;/code&gt; or &lt;code&gt;$foo-&amp;gt;bar&lt;/code&gt;, return an &lt;code&gt;IS_VAR&lt;/code&gt; variable. It is basically just a normal zval (with refcounting and everything). Vars are written as &lt;code&gt;$n&lt;/code&gt;.&lt;/li&gt;

&lt;li&gt;&lt;code&gt;IS_CONST&lt;/code&gt;: Constants are literals in the code. For example if you write &lt;code&gt;&amp;quot;foo&amp;quot;&lt;/code&gt; or &lt;code&gt;3.141&lt;/code&gt; in your code, those will be of type &lt;code&gt;IS_CONST&lt;/code&gt;. Constants allow for some further optimizations, like reusing zvals for the same value, or precalculating hashes.&lt;/li&gt;

&lt;li&gt;&lt;code&gt;IS_UNUSED&lt;/code&gt;: Operand not used.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Related to this is what &lt;a href='http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_compile.h#74'&gt;&lt;code&gt;znode_op&lt;/code&gt;&lt;/a&gt; itself looks like:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='c'&gt;&lt;span class='k'&gt;typedef&lt;/span&gt; &lt;span class='k'&gt;union&lt;/span&gt; &lt;span class='n'&gt;_znode_op&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='n'&gt;zend_uint&lt;/span&gt;      &lt;span class='n'&gt;constant&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='n'&gt;zend_uint&lt;/span&gt;      &lt;span class='n'&gt;var&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='n'&gt;zend_uint&lt;/span&gt;      &lt;span class='n'&gt;num&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='n'&gt;zend_ulong&lt;/span&gt;     &lt;span class='n'&gt;hash&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='n'&gt;zend_uint&lt;/span&gt;      &lt;span class='n'&gt;opline_num&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='n'&gt;zend_op&lt;/span&gt;       &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;jmp_addr&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='n'&gt;zval&lt;/span&gt;          &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;zv&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='n'&gt;zend_literal&lt;/span&gt;  &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;literal&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='kt'&gt;void&lt;/span&gt;          &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;ptr&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='n'&gt;znode_op&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;As you can see a node is a union, i.e. it can contain one of the elements above (just one!), depending on context. For example &lt;code&gt;zv&lt;/code&gt; is used to store &lt;code&gt;IS_CONST&lt;/code&gt; zvals, &lt;code&gt;var&lt;/code&gt; is used to store the variable number for &lt;code&gt;IS_CV&lt;/code&gt;, &lt;code&gt;IS_VAR&lt;/code&gt; and &lt;code&gt;IS_TMP_VAR&lt;/code&gt; variables. The rest is used in various special circumstances. E.g. &lt;code&gt;jmp_addr&lt;/code&gt; is used with the &lt;code&gt;JMP*&lt;/code&gt; instructions (which are required for conditions and loops). Others again are used only during compilation, not during execution (like &lt;code&gt;constant&lt;/code&gt;).&lt;/p&gt;
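&lt;p&gt;To make the union idea concrete, here is a minimal sketch of a tagged union in plain C. All names in it (&lt;code&gt;operand_type&lt;/code&gt;, &lt;code&gt;tagged_operand&lt;/code&gt;, &lt;code&gt;operand_var_slot&lt;/code&gt; and the two constructors) are made up for illustration and are not part of the Zend Engine. In PHP the tag is not stored inside the node: it comes from the &lt;code&gt;*_type&lt;/code&gt; fields of the &lt;code&gt;zend_op&lt;/code&gt;, and for members like &lt;code&gt;jmp_addr&lt;/code&gt; from the opcode itself.&lt;/p&gt;

```c
/* Hypothetical, simplified stand-ins for znode_op and the *_type fields. */
enum operand_type { OPERAND_CONST, OPERAND_VAR };

union operand {
    const char  *literal; /* only valid when the type is OPERAND_CONST */
    unsigned int var;     /* only valid when the type is OPERAND_VAR */
};

/* The tag tells the reader which union member currently holds data. */
struct tagged_operand {
    enum operand_type type;
    union operand     value;
};

struct tagged_operand make_const_operand(const char *literal)
{
    struct tagged_operand op;
    op.type = OPERAND_CONST;
    op.value.literal = literal;
    return op;
}

struct tagged_operand make_var_operand(unsigned int slot)
{
    struct tagged_operand op;
    op.type = OPERAND_VAR;
    op.value.var = slot;
    return op;
}

/* Return the variable slot of an operand, or -1 if it holds no variable. */
int operand_var_slot(struct tagged_operand op)
{
    if (op.type != OPERAND_VAR) {
        return -1;
    }
    return (int) op.value.var;
}
```

&lt;p&gt;Reading &lt;code&gt;value.var&lt;/code&gt; from an operand built with &lt;code&gt;make_const_operand&lt;/code&gt; would reinterpret the bytes of the pointer, which is why the sketch checks the tag first.&lt;/p&gt;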

&lt;p&gt;So now that we know what individual ops look like, the only remaining question is where they are stored: For every function (and file) PHP creates a &lt;a href='http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_compile.h#_zend_op_array'&gt;&lt;code&gt;zend_op_array&lt;/code&gt;&lt;/a&gt;, which stores the opcodes as well as a lot of other information. I don&amp;#8217;t want to go into detail about what the individual components are for; you should just know that this structure exists.&lt;/p&gt;

&lt;p&gt;Now let&amp;#8217;s get back to the implementation of the &lt;code&gt;in&lt;/code&gt; operator! We already instructed the compiler to emit a &lt;code&gt;ZEND_IN&lt;/code&gt; opcode. Now we have to define what this opcode does.&lt;/p&gt;

&lt;p&gt;This is done in the &lt;a href='http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_vm_def.h'&gt;&lt;code&gt;zend_vm_def.h&lt;/code&gt;&lt;/a&gt; file. If you look at it, you&amp;#8217;ll find that it is full of definitions like this:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='c'&gt;&lt;span class='n'&gt;ZEND_VM_HANDLER&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;ZEND_ADD&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;CONST&lt;/span&gt;&lt;span class='o'&gt;|&lt;/span&gt;&lt;span class='n'&gt;TMP&lt;/span&gt;&lt;span class='o'&gt;|&lt;/span&gt;&lt;span class='n'&gt;VAR&lt;/span&gt;&lt;span class='o'&gt;|&lt;/span&gt;&lt;span class='n'&gt;CV&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;CONST&lt;/span&gt;&lt;span class='o'&gt;|&lt;/span&gt;&lt;span class='n'&gt;TMP&lt;/span&gt;&lt;span class='o'&gt;|&lt;/span&gt;&lt;span class='n'&gt;VAR&lt;/span&gt;&lt;span class='o'&gt;|&lt;/span&gt;&lt;span class='n'&gt;CV&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='n'&gt;USE_OPLINE&lt;/span&gt;
    &lt;span class='n'&gt;zend_free_op&lt;/span&gt; &lt;span class='n'&gt;free_op1&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;free_op2&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

    &lt;span class='n'&gt;SAVE_OPLINE&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
    &lt;span class='n'&gt;fast_add_function&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;EX_T&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;opline&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='n'&gt;result&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;var&lt;/span&gt;&lt;span class='p'&gt;).&lt;/span&gt;&lt;span class='n'&gt;tmp_var&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
        &lt;span class='n'&gt;GET_OP1_ZVAL_PTR&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;BP_VAR_R&lt;/span&gt;&lt;span class='p'&gt;),&lt;/span&gt;
        &lt;span class='n'&gt;GET_OP2_ZVAL_PTR&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;BP_VAR_R&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='n'&gt;TSRMLS_CC&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='n'&gt;FREE_OP1&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
    &lt;span class='n'&gt;FREE_OP2&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
    &lt;span class='n'&gt;CHECK_EXCEPTION&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
    &lt;span class='n'&gt;ZEND_VM_NEXT_OPCODE&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;ZEND_IN&lt;/code&gt; opcode will look very similar to this, so it&amp;#8217;s worth understanding what is going on there. I&amp;#8217;ll go through it line by line:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='c'&gt;&lt;span class='c1'&gt;// The header defines four things:&lt;/span&gt;
&lt;span class='c1'&gt;//   1. This is the opcode with ID 1&lt;/span&gt;
&lt;span class='c1'&gt;//   2. This opcode is called ZEND_ADD&lt;/span&gt;
&lt;span class='c1'&gt;//   3. This opcode accepts CONST, TMP, VAR and CV as the first operand&lt;/span&gt;
&lt;span class='c1'&gt;//   4. This opcode accepts CONST, TMP, VAR and CV as the second operand&lt;/span&gt;
&lt;span class='n'&gt;ZEND_VM_HANDLER&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;ZEND_ADD&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;CONST&lt;/span&gt;&lt;span class='o'&gt;|&lt;/span&gt;&lt;span class='n'&gt;TMP&lt;/span&gt;&lt;span class='o'&gt;|&lt;/span&gt;&lt;span class='n'&gt;VAR&lt;/span&gt;&lt;span class='o'&gt;|&lt;/span&gt;&lt;span class='n'&gt;CV&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;CONST&lt;/span&gt;&lt;span class='o'&gt;|&lt;/span&gt;&lt;span class='n'&gt;TMP&lt;/span&gt;&lt;span class='o'&gt;|&lt;/span&gt;&lt;span class='n'&gt;VAR&lt;/span&gt;&lt;span class='o'&gt;|&lt;/span&gt;&lt;span class='n'&gt;CV&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='c1'&gt;// USE_OPLINE means that we want to access the zend_op as `opline`.&lt;/span&gt;
    &lt;span class='c1'&gt;// This is required for all opcodes accessing operands or setting a return value.&lt;/span&gt;
    &lt;span class='n'&gt;USE_OPLINE&lt;/span&gt;
    &lt;span class='c1'&gt;// For every operand that is accessed a free_op* variable has to be defined.&lt;/span&gt;
    &lt;span class='c1'&gt;// It is used to figure out whether the operand needs freeing.&lt;/span&gt;
    &lt;span class='n'&gt;zend_free_op&lt;/span&gt; &lt;span class='n'&gt;free_op1&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;free_op2&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

    &lt;span class='c1'&gt;// SAVE_OPLINE() actually loads the zend_op into `opline`.&lt;/span&gt;
    &lt;span class='c1'&gt;// USE_OPLINE was only the declaration&lt;/span&gt;
    &lt;span class='n'&gt;SAVE_OPLINE&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
    &lt;span class='c1'&gt;// Call the fast add function&lt;/span&gt;
    &lt;span class='n'&gt;fast_add_function&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;
        &lt;span class='c1'&gt;// And tell it to put the result into the temporary result variable&lt;/span&gt;
        &lt;span class='c1'&gt;// EX_T here accesses the temporary variable with ID opline-&amp;gt;result.var.&lt;/span&gt;
        &lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;EX_T&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;opline&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='n'&gt;result&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;var&lt;/span&gt;&lt;span class='p'&gt;).&lt;/span&gt;&lt;span class='n'&gt;tmp_var&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
        &lt;span class='c1'&gt;// Fetch the first operand for reading (the R in BP_VAR_R)&lt;/span&gt;
        &lt;span class='n'&gt;GET_OP1_ZVAL_PTR&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;BP_VAR_R&lt;/span&gt;&lt;span class='p'&gt;),&lt;/span&gt;
        &lt;span class='c1'&gt;// Fetch the second operand for reading&lt;/span&gt;
        &lt;span class='n'&gt;GET_OP2_ZVAL_PTR&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;BP_VAR_R&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='n'&gt;TSRMLS_CC&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='c1'&gt;// Free both operands (if necessary)&lt;/span&gt;
    &lt;span class='n'&gt;FREE_OP1&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
    &lt;span class='n'&gt;FREE_OP2&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
    &lt;span class='c1'&gt;// Check for exceptions. Exceptions can occur virtually everywhere, so one has to check for them in nearly all&lt;/span&gt;
    &lt;span class='c1'&gt;// opcodes. If in doubt, add the check.&lt;/span&gt;
    &lt;span class='n'&gt;CHECK_EXCEPTION&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
    &lt;span class='c1'&gt;// Go to the next opcode&lt;/span&gt;
    &lt;span class='n'&gt;ZEND_VM_NEXT_OPCODE&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;As you probably noticed, there is a lot of &lt;code&gt;UPPERCASE_STUFF&lt;/code&gt; in there. The reason is that &lt;code&gt;zend_vm_def.h&lt;/code&gt; once again is only a definition file. The actual Zend VM is generated from it and stored in &lt;a href='http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_vm_execute.h'&gt;&lt;code&gt;zend_vm_execute.h&lt;/code&gt;&lt;/a&gt; (biiig file). PHP has three different virtual machine kinds, namely &lt;code&gt;CALL&lt;/code&gt; (default), &lt;code&gt;GOTO&lt;/code&gt; and &lt;code&gt;SWITCH&lt;/code&gt;. Because they all have different implementation details, the definition file uses lots of pseudo-macros like &lt;code&gt;USE_OPLINE&lt;/code&gt; that are later replaced by some concrete implementation.&lt;/p&gt;
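&lt;p&gt;As a rough illustration of the &lt;code&gt;CALL&lt;/code&gt; style: every instruction carries a pointer to a handler function, and the executor keeps calling whichever handler the current instruction points to. The following is a toy sketch under that assumption, not the generated Zend VM. All names are hypothetical, and the state is passed by value only to keep the example short.&lt;/p&gt;

```c
/* Toy interpreter state: program counter, one accumulator, a stop flag. */
struct vm_state {
    int pc;
    int acc;
    int running;
};

/* A handler takes the current state and returns the updated one. */
typedef struct vm_state (*handler_fn)(struct vm_state);

struct vm_state op_inc(struct vm_state s)
{
    s.acc += 1;
    s.pc += 1; /* comparable in spirit to ZEND_VM_NEXT_OPCODE() */
    return s;
}

struct vm_state op_double(struct vm_state s)
{
    s.acc *= 2;
    s.pc += 1;
    return s;
}

struct vm_state op_halt(struct vm_state s)
{
    s.running = 0;
    return s;
}

/* CALL-style dispatch: look up the handler of the current op and call it. */
int run(const handler_fn *program)
{
    struct vm_state s = { 0, 0, 1 };

    while (s.running) {
        s = program[s.pc](s);
    }
    return s.acc;
}

/* Runs the program inc, inc, double, halt and returns the accumulator. */
int run_demo(void)
{
    const handler_fn program[] = { op_inc, op_inc, op_double, op_halt };
    return run(program);
}
```

&lt;p&gt;A &lt;code&gt;SWITCH&lt;/code&gt; or &lt;code&gt;GOTO&lt;/code&gt; based machine would replace the indirect function call in &lt;code&gt;run&lt;/code&gt; with a switch statement or computed jumps, which is exactly the kind of difference the pseudo-macros hide.&lt;/p&gt;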

&lt;p&gt;Furthermore, the generated VM creates specialized implementations for all possible operand-type combinations. So in the end there won&amp;#8217;t be a single &lt;code&gt;ZEND_ADD&lt;/code&gt; function, but rather different functions for &lt;code&gt;ZEND_ADD_CONST_CONST&lt;/code&gt;, &lt;code&gt;ZEND_ADD_CONST_TMP&lt;/code&gt;, &lt;code&gt;ZEND_ADD_CONST_VAR&lt;/code&gt;&amp;#8230;&lt;/p&gt;
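&lt;p&gt;One way to picture the specialization: with five operand kinds there are 5 * 5 = 25 possible handlers per opcode, which fit into one flat table. The sketch below computes the table slot for a given opcode and pair of operand types. The flag values are meant to mirror the ones PHP 5.x uses, but treat them as an assumption; the function names are hypothetical and the real lookup in the generated &lt;code&gt;zend_vm_execute.h&lt;/code&gt; may differ in detail.&lt;/p&gt;

```c
/* Operand kind flags (assumed to match the IS_* bit values of PHP 5.x). */
enum {
    KIND_CONST  = 1,
    KIND_TMP    = 2,
    KIND_VAR    = 4,
    KIND_UNUSED = 8,
    KIND_CV     = 16
};

/* Map a kind flag to a dense index 0..4, or -1 for an unknown flag. */
int kind_index(int kind)
{
    switch (kind) {
        case KIND_CONST:  return 0;
        case KIND_TMP:    return 1;
        case KIND_VAR:    return 2;
        case KIND_UNUSED: return 3;
        case KIND_CV:     return 4;
        default:          return -1;
    }
}

/* Slot of the specialized handler in a table with 25 entries per opcode. */
int handler_slot(int opcode, int op1_kind, int op2_kind)
{
    int first = kind_index(op1_kind);
    int second = kind_index(op2_kind);

    if (first == -1 || second == -1) {
        return -1; /* not a valid operand kind */
    }
    return opcode * 25 + first * 5 + second;
}
```

&lt;p&gt;With this layout the handlers of opcode 1 (&lt;code&gt;ZEND_ADD&lt;/code&gt;) occupy slots 25 to 49. Slots for combinations that an opcode does not accept would simply hold no real handler.&lt;/p&gt;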

&lt;p&gt;Now, in order to implement the &lt;code&gt;ZEND_IN&lt;/code&gt; opcode, you should add a new opcode definition skeleton at the end of the &lt;code&gt;zend_vm_def.h&lt;/code&gt; file:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='c'&gt;&lt;span class='c1'&gt;// 159 is the number of the next free opcode for me. You may need to choose a larger number&lt;/span&gt;
&lt;span class='n'&gt;ZEND_VM_HANDLER&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;159&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;ZEND_IN&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;CONST&lt;/span&gt;&lt;span class='o'&gt;|&lt;/span&gt;&lt;span class='n'&gt;TMP&lt;/span&gt;&lt;span class='o'&gt;|&lt;/span&gt;&lt;span class='n'&gt;VAR&lt;/span&gt;&lt;span class='o'&gt;|&lt;/span&gt;&lt;span class='n'&gt;CV&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;CONST&lt;/span&gt;&lt;span class='o'&gt;|&lt;/span&gt;&lt;span class='n'&gt;TMP&lt;/span&gt;&lt;span class='o'&gt;|&lt;/span&gt;&lt;span class='n'&gt;VAR&lt;/span&gt;&lt;span class='o'&gt;|&lt;/span&gt;&lt;span class='n'&gt;CV&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='n'&gt;USE_OPLINE&lt;/span&gt;
    &lt;span class='n'&gt;zend_free_op&lt;/span&gt; &lt;span class='n'&gt;free_op1&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;free_op2&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='n'&gt;zval&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;op1&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;op2&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

    &lt;span class='n'&gt;SAVE_OPLINE&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
    &lt;span class='n'&gt;op1&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;GET_OP1_ZVAL_PTR&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;BP_VAR_R&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='n'&gt;op2&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;GET_OP2_ZVAL_PTR&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;BP_VAR_R&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;

    &lt;span class='cm'&gt;/* TODO */&lt;/span&gt;

    &lt;span class='n'&gt;FREE_OP1&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
    &lt;span class='n'&gt;FREE_OP2&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
    &lt;span class='n'&gt;CHECK_EXCEPTION&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
    &lt;span class='n'&gt;ZEND_VM_NEXT_OPCODE&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;This does nothing more than fetch the operands and discard them again right away.&lt;/p&gt;

&lt;p&gt;In order to generate a new VM you now have to run &lt;code&gt;php zend_vm_gen.php&lt;/code&gt; from within the &lt;code&gt;Zend/&lt;/code&gt; directory. (If it gives you lots of warnings about the &lt;code&gt;/e&lt;/code&gt; modifier being deprecated, ignore those.) After that go into the top level directory again and run &lt;code&gt;make -j4&lt;/code&gt; to recompile.&lt;/p&gt;

&lt;p&gt;If everything worked out fine you can now run &lt;code&gt;sapi/cli/php -r &amp;#39;&amp;quot;foo&amp;quot; in &amp;quot;bar&amp;quot;;&amp;#39;&lt;/code&gt; without getting any errors (but it still won&amp;#8217;t do anything).&lt;/p&gt;

&lt;p&gt;Now, finally, we can implement the actual logic. Let&amp;#8217;s start with the string case:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='c'&gt;&lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;Z_TYPE_P&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;op2&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='n'&gt;IS_STRING&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='n'&gt;zval&lt;/span&gt; &lt;span class='n'&gt;op1_copy&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='kt'&gt;int&lt;/span&gt; &lt;span class='n'&gt;use_copy&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

    &lt;span class='cm'&gt;/* Convert the needle into a string */&lt;/span&gt;
    &lt;span class='n'&gt;zend_make_printable_zval&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;op1&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;op1_copy&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;use_copy&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;use_copy&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='n'&gt;op1&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;op1_copy&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;Z_STRLEN_P&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;op1&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='cm'&gt;/* For empty needles return true */&lt;/span&gt;
        &lt;span class='n'&gt;ZVAL_TRUE&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;EX_T&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;opline&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='n'&gt;result&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;var&lt;/span&gt;&lt;span class='p'&gt;).&lt;/span&gt;&lt;span class='n'&gt;tmp_var&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='k'&gt;else&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='kt'&gt;char&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;found&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;zend_memnstr&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;
            &lt;span class='n'&gt;Z_STRVAL_P&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;op2&lt;/span&gt;&lt;span class='p'&gt;),&lt;/span&gt;                  &lt;span class='cm'&gt;/* haystack */&lt;/span&gt;
            &lt;span class='n'&gt;Z_STRVAL_P&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;op1&lt;/span&gt;&lt;span class='p'&gt;),&lt;/span&gt;                  &lt;span class='cm'&gt;/* needle */&lt;/span&gt;
            &lt;span class='n'&gt;Z_STRLEN_P&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;op1&lt;/span&gt;&lt;span class='p'&gt;),&lt;/span&gt;                  &lt;span class='cm'&gt;/* needle length */&lt;/span&gt;
            &lt;span class='n'&gt;Z_STRVAL_P&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;op2&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;+&lt;/span&gt; &lt;span class='n'&gt;Z_STRLEN_P&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;op2&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='cm'&gt;/* haystack end ptr */&lt;/span&gt;
        &lt;span class='p'&gt;);&lt;/span&gt;

        &lt;span class='n'&gt;ZVAL_BOOL&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;EX_T&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;opline&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='n'&gt;result&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;var&lt;/span&gt;&lt;span class='p'&gt;).&lt;/span&gt;&lt;span class='n'&gt;tmp_var&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;found&lt;/span&gt; &lt;span class='o'&gt;!=&lt;/span&gt; &lt;span class='nb'&gt;NULL&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='cm'&gt;/* Free copy */&lt;/span&gt;
    &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;use_copy&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='n'&gt;zval_dtor&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;op1_copy&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The hardest part here is actually casting the needle into a string. This is done using &lt;code&gt;zend_make_printable_zval&lt;/code&gt;. This function may or may not have to create a new zval, which is why we pass &lt;code&gt;op1_copy&lt;/code&gt; and &lt;code&gt;use_copy&lt;/code&gt; into it. If the function copied the value, we just point the &lt;code&gt;op1&lt;/code&gt; variable at the copy (so we don&amp;#8217;t have to deal with two different variables everywhere). Furthermore, the copy has to be freed at the end (which is what the final &lt;code&gt;if (use_copy)&lt;/code&gt; block does).&lt;/p&gt;

&lt;p&gt;After that everything is simple. We can find out whether the haystack contains the needle using &lt;code&gt;zend_memnstr&lt;/code&gt;. If the needle is an empty string we just return &lt;code&gt;true&lt;/code&gt; directly (because the empty string is part of every string).&lt;/p&gt;
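&lt;p&gt;If you want to play with the search semantics outside of PHP, the following naive stand-in shows the same behavior on raw byte buffers. It is not how &lt;code&gt;zend_memnstr&lt;/code&gt; is implemented, and the name &lt;code&gt;contains&lt;/code&gt; is made up. Like the opcode code above it works with explicit lengths rather than NUL termination and treats the empty needle as always contained.&lt;/p&gt;

```c
/*
 * Return 1 if the needle occurs somewhere in the haystack, 0 otherwise.
 * Both buffers are described by a pointer and a length.
 */
int contains(const char *haystack, unsigned long haystack_len,
             const char *needle, unsigned long needle_len)
{
    unsigned long start, matched;

    if (needle_len == 0) {
        return 1; /* the empty string is part of every string */
    }

    /* Try every start position and compare byte by byte. */
    for (start = 0; start != haystack_len; start++) {
        for (matched = 0; matched != needle_len; matched++) {
            if (start + matched == haystack_len) {
                break; /* ran off the end of the haystack */
            }
            if (haystack[start + matched] != needle[matched]) {
                break; /* mismatch, try the next start position */
            }
        }
        if (matched == needle_len) {
            return 1; /* the whole needle matched */
        }
    }
    return 0;
}
```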

&lt;p&gt;Now, if you added the above code in place of the &lt;code&gt;/* TODO */&lt;/code&gt;, reran &lt;code&gt;zend_vm_gen.php&lt;/code&gt; and recompiled using &lt;code&gt;make -j4&lt;/code&gt;, you will already have a half-working &lt;code&gt;in&lt;/code&gt; operator:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sapi/cli/php -r &amp;#39;var_dump(&amp;quot;foo&amp;quot; in &amp;quot;bar&amp;quot;);&amp;#39;
bool(false)
$ sapi/cli/php -r &amp;#39;var_dump(&amp;quot;foo&amp;quot; in &amp;quot;foobar&amp;quot;);&amp;#39;
bool(true)
$ sapi/cli/php -r &amp;#39;var_dump(&amp;quot;foo&amp;quot; in &amp;quot;hallo foo world&amp;quot;);&amp;#39;
bool(true)
$ sapi/cli/php -r &amp;#39;var_dump(2 in &amp;quot;123&amp;quot;);&amp;#39;
bool(true)
$ sapi/cli/php -r &amp;#39;var_dump(5 in &amp;quot;123&amp;quot;);&amp;#39;
bool(false)
$ sapi/cli/php -r &amp;#39;var_dump(&amp;quot;&amp;quot; in &amp;quot;test&amp;quot;);&amp;#39;
bool(true)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Next, we have to implement the array behavior:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='c'&gt;&lt;span class='k'&gt;else&lt;/span&gt; &lt;span class='nf'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;Z_TYPE_P&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;op2&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='n'&gt;IS_ARRAY&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='n'&gt;HashPosition&lt;/span&gt; &lt;span class='n'&gt;pos&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='n'&gt;zval&lt;/span&gt; &lt;span class='o'&gt;**&lt;/span&gt;&lt;span class='n'&gt;value&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

    &lt;span class='cm'&gt;/* Start under the assumption that the value isn&amp;#39;t contained */&lt;/span&gt;
    &lt;span class='n'&gt;ZVAL_FALSE&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;EX_T&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;opline&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='n'&gt;result&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;var&lt;/span&gt;&lt;span class='p'&gt;).&lt;/span&gt;&lt;span class='n'&gt;tmp_var&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;

    &lt;span class='cm'&gt;/* Iterate through the array */&lt;/span&gt;
    &lt;span class='n'&gt;zend_hash_internal_pointer_reset_ex&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;Z_ARRVAL_P&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;op2&lt;/span&gt;&lt;span class='p'&gt;),&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;pos&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='k'&gt;while&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;zend_hash_get_current_data_ex&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;Z_ARRVAL_P&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;op2&lt;/span&gt;&lt;span class='p'&gt;),&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='kt'&gt;void&lt;/span&gt; &lt;span class='o'&gt;**&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;value&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;pos&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='n'&gt;SUCCESS&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='n'&gt;zval&lt;/span&gt; &lt;span class='n'&gt;result&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

        &lt;span class='cm'&gt;/* Compare values using == */&lt;/span&gt;
        &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;is_equal_function&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;result&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;op1&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;value&lt;/span&gt; &lt;span class='n'&gt;TSRMLS_CC&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='n'&gt;SUCCESS&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class='n'&gt;Z_LVAL&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;result&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
            &lt;span class='n'&gt;ZVAL_TRUE&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;EX_T&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;opline&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='n'&gt;result&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;var&lt;/span&gt;&lt;span class='p'&gt;).&lt;/span&gt;&lt;span class='n'&gt;tmp_var&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
            &lt;span class='k'&gt;break&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='p'&gt;}&lt;/span&gt;

        &lt;span class='n'&gt;zend_hash_move_forward_ex&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;Z_ARRVAL_P&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;op2&lt;/span&gt;&lt;span class='p'&gt;),&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;pos&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Here the haystack is simply traversed and every value is checked against the needle. We compare using &lt;code&gt;==&lt;/code&gt;. To compare using &lt;code&gt;===&lt;/code&gt; one would have to replace &lt;code&gt;is_equal_function&lt;/code&gt; with &lt;code&gt;is_identical_function&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;After rerunning &lt;code&gt;zend_vm_gen.php&lt;/code&gt; and &lt;code&gt;make -j4&lt;/code&gt; the &lt;code&gt;in&lt;/code&gt; operator should be fully operational:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sapi/cli/php -r &amp;#39;var_dump(&amp;quot;test&amp;quot; in []);&amp;#39;
bool(false)
$ sapi/cli/php -r &amp;#39;var_dump(&amp;quot;test&amp;quot; in [&amp;quot;foo&amp;quot;, &amp;quot;bar&amp;quot;]);&amp;#39;
bool(false)
$ sapi/cli/php -r &amp;#39;var_dump(&amp;quot;test&amp;quot; in [&amp;quot;foo&amp;quot;, &amp;quot;test&amp;quot;, &amp;quot;bar&amp;quot;]);&amp;#39;
bool(true)
$ sapi/cli/php -r &amp;#39;var_dump(0 in [&amp;quot;foo&amp;quot;]);&amp;#39;
bool(true) // because we&amp;#39;re comparing using ==&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;One last thing to consider is what should happen when the second operand is neither array nor string. I&amp;#8217;ll just take the easy way out for this: Throw a warning and return false:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='c'&gt;&lt;span class='k'&gt;else&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='n'&gt;zend_error&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;E_WARNING&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s'&gt;&amp;quot;Right operand of in has to be either string or array&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='n'&gt;ZVAL_FALSE&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;EX_T&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;opline&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='n'&gt;result&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;var&lt;/span&gt;&lt;span class='p'&gt;).&lt;/span&gt;&lt;span class='n'&gt;tmp_var&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;After regenerating the VM and recompiling:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sapi/cli/php -r &amp;#39;var_dump(&amp;quot;foo&amp;quot; in new stdClass);&amp;#39;

Warning: Right operand of in has to be either string or array in Command line code on line 1
bool(false)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This might not be the best behavior. E.g. one could allow things like &lt;code&gt;2 in 123&lt;/code&gt; or &lt;code&gt;3.14 in 3.141&lt;/code&gt;. But I&amp;#8217;m too lazy for that right now ;)&lt;/p&gt;
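
&lt;p&gt;To summarize the semantics implemented above, the operator behaves roughly like the following userland function. This is only a sketch: &lt;code&gt;in_operator&lt;/code&gt; is a made-up name, and the string branch assumes that the needle is cast to a string first, as the &lt;code&gt;2 in &amp;quot;123&amp;quot;&lt;/code&gt; example suggests.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;?php

// Hypothetical userland equivalent of the `in` operator (illustration only)
function in_operator($needle, $haystack) {
    if (is_string($haystack)) {
        $needle = (string) $needle;
        // An empty needle counts as contained, matching the &amp;quot;&amp;quot; in &amp;quot;test&amp;quot; example
        return $needle === &amp;#39;&amp;#39; || strpos($haystack, $needle) !== false;
    }

    if (is_array($haystack)) {
        // Traverse the haystack and compare every value using ==
        foreach ($haystack as $value) {
            if ($needle == $value) {
                return true;
            }
        }
        return false;
    }

    // Neither string nor array: warn and return false
    trigger_error(&amp;#39;Right operand of in has to be either string or array&amp;#39;, E_USER_WARNING);
    return false;
}

var_dump(in_operator(&amp;#39;foo&amp;#39;, &amp;#39;foobar&amp;#39;));          // bool(true)
var_dump(in_operator(&amp;#39;test&amp;#39;, [&amp;#39;foo&amp;#39;, &amp;#39;bar&amp;#39;]));  // bool(false)&lt;/code&gt;&lt;/pre&gt;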

&lt;h2 id='finishing_thoughts'&gt;Finishing thoughts&lt;/h2&gt;

&lt;p&gt;I hope the above helped you understand how to add new features to PHP and what the Zend Engine does when it runs a PHP script. But even though the article is quite long I covered only small parts of the whole system. So when you want to do modifications to the ZE the largest part of the job will be reading the existing code. The &lt;a href='http://lxr.php.net'&gt;cross-reference tool&lt;/a&gt; helps a lot when browsing through the PHP source code. Apart from that you can always ask questions in the #php.pecl room on efnet.&lt;/p&gt;

&lt;p&gt;After you have created an implementation for whatever feature you want, the next step is bringing it up on the &lt;a href='http://php.net/mailing-lists.php'&gt;internals mailing list&lt;/a&gt;. People will then look at your feature and decide whether or not it should go in.&lt;/p&gt;

&lt;p&gt;Oh, and one last thing: The &lt;code&gt;in&lt;/code&gt; operator here was just an example. I don&amp;#8217;t plan on proposing it for inclusion ;)&lt;/p&gt;

&lt;p&gt;As always, if you have any further comments or questions, please leave them below.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>What PHP 5.5 might look like</title>
      <link>http://nikic.github.com/2012/07/10/What-PHP-5-5-might-look-like.html</link>
      <pubDate>Tue, 10 Jul 2012 00:00:00 -0700</pubDate>
      <author>nikic@php.net (NikiC)</author>
      <guid>http://nikic.github.com/2012/07/10/What-PHP-5-5-might-look-like</guid>
      <description>&lt;p&gt;PHP 5.4 was released just four months ago, so it probably is a bit too early to look at the next PHP version. Still I&amp;#8217;d like to give all the people who aren&amp;#8217;t following the &lt;a href='http://php.net/mailing-lists.php'&gt;internals mailing list&lt;/a&gt; a small sneak peek at what PHP 5.5 might look like.&lt;/p&gt;

&lt;p&gt;But be sure to understand this: PHP 5.5 is in an early development stage, so nobody knows what the end result will look like. All I am talking about here are &lt;em&gt;proposals&lt;/em&gt;. I&amp;#8217;m pretty sure that not all of the things listed below will go into PHP 5.5, or at least not in their current form.&lt;/p&gt;

&lt;p&gt;So, don&amp;#8217;t get &lt;em&gt;too&lt;/em&gt; excited about this :)&lt;/p&gt;

&lt;p&gt;The list of new features / proposals is rather large and not sorted by significance. So if you don&amp;#8217;t want to read through all of it, here are the four features I personally am most excited about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href='#a_simple_api_for_password_hashing'&gt;A simple API for password hashing&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href='#scalar_typehinting'&gt;Scalar typehinting&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href='#getters_and_setters'&gt;Getters and setters&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href='#generators'&gt;Generators&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now, without further ado, the list of stuff being worked on for PHP 5.5:&lt;/p&gt;

&lt;h2 id='backwards_compatibility_breaks'&gt;Backwards compatibility breaks&lt;/h2&gt;

&lt;p&gt;We&amp;#8217;ll start off with two changes that already landed in master and represent BC breaks (to some degree at least):&lt;/p&gt;

&lt;h3 id='windows_xp_and_2003_support_dropped'&gt;Windows XP and 2003 support dropped&lt;/h3&gt;
&lt;small&gt;Status: landed; Responsible: Pierre Joye&lt;/small&gt;
&lt;p&gt;PHP 5.5 will no longer support Windows XP and 2003. Those systems are around a decade old, so PHP is pulling the plug on them.&lt;/p&gt;

&lt;h3 id='e_modifier_deprecated'&gt;/e modifier deprecated&lt;/h3&gt;
&lt;small&gt;Status: landed; Responsible: myself&lt;/small&gt;
&lt;p&gt;The &lt;code&gt;e&lt;/code&gt; modifier instructs the &lt;a href='http://php.net/preg_replace'&gt;&lt;code&gt;preg_replace&lt;/code&gt;&lt;/a&gt; function to evaluate the replacement string as PHP code instead of just doing a simple string replacement. Unsurprisingly this behavior is a constant source of problems and security issues. That&amp;#8217;s why use of this modifier will throw a deprecation warning as of PHP 5.5. As a replacement you should use the &lt;a href='http://php.net/preg_replace_callback'&gt;&lt;code&gt;preg_replace_callback&lt;/code&gt;&lt;/a&gt; function. You can find more information on this change in the &lt;a href='https://wiki.php.net/rfc/remove_preg_replace_eval_modifier'&gt;corresponding RFC&lt;/a&gt;.&lt;/p&gt;
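
&lt;p&gt;As a rough illustration of the migration (the pattern and input string are made up for this example), a replacement that uppercases the first letter of every word could be rewritten like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;?php

$text = &amp;#39;hello world&amp;#39;;

// Deprecated variant using the e modifier:
// $result = preg_replace(&amp;#39;/\b[a-z]/e&amp;#39;, &amp;#39;strtoupper(&amp;quot;$0&amp;quot;)&amp;#39;, $text);

// Replacement using a callback: no code is evaluated from a string
$result = preg_replace_callback(&amp;#39;/\b[a-z]/&amp;#39;, function ($matches) {
    return strtoupper($matches[0]);
}, $text);

var_dump($result); // string(11) &amp;quot;Hello World&amp;quot;&lt;/code&gt;&lt;/pre&gt;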

&lt;h2 id='function_and_class_additions'&gt;Function and class additions&lt;/h2&gt;

&lt;p&gt;Next we&amp;#8217;ll look at some of the planned function and class additions:&lt;/p&gt;

&lt;h3 id='boolval'&gt;boolval()&lt;/h3&gt;
&lt;small&gt;Status: landed; Responsible: Jille Timmermans&lt;/small&gt;
&lt;p&gt;PHP already implements the &lt;code&gt;strval&lt;/code&gt;, &lt;code&gt;intval&lt;/code&gt; and &lt;code&gt;floatval&lt;/code&gt; functions. To be consistent the &lt;code&gt;boolval&lt;/code&gt; function is now added, too. It does exactly the same thing as a &lt;code&gt;(bool)&lt;/code&gt; cast, but can be used as a callback function.&lt;/p&gt;
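
&lt;p&gt;A small sketch of the callback use case (the sample values are arbitrary):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;?php

$values = [0, 1, &amp;#39;&amp;#39;, &amp;#39;foo&amp;#39;];

// boolval can be passed by name, which a (bool) cast cannot
$bools = array_map(&amp;#39;boolval&amp;#39;, $values);

// is the same as
$bools = array_map(function ($value) { return (bool) $value; }, $values);

var_dump($bools === [false, true, false, true]); // bool(true)&lt;/code&gt;&lt;/pre&gt;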

&lt;h3 id='hash_pbkdf2'&gt;hash_pbkdf2()&lt;/h3&gt;
&lt;small&gt;Status: landed; Responsible: Anthony Ferrara&lt;/small&gt;
&lt;p&gt;&lt;a href='http://en.wikipedia.org/wiki/PBKDF2'&gt;PBKDF2&lt;/a&gt; stands for &amp;#8220;Password-Based Key Derivation Function 2&amp;#8221; and is - as the name already says - an algorithm for deriving a cryptographic key from a password. This is required for encryption algorithms, but can also be used for password hashing. For a more extensive description and usage examples see &lt;a href='https://wiki.php.net/rfc/hash_pbkdf2'&gt;the RFC&lt;/a&gt;.&lt;/p&gt;
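
&lt;p&gt;A minimal usage sketch, assuming the parameter order algorithm, password, salt, iterations, output length. The salt and the iteration count below are placeholders for illustration, not recommendations:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;?php

$password = &amp;#39;foo&amp;#39;;

// Illustrative salt only; a real salt should be random and unique per password
$salt = &amp;#39;example-salt-123&amp;#39;;

// Derive a key using HMAC-SHA256, returned as 64 hexadecimal characters
$key = hash_pbkdf2(&amp;#39;sha256&amp;#39;, $password, $salt, 10000, 64);

var_dump(strlen($key)); // int(64)&lt;/code&gt;&lt;/pre&gt;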

&lt;h3 id='intl_additions'&gt;Intl additions&lt;/h3&gt;
&lt;small&gt;Status: landed; Responsible: Gustavo André dos Santos Lopes&lt;/small&gt;
&lt;p&gt;There have been many improvements to the &lt;a href='http://php.net/intl'&gt;intl extension&lt;/a&gt;. E.g. there will be new &lt;code&gt;IntlCalendar&lt;/code&gt;, &lt;code&gt;IntlGregorianCalendar&lt;/code&gt;, &lt;code&gt;IntlTimeZone&lt;/code&gt;, &lt;code&gt;IntlBreakIterator&lt;/code&gt;, &lt;code&gt;IntlRuleBasedBreakIterator&lt;/code&gt;, &lt;code&gt;IntlCodePointBreakIterator&lt;/code&gt; classes. I sadly don&amp;#8217;t know much about the intl extension, so I will just direct you to the mailing list announcements for &lt;a href='http://markmail.org/thread/b7sldwebskmhbxmx'&gt;Calendar&lt;/a&gt; and &lt;a href='http://markmail.org/thread/ulnm3hhist5ygido'&gt;BreakIterator&lt;/a&gt;, if you want to know more.&lt;/p&gt;

&lt;h3 id='array_column'&gt;array_column()&lt;/h3&gt;
&lt;small&gt;Status: proposed; Responsible: Ben Ramsey&lt;/small&gt;
&lt;p&gt;There is &lt;a href='https://wiki.php.net/rfc/array_column'&gt;a proposal&lt;/a&gt; pending for a new &lt;code&gt;array_column&lt;/code&gt; (or &lt;code&gt;array_pluck&lt;/code&gt;) function that would behave as follows:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='nv'&gt;$userNames&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nx'&gt;array_column&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$users&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;name&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='c1'&gt;// is the same as&lt;/span&gt;
&lt;span class='nv'&gt;$userNames&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='p'&gt;[];&lt;/span&gt;
&lt;span class='k'&gt;foreach&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$users&lt;/span&gt; &lt;span class='k'&gt;as&lt;/span&gt; &lt;span class='nv'&gt;$user&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='nv'&gt;$userNames&lt;/span&gt;&lt;span class='p'&gt;[]&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$user&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;name&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;];&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;So it would be like fetching a column from a database, but for arrays.&lt;/p&gt;

&lt;h3 id='a_simple_api_for_password_hashing'&gt;A simple API for password hashing&lt;/h3&gt;
&lt;small&gt;Status: proposed; Responsible: Anthony Ferrara&lt;/small&gt;
&lt;p&gt;The recent password leaks (from LinkedIn etc) have shown that even large websites don&amp;#8217;t get how to properly hash passwords. People have been advocating the use of bcrypt for years, but still most people seem to be using completely unsafe &lt;code&gt;sha1&lt;/code&gt; hashes.&lt;/p&gt;

&lt;p&gt;We figured that the reason for this might be the really hard to use API of the &lt;a href='http://php.net/crypt'&gt;&lt;code&gt;crypt&lt;/code&gt;&lt;/a&gt; function. Thus we would like to introduce a new, simple API for secure password hashing:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='nv'&gt;$password&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;foo&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

&lt;span class='c1'&gt;// creating the hash&lt;/span&gt;
&lt;span class='nv'&gt;$hash&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nx'&gt;password_hash&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$password&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nx'&gt;PASSWORD_BCRYPT&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;

&lt;span class='c1'&gt;// verifying a password&lt;/span&gt;
&lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;password_verify&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$password&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$hash&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='c1'&gt;// password correct!&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='k'&gt;else&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='c1'&gt;// password wrong!&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The new hashing API comes with a few more features, which are outlined in &lt;a href='https://wiki.php.net/rfc/password_hash'&gt;the RFC&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id='language_changes'&gt;Language changes&lt;/h2&gt;

&lt;p&gt;Now comes the really interesting stuff: New language features and enhancements.&lt;/p&gt;

&lt;h3 id='constant_dereferencing'&gt;Constant dereferencing&lt;/h3&gt;
&lt;small&gt;Status: landed; Responsible: Xinchen Hui&lt;/small&gt;
&lt;p&gt;&amp;#8220;Constant dereferencing&amp;#8221; means that array operations can be directly applied to string and array literals. Here are two examples:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;randomHexString&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$length&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='nv'&gt;$str&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='k'&gt;for&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$i&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='nv'&gt;$i&lt;/span&gt; &lt;span class='o'&gt;&amp;lt;&lt;/span&gt; &lt;span class='nv'&gt;$length&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='o'&gt;++&lt;/span&gt;&lt;span class='nv'&gt;$i&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nv'&gt;$str&lt;/span&gt; &lt;span class='o'&gt;.=&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;0123456789abcdef&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nx'&gt;mt_rand&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;15&lt;/span&gt;&lt;span class='p'&gt;)];&lt;/span&gt; &lt;span class='c1'&gt;// direct dereference of string&lt;/span&gt;
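    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='nv'&gt;$str&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;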
&lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;randomBool&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='k'&gt;false&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='k'&gt;true&lt;/span&gt;&lt;span class='p'&gt;][&lt;/span&gt;&lt;span class='nx'&gt;mt_rand&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;)];&lt;/span&gt; &lt;span class='c1'&gt;// direct dereference of array&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;I don&amp;#8217;t think that this feature is of much use in practice, but it makes the language a bit more consistent. See also &lt;a href='https://wiki.php.net/rfc/constdereference'&gt;the RFC&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id='empty_works_with_function_calls_and_other_expressions'&gt;empty() works with function calls (and other expressions)&lt;/h3&gt;
&lt;small&gt;Status: landed; Responsible: myself&lt;/small&gt;
&lt;p&gt;Currently the &lt;code&gt;empty()&lt;/code&gt; language construct can only be used on variables, not on other expressions. In particular code like &lt;code&gt;empty($this-&amp;gt;getFriends())&lt;/code&gt; would throw an error. As of PHP 5.5 this becomes valid code. For more info see &lt;a href='https://wiki.php.net/rfc/empty_isset_exprs'&gt;the RFC&lt;/a&gt;.&lt;/p&gt;
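
&lt;p&gt;To make the difference concrete, here is a short sketch. &lt;code&gt;getFriends()&lt;/code&gt; is a stand-in function defined only for this example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;?php

// Hypothetical helper, used only for illustration
function getFriends() {
    return [];
}

// Up to PHP 5.4 the result has to be stored in a variable first
$friends = getFriends();
if (empty($friends)) {
    echo &amp;quot;No friends (PHP 5.4 style)\n&amp;quot;;
}

// As of PHP 5.5 the call can be passed to empty() directly
if (empty(getFriends())) {
    echo &amp;quot;No friends (PHP 5.5 style)\n&amp;quot;;
}&lt;/code&gt;&lt;/pre&gt;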

&lt;h3 id='getting_the_fully_qualified_class_name'&gt;Getting the fully qualified class name&lt;/h3&gt;
&lt;small&gt;Status: proposed; Responsible: Ralph Schindler&lt;/small&gt;
&lt;p&gt;PHP 5.3 introduced namespaces with the ability to alias classes and namespaces to shorter versions. This does not apply to string class names though:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;use&lt;/span&gt; &lt;span class='nx'&gt;Some\Deeply\Nested\Namespace\FooBar&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

&lt;span class='c1'&gt;// does not work, because this will try to use the global `FooBar` class&lt;/span&gt;
&lt;span class='nv'&gt;$reflection&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;ReflectionClass&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;FooBar&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;To solve this a new &lt;code&gt;FooBar::class&lt;/code&gt; syntax is proposed, which returns the fully qualified name of the class:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;use&lt;/span&gt; &lt;span class='nx'&gt;Some\Deeply\Nested\Namespace\FooBar&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

&lt;span class='c1'&gt;// this works because FooBar::class is resolved to &amp;quot;Some\\Deeply\\Nested\\Namespace\\FooBar&amp;quot;&lt;/span&gt;
&lt;span class='nv'&gt;$reflection&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;ReflectionClass&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;FooBar&lt;/span&gt;&lt;span class='o'&gt;::&lt;/span&gt;&lt;span class='na'&gt;class&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;For more examples see &lt;a href='https://wiki.php.net/rfc/class_name_scalars'&gt;the RFC&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id='parameter_skipping'&gt;Parameter skipping&lt;/h3&gt;
&lt;small&gt;Status: proposed; Responsible: Stas Malyshev&lt;/small&gt;
&lt;p&gt;If you have a function accepting multiple optional parameters there is currently no way to change just the last one, leaving all others at their default.&lt;/p&gt;

&lt;p&gt;Taking the example from &lt;a href='https://wiki.php.net/rfc/skipparams'&gt;the RFC&lt;/a&gt;, if you have a function like the following:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;function create_query($where, $order_by, $join_type=&amp;#39;&amp;#39;, $execute = false, $report_errors = true) { ... }&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then there is no way to set &lt;code&gt;$report_errors = false&lt;/code&gt; without replicating the other two default values. To solve this a way of skipping parameters is proposed:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;create_query(&amp;quot;deleted=0&amp;quot;, &amp;quot;name&amp;quot;, default, default, false);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Personally I&amp;#8217;m not particularly fond of this proposal. In my eyes code that needs this feature is just badly designed. Functions shouldn&amp;#8217;t have 12 optional parameters.&lt;/p&gt;

&lt;h3 id='scalar_typehinting'&gt;Scalar typehinting&lt;/h3&gt;
&lt;small&gt;Status: proposed; Responsible: Anthony Ferrara&lt;/small&gt;
&lt;p&gt;Scalar typehinting was originally planned to go into 5.4, but never made it due to lack of consensus. For more info on why scalar typehints haven&amp;#8217;t made it into PHP yet, see: &lt;a href='http://nikic.github.com/2012/03/06/Scalar-type-hinting-is-harder-than-you-think.html'&gt;Scalar typehints are harder than you think&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For PHP 5.5 the discussion has come up again and I think there is a fairly decent &lt;a href='https://wiki.php.net/rfc/scalar_type_hinting_with_cast'&gt;proposal for scalar typehints with casting&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It would work by casting the input value to the specified type, but only if the cast can occur &lt;em&gt;without data loss&lt;/em&gt;. E.g. &lt;code&gt;123&lt;/code&gt;, &lt;code&gt;123.0&lt;/code&gt;, &lt;code&gt;&amp;quot;123&amp;quot;&lt;/code&gt; would all be valid inputs for an &lt;code&gt;int&lt;/code&gt; parameter, but &lt;code&gt;&amp;quot;hallo world&amp;quot;&lt;/code&gt; would not. This matches the behavior of internal functions.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='x'&gt;function foo(int $i) { ... }&lt;/span&gt;

&lt;span class='x'&gt;foo(1);      // $i = 1&lt;/span&gt;
&lt;span class='x'&gt;foo(1.0);    // $i = 1&lt;/span&gt;
&lt;span class='x'&gt;foo(&amp;quot;1&amp;quot;);    // $i = 1&lt;/span&gt;
&lt;span class='x'&gt;foo(&amp;quot;1abc&amp;quot;); // not yet clear, maybe $i = 1 with notice&lt;/span&gt;
&lt;span class='x'&gt;foo(1.5);    // not yet clear, maybe $i = 1 with notice&lt;/span&gt;
&lt;span class='x'&gt;foo([]);     // error&lt;/span&gt;
&lt;span class='x'&gt;foo(&amp;quot;abc&amp;quot;);  // error&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;h3 id='getters_and_setters'&gt;Getters and setters&lt;/h3&gt;
&lt;small&gt;Status: proposed; Responsible: Clint Priest&lt;/small&gt;
&lt;p&gt;If you&amp;#8217;ve never been a fan of writing all those &lt;code&gt;getXYZ()&lt;/code&gt; and &lt;code&gt;setXYZ($value)&lt;/code&gt; methods, then this should be a welcome change for you. The proposal adds a new syntax for defining what should happen when a property is set / read:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;class&lt;/span&gt; &lt;span class='nc'&gt;TimePeriod&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='nv'&gt;$seconds&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='nv'&gt;$hours&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nx'&gt;get&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt; &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;seconds&lt;/span&gt; &lt;span class='o'&gt;/&lt;/span&gt; &lt;span class='mi'&gt;3600&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='p'&gt;}&lt;/span&gt;
        &lt;span class='nx'&gt;set&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt; &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;seconds&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$value&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt; &lt;span class='mi'&gt;3600&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='nv'&gt;$timePeriod&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;TimePeriod&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='nv'&gt;$timePeriod&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;hours&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;10&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

&lt;span class='nb'&gt;var_dump&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$timePeriod&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;seconds&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt; &lt;span class='c1'&gt;// int(36000)&lt;/span&gt;
&lt;span class='nb'&gt;var_dump&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$timePeriod&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;hours&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;   &lt;span class='c1'&gt;// int(10)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;There are also some more features like read-only properties. If you want to know more, have a look at &lt;a href='https://wiki.php.net/rfc/propertygetsetsyntax-as-implemented'&gt;the RFC&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id='generators'&gt;Generators&lt;/h3&gt;
&lt;small&gt;Status: proposed; Responsible: myself&lt;/small&gt;
&lt;p&gt;Currently custom iterators are used only rarely, because their implementation requires lots of boilerplate code. Generators solve this issue by providing an easy and boilerplate-free way to create iterators.&lt;/p&gt;

&lt;p&gt;For example, this is how you could define the &lt;code&gt;range&lt;/code&gt; function, but as an iterator:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='err'&gt;*&lt;/span&gt;&lt;span class='nf'&gt;xrange&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$start&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$end&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$step&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;for&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$i&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$start&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='nv'&gt;$i&lt;/span&gt; &lt;span class='o'&gt;&amp;lt;=&lt;/span&gt; &lt;span class='nv'&gt;$end&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='nv'&gt;$i&lt;/span&gt; &lt;span class='o'&gt;+=&lt;/span&gt; &lt;span class='nv'&gt;$step&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nx'&gt;yield&lt;/span&gt; &lt;span class='nv'&gt;$i&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='k'&gt;foreach&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;xrange&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;10&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;20&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='k'&gt;as&lt;/span&gt; &lt;span class='nv'&gt;$i&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='c1'&gt;// ...&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The above &lt;code&gt;xrange&lt;/code&gt; function has the same behavior as the builtin &lt;code&gt;range&lt;/code&gt; function with one difference: Instead of returning an array with all the values, it returns an iterator which generates the values on-the-fly.&lt;/p&gt;
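
&lt;p&gt;For comparison, this is roughly what the same thing looks like as a hand-written iterator, which gives an idea of the boilerplate that generators avoid. The class name &lt;code&gt;XRangeIterator&lt;/code&gt; is made up for this sketch, and the end value is treated as inclusive to mirror &lt;code&gt;range&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;?php

// Hypothetical hand-written counterpart to the generator (illustration only)
class XRangeIterator implements Iterator {
    protected $start;
    protected $end;
    protected $step;
    protected $i;

    public function __construct($start, $end, $step = 1) {
        $this-&amp;gt;start = $start;
        $this-&amp;gt;end   = $end;
        $this-&amp;gt;step  = $step;
        $this-&amp;gt;i     = $start;
    }

    // Reset to the first value
    public function rewind()  { $this-&amp;gt;i = $this-&amp;gt;start; }
    // The end value is inclusive, like in range()
    public function valid()   { return $this-&amp;gt;i &amp;lt;= $this-&amp;gt;end; }
    public function current() { return $this-&amp;gt;i; }
    // Zero-based position of the current value
    public function key()     { return ($this-&amp;gt;i - $this-&amp;gt;start) / $this-&amp;gt;step; }
    public function next()    { $this-&amp;gt;i += $this-&amp;gt;step; }
}

foreach (new XRangeIterator(10, 20) as $i) {
    // ...
}&lt;/code&gt;&lt;/pre&gt;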

&lt;p&gt;For a more in-depth introduction to the topic see &lt;a href='https://wiki.php.net/rfc/generators'&gt;the RFC&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id='list_comprehensions_and_generator_expressions'&gt;List comprehensions and generator expressions&lt;/h3&gt;
&lt;small&gt;Status: proposed; Responsible: myself&lt;/small&gt;
&lt;p&gt;List comprehensions provide a simple way to do small operations on arrays:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$firstNames = [foreach ($users as $user) yield $user-&amp;gt;firstName];&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The above list comprehension is equivalent to the following code:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$firstNames = [];
foreach ($users as $user) {
    $firstNames[] = $user-&amp;gt;firstName;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It is also possible to filter arrays this way:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$underageUsers = [foreach ($users as $user) if ($user-&amp;gt;age &amp;lt; 18) yield $user];&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Generator expressions are similar, but return an iterator (which generates the values on-the-fly) instead of an array.&lt;/p&gt;

&lt;p&gt;For more examples see &lt;a href='http://markmail.org/thread/uvendztpe2rrwiif'&gt;the mailing list announcement&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id='wrapping_up'&gt;Wrapping up&lt;/h2&gt;

&lt;p&gt;As you can see, there is a lot of awesome stuff being worked on for PHP 5.5. But as I already said, PHP 5.5 is still very young, so we don&amp;#8217;t know for sure what will get in and what will not.&lt;/p&gt;

&lt;p&gt;If you want to stay updated on the new features or want to help out in discussions and/or development, be sure to &lt;a href='http://php.net/mailing-lists.php'&gt;subscribe to the internals mailing list&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Comments welcome!&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>A plea for less (XML) configuration files</title>
      <link>http://nikic.github.com/2012/07/09/A-plea-for-less-XML-configuration-files.html</link>
      <pubDate>Mon, 09 Jul 2012 00:00:00 -0700</pubDate>
      <author>nikic@php.net (NikiC)</author>
      <guid>http://nikic.github.com/2012/07/09/A-plea-for-less-XML-configuration-files</guid>
      <description>&lt;p&gt;I recently tried using &lt;a href='http://www.phing.info/trac/'&gt;Phing&lt;/a&gt; (a PHP build system) to do some simple release automation. Just creating a PEAR package and doing a few string replacements here and there.&lt;/p&gt;

&lt;p&gt;The result? After several wasted hours I ended up using Phing only for PEAR packaging and doing everything else in a custom PHP build script.&lt;/p&gt;

&lt;p&gt;The reason? Phing uses XML files to configure what it should do during a build. So an excerpt from a build file might look like this:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='xml'&gt;&lt;span class='cp'&gt;&amp;lt;?xml version=&amp;quot;1.0&amp;quot; encoding=&amp;quot;UTF-8&amp;quot;?&amp;gt;&lt;/span&gt;
&lt;span class='nt'&gt;&amp;lt;project&lt;/span&gt; &lt;span class='na'&gt;name=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;SomeName&amp;quot;&lt;/span&gt; &lt;span class='na'&gt;default=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;package&amp;quot;&lt;/span&gt; &lt;span class='na'&gt;basedir=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;..&amp;quot;&lt;/span&gt;&lt;span class='nt'&gt;&amp;gt;&lt;/span&gt;
    &lt;span class='nt'&gt;&amp;lt;target&lt;/span&gt; &lt;span class='na'&gt;name=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;package&amp;quot;&lt;/span&gt;&lt;span class='nt'&gt;&amp;gt;&lt;/span&gt;
        &lt;span class='nt'&gt;&amp;lt;property&lt;/span&gt; &lt;span class='na'&gt;file=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;build/build.properties&amp;quot;&lt;/span&gt; &lt;span class='nt'&gt;/&amp;gt;&lt;/span&gt;
        &lt;span class='nt'&gt;&amp;lt;propertyprompt&lt;/span&gt; &lt;span class='na'&gt;propertyName=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;project.version&amp;quot;&lt;/span&gt; &lt;span class='na'&gt;promptText=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;Enter project version&amp;quot;&lt;/span&gt; &lt;span class='nt'&gt;/&amp;gt;&lt;/span&gt;

        &lt;span class='nt'&gt;&amp;lt;delete&lt;/span&gt; &lt;span class='na'&gt;dir=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;${path.results.lib}&amp;quot;&lt;/span&gt; &lt;span class='nt'&gt;/&amp;gt;&lt;/span&gt;
        &lt;span class='nt'&gt;&amp;lt;mkdir&lt;/span&gt; &lt;span class='na'&gt;dir=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;${path.results.lib}&amp;quot;&lt;/span&gt; &lt;span class='nt'&gt;/&amp;gt;&lt;/span&gt;

        &lt;span class='nt'&gt;&amp;lt;copy&lt;/span&gt; &lt;span class='na'&gt;todir=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;${path.results.lib}&amp;quot;&lt;/span&gt;&lt;span class='nt'&gt;&amp;gt;&lt;/span&gt;
            &lt;span class='nt'&gt;&amp;lt;fileset&lt;/span&gt; &lt;span class='na'&gt;dir=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;${path.lib}&amp;quot;&lt;/span&gt;&lt;span class='nt'&gt;&amp;gt;&lt;/span&gt;
                &lt;span class='nt'&gt;&amp;lt;include&lt;/span&gt; &lt;span class='na'&gt;name=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;**/**&amp;quot;&lt;/span&gt; &lt;span class='nt'&gt;/&amp;gt;&lt;/span&gt;
            &lt;span class='nt'&gt;&amp;lt;/fileset&amp;gt;&lt;/span&gt;
        &lt;span class='nt'&gt;&amp;lt;/copy&amp;gt;&lt;/span&gt;

        &lt;span class='c'&gt;&amp;lt;!-- do more stuff --&amp;gt;&lt;/span&gt;
    &lt;span class='nt'&gt;&amp;lt;/target&amp;gt;&lt;/span&gt;

    &lt;span class='c'&gt;&amp;lt;!-- more targets --&amp;gt;&lt;/span&gt;
&lt;span class='nt'&gt;&amp;lt;/project&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Could you explain to me why you have to do this? Why do you have to specify everything in an XML configuration file? Why can&amp;#8217;t you just do the same thing in the programming language you&amp;#8217;re using? E.g., why can&amp;#8217;t I write this instead:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;package&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;Phing&lt;/span&gt; &lt;span class='nv'&gt;$phing&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;include&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;properties.php&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='nv'&gt;$project&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;version&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$phing&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;prompt&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;Enter project version&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;

    &lt;span class='nv'&gt;$phing&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;deleteDir&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$path&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;results&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;lib&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='nv'&gt;$phing&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;createDir&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$path&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;results&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;lib&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='nv'&gt;$phing&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;copyDir&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$path&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;lib&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$path&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;results&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;lib&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;

    &lt;span class='c1'&gt;// do more stuff, like replace version strings with $project-&amp;gt;version&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Don&amp;#8217;t you think that this is much nicer? Some reasons why I prefer this over XML configs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It uses an environment you&amp;#8217;re already used to. You don&amp;#8217;t have to learn how exactly the Phing XML files work. You don&amp;#8217;t need to learn where you can use properties and where you can&amp;#8217;t. You simply know already.&lt;/li&gt;

&lt;li&gt;You can use the full power of the programming language. E.g. the main reason why I stopped using Phing was that I could not implement some slightly more complex string replacements in it without writing 200 lines of XML. In PHP, on the other hand, I was able to implement those replacements in 5 lines.&lt;/li&gt;

&lt;li&gt;All the tools you normally use for programming will work. E.g. the IDE will be able to provide autocompletion and error analysis. One could use phpDocumentor to document the different build targets and options. Heck, you could even unit test the building process, if you really wanted to.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And obviously: XML is a rather verbose language, whereas most programming languages (short of Brainfuck) try to be concise and readable.&lt;/p&gt;

&lt;p&gt;At this point you might say that the above isn&amp;#8217;t really a configuration file, but rather a program written in XML. I&amp;#8217;d agree with you there, but I think the above also applies to &amp;#8220;real&amp;#8221; configuration files. They too can benefit from the power of the programming language, for example by putting repeating configuration &amp;#8220;patterns&amp;#8221; into functions or loops.&lt;/p&gt;
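
&lt;p&gt;As a rough illustration of that last point, a configuration file written in plain PHP could build repetitive entries in a loop. This is only a sketch: the file name &lt;code&gt;config.php&lt;/code&gt;, the keys and the host names are all made up for the example.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;?php
// config.php (hypothetical): configuration written as plain PHP
$servers = [];
foreach ([&amp;#39;web&amp;#39;, &amp;#39;db&amp;#39;, &amp;#39;cache&amp;#39;] as $role) {
    // every server entry follows the same pattern, so generate it in a loop
    $servers[$role] = [
        &amp;#39;host&amp;#39; =&amp;gt; $role . &amp;#39;.example.com&amp;#39;,
        &amp;#39;port&amp;#39; =&amp;gt; 8080,
    ];
}

return [&amp;#39;servers&amp;#39; =&amp;gt; $servers];&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The application would then load it with something like &lt;code&gt;$config = include &amp;#39;config.php&amp;#39;;&lt;/code&gt;.&lt;/p&gt;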

&lt;p&gt;Also, this isn&amp;#8217;t specific to XML. I think that it is &lt;em&gt;generally&lt;/em&gt; preferable to just use your usual programming language to write configuration, instead of using some special format.&lt;/p&gt;

&lt;p&gt;But don&amp;#8217;t get me wrong, there certainly are cases where using a special configuration format is the way to go:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If the configuration will be used by multiple programming languages. Obviously it doesn&amp;#8217;t make sense to write PHP configs if they have to be used by Python too.&lt;/li&gt;

&lt;li&gt;If the type of configuration is very specific and writing it would be much easier using a Domain Specific Language.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In my personal experience neither of those points applies to most of the config files I&amp;#8217;ve encountered.&lt;/p&gt;

&lt;p&gt;So, what I ask you for is: If you&amp;#8217;re creating a config file, consider writing it in your normal programming language instead of XML or some DSL.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>PHP solves problems. Oh, and you can program with it too!</title>
      <link>http://nikic.github.com/2012/06/29/PHP-solves-problems-Oh-and-you-can-program-with-it-too.html</link>
      <pubDate>Fri, 29 Jun 2012 00:00:00 -0700</pubDate>
      <author>nikic@php.net (NikiC)</author>
      <guid>http://nikic.github.com/2012/06/29/PHP-solves-problems-Oh-and-you-can-program-with-it-too</guid>
      <description>&lt;p&gt;I&amp;#8217;d like to draw your attention to one particular comment on &lt;a href='http://www.codinghorror.com/blog/2012/06/the-php-singularity.html'&gt;Jeff Atwood&amp;#8217;s new &amp;#8220;PHP sucks!&amp;#8221; blog post&lt;/a&gt;, a comment by Adam Victor Nazareth Brandizzi:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I am a Java programmer at work (could be no more corporate-y) and a Python developer in my projects (could be no more hipster) but I admire PHP and its ability of solving problems. It grows because, some times, some poor soul wants to create an &lt;a href='http://en.wikipedia.org/wiki/Main_Page'&gt;online encyclopaedia&lt;/a&gt;, or some teacher needs an &lt;a href='http://moodle.org/'&gt;online teaching platform&lt;/a&gt;, or someone wants to write a &lt;a href='http://wordpress.org/'&gt;blog&lt;/a&gt;. Those people do not want to learn to program, they want to &lt;strong&gt;solve problems&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I think this comment really nailed it. I think &lt;em&gt;this&lt;/em&gt; is the most important reason for PHP&amp;#8217;s success.&lt;/p&gt;

&lt;p&gt;People come to PHP because they have some problem and they need to solve it. This is what PHP really shines at. You can simply take your static HTML website, add a simple &lt;code&gt;&amp;lt;?php include &amp;#39;counter.php&amp;#39;; ?&amp;gt;&lt;/code&gt; in there, and &amp;#8230; be done!&lt;/p&gt;

&lt;p&gt;From there you start writing simple scripts, learn how to process forms, how to talk to the database, etc. After some time you start using object oriented programming and maybe make use of some framework.&lt;/p&gt;

&lt;p&gt;That&amp;#8217;s actually pretty much how I got into programming.&lt;/p&gt;

&lt;p&gt;With most other languages it is the other way around. With them you first study computer science for five years and then you go out into the world to find some problem you can solve. (You could say that PHP is a programmer-producing language, whereas most other languages are programmer-consuming.)&lt;/p&gt;

&lt;p&gt;At this point you may ask: Well, once you have learned programming, why not switch to another language? Simple: PHP has its quirks, but it is not &lt;strong&gt;that&lt;/strong&gt; bad. Sure, people always try to tell you that it is, but it just isn&amp;#8217;t true. Most of the things people criticize about PHP are really non-issues in practice. Take the inconsistent needle/haystack order. That&amp;#8217;s always at the top of the list of things that are wrong with PHP. But in reality it really doesn&amp;#8217;t matter. Sure, it would be nice if the order were consistent, but my IDE does a very good job of reminding me how to do it correctly.&lt;/p&gt;

&lt;p&gt;To summarize:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PHP is a great language to start programming!&lt;/li&gt;

&lt;li&gt;Once you have started, PHP is also good for &amp;#8220;real&amp;#8221; programming (you know, object orientation and stuff).&lt;/li&gt;

&lt;li&gt;PHP is not as bad as they say. There are issues, like with every language, but they rarely cause problems in practice.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To add to that, I also noticed that most of the PHP bashers have a ten-year-old image of the language. E.g. consider this quote from Jeff&amp;#8217;s post:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What&amp;#8217;s depressing is not that PHP is horribly designed. Does anyone even dispute that PHP is the worst designed mainstream &amp;#8220;language&amp;#8221; to blight our craft in decades? What&amp;#8217;s truly depressing is that &lt;strong&gt;so little has changed&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This statement is outrageous. PHP has changed a lot in recent years, but most people seem to remember it as the shitty language with terrible OOP support that PHP 4 was. Well, I&amp;#8217;ve got news for you: That language has been dead for nearly a decade now. PHP 5 has very good object orientation support, which is actually really similar to Java&amp;#8217;s. PHP 5.3 additionally added namespace and lambda function support to the mix. (And PHP 5.5 &lt;em&gt;will&lt;/em&gt; add a lot of awesome new features.) And you tell me nothing changed?&lt;/p&gt;

&lt;p&gt;End of rant.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>The true power of regular expressions</title>
      <link>http://nikic.github.com/2012/06/15/The-true-power-of-regular-expressions.html</link>
      <pubDate>Fri, 15 Jun 2012 00:00:00 -0700</pubDate>
      <author>nikic@php.net (NikiC)</author>
      <guid>http://nikic.github.com/2012/06/15/The-true-power-of-regular-expressions</guid>
      <description>&lt;p&gt;As someone who frequents the &lt;a href='http://stackoverflow.com/questions/tagged/php'&gt;PHP tag on StackOverflow&lt;/a&gt; I pretty often see questions about how to parse some particular aspect of HTML using regular expressions. A common reply to such a question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You cannot parse HTML with regular expressions, because HTML isn&amp;#8217;t regular. Use an XML parser instead.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This statement - in the context of the question - is somewhere between very misleading and outright wrong. What I&amp;#8217;ll try to demonstrate in this article is how powerful modern regular expressions &lt;em&gt;really&lt;/em&gt; are.&lt;/p&gt;

&lt;h2 id='what_does_regular_actually_mean'&gt;What does &amp;#8220;regular&amp;#8221; actually mean?&lt;/h2&gt;

&lt;p&gt;In the context of &lt;a href='http://en.wikipedia.org/wiki/Formal_language'&gt;formal language theory&lt;/a&gt;, something is called &amp;#8220;regular&amp;#8221; when it has a grammar where all production rules have one of the following forms:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;B -&amp;gt; a
B -&amp;gt; aC
B -&amp;gt; ε&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You can read those &lt;code&gt;-&amp;gt;&lt;/code&gt; rules as &amp;#8220;The left hand side can be replaced with the right hand side&amp;#8221;. So the first rule would be &amp;#8220;B can be replaced with a&amp;#8221;, the second one &amp;#8220;B can be replaced with aC&amp;#8221; and the third one &amp;#8220;B can be replaced with the empty string&amp;#8221; (&lt;code&gt;ε&lt;/code&gt; is the symbol for the empty string).&lt;/p&gt;

&lt;p&gt;So what are &lt;code&gt;B&lt;/code&gt;, &lt;code&gt;C&lt;/code&gt; and &lt;code&gt;a&lt;/code&gt;? By convention, uppercase characters denote so called &amp;#8220;non-terminals&amp;#8221; - symbols which &lt;em&gt;can&lt;/em&gt; be broken down further - and lowercase characters denote &amp;#8220;terminals&amp;#8221; - symbols which &lt;em&gt;cannot&lt;/em&gt; be broken down any further.&lt;/p&gt;

&lt;p&gt;All that probably sounds a bit abstract, so let&amp;#8217;s look at an example: Defining the natural numbers as a grammar.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;N -&amp;gt; 0
N -&amp;gt; 1
N -&amp;gt; 2
N -&amp;gt; 3
N -&amp;gt; 4
N -&amp;gt; 5
N -&amp;gt; 6
N -&amp;gt; 7
N -&amp;gt; 8
N -&amp;gt; 9
N -&amp;gt; 0N
N -&amp;gt; 1N
N -&amp;gt; 2N
N -&amp;gt; 3N
N -&amp;gt; 4N
N -&amp;gt; 5N
N -&amp;gt; 6N
N -&amp;gt; 7N
N -&amp;gt; 8N
N -&amp;gt; 9N&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;What this grammar says is:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;A natural number (N) is
... one of the digits 0 to 9
or
... one of the digits 0 to 9 followed by another natural number (N)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In this example the digits 0 to 9 would be terminals (as they can&amp;#8217;t be broken down any further) and &lt;code&gt;N&lt;/code&gt; would be the only non-terminal (as it can be and is broken down further).&lt;/p&gt;

&lt;p&gt;If you have another look at the rules and compare them to the definition of a regular grammar from above, you&amp;#8217;ll see that they meet the criteria: The first ten rules are of the form &lt;code&gt;B -&amp;gt; a&lt;/code&gt; and the second ten rules follow the form &lt;code&gt;B -&amp;gt; aC&lt;/code&gt;. Thus the grammar defining the natural numbers is &lt;em&gt;regular&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Another thing you might notice is that even though the above grammar defines such a simple thing, it is already quite bloated. Wouldn&amp;#8217;t it be better if we could express the same concept in a more concise manner?&lt;/p&gt;

&lt;p&gt;And that&amp;#8217;s where regular expressions come in: The above grammar is equivalent to the regex &lt;code&gt;[0-9]+&lt;/code&gt; (which is a hell of a lot simpler). And this kind of transformation can be done with &lt;em&gt;any&lt;/em&gt; regular grammar: Every regular grammar has a corresponding regular expression which defines all its valid strings.&lt;/p&gt;
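
&lt;p&gt;As a quick sanity check, the regex can be tried out in PHP using &lt;code&gt;preg_match&lt;/code&gt;. This is just a minimal sketch: The helper name &lt;code&gt;isNaturalNumber&lt;/code&gt; is made up for this illustration, and the &lt;code&gt;^&lt;/code&gt; and &lt;code&gt;$&lt;/code&gt; anchors are my addition, so that the whole string has to match and not just some part of it.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Hypothetical helper: checks a string against the grammar for N from above
function isNaturalNumber($string) {
    // ^ and $ force the whole string to consist of digits. The D modifier
    // stops $ from also matching right before a trailing newline.
    return preg_match(&amp;#39;/^[0-9]+$/D&amp;#39;, $string) === 1;
}

var_dump(isNaturalNumber(&amp;#39;2012&amp;#39;));  // bool(true)
var_dump(isNaturalNumber(&amp;#39;20a12&amp;#39;)); // bool(false)
var_dump(isNaturalNumber(&amp;#39;&amp;#39;));      // bool(false)&lt;/code&gt;&lt;/pre&gt;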

&lt;h2 id='what_can_regular_expressions_match'&gt;What can regular expressions match?&lt;/h2&gt;

&lt;p&gt;Thus the question arises: Can regular expressions match only regular grammars, or can they also match more? The answer to this is both yes &lt;em&gt;and&lt;/em&gt; no:&lt;/p&gt;

&lt;p&gt;Regular expressions in the formal grammar sense can (pretty much by definition) only parse regular grammars and nothing more.&lt;/p&gt;

&lt;p&gt;But when programmers talk about &amp;#8220;regular expressions&amp;#8221; they aren&amp;#8217;t talking about formal grammars. They are talking about the regular expression &lt;em&gt;derivative&lt;/em&gt; which their language implements. And those regex implementations are only very slightly related to the original notion of regularity.&lt;/p&gt;

&lt;p&gt;Any modern regex flavor can match a &lt;em&gt;lot&lt;/em&gt; more than just regular languages. How much exactly, that&amp;#8217;s what the rest of the article is about.&lt;/p&gt;

&lt;p&gt;To keep things simple, I&amp;#8217;ll focus on the PCRE regex implementation in the following, simply because I know it best (as it&amp;#8217;s used by PHP). Most other regex implementations are quite similar though, so most stuff should apply to them too.&lt;/p&gt;

&lt;h2 id='the_language_hierarchy'&gt;The language hierarchy&lt;/h2&gt;

&lt;p&gt;In order to analyze what regular expressions can and cannot match, we first have to look at what other types of languages there are. A good starting point for this is the &lt;a href='http://en.wikipedia.org/wiki/Chomsky_hierarchy'&gt;Chomsky hierarchy&lt;/a&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Chomsky hierarchy:

/-------------------------------------------\
|                                           |
|     Recursively enumerable languages      | Type 0
|                                           |
|   /-----------------------------------\   |
|   |                                   |   |
|   |    Context-sensitive languages    |   | Type 1
|   |                                   |   |
|   |   /---------------------------\   |   |
|   |   |                           |   |   |
|   |   |  Context-free languages   |   |   | Type 2
|   |   |                           |   |   |
|   |   |   /-------------------\   |   |   |
|   |   |   | Regular languages |   |   |   | Type 3
|   |   |   \-------------------/   |   |   |
|   |   \---------------------------/   |   |
|   \-----------------------------------/   |
\-------------------------------------------/&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;As you can see the Chomsky hierarchy divides formal languages into four types:&lt;/p&gt;

&lt;p&gt;Regular languages (Type 3) are the least powerful, followed by the context-free languages (Type 2), the context-sensitive languages (Type 1) and finally the almighty recursively enumerable languages (Type 0).&lt;/p&gt;

&lt;p&gt;The Chomsky hierarchy is a containment hierarchy, so the smaller boxes in the above image are fully contained in the larger boxes. For example, every regular language is also a context-free language (but &lt;em&gt;not&lt;/em&gt; the other way around!).&lt;/p&gt;

&lt;p&gt;So, let&amp;#8217;s move one step up in that hierarchy: We already know that regular expressions can match any regular language. But can they also match context-free languages?&lt;/p&gt;

&lt;p&gt;(Reminder: When I say &amp;#8220;regular expression&amp;#8221; here I obviously mean it in the programmer sense, not the formal language theory sense.)&lt;/p&gt;

&lt;h2 id='matching_contextfree_languages'&gt;Matching context-free languages&lt;/h2&gt;

&lt;p&gt;The answer to this is &lt;em&gt;yes&lt;/em&gt;, they can!&lt;/p&gt;

&lt;p&gt;Let&amp;#8217;s take the classical example of a context-free language, namely &lt;code&gt;{a^n b^n, n&amp;gt;0}&lt;/code&gt;, which means &amp;#8220;A number of &lt;code&gt;a&lt;/code&gt; characters followed by the &lt;em&gt;same&lt;/em&gt; number of &lt;code&gt;b&lt;/code&gt; characters&amp;#8221;. The (PCRE) regex for this language is:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;/^(a(?1)?b)$/&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The regular expression is very simple: &lt;code&gt;(?1)&lt;/code&gt; is a reference to the first subpattern, namely &lt;code&gt;(a(?1)?b)&lt;/code&gt;. So basically you could replace the &lt;code&gt;(?1)&lt;/code&gt; by that subpattern, thus forming a recursive dependency:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;/^(a(?1)?b)$/
/^(a(a(?1)?b)?b)$/
/^(a(a(a(?1)?b)?b)?b)$/
/^(a(a(a(a(?1)?b)?b)?b)?b)$/
# and so on&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;From the above expansions it should be clear that this expression matches exactly those strings that consist of some number of &lt;code&gt;a&lt;/code&gt;s followed by the same number of &lt;code&gt;b&lt;/code&gt;s.&lt;/p&gt;
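
&lt;p&gt;If you want to convince yourself, the pattern can be run against a few sample strings. The following is only a sketch: &lt;code&gt;isAnBn&lt;/code&gt; is a name I made up for the illustration, and the &lt;code&gt;D&lt;/code&gt; modifier is added so that a trailing newline is not silently accepted.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Hypothetical helper wrapping the recursive pattern from above
function isAnBn($string) {
    return preg_match(&amp;#39;/^(a(?1)?b)$/D&amp;#39;, $string) === 1;
}

// some number of a&amp;#39;s followed by the same number of b&amp;#39;s
var_dump(isAnBn(&amp;#39;ab&amp;#39;));     // bool(true)
var_dump(isAnBn(&amp;#39;aaabbb&amp;#39;)); // bool(true)

// unequal counts, wrong order or empty input
var_dump(isAnBn(&amp;#39;aab&amp;#39;));    // bool(false)
var_dump(isAnBn(&amp;#39;abab&amp;#39;));   // bool(false)
var_dump(isAnBn(&amp;#39;&amp;#39;));       // bool(false)&lt;/code&gt;&lt;/pre&gt;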

&lt;p&gt;Thus regular expressions can match at least some non-regular, context-free grammars. But can they match all? To answer that, we first have to look at how context-free grammars are defined.&lt;/p&gt;

&lt;p&gt;In a context-free grammar all production rules take the following form:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;A -&amp;gt; β&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Here &lt;code&gt;A&lt;/code&gt; once again is a non-terminal symbol and &lt;code&gt;β&lt;/code&gt; is an arbitrary string of terminals and non-terminals. Thus every production rule of a context-free grammar has a non-terminal on the left hand side and an arbitrary symbol string on the right hand side.&lt;/p&gt;

&lt;p&gt;As an example, have a look at the following grammar:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;function_declaration -&amp;gt; T_FUNCTION is_ref T_STRING &amp;#39;(&amp;#39; parameter_list &amp;#39;)&amp;#39; &amp;#39;{&amp;#39; inner_statement_list &amp;#39;}&amp;#39;

is_ref -&amp;gt; &amp;#39;&amp;amp;&amp;#39;
is_ref -&amp;gt; ε

parameter_list -&amp;gt; non_empty_parameter_list
parameter_list -&amp;gt; ε

non_empty_parameter_list -&amp;gt; parameter
non_empty_parameter_list -&amp;gt; non_empty_parameter_list &amp;#39;,&amp;#39; parameter

// ... ... ...&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;What you see there is an excerpt from the PHP grammar (just a few sample rules). The syntax is slightly different from what we used before, but should be easy to understand. One aspect worth mentioning is that the uppercase &lt;code&gt;T_SOMETHING&lt;/code&gt; names here also are terminal symbols. These symbols which are usually called &lt;em&gt;tokens&lt;/em&gt; encode more abstract concepts. E.g. &lt;code&gt;T_FUNCTION&lt;/code&gt; represents the &lt;code&gt;function&lt;/code&gt; keyword and &lt;code&gt;T_STRING&lt;/code&gt; is a label token (like &lt;code&gt;getUserById&lt;/code&gt; or &lt;code&gt;some_other_name&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;I&amp;#8217;m using this example to show one thing: Context-free grammars are already powerful enough to encode quite complex languages. That&amp;#8217;s why pretty much all programming languages have a context-free grammar. In particular this also includes well-formed HTML.&lt;/p&gt;

&lt;p&gt;Now, back to the actual question: Can regular expressions match all context-free grammars? Once again, the answer is &lt;em&gt;yes&lt;/em&gt;!&lt;/p&gt;

&lt;p&gt;This is pretty easy to prove as regular expressions (at least PCRE and similar) provide a syntax very similar to the above for constructing grammars:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;/
    (?(DEFINE)
        (?&amp;lt;addr_spec&amp;gt; (?&amp;amp;local_part) @ (?&amp;amp;domain) )
        (?&amp;lt;local_part&amp;gt; (?&amp;amp;dot_atom) | (?&amp;amp;quoted_string) | (?&amp;amp;obs_local_part) )
        (?&amp;lt;domain&amp;gt; (?&amp;amp;dot_atom) | (?&amp;amp;domain_literal) | (?&amp;amp;obs_domain) )
        (?&amp;lt;domain_literal&amp;gt; (?&amp;amp;CFWS)? \[ (?: (?&amp;amp;FWS)? (?&amp;amp;dtext) )* (?&amp;amp;FWS)? \] (?&amp;amp;CFWS)? )
        (?&amp;lt;dtext&amp;gt; [\x21-\x5a] | [\x5e-\x7e] | (?&amp;amp;obs_dtext) )
        (?&amp;lt;quoted_pair&amp;gt; \\ (?: (?&amp;amp;VCHAR) | (?&amp;amp;WSP) ) | (?&amp;amp;obs_qp) )
        (?&amp;lt;dot_atom&amp;gt; (?&amp;amp;CFWS)? (?&amp;amp;dot_atom_text) (?&amp;amp;CFWS)? )
        (?&amp;lt;dot_atom_text&amp;gt; (?&amp;amp;atext) (?: \. (?&amp;amp;atext) )* )
        (?&amp;lt;atext&amp;gt; [a-zA-Z0-9!#$%&amp;amp;&amp;#39;*+\/=?^_`{|}~-]+ )
        (?&amp;lt;atom&amp;gt; (?&amp;amp;CFWS)? (?&amp;amp;atext) (?&amp;amp;CFWS)? )
        (?&amp;lt;word&amp;gt; (?&amp;amp;atom) | (?&amp;amp;quoted_string) )
        (?&amp;lt;quoted_string&amp;gt; (?&amp;amp;CFWS)? &amp;quot; (?: (?&amp;amp;FWS)? (?&amp;amp;qcontent) )* (?&amp;amp;FWS)? &amp;quot; (?&amp;amp;CFWS)? )
        (?&amp;lt;qcontent&amp;gt; (?&amp;amp;qtext) | (?&amp;amp;quoted_pair) )
        (?&amp;lt;qtext&amp;gt; \x21 | [\x23-\x5b] | [\x5d-\x7e] | (?&amp;amp;obs_qtext) )

        # comments and whitespace
        (?&amp;lt;FWS&amp;gt; (?: (?&amp;amp;WSP)* \r\n )? (?&amp;amp;WSP)+ | (?&amp;amp;obs_FWS) )
        (?&amp;lt;CFWS&amp;gt; (?: (?&amp;amp;FWS)? (?&amp;amp;comment) )+ (?&amp;amp;FWS)? | (?&amp;amp;FWS) )
        (?&amp;lt;comment&amp;gt; \( (?: (?&amp;amp;FWS)? (?&amp;amp;ccontent) )* (?&amp;amp;FWS)? \) )
        (?&amp;lt;ccontent&amp;gt; (?&amp;amp;ctext) | (?&amp;amp;quoted_pair) | (?&amp;amp;comment) )
        (?&amp;lt;ctext&amp;gt; [\x21-\x27] | [\x2a-\x5b] | [\x5d-\x7e] | (?&amp;amp;obs_ctext) )

        # obsolete tokens
        (?&amp;lt;obs_domain&amp;gt; (?&amp;amp;atom) (?: \. (?&amp;amp;atom) )* )
        (?&amp;lt;obs_local_part&amp;gt; (?&amp;amp;word) (?: \. (?&amp;amp;word) )* )
        (?&amp;lt;obs_dtext&amp;gt; (?&amp;amp;obs_NO_WS_CTL) | (?&amp;amp;quoted_pair) )
        (?&amp;lt;obs_qp&amp;gt; \\ (?: \x00 | (?&amp;amp;obs_NO_WS_CTL) | \n | \r ) )
        (?&amp;lt;obs_FWS&amp;gt; (?&amp;amp;WSP)+ (?: \r\n (?&amp;amp;WSP)+ )* )
        (?&amp;lt;obs_ctext&amp;gt; (?&amp;amp;obs_NO_WS_CTL) )
        (?&amp;lt;obs_qtext&amp;gt; (?&amp;amp;obs_NO_WS_CTL) )
        (?&amp;lt;obs_NO_WS_CTL&amp;gt; [\x01-\x08] | \x0b | \x0c | [\x0e-\x1f] | \x7f )

        # character class definitions
        (?&amp;lt;VCHAR&amp;gt; [\x21-\x7E] )
        (?&amp;lt;WSP&amp;gt; [ \t] )
    )
    ^(?&amp;amp;addr_spec)$
/x&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;What you see above is a regular expression for matching email addresses as per &lt;a href='http://tools.ietf.org/html/rfc5322'&gt;RFC 5322&lt;/a&gt;. It was constructed simply by transforming the BNF rules from the RFC into a notation that PCRE understands.&lt;/p&gt;

&lt;p&gt;The syntax is quite simple:&lt;/p&gt;

&lt;p&gt;All rule definitions are wrapped in a &lt;code&gt;DEFINE&lt;/code&gt; group, which basically means that those rules should not be matched against directly; they should just be defined. Only the &lt;code&gt;^(?&amp;amp;addr_spec)$&lt;/code&gt; part at the end specifies what should be matched.&lt;/p&gt;

&lt;p&gt;The rule definitions are actually not really &amp;#8220;rules&amp;#8221; but rather named subpatterns. In the previous &lt;code&gt;(a(?1)?b)&lt;/code&gt; example the &lt;code&gt;1&lt;/code&gt; referenced the first subpattern. With many subpatterns this obviously is impractical, thus they can be named. So &lt;code&gt;(?&amp;lt;xyz&amp;gt; ...)&lt;/code&gt; defines a pattern with name &lt;code&gt;xyz&lt;/code&gt;. &lt;code&gt;(?&amp;amp;xyz)&lt;/code&gt; then references it.&lt;/p&gt;

&lt;p&gt;Also, pay attention to another fact: The regular expression above uses the &lt;code&gt;x&lt;/code&gt; modifier. This instructs the engine to ignore whitespace and to allow &lt;code&gt;#&lt;/code&gt;-style comments. This way you can nicely format the regex, so that other people can actually understand it. (Much unlike &lt;a href='http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html'&gt;this&lt;/a&gt; RFC 822 email address regex&amp;#8230;)&lt;/p&gt;

&lt;p&gt;The above syntax thus allows simple mappings from grammars to regular expressions:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;A -&amp;gt; B C
A -&amp;gt; C D
// becomes
(?&amp;lt;A&amp;gt; (?&amp;amp;B) (?&amp;amp;C)
    | (?&amp;amp;C) (?&amp;amp;D)
)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The only catch is: Regular expressions don&amp;#8217;t support left recursion. E.g. taking the above definition of a parameter list:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;non_empty_parameter_list -&amp;gt; parameter
non_empty_parameter_list -&amp;gt; non_empty_parameter_list &amp;#39;,&amp;#39; parameter&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You &lt;em&gt;can&amp;#8217;t&lt;/em&gt; directly convert it into a grammar based regex. The following will not work:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;(?&amp;lt;non_empty_parameter_list&amp;gt;
    (?&amp;amp;parameter)
  | (?&amp;amp;non_empty_parameter_list) , (?&amp;amp;parameter)
)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The reason is that here &lt;code&gt;non_empty_parameter_list&lt;/code&gt; appears as the leftmost part of its own rule definition, so the engine would have to recurse into the same subpattern before consuming any input. This is called left-recursion and is very common in grammar definitions. The reason is that the LALR(1) parsers which are usually used to parse them handle left-recursion much better than right-recursion.&lt;/p&gt;

&lt;p&gt;But, no fear, this does not affect the power of regular expressions at all. Every left-recursive grammar can be transformed to a right-recursive one. In the above example it&amp;#8217;s as simple as swapping the two parts:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;non_empty_parameter_list -&amp;gt; parameter
non_empty_parameter_list -&amp;gt; parameter &amp;#39;,&amp;#39; non_empty_parameter_list&lt;/code&gt;&lt;/pre&gt;
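
&lt;p&gt;As a rough sketch, the right-recursive rule could be tried out in PHP like this. The &lt;code&gt;parameter&lt;/code&gt; rule is a toy assumption (a lowercase word), not part of any real grammar. I also put the longer alternative first, on the assumption that some PCRE versions will not backtrack into a subpattern call once it has matched.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;?php
// Toy grammar (an assumption for this sketch): a parameter is a lowercase word.
$regex = &amp;#39;/
    (?(DEFINE)
        (?&amp;lt;parameter&amp;gt; [a-z]+ )
        (?&amp;lt;non_empty_parameter_list&amp;gt;
            (?&amp;amp;parameter) , (?&amp;amp;non_empty_parameter_list)
          | (?&amp;amp;parameter)
        )
    )
    ^(?&amp;amp;non_empty_parameter_list)$
/x&amp;#39;;

var_dump(preg_match($regex, &amp;#39;a,b,c&amp;#39;)); // int(1)
var_dump(preg_match($regex, &amp;#39;a,b,&amp;#39;));  // int(0), trailing comma
var_dump(preg_match($regex, &amp;#39;&amp;#39;));      // int(0), list must not be empty&lt;/code&gt;&lt;/pre&gt;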

&lt;p&gt;So now it should be clear that regular expressions can match any context-free language (and thus pretty much all languages which programmers are confronted with). Only problem is: Even though regular expressions can &lt;em&gt;match&lt;/em&gt; context-free languages nicely, they can&amp;#8217;t usually &lt;em&gt;parse&lt;/em&gt; them. Parsing means converting some string into an abstract syntax tree. This is not possible using regular expressions, at least not with PCRE (sure, in Perl where you can embed arbitrary code into a regex you can do pretty much everything&amp;#8230;).&lt;/p&gt;

&lt;p&gt;Still, the above &lt;code&gt;DEFINE&lt;/code&gt; based regex definition has proven to be &lt;em&gt;very&lt;/em&gt; useful to me. Usually you don&amp;#8217;t need full parsing support, but want to just match (e.g. email addresses) or extract small pieces of data (not the whole parse tree). Most complex string processing problems can be made much simpler using grammar based regexes :)&lt;/p&gt;

&lt;p&gt;At this point, let me point out again what I already quickly mentioned earlier: Well-formed HTML is context-free. So you &lt;em&gt;can&lt;/em&gt; match it using regular expressions, contrary to popular opinion. But don&amp;#8217;t forget two things: Firstly, most HTML you see in the wild is &lt;em&gt;not&lt;/em&gt; well-formed (usually not even close to it). And secondly, just because you &lt;em&gt;can&lt;/em&gt;, doesn&amp;#8217;t mean that you &lt;em&gt;should&lt;/em&gt;. You could write your software in Brainfuck, still for some reason you don&amp;#8217;t.&lt;/p&gt;

&lt;p&gt;My opinion on the topic is: Whenever you need generic HTML processing, use a DOM library of your choice. It&amp;#8217;ll gracefully handle malformed HTML and take the burden of parsing from you. On the other hand if you are dealing with specific situations a quick regular expression is often the way to go. And I have to admit: Even though I often tell people to not parse HTML with regular expressions I do it myself notoriously often. Simply because in most cases I deal with specific and contained situations in which using regex is just simpler.&lt;/p&gt;

&lt;h2 id='contextsensitive_grammars'&gt;Context-sensitive grammars&lt;/h2&gt;

&lt;p&gt;Now that we covered context-free languages extensively, let&amp;#8217;s move up one step in the Chomsky hierarchy: Context-sensitive languages.&lt;/p&gt;

&lt;p&gt;In a context-sensitive language all production rules have the following form:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;αAβ → αγβ&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This mix of characters might start to look more complicated, but it is actually quite simple. At its core you still have the pattern &lt;code&gt;A → γ&lt;/code&gt;, which was how we defined context-free grammars. The new thing now is that you additionally have &lt;code&gt;α&lt;/code&gt; and &lt;code&gt;β&lt;/code&gt; on both sides. Those two form the &lt;em&gt;context&lt;/em&gt; (which also gives this grammar class the name). So basically &lt;code&gt;A&lt;/code&gt; can now only be replaced with &lt;code&gt;γ&lt;/code&gt; if it has &lt;code&gt;α&lt;/code&gt; to its left and &lt;code&gt;β&lt;/code&gt; to its right.&lt;/p&gt;

&lt;p&gt;To make this more clear, try to interpret the following rules:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;a b A -&amp;gt; a b c
a B c -&amp;gt; a Q H c
H B -&amp;gt; H C&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The English translations would be:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Replace `A` with `c`, but only if it has `a b` to its left.
Replace `B` with `Q H`, but only if it has `a` to its left and `c` to its right.
Replace `B` with `C`, but only if it has `H` to its left.&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Context-sensitive languages are something that you will rarely encounter during &amp;#8220;normal&amp;#8221; programming. They are mostly important in the context of natural language processing (natural languages are clearly not context-free, as words have different meanings depending on context). But even in natural language processing people usually work with so-called &amp;#8220;mildly context-sensitive languages&amp;#8221;, as they are sufficient for modeling the language but can be parsed much faster.&lt;/p&gt;

&lt;p&gt;To understand just how powerful context-sensitive grammars are let&amp;#8217;s look at another grammar class, which has the exact same expressive power as the context-sensitive ones: Non-contracting grammars.&lt;/p&gt;

&lt;p&gt;With non-contracting grammars every production rule has the form &lt;code&gt;α -&amp;gt; β&lt;/code&gt; where both &lt;code&gt;α&lt;/code&gt; and &lt;code&gt;β&lt;/code&gt; are arbitrary symbol strings with just one restriction: The number of symbols on the right hand side is not less than on the left hand side. Formally this is expressed in the formula &lt;code&gt;|α| &amp;lt;= |β|&lt;/code&gt; where &lt;code&gt;|x|&lt;/code&gt; denotes the length of the symbol string.&lt;/p&gt;

&lt;p&gt;So non-contracting grammars allow rules of any form as long as they don&amp;#8217;t shorten the input. E.g. &lt;code&gt;A B C -&amp;gt; H Q&lt;/code&gt; would be an invalid rule as the left hand side has three symbols and the right hand side only two. Thus this rule would be shortening (or &amp;#8220;contracting&amp;#8221;). The reverse rule &lt;code&gt;H Q -&amp;gt; A B C&lt;/code&gt; on the other hand would be valid, as the right side has more symbols than the left, thus being lengthening.&lt;/p&gt;

&lt;p&gt;This equivalence relationship of context-sensitive grammars and non-contracting grammars should make pretty clear that you can match near-everything with a context-sensitive grammar. Just don&amp;#8217;t shorten :)&lt;/p&gt;

&lt;p&gt;To get an impression of why both grammar kinds have the same expressive power look at the following transformation example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// the non-contracting grammar
A B -&amp;gt; C D
// can be transformed to the following context-sensitive grammar
A B -&amp;gt; A X
A X -&amp;gt; Y X
Y X -&amp;gt; Y D
Y D -&amp;gt; C D&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Anyways, back to regular expressions. Can they match context-sensitive languages too?&lt;/p&gt;

&lt;p&gt;This time I can&amp;#8217;t give you a definite answer. They certainly can match &lt;em&gt;some&lt;/em&gt; context-sensitive languages, but I don&amp;#8217;t know whether they can match &lt;em&gt;all&lt;/em&gt; of them.&lt;/p&gt;

&lt;p&gt;An example of a context-sensitive language that can be easily matched using regex is a modification of the context-free language &lt;code&gt;{a^n b^n, n&amp;gt;0}&lt;/code&gt; mentioned above. When you change it into &lt;code&gt;{a^n b^n c^n, n&amp;gt;0}&lt;/code&gt;, i.e. some number of &lt;code&gt;a&lt;/code&gt;s followed by the same number of &lt;code&gt;b&lt;/code&gt;s and &lt;code&gt;c&lt;/code&gt;s, it becomes context-sensitive.&lt;/p&gt;

&lt;p&gt;The PCRE regex for this language is this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;/^
    (?=(a(?-1)?b)c)
     a+(b(?-1)?c)
$/x&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you ignore the &lt;code&gt;(?=...)&lt;/code&gt; assertion for now you&amp;#8217;re left with &lt;code&gt;a+(b(?-1)?c)&lt;/code&gt;. This checks that there is an arbitrary number of &lt;code&gt;a&lt;/code&gt;s, followed by the same number of &lt;code&gt;b&lt;/code&gt;s and &lt;code&gt;c&lt;/code&gt;s. The &lt;code&gt;(?-1)&lt;/code&gt; is a relative subpattern reference and means &amp;#8220;the last defined subpattern&amp;#8221;, which is &lt;code&gt;(b(?-1)?c)&lt;/code&gt; in this case.&lt;/p&gt;

&lt;p&gt;The new thing now is the &lt;code&gt;(?=...)&lt;/code&gt; which is a so-called zero-width lookahead assertion. It checks that the following text matches the pattern, but it does not actually consume the text. Thus the text is basically checked against both patterns at the same time. The &lt;code&gt;a+(b(?-1)?c)&lt;/code&gt; part verifies that the number of &lt;code&gt;b&lt;/code&gt;s and &lt;code&gt;c&lt;/code&gt;s is the same and the &lt;code&gt;(a(?-1)?b)c&lt;/code&gt; part checks that the number of &lt;code&gt;a&lt;/code&gt;s and &lt;code&gt;b&lt;/code&gt;s is the same. Both patterns together thus ensure that the number of all three characters is the same.&lt;/p&gt;
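
&lt;p&gt;To see the two checks working together, the regex can be run against a few strings. This is only a quick sketch; the sample strings are picked for illustration.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;?php
$regex = &amp;#39;/^
    (?=(a(?-1)?b)c)
     a+(b(?-1)?c)
$/x&amp;#39;;

// The sample strings are chosen for this sketch only.
foreach (array(&amp;#39;abc&amp;#39;, &amp;#39;aabbcc&amp;#39;, &amp;#39;aabbc&amp;#39;, &amp;#39;aabcc&amp;#39;, &amp;#39;abcabc&amp;#39;) as $string) {
    echo $string, &amp;#39;: &amp;#39;, preg_match($regex, $string) ? &amp;#39;match&amp;#39; : &amp;#39;no match&amp;#39;, &amp;quot;\n&amp;quot;;
}
// abc: match
// aabbcc: match
// aabbc: no match  (lookahead passes, but b and c counts differ)
// aabcc: no match  (lookahead fails, a and b counts differ)
// abcabc: no match&lt;/code&gt;&lt;/pre&gt;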

&lt;p&gt;In the above regex you can already see how the concept of &amp;#8220;context&amp;#8221; is realized in regular expressions: Using assertions. If we get back to the definition of a context-sensitive grammar, you could now say that a production rule of type&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;αAβ → αγβ&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;can be converted into the following regex &lt;code&gt;DEFINE&lt;/code&gt; rule:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;(?&amp;lt;A&amp;gt; (?&amp;lt;= α ) γ (?= β ) )&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This would then say that &lt;code&gt;A&lt;/code&gt; is &lt;code&gt;γ&lt;/code&gt;, but only if it has &lt;code&gt;α&lt;/code&gt; to its left and &lt;code&gt;β&lt;/code&gt; to its right.&lt;/p&gt;

&lt;p&gt;Now, the above might look as if you can easily convert a context-sensitive grammar into a regular expression, but that&amp;#8217;s not actually true. The reason is that lookbehind assertions (&lt;code&gt;(?&amp;lt;= ... )&lt;/code&gt;) have one very significant limitation: They have to be fixed-width. This means that the length of the text matched by the assertion has to be known in advance. E.g. you can write &lt;code&gt;(?&amp;lt;= a(bc|cd) )&lt;/code&gt;, but you can&amp;#8217;t write &lt;code&gt;(?&amp;lt;= ab+)&lt;/code&gt;. In the first case the assertion matches exactly three characters in any case, thus being fixed-width. In the second case on the other hand the assertion could match &lt;code&gt;ab&lt;/code&gt;, &lt;code&gt;abb&lt;/code&gt;, &lt;code&gt;abbb&lt;/code&gt; etc. All of those have different lengths. Thus the engine can&amp;#8217;t know where it should start to match them and as such they are simply disallowed.&lt;/p&gt;
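
&lt;p&gt;A small sketch of how this limitation shows up in PHP. The trailing &lt;code&gt;x&lt;/code&gt; and the test strings are made up for the example, and the exact warning text depends on the PCRE version, so I only look at the return value here.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;?php
// Fixed-width: both alternatives are exactly three characters long.
var_dump(preg_match(&amp;#39;/(?&amp;lt;=a(?:bc|cd))x/&amp;#39;, &amp;#39;abcx&amp;#39;)); // int(1)

// Unbounded width: the pattern should fail to compile, so preg_match()
// emits a warning and returns false. The @ only hides the warning here.
var_dump(@preg_match(&amp;#39;/(?&amp;lt;=ab+)x/&amp;#39;, &amp;#39;abbx&amp;#39;)); // bool(false)&lt;/code&gt;&lt;/pre&gt;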

&lt;p&gt;This pretty much blows the easy conversion of context-sensitive grammars to regex. Pretty much all such grammars require variable-width lookbehind assertions.&lt;/p&gt;

&lt;p&gt;But the fact that there is no direct context-sensitive grammar to regex conversion doesn&amp;#8217;t by itself mean that regular expressions can&amp;#8217;t match all of them. E.g. the above &lt;code&gt;{a^n b^n c^n, n&amp;gt;0}&lt;/code&gt; language also has a grammar that would require variable-width lookbehind assertions. But we can still avoid using them as regex isn&amp;#8217;t bound to specifying rules in a grammar. Maybe the same is possible for all other context-sensitive grammars too. I honestly don&amp;#8217;t know.&lt;/p&gt;

&lt;p&gt;So, what can we say here? Regex can match at least &lt;em&gt;some&lt;/em&gt; context-sensitive languages, but it&amp;#8217;s unknown whether it can match &lt;em&gt;all&lt;/em&gt; of them.&lt;/p&gt;

&lt;h2 id='unrestricted_grammars'&gt;Unrestricted grammars&lt;/h2&gt;

&lt;p&gt;The next grammar class in the Chomsky hierarchy is that of the unrestricted grammars. The language set which one can form using them is the set of all recursively enumerable languages.&lt;/p&gt;

&lt;p&gt;There is little to say about unrestricted grammars as they are, well, unrestricted. Production rules for unrestricted grammars have the form &lt;code&gt;α -&amp;gt; β&lt;/code&gt;, where &lt;code&gt;α&lt;/code&gt; and &lt;code&gt;β&lt;/code&gt; are symbol strings with no restrictions whatsoever.&lt;/p&gt;

&lt;p&gt;So basically unrestricted grammars remove the &amp;#8220;non-contracting&amp;#8221; part of the non-contracting grammars. Thus for them &lt;code&gt;A B C -&amp;gt; H Q&lt;/code&gt; would be a valid rule, even though previously it wasn&amp;#8217;t.&lt;/p&gt;

&lt;p&gt;How powerful are unrestricted grammars exactly? They are as powerful as it gets: They are Turing-complete. There even is a &amp;#8220;programming language&amp;#8221; which is based on unrestricted grammars: &lt;a href='http://en.wikipedia.org/wiki/Thue_%28programming_language%29'&gt;Thue&lt;/a&gt;. As it is Turing-complete it can do everything that other languages can do.&lt;/p&gt;

&lt;p&gt;One implication of being Turing-complete is that checking whether a certain string adheres to some grammar is undecidable for the general case.&lt;/p&gt;

&lt;p&gt;Sadly I can&amp;#8217;t say anything whatsoever about how regular expressions and unrestricted grammars relate. Heck, I couldn&amp;#8217;t even find an example of a meaningful unrestricted grammar (that wasn&amp;#8217;t non-contracting).&lt;/p&gt;

&lt;p&gt;But now that we started talking about Turing-completeness we get to another point:&lt;/p&gt;

&lt;h2 id='regular_expressions_with_backreferences_are_npcomplete'&gt;Regular expressions with backreferences are NP-complete&lt;/h2&gt;

&lt;p&gt;There is another very powerful regular expression feature that I did not mention previously: backreferences.&lt;/p&gt;

&lt;p&gt;E.g. consider this very simple regex:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;/^(.+)\1$/&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;code&gt;(.+)&lt;/code&gt; matches some arbitrary text and &lt;code&gt;\1&lt;/code&gt; matches the &lt;em&gt;same&lt;/em&gt; text. In general &lt;code&gt;\n&lt;/code&gt; means &amp;#8220;whatever the &lt;code&gt;n&lt;/code&gt;th subpattern matched&amp;#8221;. E.g. if &lt;code&gt;(.+)&lt;/code&gt; matched &lt;code&gt;foo&lt;/code&gt;, then &lt;code&gt;\1&lt;/code&gt; will also match only &lt;code&gt;foo&lt;/code&gt; and nothing else. Thus the expression &lt;code&gt;(.+)\1&lt;/code&gt; means &amp;#8220;Some text followed by a copy of itself&amp;#8221;.&lt;/p&gt;
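
&lt;p&gt;A quick check of this behavior in PHP (the test strings are arbitrary examples):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;?php
$regex = &amp;#39;/^(.+)\1$/&amp;#39;;

var_dump(preg_match($regex, &amp;#39;foofoo&amp;#39;)); // int(1), &amp;quot;foo&amp;quot; followed by &amp;quot;foo&amp;quot;
var_dump(preg_match($regex, &amp;#39;foobar&amp;#39;)); // int(0)
var_dump(preg_match($regex, &amp;#39;aaa&amp;#39;));    // int(0), odd length cannot be two copies&lt;/code&gt;&lt;/pre&gt;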

&lt;p&gt;What this simple regex matches is called the &amp;#8220;copy language&amp;#8221; and is another typical example of a context-sensitive language.&lt;/p&gt;

&lt;p&gt;Similarly you can match the other example grammars from above using backreferences:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# {a^n b^n, n&amp;gt;0} (context-free)
/^ (?: a (?= a* (\1?+ b) ) )+ \1 $/x
# {a^n b^n c^n, n&amp;gt;0} (context-sensitive)
/^ (?: a (?= a* (\1?+ b) b* (\2?+ c) ) )+ \1 \2 $/x&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Explaining how these work is outside the scope of this article, but you can read an excellent explanation on &lt;a href='http://stackoverflow.com/a/3644267/385378'&gt;StackOverflow&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As you can see, the mere addition of backreferences (without subpattern recursion support) already adds a lot of power to regular expressions. The addition is actually so powerful that it makes matching of regular expressions an NP-complete problem.&lt;/p&gt;

&lt;p&gt;What does NP-complete mean? NP-complete is a computational complexity class for decision problems in which many &amp;#8220;hard&amp;#8221; problems fall. Some examples of NP-complete problems are the traveling salesman problem (TSP), the boolean satisfiability problem (SAT) and the knapsack problem (BKP).&lt;/p&gt;

&lt;p&gt;One of the main conditions for a problem being NP-complete is that every other NP problem is reducible to it. Thus all NP-complete problems are basically interchangeable. If you find a fast solution to one of them, you got a fast solution to all of them.&lt;/p&gt;

&lt;p&gt;So if somebody found a fast solution to an NP-complete problem, pretty much all of the computationally hard problems of humanity would be solved in one stroke. This would mean the end of civilisation as we know it.&lt;/p&gt;

&lt;p&gt;To prove that regular expressions with backreferences are indeed NP-complete one can simply take one of the known NP-complete problems and prove that it can be solved using regular expressions. As an example I choose the 3-CNF SAT problem:&lt;/p&gt;

&lt;p&gt;3-CNF SAT stands for &amp;#8220;3-conjunctive normal form boolean satisfiability problem&amp;#8221; and is quite easy to understand. You get a boolean formula of the following form:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;   (!$a ||  $b ||  $d)
&amp;amp;&amp;amp; ( $a || !$c ||  $d)
&amp;amp;&amp;amp; ( $a || !$b || !$d)
&amp;amp;&amp;amp; ( $b || !$c || !$d)
&amp;amp;&amp;amp; (!$a ||  $c || !$d)
&amp;amp;&amp;amp; ( $a ||  $b ||  $c)
&amp;amp;&amp;amp; (!$a || !$b || !$c)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Thus the boolean formula is made up of a number of clauses separated by ANDs. Each of those clauses consists of three variables (or their negations) separated by ORs. The 3-CNF SAT problem now asks whether there exists a solution to the given boolean formula (such that it is true).&lt;/p&gt;

&lt;p&gt;The above boolean formula can be converted to the following regular expression:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='nv'&gt;$regex&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;/^&lt;/span&gt;
&lt;span class='s1'&gt;    (x?)(x?)(x?)(x?) .* ;&lt;/span&gt;
&lt;span class='s1'&gt;    (?: x\1 | \2  | \4  ),&lt;/span&gt;
&lt;span class='s1'&gt;    (?: \1  | x\3 | \4  ),&lt;/span&gt;
&lt;span class='s1'&gt;    (?: \1  | x\2 | x\4 ),&lt;/span&gt;
&lt;span class='s1'&gt;    (?: \2  | x\3 | x\4 ),&lt;/span&gt;
&lt;span class='s1'&gt;    (?: x\1 | \3  | x\4 ),&lt;/span&gt;
&lt;span class='s1'&gt;    (?: \1  | \2  | \3  ),&lt;/span&gt;
&lt;span class='s1'&gt;    (?: x\1 | x\2 | x\3 ),&lt;/span&gt;
&lt;span class='s1'&gt;$/x&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='nv'&gt;$string&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;xxxx;x,x,x,x,x,x,x,&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='nb'&gt;var_dump&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;preg_match&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$regex&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$string&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$matches&lt;/span&gt;&lt;span class='p'&gt;));&lt;/span&gt;
&lt;span class='nb'&gt;var_dump&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$matches&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;If you run this code you&amp;#8217;ll get the following &lt;code&gt;$matches&lt;/code&gt; result:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;array(5) {
  [0]=&amp;gt; string(19) &amp;quot;xxxx;x,x,x,x,x,x,x,&amp;quot;
  [1]=&amp;gt; string(1) &amp;quot;x&amp;quot;
  [2]=&amp;gt; string(1) &amp;quot;x&amp;quot;
  [3]=&amp;gt; string(0) &amp;quot;&amp;quot;
  [4]=&amp;gt; string(0) &amp;quot;&amp;quot;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This means that the above formula is satisfied if &lt;code&gt;$a = true&lt;/code&gt;, &lt;code&gt;$b = true&lt;/code&gt;, &lt;code&gt;$c = false&lt;/code&gt; and &lt;code&gt;$d = false&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The regular expression works with a very simple trick: For every 3-clause the string contains an &lt;code&gt;x,&lt;/code&gt; which has to be matched. So if you have something like &lt;code&gt;(?: \1 | x\3 | \4 ),&lt;/code&gt; in the regex, then the string can only be matched if either &lt;code&gt;\1&lt;/code&gt; is &lt;code&gt;x&lt;/code&gt; (true), &lt;code&gt;\3&lt;/code&gt; is the empty string (false) or &lt;code&gt;\4&lt;/code&gt; is &lt;code&gt;x&lt;/code&gt; (true).&lt;/p&gt;

&lt;p&gt;The rest is left up to the engine. It&amp;#8217;ll try out different ways of matching the string until it either finds a solution or has to give up.&lt;/p&gt;
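
&lt;p&gt;The conversion is mechanical enough that it can be automated. The following is only a sketch: the helper names &lt;code&gt;buildSatRegex&lt;/code&gt; and &lt;code&gt;buildSatString&lt;/code&gt; and the integer encoding of literals (&lt;code&gt;+n&lt;/code&gt; for variable &lt;code&gt;n&lt;/code&gt;, &lt;code&gt;-n&lt;/code&gt; for its negation) are my own assumptions for this example.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;?php
// Hypothetical helpers: a literal is an int, +n for variable n, -n for its negation.
function buildSatRegex(array $clauses, $numVars) {
    // one optional &amp;quot;x&amp;quot; per variable: &amp;quot;x&amp;quot; means true, empty means false
    $regex = &amp;#39;/^&amp;#39; . str_repeat(&amp;#39;(x?)&amp;#39;, $numVars) . &amp;#39;.*;&amp;#39;;
    foreach ($clauses as $clause) {
        $alternatives = array();
        foreach ($clause as $literal) {
            // a plain backreference fits &amp;quot;x,&amp;quot; only if the group is &amp;quot;x&amp;quot;;
            // with an &amp;quot;x&amp;quot; prefix it fits only if the group is empty
            $alternatives[] = ($literal &amp;lt; 0 ? &amp;#39;x&amp;#39; : &amp;#39;&amp;#39;) . &amp;#39;\\&amp;#39; . abs($literal);
        }
        $regex .= &amp;#39;(?:&amp;#39; . implode(&amp;#39;|&amp;#39;, $alternatives) . &amp;#39;),&amp;#39;;
    }
    return $regex . &amp;#39;$/&amp;#39;;
}

function buildSatString(array $clauses, $numVars) {
    return str_repeat(&amp;#39;x&amp;#39;, $numVars) . &amp;#39;;&amp;#39; . str_repeat(&amp;#39;x,&amp;#39;, count($clauses));
}

// The formula from above, with $a..$d numbered 1..4.
$clauses = array(
    array(-1, 2, 4), array(1, -3, 4), array(1, -2, -4), array(2, -3, -4),
    array(-1, 3, -4), array(1, 2, 3), array(-1, -2, -3),
);

preg_match(buildSatRegex($clauses, 4), buildSatString($clauses, 4), $matches);
var_dump($matches); // groups 1 and 2 are &amp;quot;x&amp;quot;, groups 3 and 4 are empty&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For an unsatisfiable formula &lt;code&gt;preg_match&lt;/code&gt; would simply return &lt;code&gt;0&lt;/code&gt;, after the engine has tried every assignment.&lt;/p&gt;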

&lt;h2 id='wrapping_up'&gt;Wrapping up&lt;/h2&gt;

&lt;p&gt;As the article was quite long, here is a summary of the main points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &amp;#8220;regular expressions&amp;#8221; used by programmers have very little in common with the original notion of regularity in the context of formal language theory.&lt;/li&gt;

&lt;li&gt;Regular expressions (at least PCRE) can match all context-free languages. As such they can also match well-formed HTML and pretty much all other programming languages.&lt;/li&gt;

&lt;li&gt;Regular expressions can match at least some context-sensitive languages.&lt;/li&gt;

&lt;li&gt;Matching of regular expressions with backreferences is NP-complete. As such you can solve any other NP problem using regular expressions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But don&amp;#8217;t forget: Just because you &lt;em&gt;can&lt;/em&gt;, doesn&amp;#8217;t mean that you &lt;em&gt;should&lt;/em&gt;. Processing HTML with regular expressions is a really bad idea in some cases. In other cases it&amp;#8217;s probably the best thing to do.&lt;/p&gt;

&lt;p&gt;Just check what the easiest solution to your particular problem is and use it. If you choose to solve a problem using regular expressions, don&amp;#8217;t forget about the &lt;code&gt;x&lt;/code&gt; modifier, which allows you to nicely format your regex. For complex regular expressions also don&amp;#8217;t forget to make use of &lt;code&gt;DEFINE&lt;/code&gt; assertions and named subpatterns to keep your code clean and readable.&lt;/p&gt;

&lt;p&gt;That&amp;#8217;s it.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Understanding PHP's internal array implementation (PHP's Source Code for PHP Developers - Part 4)</title>
      <link>http://nikic.github.com/2012/03/28/Understanding-PHPs-internal-array-implementation.html</link>
      <pubDate>Wed, 28 Mar 2012 00:00:00 -0700</pubDate>
      <author>nikic@php.net (NikiC)</author>
      <guid>http://nikic.github.com/2012/03/28/Understanding-PHPs-internal-array-implementation</guid>
      <description>&lt;p&gt;Welcome back to the fourth part of the &amp;#8220;PHP&amp;#8217;s Source Code for PHP Developers&amp;#8221; series, in which we&amp;#8217;ll cover how PHP arrays are internally represented and used throughout the code base.&lt;/p&gt;

&lt;p&gt;In case you missed them, here are the previous parts of this series:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href='http://blog.ircmaxell.com/2012/03/phps-source-code-for-php-developers.html'&gt;Part 1: Structure of the source code and introduction to C&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href='http://nikic.github.com/2012/03/16/Understanding-PHPs-internal-function-definitions.html'&gt;Part 2: Finding and understanding PHP&amp;#8217;s internal function definitions&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href='http://blog.ircmaxell.com/2012/03/phps-source-code-for-php-developers_21.html'&gt;Part 3: PHP&amp;#8217;s internal value representation: The zval&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id='everything_is_a_hash_table'&gt;Everything is a hash table!&lt;/h2&gt;

&lt;p&gt;Basically, everything in PHP is a hash table. Not only are hash tables used in the underlying implementation of PHP arrays, they are also used to store object properties and methods, functions, variables and pretty much everything else.&lt;/p&gt;

&lt;p&gt;And because the hash table is so fundamental to PHP, it is worth having a deeper look into how it works.&lt;/p&gt;

&lt;h2 id='so_what_is_a_hash_table'&gt;So, what is a hash table?&lt;/h2&gt;

&lt;p&gt;Remember that in C arrays are basically chunks of memory, which you can access by index. Thus arrays in C only have integer keys and have to be contiguous (i.e. you can&amp;#8217;t have a key &lt;code&gt;0&lt;/code&gt; and the next key is &lt;code&gt;1332423442&lt;/code&gt;). There is no such thing as an associative array.&lt;/p&gt;

&lt;p&gt;And this is where hash tables come in: They convert string keys into normal integer keys using a hash function. The result can then be used as an index into a normal C array (aka chunk of memory). The problem here obviously is that the hash function can have collisions, i.e. multiple string keys can yield the same hash. For example in a PHP array with up to 64 elements the strings &lt;code&gt;&amp;quot;foo&amp;quot;&lt;/code&gt; and &lt;code&gt;&amp;quot;oof&amp;quot;&lt;/code&gt; would map to the same index.&lt;/p&gt;

&lt;p&gt;This problem is solved by not storing the value directly at the generated index, but storing a linked list of &lt;em&gt;possible&lt;/em&gt; values instead.&lt;/p&gt;

&lt;h2 id='hashtable_and_bucket'&gt;HashTable and Bucket&lt;/h2&gt;

&lt;p&gt;So, after the basic concept of hash tables is clear, let&amp;#8217;s have a look at the structures actually used in PHP&amp;#8217;s hash table implementation:&lt;/p&gt;

&lt;p&gt;The first one is the &lt;a href='http://lxr.php.net/xref/PHP_5_4/Zend/zend_hash.h#66'&gt;&lt;code&gt;HashTable&lt;/code&gt;&lt;/a&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;typedef struct _hashtable {
    uint nTableSize;
    uint nTableMask;
    uint nNumOfElements;
    ulong nNextFreeElement;
    Bucket *pInternalPointer;
    Bucket *pListHead;
    Bucket *pListTail;
    Bucket **arBuckets;
    dtor_func_t pDestructor;
    zend_bool persistent;
    unsigned char nApplyCount;
    zend_bool bApplyProtection;
#if ZEND_DEBUG
    int inconsistent;
#endif
} HashTable;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Let&amp;#8217;s quickly go through it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;nNumOfElements&lt;/code&gt; specifies how many values are currently stored in the array. This is also the number that &lt;code&gt;count($array)&lt;/code&gt; returns.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;&lt;code&gt;nTableSize&lt;/code&gt; specifies the size of the internal C array. It is always the next power of 2 greater than or equal to &lt;code&gt;nNumOfElements&lt;/code&gt;. E.g. if an array stores 32 elements, the internal C array also has a size of 32. But if one more element is added, i.e. the array then contains 33 elements, the internal C array is resized to 64 elements.&lt;/p&gt;

&lt;p&gt;This is done to always keep the hash table efficient in space and time. It is clear that if the internal array is too small there will be many collisions and the performance will degrade. If the internal array is too big on the other hand, we&amp;#8217;d be wasting memory. The power-of-2 size is a good compromise.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;&lt;code&gt;nTableMask&lt;/code&gt; is the table size minus one. This mask is used to adjust the generated hashes for the current table size. For example the actual hash for &lt;code&gt;&amp;quot;foo&amp;quot;&lt;/code&gt; (through the &lt;a href='http://lxr.php.net/opengrok/xref/PHP_5_4/Zend/zend_hash.h#261'&gt;DJBX33A hashing function&lt;/a&gt;) is 193491849. If we currently have a table size of 64, we obviously can&amp;#8217;t use that as an index into the array. Instead we only take the lower bits of the hash by applying the table mask:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;   hash  |   193491849 |   0b1011100010000111001110001001
 &amp;amp; mask  | &amp;amp;        63 | &amp;amp; 0b0000000000000000000000111111
---------------------------------------------------------
 = index | =         9 | = 0b0000000000000000000000001001&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;&lt;code&gt;nNextFreeElement&lt;/code&gt; is the next free integer key, which is used when you append to an array using &lt;code&gt;$array[] = xyz&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;&lt;code&gt;pInternalPointer&lt;/code&gt; stores the current position in the array. This is used for &lt;code&gt;foreach&lt;/code&gt; iteration and can be accessed using the &lt;code&gt;reset()&lt;/code&gt;, &lt;code&gt;current()&lt;/code&gt;, &lt;code&gt;key()&lt;/code&gt;, &lt;code&gt;next()&lt;/code&gt;, &lt;code&gt;prev()&lt;/code&gt; and &lt;code&gt;end()&lt;/code&gt; functions.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;&lt;code&gt;pListHead&lt;/code&gt; and &lt;code&gt;pListTail&lt;/code&gt; specify the first and last element of the array. Remember: PHP arrays have an order. E.g. &lt;code&gt;[&amp;#39;foo&amp;#39; =&amp;gt; &amp;#39;bar&amp;#39;, &amp;#39;bar&amp;#39; =&amp;gt; &amp;#39;foo&amp;#39;]&lt;/code&gt; and &lt;code&gt;[&amp;#39;bar&amp;#39; =&amp;gt; &amp;#39;foo&amp;#39;, &amp;#39;foo&amp;#39; =&amp;gt; &amp;#39;bar&amp;#39;]&lt;/code&gt; contain the same elements, but have a different order.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;&lt;code&gt;arBuckets&lt;/code&gt; is the &amp;#8220;internal C array&amp;#8221; we were always talking about. It&amp;#8217;s defined as a &lt;code&gt;Bucket **&lt;/code&gt;, so it can be seen as an array of bucket pointers (we&amp;#8217;ll get to what exactly a Bucket is in a minute).&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;&lt;code&gt;pDestructor&lt;/code&gt; is the destructor for the values. If a value is removed from the HT this function will be called on it. For a normal array the destructor function is &lt;code&gt;zval_ptr_dtor&lt;/code&gt;. &lt;code&gt;zval_ptr_dtor&lt;/code&gt; will reduce the reference count of the &lt;code&gt;zval&lt;/code&gt; and, if it reaches 0, destroy and free it.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The last four properties aren&amp;#8217;t really of interest to us. So let&amp;#8217;s just say that &lt;code&gt;persistent&lt;/code&gt; specifies that the hash table can live between multiple requests, &lt;code&gt;nApplyCount&lt;/code&gt; and &lt;code&gt;bApplyProtection&lt;/code&gt; are used to prevent infinite recursion in some places and &lt;code&gt;inconsistent&lt;/code&gt; is used to catch incorrect uses of hash tables in debug mode.&lt;/p&gt;
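
&lt;p&gt;The table size and masking steps described above can be reproduced in userland. This is a rough sketch of the &amp;#8220;times 33&amp;#8221; idea behind DJBX33A, not the engine&amp;#8217;s C code. The function names are made up, it assumes a 64-bit PHP build, and cutting the hash to 32 bits is a simplification of mine.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;?php
// Sketch only: hash = hash * 33 + byte, starting at 5381.
function djbx33a($key) {
    $hash = 5381;
    for ($i = 0, $length = strlen($key); $i &amp;lt; $length; $i++) {
        // keep the value within 32 bits so it never overflows a 64-bit int
        $hash = ($hash * 33 + ord($key[$i])) &amp;amp; 0xFFFFFFFF;
    }
    return $hash;
}

// Smallest power of two that is &amp;gt;= $numElements (ignores any engine minimum).
function tableSizeFor($numElements) {
    $size = 1;
    while ($size &amp;lt; $numElements) {
        $size *= 2;
    }
    return $size;
}

$mask = tableSizeFor(33) - 1;          // 64 - 1 = 63
var_dump(djbx33a(&amp;#39;foo&amp;#39;));         // int(193491849)
var_dump(djbx33a(&amp;#39;foo&amp;#39;) &amp;amp; $mask); // int(9)
var_dump(djbx33a(&amp;#39;oof&amp;#39;) &amp;amp; $mask); // int(9), same slot as &amp;quot;foo&amp;quot;&lt;/code&gt;&lt;/pre&gt;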

&lt;p&gt;Let&amp;#8217;s move on to the second important structure: &lt;a href='http://lxr.php.net/xref/PHP_5_4/Zend/zend_hash.h#54'&gt;&lt;code&gt;Bucket&lt;/code&gt;&lt;/a&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;typedef struct bucket {
    ulong h;
    uint nKeyLength;
    void *pData;
    void *pDataPtr;
    struct bucket *pListNext;
    struct bucket *pListLast;
    struct bucket *pNext;
    struct bucket *pLast;
    const char *arKey;
} Bucket;&lt;/code&gt;&lt;/pre&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;h&lt;/code&gt; is the hash (without the table mask applied).&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;&lt;code&gt;arKey&lt;/code&gt; is used to save string keys. &lt;code&gt;nKeyLength&lt;/code&gt; is the corresponding length. For integer keys those two aren&amp;#8217;t used.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;&lt;code&gt;pData&lt;/code&gt; or &lt;code&gt;pDataPtr&lt;/code&gt; is used to store the actual value. For PHP arrays that value is a &lt;code&gt;zval&lt;/code&gt; (but it&amp;#8217;s also used for other things internally.) Don&amp;#8217;t bother with the fact that there are two properties for this. The difference between them is who is responsible for freeing the value.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;&lt;code&gt;pListNext&lt;/code&gt; and &lt;code&gt;pListLast&lt;/code&gt; specify the order of the array elements. If PHP wants to traverse the array it starts off at the &lt;code&gt;pListHead&lt;/code&gt; bucket (specified in the &lt;code&gt;HashTable&lt;/code&gt; struct) and then always takes the &lt;code&gt;pListNext&lt;/code&gt; bucket. The same works in reverse, by starting at &lt;code&gt;pListTail&lt;/code&gt; and always following &lt;code&gt;pListLast&lt;/code&gt;. (You can do this in userland by calling &lt;code&gt;end()&lt;/code&gt; and then always calling &lt;code&gt;prev()&lt;/code&gt;.)&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;&lt;code&gt;pNext&lt;/code&gt; and &lt;code&gt;pLast&lt;/code&gt; form the &amp;#8220;linked list of possible values&amp;#8221; I mentioned above. The &lt;code&gt;arBuckets&lt;/code&gt; array stores a pointer to the first possible bucket. If that bucket doesn&amp;#8217;t have the right key, PHP will look at the bucket which &lt;code&gt;pNext&lt;/code&gt; points to. This is repeated until the right bucket is found. &lt;code&gt;pLast&lt;/code&gt; can be used to do the same in reverse.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
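
&lt;p&gt;To make the collision chain a bit more tangible, here is a minimal sketch in plain C. It is not the engine&amp;#8217;s code: &lt;code&gt;toy_bucket&lt;/code&gt; and &lt;code&gt;toy_find&lt;/code&gt; are made-up names, the struct keeps only the fields needed for a string key lookup, and the caller is assumed to have computed the hash &lt;code&gt;h&lt;/code&gt; already.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#include &amp;lt;stddef.h&amp;gt;
#include &amp;lt;string.h&amp;gt;

/* Toy bucket: only the fields needed for a string key lookup. */
typedef struct toy_bucket {
    unsigned long h;           /* full hash, table mask not applied */
    unsigned int nKeyLength;   /* length of arKey */
    const char *arKey;         /* string key */
    void *pData;               /* stored value */
    struct toy_bucket *pNext;  /* next bucket in the same collision chain */
} toy_bucket;

/* Walk the collision chain of slot (h &amp;amp; mask) until the key matches. */
static void *toy_find(toy_bucket **arBuckets, unsigned long mask,
                      const char *key, unsigned int len, unsigned long h)
{
    toy_bucket *p;

    for (p = arBuckets[h &amp;amp; mask]; p != NULL; p = p-&amp;gt;pNext) {
        if (p-&amp;gt;h == h &amp;amp;&amp;amp; p-&amp;gt;nKeyLength == len
                &amp;amp;&amp;amp; memcmp(p-&amp;gt;arKey, key, len) == 0) {
            return p-&amp;gt;pData;
        }
    }
    return NULL; /* no bucket in this chain has the key */
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Comparing the stored hash and the length first is cheap and lets the loop skip most of the &lt;code&gt;memcmp&lt;/code&gt; calls for buckets that merely share a slot.&lt;/p&gt;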

&lt;p&gt;As you can see PHP&amp;#8217;s hash table implementation is fairly complex. This is the price one has to pay for its ultra-flexible array type.&lt;/p&gt;

&lt;h2 id='how_are_hash_tables_used'&gt;How are hash tables used?&lt;/h2&gt;

&lt;p&gt;The Zend Engine defines a large number of API functions for working with hash tables. An overview of low-level hash table functions can be found in &lt;a href='http://lxr.php.net/xref/PHP_5_4/Zend/zend_hash.h#98'&gt;&lt;code&gt;zend_hash.h&lt;/code&gt;&lt;/a&gt;. Additionally the ZE defines a set of slightly higher level APIs in &lt;a href='http://lxr.php.net/opengrok/xref/PHP_TRUNK/Zend/zend_API.h#353'&gt;&lt;code&gt;zend_API.h&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We don&amp;#8217;t have time to go through all of these, but we can look at a sample function to see at least some of them in action. We&amp;#8217;ll use &lt;a href='http://de.php.net/array_fill_keys'&gt;&lt;code&gt;array_fill_keys&lt;/code&gt;&lt;/a&gt; as that sample function.&lt;/p&gt;

&lt;p&gt;Using the technique outlined in the &lt;a href='http://nikic.github.com/2012/03/16/Understanding-PHPs-internal-function-definitions.html'&gt;second part&lt;/a&gt; you should be able to find the function definition in &lt;a href='http://lxr.php.net/opengrok/xref/PHP_5_4/ext/standard/array.c#1583'&gt;&lt;code&gt;ext/standard/array.c&lt;/code&gt;&lt;/a&gt; easily. Let&amp;#8217;s quickly walk through it now.&lt;/p&gt;

&lt;p&gt;As always there&amp;#8217;s a set of variable declarations and a &lt;code&gt;zend_parse_parameters&lt;/code&gt; call at the top:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;zval *keys, *val, **entry;
HashPosition pos;

if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, &amp;quot;az&amp;quot;, &amp;amp;keys, &amp;amp;val) == FAILURE) {
    return;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;code&gt;az&lt;/code&gt; obviously means that the first parameter is an &lt;strong&gt;a&lt;/strong&gt;rray (fetched into the &lt;code&gt;keys&lt;/code&gt; variable) and the second one is an arbitrary &lt;strong&gt;z&lt;/strong&gt;val (fetched into the &lt;code&gt;val&lt;/code&gt; variable).&lt;/p&gt;

&lt;p&gt;After the parameters are parsed the array to be returned is initialized:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;/* Initialize return array */
array_init_size(return_value, zend_hash_num_elements(Z_ARRVAL_P(keys)));&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This line already contains three important parts of the array API:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The &lt;code&gt;Z_ARRVAL_P&lt;/code&gt; macro fetches the hash table from a &lt;code&gt;zval&lt;/code&gt;.&lt;/li&gt;

&lt;li&gt;&lt;code&gt;zend_hash_num_elements&lt;/code&gt; fetches the number of elements in a hash table (the &lt;code&gt;nNumOfElements&lt;/code&gt; property).&lt;/li&gt;

&lt;li&gt;&lt;code&gt;array_init_size&lt;/code&gt; initializes an array with a size hint.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So this line initializes an array into &lt;code&gt;return_value&lt;/code&gt; with the same size as the &lt;code&gt;keys&lt;/code&gt; array.&lt;/p&gt;

&lt;p&gt;The size hint here is just an optimization. The function could have also called just &lt;code&gt;array_init(return_value)&lt;/code&gt;, in which case PHP would have to do multiple resizes as more and more elements are added to the array. By specifying an explicit size, PHP allocates the right amount of memory from the start.&lt;/p&gt;

&lt;p&gt;After the return array initialization the function loops through the &lt;code&gt;keys&lt;/code&gt; array using a &lt;code&gt;while&lt;/code&gt; loop with roughly this structure:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;zend_hash_internal_pointer_reset_ex(Z_ARRVAL_P(keys), &amp;amp;pos);
while (zend_hash_get_current_data_ex(Z_ARRVAL_P(keys), (void **)&amp;amp;entry, &amp;amp;pos) == SUCCESS) {
    // some code

    zend_hash_move_forward_ex(Z_ARRVAL_P(keys), &amp;amp;pos);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This can be translated to PHP code easily:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;reset($keys);
while (null !== key($keys)) {
    $entry = current($keys);
    // some code

    next($keys);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Which is the same as:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;foreach ($keys as $entry) {
    // some code
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The only real difference is that the C iteration doesn&amp;#8217;t use the internal array pointer, but uses its own &lt;code&gt;pos&lt;/code&gt; variable to store the current position.&lt;/p&gt;

&lt;p&gt;The code within the loop has two branches: One for integer keys and one for other keys. The integer key branch contains only two lines:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;zval_add_ref(&amp;amp;val);
zend_hash_index_update(Z_ARRVAL_P(return_value), Z_LVAL_PP(entry), &amp;amp;val, sizeof(zval *), NULL);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is pretty straightforward: First the refcount of the value is increased (adding the value to the hash table means adding another reference to it) and then the value is actually inserted into the hash table. The arguments of the &lt;code&gt;zend_hash_index_update&lt;/code&gt; macro are the hash table to update &lt;code&gt;Z_ARRVAL_P(return_value)&lt;/code&gt;, the integer index &lt;code&gt;Z_LVAL_PP(entry)&lt;/code&gt;, the value &lt;code&gt;&amp;amp;val&lt;/code&gt;, the size of the value &lt;code&gt;sizeof(zval *)&lt;/code&gt; and the destination pointer (which we don&amp;#8217;t care about, thus &lt;code&gt;NULL&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;The non-integer-key branch is slightly more complicated:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;zval key, *key_ptr = *entry;

if (Z_TYPE_PP(entry) != IS_STRING) {
    key = **entry;
    zval_copy_ctor(&amp;amp;key);
    convert_to_string(&amp;amp;key);
    key_ptr = &amp;amp;key;
}

zval_add_ref(&amp;amp;val);
zend_symtable_update(Z_ARRVAL_P(return_value), Z_STRVAL_P(key_ptr), Z_STRLEN_P(key_ptr) + 1, &amp;amp;val, sizeof(zval *), NULL);

if (key_ptr != *entry) {
    zval_dtor(&amp;amp;key);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;First the key is converted to a string (unless it already is one) using &lt;code&gt;convert_to_string&lt;/code&gt;. But before this can be done the &lt;code&gt;entry&lt;/code&gt; has to be copied into a new &lt;code&gt;key&lt;/code&gt; variable. The &lt;code&gt;key = **entry&lt;/code&gt; line takes care of that. Additionally &lt;code&gt;zval_copy_ctor&lt;/code&gt; has to be called, otherwise complex structures (like strings or arrays) won&amp;#8217;t be copied correctly.&lt;/p&gt;

&lt;p&gt;The copy is necessary to ensure that the cast doesn&amp;#8217;t change the original array. Without the copy the cast would modify not only our local variable, but also the element in the &lt;code&gt;keys&lt;/code&gt; array (which obviously would be quite unexpected to the user).&lt;/p&gt;

&lt;p&gt;Obviously the copy has to be destroyed again once the value has been inserted, which is what the &lt;code&gt;zval_dtor(&amp;amp;key)&lt;/code&gt; line at the end of the branch does. The difference between &lt;code&gt;zval_ptr_dtor&lt;/code&gt; and &lt;code&gt;zval_dtor&lt;/code&gt; is that &lt;code&gt;zval_ptr_dtor&lt;/code&gt; only destroys the zval if the refcount reaches 0, whereas &lt;code&gt;zval_dtor&lt;/code&gt; destroys it always, regardless of the refcount. That&amp;#8217;s why you&amp;#8217;ll find &lt;code&gt;zval_ptr_dtor&lt;/code&gt; used with &amp;#8220;normal&amp;#8221; values and &lt;code&gt;zval_dtor&lt;/code&gt; with temporary variables, which aren&amp;#8217;t used anywhere else anyway. Also &lt;code&gt;zval_ptr_dtor&lt;/code&gt; frees the zval after destroying it, while &lt;code&gt;zval_dtor&lt;/code&gt; does not. As we never &lt;code&gt;malloc()&lt;/code&gt;d anything, we also don&amp;#8217;t have to &lt;code&gt;free()&lt;/code&gt;, so &lt;code&gt;zval_dtor&lt;/code&gt; is the right choice in this respect too.&lt;/p&gt;
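
&lt;p&gt;The split between the two destructors can be mimicked with a toy refcounted value. This is only a sketch under my own assumptions: &lt;code&gt;toy_val&lt;/code&gt;, &lt;code&gt;toy_dtor&lt;/code&gt; and &lt;code&gt;toy_ptr_dtor&lt;/code&gt; are hypothetical names, and a single owned buffer stands in for whatever a real zval may hold.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#include &amp;lt;stdlib.h&amp;gt;

/* Toy value: a refcount plus one owned buffer (stands in for string data). */
typedef struct toy_val {
    int refcount;
    char *buf;
} toy_val;

/* In the spirit of zval_dtor: release what the value owns,
 * regardless of the refcount, but never free the struct itself. */
static void toy_dtor(toy_val *v)
{
    free(v-&amp;gt;buf);
    v-&amp;gt;buf = NULL;
}

/* In the spirit of zval_ptr_dtor: drop one reference and only
 * destroy and free the value once the refcount reaches zero.
 * Returns 1 if the value was freed, 0 if references remain. */
static int toy_ptr_dtor(toy_val **pp)
{
    toy_val *v = *pp;

    if (--v-&amp;gt;refcount &amp;gt; 0) {
        return 0;
    }
    toy_dtor(v);
    free(v);
    return 1;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A value living on the stack (like &lt;code&gt;key&lt;/code&gt; in the branch above) must only ever see &lt;code&gt;toy_dtor&lt;/code&gt;, because passing it to &lt;code&gt;free()&lt;/code&gt; would be invalid.&lt;/p&gt;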

&lt;p&gt;Now, let&amp;#8217;s look at the last two lines left (the important ones ^^):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;zval_add_ref(&amp;amp;val);
zend_symtable_update(Z_ARRVAL_P(return_value), Z_STRVAL_P(key_ptr), Z_STRLEN_P(key_ptr) + 1, &amp;amp;val, sizeof(zval *), NULL);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Those are very similar to what was done in the integer-key branch. The difference is that now &lt;code&gt;zend_symtable_update&lt;/code&gt; is called instead of &lt;code&gt;zend_hash_index_update&lt;/code&gt; and the string key and its length are passed in.&lt;/p&gt;

&lt;h2 id='the_symtable'&gt;The Symtable&lt;/h2&gt;

&lt;p&gt;The &amp;#8220;normal&amp;#8221; function for inserting string keys into a hash table is &lt;code&gt;zend_hash_update&lt;/code&gt;, but here &lt;code&gt;zend_symtable_update&lt;/code&gt; is used instead. What&amp;#8217;s the difference?&lt;/p&gt;

&lt;p&gt;A symtable is basically a special type of hash table, which is used for arrays. The difference from the ordinary hash table is how it handles numeric string keys: In a symtable the keys &lt;code&gt;&amp;quot;123&amp;quot;&lt;/code&gt; and &lt;code&gt;123&lt;/code&gt; are considered identical. So if you store a value in &lt;code&gt;$array[&amp;quot;123&amp;quot;]&lt;/code&gt;, you can then retrieve it using &lt;code&gt;$array[123]&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The underlying implementation could go one of two ways: Either save both &lt;code&gt;123&lt;/code&gt; and &lt;code&gt;&amp;quot;123&amp;quot;&lt;/code&gt; using the &lt;code&gt;&amp;quot;123&amp;quot;&lt;/code&gt; key or save both using the &lt;code&gt;123&lt;/code&gt; key. PHP obviously chooses the latter option (as integers are smaller and faster than strings).&lt;/p&gt;
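
&lt;p&gt;The check &amp;#8220;does this string key look like an integer?&amp;#8221; can be sketched in a few lines of standalone C. This is my own approximation, not the engine&amp;#8217;s code: &lt;code&gt;toy_numeric_key&lt;/code&gt; is a hypothetical helper, and it assumes that only canonical decimal integers that fit into a &lt;code&gt;long&lt;/code&gt; are converted, so strings like &lt;code&gt;&amp;quot;0123&amp;quot;&lt;/code&gt; or &lt;code&gt;&amp;quot;1.5&amp;quot;&lt;/code&gt; stay string keys.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#include &amp;lt;errno.h&amp;gt;
#include &amp;lt;stdlib.h&amp;gt;

/* Returns 1 and stores the integer in *idx if key is a canonical decimal
 * integer string (&amp;quot;123&amp;quot;, &amp;quot;-5&amp;quot;, &amp;quot;0&amp;quot;). Returns 0 for anything else,
 * e.g. &amp;quot;0123&amp;quot;, &amp;quot;1.5&amp;quot;, &amp;quot;-0&amp;quot; or the empty string. */
static int toy_numeric_key(const char *key, long *idx)
{
    const char *digits = key;
    char *end;
    long value;

    if (*digits == &amp;#39;-&amp;#39;) {
        digits++;
    }
    if (*digits &amp;lt; &amp;#39;0&amp;#39; || *digits &amp;gt; &amp;#39;9&amp;#39;) {
        return 0; /* empty, or not starting with a digit */
    }
    if (*digits == &amp;#39;0&amp;#39; &amp;amp;&amp;amp; (digits != key || digits[1] != &amp;#39;\0&amp;#39;)) {
        return 0; /* negative zero, or leading zeros */
    }
    errno = 0;
    value = strtol(key, &amp;amp;end, 10);
    if (errno == ERANGE || *end != &amp;#39;\0&amp;#39;) {
        return 0; /* does not fit into a long, or trailing characters */
    }
    *idx = value;
    return 1;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A symtable-style update would call a check like this first and then pick the integer-key insert if it returns 1 and the string-key insert otherwise.&lt;/p&gt;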

&lt;p&gt;You can have a little bit of fun with symtables, if you manage to somehow insert the &lt;code&gt;&amp;quot;123&amp;quot;&lt;/code&gt; key without it being cast to &lt;code&gt;123&lt;/code&gt;. One way is to exploit the array to object cast:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='nv'&gt;$obj&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='k'&gt;stdClass&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='nv'&gt;$obj&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='p'&gt;{&lt;/span&gt;&lt;span class='mi'&gt;123&lt;/span&gt;&lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;foo&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='nv'&gt;$arr&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;array&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='nv'&gt;$obj&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='nb'&gt;var_dump&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$arr&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='mi'&gt;123&lt;/span&gt;&lt;span class='p'&gt;]);&lt;/span&gt;   &lt;span class='c1'&gt;// Undefined offset: 123&lt;/span&gt;
&lt;span class='nb'&gt;var_dump&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$arr&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;123&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;]);&lt;/span&gt; &lt;span class='c1'&gt;// Undefined offset: 123&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Object properties are always saved under string keys, even if they are numbers. So the &lt;code&gt;$obj-&amp;gt;{123} = &amp;#39;foo&amp;#39;&lt;/code&gt; line actually saves &lt;code&gt;&amp;quot;foo&amp;quot;&lt;/code&gt; under the &lt;code&gt;&amp;quot;123&amp;quot;&lt;/code&gt; index, not the &lt;code&gt;123&lt;/code&gt; index. When doing the array cast this is not changed. But as both &lt;code&gt;$arr[123]&lt;/code&gt; and &lt;code&gt;$arr[&amp;quot;123&amp;quot;]&lt;/code&gt; try to access the &lt;code&gt;123&lt;/code&gt; index (not the &lt;code&gt;&amp;quot;123&amp;quot;&lt;/code&gt; index that actually exists), both raise an &amp;#8220;Undefined offset&amp;#8221; notice. So, congratulations, you&amp;#8217;ve created a hidden array element!&lt;/p&gt;

&lt;h2 id='in_the_next_part'&gt;In the next part&lt;/h2&gt;

&lt;p&gt;The next part will again be published on &lt;a href='http://blog.ircmaxell.com/'&gt;ircmaxell&amp;#8217;s blog&lt;/a&gt;. It will analyze how objects and classes work internally.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>PHP's Source Code for PHP Developers - Part 3 - Variables</title>
      <link>http://nikic.github.com/2012/03/21/PHPs-Source-Code-For-PHP-Developers-Part-3-Variables.html</link>
      <pubDate>Wed, 21 Mar 2012 00:00:00 -0700</pubDate>
      <author>nikic@php.net (NikiC)</author>
      <guid>http://nikic.github.com/2012/03/21/PHPs-Source-Code-For-PHP-Developers-Part-3-Variables</guid>
      <description>&lt;p&gt;The &lt;a href='http://blog.ircmaxell.com/2012/03/phps-source-code-for-php-developers_21.html'&gt;third part of the PHP&amp;#8217;s Source Code for PHP Developers series&lt;/a&gt; can be found on &lt;a href='http://blog.ircmaxell.com/2012/03/phps-source-code-for-php-developers_21.html'&gt;ircmaxell&amp;#8217;s blog&lt;/a&gt;. It introduces how PHP values are internally represented and how they are used in the source code.&lt;/p&gt;

&lt;p&gt;The fourth part, which will again be published on this blog, will cover how arrays are implemented in PHP.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Understanding PHP's internal function definitions (PHP's Source Code for PHP Developers - Part 2)</title>
      <link>http://nikic.github.com/2012/03/16/Understanding-PHPs-internal-function-definitions.html</link>
      <pubDate>Fri, 16 Mar 2012 00:00:00 -0700</pubDate>
      <author>nikic@php.net (NikiC)</author>
      <guid>http://nikic.github.com/2012/03/16/Understanding-PHPs-internal-function-definitions</guid>
      <description>&lt;p&gt;Welcome to the second part of the &amp;#8220;PHP&amp;#8217;s Source Code For PHP Developers&amp;#8221; series.&lt;/p&gt;

&lt;p&gt;In the &lt;a href='http://blog.ircmaxell.com/2012/03/phps-source-code-for-php-developers.html'&gt;previous part&lt;/a&gt; ircmaxell explained where you can find the PHP source code and how it is basically structured and also gave a small introduction to C (as that&amp;#8217;s the language PHP is written in). If you missed that post, you probably should read it before starting with this one.&lt;/p&gt;

&lt;p&gt;What we&amp;#8217;ll cover in this article is locating the definitions of internal functions in the PHP codebase, as well as understanding them.&lt;/p&gt;

&lt;h2 id='how_to_find_function_definitions'&gt;How to find function definitions&lt;/h2&gt;

&lt;p&gt;For a start, let&amp;#8217;s try to find out how the &lt;code&gt;strpos&lt;/code&gt; function is defined.&lt;/p&gt;

&lt;p&gt;The first thing to try is to go to the &lt;a href='http://lxr.php.net/xref/PHP_5_4/'&gt;PHP 5.4 source code root&lt;/a&gt; and type &lt;code&gt;strpos&lt;/code&gt; into the search box at the top of the page. &lt;a href='http://lxr.php.net/opengrok/search?q=strpos&amp;amp;project=PHP_5_4'&gt;The result&lt;/a&gt; will be a huge listing of &lt;code&gt;strpos&lt;/code&gt; occurrences in the PHP source code.&lt;/p&gt;

&lt;p&gt;As this doesn&amp;#8217;t really help us much, we use a little trick: Instead of searching for just &lt;code&gt;strpos&lt;/code&gt;, we &lt;a href='http://lxr.php.net/opengrok/search?q=%22PHP_FUNCTION+strpos%22&amp;amp;project=PHP_5_4'&gt;search for &lt;code&gt;&amp;quot;PHP_FUNCTION strpos&amp;quot;&lt;/code&gt;&lt;/a&gt; (don&amp;#8217;t forget the quotes, they are important).&lt;/p&gt;

&lt;p&gt;Now we are left with only two entries:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;/PHP_5_4/ext/standard/
    php_string.h 48   PHP_FUNCTION(strpos);
    string.c     1789 PHP_FUNCTION(strpos)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;First thing to notice is that both occurrences are in the &lt;code&gt;ext/standard&lt;/code&gt; folder. This is exactly where one would expect to find them, as the &lt;code&gt;strpos&lt;/code&gt; function (together with pretty much all other string, array and file functions) is part of the &lt;code&gt;standard&lt;/code&gt; extension.&lt;/p&gt;

&lt;p&gt;Now open both links in new tabs and see what code hides behind them.&lt;/p&gt;

&lt;p&gt;You&amp;#8217;ll find that the first link leads you to the &lt;a href='http://lxr.php.net/opengrok/xref/PHP_5_4/ext/standard/php_string.h#48'&gt;&lt;code&gt;php_string.h&lt;/code&gt; file&lt;/a&gt;, which is full of code looking like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// ...
PHP_FUNCTION(strpos);
PHP_FUNCTION(stripos);
PHP_FUNCTION(strrpos);
PHP_FUNCTION(strripos);
PHP_FUNCTION(strrchr);
PHP_FUNCTION(substr);
// ...&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is exactly what a typical header file (a file ending in &lt;code&gt;.h&lt;/code&gt;) looks like: a plain list of functions which are defined elsewhere. We aren&amp;#8217;t really interested in this, as we already know what we&amp;#8217;re looking for.&lt;/p&gt;

&lt;p&gt;The second link is much more interesting: It leads to the &lt;a href='http://lxr.php.net/opengrok/xref/PHP_5_4/ext/standard/string.c#1789'&gt;&lt;code&gt;string.c&lt;/code&gt; file&lt;/a&gt;, which contains the actual source code of the function.&lt;/p&gt;

&lt;p&gt;Before I walk you through the code step by step, I&amp;#8217;d recommend that you try to understand the function by yourself. It&amp;#8217;s a really simple function and most things should be clear even if you don&amp;#8217;t know the exact details.&lt;/p&gt;

&lt;h2 id='the_skeleton_of_a_php_function'&gt;The skeleton of a PHP function&lt;/h2&gt;

&lt;p&gt;All PHP functions share the same basic structure. At the top there are a few variable declarations, then there is a &lt;code&gt;zend_parse_parameters&lt;/code&gt; call, then comes the main logic, with &lt;code&gt;RETURN_***&lt;/code&gt; and &lt;code&gt;php_error_docref&lt;/code&gt; calls intermixed.&lt;/p&gt;

&lt;p&gt;So, let&amp;#8217;s start with the variable declarations:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;zval *needle;
char *haystack;
char *found = NULL;
char  needle_char[2];
long  offset = 0;
int   haystack_len;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The first line declares &lt;code&gt;needle&lt;/code&gt; as being a pointer to a &lt;code&gt;zval&lt;/code&gt;. A &lt;code&gt;zval&lt;/code&gt; is PHP&amp;#8217;s internal representation of an arbitrary PHP value. How exactly it looks will be the subject of the next post.&lt;/p&gt;

&lt;p&gt;The second line declares &lt;code&gt;haystack&lt;/code&gt; as a pointer to a character. At this point you&amp;#8217;ll have to remember that in C, arrays are represented by pointers to their first value. I.e. the &lt;code&gt;haystack&lt;/code&gt; will point to the first character of the &lt;code&gt;$haystack&lt;/code&gt; string you passed in. Then &lt;code&gt;haystack + 1&lt;/code&gt; will point to the second character, &lt;code&gt;haystack + 2&lt;/code&gt; to the third, and so on. So one could read in the whole string by always incrementing the pointer by one.&lt;/p&gt;

&lt;p&gt;The problem arising here is that PHP has to know when the string ends. Otherwise it would always keep incrementing the pointer without ever stopping. In order to deal with this, PHP also stores an explicit length, here in the &lt;code&gt;haystack_len&lt;/code&gt; variable.&lt;/p&gt;
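
&lt;p&gt;As a small illustration of walking a string with a pointer and an explicit length, here is a sketch in plain C. The function &lt;code&gt;toy_count_byte&lt;/code&gt; is made up for this post and does not exist in PHP; it only shows that a length-bounded loop keeps going across embedded NUL bytes, where a scan for the terminator would stop early.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#include &amp;lt;stddef.h&amp;gt;

/* Count occurrences of byte c in a buffer of known length.
 * Unlike a NUL-terminated scan, embedded NUL bytes do not end the loop. */
static size_t toy_count_byte(const char *str, size_t len, char c)
{
    size_t i, n = 0;

    for (i = 0; i &amp;lt; len; i++) {
        if (str[i] == c) { /* str[i] is the same as *(str + i) */
            n++;
        }
    }
    return n;
}&lt;/code&gt;&lt;/pre&gt;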

&lt;p&gt;The last declaration of interest to us at this point is the &lt;code&gt;offset&lt;/code&gt; variable, which will be used to store the third parameter of the function: the offset to start searching at. It is declared as a &lt;code&gt;long&lt;/code&gt;, which is an integer datatype, just like &lt;code&gt;int&lt;/code&gt;. The difference between those two is not of importance here, but you should know that PHP integers are stored in &lt;code&gt;long&lt;/code&gt;s and string lengths are stored in &lt;code&gt;int&lt;/code&gt;s.&lt;/p&gt;

&lt;p&gt;Now let&amp;#8217;s look at the next three lines:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, &amp;quot;sz|l&amp;quot;, &amp;amp;haystack, &amp;amp;haystack_len, &amp;amp;needle, &amp;amp;offset) == FAILURE) {
    return;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;What these lines basically do is take the parameters that were passed to the function and put them into the variables which were declared above.&lt;/p&gt;

&lt;p&gt;The first argument to the function is the number of arguments passed. This number is provided by the &lt;code&gt;ZEND_NUM_ARGS()&lt;/code&gt; macro.&lt;/p&gt;

&lt;p&gt;The next argument is the &lt;code&gt;TSRMLS_CC&lt;/code&gt; macro, which is kind of an idiosyncrasy of PHP. You&amp;#8217;ll find this strange macro scattered across pretty much the whole PHP code base. It is part of the Thread Safe Resource Manager (TSRM), which ensures that PHP doesn&amp;#8217;t mix up variables between multiple threads. This is unimportant to us, so whenever you see &lt;code&gt;TSRMLS_CC&lt;/code&gt; (or &lt;code&gt;TSRMLS_DC&lt;/code&gt;) in the code, just ignore it. (One oddity you might have noticed is that there is no comma before this &amp;#8220;argument&amp;#8221;. This has to do with the fact that depending on whether or not you are using a thread-safe build, the macro will either evaluate to nothing or to &lt;code&gt;, tsrm_ls&lt;/code&gt;. So basically the comma is part of the macro.)&lt;/p&gt;
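
&lt;p&gt;If the &amp;#8220;comma inside the macro&amp;#8221; trick seems odd, the following standalone sketch mimics it. All names here (&lt;code&gt;TOY_THREAD_SAFE&lt;/code&gt;, &lt;code&gt;TOY_DC&lt;/code&gt;, &lt;code&gt;TOY_CC&lt;/code&gt;, &lt;code&gt;toy_ctx&lt;/code&gt;) are invented for illustration; the real TSRM macros expand to different things, only the comma behaviour is the same idea.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;/* Define TOY_THREAD_SAFE to simulate a thread-safe build. */
#ifdef TOY_THREAD_SAFE
# define TOY_DC , void *toy_ctx /* declaration: extra parameter, comma included */
# define TOY_CC , toy_ctx       /* call: pass the context along */
#else
# define TOY_DC                 /* expands to nothing */
# define TOY_CC
#endif

static int toy_add(int a, int b TOY_DC)
{
    return a + b;
}

static int toy_add_three(int a, int b, int c TOY_DC)
{
    /* No comma before TOY_CC: the macro supplies it when needed. */
    return toy_add(toy_add(a, b TOY_CC), c TOY_CC);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because the comma lives inside the macro, the same source compiles in both build modes without leaving a dangling comma in the non-thread-safe one.&lt;/p&gt;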

&lt;p&gt;Now comes the important stuff: The &lt;code&gt;&amp;quot;sz|l&amp;quot;&lt;/code&gt; string specifies which parameters the function accepts:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;s  // first parameter is a *s*tring
z  // second parameter is a *z*val (an arbitrary value)
|  // the following parameters (here just one) are optional
l  // third parameter is a *l*ong (an integer)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;There are more type specifiers than &lt;code&gt;s&lt;/code&gt;, &lt;code&gt;z&lt;/code&gt; and &lt;code&gt;l&lt;/code&gt;, but most should be clear from the character. For example &lt;code&gt;b&lt;/code&gt; is a &lt;strong&gt;b&lt;/strong&gt;oolean, &lt;code&gt;d&lt;/code&gt; is a &lt;strong&gt;d&lt;/strong&gt;ouble (floating point number), &lt;code&gt;a&lt;/code&gt; is an &lt;strong&gt;a&lt;/strong&gt;rray, &lt;code&gt;f&lt;/code&gt; is a callback (&lt;strong&gt;f&lt;/strong&gt;unction) and &lt;code&gt;o&lt;/code&gt; is an &lt;strong&gt;o&lt;/strong&gt;bject.&lt;/p&gt;

&lt;p&gt;The remaining arguments &lt;code&gt;&amp;amp;haystack, &amp;amp;haystack_len, &amp;amp;needle, &amp;amp;offset&lt;/code&gt; specify the variables to put the values of the arguments into. As you can see, they are all passed by reference (&lt;code&gt;&amp;amp;&lt;/code&gt;), which means that not the variables themselves are passed, but pointers to them.&lt;/p&gt;

&lt;p&gt;After this call &lt;code&gt;haystack&lt;/code&gt; will contain the haystack string, &lt;code&gt;haystack_len&lt;/code&gt; the length of that string, &lt;code&gt;needle&lt;/code&gt; the needle value and &lt;code&gt;offset&lt;/code&gt; the starting offset.&lt;/p&gt;

&lt;p&gt;Additionally the return value of the call is compared against &lt;code&gt;FAILURE&lt;/code&gt; (which is returned if you try to pass invalid arguments to the function, e.g. an array to a string parameter). In this case &lt;code&gt;zend_parse_parameters&lt;/code&gt; will throw a warning and the code of the function just &lt;code&gt;return&lt;/code&gt;s (which will eventually return &lt;code&gt;null&lt;/code&gt; to the userland PHP code).&lt;/p&gt;

&lt;p&gt;So after the parameters are parsed, the main function body starts:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;if (offset &amp;lt; 0 || offset &amp;gt; haystack_len) {
    php_error_docref(NULL TSRMLS_CC, E_WARNING, &amp;quot;Offset not contained in string&amp;quot;);
    RETURN_FALSE;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;What this code does is pretty obvious. If the offset is out of bounds an &lt;code&gt;E_WARNING&lt;/code&gt; level error is thrown through &lt;code&gt;php_error_docref&lt;/code&gt; and then false is returned using the &lt;code&gt;RETURN_FALSE&lt;/code&gt; macro.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;php_error_docref&lt;/code&gt; is the error function you&amp;#8217;ll mainly find in extensions (i.e. the &lt;code&gt;ext&lt;/code&gt; folder). The name comes from the fact that it emits a reference to the documentation in the error message (you know, the one that never works&amp;#8230;). Additionally there is the &lt;code&gt;zend_error&lt;/code&gt; function, which is mainly used by the Zend Engine, but also occurs in extension code from time to time.&lt;/p&gt;

&lt;p&gt;Both functions use &lt;a href='http://de2.php.net/sprintf'&gt;&lt;code&gt;sprintf&lt;/code&gt;&lt;/a&gt;-like formatting, thus error messages can contain placeholders, which are then filled using the following arguments. Here is an example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;php_error_docref(NULL TSRMLS_CC, E_WARNING, &amp;quot;Failed to write %d bytes to %s&amp;quot;, Z_STRLEN_PP(tmp), filename);
// %d is filled with Z_STRLEN_PP(tmp)
// %s is filled with filename&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Let&amp;#8217;s proceed in the code:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;if (Z_TYPE_P(needle) == IS_STRING) {
    if (!Z_STRLEN_P(needle)) {
        php_error_docref(NULL TSRMLS_CC, E_WARNING, &amp;quot;Empty delimiter&amp;quot;);
        RETURN_FALSE;
    }

    found = php_memnstr(haystack + offset,
                        Z_STRVAL_P(needle),
                        Z_STRLEN_P(needle),
                        haystack + haystack_len);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The first five lines should be clear: This branch is only executed if the &lt;code&gt;needle&lt;/code&gt; is a string and an error is thrown if it is empty. Then comes the interesting part: &lt;code&gt;php_memnstr&lt;/code&gt; is called, which is the function doing the main work. As always you can click on the function name to see its source code.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;php_memnstr&lt;/code&gt; returns the pointer to the first occurrence of the needle in the haystack (that&amp;#8217;s why the &lt;code&gt;found&lt;/code&gt; variable is declared as &lt;code&gt;char *&lt;/code&gt;, i.e. a pointer to character). From this the offset can be easily computed by subtracting the two pointers, as can be seen at the end of the function:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt; RETURN_LONG(found - haystack);&lt;/code&gt;&lt;/pre&gt;
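
&lt;p&gt;The pointer subtraction is easy to try out in isolation. The following sketch uses a naive scan in place of &lt;code&gt;php_memnstr&lt;/code&gt;; &lt;code&gt;toy_strpos&lt;/code&gt; is a hypothetical name, and returning &lt;code&gt;-1&lt;/code&gt; for &amp;#8220;not found&amp;#8221; is my own convention for the example (the real function returns &lt;code&gt;false&lt;/code&gt;).&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#include &amp;lt;stddef.h&amp;gt;
#include &amp;lt;string.h&amp;gt;

/* Return the byte offset of the first occurrence of needle in haystack
 * at or after start, or -1 if there is none. */
static long toy_strpos(const char *haystack, size_t haystack_len,
                       const char *needle, size_t needle_len, size_t start)
{
    const char *end = haystack + haystack_len;
    const char *p;

    if (needle_len == 0 || start &amp;gt; haystack_len) {
        return -1;
    }
    for (p = haystack + start; (size_t)(end - p) &amp;gt;= needle_len; p++) {
        if (memcmp(p, needle, needle_len) == 0) {
            return (long)(p - haystack); /* pointer difference = offset */
        }
    }
    return -1;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Note that the result is measured from the start of the haystack, not from the starting offset, which matches the &lt;code&gt;found - haystack&lt;/code&gt; computation.&lt;/p&gt;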

&lt;p&gt;Finally, let&amp;#8217;s look at the branch which is taken when the &lt;code&gt;needle&lt;/code&gt; is not a string:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;else {
    if (php_needle_char(needle, needle_char TSRMLS_CC) != SUCCESS) {
        RETURN_FALSE;
    }
    needle_char[1] = 0;

    found = php_memnstr(haystack + offset,
                        needle_char,
                        1,
                        haystack + haystack_len);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I&amp;#8217;ll just quote what this does from the manual: &amp;#8220;If needle is not a string, it is converted to an integer and applied as the ordinal value of a character.&amp;#8221; This basically means that instead of writing &lt;code&gt;strpos($str, &amp;#39;A&amp;#39;)&lt;/code&gt; you could also write &lt;code&gt;strpos($str, 65)&lt;/code&gt;, because the ordinal value of &lt;code&gt;A&lt;/code&gt; is &lt;code&gt;65&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If you look up at the variable declarations, you&amp;#8217;ll see that &lt;code&gt;needle_char&lt;/code&gt; is declared as &lt;code&gt;char needle_char[2]&lt;/code&gt;, i.e. a string with two characters. &lt;code&gt;php_needle_char&lt;/code&gt; will put the actual character (in our example the &lt;code&gt;A&lt;/code&gt;) into &lt;code&gt;needle_char[0]&lt;/code&gt;. Then the &lt;code&gt;strpos&lt;/code&gt; code will set &lt;code&gt;needle_char[1]&lt;/code&gt; to &lt;code&gt;0&lt;/code&gt;. The reason behind this is that in C, strings are zero-terminated, i.e. the last character is set to NUL (the character with the ordinal value &lt;code&gt;0&lt;/code&gt;). In the context of PHP this doesn&amp;#8217;t make much sense, as PHP stores an explicit length for all strings (so it does not need zero-termination to find the end of a string), but this still is done in order to ensure compatibility with the C functions used internally by PHP.&lt;/p&gt;
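
&lt;p&gt;Reduced to the integer case, building such a one-character needle could look like the sketch below. &lt;code&gt;toy_needle_from_long&lt;/code&gt; is a made-up, simplified stand-in; the real &lt;code&gt;php_needle_char&lt;/code&gt; works on a &lt;code&gt;zval&lt;/code&gt; and also has to deal with other value types.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;/* Turn an ordinal value into a one-character, NUL-terminated C string. */
static void toy_needle_from_long(long ordinal, char needle_char[2])
{
    needle_char[0] = (char) ordinal; /* the single character to search for */
    needle_char[1] = &amp;#39;\0&amp;#39;;           /* terminator expected by C string functions */
}&lt;/code&gt;&lt;/pre&gt;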

&lt;h2 id='zend_functions'&gt;Zend functions&lt;/h2&gt;

&lt;p&gt;I&amp;#8217;m getting tired of &lt;code&gt;strpos&lt;/code&gt;, so let&amp;#8217;s try to find another function: &lt;code&gt;strlen&lt;/code&gt;. We&amp;#8217;ll do this using our usual approach:&lt;/p&gt;

&lt;p&gt;Starting from the &lt;a href='http://lxr.php.net/xref/PHP_5_4/'&gt;PHP 5.4 source code root&lt;/a&gt; try to search for &lt;code&gt;strlen&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You&amp;#8217;ll see lots of unrelated uses of the function, so instead search for &lt;code&gt;&amp;quot;PHP_FUNCTION strlen&amp;quot;&lt;/code&gt;. While doing so, you&amp;#8217;ll notice something strange though: There won&amp;#8217;t be any results.&lt;/p&gt;

&lt;p&gt;The reason is that &lt;code&gt;strlen&lt;/code&gt; is one of the few functions that are not defined by an extension, but by the Zend Engine itself. In such cases the function is not defined as &lt;code&gt;PHP_FUNCTION(strlen)&lt;/code&gt;, but as &lt;code&gt;ZEND_FUNCTION(strlen)&lt;/code&gt;. Thus we have to &lt;a href='http://lxr.php.net/opengrok/search?q=%22ZEND_FUNCTION+strlen%22&amp;amp;project=PHP_5_4'&gt;search for &lt;code&gt;&amp;quot;ZEND_FUNCTION strlen&amp;quot;&lt;/code&gt;&lt;/a&gt; instead.&lt;/p&gt;

&lt;p&gt;As we already know, we have to click on the entry without a semicolon &lt;code&gt;;&lt;/code&gt; at the end to get to the source code. This leads us to the following definition in &lt;a href='http://lxr.php.net/opengrok/xref/PHP_5_4/Zend/zend_builtin_functions.c#478'&gt;&lt;code&gt;Zend/zend_builtin_functions.c&lt;/code&gt;&lt;/a&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;ZEND_FUNCTION(strlen)
{
    char *s1;
    int s1_len;

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, &amp;quot;s&amp;quot;, &amp;amp;s1, &amp;amp;s1_len) == FAILURE) {
        return;
    }

    RETVAL_LONG(s1_len);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I don&amp;#8217;t think that I have to further comment on this, as the function is so simple.&lt;/p&gt;

&lt;h2 id='methods'&gt;Methods&lt;/h2&gt;

&lt;p&gt;We&amp;#8217;ll cover how classes and objects work in more detail in a different post, but as a small peek ahead: You can search for class methods by typing &lt;code&gt;ClassName::methodName&lt;/code&gt; into the search. As an example, try to &lt;a href='http://lxr.php.net/opengrok/search?q=SplFixedArray%3A%3AgetSize&amp;amp;project=PHP_5_4'&gt;search for &lt;code&gt;SplFixedArray::getSize&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id='in_the_next_part'&gt;In the next part&lt;/h2&gt;

&lt;p&gt;The next part will again be published on &lt;a href='http://blog.ircmaxell.com/'&gt;ircmaxell&amp;#8217;s blog&lt;/a&gt;. It will cover what &lt;code&gt;zval&lt;/code&gt;s are, how they work and how they are used in the source code (all those &lt;code&gt;Z_***&lt;/code&gt; macros&amp;#8230;)&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Scalar type hinting is harder than you think</title>
      <link>http://nikic.github.com/2012/03/06/Scalar-type-hinting-is-harder-than-you-think.html</link>
      <pubDate>Tue, 06 Mar 2012 00:00:00 -0800</pubDate>
      <author>nikic@php.net (NikiC)</author>
      <guid>http://nikic.github.com/2012/03/06/Scalar-type-hinting-is-harder-than-you-think</guid>
      <description>&lt;p&gt;One of the features originally planned for PHP 5.4 was scalar type hinting. But as you know, it wasn&amp;#8217;t included in the release.&lt;/p&gt;

&lt;p&gt;Recently the topic has come up again on the mailing list and there has been a hell of a lot of discussion about it. Yesterday ircmaxell published a &lt;a href='http://blog.ircmaxell.com/2012/03/parameter-type-casting-in-php.html'&gt;blog post about his particular proposals&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The reactions on &lt;a href='http://www.reddit.com/r/PHP/comments/qiniv/parameter_type_casting_in_php/'&gt;reddit&lt;/a&gt; were mixed. On the one hand it is clear that people really do want scalar type hints; on the other hand they didn&amp;#8217;t seem to like that particular proposal.&lt;/p&gt;

&lt;p&gt;&lt;a href='http://www.reddit.com/r/PHP/comments/qiniv/parameter_type_casting_in_php/c3xyzc5'&gt;One comment&lt;/a&gt; particularly caught my interest:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Why can&amp;#8217;t the PHP dev&amp;#8217;s just get this thing right? Why must they always choose the worst, more useless, most convoluted way to implement every new feature! It&amp;#8217;s disheartening. Especially when there aren&amp;#8217;t good backwards-compatibility reasons for doing it wrong.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In a way, this is a pretty important comment: It shows what people think about how PHP is being developed. They think that it&amp;#8217;s a bunch of people that somehow, magically, manage to always make the wrong decisions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That&amp;#8217;s not how it works.&lt;/strong&gt; Really. You see, most people, when confronted with scalar type hinting, think &amp;#8220;Hey, this is so simple, just do XYZ&amp;#8221;, and then wonder why PHP is having such a hard time implementing this most trivial of all features.&lt;/p&gt;

&lt;p&gt;The thing is: It&amp;#8217;s just not that simple. Most proposals seem so easy and straightforward at first sight, but when it comes to the details they are not.&lt;/p&gt;

&lt;p&gt;In the following I want to introduce you to some of the various proposals that were made in the past and the problems associated with them. I hope this way people will get a little bit more insight into why this is so hard.&lt;/p&gt;

&lt;h2 id='strict_type_hinting'&gt;Strict type hinting&lt;/h2&gt;

&lt;p&gt;I&amp;#8217;ll start with the proposal that I personally dislike most: Strict type hinting, i.e. allowing only the hinted type to be passed and not any of the types that PHP would normally consider equivalent. See this example:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;foo&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;int&lt;/span&gt; &lt;span class='nv'&gt;$i&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt; &lt;span class='cm'&gt;/* ... */&lt;/span&gt; &lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='nx'&gt;foo&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;   &lt;span class='c1'&gt;// works&lt;/span&gt;
&lt;span class='nx'&gt;foo&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mf'&gt;1.0&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt; &lt;span class='c1'&gt;// fatal error: int expected, float given&lt;/span&gt;
&lt;span class='nx'&gt;foo&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;1&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt; &lt;span class='c1'&gt;// fatal error: int expected, string given&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;I think it is evident that this is not an option. One of PHP&amp;#8217;s greatest strengths is being a weakly typed language and this proposal would turn it into a strictly typed one. This goes against the PHP philosophy and also differs from how pretty much everything else works in PHP.&lt;/p&gt;

&lt;h2 id='unenforced_type_hinting'&gt;Unenforced type hinting&lt;/h2&gt;

&lt;p&gt;Another proposal that was made is type hinting that isn&amp;#8217;t enforced by the engine. WTF? What would this be good for? Basically, it would work just like doc comments (which aren&amp;#8217;t enforced either ^^), but with nicer syntax.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;foo&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;int&lt;/span&gt; &lt;span class='nv'&gt;$i&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt; &lt;span class='cm'&gt;/* ... */&lt;/span&gt; &lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='nx'&gt;foo&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;          &lt;span class='c1'&gt;// works&lt;/span&gt;
&lt;span class='nx'&gt;foo&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mf'&gt;1.0&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;        &lt;span class='c1'&gt;// works&lt;/span&gt;
&lt;span class='nx'&gt;foo&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;1&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;        &lt;span class='c1'&gt;// works&lt;/span&gt;
&lt;span class='nx'&gt;foo&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;array&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;123&lt;/span&gt;&lt;span class='p'&gt;));&lt;/span&gt;  &lt;span class='c1'&gt;// works&lt;/span&gt;
&lt;span class='nx'&gt;foo&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$iAmObject&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt; &lt;span class='c1'&gt;// works&lt;/span&gt;
&lt;span class='c1'&gt;// everything works...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;I dislike this proposal, too, for obvious reasons. For me it just doesn&amp;#8217;t make any sense to have type hints that are ignored. Doc comments do that already well enough&amp;#8230;&lt;/p&gt;

&lt;h2 id='casting_weak_type_hinting'&gt;Casting weak type hinting&lt;/h2&gt;

&lt;p&gt;A proposal that came up recently (this is the one &lt;a href='http://blog.ircmaxell.com/2012/03/parameter-type-casting-in-php.html'&gt;introduced by ircmaxell&lt;/a&gt;) is type hinting based on casts.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;foo&lt;/span&gt;&lt;span class='p'&gt;((&lt;/span&gt;&lt;span class='nx'&gt;int&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='nv'&gt;$i&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='nb'&gt;var_dump&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$i&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='nx'&gt;foo&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;       &lt;span class='c1'&gt;// int(1)&lt;/span&gt;
&lt;span class='nx'&gt;foo&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mf'&gt;1.0&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;     &lt;span class='c1'&gt;// int(1)&lt;/span&gt;
&lt;span class='nx'&gt;foo&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;1&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;     &lt;span class='c1'&gt;// int(1)&lt;/span&gt;

&lt;span class='nx'&gt;foo&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mf'&gt;1.5&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;     &lt;span class='c1'&gt;// int(1)&lt;/span&gt;
&lt;span class='nx'&gt;foo&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;array&lt;/span&gt;&lt;span class='p'&gt;());&lt;/span&gt; &lt;span class='c1'&gt;// int(0)&lt;/span&gt;
&lt;span class='nx'&gt;foo&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;hi&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;    &lt;span class='c1'&gt;// int(0)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;This is one of the more interesting proposals, for several reasons:&lt;/p&gt;

&lt;p&gt;Firstly, it is the only proposal that can be implemented without breaking backwards compatibility at all. All other proposals would at least break existing type hints on classes called &amp;#8220;Int&amp;#8221; or &amp;#8220;String&amp;#8221; (believe it or not, but some people really do use classes like this&amp;#8230;)&lt;/p&gt;

&lt;p&gt;Secondly, the syntax makes very clear that it will cast the value. It also reuses PHP&amp;#8217;s normal cast semantics, thus making it consistent.&lt;/p&gt;

&lt;p&gt;But it also has some not insignificant issues:&lt;/p&gt;

&lt;p&gt;The first one is that the syntax is simply ugly. Sure, you don&amp;#8217;t have to tell me, that&amp;#8217;s subjective and stuff like that, but honestly, this is an important aspect to consider. Scalar type hinting will be used a lot and we shouldn&amp;#8217;t make it ugly.&lt;/p&gt;

&lt;p&gt;The second, far more important issue is that PHP&amp;#8217;s cast semantics simply suck. In the above example, the first three casts seem logical, but the last three are clearly broken. At least I don&amp;#8217;t want &lt;code&gt;&amp;quot;hi&amp;quot;&lt;/code&gt; or &lt;code&gt;array()&lt;/code&gt; to be valid arguments to an integer parameter. It just doesn&amp;#8217;t make sense. That&amp;#8217;s why this proposal would go hand in hand with a change of implicit cast semantics: Whenever such a &amp;#8220;strange&amp;#8221; cast (one with data loss) occurs, it should throw a notice. This again would be a controversial change, as it would also change stuff in other places (which is probably a good thing though ^^). Also I would argue that just a notice is not enough and it should instead throw a recoverable fatal error.&lt;/p&gt;
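
&lt;p&gt;To see that the results in the example are simply what PHP&amp;#8217;s existing explicit casts already do, you can run the casts directly. This is only a small illustration: the values &lt;code&gt;&amp;quot;12abc&amp;quot;&lt;/code&gt; and &lt;code&gt;array(1, 2)&lt;/code&gt; are extra cases added here and are not part of the proposal.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;?php
// Casts that keep the value intact
var_dump((int) 1.0);         // int(1)
var_dump((int) &amp;quot;1&amp;quot;);         // int(1)

// Casts that silently lose information
var_dump((int) 1.5);         // int(1)  - fraction dropped
var_dump((int) array());     // int(0)  - empty array becomes 0
var_dump((int) &amp;quot;hi&amp;quot;);        // int(0)  - non-numeric string becomes 0
var_dump((int) &amp;quot;12abc&amp;quot;);     // int(12) - trailing garbage ignored
var_dump((int) array(1, 2)); // int(1)  - any non-empty array becomes 1&lt;/code&gt;&lt;/pre&gt;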

&lt;p&gt;Also I&amp;#8217;m not really sure as to whether we really want incoming parameters to be cast, instead of just being validated. One argument against it is that this approach is somewhat different from the existing &lt;code&gt;array&lt;/code&gt;, &lt;code&gt;callable&lt;/code&gt; and &lt;code&gt;Class&lt;/code&gt; type hints. Especially if you consider that you would now have both an &lt;code&gt;(array)&lt;/code&gt; and an &lt;code&gt;array&lt;/code&gt; type hint, the issue might become more clear: When would you use one over another? When would you want the argument to be cast and when just checked? Additionally the casts would prevent type hinted parameters from being used by reference in any reasonable manner.&lt;/p&gt;

&lt;h2 id='strict_weak_type_hinting'&gt;Strict weak type hinting&lt;/h2&gt;

&lt;p&gt;This leads us to another possibility, which is also my favorite: Doing weak type hints, but with stricter input validation (and without casts).&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;foo&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;int&lt;/span&gt; &lt;span class='nv'&gt;$i&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='nb'&gt;var_dump&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$i&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='nx'&gt;foo&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;       &lt;span class='c1'&gt;// int(1)&lt;/span&gt;
&lt;span class='nx'&gt;foo&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mf'&gt;1.0&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;     &lt;span class='c1'&gt;// float(1)&lt;/span&gt;
&lt;span class='nx'&gt;foo&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;1&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;     &lt;span class='c1'&gt;// string(1) &amp;quot;1&amp;quot;&lt;/span&gt;

&lt;span class='nx'&gt;foo&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mf'&gt;1.5&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;     &lt;span class='c1'&gt;// fatal error: int expected, float given&lt;/span&gt;
&lt;span class='nx'&gt;foo&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;array&lt;/span&gt;&lt;span class='p'&gt;());&lt;/span&gt; &lt;span class='c1'&gt;// fatal error: int expected, array given&lt;/span&gt;
&lt;span class='nx'&gt;foo&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;hi&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;    &lt;span class='c1'&gt;// fatal error: int expected, string given&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;I like this proposal most, for several reasons: Firstly, it uses the &amp;#8220;normal&amp;#8221; type hinting syntax, not the strange cast syntax. Secondly it has stricter validation semantics as it only lets those values through which are representable in the hinted type without data loss. And thirdly it does not do casts, which is in line with previous type hints and also allows references.&lt;/p&gt;
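
&lt;p&gt;The &amp;#8220;representable without data loss&amp;#8221; rule can be approximated in userland today. The following is only a rough sketch: &lt;code&gt;isLosslessInt()&lt;/code&gt; is a hypothetical helper name, the exact rules of an engine-level implementation could well differ, and integer range limits for floats are ignored for brevity.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;?php
// Hypothetical helper (not part of any proposal): returns true if $value
// can be represented as an integer without losing information.
function isLosslessInt($value) {
    if (is_int($value)) {
        return true;
    }
    if (is_float($value)) {
        // 1.0 passes, 1.5 does not. Range limits are ignored here.
        return is_finite($value) &amp;amp;&amp;amp; floor($value) == $value;
    }
    if (is_string($value)) {
        // &amp;quot;1&amp;quot; passes, &amp;quot;hi&amp;quot; and &amp;quot;1abc&amp;quot; do not.
        return filter_var($value, FILTER_VALIDATE_INT) !== false;
    }
    return false; // arrays, objects, null, booleans, ...
}

var_dump(isLosslessInt(1));       // bool(true)
var_dump(isLosslessInt(1.0));     // bool(true)
var_dump(isLosslessInt(&amp;quot;1&amp;quot;));     // bool(true)
var_dump(isLosslessInt(1.5));     // bool(false)
var_dump(isLosslessInt(array())); // bool(false)
var_dump(isLosslessInt(&amp;quot;hi&amp;quot;));    // bool(false)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Note that the helper only validates; the value itself is passed on unchanged, which is exactly the difference to the cast-based proposal.&lt;/p&gt;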

&lt;p&gt;But it obviously also comes with problems: Again, this won&amp;#8217;t be backwards compatible as it at least will break existing &lt;code&gt;Int&lt;/code&gt; or &lt;code&gt;String&lt;/code&gt; class type hints. Additionally, depending on the exact implementation, it would also make stuff like &lt;code&gt;int&lt;/code&gt; and &lt;code&gt;string&lt;/code&gt; reserved keywords, thus (maybe) breaking even more stuff. Realistically though I&amp;#8217;d say that this will only affect a small fraction of scripts.&lt;/p&gt;

&lt;p&gt;By the way, this proposal would be very similar to how parameters for internal functions are parsed. E.g. an int argument of an internal function will not accept &lt;code&gt;&amp;quot;foobar&amp;quot;&lt;/code&gt; and instead throw a warning.&lt;/p&gt;
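
&lt;p&gt;You can observe this with any internal function that takes an integer. A small illustration using &lt;code&gt;str_repeat()&lt;/code&gt;, assuming PHP 5.x behavior (later versions may differ; PHP 8 throws a &lt;code&gt;TypeError&lt;/code&gt; in the second case):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;?php
// A numeric string is accepted for the int parameter.
var_dump(str_repeat(&amp;#39;*&amp;#39;, &amp;quot;3&amp;quot;));      // string(3) &amp;quot;***&amp;quot;

// A non-numeric string is rejected: on PHP 5.x this emits a warning
// and the function returns NULL without doing any work.
var_dump(str_repeat(&amp;#39;*&amp;#39;, &amp;quot;foobar&amp;quot;));&lt;/code&gt;&lt;/pre&gt;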

&lt;h2 id='boxing_based_type_hinting'&gt;Boxing based type hinting&lt;/h2&gt;

&lt;p&gt;Another very interesting idea that also came from ircmaxell is type hinting using boxing. Basically this would add magic methods for casting objects to scalars and the other way around. Something like this (just example names):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public function __toScalar()
public static function __fromScalar($value)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now see how this could be useful:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;class&lt;/span&gt; &lt;span class='nc'&gt;Int&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;protected&lt;/span&gt; &lt;span class='nv'&gt;$value&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;__construct&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$value&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;false&lt;/span&gt; &lt;span class='o'&gt;===&lt;/span&gt; &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;value&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nb'&gt;filter_var&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$value&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nx'&gt;FILTER_VALIDATE_INT&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
            &lt;span class='k'&gt;throw&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;Exception&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;Malformed integer&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
        &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;__toScalar&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;value&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;static&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;__fromScalar&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$value&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='k'&gt;static&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$value&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='nx'&gt;foo&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;10&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt; &lt;span class='c1'&gt;// foo() is expecting an Int, so call Int::__fromScalar(10),&lt;/span&gt;
         &lt;span class='c1'&gt;// which returns an Int(10). This is then passed to foo().&lt;/span&gt;

&lt;span class='nx'&gt;foo&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;hi&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt; &lt;span class='c1'&gt;// Int::__fromScalar(&amp;quot;hi&amp;quot;) is called, which tries to create&lt;/span&gt;
           &lt;span class='c1'&gt;// an Int(&amp;quot;hi&amp;quot;), but this throws an Exception.&lt;/span&gt;

&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;foo&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;Int&lt;/span&gt; &lt;span class='nv'&gt;$i&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='nb'&gt;str_repeat&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;*&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$i&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt; &lt;span class='c1'&gt;// $i is Int(10) here, but str_repeat expects an integer,&lt;/span&gt;
                                &lt;span class='c1'&gt;// thus call $i-&amp;gt;__toScalar() [or -&amp;gt;__toInt() or whatever],&lt;/span&gt;
                                &lt;span class='c1'&gt;// which returns 10. This is then passed to str_repeat().&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;This basically allows you to implement type hinting in userland code. It also allows you to create strict type hints:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='k'&gt;class&lt;/span&gt; &lt;span class='nc'&gt;StrictInt&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;protected&lt;/span&gt; &lt;span class='nv'&gt;$value&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;__construct&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$value&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;!&lt;/span&gt;&lt;span class='nb'&gt;is_int&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$value&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
            &lt;span class='k'&gt;throw&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;Exception&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;Not an integer&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
        &lt;span class='p'&gt;}&lt;/span&gt;

        &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;value&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$value&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;__toScalar&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;value&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;static&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;__fromScalar&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$value&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='k'&gt;static&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$value&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;foo&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;StrictInt&lt;/span&gt; &lt;span class='nv'&gt;$i&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt; &lt;span class='cm'&gt;/* ... */&lt;/span&gt; &lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='nx'&gt;foo&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;10&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt; &lt;span class='c1'&gt;// foo() is expecting a StrictInt, so call StrictInt::__fromScalar(10),&lt;/span&gt;
         &lt;span class='c1'&gt;// which returns a StrictInt(10). This is then passed to foo().&lt;/span&gt;

&lt;span class='nx'&gt;foo&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;10&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt; &lt;span class='c1'&gt;// StrictInt::__fromScalar(&amp;quot;10&amp;quot;) is called, which tries to create&lt;/span&gt;
           &lt;span class='c1'&gt;// StrictInt(&amp;quot;10&amp;quot;), but this raises an exception.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;As you can see, this proposal is very powerful as it allows users to define their own type hinting rules (PHP could obviously still provide some default &lt;code&gt;Int&lt;/code&gt; etc. classes).&lt;/p&gt;

&lt;p&gt;My main issue with this proposal is that it requires the value to be boxed and unboxed on every type hint. This seems like overkill and like a performance issue. But obviously the internally defined type hinting classes could optimize this.&lt;/p&gt;

&lt;p&gt;Generally this proposal is not yet that mature, so it could well have some more flaws. Still, I think that this is one of the more interesting approaches, especially as it is so powerful.&lt;/p&gt;

&lt;h2 id='we_need_you'&gt;We need you!&lt;/h2&gt;

&lt;p&gt;I hope I was able to give you a small overview of the different possible approaches.&lt;/p&gt;

&lt;p&gt;Now it&amp;#8217;s your turn! PHP needs your valuable feedback on the different proposals, so a good decision can be made.&lt;/p&gt;

&lt;p&gt;You can give feedback directly on the internals &lt;a href='http://php.net/mailing-lists.php'&gt;mailing list&lt;/a&gt;, but if you don&amp;#8217;t want to do that, you can also leave a comment or ping ircmaxell or me in the &lt;a href='http://chat.stackoverflow.com/rooms/11/php'&gt;StackOverflow PHP chatroom&lt;/a&gt;.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Pointer magic for efficient dynamic value representations</title>
      <link>http://nikic.github.com/2012/02/02/Pointer-magic-for-efficient-dynamic-value-representations.html</link>
      <pubDate>Thu, 02 Feb 2012 00:00:00 -0800</pubDate>
      <author>nikic@php.net (NikiC)</author>
      <guid>http://nikic.github.com/2012/02/02/Pointer-magic-for-efficient-dynamic-value-representations</guid>
      <description>&lt;p&gt;What always intrigued me were the various little tricks that JavaScript implementations use to be as fast as they are. One of those tricks is how JS implementations represent values to be more time and space efficient.&lt;/p&gt;

&lt;p&gt;I think it&amp;#8217;s worth the time to look at those neat little tricks for two reasons: Firstly, some of them are also applicable in other situations (though most of them admittedly are not). And secondly, because they remind you of how various data is represented at the low level (stuff you may have learned in your CS courses at some point - or probably not).&lt;/p&gt;

&lt;p&gt;So, let&amp;#8217;s start!&lt;/p&gt;

&lt;h2 id='the_trivial_approach_tagged_unions'&gt;The trivial approach: Tagged unions&lt;/h2&gt;

&lt;p&gt;JavaScript is a dynamically typed language and as such needs some kind of &amp;#8220;value&amp;#8221; data structure, which stores the current type and the value in that type. The trivial approach to this problem is to use a tagged union. This could look roughly like this:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='cpp'&gt;&lt;span class='cp'&gt;#include &amp;lt;stdint.h&amp;gt;&lt;/span&gt;

&lt;span class='k'&gt;class&lt;/span&gt; &lt;span class='nc'&gt;Value&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
&lt;span class='k'&gt;public&lt;/span&gt;&lt;span class='o'&gt;:&lt;/span&gt;
    &lt;span class='k'&gt;enum&lt;/span&gt; &lt;span class='n'&gt;TypeTag&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='n'&gt;IntType&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;DoubleType&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;StringType&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;ObjectType&lt;/span&gt;
    &lt;span class='p'&gt;};&lt;/span&gt;

    &lt;span class='c1'&gt;// constructors, methods, ...&lt;/span&gt;
&lt;span class='k'&gt;private&lt;/span&gt;&lt;span class='o'&gt;:&lt;/span&gt;
    &lt;span class='k'&gt;union&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='n'&gt;int32_t&lt;/span&gt;  &lt;span class='n'&gt;asInt32&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='kt'&gt;double&lt;/span&gt;   &lt;span class='n'&gt;asDouble&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='n'&gt;String&lt;/span&gt;  &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;asString&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='n'&gt;Object&lt;/span&gt;  &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;asObject&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='n'&gt;payload&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='n'&gt;TypeTag&lt;/span&gt; &lt;span class='n'&gt;type&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='p'&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;As you can see, the idea is very simple: You have a type tag, which specifies which type the value currently has, and a union that&amp;#8217;s used to store the value in that type. Constructors for storing integers and objects would look like this:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='cpp'&gt;&lt;span class='kr'&gt;inline&lt;/span&gt; &lt;span class='n'&gt;Value&lt;/span&gt;&lt;span class='o'&gt;::&lt;/span&gt;&lt;span class='n'&gt;Value&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;int32_t&lt;/span&gt; &lt;span class='n'&gt;number&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='n'&gt;type&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;IntType&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='n'&gt;payload&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;asInt32&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;number&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='kr'&gt;inline&lt;/span&gt; &lt;span class='n'&gt;Value&lt;/span&gt;&lt;span class='o'&gt;::&lt;/span&gt;&lt;span class='n'&gt;Value&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;Object&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;object&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='n'&gt;type&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;ObjectType&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='n'&gt;payload&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;asObject&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;object&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;This straightforward approach has several problems: The size of the above object would be 16 bytes = 128 bits (on a 64 bit machine). This is a size that you wouldn&amp;#8217;t just pass around by value. So you need to allocate it on the heap and use a pointer to it. Allocating stuff on the heap is slow and incurs memory overhead. When using the value later it has to be fetched from the heap, which is again slow.&lt;/p&gt;

&lt;p&gt;So what we want is to make values smaller and move them off the heap. Before we get to that, let&amp;#8217;s first have a look at a technique called pointer tagging:&lt;/p&gt;

&lt;h2 id='an_interlude_pointer_tagging'&gt;An interlude: Pointer tagging&lt;/h2&gt;

&lt;p&gt;Have a look at the above structure again. It consists of a union, which has a size of 8 bytes, and an integer which fits into a single byte (a &lt;code&gt;char&lt;/code&gt;). So theoretically the total size should be 9 bytes. But it&amp;#8217;s not - it&amp;#8217;s 16 bytes. This is because the compiler adds padding to the data structure in order to align it in memory.&lt;/p&gt;

&lt;p&gt;This is done for performance reasons: The CPU accesses memory in word sized chunks. So if our data always starts at a word boundary it can be fetched efficiently. If it were to start somewhere in the middle of a word the CPU would have to fetch two words and apply masks and shifts in order to get the data - which would be too slow.&lt;/p&gt;
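&lt;p&gt;The padding can be made visible with a few lines of code. The following is only a sketch: &lt;code&gt;SketchValue&lt;/code&gt; is a hypothetical stand-in for the structure discussed above, and the exact numbers depend on the platform (on a typical 64 bit ABI you would expect 9 used bytes and a total size of 16).&lt;/p&gt;

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the structure discussed above:
// an 8 byte payload union followed by a one byte type tag.
struct SketchValue {
    union {
        bool asBool;
        int32_t asInt32;
        double asDouble;
        void *asPointer;
    } payload;
    unsigned char type;
};

// Bytes that actually carry data: everything up to and including the tag.
const std::size_t usedBytes = offsetof(SketchValue, type) + sizeof(unsigned char);

// Bytes the compiler appends so that consecutive objects stay aligned.
const std::size_t paddingBytes = sizeof(SketchValue) - usedBytes;
```

&lt;p&gt;The total size is always a multiple of the alignment, which is why the one extra byte for the tag ends up costing a whole additional word.&lt;/p&gt;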

&lt;p&gt;A word on a 64 bit machine is 64 bit = 8 bytes (obviously ^^). So if all pointers to our object are aligned to 8 bytes, they are multiples of 8. A pointer thus can be 8, 16, 24, 109144, etc. But it cannot be 7 or 13. Or speaking in binary: 0b1000 (=8), 0b10000 (=16), 0b11000 (=24), 0b11010101001011000 (=109144).&lt;/p&gt;

&lt;p&gt;As you can see the lower three bits are always zero. So those three bits are basically free to use. We can use them to store a &amp;#8220;tag&amp;#8221; in there, which is an integer between 0 (0b000) and 7 (0b111).&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pppppppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppTTT
^- actual pointer                                   three tag bits -^&lt;/code&gt;&lt;/pre&gt;
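&lt;p&gt;Whether the low bits of an address really are free can be checked directly. This is a minimal sketch; &lt;code&gt;addressOf&lt;/code&gt;, &lt;code&gt;lowThreeBits&lt;/code&gt; and &lt;code&gt;sampleIsAligned&lt;/code&gt; are hypothetical helper names, and the 8 byte alignment of the sample is requested explicitly with &lt;code&gt;alignas&lt;/code&gt; rather than assumed.&lt;/p&gt;

```cpp
#include <cstdint>

// Converts a pointer into its numeric address.
inline std::uintptr_t addressOf(const void *pointer) {
    return reinterpret_cast<std::uintptr_t>(pointer);
}

// Returns the lowest three bits of an address (0b111 = 7 is the mask).
// For any multiple of 8 the result is zero.
inline std::uintptr_t lowThreeBits(std::uintptr_t address) {
    return address & 7u;
}

// Checks the claim on a real object whose alignment is requested explicitly.
inline bool sampleIsAligned() {
    alignas(8) double sample = 17.0;
    return lowThreeBits(addressOf(&sample)) == 0;
}
```

&lt;p&gt;For the example addresses from the text, &lt;code&gt;lowThreeBits&lt;/code&gt; yields 0 for 8, 16, 24 and 109144, but 7 for 7 and 5 for 13.&lt;/p&gt;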

&lt;p&gt;Here is a sample implementation of such a tagged pointer:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='cpp'&gt;&lt;span class='cp'&gt;#include &amp;lt;cassert&amp;gt;&lt;/span&gt;
&lt;span class='cp'&gt;#include &amp;lt;stdint.h&amp;gt;&lt;/span&gt;

&lt;span class='k'&gt;template&lt;/span&gt; &lt;span class='o'&gt;&amp;lt;&lt;/span&gt;&lt;span class='k'&gt;typename&lt;/span&gt; &lt;span class='n'&gt;T&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='kt'&gt;int&lt;/span&gt; &lt;span class='n'&gt;alignedTo&lt;/span&gt;&lt;span class='o'&gt;&amp;gt;&lt;/span&gt;
&lt;span class='k'&gt;class&lt;/span&gt; &lt;span class='nc'&gt;TaggedPointer&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
&lt;span class='k'&gt;private&lt;/span&gt;&lt;span class='o'&gt;:&lt;/span&gt;
    &lt;span class='n'&gt;static_assert&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;
        &lt;span class='n'&gt;alignedTo&lt;/span&gt; &lt;span class='o'&gt;!=&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class='p'&gt;((&lt;/span&gt;&lt;span class='n'&gt;alignedTo&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;alignedTo&lt;/span&gt; &lt;span class='o'&gt;-&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;),&lt;/span&gt;
        &lt;span class='s'&gt;&amp;quot;Alignment parameter must be power of two&amp;quot;&lt;/span&gt;
    &lt;span class='p'&gt;);&lt;/span&gt;

    &lt;span class='c1'&gt;// for 8 byte alignment tagMask = alignedTo - 1 = 8 - 1 = 7 = 0b111&lt;/span&gt;
    &lt;span class='c1'&gt;// i.e. the lowest three bits are set, which is where the tag is stored&lt;/span&gt;
    &lt;span class='k'&gt;static&lt;/span&gt; &lt;span class='k'&gt;const&lt;/span&gt; &lt;span class='n'&gt;intptr_t&lt;/span&gt; &lt;span class='n'&gt;tagMask&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;alignedTo&lt;/span&gt; &lt;span class='o'&gt;-&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

    &lt;span class='c1'&gt;// pointerMask is the exact contrary: 0b...11111000&lt;/span&gt;
    &lt;span class='c1'&gt;// i.e. all bits apart from the three lowest are set, which is where the pointer is stored&lt;/span&gt;
    &lt;span class='k'&gt;static&lt;/span&gt; &lt;span class='k'&gt;const&lt;/span&gt; &lt;span class='n'&gt;intptr_t&lt;/span&gt; &lt;span class='n'&gt;pointerMask&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='o'&gt;~&lt;/span&gt;&lt;span class='n'&gt;tagMask&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

    &lt;span class='c1'&gt;// save us some reinterpret_casts with a union&lt;/span&gt;
    &lt;span class='k'&gt;union&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='n'&gt;T&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;asPointer&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='n'&gt;intptr_t&lt;/span&gt; &lt;span class='n'&gt;asBits&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;};&lt;/span&gt;

&lt;span class='k'&gt;public&lt;/span&gt;&lt;span class='o'&gt;:&lt;/span&gt;
    &lt;span class='kr'&gt;inline&lt;/span&gt; &lt;span class='n'&gt;TaggedPointer&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;T&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;pointer&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='kt'&gt;int&lt;/span&gt; &lt;span class='n'&gt;tag&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='n'&gt;set&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;pointer&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;tag&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='kr'&gt;inline&lt;/span&gt; &lt;span class='kt'&gt;void&lt;/span&gt; &lt;span class='n'&gt;set&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;T&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;pointer&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='kt'&gt;int&lt;/span&gt; &lt;span class='n'&gt;tag&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='c1'&gt;// make sure that the pointer really is aligned&lt;/span&gt;
        &lt;span class='n'&gt;assert&lt;/span&gt;&lt;span class='p'&gt;((&lt;/span&gt;&lt;span class='k'&gt;reinterpret_cast&lt;/span&gt;&lt;span class='o'&gt;&amp;lt;&lt;/span&gt;&lt;span class='n'&gt;intptr_t&lt;/span&gt;&lt;span class='o'&gt;&amp;gt;&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;pointer&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&lt;/span&gt; &lt;span class='n'&gt;tagMask&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
        &lt;span class='c1'&gt;// make sure that the tag isn&amp;#39;t too large&lt;/span&gt;
        &lt;span class='n'&gt;assert&lt;/span&gt;&lt;span class='p'&gt;((&lt;/span&gt;&lt;span class='n'&gt;tag&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&lt;/span&gt; &lt;span class='n'&gt;pointerMask&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;

        &lt;span class='n'&gt;asPointer&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;pointer&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='n'&gt;asBits&lt;/span&gt; &lt;span class='o'&gt;|=&lt;/span&gt; &lt;span class='n'&gt;tag&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='kr'&gt;inline&lt;/span&gt; &lt;span class='n'&gt;T&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;getPointer&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='k'&gt;reinterpret_cast&lt;/span&gt;&lt;span class='o'&gt;&amp;lt;&lt;/span&gt;&lt;span class='n'&gt;T&lt;/span&gt; &lt;span class='o'&gt;*&amp;gt;&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;asBits&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&lt;/span&gt; &lt;span class='n'&gt;pointerMask&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='kr'&gt;inline&lt;/span&gt; &lt;span class='kt'&gt;int&lt;/span&gt; &lt;span class='n'&gt;getTag&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='n'&gt;asBits&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&lt;/span&gt; &lt;span class='n'&gt;tagMask&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The code is fairly straightforward I think. You could then use it like this:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='cpp'&gt;&lt;span class='kt'&gt;double&lt;/span&gt; &lt;span class='n'&gt;number&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mf'&gt;17.0&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='n'&gt;TaggedPointer&lt;/span&gt;&lt;span class='o'&gt;&amp;lt;&lt;/span&gt;&lt;span class='kt'&gt;double&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;8&lt;/span&gt;&lt;span class='o'&gt;&amp;gt;&lt;/span&gt; &lt;span class='n'&gt;taggedPointer&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;number&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;5&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='n'&gt;taggedPointer&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;getPointer&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt; &lt;span class='c1'&gt;// == &amp;amp;number&lt;/span&gt;
&lt;span class='n'&gt;taggedPointer&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;getTag&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt; &lt;span class='c1'&gt;// == 5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The technique described above is useful in many contexts. Basically it&amp;#8217;s applicable to any situation where you want to add a small bit of info to an aligned pointer. And this is also the situation we are in :)&lt;/p&gt;

&lt;p&gt;But before we get to that let&amp;#8217;s first have a look at another, very similar trick:&lt;/p&gt;

&lt;h2 id='storing_integers_in_the_pointer'&gt;Storing integers in the pointer&lt;/h2&gt;

&lt;p&gt;JavaScript does not have an integer type per se, it only has IEEE 754 doubles. Still people mainly work with small integral numbers (think of loops, array indexes, etc). Thus it makes sense to store those small integers in a real integer variable instead of a double, because many operations are much faster this way. That&amp;#8217;s also the reason why in the &lt;code&gt;Value&lt;/code&gt; class above I have a distinct int32 type and not just a double type.&lt;/p&gt;

&lt;p&gt;Especially for integers the above &lt;code&gt;Value&lt;/code&gt; approach seems like overkill. One has to allocate 16 bytes of memory (+ overhead) and has to maintain a pointer, just to store 4 bytes of data. That&amp;#8217;s inefficient both in terms of memory and performance.&lt;/p&gt;

&lt;p&gt;The solution obviously is to use something similar to the tagged pointers introduced above. You can say: &amp;#8220;If the last bit of the pointer is 1 then it actually isn&amp;#8217;t a pointer at all, but it&amp;#8217;s an integer.&amp;#8221;&lt;/p&gt;

&lt;p&gt;Here is how a pointer would look:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pppppppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppTT0
^- actual pointer                                     two tag bits -^ ^- last bit 0&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And this is what the integer would look like (note that as we need one bit for distinguishing the type the integer has only 63 bits):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;iiiiiiii|iiiiiiii|iiiiiiii|iiiiiiii|iiiiiiii|iiiiiiii|iiiiiiii|iiiiiii1
^- actual integer                                                     ^- last bit 1&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;To get the actual integer value we just need to shift off the 1 bit using &lt;code&gt;integer &amp;gt;&amp;gt; 1&lt;/code&gt;.&lt;/p&gt;
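&lt;p&gt;As a worked example: 5 is 0b101, so it is stored as 0b1011 = 11, and 11 &amp;gt;&amp;gt; 1 gives 5 again. A minimal sketch of just this encoding step follows. The function names are hypothetical, and the decoding assumes that &lt;code&gt;&amp;gt;&amp;gt;&lt;/code&gt; on a negative signed value is an arithmetic shift, which holds on mainstream compilers and is guaranteed since C++20.&lt;/p&gt;

```cpp
#include <cstdint>

// Encode: shift the integer left by one and set the lowest bit to 1.
// The shift is done on the unsigned type so that negative inputs are
// handled without relying on a signed left shift.
inline std::intptr_t encodeSmallInt(std::intptr_t number) {
    return static_cast<std::intptr_t>(
        (static_cast<std::uintptr_t>(number) << 1) | 1u);
}

// Decode: shift the marker bit off again.
// Assumption: >> on a negative signed value shifts arithmetically.
inline std::intptr_t decodeSmallInt(std::intptr_t bits) {
    return bits >> 1;
}

// The lowest bit distinguishes integers (1) from pointers (0).
inline bool looksLikeInt(std::intptr_t bits) {
    return (bits & 1) == 1;
}
```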

&lt;p&gt;Here again is a sample implementation, this time of a class that holds either a tagged pointer or an integer:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='cpp'&gt;&lt;span class='cp'&gt;#include &amp;lt;cassert&amp;gt;&lt;/span&gt;
&lt;span class='cp'&gt;#include &amp;lt;stdint.h&amp;gt;&lt;/span&gt;

&lt;span class='k'&gt;template&lt;/span&gt; &lt;span class='o'&gt;&amp;lt;&lt;/span&gt;&lt;span class='k'&gt;typename&lt;/span&gt; &lt;span class='n'&gt;T&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='kt'&gt;int&lt;/span&gt; &lt;span class='n'&gt;alignedTo&lt;/span&gt;&lt;span class='o'&gt;&amp;gt;&lt;/span&gt;
&lt;span class='k'&gt;class&lt;/span&gt; &lt;span class='nc'&gt;TaggedPointerOrInt&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
&lt;span class='k'&gt;private&lt;/span&gt;&lt;span class='o'&gt;:&lt;/span&gt;
    &lt;span class='n'&gt;static_assert&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;
        &lt;span class='n'&gt;alignedTo&lt;/span&gt; &lt;span class='o'&gt;!=&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class='p'&gt;((&lt;/span&gt;&lt;span class='n'&gt;alignedTo&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;alignedTo&lt;/span&gt; &lt;span class='o'&gt;-&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;),&lt;/span&gt;
        &lt;span class='s'&gt;&amp;quot;Alignment parameter must be power of two&amp;quot;&lt;/span&gt;
    &lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='n'&gt;static_assert&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;
        &lt;span class='n'&gt;alignedTo&lt;/span&gt; &lt;span class='o'&gt;&amp;gt;&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
        &lt;span class='s'&gt;&amp;quot;Pointer must be at least 2-byte aligned in order to store an int&amp;quot;&lt;/span&gt;
    &lt;span class='p'&gt;);&lt;/span&gt;

    &lt;span class='c1'&gt;// for 8 byte alignment tagMask = alignedTo - 1 = 8 - 1 = 7 = 0b111&lt;/span&gt;
    &lt;span class='c1'&gt;// i.e. the lowest three bits are set, which is where the tag is stored&lt;/span&gt;
    &lt;span class='k'&gt;static&lt;/span&gt; &lt;span class='k'&gt;const&lt;/span&gt; &lt;span class='n'&gt;intptr_t&lt;/span&gt; &lt;span class='n'&gt;tagMask&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;alignedTo&lt;/span&gt; &lt;span class='o'&gt;-&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

    &lt;span class='c1'&gt;// pointerMask is the exact contrary: 0b...11111000&lt;/span&gt;
    &lt;span class='c1'&gt;// i.e. all bits apart from the three lowest are set, which is where the pointer is stored&lt;/span&gt;
    &lt;span class='k'&gt;static&lt;/span&gt; &lt;span class='k'&gt;const&lt;/span&gt; &lt;span class='n'&gt;intptr_t&lt;/span&gt; &lt;span class='n'&gt;pointerMask&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='o'&gt;~&lt;/span&gt;&lt;span class='n'&gt;tagMask&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

    &lt;span class='c1'&gt;// save us some reinterpret_casts with a union&lt;/span&gt;
    &lt;span class='k'&gt;union&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='n'&gt;T&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;asPointer&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='n'&gt;intptr_t&lt;/span&gt; &lt;span class='n'&gt;asBits&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;};&lt;/span&gt;

&lt;span class='k'&gt;public&lt;/span&gt;&lt;span class='o'&gt;:&lt;/span&gt;
    &lt;span class='kr'&gt;inline&lt;/span&gt; &lt;span class='n'&gt;TaggedPointerOrInt&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;T&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;pointer&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='kt'&gt;int&lt;/span&gt; &lt;span class='n'&gt;tag&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='n'&gt;setPointer&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;pointer&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;tag&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='kr'&gt;inline&lt;/span&gt; &lt;span class='n'&gt;TaggedPointerOrInt&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;intptr_t&lt;/span&gt; &lt;span class='n'&gt;number&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='n'&gt;setInt&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;number&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='kr'&gt;inline&lt;/span&gt; &lt;span class='kt'&gt;void&lt;/span&gt; &lt;span class='n'&gt;setPointer&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;T&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;pointer&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='kt'&gt;int&lt;/span&gt; &lt;span class='n'&gt;tag&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='c1'&gt;// make sure that the pointer really is aligned&lt;/span&gt;
        &lt;span class='n'&gt;assert&lt;/span&gt;&lt;span class='p'&gt;((&lt;/span&gt;&lt;span class='k'&gt;reinterpret_cast&lt;/span&gt;&lt;span class='o'&gt;&amp;lt;&lt;/span&gt;&lt;span class='n'&gt;intptr_t&lt;/span&gt;&lt;span class='o'&gt;&amp;gt;&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;pointer&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&lt;/span&gt; &lt;span class='n'&gt;tagMask&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
        &lt;span class='c1'&gt;// make sure that the tag isn&amp;#39;t too large&lt;/span&gt;
        &lt;span class='n'&gt;assert&lt;/span&gt;&lt;span class='p'&gt;(((&lt;/span&gt;&lt;span class='n'&gt;tag&lt;/span&gt; &lt;span class='o'&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&lt;/span&gt; &lt;span class='n'&gt;pointerMask&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;

        &lt;span class='c1'&gt;// last bit isn&amp;#39;t part of tag anymore, but just zero, thus the &amp;lt;&amp;lt; 1&lt;/span&gt;
        &lt;span class='n'&gt;asPointer&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;pointer&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='n'&gt;asBits&lt;/span&gt; &lt;span class='o'&gt;|=&lt;/span&gt; &lt;span class='n'&gt;tag&lt;/span&gt; &lt;span class='o'&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='kr'&gt;inline&lt;/span&gt; &lt;span class='kt'&gt;void&lt;/span&gt; &lt;span class='n'&gt;setInt&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;intptr_t&lt;/span&gt; &lt;span class='n'&gt;number&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='c1'&gt;// make sure that when we &amp;lt;&amp;lt; 1 there will be no data loss&lt;/span&gt;
        &lt;span class='c1'&gt;// i.e. make sure that it&amp;#39;s a 31 bit / 63 bit integer&lt;/span&gt;
        &lt;span class='n'&gt;assert&lt;/span&gt;&lt;span class='p'&gt;(((&lt;/span&gt;&lt;span class='n'&gt;number&lt;/span&gt; &lt;span class='o'&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='n'&gt;number&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;

        &lt;span class='c1'&gt;// shift the number to the left and set lowest bit to 1&lt;/span&gt;
        &lt;span class='n'&gt;asBits&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;number&lt;/span&gt; &lt;span class='o'&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;|&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='kr'&gt;inline&lt;/span&gt; &lt;span class='n'&gt;T&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;getPointer&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='n'&gt;assert&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;isPointer&lt;/span&gt;&lt;span class='p'&gt;());&lt;/span&gt;

        &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='k'&gt;reinterpret_cast&lt;/span&gt;&lt;span class='o'&gt;&amp;lt;&lt;/span&gt;&lt;span class='n'&gt;T&lt;/span&gt; &lt;span class='o'&gt;*&amp;gt;&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;asBits&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&lt;/span&gt; &lt;span class='n'&gt;pointerMask&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='kr'&gt;inline&lt;/span&gt; &lt;span class='kt'&gt;int&lt;/span&gt; &lt;span class='n'&gt;getTag&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='n'&gt;assert&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;isPointer&lt;/span&gt;&lt;span class='p'&gt;());&lt;/span&gt;

        &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;asBits&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&lt;/span&gt; &lt;span class='n'&gt;tagMask&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='kr'&gt;inline&lt;/span&gt; &lt;span class='n'&gt;intptr_t&lt;/span&gt; &lt;span class='n'&gt;getInt&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='n'&gt;assert&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;isInt&lt;/span&gt;&lt;span class='p'&gt;());&lt;/span&gt;

        &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='n'&gt;asBits&lt;/span&gt; &lt;span class='o'&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='kr'&gt;inline&lt;/span&gt; &lt;span class='kt'&gt;bool&lt;/span&gt; &lt;span class='n'&gt;isPointer&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;asBits&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='kr'&gt;inline&lt;/span&gt; &lt;span class='kt'&gt;bool&lt;/span&gt; &lt;span class='n'&gt;isInt&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;asBits&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Usage example:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='cpp'&gt;&lt;span class='c1'&gt;// either a pointer to a double (with a two bit tag) or a 63 bit integer&lt;/span&gt;
&lt;span class='kt'&gt;double&lt;/span&gt; &lt;span class='n'&gt;number&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mf'&gt;17.0&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='n'&gt;TaggedPointerOrInt&lt;/span&gt;&lt;span class='o'&gt;&amp;lt;&lt;/span&gt;&lt;span class='kt'&gt;double&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;8&lt;/span&gt;&lt;span class='o'&gt;&amp;gt;&lt;/span&gt; &lt;span class='n'&gt;taggedPointerOrInt&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;number&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;3&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='n'&gt;taggedPointerOrInt&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;isPointer&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt; &lt;span class='c1'&gt;// == true&lt;/span&gt;
&lt;span class='n'&gt;taggedPointerOrInt&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;getPointer&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt; &lt;span class='c1'&gt;// == &amp;amp;number&lt;/span&gt;
&lt;span class='n'&gt;taggedPointerOrInt&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;getTag&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt; &lt;span class='c1'&gt;// == 3&lt;/span&gt;

&lt;span class='n'&gt;taggedPointerOrInt&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;setInt&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;123456789&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='n'&gt;taggedPointerOrInt&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;isInt&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt; &lt;span class='c1'&gt;// == true&lt;/span&gt;
&lt;span class='n'&gt;taggedPointerOrInt&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;getInt&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt; &lt;span class='c1'&gt;// == 123456789&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Using the above technique we now have a powerful way to optimize the original Value class. Integers will be stored directly in the pointer; doubles, strings and objects will be stored as a pointer with a tag identifying their type (e.g. 0 = 0b00 = object, 1 = 0b01 = string, 2 = 0b10 = double). Booleans could be stored similarly to ints (and fetched using a simple &lt;code&gt;bool &amp;gt;&amp;gt; 3&lt;/code&gt;):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;00000000|00000000|00000000|00000000|00000000|00000000|00000000|0000b110
                                                actual bool value -^  ^- last bit 0 (as it isn&amp;#39;t an int)
                       tag = 3 = 0b11 to identify that it&amp;#39;s a bool -^^&lt;/code&gt;&lt;/pre&gt;
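&lt;p&gt;The bool layout from the diagram can be sketched in a few lines. The tag value 3 is the one used in the diagram; the function and constant names are hypothetical and only serve the illustration.&lt;/p&gt;

```cpp
#include <cstdint>

// Tag 3 = 0b11 marks a bool (tag value taken from the diagram above).
const std::intptr_t boolTag = 3;
// The tag sits in bits 1-2; bit 0 stays 0 because the value isn't an int.
const std::intptr_t boolMarker = boolTag << 1; // 0b110

// Stores the bool in bit 3, right above the marker bits.
inline std::intptr_t encodeBool(bool value) {
    return (static_cast<std::intptr_t>(value) << 3) | boolMarker;
}

// A word holds a bool if its lowest three bits are exactly 0b110.
inline bool isEncodedBool(std::intptr_t bits) {
    return (bits & 7) == boolMarker;
}

// Shifts the three marker bits off to get the actual value.
inline bool decodeBool(std::intptr_t bits) {
    return (bits >> 3) != 0;
}
```

&lt;p&gt;With this layout &lt;code&gt;true&lt;/code&gt; becomes 0b1110 = 14 and &lt;code&gt;false&lt;/code&gt; becomes 0b0110 = 6.&lt;/p&gt;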

&lt;p&gt;So what have we gained overall? Quite a bit! Integers (and bools) aren&amp;#8217;t allocated on the heap anymore. That saves memory and probably more importantly improves performance (because the code doesn&amp;#8217;t have to access the heap). Also the pointers to strings and objects can now be dereferenced directly, without first fetching a value (which stores a pointer to the real string/object).&lt;/p&gt;

&lt;p&gt;But we aren&amp;#8217;t done yet: The double still requires a pointer; wouldn&amp;#8217;t it be nice to get rid of that too?&lt;/p&gt;

&lt;h2 id='pointers_in_the_nanspace'&gt;Pointers in the NaN-Space&lt;/h2&gt;

&lt;h3 id='recap_how_do_doubles_look_like'&gt;Recap: What do doubles look like?&lt;/h3&gt;

&lt;p&gt;Doubles in JS follow the IEEE 754 Standard for Floating-Point Arithmetic. In C++ doubles aren&amp;#8217;t specified to follow any standard, but we assume that they use IEEE 754, too (which, practically speaking, is a pretty good assumption. In general this whole article is nothing more than a large set of evil assumptions that any self-respecting C++ developer would murder you for.)&lt;/p&gt;

&lt;p&gt;As you probably know IEEE 754 doubles are internally represented using the following bit sequence:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;seeeeeee|eeeemmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm
 ^- exponent ^- mantissa
^- sign bit&lt;/code&gt;&lt;/pre&gt;

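&lt;p&gt;The resulting float is basically (-1)^s * m * 2^e. That is a simplification: for normal numbers the mantissa has an implicit leading 1 and the exponent is stored with a bias of 1023, so the value is really (-1)^s * 1.m * 2^(e - 1023).&lt;/p&gt;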
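&lt;p&gt;To make the layout more tangible, here is a sketch that splits a double into its three fields and puts it back together. It assumes that &lt;code&gt;double&lt;/code&gt; is a 64 bit IEEE 754 value, and it only handles normal numbers (implicit leading 1, exponent bias of 1023). &lt;code&gt;DoubleParts&lt;/code&gt;, &lt;code&gt;splitDouble&lt;/code&gt; and &lt;code&gt;joinNormalDouble&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "This sketch assumes 64 bit doubles");

struct DoubleParts {
    std::uint64_t sign;     // 1 bit
    std::uint64_t exponent; // 11 bits, stored with a bias of 1023
    std::uint64_t mantissa; // 52 bits, the fraction after the implicit 1
};

// Copies the bits out of the double and cuts them into the three fields.
inline DoubleParts splitDouble(double number) {
    std::uint64_t bits;
    std::memcpy(&bits, &number, sizeof bits);

    DoubleParts parts;
    parts.sign = bits >> 63;
    parts.exponent = (bits >> 52) & 0x7FF;
    parts.mantissa = bits & ((std::uint64_t(1) << 52) - 1);
    return parts;
}

// Computes (-1)^s * 1.m * 2^(e - 1023).
// Only valid for normal numbers (exponent neither 0 nor 0x7FF).
inline double joinNormalDouble(const DoubleParts &parts) {
    double fraction = 1.0 + std::ldexp(static_cast<double>(parts.mantissa), -52);
    double magnitude = std::ldexp(fraction, static_cast<int>(parts.exponent) - 1023);
    return parts.sign ? -magnitude : magnitude;
}
```

&lt;p&gt;For example -2.5 = -1.25 * 2^1 splits into sign 1, exponent 1024 and a mantissa of 0.25 * 2^52 = 2^50.&lt;/p&gt;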

&lt;p&gt;What is much more important to us are some special values that doubles can have:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Positive zero (0.0):
seeeeeee|eeeemmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm
00000000|00000000|00000000|00000000|00000000|00000000|00000000|00000000
^- sign bit 0 (= +), all other bits also 0

Negative zero (-0.0):
seeeeeee|eeeemmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm
10000000|00000000|00000000|00000000|00000000|00000000|00000000|00000000
^- sign bit 1 (= -), all other 0&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;IEEE defines two zeros: +0 and -0. Both have exponent and mantissa zeroed out and are distinguished by the value of the sign bit. This may seem pointless at first - zero is zero after all - but it allows for some subtle distinctions. For example 1.0 / 0.0 is +INF whereas 1.0 / -0.0 is -INF. There are also other operations where the sign of the zero makes a difference.&lt;/p&gt;

&lt;p&gt;One interesting behavior of the zeros is that they are considered equal in comparison. I.e. 0.0 == -0.0. So a comparison can&amp;#8217;t tell them apart; to find out whether a number is negative zero you have to check whether the sign bit is set:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='cpp'&gt;&lt;span class='kr'&gt;inline&lt;/span&gt; &lt;span class='kt'&gt;bool&lt;/span&gt; &lt;span class='n'&gt;isNegativeZero&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='kt'&gt;double&lt;/span&gt; &lt;span class='n'&gt;number&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='n'&gt;number&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='k'&gt;reinterpret_cast&lt;/span&gt;&lt;span class='o'&gt;&amp;lt;&lt;/span&gt;&lt;span class='n'&gt;int64_t&lt;/span&gt; &lt;span class='o'&gt;*&amp;gt;&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;number&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;!=&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
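&lt;p&gt;As a side note, the pointer cast above technically runs afoul of C++&amp;#8217;s aliasing rules. Here is a minimal sketch of two alternatives, assuming an IEEE 754 &lt;code&gt;double&lt;/code&gt;; the helper names &lt;code&gt;bitsOf&lt;/code&gt;, &lt;code&gt;isNegativeZeroBits&lt;/code&gt; and &lt;code&gt;isNegativeZeroStd&lt;/code&gt; are made up for this illustration:&lt;/p&gt;
<![CDATA[
```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Copy the raw bits of a double into an integer.
// memcpy sidesteps the aliasing problems of a pointer cast.
inline uint64_t bitsOf(double number) {
    uint64_t bits;
    std::memcpy(&bits, &number, sizeof bits);
    return bits;
}

// Negative zero is exactly the bit pattern with only the sign bit set.
inline bool isNegativeZeroBits(double number) {
    return bitsOf(number) == 0x8000000000000000ULL;
}

// Same check via the standard library: equal to zero and sign bit set.
inline bool isNegativeZeroStd(double number) {
    return number == 0.0 && std::signbit(number);
}
```
]]>
&lt;p&gt;Both helpers return true for &lt;code&gt;-0.0&lt;/code&gt; and false for &lt;code&gt;0.0&lt;/code&gt;, even though the two values compare equal.&lt;/p&gt;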
&lt;pre&gt;&lt;code&gt;Positive infinity (+INF):
seeeeeee|eeeemmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm
01111111|11110000|00000000|00000000|00000000|00000000|00000000|00000000
 ^- exponent bits all 1                          mantissa bits all 0 -^
^- sign bit 0 (= +)

Negative infinity (-INF):
seeeeeee|eeeemmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm
11111111|11110000|00000000|00000000|00000000|00000000|00000000|00000000
 ^- exponent bits all 1                          mantissa bits all 0 -^
^- sign bit 1 (= -)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Infinity has all exponent bits set and a zero mantissa. Again there is +INF and -INF, distinguished by the sign bit.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Signaling NaN:
seeeeeee|eeeemmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm
s1111111|11110ppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppppp
             ^- first mantissa bit 0    everything else is &amp;quot;payload&amp;quot; -^
 ^- exponent bits all 1
^- any sign bit

Quiet NaN:
seeeeeee|eeeemmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm
s1111111|11111ppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppppp
             ^- first mantissa bit 1    everything else is &amp;quot;payload&amp;quot; -^
 ^- exponent bits all 1                 and mustn&amp;#39;t be all-zero (as it
^- any sign bit                         would be INF then)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;NaNs represent values that are Not a Number, e.g. 0.0/0.0 = NaN. NaNs also have interesting comparison semantics: any ordered or equality comparison involving a NaN is false (including NaN == NaN), and only != evaluates to true.&lt;/p&gt;

&lt;p&gt;The representations of NaNs also have all exponent bits set (like infinity) but have a non-zero mantissa. The sign bit is irrelevant for NaNs.&lt;/p&gt;

&lt;p&gt;There are two types of NaNs: Signaling NaNs (sNaN) and quiet NaNs (qNaN). They are distinguished by the first bit after the exponent: If it is 1 then the NaN is quiet; if it is 0 it is signaling. Signaling NaNs - in theory - should raise an invalid operation exception (EM_INVALID) when they are operated upon, whereas quiet NaNs should just propagate silently. In practice this doesn&amp;#8217;t seem to be used much: at least in MSVC it is disabled by default and needs to be enabled by a compiler option plus a &lt;code&gt;__controlfp()&lt;/code&gt; call.&lt;/p&gt;
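&lt;p&gt;The bit layouts shown so far can be turned into a small classifier. This is only a sketch: the &lt;code&gt;Kind&lt;/code&gt; enum and the &lt;code&gt;classify&lt;/code&gt; function are hypothetical names, and the function works on the raw 64 bit pattern rather than on a &lt;code&gt;double&lt;/code&gt; so that a signaling NaN never has to pass through a floating point register:&lt;/p&gt;
<![CDATA[
```cpp
#include <cstdint>

enum class Kind { Number, Infinity, QuietNaN, SignalingNaN };

// Classify a raw IEEE 754 double bit pattern.
inline Kind classify(uint64_t bits) {
    const uint64_t ExponentMask = 0x7ff0000000000000ULL; // 11 exponent bits
    const uint64_t MantissaMask = 0x000fffffffffffffULL; // 52 mantissa bits
    const uint64_t QuietBit     = 0x0008000000000000ULL; // first mantissa bit

    if ((bits & ExponentMask) != ExponentMask) {
        return Kind::Number; // zero, subnormal or normal number
    }
    if ((bits & MantissaMask) == 0) {
        return Kind::Infinity; // all exponent bits set, mantissa zero
    }
    return (bits & QuietBit) ? Kind::QuietNaN : Kind::SignalingNaN;
}
```
]]>
&lt;p&gt;The sign bit is never looked at, which matches the diagrams: it distinguishes +INF from -INF but is irrelevant for deciding between the four kinds.&lt;/p&gt;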

&lt;p&gt;Where it starts to get interesting for us is that NaNs additionally encode a 51 bit &amp;#8220;payload&amp;#8221; in the mantissa. This payload was originally designed to contain error information.&lt;/p&gt;

&lt;p&gt;But deceitful as we are we will misuse that NaN payload to stuff other things in it - like integers and pointers.&lt;/p&gt;
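&lt;p&gt;To make the payload idea concrete, here is a minimal sketch that packs an integer into a quiet NaN and reads it back. The names &lt;code&gt;makeQuietNaN&lt;/code&gt; and &lt;code&gt;payloadOf&lt;/code&gt; are invented for this example, and it assumes that the platform copies quiet NaNs around without touching their payload (which is what common hardware does as long as no arithmetic is performed on them):&lt;/p&gt;
<![CDATA[
```cpp
#include <cstdint>
#include <cstring>

const uint64_t ExponentMask = 0x7ff0000000000000ULL; // 11 exponent bits
const uint64_t QuietBit     = 0x0008000000000000ULL; // first mantissa bit
const uint64_t PayloadMask  = 0x0007ffffffffffffULL; // remaining 51 bits

// Build a quiet NaN carrying the given payload (truncated to 51 bits).
inline double makeQuietNaN(uint64_t payload) {
    uint64_t bits = ExponentMask | QuietBit | (payload & PayloadMask);
    double result;
    std::memcpy(&result, &bits, sizeof result);
    return result;
}

// Read the 51 payload bits back out of a double.
inline uint64_t payloadOf(double number) {
    uint64_t bits;
    std::memcpy(&bits, &number, sizeof bits);
    return bits & PayloadMask;
}
```
]]>
&lt;p&gt;A value created with &lt;code&gt;makeQuietNaN(0x1234)&lt;/code&gt; still behaves like any other NaN (it is not even equal to itself), but &lt;code&gt;payloadOf&lt;/code&gt; recovers the &lt;code&gt;0x1234&lt;/code&gt; from it.&lt;/p&gt;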

&lt;h3 id='64_bit_is_a_lie'&gt;64 bit is a lie&lt;/h3&gt;

&lt;p&gt;On 64 bit architectures pointers have a size of 64 bits, obviously. But think about that again: How much is 64 bit actually? That&amp;#8217;s 2^64 = 1.84467441 * 10^19 addressable bytes of memory. That&amp;#8217;s 16 EiB (exbibytes). Or, in more convenient terms, 17179869184 GiB. I don&amp;#8217;t know about you but I only have 4 GB of memory. I&amp;#8217;ve heard of exotic server setups that have around 800 GB of memory. But even those abominations costing hundreds of thousands of dollars would still only need a 40 bit pointer space (2^40 bytes = 1024 GiB).&lt;/p&gt;

&lt;p&gt;Unsurprisingly we aren&amp;#8217;t the first to notice this: the x86-64 architecture utilizes only the lower 48 bits (which still allows 256 TiB) of a pointer. Additionally bits 63 through 48 must be copies of bit 47. Pointers that follow this pattern are called canonical.&lt;/p&gt;

&lt;p&gt;This leads to a strange looking address space (you can find a nicer version of this image on &lt;a href='http://en.wikipedia.org/wiki/X86-64#Virtual_address_space_details'&gt;Wikipedia&lt;/a&gt;):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;0xFFFFFFFF.FFFFFFFF
         ...        &amp;lt;- Canonical upper &amp;quot;half&amp;quot;
0xFFFF8000.00000000
         ...
         ...        &amp;lt;- Noncanonical
         ...
0x00007FFF.FFFFFFFF
         ...        &amp;lt;- Canonical lower &amp;quot;half&amp;quot;
0x00000000.00000000&lt;/code&gt;&lt;/pre&gt;
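&lt;p&gt;The canonical-form rule is easy to express in code. The following sketch uses two hypothetical helpers, &lt;code&gt;isCanonical&lt;/code&gt; and &lt;code&gt;isLowerHalf&lt;/code&gt;, and works on plain 64 bit integers so that it does not depend on the pointer size of the machine it runs on:&lt;/p&gt;
<![CDATA[
```cpp
#include <cstdint>

// An address is canonical if bits 63 through 47 are all equal,
// i.e. the upper 17 bits are either all 0 or all 1.
inline bool isCanonical(uint64_t address) {
    const uint64_t upper = address >> 47;
    return upper == 0 || upper == 0x1ffff;
}

// Lower-half addresses have bits 63 through 47 cleared,
// so the address itself fits into 47 bits.
inline bool isLowerHalf(uint64_t address) {
    return (address >> 47) == 0;
}
```
]]>
&lt;p&gt;The boundaries from the diagram behave as expected: &lt;code&gt;0x00007FFFFFFFFFFF&lt;/code&gt; and &lt;code&gt;0xFFFF800000000000&lt;/code&gt; are canonical, whereas &lt;code&gt;0x0000800000000000&lt;/code&gt; is not.&lt;/p&gt;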

&lt;p&gt;The operating system normally assigns only pointers from the lower half to applications, reserving the upper half for itself. As always there are outliers - e.g. &lt;a href='https://bugzilla.mozilla.org/show_bug.cgi?id=577056'&gt;Solaris assigns addresses from the upper half&lt;/a&gt; under some circumstances (mmap). But we&amp;#8217;ll just ignore that and assume that only the lower half from 0x00000000.00000000 to 0x00007FFF.FFFFFFFF is used. (Again, all these evil assumptions&amp;#8230;)&lt;/p&gt;

&lt;p&gt;At this point you should see what I am driving at: If pointers are really only 48 bits large they fit perfectly into our 51 bit payload space.&lt;/p&gt;

&lt;h3 id='sample_implementation'&gt;Sample implementation&lt;/h3&gt;

&lt;p&gt;Here is an implementation stub for doing so (just implements doubles, ints and void pointers; a real implementation would look similar, just with more types):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='cpp'&gt;&lt;span class='cp'&gt;#include &amp;lt;stdint.h&amp;gt;&lt;/span&gt;
&lt;span class='cp'&gt;#include &amp;lt;cassert&amp;gt;&lt;/span&gt;

&lt;span class='kr'&gt;inline&lt;/span&gt; &lt;span class='kt'&gt;bool&lt;/span&gt; &lt;span class='n'&gt;isNegativeZero&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='kt'&gt;double&lt;/span&gt; &lt;span class='n'&gt;number&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='n'&gt;number&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='k'&gt;reinterpret_cast&lt;/span&gt;&lt;span class='o'&gt;&amp;lt;&lt;/span&gt;&lt;span class='n'&gt;int64_t&lt;/span&gt; &lt;span class='o'&gt;*&amp;gt;&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;number&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;!=&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='k'&gt;class&lt;/span&gt; &lt;span class='nc'&gt;Value&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
&lt;span class='k'&gt;private&lt;/span&gt;&lt;span class='o'&gt;:&lt;/span&gt;
    &lt;span class='k'&gt;union&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='kt'&gt;double&lt;/span&gt; &lt;span class='n'&gt;asDouble&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='n'&gt;uint64_t&lt;/span&gt; &lt;span class='n'&gt;asBits&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;};&lt;/span&gt;

    &lt;span class='k'&gt;static&lt;/span&gt; &lt;span class='k'&gt;const&lt;/span&gt; &lt;span class='n'&gt;uint64_t&lt;/span&gt; &lt;span class='n'&gt;MaxDouble&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mh'&gt;0xfff8000000000000&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='k'&gt;static&lt;/span&gt; &lt;span class='k'&gt;const&lt;/span&gt; &lt;span class='n'&gt;uint64_t&lt;/span&gt; &lt;span class='n'&gt;Int32Tag&lt;/span&gt;  &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mh'&gt;0xfff9000000000000&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='k'&gt;static&lt;/span&gt; &lt;span class='k'&gt;const&lt;/span&gt; &lt;span class='n'&gt;uint64_t&lt;/span&gt; &lt;span class='n'&gt;PtrTag&lt;/span&gt;    &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mh'&gt;0xfffa000000000000&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='k'&gt;public&lt;/span&gt;&lt;span class='o'&gt;:&lt;/span&gt;
    &lt;span class='kr'&gt;inline&lt;/span&gt; &lt;span class='n'&gt;Value&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;const&lt;/span&gt; &lt;span class='kt'&gt;double&lt;/span&gt; &lt;span class='n'&gt;number&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='n'&gt;int32_t&lt;/span&gt; &lt;span class='n'&gt;asInt32&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;static_cast&lt;/span&gt;&lt;span class='o'&gt;&amp;lt;&lt;/span&gt;&lt;span class='n'&gt;int32_t&lt;/span&gt;&lt;span class='o'&gt;&amp;gt;&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;number&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;

        &lt;span class='c1'&gt;// if the double can be losslessly stored as an int32 do so&lt;/span&gt;
        &lt;span class='c1'&gt;// (int32 doesn&amp;#39;t have -0, so check for that too)&lt;/span&gt;
        &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;number&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='n'&gt;asInt32&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class='o'&gt;!&lt;/span&gt;&lt;span class='n'&gt;isNegativeZero&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;number&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
            &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='k'&gt;this&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;Value&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;asInt32&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
            &lt;span class='k'&gt;return&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='p'&gt;}&lt;/span&gt;

        &lt;span class='n'&gt;asDouble&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;number&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='kr'&gt;inline&lt;/span&gt; &lt;span class='n'&gt;Value&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;const&lt;/span&gt; &lt;span class='n'&gt;int32_t&lt;/span&gt; &lt;span class='n'&gt;number&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
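        &lt;span class='c1'&gt;// go through uint32_t so that negative values aren&amp;#39;t sign-extended over the tag bits&lt;/span&gt;
        &lt;span class='n'&gt;asBits&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;static_cast&lt;/span&gt;&lt;span class='o'&gt;&amp;lt;&lt;/span&gt;&lt;span class='n'&gt;uint32_t&lt;/span&gt;&lt;span class='o'&gt;&amp;gt;&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;number&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;|&lt;/span&gt; &lt;span class='n'&gt;Int32Tag&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;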
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='kr'&gt;inline&lt;/span&gt; &lt;span class='n'&gt;Value&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='kt'&gt;void&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;pointer&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='c1'&gt;// ensure that the pointer really is only 48 bit (upper 16 bits all zero)&lt;/span&gt;
        &lt;span class='n'&gt;assert&lt;/span&gt;&lt;span class='p'&gt;((&lt;/span&gt;&lt;span class='k'&gt;reinterpret_cast&lt;/span&gt;&lt;span class='o'&gt;&amp;lt;&lt;/span&gt;&lt;span class='n'&gt;uint64_t&lt;/span&gt;&lt;span class='o'&gt;&amp;gt;&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;pointer&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class='mi'&gt;48&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;

        &lt;span class='n'&gt;asBits&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;reinterpret_cast&lt;/span&gt;&lt;span class='o'&gt;&amp;lt;&lt;/span&gt;&lt;span class='n'&gt;uint64_t&lt;/span&gt;&lt;span class='o'&gt;&amp;gt;&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;pointer&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;|&lt;/span&gt; &lt;span class='n'&gt;PtrTag&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='kr'&gt;inline&lt;/span&gt; &lt;span class='kt'&gt;bool&lt;/span&gt; &lt;span class='n'&gt;isDouble&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='n'&gt;asBits&lt;/span&gt; &lt;span class='o'&gt;&amp;lt;&lt;/span&gt; &lt;span class='n'&gt;MaxDouble&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='kr'&gt;inline&lt;/span&gt; &lt;span class='kt'&gt;bool&lt;/span&gt; &lt;span class='n'&gt;isInt32&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;asBits&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&lt;/span&gt; &lt;span class='n'&gt;Int32Tag&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='n'&gt;Int32Tag&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='kr'&gt;inline&lt;/span&gt; &lt;span class='kt'&gt;bool&lt;/span&gt; &lt;span class='n'&gt;isPointer&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;asBits&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&lt;/span&gt; &lt;span class='n'&gt;PtrTag&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='n'&gt;PtrTag&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='kr'&gt;inline&lt;/span&gt; &lt;span class='kt'&gt;double&lt;/span&gt; &lt;span class='n'&gt;getDouble&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='n'&gt;assert&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;isDouble&lt;/span&gt;&lt;span class='p'&gt;());&lt;/span&gt;

        &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='n'&gt;asDouble&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='kr'&gt;inline&lt;/span&gt; &lt;span class='n'&gt;int32_t&lt;/span&gt; &lt;span class='n'&gt;getInt32&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='n'&gt;assert&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;isInt32&lt;/span&gt;&lt;span class='p'&gt;());&lt;/span&gt;

        &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='k'&gt;static_cast&lt;/span&gt;&lt;span class='o'&gt;&amp;lt;&lt;/span&gt;&lt;span class='n'&gt;int32_t&lt;/span&gt;&lt;span class='o'&gt;&amp;gt;&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;asBits&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&lt;/span&gt; &lt;span class='o'&gt;~&lt;/span&gt;&lt;span class='n'&gt;Int32Tag&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='kr'&gt;inline&lt;/span&gt; &lt;span class='kt'&gt;void&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;getPointer&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='n'&gt;assert&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;isPointer&lt;/span&gt;&lt;span class='p'&gt;());&lt;/span&gt;

        &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='k'&gt;reinterpret_cast&lt;/span&gt;&lt;span class='o'&gt;&amp;lt;&lt;/span&gt;&lt;span class='kt'&gt;void&lt;/span&gt; &lt;span class='o'&gt;*&amp;gt;&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;asBits&lt;/span&gt; &lt;span class='o'&gt;&amp;amp;&lt;/span&gt; &lt;span class='o'&gt;~&lt;/span&gt;&lt;span class='n'&gt;PtrTag&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
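&lt;p&gt;Again, the code should be easy enough to understand. One caveat: &lt;code&gt;isDouble()&lt;/code&gt; treats every bit pattern at or above &lt;code&gt;0xfff8000000000000&lt;/code&gt; as a tagged value, so a real implementation has to make sure that genuine NaN results never use such a pattern (e.g. by canonicalizing NaNs before storing them). As an aid, here are the three consts from the top in binary:&lt;/p&gt;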

&lt;pre&gt;&lt;code&gt;Maximum double (qNaN with sign bit set without payload):
                    seeeeeee|eeeemmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm
0xfff8000000000000: 11111111|11111000|00000000|00000000|00000000|00000000|00000000|00000000

32 bit integer:
                    seeeeeee|eeeemmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm
0xfff9000000000000: 11111111|11111001|00000000|00000000|iiiiiiii|iiiiiiii|iiiiiiii|iiiiiiii
                                                        ^- integer value

Pointer:
                    seeeeeee|eeeemmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm
0xfffa000000000000: 11111111|11111010|pppppppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppppp
                                      ^- 48 bit pointer value&lt;/code&gt;&lt;/pre&gt;

&lt;h3 id='a_few_more_considerations'&gt;A few more considerations&lt;/h3&gt;

&lt;p&gt;The above code makes doubles accessible directly, without further operations and accesses pointers using a bit mask (that&amp;#8217;s what Mozilla does). An alternative implementation could do it the other way around and make pointers accessible directly and doubles using an offset (i.e. you would add &lt;code&gt;0x0001000000000000&lt;/code&gt; on inserting a double and subtract it when retrieving it). The latter is what WebKit does.&lt;/p&gt;
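&lt;p&gt;To illustrate the offset variant, here is a rough sketch of how such an encoding could look. It is a simplification written for this post and not WebKit&amp;#8217;s actual code: all names are hypothetical, integers are left out entirely, and it assumes that every NaN is replaced by one canonical NaN so that adding the offset can never overflow:&lt;/p&gt;
<![CDATA[
```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

const uint64_t DoubleOffset = 0x0001000000000000ULL; // 2^48
const uint64_t CanonicalNaN = 0x7ff8000000000000ULL; // qNaN without payload

// Encode a double by adding the offset to its bits.
// NaNs are canonicalized first (assumption of this sketch).
inline uint64_t encodeDouble(double number) {
    uint64_t bits;
    std::memcpy(&bits, &number, sizeof bits);
    if (std::isnan(number)) {
        bits = CanonicalNaN;
    }
    return bits + DoubleOffset;
}

// Pointers are stored unmodified; their upper 16 bits are assumed to be zero.
inline uint64_t encodePointer(void *pointer) {
    return static_cast<uint64_t>(reinterpret_cast<uintptr_t>(pointer));
}

// After adding the offset every double has at least one of the
// upper 16 bits set, which is what tells it apart from a pointer.
inline bool isEncodedDouble(uint64_t encoded) {
    return (encoded >> 48) != 0;
}

// Undo the offset to get the original double back.
inline double decodeDouble(uint64_t encoded) {
    const uint64_t bits = encoded - DoubleOffset;
    double result;
    std::memcpy(&result, &bits, sizeof result);
    return result;
}
```
]]>
&lt;p&gt;The trade-off is the mirror image of the code above: a pointer can be used as-is, whereas every access to a double costs one integer addition or subtraction.&lt;/p&gt;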

&lt;p&gt;On 32 bit machines one can access the lower 32 bits of the 64 bit value directly. So both pointers and integers can be accessed directly in any case. On the other hand, whereas the tagged pointer with embedded integer approach would need only a 32 bit value on 32 bit machines, the NaN approach will need a 64 bit value in both 32 bit and 64 bit environments. So on 32 bit two words need to be passed around, instead of just one (that&amp;#8217;s why Mozilla calls this approach fatvals).&lt;/p&gt;

&lt;p&gt;Still, considering the overall wins it is worth it: Both integers and doubles (and for that matter also bools and all other &amp;#8220;small&amp;#8221; values) can be stored directly in the value, without accessing the heap.&lt;/p&gt;

&lt;h3 id='links'&gt;Links&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href='http://wingolog.org/archives/2011/05/18/value-representation-in-javascript-implementations'&gt;value representation in javascript implementations&lt;/a&gt; (article similar to this one, looking more at the general picture than at the concrete implementation)&lt;/li&gt;

&lt;li&gt;&lt;a href='http://evilpie.github.com/sayrer-fatval-backup/cache.aspx.htm'&gt;Mozilla&amp;#8217;s New Javascript Value Representation&lt;/a&gt; (you&amp;#8217;ll have to scroll down a bit)&lt;/li&gt;

&lt;li&gt;WebKit&amp;#8217;s implementation: &lt;a href='http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/runtime/JSValue.h#L250'&gt;declaration&lt;/a&gt;; &lt;a href='http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/runtime/JSValueInlineMethods.h'&gt;inline methods&lt;/a&gt;; &lt;a href='http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/runtime/JSValue.cpp'&gt;methods&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href='http://dxr.lanedo.com/mozilla-central/js/src/jsval.h.html'&gt;Mozilla&amp;#8217;s implementation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description>
    </item>
    
    <item>
      <title>htmlspecialchars() improvements in PHP 5.4</title>
      <link>http://nikic.github.com/2012/01/28/htmlspecialchars-improvements-in-PHP-5-4.html</link>
      <pubDate>Sat, 28 Jan 2012 00:00:00 -0800</pubDate>
      <author>nikic@php.net (NikiC)</author>
      <guid>http://nikic.github.com/2012/01/28/htmlspecialchars-improvements-in-PHP-5-4</guid>
      <description>&lt;p&gt;There has been lots of buzz about many of the new features in PHP 5.4, like the traits support, the short array syntax and all those other syntax improvements.&lt;/p&gt;

&lt;p&gt;But one set of changes that I think is particularly important was largely overlooked: For PHP 5.4 cataphract (&lt;a href='http://stackoverflow.com/users/127724/artefacto'&gt;Artefacto&lt;/a&gt; on StackOverflow) heroically rewrote large parts of &lt;code&gt;htmlspecialchars&lt;/code&gt; thus fixing various quirks and adding some really nice new features.&lt;/p&gt;

&lt;p&gt;(The changes discussed here apply not only to &lt;a href='http://docs.php.net/htmlspecialchars'&gt;&lt;code&gt;htmlspecialchars&lt;/code&gt;&lt;/a&gt;, but also to the related &lt;a href='http://docs.php.net/htmlentities'&gt;&lt;code&gt;htmlentities&lt;/code&gt;&lt;/a&gt; and in parts to &lt;a href='http://docs.php.net/htmlspecialchars_decode'&gt;&lt;code&gt;htmlspecialchars_decode&lt;/code&gt;&lt;/a&gt;, &lt;a href='http://docs.php.net/html_entity_decode'&gt;&lt;code&gt;html_entity_decode&lt;/code&gt;&lt;/a&gt; and &lt;a href='http://docs.php.net/get_html_translation_table'&gt;&lt;code&gt;get_html_translation_table&lt;/code&gt;&lt;/a&gt;.)&lt;/p&gt;

&lt;p&gt;Here is a quick summary of the most important changes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;UTF-8 as the default charset&lt;/li&gt;

&lt;li&gt;Improved error handling (&lt;code&gt;ENT_SUBSTITUTE&lt;/code&gt;)&lt;/li&gt;

&lt;li&gt;Doctype handling (&lt;code&gt;ENT_HTML401&lt;/code&gt;, &amp;#8230;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id='utf8_as_the_default_charset'&gt;UTF-8 as the default charset&lt;/h2&gt;

&lt;p&gt;As you hopefully know the third argument for &lt;code&gt;htmlspecialchars&lt;/code&gt; is the character set. Thing is: Most people just leave that argument out, thus falling back to the default charset. This default charset was ISO-8859-1 before PHP 5.4 and as such did not match the UTF-8 encoding most people use. PHP 5.4 fixes this by making UTF-8 the default.&lt;/p&gt;

&lt;h2 id='improved_error_handling'&gt;Improved error handling&lt;/h2&gt;

&lt;p&gt;Error handling in &lt;code&gt;htmlspecialchars&lt;/code&gt; before PHP 5.4 was &amp;#8230; uhm, let&amp;#8217;s call it &amp;#8220;unintuitive&amp;#8221;:&lt;/p&gt;

&lt;p&gt;If you passed a string containing an &amp;#8220;invalid code unit sequence&amp;#8221; (which is Unicode slang for &amp;#8220;not encoded correctly&amp;#8221;) &lt;code&gt;htmlspecialchars&lt;/code&gt; would return an empty string. Well, okay, so far so good. The funny thing was that it additionally would throw an error, &lt;strong&gt;but only if error display was disabled&lt;/strong&gt;. So it would only error if errors are hidden. Nice, innit?&lt;/p&gt;

&lt;p&gt;This basically meant that on your development machine you wouldn&amp;#8217;t see any errors, but on your production machine the error log would be flooded with them. Awesome.&lt;/p&gt;

&lt;p&gt;So, as of PHP 5.4 thankfully &lt;em&gt;this behavior is gone&lt;/em&gt;. The error will not be generated anymore.&lt;/p&gt;

&lt;p&gt;Additionally there are two options that allow you to specify an alternative to just returning an empty string:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;ENT_IGNORE&lt;/code&gt;: This option (which isn&amp;#8217;t actually new, it was there in PHP 5.3 already) will just drop all invalid code unit sequences. This is bad for two reasons: First, you won&amp;#8217;t notice invalid encoding because it&amp;#8217;ll be simply dropped. Second, this poses a certain security risk (for more info see the &lt;a href='http://unicode.org/reports/tr36/#Deletion_of_Noncharacters'&gt;Unicode Security Considerations&lt;/a&gt;).&lt;/li&gt;

&lt;li&gt;&lt;code&gt;ENT_SUBSTITUTE&lt;/code&gt;: This new alternative option takes a much more sensible approach to the problem: Instead of just dropping the code units they will be replaced by a Unicode Replacement Character (U+FFFD). So invalid code unit sequences will be replaced by � characters.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let&amp;#8217;s have a look at the different behaviors (&lt;a href='http://codepad.viper-7.com/kTwelM'&gt;demo&lt;/a&gt;):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt; &lt;span class='c1'&gt;// &amp;quot;\80&amp;quot; is invalid UTF-8 in this context&lt;/span&gt;
&lt;span class='nb'&gt;var_dump&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;htmlspecialchars&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;a&lt;/span&gt;&lt;span class='se'&gt;\x80&lt;/span&gt;&lt;span class='s2'&gt;b&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;));&lt;/span&gt;                 &lt;span class='c1'&gt;// string(0) &amp;quot;&amp;quot;&lt;/span&gt;
&lt;span class='nb'&gt;var_dump&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;htmlspecialchars&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;a&lt;/span&gt;&lt;span class='se'&gt;\x80&lt;/span&gt;&lt;span class='s2'&gt;b&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nx'&gt;ENT_IGNORE&lt;/span&gt;&lt;span class='p'&gt;));&lt;/span&gt;     &lt;span class='c1'&gt;// string(2) &amp;quot;ab&amp;quot;&lt;/span&gt;
&lt;span class='nb'&gt;var_dump&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;htmlspecialchars&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;a&lt;/span&gt;&lt;span class='se'&gt;\x80&lt;/span&gt;&lt;span class='s2'&gt;b&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nx'&gt;ENT_SUBSTITUTE&lt;/span&gt;&lt;span class='p'&gt;));&lt;/span&gt; &lt;span class='c1'&gt;// string(5) &amp;quot;a�b&amp;quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Clearly, you want the last behavior. In your real code it will probably look like this:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class='c1'&gt;// this goes into the bootstrap (or where appropriate) to make the code&lt;/span&gt;
&lt;span class='c1'&gt;// not throw a notice on PHP 5.3&lt;/span&gt;
&lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;!&lt;/span&gt;&lt;span class='nb'&gt;defined&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;ENT_SUBSTITUTE&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='nb'&gt;define&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;ENT_SUBSTITUTE&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;          &lt;span class='c1'&gt;// if you want the empty string behavior on 5.3&lt;/span&gt;
    &lt;span class='c1'&gt;// or, instead of the line above, if you want the char removal behavior on 5.3&lt;/span&gt;
    &lt;span class='c1'&gt;// (don&amp;#39;t forget about the security issues though!):&lt;/span&gt;
    &lt;span class='c1'&gt;// define(&amp;#39;ENT_SUBSTITUTE&amp;#39;, ENT_IGNORE);&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='c1'&gt;// don&amp;#39;t forget to specify the charset! Otherwise you&amp;#39;ll get the old default charset on 5.3.&lt;/span&gt;
&lt;span class='nv'&gt;$escaped&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nb'&gt;htmlspecialchars&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$string&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nx'&gt;ENT_QUOTES&lt;/span&gt; &lt;span class='o'&gt;|&lt;/span&gt; &lt;span class='nx'&gt;ENT_SUBSTITUTE&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;UTF-8&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;h2 id='doctype_handling'&gt;Doctype handling&lt;/h2&gt;

&lt;p&gt;In PHP 5.4 there are four additional flags for specifying the doctype in use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;ENT_HTML401&lt;/code&gt; (HTML 4.01) =&amp;gt; this is the default&lt;/li&gt;

&lt;li&gt;&lt;code&gt;ENT_HTML5&lt;/code&gt; (HTML 5)&lt;/li&gt;

&lt;li&gt;&lt;code&gt;ENT_XML1&lt;/code&gt; (XML 1)&lt;/li&gt;

&lt;li&gt;&lt;code&gt;ENT_XHTML&lt;/code&gt; (XHTML)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Depending on which doctype you specify &lt;code&gt;htmlspecialchars&lt;/code&gt; (and the other related functions) will use different entity tables.&lt;/p&gt;

&lt;p&gt;You can see this in the following example (&lt;a href='http://codepad.viper-7.com/mRSquY'&gt;demo&lt;/a&gt;):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='c1'&gt;// ENT_QUOTES is needed here, otherwise single quotes are not escaped at all&lt;/span&gt;
&lt;span class='nb'&gt;var_dump&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;htmlspecialchars&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;&amp;#39;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nx'&gt;ENT_QUOTES&lt;/span&gt; &lt;span class='o'&gt;|&lt;/span&gt; &lt;span class='nx'&gt;ENT_HTML401&lt;/span&gt;&lt;span class='p'&gt;));&lt;/span&gt; &lt;span class='c1'&gt;// string(6) &amp;quot;&amp;amp;#039;&amp;quot;&lt;/span&gt;
&lt;span class='nb'&gt;var_dump&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;htmlspecialchars&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;&amp;#39;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nx'&gt;ENT_QUOTES&lt;/span&gt; &lt;span class='o'&gt;|&lt;/span&gt; &lt;span class='nx'&gt;ENT_HTML5&lt;/span&gt;&lt;span class='p'&gt;));&lt;/span&gt;   &lt;span class='c1'&gt;// string(6) &amp;quot;&amp;amp;apos;&amp;quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;So for HTML 5 an &lt;code&gt;&amp;amp;apos;&lt;/code&gt; entity will be generated, whereas for HTML 4.01 - which does not yet support &lt;code&gt;&amp;amp;apos;&lt;/code&gt; - a numerical &lt;code&gt;&amp;amp;#039;&lt;/code&gt; entity is returned.&lt;/p&gt;

&lt;p&gt;The difference becomes more evident when using &lt;code&gt;htmlentities&lt;/code&gt;, because the differences are larger there. You can easily see this by having a look at the raw translation tables:&lt;/p&gt;

&lt;p&gt;To do this, we can use the &lt;code&gt;get_html_translation_table&lt;/code&gt; function. Here is an example for the XML 1 doctype first (&lt;a href='http://codepad.viper-7.com/ArEfdy'&gt;demo&lt;/a&gt;):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='nb'&gt;var_dump&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;get_html_translation_table&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;HTML_ENTITIES&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nx'&gt;ENT_QUOTES&lt;/span&gt; &lt;span class='o'&gt;|&lt;/span&gt; &lt;span class='nx'&gt;ENT_XML1&lt;/span&gt;&lt;span class='p'&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The result will look like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;array(5) {
  [&amp;quot;&amp;quot;&amp;quot;]=&amp;gt;
  string(6) &amp;quot;&amp;amp;quot;&amp;quot;
  [&amp;quot;&amp;amp;&amp;quot;]=&amp;gt;
  string(5) &amp;quot;&amp;amp;amp;&amp;quot;
  [&amp;quot;&amp;#39;&amp;quot;]=&amp;gt;
  string(6) &amp;quot;&amp;amp;apos;&amp;quot;
  [&amp;quot;&amp;lt;&amp;quot;]=&amp;gt;
  string(4) &amp;quot;&amp;amp;lt;&amp;quot;
  [&amp;quot;&amp;gt;&amp;quot;]=&amp;gt;
  string(4) &amp;quot;&amp;amp;gt;&amp;quot;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This matches our expectations: XML by itself defines only the five basic entities.&lt;/p&gt;

&lt;p&gt;Now try the same thing for HTML 5 (&lt;a href='http://codepad.viper-7.com/FiYKxv'&gt;demo&lt;/a&gt;) and you&amp;#8217;ll see something like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;array(1510) {
  [&amp;quot;	&amp;quot;]=&amp;gt;
  string(5) &amp;quot;&amp;amp;Tab;&amp;quot;
  [&amp;quot;
&amp;quot;]=&amp;gt;
  string(9) &amp;quot;&amp;amp;NewLine;&amp;quot;
  [&amp;quot;!&amp;quot;]=&amp;gt;
  string(6) &amp;quot;&amp;amp;excl;&amp;quot;
  [&amp;quot;&amp;quot;&amp;quot;]=&amp;gt;
  string(6) &amp;quot;&amp;amp;quot;&amp;quot;
  [&amp;quot;#&amp;quot;]=&amp;gt;
  string(5) &amp;quot;&amp;amp;num;&amp;quot;
  [&amp;quot;$&amp;quot;]=&amp;gt;
  string(8) &amp;quot;&amp;amp;dollar;&amp;quot;
  [&amp;quot;%&amp;quot;]=&amp;gt;
  string(8) &amp;quot;&amp;amp;percnt;&amp;quot;
  [&amp;quot;&amp;amp;&amp;quot;]=&amp;gt;
  string(5) &amp;quot;&amp;amp;amp;&amp;quot;
  [&amp;quot;&amp;#39;&amp;quot;]=&amp;gt;
  string(6) &amp;quot;&amp;amp;apos;&amp;quot;
  // ...
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So HTML 5 defines a vast number of entities - 1510 to be precise. You can also try HTML 4.01 and XHTML; they both define 253 entities.&lt;/p&gt;
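
&lt;p&gt;If you want to check these counts on your own installation, a minimal sketch like the following should do. It only uses &lt;code&gt;get_html_translation_table&lt;/code&gt; and the doctype flags from above; the &lt;code&gt;$doctypes&lt;/code&gt; array is just a helper for this illustration, and the exact numbers may differ between PHP versions:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&amp;lt;?php
// Map a readable label to each doctype flag (helper array for this sketch only).
$doctypes = array(
    &amp;#39;HTML 4.01&amp;#39; =&amp;gt; ENT_HTML401,
    &amp;#39;XML 1&amp;#39;     =&amp;gt; ENT_XML1,
    &amp;#39;XHTML&amp;#39;     =&amp;gt; ENT_XHTML,
    &amp;#39;HTML 5&amp;#39;    =&amp;gt; ENT_HTML5,
);

foreach ($doctypes as $label =&amp;gt; $flag) {
    // Count the entries of the htmlentities() translation table for this doctype.
    $table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | $flag);
    echo $label, &amp;#39;: &amp;#39;, count($table), &amp;quot; entities\n&amp;quot;;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;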

&lt;p&gt;Also affected by the chosen doctype is another new error handling flag which I did not mention above: &lt;code&gt;ENT_DISALLOWED&lt;/code&gt;. This flag replaces characters that are formally valid code unit sequences, but are invalid in the given doctype, with a Unicode Replacement Character.&lt;/p&gt;

&lt;p&gt;This way you can ensure that the returned string is always well formed regarding encoding (in the given doctype). I&amp;#8217;m not sure though how much sense it makes to use this flag. The browser will handle invalid characters gracefully anyways, so this seems unnecessary to me (though I&amp;#8217;m probably wrong).&lt;/p&gt;

&lt;h2 id='there_is_other_stuff_too'&gt;There is other stuff too&amp;#8230;&lt;/h2&gt;

&lt;p&gt;&amp;#8230; but I don&amp;#8217;t want to list everything here. I think the three changes mentioned above are the most important improvements.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='nb'&gt;htmlspecialchars&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;&amp;lt;&lt;/span&gt;&lt;span class='se'&gt;\x80&lt;/span&gt;&lt;span class='s2'&gt;The End&lt;/span&gt;&lt;span class='se'&gt;\xef\xbf\xbf&lt;/span&gt;&lt;span class='s2'&gt;&amp;gt;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nx'&gt;ENT_QUOTES&lt;/span&gt; &lt;span class='o'&gt;|&lt;/span&gt; &lt;span class='nx'&gt;ENT_HTML5&lt;/span&gt; &lt;span class='o'&gt;|&lt;/span&gt; &lt;span class='nx'&gt;ENT_DISALLOWED&lt;/span&gt; &lt;span class='o'&gt;|&lt;/span&gt; &lt;span class='nx'&gt;ENT_SUBSTITUTE&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;UTF-8&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;</description>
    </item>
    
    <item>
      <title>Careful: XDebug can skew your performance numbers</title>
      <link>http://nikic.github.com/2012/01/19/Careful-XDebug-can-skew-your-performance-numbers.html</link>
      <pubDate>Thu, 19 Jan 2012 00:00:00 -0800</pubDate>
      <author>nikic@php.net (NikiC)</author>
      <guid>http://nikic.github.com/2012/01/19/Careful-XDebug-can-skew-your-performance-numbers</guid>
      <description>&lt;p&gt;&lt;a href='http://xdebug.org/'&gt;XDebug&lt;/a&gt; is a great tool and a really big aid in debugging.&lt;/p&gt;

&lt;p&gt;But running it incurs a certain overhead on many common operations - like calling functions. Due to this having XDebug enabled can skew your benchmarking and profiling numbers. (&amp;#8220;Enabled&amp;#8221; here means just enabled, no debugging going on or such.)&lt;/p&gt;

&lt;p&gt;This has an important implication: If you optimize a script while XDebug is enabled it can actually turn out that it gets &lt;strong&gt;slower&lt;/strong&gt; on your production server (as you &lt;em&gt;hopefully&lt;/em&gt; don&amp;#8217;t have XDebug enabled there).&lt;/p&gt;

&lt;h2 id='an_example_optimizing_a_lexer_wrapper'&gt;An example: &amp;#8220;Optimizing&amp;#8221; a lexer wrapper&lt;/h2&gt;

&lt;p&gt;For my &lt;a href='https://github.com/nikic/PHP-Parser'&gt;PHP parser&lt;/a&gt; I need a small wrapper around PHP&amp;#8217;s &lt;a href='http://php.net/token_get_all'&gt;&lt;code&gt;token_get_all&lt;/code&gt;&lt;/a&gt; function. This wrapper (called &lt;code&gt;PHPParser_Lexer&lt;/code&gt;) does nothing more than a little bit of normalization, filtering and mapping on the tokens.&lt;/p&gt;

&lt;p&gt;There are basically two ways how this wrapper can work:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;For every source code to be parsed a new &lt;code&gt;Lexer&lt;/code&gt; instance is created and passed into the parser, which then does multiple calls to &lt;code&gt;-&amp;gt;lex()&lt;/code&gt; to fetch one token at a time.&lt;/li&gt;

&lt;li&gt;A &lt;code&gt;Lexer&lt;/code&gt; is created once and reused for all source codes. Only a single call to &lt;code&gt;-&amp;gt;lex()&lt;/code&gt; is made per source code, which then returns all the tokens at once.&lt;/li&gt;
&lt;/ol&gt;
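
&lt;p&gt;To make the two variants more concrete, here is a rough sketch of what they could look like. The class names &lt;code&gt;PerTokenLexer&lt;/code&gt; and &lt;code&gt;BatchLexer&lt;/code&gt; are made up for this illustration and are not the actual &lt;code&gt;PHPParser_Lexer&lt;/code&gt; code; the normalization, filtering and mapping steps are left out:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&amp;lt;?php
// Approach 1: one instance per source string, one token per lex() call.
class PerTokenLexer {
    private $tokens;
    private $pos = 0;

    public function __construct($code) {
        $this-&amp;gt;tokens = token_get_all($code);
    }

    // Returns the next token, or null once all tokens have been consumed.
    public function lex() {
        return isset($this-&amp;gt;tokens[$this-&amp;gt;pos]) ? $this-&amp;gt;tokens[$this-&amp;gt;pos++] : null;
    }
}

// Approach 2: one reusable instance, all tokens returned by a single lex() call.
class BatchLexer {
    public function lex($code) {
        return token_get_all($code);
    }
}

$code = &amp;#39;&amp;lt;?php echo 1;&amp;#39;;

$lexer = new PerTokenLexer($code);
$count = 0;
while (null !== $lexer-&amp;gt;lex()) {   // many small method calls
    ++$count;
}

$batchLexer = new BatchLexer;
$tokens = $batchLexer-&amp;gt;lex($code); // a single method call

var_dump($count === count($tokens));
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Both variants yield the same tokens; they only differ in how many method calls the parser has to make to get them.&lt;/p&gt;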

&lt;p&gt;So, what I did is implement both approaches and see how much time is spent lexing the whole Symfony tree. I got the following results:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Scenario 1 on 5.3.8 with XDebug enabled: 16.939509868622 seconds
Scenario 2 on 5.3.8 with XDebug enabled:  6.104107856751 seconds&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&amp;#8220;That&amp;#8217;s an impressive saving of more than 10 seconds (over 60%) right there!&amp;#8221; is what I thought.&lt;/p&gt;

&lt;p&gt;But then I tried the same thing without XDebug:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Scenario 1 on 5.3.8 with XDebug disabled: 3.656718969345 seconds
Scenario 2 on 5.3.8 with XDebug disabled: 3.195748090744 seconds&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now that XDebug is disabled the numbers are drastically lower and also much closer. The 17 seconds from above are only 3.7 seconds now. The time for the 6-second case roughly halved.&lt;/p&gt;

&lt;p&gt;Why is there such a drastic change on the first number, and a less pronounced one on the second? The first scenario needs approximately 70000 &lt;code&gt;-&amp;gt;lex()&lt;/code&gt; calls, whereas the second one only needs 2000. XDebug adds a quite large overhead to function calls, so the scenario with 70000 calls is impacted much more.&lt;/p&gt;
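
&lt;p&gt;You can get a feeling for this overhead with a small sketch like the one below. It is not the benchmark I ran above: &lt;code&gt;noop()&lt;/code&gt; is just a hypothetical stand-in for a cheap method like &lt;code&gt;-&amp;gt;lex()&lt;/code&gt;, and &lt;code&gt;CALLS&lt;/code&gt; is simply set to the call count mentioned above. Run it once with and once without XDebug and compare how the two timings change:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&amp;lt;?php
const CALLS = 70000; // roughly the number of lex() calls in scenario 1

function noop() {} // hypothetical stand-in for a cheap method like lex()

// Loop that performs one function call per iteration.
$startTime = microtime(true);
for ($i = 0; $i &amp;lt; CALLS; ++$i) {
    noop();
}
$withCalls = microtime(true) - $startTime;

// The same loop without any function call, as a baseline.
$startTime = microtime(true);
for ($i = 0; $i &amp;lt; CALLS; ++$i) {
}
$withoutCalls = microtime(true) - $startTime;

echo &amp;#39;XDebug loaded: &amp;#39;, extension_loaded(&amp;#39;xdebug&amp;#39;) ? &amp;#39;yes&amp;#39; : &amp;#39;no&amp;#39;, &amp;quot;\n&amp;quot;;
echo &amp;#39;Loop with calls:    &amp;#39;, $withCalls, &amp;quot; seconds\n&amp;quot;;
echo &amp;#39;Loop without calls: &amp;#39;, $withoutCalls, &amp;quot; seconds\n&amp;quot;;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;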

&lt;p&gt;Finally, let&amp;#8217;s look at the numbers for PHP 5.4.0 without XDebug:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Scenario 1 on 5.4.0 with XDebug disabled: 2.8629820346832 seconds
Scenario 2 on 5.4.0 with XDebug disabled: 3.0883010864258 seconds&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And here the numbers are now actually turned around. PHP 5.4 got some optimizations that leave the second variant slower than the first.&lt;/p&gt;

&lt;h2 id='another_example_micro_benchmarking'&gt;Another example: Micro benchmarking&lt;/h2&gt;

&lt;p&gt;As pointless as they are, people still love micro benchmarks, and I do too. You&amp;#8217;ll find lots of blog posts comparing the performance of something vs. something else. The problem is: Most of these are done on development machines, with XDebug enabled.&lt;/p&gt;

&lt;p&gt;As an example, take this recent &lt;a href='http://gonzalo123.wordpress.com/2012/01/16/checking-the-performance-of-php-exceptions/'&gt;blog post about the performance of exceptions&lt;/a&gt;. Here are the numbers the author measured for exceptions on PHP 5.3/5.4 for a certain script:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;PHP 5.3: 1.1479668617249
PHP 5.4: 0.1864490332&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Looking at those you might say: Damn, that&amp;#8217;s a pretty impressive improvement!&lt;/p&gt;

&lt;p&gt;Well, not really: Seeing these numbers I immediately had the suspicion that the author actually measured two different things: 5.3 with XDebug and 5.4 without it.&lt;/p&gt;

&lt;p&gt;So I tested both versions without XDebug and got 0.14 seconds for PHP 5.3 and 0.09 seconds for PHP 5.4, i.e. a much smaller difference. You can find more detailed results in &lt;a href='http://edorian.posterous.com/never-trust-other-peoples-benchmarks-a-recent'&gt;edorian&amp;#8217;s blog post&lt;/a&gt; on the topic.&lt;/p&gt;

&lt;p&gt;I think many micro benchmarking posts you&amp;#8217;ll find on the net are affected by this. I was also trapped by this multiple times (e.g. see &lt;a href='http://stackoverflow.com/q/3691625/385378'&gt;this question I asked about function call performance&lt;/a&gt;).&lt;/p&gt;

&lt;h2 id='conclusion'&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;The conclusion from the above is: If you do benchmarking, benchmark on a machine that&amp;#8217;s configured like your production machine. Otherwise you might actually be anti-optimizing.&lt;/p&gt;

&lt;p&gt;Also: Low level optimizations like function inlining may actually measurably improve performance on your development machine (like in my case), but have little effect in the production environment. So, you can safely go for clean code with small functions and not worry about performance degradation.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Disproving the Single Quotes Performance Myth</title>
      <link>http://nikic.github.com/2012/01/09/Disproving-the-Single-Quotes-Performance-Myth.html</link>
      <pubDate>Mon, 09 Jan 2012 00:00:00 -0800</pubDate>
      <author>nikic@php.net (NikiC)</author>
      <guid>http://nikic.github.com/2012/01/09/Disproving-the-Single-Quotes-Performance-Myth</guid>
      <description>&lt;p&gt;If there is one PHP related thing that I &lt;em&gt;really&lt;/em&gt; hate, then it is definitely the Single Quotes Performance Myth. Let&amp;#8217;s do a random Google search for &lt;a href='https://www.google.com/search?q=php+single+quotes+performance'&gt;&amp;#8220;PHP single quotes performance&amp;#8221;&lt;/a&gt;: &lt;a href='http://stackoverflow.com/questions/482202/is-there-a-performance-benefit-single-quote-vs-double-quote-in-php'&gt;You&lt;/a&gt; &lt;a href='http://stackoverflow.com/questions/3316060/single-quotes-or-double-quotes-for-variable-concatenation'&gt;will&lt;/a&gt; &lt;a href='http://atomized.org/2005/04/php-performance-best-practices/'&gt;get&lt;/a&gt; &lt;a href='https://github.com/fabpot/Twig/issues/407'&gt;many&lt;/a&gt; &lt;a href='http://classyllama.com/development/php/php-single-vs-double-quotes/'&gt;results&lt;/a&gt; telling you that single quotes are faster than double quotes and that string interpolation is much slower than string concatenation. Most of them advise to use single quotes and concatenation to improve the performance of your application.&lt;/p&gt;

&lt;p&gt;Let&amp;#8217;s be clear here: &lt;strong&gt;This is pointless.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I hold the opinion that microoptimization like that is just pointless. I doubt that string parsing performance is the bottleneck of even a single PHP application out there. &lt;a href='http://stackoverflow.com/questions/482202/is-there-a-performance-benefit-single-quote-vs-double-quote-in-php#comment-299001'&gt;This comment&lt;/a&gt; nails it pretty well: &amp;#8220;And, on top of that, by using less pixels, you reduce greenhouse emissions.&amp;#8221; I think that everybody will agree that using single quotes won&amp;#8217;t save any significant greenhouse emissions. Still everybody seems to believe that it saves significant execution time. Strange.&lt;/p&gt;

&lt;p&gt;But okay, some say &amp;#8220;Well, it doesn&amp;#8217;t cost me anything to write single quotes, so why not just do it?&amp;#8221;&lt;/p&gt;

&lt;p&gt;Sounds plausible. So I decided to look into how big the difference really is. And it turned out: There is none.&lt;/p&gt;

&lt;h2 id='strings_at_runtime'&gt;Strings at runtime&lt;/h2&gt;

&lt;p&gt;What many people don&amp;#8217;t realize is that strings are parsed during lexing, not during execution. When a PHP script runs all strings are already parsed.&lt;/p&gt;

&lt;p&gt;A simple proof:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='nv'&gt;$x&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;Test&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='nv'&gt;$y&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='se'&gt;\x54\x65\x73\x74&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;If you have a look at the opcodes that PHP generates for this they will look like this (&lt;a href='http://codepad.viper-7.com/XSHZra'&gt;online demo&lt;/a&gt;):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;compiled vars:  !0 = $x, !1 = $y
line     # *  op                           fetch          ext  return  operands
---------------------------------------------------------------------------------
   3     0  &amp;gt;   ASSIGN                                                 !0, &amp;#39;Test&amp;#39;
   4     1      ASSIGN                                                 !1, &amp;#39;Test&amp;#39;
         2    &amp;gt; RETURN                                                 1&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You don&amp;#8217;t need to understand the complete output, but you should still see the main point: Both &lt;code&gt;&amp;#39;Test&amp;#39;&lt;/code&gt; and &lt;code&gt;&amp;quot;\x54\x65\x73\x74&amp;quot;&lt;/code&gt; compiled down to the very same opcode.&lt;/p&gt;

&lt;p&gt;What does this mean? That the quote type used and the use of escape sequences do not affect runtime at all. You obviously &lt;a href='http://codepad.viper-7.com/TZ95z3'&gt;can still try&lt;/a&gt; but any difference you see there is just measurement error.&lt;/p&gt;

&lt;h2 id='strings_at_compile_time'&gt;Strings at compile time&lt;/h2&gt;

&lt;p&gt;Now some may argue: PHP is an interpreted language, so compile time actually also happens at runtime. My response to that would normally be: Use &lt;a href='http://php.net/manual/en/book.apc.php'&gt;APC&lt;/a&gt;. APC will cache the generated opcodes so they don&amp;#8217;t need to be created on every single request. This can greatly improve page loading time and CPU load, so if you aren&amp;#8217;t using APC yet you might as well try it now ;)&lt;/p&gt;

&lt;p&gt;But, hypothetically, let us assume that we are not using APC. So let&amp;#8217;s see whether single quoted strings are actually lexed faster than double quoted ones.&lt;/p&gt;

&lt;p&gt;For this we will use a handy function called &lt;a href='http://php.net/token_get_all'&gt;&lt;code&gt;token_get_all&lt;/code&gt;&lt;/a&gt;: It lexes a PHP string into a token array. The function will return the token texts plainly and not preparsed, but the internal string parsing routines still will get called, so we can use this as a good approximation to the real numbers.&lt;/p&gt;

&lt;p&gt;For testing I use the following script:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt; &lt;span class='k'&gt;echo&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;&amp;lt;pre&amp;gt;&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

&lt;span class='k'&gt;const&lt;/span&gt; &lt;span class='no'&gt;NUM&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;10000000&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

&lt;span class='nv'&gt;$singleQuotedStringCode&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;&amp;lt;?php &amp;#39;&amp;quot;&lt;/span&gt; &lt;span class='o'&gt;.&lt;/span&gt; &lt;span class='nb'&gt;str_repeat&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;x&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nx'&gt;NUM&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;.&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;&amp;#39;;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='nv'&gt;$doubleQuotedStringCode&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;&amp;lt;?php &amp;quot;&amp;#39;&lt;/span&gt; &lt;span class='o'&gt;.&lt;/span&gt; &lt;span class='nb'&gt;str_repeat&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;x&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nx'&gt;NUM&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;.&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;&amp;quot;;&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

&lt;span class='nv'&gt;$startTime&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nb'&gt;microtime&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;true&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='nb'&gt;token_get_all&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$singleQuotedStringCode&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='nv'&gt;$endTime&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nb'&gt;microtime&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;true&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;

&lt;span class='k'&gt;echo&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;Single quotes: &amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$endTime&lt;/span&gt; &lt;span class='o'&gt;-&lt;/span&gt; &lt;span class='nv'&gt;$startTime&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39; seconds&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='se'&gt;\n&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

&lt;span class='nv'&gt;$startTime&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nb'&gt;microtime&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;true&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='nb'&gt;token_get_all&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$doubleQuotedStringCode&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='nv'&gt;$endTime&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nb'&gt;microtime&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;true&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;

&lt;span class='k'&gt;echo&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;Double quotes: &amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$endTime&lt;/span&gt; &lt;span class='o'&gt;-&lt;/span&gt; &lt;span class='nv'&gt;$startTime&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39; seconds&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='se'&gt;\n&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;It creates two strings with &lt;em&gt;ten million&lt;/em&gt; &lt;code&gt;x&lt;/code&gt; characters and lexes them.&lt;/p&gt;

&lt;p&gt;&lt;a href='http://codepad.viper-7.com/l2cl5X'&gt;You can try it yourself&lt;/a&gt;. If you reload the page multiple times the numbers will vary, but the general picture will be something like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Single quotes: 0.061846971511841 seconds
Double quotes: 0.061599016189575 seconds&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Namely: &lt;strong&gt;No difference.&lt;/strong&gt; The Single Quotes Performance Myth thus is just a big lie: Single quotes are neither faster at runtime nor at compile time.&lt;/p&gt;

&lt;h2 id='string_interpolation'&gt;String interpolation&lt;/h2&gt;

&lt;p&gt;So the next question is, what is faster: String concatenation or string interpolation?&lt;/p&gt;

&lt;p&gt;Let&amp;#8217;s consider this simple case:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='nv'&gt;$world&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;World&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='s1'&gt;&amp;#39;Hallo &amp;#39;&lt;/span&gt; &lt;span class='o'&gt;.&lt;/span&gt; &lt;span class='nv'&gt;$world&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='s2'&gt;&amp;quot;Hallo &lt;/span&gt;&lt;span class='si'&gt;$world&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;If you &lt;a href='http://codepad.viper-7.com/tBx3TZ'&gt;try running the corresponding benchmark&lt;/a&gt; you will get numbers similar to this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Concatenation: 0.015104055404663 seconds
Interpolation: 0.016894817352295 seconds&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So interpolation seems &lt;em&gt;slightly&lt;/em&gt; slower than concatenation, doesn&amp;#8217;t it? Well, not exactly, but more on that later. Let&amp;#8217;s look at the opcodes first:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;compiled vars:  !0 = $world
line     # *  op                           fetch          ext  return  operands
---------------------------------------------------------------------------------
   2     0  &amp;gt;   ASSIGN                                                   !0, &amp;#39;World&amp;#39;
   3     1      CONCAT                                           ~1      &amp;#39;Hallo+&amp;#39;, !0
         2      FREE                                                     ~1
   4     3      ADD_STRING                                       ~2      &amp;#39;Hallo+&amp;#39;
         4      ADD_VAR                                          ~2      ~2, !0
         5      FREE                                                     ~2
         6    &amp;gt; RETURN                                                   1&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Here you see the reason why concatenation is slightly faster than interpolation in the above example:&lt;/p&gt;

&lt;p&gt;Concatenation uses a single &lt;code&gt;CONCAT&lt;/code&gt; opcode with &lt;code&gt;&amp;#39;Hallo+&amp;#39;, !0&lt;/code&gt; (&lt;code&gt;!0&lt;/code&gt; is &lt;code&gt;$world&lt;/code&gt; here). All &lt;code&gt;CONCAT&lt;/code&gt; basically does is create a new string buffer with the new length, copy &lt;code&gt;&amp;#39;Hallo+&amp;#39;&lt;/code&gt; into it and then copy &lt;code&gt;$world&lt;/code&gt; into it.&lt;/p&gt;

&lt;p&gt;Interpolation on the other hand needs two opcodes: First &lt;code&gt;ADD_STRING&lt;/code&gt; is called, which in this case basically copies &lt;code&gt;&amp;#39;Hallo+&amp;#39;&lt;/code&gt; into a new string buffer. Then &lt;code&gt;ADD_VAR&lt;/code&gt; is called with &lt;code&gt;~2, !0&lt;/code&gt; (&lt;code&gt;~2&lt;/code&gt; is the buffer that &lt;code&gt;&amp;#39;Hallo+&amp;#39;&lt;/code&gt; was copied to and &lt;code&gt;!0&lt;/code&gt; is still &lt;code&gt;$world&lt;/code&gt;), which basically does the same as &lt;code&gt;CONCAT&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If you read closely the difference is clear: Interpolation copies the &lt;code&gt;&amp;#39;Hallo+&amp;#39;&lt;/code&gt; part twice, concatenation only copies it one time. (Disclaimer: I&amp;#8217;m actually not perfectly sure that this is really the reason, but it would be my guess. It could also be just the overhead of another opcode.)&lt;/p&gt;

&lt;p&gt;So, interpolation is slower after all, isn&amp;#8217;t it? Well, not really, let&amp;#8217;s consider this more realistic example:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='nv'&gt;$name&lt;/span&gt;  &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;Anonymous&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='nv'&gt;$age&lt;/span&gt;   &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;123&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='nv'&gt;$hobby&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;nothing&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

&lt;span class='s1'&gt;&amp;#39;Hi! My name is &amp;#39;&lt;/span&gt; &lt;span class='o'&gt;.&lt;/span&gt; &lt;span class='nv'&gt;$name&lt;/span&gt; &lt;span class='o'&gt;.&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39; and I am &amp;#39;&lt;/span&gt; &lt;span class='o'&gt;.&lt;/span&gt; &lt;span class='nv'&gt;$age&lt;/span&gt; &lt;span class='o'&gt;.&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39; years old! I love doing &amp;#39;&lt;/span&gt; &lt;span class='o'&gt;.&lt;/span&gt; &lt;span class='nv'&gt;$hobby&lt;/span&gt; &lt;span class='o'&gt;.&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;!&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='s2'&gt;&amp;quot;Hi! My name is &lt;/span&gt;&lt;span class='si'&gt;$name&lt;/span&gt;&lt;span class='s2'&gt; and I am &lt;/span&gt;&lt;span class='si'&gt;$age&lt;/span&gt;&lt;span class='s2'&gt; years old! I love doing &lt;/span&gt;&lt;span class='si'&gt;$hobby&lt;/span&gt;&lt;span class='s2'&gt;!&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;If you &lt;a href='http://codepad.viper-7.com/p4aUGN'&gt;run the benchmark&lt;/a&gt; you will get numbers looking approximately like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Concatenation: 0.053942918777466 seconds
Interpolation: 0.049801111221313 seconds&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And&amp;#8230; interpolation is faster now! Why so? Let&amp;#8217;s have a look at the new opcodes:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;compiled vars:  !0 = $name, !1 = $age, !2 = $hobby
line     # *  op                           fetch          ext  return  operands
---------------------------------------------------------------------------------
   2     0  &amp;gt;   ASSIGN                                                   !0, &amp;#39;Anonymous&amp;#39;
   3     1      ASSIGN                                                   !1, 123
   4     2      ASSIGN                                                   !2, &amp;#39;nothing&amp;#39;
   6     3      CONCAT                                           ~3      &amp;#39;Hi%21+My+name+is+&amp;#39;, !0
         4      CONCAT                                           ~4      ~3, &amp;#39;+and+I+am+&amp;#39;
         5      CONCAT                                           ~5      ~4, !1
         6      CONCAT                                           ~6      ~5, &amp;#39;+years+old%21+I+love+doing+&amp;#39;
         7      CONCAT                                           ~7      ~6, !2
         8      CONCAT                                           ~8      ~7, &amp;#39;%21&amp;#39;
         9      FREE                                                     ~8
   7    10      ADD_STRING                                       ~9      &amp;#39;Hi%21+My+name+is+&amp;#39;
        11      ADD_VAR                                          ~9      ~9, !0
        12      ADD_STRING                                       ~9      ~9, &amp;#39;+and+I+am+&amp;#39;
        13      ADD_VAR                                          ~9      ~9, !1
        14      ADD_STRING                                       ~9      ~9, &amp;#39;+years+old%21+I+love+doing+&amp;#39;
        15      ADD_VAR                                          ~9      ~9, !2
        16      ADD_CHAR                                         ~9      ~9, 33
        17      FREE                                                     ~9
        18    &amp;gt; RETURN                                                   1&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;As you can see the interpolation opcodes are more optimal here: For one, interpolation distinguishes between adding a single character, a string and a variable. Also it reuses the same temporary variable for the whole interpolation (&lt;code&gt;~9&lt;/code&gt;), whereas concatenation uses six of them (&lt;code&gt;~3&lt;/code&gt;, &lt;code&gt;~4&lt;/code&gt;, &lt;code&gt;~5&lt;/code&gt;, &lt;code&gt;~6&lt;/code&gt;, &lt;code&gt;~7&lt;/code&gt;, &lt;code&gt;~8&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;So interpolation becomes more efficient the more is interpolated. For the simple string + var case it is slower, but for more realistic cases with multiple variables it becomes faster.&lt;/p&gt;
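
&lt;p&gt;The benchmark scripts themselves are only linked above, so here is a minimal sketch of how such a comparison might look for the multi-variable case. The &lt;code&gt;ITERATIONS&lt;/code&gt; constant and the assignments to &lt;code&gt;$concat&lt;/code&gt; and &lt;code&gt;$interp&lt;/code&gt; are assumptions of this sketch (the assignment adds the same small cost to both loops), and the numbers you get will depend on your PHP version and machine:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&amp;lt;?php
const ITERATIONS = 1000000; // hypothetical iteration count

$name  = &amp;#39;Anonymous&amp;#39;;
$age   = 123;
$hobby = &amp;#39;nothing&amp;#39;;

// Time the concatenation variant.
$startTime = microtime(true);
for ($i = 0; $i &amp;lt; ITERATIONS; ++$i) {
    $concat = &amp;#39;Hi! My name is &amp;#39; . $name . &amp;#39; and I am &amp;#39; . $age . &amp;#39; years old! I love doing &amp;#39; . $hobby . &amp;#39;!&amp;#39;;
}
$endTime = microtime(true);
echo &amp;#39;Concatenation: &amp;#39;, $endTime - $startTime, &amp;#39; seconds&amp;#39;, &amp;quot;\n&amp;quot;;

// Time the interpolation variant, which builds the very same string.
$startTime = microtime(true);
for ($i = 0; $i &amp;lt; ITERATIONS; ++$i) {
    $interp = &amp;quot;Hi! My name is $name and I am $age years old! I love doing $hobby!&amp;quot;;
}
$endTime = microtime(true);
echo &amp;#39;Interpolation: &amp;#39;, $endTime - $startTime, &amp;#39; seconds&amp;#39;, &amp;quot;\n&amp;quot;;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;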

&lt;h2 id='lesson_learned'&gt;Lesson learned&lt;/h2&gt;

&lt;p&gt;At this point I want to emphasize again that the above has no practical impact. All above demos use high iteration counts and any differences in execution time are completely negligible.&lt;/p&gt;

&lt;p&gt;Does this still teach us something? Yes, it does: &amp;#8220;Never trust a statistic you didn&amp;#8217;t forge yourself.&amp;#8221; Amen.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Supercolliding a PHP array</title>
      <link>http://nikic.github.com/2011/12/28/Supercolliding-a-PHP-array.html</link>
      <pubDate>Wed, 28 Dec 2011 00:00:00 -0800</pubDate>
      <author>nikic@php.net (NikiC)</author>
      <guid>http://nikic.github.com/2011/12/28/Supercolliding-a-PHP-array</guid>
      <description>&lt;p&gt;Did you know that inserting &lt;code&gt;2^16 = 65536&lt;/code&gt; specially crafted values into a normal PHP array can take &lt;strong&gt;30 seconds&lt;/strong&gt;? Normally this would take only &lt;em&gt;0.01 seconds&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;This is the code to reproduce it:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt; &lt;span class='k'&gt;echo&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;&amp;lt;pre&amp;gt;&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

&lt;span class='nv'&gt;$size&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nx'&gt;pow&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;2&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;16&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt; &lt;span class='c1'&gt;// 16 is just an example, could also be 15 or 17&lt;/span&gt;

&lt;span class='nv'&gt;$startTime&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nb'&gt;microtime&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;true&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;

&lt;span class='nv'&gt;$array&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;array&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
&lt;span class='k'&gt;for&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$key&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$maxKey&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$size&lt;/span&gt; &lt;span class='o'&gt;-&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt; &lt;span class='nv'&gt;$size&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='nv'&gt;$key&lt;/span&gt; &lt;span class='o'&gt;&amp;lt;=&lt;/span&gt; &lt;span class='nv'&gt;$maxKey&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='nv'&gt;$key&lt;/span&gt; &lt;span class='o'&gt;+=&lt;/span&gt; &lt;span class='nv'&gt;$size&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='nv'&gt;$array&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nv'&gt;$key&lt;/span&gt;&lt;span class='p'&gt;]&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='nv'&gt;$endTime&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nb'&gt;microtime&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;true&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;

&lt;span class='k'&gt;echo&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;Inserting &amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$size&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39; evil elements took &amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$endTime&lt;/span&gt; &lt;span class='o'&gt;-&lt;/span&gt; &lt;span class='nv'&gt;$startTime&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39; seconds&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='se'&gt;\n&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

&lt;span class='nv'&gt;$startTime&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nb'&gt;microtime&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;true&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;

&lt;span class='nv'&gt;$array&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;array&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
&lt;span class='k'&gt;for&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$key&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$maxKey&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$size&lt;/span&gt; &lt;span class='o'&gt;-&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='nv'&gt;$key&lt;/span&gt; &lt;span class='o'&gt;&amp;lt;=&lt;/span&gt; &lt;span class='nv'&gt;$maxKey&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='o'&gt;++&lt;/span&gt;&lt;span class='nv'&gt;$key&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='nv'&gt;$array&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nv'&gt;$key&lt;/span&gt;&lt;span class='p'&gt;]&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='nv'&gt;$endTime&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nb'&gt;microtime&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;true&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;

&lt;span class='k'&gt;echo&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;Inserting &amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$size&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39; good elements took &amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$endTime&lt;/span&gt; &lt;span class='o'&gt;-&lt;/span&gt; &lt;span class='nv'&gt;$startTime&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39; seconds&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='se'&gt;\n&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Try it yourself! You may need to adjust the number &lt;code&gt;16&lt;/code&gt; in the &lt;code&gt;$size = pow(2, 16);&lt;/code&gt; line based on what hardware you have. I would start with &lt;code&gt;14&lt;/code&gt; and increase it one at a time. By the way, I intentionally don&amp;#8217;t provide an online demo at the amazing Viper-7 codepad this time so that it doesn&amp;#8217;t get overloaded.&lt;/p&gt;

&lt;p&gt;Here is a sample output from my machine:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Inserting 65536 evil elements took 32.726480007172 seconds
Inserting 65536 good elements took 0.014460802078247 seconds&lt;/code&gt;&lt;/pre&gt;

&lt;h2 id='how_does_this_work'&gt;How does this work?&lt;/h2&gt;

&lt;p&gt;PHP internally uses hashtables to store arrays. The above creates a hashtable with 100% collisions (i.e. all keys will have the same hash).&lt;/p&gt;

&lt;p&gt;For those who don&amp;#8217;t know how hashtables work: When you write &lt;code&gt;$array[$key]&lt;/code&gt; in PHP the &lt;code&gt;$key&lt;/code&gt; is run through a fast hash function that yields an integer. This integer is then used as an offset into a &amp;#8220;real&amp;#8221; C array (here &amp;#8220;array&amp;#8221; means &amp;#8220;chunk of memory&amp;#8221;).&lt;/p&gt;

&lt;p&gt;Because every hash function has collisions this C array doesn&amp;#8217;t actually store the value we want, but a linked list of possible values. PHP then walks these values one by one and does a full key comparison until it finds the right element with our &lt;code&gt;$key&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Normally there will be only a small number of collisions, so in most cases the linked list will only have one value.&lt;/p&gt;

&lt;p&gt;But the above script creates a hash where &lt;em&gt;all&lt;/em&gt; elements collide.&lt;/p&gt;

&lt;h2 id='how_can_you_make_all_of_them_collide'&gt;How can you make all of them collide?&lt;/h2&gt;

&lt;p&gt;In PHP it&amp;#8217;s very simple. If the key is an integer the hash function is just a no-op: The hash is the integer itself. All PHP does is apply a table mask on top of it: &lt;code&gt;hash &amp;amp; tableMask&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This table mask ensures that the resulting hash is within the bounds of the underlying real C array. This underlying C array always has a size that is a power of 2 (this way the hashtable is always reasonably efficient in both space and time). So if you store 10 elements the real size will be 16. If you store 33 it will be 64. If you store 63 it will also be 64. The table mask is the size minus one. So if the size is 64, i.e. &lt;code&gt;1000000&lt;/code&gt; in binary, the table mask will be 63, i.e. &lt;code&gt;0111111&lt;/code&gt; in binary.&lt;/p&gt;

&lt;p&gt;So basically the table mask strips every bit at or above the position of the table size and keeps only the lower bits. And this is what we are exploiting: &lt;code&gt;0 &amp;amp; 0111111 = 0&lt;/code&gt;, but &lt;code&gt;1000000 &amp;amp; 0111111 = 0&lt;/code&gt; too, and so are &lt;code&gt;10000000 &amp;amp; 0111111 = 0&lt;/code&gt; and &lt;code&gt;11000000 &amp;amp; 0111111 = 0&lt;/code&gt;. As long as we keep those lower bits the same, the result of the hash will also stay the same.&lt;/p&gt;

&lt;p&gt;So if we insert a total of 64 elements, the first one 0, the second one 64, the third one 128, the fourth one 192, etc., all of those elements will have the same hash (namely 0) and all will be put into the same linked list. And that&amp;#8217;s what the code does. Only not with 64 elements, but with a few more.&lt;/p&gt;
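
&lt;p&gt;To make the masking concrete, here is a small sketch that computes the bucket index for a few keys by hand. The variable names &lt;code&gt;$tableSize&lt;/code&gt; and &lt;code&gt;$tableMask&lt;/code&gt; are made up for illustration; the snippet only mimics what was described above for integer keys and is not PHP&amp;#8217;s actual internal code:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&amp;lt;?php
$tableSize = 64;             // assumed table size, always a power of 2
$tableMask = $tableSize - 1; // 63, i.e. 0111111 in binary

foreach (array(0, 64, 128, 192, 5, 69) as $key) {
    // for integer keys the hash is the key itself, so only the mask is applied
    echo $key, ' lands in bucket ', $key &amp;amp; $tableMask, "\n";
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The multiples of 64 all end up in bucket 0, while 5 and 69 share bucket 5. Stepping the key by the table size is all it takes to stay in the same bucket.&lt;/p&gt;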

&lt;h2 id='why_is_that_so_abysmally_slow'&gt;Why is that so abysmally slow?&lt;/h2&gt;

&lt;p&gt;Well, for every insertion PHP has to traverse the whole linked list, element by element. On the first insertion it needs to traverse 0 elements (there is nothing there yet). On the second one it traverses 1 element. On the third one 2, on the fourth 3 and on the 64th one 63. Those who know a little bit of math probably know that &lt;code&gt;0+1+2+3+...+(n-1) = n*(n-1)/2&lt;/code&gt;. So the number of elements to traverse is quadratic in the number of insertions. For 64 elements it&amp;#8217;s &lt;code&gt;64*63/2 = 2016&lt;/code&gt; traversals. For &lt;code&gt;2^16 = 65536&lt;/code&gt; it&amp;#8217;s &lt;code&gt;65536*65535/2 = 2147450880&lt;/code&gt;. As you can see, the numbers grow fast. And with the number of iterations grows the execution time.&lt;/p&gt;
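
&lt;p&gt;If you would rather count than trust a formula, the sum can be checked with a short loop. This is only a sketch: &lt;code&gt;countTraversals()&lt;/code&gt; is a hypothetical helper that adds up how many existing elements each insertion has to walk over when everything lands in one list. It matches the closed form &lt;code&gt;n*(n-1)/2&lt;/code&gt;:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&amp;lt;?php
// hypothetical helper: total list elements walked while inserting $n colliding keys
function countTraversals($n) {
    $total = 0;
    for ($i = 0; $i &amp;lt; $n; ++$i) {
        $total += $i; // the (i+1)-th insertion walks over the i elements already there
    }
    return $total;
}

echo countTraversals(64), "\n";    // 2016
echo countTraversals(65536), "\n"; // 2147450880
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;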

&lt;h2 id='hashtable_collisions_as_dos_attack'&gt;Hashtable collisions as DOS attack&lt;/h2&gt;

&lt;p&gt;At this point you may wonder what the above is actually useful for. For the casual user: Not useful at all. But the &amp;#8220;bad guys&amp;#8221; can easily exploit behavior like the above to perform a DOS (Denial of Service) attack on a server. Remember that &lt;code&gt;$_GET&lt;/code&gt; and &lt;code&gt;$_POST&lt;/code&gt; and &lt;code&gt;$_REQUEST&lt;/code&gt; are just normal arrays and suffer from the same problems. So by sending a specially crafted POST request you can easily take a server down.&lt;/p&gt;

&lt;p&gt;PHP is not the only language vulnerable to this. Actually pretty much all other languages used for creating websites have similar problems, as was &lt;a href='http://events.ccc.de/congress/2011/Fahrplan/events/4680.en.html'&gt;presented at the 28C3 conference&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But there is hope! PHP already &lt;a href='http://svn.php.net/viewvc?view=revision&amp;amp;revision=321038'&gt;landed a change&lt;/a&gt; (which will ship with PHP 5.3.9) that adds a &lt;code&gt;max_input_vars&lt;/code&gt; ini setting, which defaults to &lt;code&gt;1000&lt;/code&gt;. This setting determines the maximum number of POST/GET variables that are accepted, so now only a maximum of 1000 collisions can be created. If you run the above script with &lt;code&gt;2^10 = 1024&lt;/code&gt; elements you will get runtimes on the order of 0.003 seconds, which obviously is far less critical than 30 seconds. (Note though that above I am demonstrating an integer key collision. You can also collide string keys, in which case the traversal will be a good bit slower.)&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Don't be STUPID: GRASP SOLID!</title>
      <link>http://nikic.github.com/2011/12/27/Dont-be-STUPID-GRASP-SOLID.html</link>
      <pubDate>Tue, 27 Dec 2011 00:00:00 -0800</pubDate>
      <author>nikic@php.net (NikiC)</author>
      <guid>http://nikic.github.com/2011/12/27/Dont-be-STUPID-GRASP-SOLID</guid>
      <description>&lt;p&gt;Ever heard of &lt;a href='http://en.wikipedia.org/wiki/SOLID_%28object-oriented_design%29'&gt;SOLID&lt;/a&gt; code? Probably: It is a term describing a collection of design principles for &amp;#8220;good code&amp;#8221; that was coined by Robert C. Martin (aka &lt;a href='http://cleancoder.posterous.com/'&gt;&amp;#8220;uncle bob&amp;#8221;&lt;/a&gt;), our beloved evangelist of clean code.&lt;/p&gt;

&lt;p&gt;Programming is full of acronyms like this. Other examples are &lt;a href='http://en.wikipedia.org/wiki/Don%27t_repeat_yourself'&gt;DRY&lt;/a&gt; (Don&amp;#8217;t Repeat Yourself!) and &lt;a href='http://en.wikipedia.org/wiki/KISS_principle'&gt;KISS&lt;/a&gt; (Keep It Simple, Stupid!). But there obviously are many, many more&amp;#8230;&lt;/p&gt;

&lt;p&gt;So, why not approach the problem from the other side for once? Looking at what makes up &lt;strong&gt;bad&lt;/strong&gt; code.&lt;/p&gt;

&lt;h2 id='sorry_but_your_code_is_stupid'&gt;Sorry, but your code is STUPID!&lt;/h2&gt;

&lt;p&gt;Nobody likes to hear that their code is stupid. It is offensive. Don&amp;#8217;t say it. But honestly: Most of the code being written around the globe is an unmaintainable, unreusable mess.&lt;/p&gt;

&lt;p&gt;What characterizes such code? What makes code &lt;strong&gt;STUPID&lt;/strong&gt;?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;em&gt;S&lt;/em&gt;&lt;/strong&gt;ingleton&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;&lt;em&gt;T&lt;/em&gt;&lt;/strong&gt;ight coupling&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;&lt;em&gt;U&lt;/em&gt;&lt;/strong&gt;ntestability&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;&lt;em&gt;P&lt;/em&gt;&lt;/strong&gt;remature Optimization&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;&lt;em&gt;I&lt;/em&gt;&lt;/strong&gt;ndescriptive Naming&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;&lt;em&gt;D&lt;/em&gt;&lt;/strong&gt;uplication&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Do you agree with this list? Yes? Great. No? I&amp;#8217;ll explain the individual points in greater detail in the following, so you can better understand why exactly those patterns were chosen.&lt;/p&gt;

&lt;h2 id='singleton'&gt;Singleton&lt;/h2&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='k'&gt;class&lt;/span&gt; &lt;span class='nc'&gt;DB&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;private&lt;/span&gt; &lt;span class='k'&gt;static&lt;/span&gt; &lt;span class='nv'&gt;$instance&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;static&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;getInstance&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;!&lt;/span&gt;&lt;span class='nb'&gt;isset&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;self&lt;/span&gt;&lt;span class='o'&gt;::&lt;/span&gt;&lt;span class='nv'&gt;$instance&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
            &lt;span class='nx'&gt;self&lt;/span&gt;&lt;span class='o'&gt;::&lt;/span&gt;&lt;span class='nv'&gt;$instance&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;self&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='p'&gt;}&lt;/span&gt;

        &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='nx'&gt;self&lt;/span&gt;&lt;span class='o'&gt;::&lt;/span&gt;&lt;span class='nv'&gt;$instance&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='k'&gt;final&lt;/span&gt; &lt;span class='k'&gt;private&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;__construct&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt; &lt;span class='cm'&gt;/* something */&lt;/span&gt; &lt;span class='p'&gt;}&lt;/span&gt;
    &lt;span class='k'&gt;final&lt;/span&gt; &lt;span class='k'&gt;private&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;__clone&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt; &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='cm'&gt;/* actual methods here */&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The above is the typical database access implementation you will find in pretty much any PHP tutorial. I actually used something similar to this myself not too long ago.&lt;/p&gt;

&lt;p&gt;Now you wonder: What&amp;#8217;s wrong with that? You can easily access the DB from anywhere using &lt;code&gt;DB::getInstance()&lt;/code&gt; and the code also ensures that you only have one database connection open at a time. What could be bad about that?&lt;/p&gt;

&lt;p&gt;Well, yeah, I thought that too ^^ &amp;#8220;I only need one connection.&amp;#8221; When the application grew larger it turned out that I actually needed a second connection to a different database. And that&amp;#8217;s where the mess began. I changed the singleton to have a &lt;code&gt;-&amp;gt;getSecondInstance()&lt;/code&gt;, thus converting the singleton to a .. erm .. dupleton? Instead I should have realized that the database connection &lt;em&gt;simply isn&amp;#8217;t a singleton&lt;/em&gt; and thus can&amp;#8217;t be sanely implemented as one. The same also applies to any other use of singletons that you will find. The request object surely is a singleton! Ever heard of subrequests? But the logger definitely is! Never wanted to log different things differently?&lt;/p&gt;

&lt;p&gt;And that was only &lt;em&gt;one&lt;/em&gt; issue. Another important issue is that by using &lt;code&gt;DB::getInstance()&lt;/code&gt; in your code you are binding your code to the &lt;code&gt;DB&lt;/code&gt; class name. This means: You can&amp;#8217;t extend the &lt;code&gt;DB&lt;/code&gt; class. At some point I wanted to optionally log query performance data to APC. But the tight coupling to the class name did not allow me to. If my application had used &lt;a href='http://www.martinfowler.com/articles/injection.html'&gt;dependency injection&lt;/a&gt; at that time I could have easily extended &lt;code&gt;DB&lt;/code&gt; and passed the new instance. But the singleton prevented that. What I did instead was something looking roughly like this:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='c1'&gt;// original DB class&lt;/span&gt;
&lt;span class='k'&gt;class&lt;/span&gt; &lt;span class='nc'&gt;_DB&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt; &lt;span class='cm'&gt;/* ... */&lt;/span&gt; &lt;span class='p'&gt;}&lt;/span&gt;

&lt;span class='c1'&gt;// extending class&lt;/span&gt;
&lt;span class='k'&gt;class&lt;/span&gt; &lt;span class='nc'&gt;DB&lt;/span&gt; &lt;span class='k'&gt;extends&lt;/span&gt; &lt;span class='nx'&gt;_DB&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt; &lt;span class='cm'&gt;/* ... */&lt;/span&gt; &lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;One word: Ugly. One could add some other words like Hacky, Unmaintainable, Crap. Or STUPID.&lt;/p&gt;

&lt;p&gt;One last point to consider: Remember when I said &amp;#8220;You can easily access the DB from anywhere using &lt;code&gt;DB::getInstance()&lt;/code&gt;&amp;#8221;. Well, actually that&amp;#8217;s a bad thing too. Read &amp;#8220;from anywhere&amp;#8221; as &amp;#8220;globally&amp;#8221; and translate to &amp;#8220;A singleton is a global variable with a fancy name.&amp;#8221; When you learned PHP you were probably told that it&amp;#8217;s evil to use the &lt;code&gt;global&lt;/code&gt; keyword. But by using a singleton you are doing just that: creating global state. This creates non-obvious dependencies and thus makes your app hard to reuse and test.&lt;/p&gt;
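
&lt;p&gt;Coming back to the query logging problem: here is a rough sketch of how it could look without a singleton. All names except &lt;code&gt;DB&lt;/code&gt; are hypothetical (&lt;code&gt;LoggingDB&lt;/code&gt;, &lt;code&gt;UserRepository&lt;/code&gt;), the &lt;code&gt;query()&lt;/code&gt; method just returns a string instead of talking to a database, and the log is a plain array rather than APC:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&amp;lt;?php
// stand-in for the real database class; no connection is opened here
class DB {
    public function query($sql) { return "ran: $sql"; }
}

// hypothetical subclass that records every query before running it
class LoggingDB extends DB {
    public $log = array();

    public function query($sql) {
        $this-&amp;gt;log[] = $sql;
        return parent::query($sql);
    }
}

// hypothetical consumer: it receives its DB instead of calling DB::getInstance()
class UserRepository {
    private $db;

    public function __construct(DB $db) { $this-&amp;gt;db = $db; }

    public function find($id) {
        return $this-&amp;gt;db-&amp;gt;query('SELECT * FROM users WHERE id = ' . (int) $id);
    }
}

$db   = new LoggingDB;
$repo = new UserRepository($db);
echo $repo-&amp;gt;find(42), "\n"; // ran: SELECT * FROM users WHERE id = 42
echo count($db-&amp;gt;log), "\n"; // 1
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The repository never mentions &lt;code&gt;LoggingDB&lt;/code&gt;, so switching logging on or off, or adding a second connection, is a decision made where the objects are created.&lt;/p&gt;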

&lt;h2 id='tight_coupling'&gt;Tight coupling&lt;/h2&gt;

&lt;p&gt;You can actually generalize the Singleton issue to &lt;code&gt;static&lt;/code&gt; methods and properties in general. Whenever you are writing &lt;code&gt;Foo::bar()&lt;/code&gt; in your code you are tightly coupling your code to the &lt;code&gt;Foo&lt;/code&gt; class. This makes extending &lt;code&gt;Foo&lt;/code&gt;&amp;#8217;s functionality impossible and thus makes your code hard to reuse and hard to test.&lt;/p&gt;

&lt;p&gt;Similarly most other plain uses of class names are a code smell too. This also applies to the &lt;code&gt;new&lt;/code&gt; operator:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='k'&gt;class&lt;/span&gt; &lt;span class='nc'&gt;House&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;__construct&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;door&lt;/span&gt;   &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;Door&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;window&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;Window&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;How would you replace the door or the window in your house now? Simple: You can&amp;#8217;t. As a good developer you will obviously find some dirty hack that &lt;em&gt;does&lt;/em&gt; allow you to somehow replace the door or a window. But why not simply write this instead:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='k'&gt;class&lt;/span&gt; &lt;span class='nc'&gt;House&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;__construct&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nx'&gt;Door&lt;/span&gt; &lt;span class='nv'&gt;$door&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nx'&gt;Window&lt;/span&gt; &lt;span class='nv'&gt;$window&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt; &lt;span class='c1'&gt;// Door, Window are interfaces&lt;/span&gt;
        &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;door&lt;/span&gt;   &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$door&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;window&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$window&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;This way one can easily create houses with different doors and windows. The code is easy to extend, easy to reuse and easy to test. What more could you want?&lt;/p&gt;

&lt;p&gt;The above code is actually &amp;#8220;dependency injection&amp;#8221; in a nutshell. Many people associate DI with some Dependency Injection Container (DIC) like Symfony&amp;#8217;s, but the actual concept of DI is so much simpler.&lt;/p&gt;
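
&lt;p&gt;As a usage sketch, this is roughly what building different houses could look like. The concrete classes (&lt;code&gt;WoodenDoor&lt;/code&gt;, &lt;code&gt;SteelDoor&lt;/code&gt;, &lt;code&gt;GlassWindow&lt;/code&gt;) and the &lt;code&gt;describe()&lt;/code&gt; method are invented for this example:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&amp;lt;?php
// Door and Window are interfaces; everything implementing them is made up for this example
interface Door   { public function describe(); }
interface Window { public function describe(); }

class WoodenDoor implements Door    { public function describe() { return 'wooden door'; } }
class SteelDoor implements Door     { public function describe() { return 'steel door'; } }
class GlassWindow implements Window { public function describe() { return 'glass window'; } }

class House {
    private $door;
    private $window;

    public function __construct(Door $door, Window $window) {
        $this-&amp;gt;door   = $door;
        $this-&amp;gt;window = $window;
    }

    public function describe() {
        return 'House with a ' . $this-&amp;gt;door-&amp;gt;describe()
             . ' and a ' . $this-&amp;gt;window-&amp;gt;describe();
    }
}

// the caller decides which parts go into which house
$cottage = new House(new WoodenDoor, new GlassWindow);
$bunker  = new House(new SteelDoor, new GlassWindow);

echo $cottage-&amp;gt;describe(), "\n"; // House with a wooden door and a glass window
echo $bunker-&amp;gt;describe(), "\n";  // House with a steel door and a glass window
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;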

&lt;h2 id='untestability'&gt;Untestability&lt;/h2&gt;

&lt;p&gt;&lt;a href='http://en.wikipedia.org/wiki/Unit_testing'&gt;Unit testing&lt;/a&gt; is important. If you don&amp;#8217;t test your code, you are bound to ship broken code. But still most people don&amp;#8217;t properly cover their code with tests. Why? Mostly because their code is &lt;em&gt;hard to test&lt;/em&gt;. What makes code hard to test? Mainly the previous point: Tight coupling. Unit testing - it might seem obvious - tests units of code (usually classes). But how can you test individual classes if they are tightly coupled? You probably can somehow, with even more dirty hacks. But people usually don&amp;#8217;t make that effort and just leave their code untested and broken.&lt;/p&gt;

&lt;p&gt;Whenever you don&amp;#8217;t write unit tests because you &amp;#8220;don&amp;#8217;t have time&amp;#8221; the real cause probably is that your code is bad. If your code is good you can test it in no time. Only with bad code does unit testing become a burden.&lt;/p&gt;
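
&lt;p&gt;To illustrate why loose coupling makes testing cheap, here is a minimal sketch without any test framework. Every name in it (&lt;code&gt;Clock&lt;/code&gt;, &lt;code&gt;SystemClock&lt;/code&gt;, &lt;code&gt;FixedClock&lt;/code&gt;, &lt;code&gt;Greeter&lt;/code&gt;) is hypothetical. Because the greeter gets its clock passed in, a test can hand it a fake one and check both branches regardless of the real time of day:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&amp;lt;?php
interface Clock { public function hour(); }

// what production code would use
class SystemClock implements Clock {
    public function hour() { return (int) date('G'); }
}

// test double that always reports the hour it was given
class FixedClock implements Clock {
    private $hour;
    public function __construct($hour) { $this-&amp;gt;hour = $hour; }
    public function hour() { return $this-&amp;gt;hour; }
}

class Greeter {
    private $clock;
    public function __construct(Clock $clock) { $this-&amp;gt;clock = $clock; }

    public function greet() {
        return $this-&amp;gt;clock-&amp;gt;hour() &amp;lt; 12 ? 'Good morning' : 'Good afternoon';
    }
}

$morning   = new Greeter(new FixedClock(9));
$afternoon = new Greeter(new FixedClock(15));

var_dump($morning-&amp;gt;greet() === 'Good morning');     // bool(true)
var_dump($afternoon-&amp;gt;greet() === 'Good afternoon'); // bool(true)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Had &lt;code&gt;Greeter&lt;/code&gt; called &lt;code&gt;date()&lt;/code&gt; or a static method directly, the same check would only pass at certain times of the day.&lt;/p&gt;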

&lt;h2 id='premature_optimization'&gt;Premature Optimization&lt;/h2&gt;

&lt;p&gt;Here is a code snippet from an old version of a website I wrote:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;isset&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$frm&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;title_german&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;][&lt;/span&gt;&lt;span class='nb'&gt;strcspn&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$frm&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;title_german&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;],&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;&amp;lt;&amp;gt;&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;)]))&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='c1'&gt;// ...&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Guess what it does!&lt;/p&gt;

&lt;p&gt;How long did it take you to realize that this code does nothing more than check whether the German title contains the &lt;code&gt;&amp;lt;&lt;/code&gt; or &lt;code&gt;&amp;gt;&lt;/code&gt; character? Did you realize it at all?&lt;/p&gt;

&lt;p&gt;Let me explain: If &lt;code&gt;&amp;lt;&lt;/code&gt; and &lt;code&gt;&amp;gt;&lt;/code&gt; are both not contained, &lt;a href='http://php.net/strcspn'&gt;&lt;code&gt;strcspn&lt;/code&gt;&lt;/a&gt; will return the length of the string. So the code will basically be &lt;code&gt;isset($str[strlen($str)])&lt;/code&gt;. As the maximum offset set in a string is the length minus one, this will return &lt;code&gt;false&lt;/code&gt;. If one of the characters is contained the function will return a number smaller than the length of the string and thus the whole expression will be &lt;code&gt;true&lt;/code&gt;.&lt;/p&gt;
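
&lt;p&gt;If that explanation is hard to follow, printing the intermediate values may help. The two sample titles below are made up; the snippet just shows what &lt;code&gt;strcspn&lt;/code&gt;, &lt;code&gt;strlen&lt;/code&gt; and the &lt;code&gt;isset&lt;/code&gt; trick yield for a string with and without angle brackets:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&amp;lt;?php
foreach (array('Hallo Welt', 'Hallo &amp;lt;b&amp;gt;Welt&amp;lt;/b&amp;gt;') as $title) {
    // length of the leading part of $title that contains neither &amp;lt; nor &amp;gt;
    $span = strcspn($title, '&amp;lt;&amp;gt;');

    echo $title, ': strcspn = ', $span, ', strlen = ', strlen($title),
         ', isset = ', var_export(isset($title[$span]), true), "\n";
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;For the first title both functions return 10, so the offset does not exist and &lt;code&gt;isset&lt;/code&gt; is &lt;code&gt;false&lt;/code&gt;. For the second title &lt;code&gt;strcspn&lt;/code&gt; stops at offset 6, which does exist, so &lt;code&gt;isset&lt;/code&gt; is &lt;code&gt;true&lt;/code&gt;.&lt;/p&gt;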

&lt;p&gt;So why did I write this unintelligible piece of code? Why didn&amp;#8217;t I write this instead:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;strlen&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$frm&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;title_german&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;])&lt;/span&gt; &lt;span class='o'&gt;!=&lt;/span&gt; &lt;span class='nb'&gt;strcspn&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$frm&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;title_german&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;],&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;&amp;lt;&amp;gt;&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='c1'&gt;// ...&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Because a few days earlier I had read that &lt;code&gt;isset&lt;/code&gt; is so much faster than &lt;code&gt;strlen&lt;/code&gt;&amp;#8230; But that code still is not particularly intelligible, because you need to know the exact semantics of the &lt;code&gt;strcspn&lt;/code&gt; function (which most PHP programmers probably do not). So why not just write this:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;preg_match&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;(&amp;lt;|&amp;gt;)&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$frm&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;title_german&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;]))&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='c1'&gt;// ...&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Because I read that regex is slow&amp;#8230; (Which by the way is mostly a lie: regular expressions are much faster and much more powerful than you think.)&lt;/p&gt;

&lt;p&gt;So, what did I gain from these &amp;#8220;optimizations&amp;#8221;, apart from unreadable code? Nothing. Even now, when this site attracts approximately forty million pageviews every month (at the time I wrote this code it was faaar less), even now this micro-optimization won&amp;#8217;t make &lt;em&gt;any&lt;/em&gt; difference. Simply because it is not the bottleneck of the application. The actual bottleneck is a triple &lt;code&gt;JOIN&lt;/code&gt; in the most-frequently accessed controller (the bottleneck of your application probably is similar).&lt;/p&gt;

&lt;p&gt;You will find many more micro-optimization tips like that on the internet. Like &amp;#8220;use single quotes, they are faster&amp;#8221;. Don&amp;#8217;t listen. Most of the advice is plain wrong and even if it isn&amp;#8217;t, it won&amp;#8217;t make your code measurably faster, it&amp;#8217;ll only waste your time.&lt;/p&gt;

&lt;h2 id='indescriptive_naming'&gt;Indescriptive Naming&lt;/h2&gt;

&lt;p&gt;By the way, do you know what the &lt;a href='http://php.net/strpbrk'&gt;&lt;code&gt;strpbrk&lt;/code&gt;&lt;/a&gt; PHP function does? No? You didn&amp;#8217;t even know it existed? I&amp;#8217;m not surprised. Nobody who wants to search a string for a list of characters looks for a function named &lt;code&gt;strpbrk&lt;/code&gt;. Where did that name come from anyways? This function was inherited from C and its name stands for &amp;#8220;string pointer break&amp;#8221;. Yeah, really nice, especially in a language that doesn&amp;#8217;t actually have pointers (I mean PHP, not C).&lt;/p&gt;

&lt;p&gt;Oh, and reading the code snippet in the above section, did you know what the &lt;a href='http://php.net/strcspn'&gt;&lt;code&gt;strcspn&lt;/code&gt;&lt;/a&gt; function does off the top of your head? No? Again, I&amp;#8217;m not surprised. It&amp;#8217;s short for &amp;#8220;string complement span&amp;#8221;, just in case you didn&amp;#8217;t know.&lt;/p&gt;

&lt;p&gt;The lesson from this: Please, name your classes, methods and variables properly, so that people actually know what you mean. I&amp;#8217;m not arguing about variables like &lt;code&gt;$i&lt;/code&gt;, those are short, but still self-explanatory. The problem is functions like the ones named above. Functions like &lt;code&gt;strpbrk&lt;/code&gt; or variables like &lt;code&gt;$yysstk&lt;/code&gt; may be obvious to the author, but also &lt;em&gt;only&lt;/em&gt; to the author.&lt;/p&gt;
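
&lt;p&gt;One cheap way to act on this is to hide a cryptic built-in behind a name that says what it does. A small sketch, where &lt;code&gt;containsAnyChar()&lt;/code&gt; is a hypothetical helper name and not something PHP ships with:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&amp;lt;?php
// hypothetical wrapper: does $string contain at least one of the characters in $chars?
function containsAnyChar($string, $chars) {
    // strpbrk() returns the rest of the string from the first match on, or false if there is none
    return strpbrk($string, $chars) !== false;
}

var_dump(containsAnyChar('Hallo &amp;lt;b&amp;gt;Welt&amp;lt;/b&amp;gt;', '&amp;lt;&amp;gt;')); // bool(true)
var_dump(containsAnyChar('Hallo Welt', '&amp;lt;&amp;gt;'));        // bool(false)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;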

&lt;h2 id='duplication'&gt;Duplication&lt;/h2&gt;

&lt;p&gt;I think everybody agrees that code that is particularly short, concise and pertinent is considered especially elegant (oh, and I &lt;em&gt;don&amp;#8217;t&lt;/em&gt; mean the Perl/Ruby style &amp;#8220;short&amp;#8221;). On the other hand most people would consider long and redundant code rather ugly. This is what the aforementioned DRY (Don&amp;#8217;t Repeat Yourself!) and KISS (Keep It Simple, Stupid!) design principles want to teach you.&lt;/p&gt;

&lt;p&gt;So where does code duplication come from? Programmers are lazy animals, so it would be only natural to type as little code as possible. Still duplication prevails.&lt;/p&gt;

&lt;p&gt;I think the most common reason for duplication is following the second STUPID principle: Tight Coupling. If your code is tightly coupled, you just can&amp;#8217;t reuse it. And here comes your duplication.&lt;/p&gt;

&lt;h2 id='dont_be_stupid_grasp_solid'&gt;Don&amp;#8217;t be STUPID: GRASP SOLID!&lt;/h2&gt;

&lt;p&gt;So what are the alternatives to writing STUPID code? Simple, &lt;a href='http://en.wikipedia.org/wiki/SOLID_%28object-oriented_design%29'&gt;SOLID&lt;/a&gt; and &lt;a href='http://en.wikipedia.org/wiki/GRASP_%28object-oriented_design%29'&gt;GRASP&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;SOLID deciphers to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;em&gt;S&lt;/em&gt;&lt;/strong&gt;ingle responsibility principle&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;&lt;em&gt;O&lt;/em&gt;&lt;/strong&gt;pen/closed principle&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;&lt;em&gt;L&lt;/em&gt;&lt;/strong&gt;iskov substitution principle&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;&lt;em&gt;I&lt;/em&gt;&lt;/strong&gt;nterface segregation principle&lt;/li&gt;

&lt;li&gt;&lt;strong&gt;&lt;em&gt;D&lt;/em&gt;&lt;/strong&gt;ependency inversion principle&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;GRASP stands for General Responsibility Assignment Software Principles, which are the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Information Expert&lt;/li&gt;

&lt;li&gt;Creator&lt;/li&gt;

&lt;li&gt;Controller&lt;/li&gt;

&lt;li&gt;Low Coupling&lt;/li&gt;

&lt;li&gt;High Cohesion&lt;/li&gt;

&lt;li&gt;Polymorphism&lt;/li&gt;

&lt;li&gt;Pure Fabrication&lt;/li&gt;

&lt;li&gt;Indirection&lt;/li&gt;

&lt;li&gt;Protected Variations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Happy coding and a happy new year to you!&lt;/p&gt;

&lt;p&gt;PS: If you wonder where STUPID comes from: The idea &lt;a href='http://chat.stackoverflow.com/transcript/11?m=2182205#2182205'&gt;came up&lt;/a&gt; on the &lt;a href='http://chat.stackoverflow.com/rooms/11/php'&gt;PHP chatroom on StackOverflow&lt;/a&gt; and the individual components were formed by edorian (&lt;a href='http://chat.stackoverflow.com/rooms/11/php'&gt;StackOverflow&lt;/a&gt;, &lt;a href='https://github.com/edorian'&gt;Github&lt;/a&gt;, &lt;a href='http://twitter.com/#!/__edorian'&gt;Twitter&lt;/a&gt;), &lt;a href='http://stackoverflow.com/users/336242/james-butler'&gt;James&lt;/a&gt; and me. Gordon (&lt;a href='http://stackoverflow.com/users/208809/gordon'&gt;StackOverflow&lt;/a&gt;, &lt;a href='https://github.com/gooh'&gt;Github&lt;/a&gt;, &lt;a href='http://twitter.com/#!/go_oh'&gt;Twitter&lt;/a&gt;) &lt;a href='https://twitter.com/#!/go_oh/status/150187219020300288'&gt;came up&lt;/a&gt; with the title of this blog post.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>How big are PHP arrays (and values) really? (Hint: BIG!)</title>
      <link>http://nikic.github.com/2011/12/12/How-big-are-PHP-arrays-really-Hint-BIG.html</link>
      <pubDate>Mon, 12 Dec 2011 00:00:00 -0800</pubDate>
      <author>nikic@php.net (NikiC)</author>
      <guid>http://nikic.github.com/2011/12/12/How-big-are-PHP-arrays-really-Hint-BIG</guid>
      <description>&lt;p&gt;Upfront I want to thank &lt;a href='http://schlueters.de/blog/'&gt;Johannes&lt;/a&gt; and &lt;a href='http://www.tyrael.hu/'&gt;Tyrael&lt;/a&gt; for their help in finding some of the more hidden memory usage.&lt;/p&gt;

&lt;p&gt;In this post I want to investigate the memory usage of PHP arrays (and values in general) using the following script as an example, which creates 100000 unique integer array elements and measures the resulting memory usage:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='nv'&gt;$startMemory&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nb'&gt;memory_get_usage&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
&lt;span class='nv'&gt;$array&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nb'&gt;range&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;100000&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='k'&gt;echo&lt;/span&gt; &lt;span class='nb'&gt;memory_get_usage&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='o'&gt;-&lt;/span&gt; &lt;span class='nv'&gt;$startMemory&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39; bytes&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;How much would you expect it to be? Simple, one integer is 8 bytes (on a 64 bit unix machine and using the &lt;code&gt;long&lt;/code&gt; type) and you have 100000 integers, so you obviously will need 800000 bytes. That&amp;#8217;s something like 0.76 MB.&lt;/p&gt;

&lt;p&gt;Now try and run the above code. &lt;a href='http://codepad.viper-7.com/pjB3Wm'&gt;You can do it online if you want.&lt;/a&gt; This gives me &lt;code&gt;14649024 bytes&lt;/code&gt;. Yes, you heard right, that&amp;#8217;s 13.97 MB - eighteen times more than we estimated.&lt;/p&gt;

&lt;p&gt;So, where does that extra factor of 18 come from?&lt;/p&gt;

&lt;h2 id='summary'&gt;Summary&lt;/h2&gt;

&lt;p&gt;For those who don&amp;#8217;t want to know the full story, here is a quick summary of the memory usage of the different components involved:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;                             |  64 bit   | 32 bit
---------------------------------------------------
zval                         |  24 bytes | 16 bytes
+ cyclic GC info             |   8 bytes |  4 bytes
+ allocation header          |  16 bytes |  8 bytes
===================================================
zval (value) total           |  48 bytes | 28 bytes
===================================================
bucket                       |  72 bytes | 36 bytes
+ allocation header          |  16 bytes |  8 bytes
+ pointer                    |   8 bytes |  4 bytes
===================================================
bucket (array element) total |  96 bytes | 48 bytes
===================================================
total total                  | 144 bytes | 76 bytes&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The above numbers will vary depending on your operating system, your compiler and your compile options. E.g. if you compile PHP with debug or with thread-safety, you will get different numbers. But I think that the sizes given above are what you will see on an average 64-bit production build of PHP 5.3 on Linux.&lt;/p&gt;

&lt;p&gt;If you multiply those 144 bytes by our 100000 elements you get 14400000 bytes, which is 13.73 MB. That&amp;#8217;s pretty close to the real number - the rest is mostly pointers for uninitialized buckets, but I&amp;#8217;ll cover that later.&lt;/p&gt;

&lt;p&gt;Now, if you want to have a more detailed analysis of the values mentioned above, read on :)&lt;/p&gt;

&lt;h2 id='the__union'&gt;The &lt;code&gt;zvalue_value&lt;/code&gt; union&lt;/h2&gt;

&lt;p&gt;First have a look at how PHP stores values. As you know PHP is a weakly typed language, so it needs some way to switch between the various types fast. PHP uses a &lt;a href='http://en.wikipedia.org/wiki/Union_%28computer_science%29'&gt;&lt;code&gt;union&lt;/code&gt;&lt;/a&gt; for this, which is defined as follows in &lt;a href='http://lxr.php.net/opengrok/xref/PHP_5_4/Zend/zend.h#307'&gt;&lt;code&gt;zend.h#307&lt;/code&gt;&lt;/a&gt; (comments mine):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='c'&gt;&lt;span class='k'&gt;typedef&lt;/span&gt; &lt;span class='k'&gt;union&lt;/span&gt; &lt;span class='n'&gt;_zvalue_value&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='kt'&gt;long&lt;/span&gt; &lt;span class='n'&gt;lval&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;                &lt;span class='c1'&gt;// For integers and booleans&lt;/span&gt;
    &lt;span class='kt'&gt;double&lt;/span&gt; &lt;span class='n'&gt;dval&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;              &lt;span class='c1'&gt;// For floats (doubles)&lt;/span&gt;
    &lt;span class='k'&gt;struct&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;                  &lt;span class='c1'&gt;// For strings&lt;/span&gt;
        &lt;span class='kt'&gt;char&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;val&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;            &lt;span class='c1'&gt;//     consisting of the string itself&lt;/span&gt;
        &lt;span class='kt'&gt;int&lt;/span&gt; &lt;span class='n'&gt;len&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;              &lt;span class='c1'&gt;//     and its length&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='n'&gt;str&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='n'&gt;HashTable&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;ht&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;            &lt;span class='c1'&gt;// For arrays (hash tables)&lt;/span&gt;
    &lt;span class='n'&gt;zend_object_value&lt;/span&gt; &lt;span class='n'&gt;obj&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;    &lt;span class='c1'&gt;// For objects&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='n'&gt;zvalue_value&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;If you don&amp;#8217;t know C, that isn&amp;#8217;t a problem as the code is pretty straightforward: A &lt;code&gt;union&lt;/code&gt; is a means to make some value accessible as various types. For example if you do a &lt;code&gt;zvalue_value-&amp;gt;lval&lt;/code&gt; you&amp;#8217;ll get the value interpreted as an integer. If you use &lt;code&gt;zvalue_value-&amp;gt;ht&lt;/code&gt; on the other hand the value will be interpreted as a pointer to a hashtable (aka array).&lt;/p&gt;

&lt;p&gt;But let&amp;#8217;s not get too much into this here. What matters for us is only that the size of a &lt;code&gt;union&lt;/code&gt; equals the size of its largest component. The largest component here is the string struct (the &lt;code&gt;zend_object_value&lt;/code&gt; struct has the same size as the &lt;code&gt;str&lt;/code&gt; struct, but I&amp;#8217;ll leave that out for simplicity). The string struct stores a pointer (8 bytes) and an integer (4 bytes), which is 12 bytes in total. Due to memory alignment (structs with 12 bytes aren&amp;#8217;t cool because they aren&amp;#8217;t a multiple of 64 bits / 8 bytes) the total size of the struct will be 16 bytes though and that will also be the size of the union as a whole.&lt;/p&gt;
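
&lt;p&gt;If you want to check the union arithmetic yourself, here is a minimal sketch. It assumes a 64-bit Linux build with GCC or Clang; &lt;code&gt;demo_zvalue_value&lt;/code&gt; is a made-up stand-in, not the real Zend definition (the object member is left out and &lt;code&gt;HashTable&lt;/code&gt; is replaced by &lt;code&gt;void&lt;/code&gt;):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='c'&gt;#include &amp;lt;stdio.h&amp;gt;

/* Simplified stand-in for zvalue_value (hypothetical, for illustration only). */
typedef union demo_zvalue_value {
    long lval;
    double dval;
    struct {
        char *val;
        int len;
    } str;
    void *ht;
} demo_zvalue_value;

int main(void) {
    demo_zvalue_value v;

    /* sizeof does not evaluate its operand, so v needs no initialization. */
    printf(&amp;quot;lval:  %zu bytes\n&amp;quot;, sizeof(v.lval));
    printf(&amp;quot;dval:  %zu bytes\n&amp;quot;, sizeof(v.dval));
    printf(&amp;quot;str:   %zu bytes\n&amp;quot;, sizeof(v.str));  /* pointer + int + padding */
    printf(&amp;quot;ht:    %zu bytes\n&amp;quot;, sizeof(v.ht));
    printf(&amp;quot;union: %zu bytes\n&amp;quot;, sizeof(v));      /* size of the largest member */
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;On such a build the &lt;code&gt;str&lt;/code&gt; member and the union as a whole should both come out at 16 bytes. On a 32-bit build the numbers will be smaller.&lt;/p&gt;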

&lt;p&gt;So now we know that we don&amp;#8217;t need 8 bytes for every value, but 16 - due to PHP&amp;#8217;s dynamic typing. Multiplying by 100000 values gives us 1600000 bytes, i.e. 1.53 MB. But the real value is 13.97 MB, so we can&amp;#8217;t be there yet.&lt;/p&gt;

&lt;h2 id='the__struct'&gt;The &lt;code&gt;zval&lt;/code&gt; struct&lt;/h2&gt;

&lt;p&gt;And this is quite logical - the union only stores the value itself, but PHP obviously also needs to store its type and some garbage collection information. The structure holding this information is called a &lt;code&gt;zval&lt;/code&gt; and you may already have heard of it. For more information on why PHP needs it I&amp;#8217;d recommend reading &lt;a href='http://blog.golemon.com/2007/01/youre-being-lied-to.html'&gt;an article by Sara Golemon&lt;/a&gt;. Anyways, this struct is &lt;a href='http://lxr.php.net/xref/PHP_5_4/Zend/zend.h#318'&gt;defined as follows&lt;/a&gt;:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='c'&gt;&lt;span class='k'&gt;struct&lt;/span&gt; &lt;span class='n'&gt;_zval_struct&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='n'&gt;zvalue_value&lt;/span&gt; &lt;span class='n'&gt;value&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;     &lt;span class='c1'&gt;// The value&lt;/span&gt;
    &lt;span class='n'&gt;zend_uint&lt;/span&gt; &lt;span class='n'&gt;refcount__gc&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='c1'&gt;// The number of references to this value (for GC)&lt;/span&gt;
    &lt;span class='n'&gt;zend_uchar&lt;/span&gt; &lt;span class='n'&gt;type&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;        &lt;span class='c1'&gt;// The type&lt;/span&gt;
    &lt;span class='n'&gt;zend_uchar&lt;/span&gt; &lt;span class='n'&gt;is_ref__gc&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;  &lt;span class='c1'&gt;// Whether this value is a reference (&amp;amp;)&lt;/span&gt;
&lt;span class='p'&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The size of a struct is determined by the sum of the sizes of its components: The &lt;code&gt;zvalue_value&lt;/code&gt; is 16 bytes (as computed above), the &lt;code&gt;zend_uint&lt;/code&gt; is 4 bytes and the &lt;code&gt;zend_uchar&lt;/code&gt;s are 1 byte each. That&amp;#8217;s a total of 22 bytes. Again due to memory alignment the real size will be 24 bytes though.&lt;/p&gt;
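
&lt;p&gt;To see where the two padding bytes go, one can print the member offsets. This is only a sketch under the same 64-bit assumption; &lt;code&gt;demo_value&lt;/code&gt; and &lt;code&gt;demo_zval&lt;/code&gt; are hypothetical stand-ins that mirror the member order shown above using plain C types:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='c'&gt;#include &amp;lt;stddef.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;

/* Stand-in for the 16-byte value union (hypothetical, 64-bit build assumed). */
typedef union demo_value {
    long lval;
    double dval;
    struct { char *val; int len; } str;
} demo_value;

/* Same member order as _zval_struct, with plain C types. */
typedef struct demo_zval {
    demo_value value;
    unsigned int refcount;
    unsigned char type;
    unsigned char is_ref;
} demo_zval;

int main(void) {
    printf(&amp;quot;value    at offset %zu\n&amp;quot;, offsetof(demo_zval, value));
    printf(&amp;quot;refcount at offset %zu\n&amp;quot;, offsetof(demo_zval, refcount));
    printf(&amp;quot;type     at offset %zu\n&amp;quot;, offsetof(demo_zval, type));
    printf(&amp;quot;is_ref   at offset %zu\n&amp;quot;, offsetof(demo_zval, is_ref));
    /* The compiler pads the struct up to a multiple of its alignment,
       which is 8 bytes here because of the pointer inside the union. */
    printf(&amp;quot;sizeof(demo_zval) = %zu\n&amp;quot;, sizeof(demo_zval));
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The last member ends at byte 22, and the trailing padding rounds the struct up to the 24 bytes mentioned above.&lt;/p&gt;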

&lt;p&gt;So if we store 100000 elements of 24 bytes each that would be 2400000 bytes in total, which is 2.29 MB. The gap is closing, but the real value is still more than six times larger.&lt;/p&gt;

&lt;h2 id='the_cycles_collector_as_of_php_53'&gt;The cycles collector (as of PHP 5.3)&lt;/h2&gt;

&lt;p&gt;PHP 5.3 introduced a new &lt;a href='http://php.net/manual/en/features.gc.collecting-cycles.php'&gt;garbage collector for cyclic references&lt;/a&gt;. For this to work PHP has to store some additional data. I don&amp;#8217;t want to explain how the algorithm works here; you can read up on that on the linked page from the manual. Important for our size calculations is that PHP will wrap every &lt;code&gt;zval&lt;/code&gt; into a &lt;a href='http://lxr.php.net/opengrok/xref/PHP_5_4/Zend/zend_gc.h#zval_gc_info'&gt;&lt;code&gt;zval_gc_info&lt;/code&gt;&lt;/a&gt;:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='c'&gt;&lt;span class='k'&gt;typedef&lt;/span&gt; &lt;span class='k'&gt;struct&lt;/span&gt; &lt;span class='n'&gt;_zval_gc_info&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='n'&gt;zval&lt;/span&gt; &lt;span class='n'&gt;z&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='k'&gt;union&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='n'&gt;gc_root_buffer&lt;/span&gt;       &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;buffered&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='k'&gt;struct&lt;/span&gt; &lt;span class='n'&gt;_zval_gc_info&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;next&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='n'&gt;u&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='n'&gt;zval_gc_info&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;As you can see Zend only adds a union on top of it, which consists of two pointers. As you hopefully remember the size of a union is the size of its largest component: Both union components are pointers, thus both have a size of 8 bytes. So the size of the union is 8 bytes too.&lt;/p&gt;

&lt;p&gt;If we add that on top of the 24 bytes we already have we get 32 bytes. Multiply that by the 100000 elements and we get a memory usage of 3.05 MB.&lt;/p&gt;

&lt;h2 id='the_zend_mm_allocator'&gt;The Zend MM allocator&lt;/h2&gt;

&lt;p&gt;C, unlike PHP, does not manage memory for you. You need to keep track of your allocations yourself. For this purpose PHP uses a custom memory manager that is optimized specifically for its needs: &lt;a href='http://php.net/manual/en/internals2.memory.php'&gt;The Zend Memory Manager&lt;/a&gt;. The Zend MM is based on &lt;a href='http://g.oswego.edu/dl/html/malloc.html'&gt;Doug Lea&amp;#8217;s malloc&lt;/a&gt; and adds some PHP specific optimizations and features (like memory limit, cleaning up after each request and stuff like that).&lt;/p&gt;

&lt;p&gt;What is important for us here is that the MM adds an allocation header to every allocation done through it. It is &lt;a href='http://lxr.php.net/xref/PHP_5_4/Zend/zend_alloc.c#336'&gt;defined as follows&lt;/a&gt;:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='c'&gt;&lt;span class='k'&gt;typedef&lt;/span&gt; &lt;span class='k'&gt;struct&lt;/span&gt; &lt;span class='n'&gt;_zend_mm_block&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='n'&gt;zend_mm_block_info&lt;/span&gt; &lt;span class='n'&gt;info&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='cp'&gt;#if ZEND_DEBUG&lt;/span&gt;
    &lt;span class='kt'&gt;unsigned&lt;/span&gt; &lt;span class='kt'&gt;int&lt;/span&gt; &lt;span class='n'&gt;magic&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='cp'&gt;# ifdef ZTS&lt;/span&gt;
    &lt;span class='n'&gt;THREAD_T&lt;/span&gt; &lt;span class='n'&gt;thread_id&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='cp'&gt;# endif&lt;/span&gt;
    &lt;span class='n'&gt;zend_mm_debug_info&lt;/span&gt; &lt;span class='n'&gt;debug&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='cp'&gt;#elif ZEND_MM_HEAP_PROTECTION&lt;/span&gt;
    &lt;span class='n'&gt;zend_mm_debug_info&lt;/span&gt; &lt;span class='n'&gt;debug&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='cp'&gt;#endif&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='n'&gt;zend_mm_block&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

&lt;span class='k'&gt;typedef&lt;/span&gt; &lt;span class='k'&gt;struct&lt;/span&gt; &lt;span class='n'&gt;_zend_mm_block_info&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
&lt;span class='cp'&gt;#if ZEND_MM_COOKIES&lt;/span&gt;
    &lt;span class='kt'&gt;size_t&lt;/span&gt; &lt;span class='n'&gt;_cookie&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='cp'&gt;#endif&lt;/span&gt;
    &lt;span class='kt'&gt;size_t&lt;/span&gt; &lt;span class='n'&gt;_size&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='c1'&gt;// size of the allocation&lt;/span&gt;
    &lt;span class='kt'&gt;size_t&lt;/span&gt; &lt;span class='n'&gt;_prev&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='c1'&gt;// previous block (not sure what exactly this is)&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='n'&gt;zend_mm_block_info&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;As you can see the definitions are cluttered with lots of compile option checks. So if one of those options is set the allocation header will be bigger and will be largest if you build PHP with heap protection, multi-threading, debug and MM cookies.&lt;/p&gt;

&lt;p&gt;For this example though we will assume that all those options are disabled. In that case the only things left are the two &lt;code&gt;size_t&lt;/code&gt;s &lt;code&gt;_size&lt;/code&gt; and &lt;code&gt;_prev&lt;/code&gt;. A &lt;code&gt;size_t&lt;/code&gt; has 8 bytes (on 64 bit), so the allocation header has a total size of 16 bytes - and that header is added on &lt;em&gt;every&lt;/em&gt; allocation.&lt;/p&gt;

&lt;p&gt;So now we need to adjust our &lt;code&gt;zval&lt;/code&gt; size again. In reality it isn&amp;#8217;t 32 bytes, but it&amp;#8217;s 48, due to that allocation header. Multiplied by our 100000 elements that&amp;#8217;s 4.58 MB. The real value is 13.97 MB, so we already got approximately a third covered.&lt;/p&gt;
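
&lt;p&gt;The running total so far can be reproduced with a few lines of C. This is a rough sketch, again assuming a 64-bit build with all debug options off; &lt;code&gt;demo_block_header&lt;/code&gt; is a hypothetical stand-in that keeps only the two &lt;code&gt;size_t&lt;/code&gt; fields, and the 32-byte payload is simply the figure computed in the previous section:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='c'&gt;#include &amp;lt;stddef.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;

/* Only the two fields that remain when all compile options are disabled. */
typedef struct demo_block_header {
    size_t size;
    size_t prev;
} demo_block_header;

int main(void) {
    size_t payload = 32;      /* zval plus cyclic GC info, taken from the text */
    size_t elements = 100000;
    size_t per_value = sizeof(demo_block_header) + payload;

    printf(&amp;quot;header:    %zu bytes\n&amp;quot;, sizeof(demo_block_header));
    printf(&amp;quot;per value: %zu bytes\n&amp;quot;, per_value);
    printf(&amp;quot;total:     %.2f MB\n&amp;quot;,
           (double)(per_value * elements) / (1024.0 * 1024.0));
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The MB figure divides by 1024 * 1024, which is the convention used for all the MB numbers in this post.&lt;/p&gt;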

&lt;h2 id='buckets'&gt;Buckets&lt;/h2&gt;

&lt;p&gt;Until now we have only considered single values. But array structures in PHP also take up lots of space: &amp;#8220;Array&amp;#8221; actually is a badly chosen term here. PHP arrays are hash tables / dictionaries in reality. So how do hash tables work? Basically for every key a hash is generated and that hash is used as an offset into a &amp;#8220;real&amp;#8221; C array. As the hashes can clash, all elements that have the same hash are stored in a linked list. When accessing an element PHP first computes the hash, looks for the right bucket and then traverses the linked list, comparing the exact key, element by element. A bucket is defined as follows (see &lt;a href='http://lxr.php.net/opengrok/xref/PHP_5_4/Zend/zend_hash.h#54'&gt;&lt;code&gt;zend_hash.h#54&lt;/code&gt;&lt;/a&gt;):&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='c'&gt;&lt;span class='k'&gt;typedef&lt;/span&gt; &lt;span class='k'&gt;struct&lt;/span&gt; &lt;span class='n'&gt;bucket&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='n'&gt;ulong&lt;/span&gt; &lt;span class='n'&gt;h&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;                  &lt;span class='c1'&gt;// The hash (or for int keys the key)&lt;/span&gt;
    &lt;span class='n'&gt;uint&lt;/span&gt; &lt;span class='n'&gt;nKeyLength&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;          &lt;span class='c1'&gt;// The length of the key (for string keys)&lt;/span&gt;
    &lt;span class='kt'&gt;void&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;pData&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;              &lt;span class='c1'&gt;// The actual data&lt;/span&gt;
    &lt;span class='kt'&gt;void&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;pDataPtr&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;           &lt;span class='c1'&gt;// ??? What&amp;#39;s this ???&lt;/span&gt;
    &lt;span class='k'&gt;struct&lt;/span&gt; &lt;span class='n'&gt;bucket&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;pListNext&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='c1'&gt;// PHP arrays are ordered. This gives the next element in that order&lt;/span&gt;
    &lt;span class='k'&gt;struct&lt;/span&gt; &lt;span class='n'&gt;bucket&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;pListLast&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='c1'&gt;// and this gives the previous element&lt;/span&gt;
    &lt;span class='k'&gt;struct&lt;/span&gt; &lt;span class='n'&gt;bucket&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;pNext&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;     &lt;span class='c1'&gt;// The next element in this (doubly) linked list&lt;/span&gt;
    &lt;span class='k'&gt;struct&lt;/span&gt; &lt;span class='n'&gt;bucket&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;pLast&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;     &lt;span class='c1'&gt;// The previous element in this (doubly) linked list&lt;/span&gt;
    &lt;span class='k'&gt;const&lt;/span&gt; &lt;span class='kt'&gt;char&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;arKey&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;        &lt;span class='c1'&gt;// The key (for string keys)&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='n'&gt;Bucket&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;As you can see one needs to store loads of data to get the kind of abstract array data structure that PHP uses (PHP arrays are arrays, dictionaries and linked lists at the same time, that sure needs much info). The sizes of the individual components are 8 bytes for the unsigned long, 4 bytes for the unsigned int and 7 times 8 bytes for the pointers. That&amp;#8217;s a total of 68. Add alignment and you get 72 bytes.&lt;/p&gt;

&lt;p&gt;Buckets, like zvals, need to be allocated on the heap, so we need to add the 16 bytes for the allocation header again, giving us 88 bytes. Also we need to store pointers to those buckets in the &amp;#8220;real&amp;#8221; C array (&lt;code&gt;Bucket **arBuckets;&lt;/code&gt;) I mentioned above, which adds another 8 bytes per element. So all in all every bucket needs 96 bytes of storage.&lt;/p&gt;
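
&lt;p&gt;The bucket numbers can be checked the same way. The sketch below assumes a 64-bit build; &lt;code&gt;demo_bucket&lt;/code&gt; is a hypothetical copy of the member layout shown above with plain C types, and the header size is taken as two &lt;code&gt;size_t&lt;/code&gt;s as in the allocator section:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='c'&gt;#include &amp;lt;stddef.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;

/* Same member order as the Bucket struct, with plain C types. */
typedef struct demo_bucket {
    unsigned long h;
    unsigned int nKeyLength;
    void *pData;
    void *pDataPtr;
    struct demo_bucket *pListNext;
    struct demo_bucket *pListLast;
    struct demo_bucket *pNext;
    struct demo_bucket *pLast;
    const char *arKey;
} demo_bucket;

int main(void) {
    size_t header = 2 * sizeof(size_t);    /* allocation header */
    size_t slot = sizeof(demo_bucket *);   /* entry in the bucket pointer array */

    printf(&amp;quot;bucket struct: %zu bytes\n&amp;quot;, sizeof(demo_bucket));
    printf(&amp;quot;per element:   %zu bytes\n&amp;quot;, sizeof(demo_bucket) + header + slot);
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The four bytes of alignment padding sit right after &lt;code&gt;nKeyLength&lt;/code&gt;, because the pointer that follows it has to start on an 8-byte boundary.&lt;/p&gt;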

&lt;p&gt;So if we need a bucket for every value, that&amp;#8217;s 96 bytes for the bucket and 48 bytes for the zval, which is 144 bytes in total. For 100000 elements that&amp;#8217;s 14400000 bytes aka 13.73 MB.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Mystery solved.&lt;/em&gt;&lt;/p&gt;

&lt;h2 id='wait_theres_another_024_mb_left'&gt;Wait, there&amp;#8217;s another 0.24 MB left!&lt;/h2&gt;

&lt;p&gt;Those last 0.24 MB are due to uninitialized buckets: The size of the real C array storing the buckets should ideally be approximately the same as the number of array elements stored. This way you have the least number of collisions (unless you want to waste lots of memory.) But PHP obviously can&amp;#8217;t reallocate the whole array every time an element is added - that would be &lt;em&gt;reeeally&lt;/em&gt; slow. Instead PHP always doubles the size of the internal bucket array if it hits the limit. So the size of the array is always a power of two.&lt;/p&gt;

&lt;p&gt;In our case it is &lt;code&gt;2^17 = 131072&lt;/code&gt;. But we need only 100000 of those buckets, so we are leaving 31072 buckets unused. Those buckets will not be allocated (so we don&amp;#8217;t need to spend the full 96 bytes), but the memory for the bucket pointer (the one stored in the internal bucket array) still needs to be allocated. So we additionally use 8 bytes (a pointer) * 31072 elements. This is 248576 bytes or 0.24 MB. That matches the missing memory. (Sure, there are still a few bytes missing, but I don&amp;#8217;t really want to cover those here. They are things like the hash table structure itself, variables, etc.)&lt;/p&gt;
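
&lt;p&gt;The doubling rule and the resulting overhead can be sketched as follows. The function name &lt;code&gt;table_size_for&lt;/code&gt; and the starting size of 8 are assumptions made for this illustration, not taken from the Zend source, and the 8 bytes per pointer again assume a 64-bit build:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='c'&gt;#include &amp;lt;stdio.h&amp;gt;

/* Smallest power of two that is at least n, found by repeated doubling.
   The starting size of 8 is an assumption made for this sketch. */
static unsigned long table_size_for(unsigned long n) {
    unsigned long size = 8;
    while (size &amp;lt; n) {
        size *= 2;
    }
    return size;
}

int main(void) {
    unsigned long elements = 100000;
    unsigned long slots = table_size_for(elements);
    unsigned long unused = slots - elements;

    printf(&amp;quot;table size:   %lu slots\n&amp;quot;, slots);
    printf(&amp;quot;unused slots: %lu\n&amp;quot;, unused);
    printf(&amp;quot;wasted bytes: %lu\n&amp;quot;, unused * 8);  /* 8 bytes per pointer */
    return 0;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The waste is largest just after a doubling: an array with 65537 elements would already occupy the same 131072 pointer slots.&lt;/p&gt;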

&lt;p&gt;Mystery &lt;em&gt;really&lt;/em&gt; solved.&lt;/p&gt;

&lt;h2 id='what_does_this_tell_us'&gt;What does this tell us?&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;PHP ain&amp;#8217;t C&lt;/strong&gt;. That&amp;#8217;s all this should tell us. You can&amp;#8217;t expect that a super dynamic language like PHP has the same highly efficient memory usage that C has. You just can&amp;#8217;t.&lt;/p&gt;

&lt;p&gt;But if you do want to save memory you could consider using an &lt;a href='http://php.net/SplFixedArray'&gt;&lt;code&gt;SplFixedArray&lt;/code&gt;&lt;/a&gt; for large, static arrays.&lt;/p&gt;

&lt;p&gt;Have a look at this modified script:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='nv'&gt;$startMemory&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nb'&gt;memory_get_usage&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
&lt;span class='nv'&gt;$array&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;SplFixedArray&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;100000&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='k'&gt;for&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$i&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='nv'&gt;$i&lt;/span&gt; &lt;span class='o'&gt;&amp;lt;&lt;/span&gt; &lt;span class='mi'&gt;100000&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='o'&gt;++&lt;/span&gt;&lt;span class='nv'&gt;$i&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='nv'&gt;$array&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nv'&gt;$i&lt;/span&gt;&lt;span class='p'&gt;]&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$i&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='k'&gt;echo&lt;/span&gt; &lt;span class='nb'&gt;memory_get_usage&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='o'&gt;-&lt;/span&gt; &lt;span class='nv'&gt;$startMemory&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39; bytes&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;It basically does the same thing, &lt;a href='http://codepad.viper-7.com/mznxIH'&gt;but if you run it&lt;/a&gt;, you&amp;#8217;ll notice that it uses &amp;#8220;only&amp;#8221; 5600640 bytes. That&amp;#8217;s 56 bytes per element and thus much less than the 144 bytes per element a normal array uses. This is because a fixed array doesn&amp;#8217;t need the bucket structure: So it only requires one zval (48 bytes) and one pointer (8 bytes) for each element, giving us the observed 56 bytes.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>PCRE and newlines</title>
      <link>http://nikic.github.com/2011/12/10/PCRE-and-newlines.html</link>
      <pubDate>Sat, 10 Dec 2011 00:00:00 -0800</pubDate>
      <author>nikic@php.net (NikiC)</author>
      <guid>http://nikic.github.com/2011/12/10/PCRE-and-newlines</guid>
      <description>&lt;p&gt;PCRE (this is what you use when you do a &lt;code&gt;preg_*&lt;/code&gt; call in PHP) has a plethora of newline related escape sequences and options, but only a few people know about them. I want to shed light on some of those options in this post.&lt;/p&gt;

&lt;h2 id='__and_'&gt;&lt;code&gt;\r\n&lt;/code&gt;, &lt;code&gt;\n&lt;/code&gt; and &lt;code&gt;\r&lt;/code&gt;&lt;/h2&gt;

&lt;p&gt;Okay, everyone knows about those escape sequences:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;\r\n&lt;/code&gt; is the CRLF newline Windows uses&lt;/li&gt;

&lt;li&gt;&lt;code&gt;\n&lt;/code&gt; is the LF newline Unix uses&lt;/li&gt;

&lt;li&gt;&lt;code&gt;\r&lt;/code&gt; is the CR newline that old versions of Mac OS used&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A common combination is &lt;code&gt;(?&amp;gt;\r\n|\n|\r)&lt;/code&gt;, which matches any CRLF type newline.&lt;/p&gt;

&lt;h2 id='meet_'&gt;Meet &lt;code&gt;\R&lt;/code&gt;!&lt;/h2&gt;

&lt;p&gt;But apart from &lt;code&gt;\r&lt;/code&gt; and &lt;code&gt;\n&lt;/code&gt; PCRE also has another character group matching newlines: &lt;code&gt;\R&lt;/code&gt;. By default &lt;code&gt;\R&lt;/code&gt; matches Unicode newline sequences, but it can be configured using several options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;/(*BSR_ANYCRLF)\R/&lt;/code&gt;: Matches any CRLF type newline sequence and thus is equivalent to the &lt;code&gt;/(?&amp;gt;\r\n|\n|\r)/&lt;/code&gt; regex&lt;/li&gt;

&lt;li&gt;&lt;code&gt;/\R/&lt;/code&gt;: Matches any Unicode newline sequence that fits in a single byte (loosely speaking the &amp;#8220;ASCII range&amp;#8221;, although &lt;code&gt;\x85&lt;/code&gt; lies outside 7-bit ASCII). I.e. it is equivalent to &lt;code&gt;/(?&amp;gt;\r\n|\n|\r|\f|\x0b|\x85)/&lt;/code&gt;. So in addition to the CRLF type newlines it also matches FF formfeeds (&lt;code&gt;\f&lt;/code&gt;), VT vertical tabs (&lt;code&gt;\x0b&lt;/code&gt;) and the NEL next line character (&lt;code&gt;\x85&lt;/code&gt;).&lt;/li&gt;

&lt;li&gt;&lt;code&gt;/\R/u&lt;/code&gt; (&lt;code&gt;u&lt;/code&gt; means UTF-8 mode): Matches any Unicode newline sequence including newline characters outside the ASCII range, which is equivalent to &lt;code&gt;/(?&amp;gt;\r\n|\n|\r|\f|\x0b|\x85|\x{2028}|\x{2029})/&lt;/code&gt;. This means only two additional characters are added: The LS line separator (&lt;code&gt;\x{2028}&lt;/code&gt;) and the PS paragraph separator (&lt;code&gt;\x{2029}&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note that &lt;code&gt;\R&lt;/code&gt; is not special in character classes. So &lt;code&gt;/[^\R]/u&lt;/code&gt; for example will &lt;em&gt;not&lt;/em&gt; match any non-newline character. Instead it will simply match any character which is not &lt;code&gt;R&lt;/code&gt;.&lt;/p&gt;

&lt;h2 id='the_magic__dot'&gt;The magic &lt;code&gt;.&lt;/code&gt; dot&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;.&lt;/code&gt; is usually said to be &amp;#8220;any character&amp;#8221;. In a default configuration this is not quite true. &lt;code&gt;.&lt;/code&gt; is &amp;#8220;any character &lt;em&gt;which is not a newline&lt;/em&gt;&amp;#8221;. What exactly a &amp;#8220;newline&amp;#8221; means here can also be configured similarly to &lt;code&gt;\R&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;/(*CRLF)./&lt;/code&gt;: &lt;code&gt;.&lt;/code&gt; matches everything apart from CRLF (&lt;code&gt;\r\n&lt;/code&gt;) characters&lt;/li&gt;

&lt;li&gt;&lt;code&gt;/(*LF)./&lt;/code&gt;: &lt;code&gt;.&lt;/code&gt; matches everything apart from LF (&lt;code&gt;\n&lt;/code&gt;) characters&lt;/li&gt;

&lt;li&gt;&lt;code&gt;/(*CR)./&lt;/code&gt;: &lt;code&gt;.&lt;/code&gt; matches everything apart from CR (&lt;code&gt;\r&lt;/code&gt;) characters&lt;/li&gt;

&lt;li&gt;&lt;code&gt;/(*ANYCRLF)./&lt;/code&gt;: &lt;code&gt;.&lt;/code&gt; matches everything apart from CRLF style newlines, i.e. everything apart from CRLF, LF or CR.&lt;/li&gt;

&lt;li&gt;&lt;code&gt;/(*ANY)./&lt;/code&gt;: &lt;code&gt;.&lt;/code&gt; matches everything apart from Unicode newline sequences (ASCII)&lt;/li&gt;

&lt;li&gt;&lt;code&gt;/(*ANY)./u&lt;/code&gt;: &lt;code&gt;.&lt;/code&gt; matches everything apart from Unicode newline sequences (any)&lt;/li&gt;

&lt;li&gt;&lt;code&gt;/./&lt;/code&gt;: What this matches depends on how PCRE was compiled, but the default is to behave as if &lt;code&gt;(*LF)&lt;/code&gt; was specified. I.e. unless you compiled PCRE with some other newline configuration &lt;code&gt;.&lt;/code&gt; will match everything apart from &lt;code&gt;\n&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To sum up, have a look at the following code:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='nb'&gt;var_dump&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;preg_match&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;/^a.+b$/&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;        &lt;span class='s2'&gt;&amp;quot;a&lt;/span&gt;&lt;span class='se'&gt;\r\n&lt;/span&gt;&lt;span class='s2'&gt;b&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;));&lt;/span&gt;  &lt;span class='c1'&gt;// 0 (Newline \n is     contained by \r\n)&lt;/span&gt;
&lt;span class='nb'&gt;var_dump&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;preg_match&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;/^a.+b$/&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;        &lt;span class='s2'&gt;&amp;quot;a&lt;/span&gt;&lt;span class='se'&gt;\n&lt;/span&gt;&lt;span class='s2'&gt;b&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;));&lt;/span&gt;    &lt;span class='c1'&gt;// 0 (Newline \n is     contained by \n)&lt;/span&gt;
&lt;span class='nb'&gt;var_dump&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;preg_match&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;/^a.+b$/&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;        &lt;span class='s2'&gt;&amp;quot;a&lt;/span&gt;&lt;span class='se'&gt;\r&lt;/span&gt;&lt;span class='s2'&gt;b&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;));&lt;/span&gt;    &lt;span class='c1'&gt;// 1 (Newline \n is not contained by \r)&lt;/span&gt;
&lt;span class='nb'&gt;var_dump&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;preg_match&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;/(*CR)^a.+b$/&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;   &lt;span class='s2'&gt;&amp;quot;a&lt;/span&gt;&lt;span class='se'&gt;\n&lt;/span&gt;&lt;span class='s2'&gt;b&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;));&lt;/span&gt;    &lt;span class='c1'&gt;// 1 (Newline \r is not contained by \n)&lt;/span&gt;
&lt;span class='nb'&gt;var_dump&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;preg_match&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;/(*CR)^a.+b$/&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;   &lt;span class='s2'&gt;&amp;quot;a&lt;/span&gt;&lt;span class='se'&gt;\r&lt;/span&gt;&lt;span class='s2'&gt;b&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;));&lt;/span&gt;    &lt;span class='c1'&gt;// 0 (Newline \r is     contained by \r)&lt;/span&gt;
&lt;span class='nv'&gt;$LS&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='se'&gt;\xE2\x80\xA8&lt;/span&gt;&lt;span class='s2'&gt;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='c1'&gt;// Line Separator in UTF-8&lt;/span&gt;
&lt;span class='nb'&gt;var_dump&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;preg_match&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;/(*ANY)^a.+b$/u&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;a&lt;/span&gt;&lt;span class='si'&gt;{&lt;/span&gt;&lt;span class='nv'&gt;$LS&lt;/span&gt;&lt;span class='si'&gt;}&lt;/span&gt;&lt;span class='s2'&gt;b&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;));&lt;/span&gt; &lt;span class='c1'&gt;// 0 (Newline LS is     contained by LS)&lt;/span&gt;
&lt;span class='nb'&gt;var_dump&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;preg_match&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;/(*ANY)^a.+b$/&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;  &lt;span class='s2'&gt;&amp;quot;a&lt;/span&gt;&lt;span class='si'&gt;{&lt;/span&gt;&lt;span class='nv'&gt;$LS&lt;/span&gt;&lt;span class='si'&gt;}&lt;/span&gt;&lt;span class='s2'&gt;b&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;));&lt;/span&gt; &lt;span class='c1'&gt;// 1 (u modifier was not specified, so LS isn&amp;#39;t a newline anymore)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;h2 id='_and_'&gt;&lt;code&gt;PCRE_DOTALL&lt;/code&gt; and &lt;code&gt;\N&lt;/code&gt;&lt;/h2&gt;

&lt;p&gt;PCRE also provides an option which instructs &lt;code&gt;.&lt;/code&gt; to &lt;em&gt;really&lt;/em&gt; match any character. This option is called &lt;code&gt;PCRE_DOTALL&lt;/code&gt; and can be specified using the &lt;code&gt;s&lt;/code&gt; modifier in PHP. So &lt;code&gt;/./s&lt;/code&gt; will match absolutely any character, including newlines.&lt;/p&gt;

&lt;p&gt;But even in &lt;code&gt;DOTALL&lt;/code&gt; mode you can get the behavior of the &amp;#8220;normal&amp;#8221; dot: The &lt;code&gt;\N&lt;/code&gt; escape sequence behaves the same as &lt;code&gt;.&lt;/code&gt;, but is not affected by the &lt;code&gt;s&lt;/code&gt; modifier. So &lt;code&gt;\N&lt;/code&gt; will always match any character which is not a newline (where &amp;#8220;newline&amp;#8221; is again defined by the above options).&lt;/p&gt;

&lt;p&gt;&lt;code&gt;\N&lt;/code&gt;, just like &lt;code&gt;\R&lt;/code&gt;, loses its special meaning within character classes.&lt;/p&gt;
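
&lt;p&gt;A minimal sketch of the difference between &lt;code&gt;.&lt;/code&gt; and &lt;code&gt;\N&lt;/code&gt; in &lt;code&gt;DOTALL&lt;/code&gt; mode, assuming the default &lt;code&gt;LF&lt;/code&gt; newline configuration (the subject strings are made up for illustration):&lt;/p&gt;
&lt;pre&gt;&lt;code class='php'&gt;&amp;lt;?php
// With the s modifier the dot also matches the newline
var_dump(preg_match(&amp;#39;/^a.+b$/s&amp;#39;,  &amp;quot;a\nb&amp;quot;)); // 1
// \N ignores the s modifier and still refuses to match the newline
var_dump(preg_match(&amp;#39;/^a\N+b$/s&amp;#39;, &amp;quot;a\nb&amp;quot;)); // 0
// \N still matches ordinary characters
var_dump(preg_match(&amp;#39;/^a\N+b$/s&amp;#39;, &amp;quot;axb&amp;quot;));  // 1
&lt;/code&gt;&lt;/pre&gt;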

&lt;h2 id='whitespace_character_groups'&gt;Whitespace character groups&lt;/h2&gt;

&lt;p&gt;As a small addendum I would also like to point out what the different whitespace character groups contain, as this isn&amp;#8217;t quite clear to most people:&lt;/p&gt;

&lt;p&gt;The commonly used &lt;code&gt;\s&lt;/code&gt; group matches LF (&lt;code&gt;\n&lt;/code&gt;), CR (&lt;code&gt;\r&lt;/code&gt;), HT (tab), FF (form feed) and space characters. So it does &lt;em&gt;not&lt;/em&gt; contain the VT vertical tab character. The POSIX character group &lt;code&gt;[:space:]&lt;/code&gt; on the other hand includes the vertical tab, too.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;\pZ&lt;/code&gt; Unicode character property for separators does &lt;em&gt;not&lt;/em&gt; contain the &amp;#8220;classic&amp;#8221; newlines. But there are two special, PCRE-specific character properties for that purpose: &lt;code&gt;\p{Xsp}&lt;/code&gt; contains &lt;code&gt;\pZ&lt;/code&gt; as well as HT, LF, CR and FF. &lt;code&gt;\p{Xps}&lt;/code&gt; additionally contains VT.&lt;/p&gt;

&lt;p&gt;Those two Unicode properties are also internally used in &lt;code&gt;UCP&lt;/code&gt; mode (&lt;code&gt;UCP&lt;/code&gt; mode makes the normal &lt;code&gt;\s&lt;/code&gt; style character groups behave like the Unicode character properties). I.e. &lt;code&gt;(*UCP)\s&lt;/code&gt; is equivalent to &lt;code&gt;\p{Xsp}&lt;/code&gt; and &lt;code&gt;(*UCP)[:space:]&lt;/code&gt; is equivalent to &lt;code&gt;\p{Xps}&lt;/code&gt;.&lt;/p&gt;
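
&lt;p&gt;If you want to check how your own installation treats the vertical tab, a small probe like the following may help. It is only a sketch: it assumes PCRE was built with Unicode property support, and I deliberately give no expected output, because the result depends on the PCRE version PHP is linked against (later PCRE releases may behave differently from what is described here).&lt;/p&gt;
&lt;pre&gt;&lt;code class='php'&gt;&amp;lt;?php
$vt = &amp;quot;\x0B&amp;quot;; // VT, the vertical tab character
$patterns = array(&amp;#39;/^\s$/&amp;#39;, &amp;#39;/^[[:space:]]$/&amp;#39;, &amp;#39;/^\p{Xsp}$/&amp;#39;, &amp;#39;/^\p{Xps}$/&amp;#39;);
foreach ($patterns as $pattern) {
    // Prints the pattern followed by 1 (VT matched) or 0 (VT not matched)
    printf(&amp;quot;%-16s %d\n&amp;quot;, $pattern, preg_match($pattern, $vt));
}
&lt;/code&gt;&lt;/pre&gt;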

&lt;p&gt;There are also two more character groups for whitespace matching, namely &lt;code&gt;\v&lt;/code&gt; for vertical whitespace and &lt;code&gt;\h&lt;/code&gt; for horizontal. Contrary to the other &lt;code&gt;\s&lt;/code&gt; style character groups these two match non-ASCII characters in UTF-8 mode even when not in &lt;code&gt;UCP&lt;/code&gt; mode. The reason for this is that they were added only quite late, whereas the others existed pretty much from the beginning. &lt;code&gt;\v&lt;/code&gt; matches CR, LF, VT, FF, NEL, LS, PS. &lt;code&gt;\h&lt;/code&gt; matches HT, space and 17 other horizontal spaces which you wouldn&amp;#8217;t normally know. Those contain things like &amp;#8220;Six-per-em space&amp;#8221; or &amp;#8220;Medium mathematical space&amp;#8221;.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Manually installing PEAR on Windows</title>
      <link>http://nikic.github.com/2011/12/03/Manually-installing-PEAR-on-Windows.html</link>
      <pubDate>Sat, 03 Dec 2011 00:00:00 -0800</pubDate>
      <author>nikic@php.net (NikiC)</author>
      <guid>http://nikic.github.com/2011/12/03/Manually-installing-PEAR-on-Windows</guid>
      <description>&lt;p&gt;So, you tried installing &lt;a href='http://pear.php.net/'&gt;PEAR&lt;/a&gt; on Windows using their &lt;a href='http://pear.php.net/manual/en/installation.getting.php'&gt;&lt;code&gt;go-pear.phar&lt;/code&gt; installer&lt;/a&gt; and it - for some unknown reason - didn&amp;#8217;t work? Then you still have one option left: Installing it by hand. This is slightly more cumbersome than just running an installer, but at least it works.&lt;/p&gt;

&lt;h2 id='download_pear'&gt;Download PEAR&lt;/h2&gt;

&lt;p&gt;First &lt;a href='http://pear.php.net/package/PEAR/download'&gt;download the PEAR package&lt;/a&gt;, extract the &lt;code&gt;.tgz&lt;/code&gt; file (e.g. using &lt;a href='http://www.7-zip.org/'&gt;7-Zip&lt;/a&gt;) so you get a &lt;code&gt;.tar&lt;/code&gt; file. Extract that file again into some folder (e.g. &lt;code&gt;PEAR-files/&lt;/code&gt;). If you enter that folder you should see three entries:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;PEAR-x.y.z/
package.xml
package2.xml&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Go into the &lt;code&gt;PEAR-x.y.z/&lt;/code&gt; folder, where &lt;code&gt;x.y.z&lt;/code&gt; is the PEAR version you downloaded, and copy all the files to the location you want to install PEAR in (I&amp;#8217;ll use the &lt;code&gt;D:\htdocs\PEAR&lt;/code&gt; path in this example).&lt;/p&gt;

&lt;h2 id='write_a_pearbat'&gt;Write a pear.bat&lt;/h2&gt;

&lt;p&gt;Unless you happen to be the rare case where the &lt;code&gt;pear.bat&lt;/code&gt; just works you will now need to create a custom &lt;code&gt;pear.bat&lt;/code&gt; file (it&amp;#8217;s located in the &lt;code&gt;scripts/&lt;/code&gt; directory). Open the file in some text editor (e.g. &lt;a href='http://notepad-plus-plus.org/'&gt;Notepad++&lt;/a&gt;) and insert the following:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@ECHO OFF

REM The directory we want to put everything in
SET &amp;quot;PREFIX=D:\htdocs\PEAR&amp;quot;

REM The PEAR environment variables (using the %PREFIX% defined above)
SET &amp;quot;PHP_PEAR_INSTALL_DIR=%PREFIX%&amp;quot;
SET &amp;quot;PHP_PEAR_SYSCONF_DIR=%PREFIX%&amp;quot;
SET &amp;quot;PHP_PEAR_EXTENSION_DIR=%PREFIX%\ext&amp;quot;
SET &amp;quot;PHP_PEAR_DOC_DIR=%PREFIX%\docs&amp;quot;
SET &amp;quot;PHP_PEAR_BIN_DIR=%PREFIX%\scripts&amp;quot;
SET &amp;quot;PHP_PEAR_DATA_DIR=%PREFIX%\data&amp;quot;
SET &amp;quot;PHP_PEAR_CFG_DIR=%PREFIX%\cfg&amp;quot;
SET &amp;quot;PHP_PEAR_WWW_DIR=%PREFIX%\www&amp;quot;
SET &amp;quot;PHP_PEAR_TEST_DIR=%PREFIX%\tests&amp;quot;
SET &amp;quot;PHP_PEAR_TEMP_DIR=%PREFIX%\temp&amp;quot;

REM The PHP binary to use (here a XAMPP binary, yours might be elsewhere)
SET &amp;quot;PHP_PEAR_PHP_BIN=C:\xampp\php\php.exe&amp;quot;

&amp;quot;%PHP_PEAR_PHP_BIN%&amp;quot; -C -d date.timezone=UTC -d output_buffering=1 -d safe_mode=0 -d open_basedir=&amp;quot;&amp;quot; -d auto_prepend_file=&amp;quot;&amp;quot; -d auto_append_file=&amp;quot;&amp;quot; -d variables_order=EGPCS -d register_argc_argv=&amp;quot;On&amp;quot; -d &amp;quot;include_path=&amp;#39;%PHP_PEAR_INSTALL_DIR%&amp;#39;&amp;quot; -f &amp;quot;%PHP_PEAR_BIN_DIR%\pearcmd.php&amp;quot; -- %1 %2 %3 %4 %5 %6 %7 %8 %9

@ECHO ON&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Change the &lt;code&gt;PREFIX&lt;/code&gt; variable to your PEAR installation directory (i.e. the location you moved the files to). Also change the &lt;code&gt;PHP_PEAR_PHP_BIN&lt;/code&gt; variable to the PHP binary you want to use. You can also adjust the other env variables, though changing some of them will require you to take further actions. E.g. if you want to change the &lt;code&gt;BIN_DIR&lt;/code&gt; to &lt;code&gt;%PREFIX%\bin&lt;/code&gt; you&amp;#8217;ll need to rename the &lt;code&gt;scripts/&lt;/code&gt; folder too.&lt;/p&gt;

&lt;h2 id='download_additional_packages'&gt;Download additional packages&lt;/h2&gt;

&lt;p&gt;If you now open the command line, &lt;code&gt;cd&lt;/code&gt; into the &lt;code&gt;scripts/&lt;/code&gt; dir and run &lt;code&gt;pear&lt;/code&gt; all you will get is a bunch of &lt;code&gt;include&lt;/code&gt; errors. This is because &lt;code&gt;PEAR&lt;/code&gt; requires some more packages to function. You&amp;#8217;ll need to install those manually too:&lt;/p&gt;

&lt;p&gt;Download the &lt;a href='http://pear.php.net/package/Console_Getopt/download'&gt;Console_Getopt&lt;/a&gt; package and extract the &lt;code&gt;.tgz&lt;/code&gt; file just like you extracted the main PEAR package. Now go into the folder that you have extracted the files into. You&amp;#8217;ll find a &lt;code&gt;package.xml&lt;/code&gt; there and a folder named &lt;code&gt;Console_Getopt-x.y.z/&lt;/code&gt;. Enter that folder. It&amp;#8217;ll contain yet another folder named &lt;code&gt;Console/&lt;/code&gt;. Copy this folder into your &lt;code&gt;PEAR&lt;/code&gt; installation (i.e. &lt;code&gt;D:\htdocs\PEAR&lt;/code&gt; in this example).&lt;/p&gt;

&lt;p&gt;If you now run &lt;code&gt;pear&lt;/code&gt; again it should work fine. Run some commands, like &lt;code&gt;pear update-channels&lt;/code&gt; and &lt;code&gt;pear config-set auto_discover 1&lt;/code&gt;. Both should work fine. But if you now run &lt;code&gt;pear install --alldeps pear.phpunit.de/PHPUnit&lt;/code&gt; you&amp;#8217;ll again get a bunch of errors. This is because PEAR requires two further packages: &lt;a href='http://pear.php.net/package/Archive_Tar/download'&gt;Archive_Tar&lt;/a&gt; and &lt;a href='http://pear.php.net/package/Structures_Graph/download'&gt;Structures_Graph&lt;/a&gt;. You can install them in the same manner as the Console_Getopt package. Only difference is that these packages may also contain &lt;code&gt;docs/&lt;/code&gt; and &lt;code&gt;tests/&lt;/code&gt; directories. Just copy them, too.&lt;/p&gt;

&lt;p&gt;If you now try &lt;code&gt;pear install --alldeps pear.phpunit.de/PHPUnit&lt;/code&gt; again everything should work fine. PEAR should have created a &lt;code&gt;PHPUnit/&lt;/code&gt; directory in your installation path and there should be a &lt;code&gt;phpunit.bat&lt;/code&gt; in your &lt;code&gt;scripts/&lt;/code&gt; directory.&lt;/p&gt;

&lt;h2 id='adjusting_your_include_path'&gt;Adjusting your include_path&lt;/h2&gt;

&lt;p&gt;The last step is to add your PEAR installation to the &lt;code&gt;include_path&lt;/code&gt;. Just open your &lt;code&gt;php.ini&lt;/code&gt; and find the line specifying the &lt;code&gt;include_path&lt;/code&gt;. Replace it with:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;include_path = &amp;quot;.;D:\htdocs\PEAR&amp;quot; ; or whatever your install dir might be&lt;/code&gt;&lt;/pre&gt;
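
&lt;p&gt;To check that the new &lt;code&gt;include_path&lt;/code&gt; is picked up, you could run a tiny script like the one below. This is just a sketch: it assumes that &lt;code&gt;PEAR.php&lt;/code&gt; sits directly in your installation directory and that the script is run with the same &lt;code&gt;php.ini&lt;/code&gt; you just edited.&lt;/p&gt;
&lt;pre&gt;&lt;code class='php'&gt;&amp;lt;?php
// Resolves PEAR.php against the configured include_path:
// dumps the full path on success and bool(false) if it cannot be found
var_dump(stream_resolve_include_path(&amp;#39;PEAR.php&amp;#39;));
&lt;/code&gt;&lt;/pre&gt;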

&lt;p&gt;Now you&amp;#8217;re done :)&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>PHP internals: When does foreach copy?</title>
      <link>http://nikic.github.com/2011/11/11/PHP-Internals-When-does-foreach-copy.html</link>
      <pubDate>Fri, 11 Nov 2011 00:00:00 -0800</pubDate>
      <author>nikic@php.net (NikiC)</author>
      <guid>http://nikic.github.com/2011/11/11/PHP-Internals-When-does-foreach-copy</guid>
      <description>&lt;p&gt;&lt;strong&gt;Important&lt;/strong&gt;: This post requires some knowledge of PHP&amp;#8217;s internal workings, namely &lt;code&gt;zval&lt;/code&gt;s, &lt;code&gt;refcount&lt;/code&gt;s and copy-on-write behavior. If those terms don&amp;#8217;t mean anything to you, you will need to read up on them in order to understand this post. I would recommend &lt;a href='http://blog.golemon.com/2007/01/youre-being-lied-to.html'&gt;an article by Sara Golemon&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;PHP&amp;#8217;s &lt;code&gt;foreach&lt;/code&gt; is a very neat and to-the-point language construct. Still some people don&amp;#8217;t like to use it, because they think it is slow. One reason usually named is that &lt;code&gt;foreach&lt;/code&gt; copies the array it iterates. Thus some people recommend writing:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='nv'&gt;$keys&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nb'&gt;array_keys&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$array&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='nv'&gt;$size&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nb'&gt;count&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$array&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='k'&gt;for&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$i&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='nv'&gt;$i&lt;/span&gt; &lt;span class='o'&gt;&amp;lt;&lt;/span&gt; &lt;span class='nv'&gt;$size&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='nv'&gt;$i&lt;/span&gt;&lt;span class='o'&gt;++&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='nv'&gt;$key&lt;/span&gt;   &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$keys&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nv'&gt;$i&lt;/span&gt;&lt;span class='p'&gt;];&lt;/span&gt;
    &lt;span class='nv'&gt;$value&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nv'&gt;$array&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nv'&gt;$key&lt;/span&gt;&lt;span class='p'&gt;];&lt;/span&gt;

    &lt;span class='c1'&gt;// ...&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Instead of the much more intuitive and straightforward:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='k'&gt;foreach&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$array&lt;/span&gt; &lt;span class='k'&gt;as&lt;/span&gt; &lt;span class='nv'&gt;$key&lt;/span&gt; &lt;span class='o'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='nv'&gt;$value&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='c1'&gt;// ...&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;There are two problems with this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href='http://blog.ircmaxell.com/2011/08/on-optimization-in-php.html'&gt;Microoptimization is evil&lt;/a&gt;. Usually it only wastes your time and doesn&amp;#8217;t give any measurable performance improvements.&lt;/li&gt;

&lt;li&gt;The copying behavior of &lt;code&gt;foreach&lt;/code&gt; is somewhat more complicated than most people think. Often the &amp;#8220;optimized&amp;#8221; variant happens to be slower than the original.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id='when_does_foreach_copy'&gt;When does foreach copy?&lt;/h2&gt;

&lt;p&gt;Whether or not &lt;code&gt;foreach&lt;/code&gt; copies the array and how much of it depends on three things: Whether the iterated array is referenced, how high its &lt;code&gt;refcount&lt;/code&gt; is and whether the iteration is done by reference.&lt;/p&gt;

&lt;h3 id='not_referenced_refcount__1'&gt;Not referenced, refcount == 1&lt;/h3&gt;

&lt;p&gt;In the below code &lt;code&gt;$array&lt;/code&gt; is not referenced and has a refcount of &lt;code&gt;1&lt;/code&gt;. In this case &lt;code&gt;foreach&lt;/code&gt; will &lt;strong&gt;not&lt;/strong&gt; copy the array (&lt;a href='http://codepad.viper-7.com/MFNrmk'&gt;proof&lt;/a&gt;) - contrary to the popular belief that &lt;code&gt;foreach&lt;/code&gt; always copies the iterated array if it isn&amp;#8217;t referenced.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='nx'&gt;test&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;
&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;test&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='nv'&gt;$array&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nb'&gt;range&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;100000&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='k'&gt;foreach&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$array&lt;/span&gt; &lt;span class='k'&gt;as&lt;/span&gt; &lt;span class='nv'&gt;$key&lt;/span&gt; &lt;span class='o'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='nv'&gt;$value&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='c1'&gt;// ...&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The reason is simple: Why should it? The only thing that &lt;code&gt;foreach&lt;/code&gt; modifies about &lt;code&gt;$array&lt;/code&gt; is its internal array pointer. This is expected behavior and thus doesn&amp;#8217;t need to be prevented.&lt;/p&gt;

&lt;h3 id='not_referenced_refcount__1'&gt;Not referenced, refcount &amp;gt; 1&lt;/h3&gt;

&lt;p&gt;The following code looks very similar to the previous one. The only difference is that the array is now passed as an argument. This seems like an insignificant difference, but it does change the behavior of &lt;code&gt;foreach&lt;/code&gt;: It now &lt;strong&gt;will&lt;/strong&gt; copy the array structure, but &lt;strong&gt;not&lt;/strong&gt; the values (&lt;a href='http://codepad.viper-7.com/mixX1H'&gt;proof&lt;/a&gt;; if you want to see that this is really only the structure being copied compare &lt;a href='http://codepad.viper-7.com/nNX1Rt'&gt;this&lt;/a&gt; and &lt;a href='http://codepad.viper-7.com/lQC1K6'&gt;that&lt;/a&gt; script. The first only copies the structure, the second copies both).&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='nv'&gt;$array&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nb'&gt;range&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;100000&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='nx'&gt;test&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$array&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;test&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$array&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;foreach&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$array&lt;/span&gt; &lt;span class='k'&gt;as&lt;/span&gt; &lt;span class='nv'&gt;$key&lt;/span&gt; &lt;span class='o'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='nv'&gt;$value&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='c1'&gt;// ...&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;This might seem odd at first: Why would it copy when the array is passed through an argument, but not if it is defined in the function? The reason is that the array &lt;code&gt;zval&lt;/code&gt; is now shared between multiple variables: The &lt;code&gt;$array&lt;/code&gt; variable outside the function and the &lt;code&gt;$array&lt;/code&gt; variable inside it. If &lt;code&gt;foreach&lt;/code&gt; iterated the array without copying its structure, it would not only change the array pointer of the &lt;code&gt;$array&lt;/code&gt; variable in the function, but also the pointer of the &lt;code&gt;$array&lt;/code&gt; variable outside the function. Thus &lt;code&gt;foreach&lt;/code&gt; needs to copy the array structure (i.e. the hash table). The values on the other hand can still share zvals and thus don&amp;#8217;t need to be copied.&lt;/p&gt;
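
&lt;p&gt;One way to observe this yourself is to compare &lt;code&gt;memory_get_usage()&lt;/code&gt; right before the loop and inside its first iteration. The following is only a sketch with a hypothetical helper name, not necessarily what the linked scripts do. A jump roughly the size of the hash table suggests that the structure was copied, while a few bytes for the loop variables suggest that it was not. The exact numbers depend on the PHP version and build, so I give no expected output here.&lt;/p&gt;
&lt;pre&gt;&lt;code class='php'&gt;&amp;lt;?php
// Hypothetical helper: returns how much memory usage grew once the loop has started
function memoryGrowthDuringForeach($array) {
    $during = 0;
    $before = memory_get_usage();
    foreach ($array as $key =&amp;gt; $value) {
        $during = memory_get_usage();
        break; // a copy (if any) has already been made at this point
    }
    return $during - $before;
}

$array = range(0, 100000);
echo memoryGrowthDuringForeach($array), &amp;quot; bytes\n&amp;quot;;
&lt;/code&gt;&lt;/pre&gt;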

&lt;h3 id='referenced'&gt;Referenced&lt;/h3&gt;

&lt;p&gt;The next case is very similar to the previous one. The only difference is that the array is passed by reference. In this case the array again will &lt;strong&gt;not&lt;/strong&gt; be copied (&lt;a href='http://codepad.viper-7.com/9bKciB'&gt;proof&lt;/a&gt;).&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='nv'&gt;$array&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nb'&gt;range&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;100000&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='nx'&gt;test&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$array&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;test&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='nv'&gt;$array&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;foreach&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$array&lt;/span&gt; &lt;span class='k'&gt;as&lt;/span&gt; &lt;span class='nv'&gt;$key&lt;/span&gt; &lt;span class='o'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='nv'&gt;$value&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='c1'&gt;// ...&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;In this case the same reasoning applies as with the previous case: The outer &lt;code&gt;$array&lt;/code&gt; and the inner &lt;code&gt;$array&lt;/code&gt; share &lt;code&gt;zval&lt;/code&gt;s. The difference is that they now are references (&lt;code&gt;isref == 1&lt;/code&gt;). Thus in this case it is expected that any change to the inner array will also be done to the outer array. So if the array pointer of the inner array is changed, the array pointer of the outer array should change, too. That&amp;#8217;s why &lt;code&gt;foreach&lt;/code&gt; doesn&amp;#8217;t need to copy.&lt;/p&gt;

&lt;h3 id='iterated_by_reference'&gt;Iterated by reference&lt;/h3&gt;

&lt;p&gt;The above examples were all iterating by value. For iteration by reference the same rules apply, but the additional value reference changes copying behavior of the array &lt;em&gt;values&lt;/em&gt; (the behavior about &lt;em&gt;structure&lt;/em&gt; copying stays).&lt;/p&gt;

&lt;p&gt;The case &amp;#8220;Not referenced, refcount == 1&amp;#8221; doesn&amp;#8217;t change. By reference iteration means we want to change the original array if there are any changes to the &lt;code&gt;$value&lt;/code&gt;, so the array isn&amp;#8217;t copied (&lt;a href='http://codepad.viper-7.com/VkqVQ1'&gt;proof&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;The case &amp;#8220;Referenced&amp;#8221; also stays the same, as in this case a change to &lt;code&gt;$value&lt;/code&gt; should change all variables referencing the iterated array (&lt;a href='http://codepad.viper-7.com/qKMkDo'&gt;proof&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Only the &amp;#8220;Not referenced, refcount &amp;gt; 1&amp;#8221; case changes, as now both the array structure and its values need to be copied. The array structure because otherwise the array pointer of the &lt;code&gt;$array&lt;/code&gt; variable outside the function would change and the values because a change to &lt;code&gt;$value&lt;/code&gt; would also change the outside &lt;code&gt;$array&lt;/code&gt; values (&lt;a href='http://codepad.viper-7.com/O584we'&gt;proof&lt;/a&gt;).&lt;/p&gt;
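
&lt;p&gt;The visible effect of that copy can be sketched as follows (the function name &lt;code&gt;zeroOut&lt;/code&gt; is made up for this example). The array is passed by value but iterated by reference, so the writes must only land in the function&amp;#8217;s own copy:&lt;/p&gt;
&lt;pre&gt;&lt;code class='php'&gt;&amp;lt;?php
function zeroOut($array) {            // passed by value: shared with the caller
    foreach ($array as &amp;amp;$value) { // iterated by reference
        $value = 0;
    }
    unset($value);                    // drop the dangling reference to the last element
    return $array;
}

$array  = array(1, 2, 3);
$zeroed = zeroOut($array);
var_dump($array  === array(1, 2, 3)); // bool(true): the caller&amp;#39;s array is untouched
var_dump($zeroed === array(0, 0, 0)); // bool(true): only the local copy was changed
&lt;/code&gt;&lt;/pre&gt;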

&lt;h2 id='summary'&gt;Summary&lt;/h2&gt;

&lt;p&gt;To summarize:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;foreach&lt;/code&gt; will copy the array &lt;em&gt;structure&lt;/em&gt; if and only if the iterated array is not referenced and has a &lt;code&gt;refcount &amp;gt; 1&lt;/code&gt;&lt;/li&gt;

&lt;li&gt;&lt;code&gt;foreach&lt;/code&gt; will additionally copy the array &lt;em&gt;values&lt;/em&gt; if and only if the previous point applies and the iteration is done by reference&lt;/li&gt;
&lt;/ul&gt;</description>
    </item>
    
    <item>
      <title>Improving lexing performance in PHP</title>
      <link>http://nikic.github.com/2011/10/23/Improving-lexing-performance-in-PHP.html</link>
      <pubDate>Sun, 23 Oct 2011 00:00:00 -0700</pubDate>
      <author>nikic@php.net (NikiC)</author>
      <guid>http://nikic.github.com/2011/10/23/Improving-lexing-performance-in-PHP</guid>
      <description>&lt;p&gt;&lt;a href='http://en.wikipedia.org/wiki/Lexical_analysis'&gt;Lexing&lt;/a&gt; isn&amp;#8217;t something you would normally do in PHP, simply because other languages like C can easily outperform PHP by several orders of magnitude. Still it sometimes is desirable to write lexers in PHP, e.g. for tokenizing templates, doccomments and other &lt;a href='http://en.wikipedia.org/wiki/Domain-specific_language'&gt;DSLs&lt;/a&gt;. So I want to share some thoughts on how to write fast lexers in PHP.&lt;/p&gt;

&lt;h2 id='lexing_csv_files'&gt;Lexing CSV files&lt;/h2&gt;

&lt;p&gt;PHP already provides functions for parsing CSV &lt;a href='http://php.net/fgetcsv'&gt;files&lt;/a&gt; and &lt;a href='http://php.net/str_getcsv'&gt;strings&lt;/a&gt;. I will use CSV lexing as an example anyway, simply because it&amp;#8217;s so simple. Here is an example of a CSV line (as PHP implements it):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Field,Another Field,&amp;quot;comma -&amp;gt; , &amp;lt;- comma&amp;quot;,&amp;quot;quote -&amp;gt; \&amp;quot; &amp;lt;- quote&amp;quot;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If we define the tokens in terms of regular expressions, they would look roughly like this:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='nv'&gt;$tokenMap&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;array&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;
    &lt;span class='s1'&gt;&amp;#39;~[^&amp;quot;,\r\n]+~A&amp;#39;&lt;/span&gt;                     &lt;span class='o'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='nx'&gt;T_PLAIN_FIELD&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
    &lt;span class='s1'&gt;&amp;#39;~&amp;quot;[^&amp;quot;\\\\]*(?:\\\\.[^&amp;quot;\\\\]*)*&amp;quot;~A&amp;#39;&lt;/span&gt; &lt;span class='o'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='nx'&gt;T_QUOTED_FIELD&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
    &lt;span class='s1'&gt;&amp;#39;~,~A&amp;#39;&lt;/span&gt;                              &lt;span class='o'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='nx'&gt;T_FIELD_SEPARATOR&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
    &lt;span class='s1'&gt;&amp;#39;~\r\n?|\n~A&amp;#39;&lt;/span&gt;                       &lt;span class='o'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='nx'&gt;T_LINE_SEPARATOR&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
&lt;span class='p'&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
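&lt;p&gt;Note the &lt;code&gt;A&lt;/code&gt; modifier on every regex: it anchors the match at the offset passed to &lt;code&gt;preg_match&lt;/code&gt;, so a token either starts exactly there or doesn&amp;#8217;t match at all. A small sketch with a made-up input line:&lt;/p&gt;
&lt;pre&gt;&lt;code class='php'&gt;&amp;lt;?php
$line = &amp;#39;Field,Another Field&amp;#39;;
// Offset 6 is the first character after &amp;quot;Field,&amp;quot;
var_dump(preg_match(&amp;#39;~[^&amp;quot;,\r\n]+~A&amp;#39;, $line, $matches, 0, 6)); // 1
var_dump($matches[0]);                                        // &amp;quot;Another Field&amp;quot;
// Offset 5 is the comma: the anchored regex may not skip ahead, so it fails
var_dump(preg_match(&amp;#39;~[^&amp;quot;,\r\n]+~A&amp;#39;, $line, $matches, 0, 5)); // 0
&lt;/code&gt;&lt;/pre&gt;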
&lt;h2 id='looping_through_the_regexes'&gt;Looping through the regexes&lt;/h2&gt;

&lt;p&gt;The most obvious approach to lexing in PHP is to simply loop through the regexes and match them against the current position, until there are no characters left:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;lex&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$string&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='k'&gt;array&lt;/span&gt; &lt;span class='nv'&gt;$tokenMap&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='nv'&gt;$tokens&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;array&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;

    &lt;span class='nv'&gt;$offset&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='c1'&gt;// current offset in string&lt;/span&gt;
    &lt;span class='k'&gt;while&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;isset&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$string&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nv'&gt;$offset&lt;/span&gt;&lt;span class='p'&gt;]))&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt; &lt;span class='c1'&gt;// loop as long as we aren&amp;#39;t at the end of the string&lt;/span&gt;
        &lt;span class='k'&gt;foreach&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$tokenMap&lt;/span&gt; &lt;span class='k'&gt;as&lt;/span&gt; &lt;span class='nv'&gt;$regex&lt;/span&gt; &lt;span class='o'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='nv'&gt;$token&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
            &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;preg_match&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$regex&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$string&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$matches&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$offset&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
                &lt;span class='nv'&gt;$tokens&lt;/span&gt;&lt;span class='p'&gt;[]&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;array&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;
                    &lt;span class='nv'&gt;$token&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;      &lt;span class='c1'&gt;// token ID      (e.g. T_FIELD_SEPARATOR)&lt;/span&gt;
                    &lt;span class='nv'&gt;$matches&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;],&lt;/span&gt; &lt;span class='c1'&gt;// token content (e.g. ,)&lt;/span&gt;
                &lt;span class='p'&gt;);&lt;/span&gt;
                &lt;span class='nv'&gt;$offset&lt;/span&gt; &lt;span class='o'&gt;+=&lt;/span&gt; &lt;span class='nb'&gt;strlen&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$matches&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;]);&lt;/span&gt;
                &lt;span class='k'&gt;continue&lt;/span&gt; &lt;span class='mi'&gt;2&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='c1'&gt;// continue the outer while loop&lt;/span&gt;
            &lt;span class='p'&gt;}&lt;/span&gt;
        &lt;span class='p'&gt;}&lt;/span&gt;

        &lt;span class='k'&gt;throw&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;LexingException&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;sprintf&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;Unexpected character &amp;quot;%s&amp;quot;&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$string&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nv'&gt;$offset&lt;/span&gt;&lt;span class='p'&gt;]));&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='nv'&gt;$tokens&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The drawback of this code is fairly obvious: at every offset it has to iterate through all the regexes and try them one at a time. With only four regexes, as here, that is not much of a problem, but a lexer for a real language usually deals with hundreds of them.&lt;/p&gt;

&lt;h2 id='compiling_into_a_single_regex'&gt;Compiling into a single regex&lt;/h2&gt;

&lt;p&gt;The solution is to compile all regexes into a single big one. The four regexes above would be combined into:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='nv'&gt;$regex&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;~&lt;/span&gt;
&lt;span class='s1'&gt;    ([^&amp;quot;,\r\n]+)&lt;/span&gt;
&lt;span class='s1'&gt;  | (&amp;quot;[^&amp;quot;\\\\]*(?:\\\\.[^&amp;quot;\\\\]*)*&amp;quot;)&lt;/span&gt;
&lt;span class='s1'&gt;  | (,)&lt;/span&gt;
&lt;span class='s1'&gt;  | (\r\n?|\n)&lt;/span&gt;
&lt;span class='s1'&gt;~xA&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;But how can we use this? How can we know which of the four subregexes matched the string? Simple: as all the subregexes are enclosed in parentheses, they are capturing groups. So our &lt;code&gt;$matches&lt;/code&gt; array will contain the matched string at position &lt;code&gt;[0]&lt;/code&gt; and at the position of the matched subregex. The groups before it will be empty strings, and the groups after it are left out of the array altogether.&lt;/p&gt;

&lt;p&gt;An example: If the regex matched the &lt;code&gt;,&lt;/code&gt; subregex, we would get this &lt;code&gt;$matches&lt;/code&gt; array:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='k'&gt;array&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;
    &lt;span class='mi'&gt;0&lt;/span&gt; &lt;span class='o'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;,&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
    &lt;span class='mi'&gt;1&lt;/span&gt; &lt;span class='o'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
    &lt;span class='mi'&gt;2&lt;/span&gt; &lt;span class='o'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
    &lt;span class='mi'&gt;3&lt;/span&gt; &lt;span class='o'&gt;=&amp;gt;&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;,&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
&lt;span class='p'&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;From this we know that the 3rd subregex matched, as it is the first group with a non-empty value.&lt;/p&gt;
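&lt;p&gt;This is easy to check by hand. The following is only a small sketch: the sample input &lt;code&gt;a,b&lt;/code&gt; and the offset are made up for illustration, and the regex is the compiled one from above. Note that &lt;code&gt;preg_match()&lt;/code&gt; does not report trailing groups that took no part in the match, so there is no entry for the fourth group:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
$regex = &amp;#39;~
    ([^&amp;quot;,\r\n]+)
  | (&amp;quot;[^&amp;quot;\\\\]*(?:\\\\.[^&amp;quot;\\\\]*)*&amp;quot;)
  | (,)
  | (\r\n?|\n)
~xA&amp;#39;;

// &amp;#39;a,b&amp;#39; is a made-up sample input; offset 1 points at the comma
preg_match($regex, &amp;#39;a,b&amp;#39;, $matches, 0, 1);

// groups 1 and 2 are empty, group 3 holds the comma, group 4 is absent
var_dump($matches === array(&amp;#39;,&amp;#39;, &amp;#39;&amp;#39;, &amp;#39;&amp;#39;, &amp;#39;,&amp;#39;)); // bool(true)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;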

&lt;p&gt;The resulting code for an abstract lexer class would be:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
&lt;span class='k'&gt;class&lt;/span&gt; &lt;span class='nc'&gt;Lexer&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='k'&gt;protected&lt;/span&gt; &lt;span class='nv'&gt;$regex&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='k'&gt;protected&lt;/span&gt; &lt;span class='nv'&gt;$offsetToToken&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;__construct&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;array&lt;/span&gt; &lt;span class='nv'&gt;$tokenMap&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='c1'&gt;// the outer ( ) serve as regex delimiters, so the keys of $tokenMap must be bare patterns&lt;/span&gt;
        &lt;span class='c1'&gt;// (no delimiters or modifiers) and must not contain capturing groups of their own&lt;/span&gt;
        &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;regex&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;((&amp;#39;&lt;/span&gt; &lt;span class='o'&gt;.&lt;/span&gt; &lt;span class='nb'&gt;implode&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;)|(&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nb'&gt;array_keys&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$tokenMap&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt; &lt;span class='o'&gt;.&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;))A&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;offsetToToken&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nb'&gt;array_values&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$tokenMap&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;

    &lt;span class='k'&gt;public&lt;/span&gt; &lt;span class='k'&gt;function&lt;/span&gt; &lt;span class='nf'&gt;lex&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$string&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
        &lt;span class='nv'&gt;$tokens&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;array&lt;/span&gt;&lt;span class='p'&gt;();&lt;/span&gt;

        &lt;span class='nv'&gt;$offset&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
        &lt;span class='k'&gt;while&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;isset&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$string&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nv'&gt;$offset&lt;/span&gt;&lt;span class='p'&gt;]))&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
            &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;!&lt;/span&gt;&lt;span class='nb'&gt;preg_match&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;regex&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$string&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$matches&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$offset&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
                &lt;span class='k'&gt;throw&lt;/span&gt; &lt;span class='k'&gt;new&lt;/span&gt; &lt;span class='nx'&gt;LexingException&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;sprintf&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s1'&gt;&amp;#39;Unexpected character &amp;quot;%s&amp;quot;&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nv'&gt;$string&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nv'&gt;$offset&lt;/span&gt;&lt;span class='p'&gt;]));&lt;/span&gt;
            &lt;span class='p'&gt;}&lt;/span&gt;

            &lt;span class='c1'&gt;// find the first non-empty element (but skipping $matches[0]) using a quick for loop&lt;/span&gt;
            &lt;span class='k'&gt;for&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$i&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt; &lt;span class='s1'&gt;&amp;#39;&amp;#39;&lt;/span&gt; &lt;span class='o'&gt;===&lt;/span&gt; &lt;span class='nv'&gt;$matches&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nv'&gt;$i&lt;/span&gt;&lt;span class='p'&gt;];&lt;/span&gt; &lt;span class='o'&gt;++&lt;/span&gt;&lt;span class='nv'&gt;$i&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;

            &lt;span class='c1'&gt;// each token is stored as (token content, token ID)&lt;/span&gt;
            &lt;span class='nv'&gt;$tokens&lt;/span&gt;&lt;span class='p'&gt;[]&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;array&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$matches&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;],&lt;/span&gt; &lt;span class='nv'&gt;$this&lt;/span&gt;&lt;span class='o'&gt;-&amp;gt;&lt;/span&gt;&lt;span class='na'&gt;offsetToToken&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='nv'&gt;$i&lt;/span&gt; &lt;span class='o'&gt;-&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;]);&lt;/span&gt;

            &lt;span class='nv'&gt;$offset&lt;/span&gt; &lt;span class='o'&gt;+=&lt;/span&gt; &lt;span class='nb'&gt;strlen&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nv'&gt;$matches&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;]);&lt;/span&gt;
        &lt;span class='p'&gt;}&lt;/span&gt;

        &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='nv'&gt;$tokens&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
    &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
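&lt;p&gt;One detail of the constructor is easy to miss: the outer &lt;code&gt;(&lt;/code&gt; and &lt;code&gt;)&lt;/code&gt; of the compiled regex are its delimiters, so the keys of the token map have to be bare patterns without delimiters or modifiers. The following is only a sketch of that compilation step outside the class; the numeric token IDs and the sample input &lt;code&gt;a,b&lt;/code&gt; are made up for illustration:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='php'&gt;&lt;span class='cp'&gt;&amp;lt;?php&lt;/span&gt;
// hypothetical token IDs, only for this sketch
define(&amp;#39;T_PLAIN_FIELD&amp;#39;, 1);
define(&amp;#39;T_QUOTED_FIELD&amp;#39;, 2);
define(&amp;#39;T_FIELD_SEPARATOR&amp;#39;, 3);
define(&amp;#39;T_LINE_SEPARATOR&amp;#39;, 4);

// bare patterns: no ~ delimiters and no A modifier
$tokenMap = array(
    &amp;#39;[^&amp;quot;,\r\n]+&amp;#39;                     =&amp;gt; T_PLAIN_FIELD,
    &amp;#39;&amp;quot;[^&amp;quot;\\\\]*(?:\\\\.[^&amp;quot;\\\\]*)*&amp;quot;&amp;#39; =&amp;gt; T_QUOTED_FIELD,
    &amp;#39;,&amp;#39;                              =&amp;gt; T_FIELD_SEPARATOR,
    &amp;#39;\r\n?|\n&amp;#39;                       =&amp;gt; T_LINE_SEPARATOR,
);

// the same two expressions as in Lexer::__construct()
$regex = &amp;#39;((&amp;#39; . implode(&amp;#39;)|(&amp;#39;, array_keys($tokenMap)) . &amp;#39;))A&amp;#39;;
$offsetToToken = array_values($tokenMap);

// match at offset 1, which points at the comma
preg_match($regex, &amp;#39;a,b&amp;#39;, $matches, 0, 1);

// find the first non-empty group, exactly as Lexer::lex() does
for ($i = 1; &amp;#39;&amp;#39; === $matches[$i]; ++$i);

var_dump($offsetToToken[$i - 1] === T_FIELD_SEPARATOR); // bool(true)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Group 3 is the first non-empty one here, and since &lt;code&gt;$offsetToToken&lt;/code&gt; is zero-based it is looked up at index &lt;code&gt;$i - 1&lt;/code&gt;.&lt;/p&gt;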
&lt;h2 id='how_much_does_this_really_change'&gt;How much does this really change?&lt;/h2&gt;

&lt;p&gt;I am seeing a performance improvement of approximately 30% in the average case (&lt;a href='http://codepad.viper-7.com/B5Arj5'&gt;online demo&lt;/a&gt;), but this value varies with the regexes and the input. The more regexes there are, the bigger the improvement. Additionally, in the compiled-regex variant the order of the regexes has less influence on the execution time (&lt;a href='http://codepad.viper-7.com/hng1i0'&gt;online demo&lt;/a&gt;), i.e. it matters less whether the more probable regexes come first and the less probable ones last. (It still matters, only less so.)&lt;/p&gt;</description>
    </item>
    

  </channel>
</rss>
