Implementing Caching in Your PHP-Based Web Application

If your web application grows or involves a lot of computation to generate dynamic content, performance is going to suffer. Some data requires plenty of calculation but does not change on every request. It’s wise to start caching such data or objects. Recently, in one of the applications I have been working on, there was a considerable unnecessary delay because some heavy data was being recalculated every time a user sent a request. It was high time I started caching those results.

There are plenty of extensions and frameworks available for caching in PHP. But instead of spending my time looking for the best extension or framework and learning it, I decided to implement caching on my own, without any fancy things.

I am going to explain two ways of caching content:

  1. Server Side Caching
  2. Client Side Caching

Server Side Caching

In server-side caching, we cache data and/or objects on the server and reuse them the next time the user requests them, instead of recalculating them.

For caching these things, we are going to use files. I assume your calculation takes more time than reading a file; otherwise there is no point in caching. What we are going to do is write each calculated piece of data or object to a file and store it on the server.

It’s always a good idea to define a few constants or configuration variables for these settings. You should have at least the following parameters in your configuration. I would make a config file with the following constants:

  1. CACHE_ENABLED: A parameter to turn caching on or off
  2. CACHE_PATH: Path to your cache directory. I prefer to create a `cache` directory in my application root. If you are on Linux, don’t forget to give your server write permission on the cache folder
  3. CACHE_EXPIRY: A parameter that forces the cache to be invalidated after the specified time (in seconds) since the creation of the cache
<?php
 //! Config parameter to turn on/off the caching
 define("CACHE_ENABLED", true);
 //! Path to store your cache files
 define("CACHE_PATH", "/var/www/YourApp/cache/");
 //! Cache expiration time, in seconds
 define("CACHE_EXPIRY", 3600);
?>

Now comes the real caching part. Before you start caching, there are a few choices you need to make, considering various factors.

  • What data are you going to cache? Well, don’t cache something that takes less time to compute than to read or write from a file. You should also not cache something that changes on every request.
  • When do you need to purge the cache? It is just as important to know when you should purge your cache. Make a list of all the events or conditions that can invalidate it. If you do not purge the data on the appropriate events or conditions, your application may end up with inconsistent or corrupted data or state.

Once you have decided on these things, you can start caching. We will compute the object and store it inside a file. It is wise to use a proper naming convention for stored objects; a good naming convention will make purging the cache easier.

In my application, there were a few types of objects, and each object’s value was different for each user. So I decided to use the following convention: `{OBJ_TYPE}_{OBJ_TYPE_ID}_uid_{UID}.cache`

{UID} is the user ID of a user. So when the cache for a particular object becomes invalid, I can delete all files that match `{OBJ_TYPE}_{OBJ_TYPE_ID}_*.cache`

And in case all cached data for a single user expires, I can delete all files that match `*_uid_{UID}.cache`
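With this convention in place, purging reduces to deleting files that match a pattern. Here is a rough sketch of two helper functions built on `glob()`; the function names are mine, and `CACHE_PATH` is assumed to be defined as in the config file above (the sketch falls back to a temp directory so it is self-contained):

```php
<?php
// Assumes CACHE_PATH is defined as in the config file above;
// fall back to a temp directory so this sketch is self-contained.
if (!defined('CACHE_PATH')) {
    define('CACHE_PATH', sys_get_temp_dir() . '/cache/');
}

//! Purge every user's cached copy of one object:
//! deletes all files matching {OBJ_TYPE}_{OBJ_TYPE_ID}_*.cache
function purgeObjectCache($sObjType, $iObjTypeID) {
    foreach (glob(CACHE_PATH . "{$sObjType}_{$iObjTypeID}_*.cache") as $sFile) {
        unlink($sFile);
    }
}

//! Purge everything cached for one user:
//! deletes all files matching *_uid_{UID}.cache
function purgeUserCache($iUserID) {
    foreach (glob(CACHE_PATH . "*_uid_{$iUserID}.cache") as $sFile) {
        unlink($sFile);
    }
}
```

You would call `purgeObjectCache()` from whatever event invalidates that object, and `purgeUserCache()` when a user’s data changes wholesale.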

Now let’s start caching objects. Suppose we are serving expensive data as JSON for some AJAX request. Here is an example of such a snippet:

<?php
require_once "loader.php";

$aExpensiveData = null;

if(CACHE_ENABLED) {

    //! Calculate cache path based on your naming convention
    //! ($oid and $iUserID are assumed to be set earlier from the request)
    $sCachePath = CACHE_PATH."obj_{$oid}_uid_{$iUserID}.cache";
    if(file_exists($sCachePath)) {

        //! Get the last-modified-date of this very file
        $lastModified=filemtime($sCachePath);

        //! Calculate the expiry threshold (CACHE_EXPIRY is already in seconds)
        $sExpiry = time() - CACHE_EXPIRY;

        //! Purge the cache if it is older than the forced expiry
        if($lastModified<$sExpiry) {
            unlink($sCachePath);
        }
        else {
            $aExpensiveData = unserialize(file_get_contents($sCachePath));
        }

    }
}

//! If Cache Miss, let's do the expensive calculation
if($aExpensiveData===null) {
    $aExpensiveData = doExpensiveCalculation();

    //! Cache the result if cache enabled
    if(CACHE_ENABLED) {
        $sCachePath = CACHE_PATH."obj_{$oid}_uid_{$iUserID}.cache";
        //! Don't forget to serialize the object, before writing
        file_put_contents($sCachePath, serialize($aExpensiveData));
    }
}

header('Content-Type: application/json');
$json = json_encode($aExpensiveData);
echo $json;
?>


If you look at the code: we first check whether a cached copy is already available. If it is available and still valid, we use it instead of doing the expensive computation. Otherwise, after doing the expensive computation, we cache the result so that next time we don’t miss the cache.


Client Side Caching

Well, you just saved a great amount of computing power by not recalculating an expensive object; that is what server-side caching buys you. What if I told you that you can also save bandwidth and data-transfer delay by using client-side caching?

We are still considering the previous example of an AJAX request that expects an expensive object in JSON format. Now suppose it is a big object as well as an expensive one. It surely takes time to transfer the object from server to client, and it also eats into your network’s bandwidth.

You can use client-side caching to save bandwidth and data-transfer delay. Data-transfer delay may be slowing your application’s responses for clients with slower internet connections; they will benefit the most from this.

To cache a “page” on the client, you need to tell the client that the page is valid for up to a certain number of hours or days. But again, you don’t want the client to use wrong data in case some event has invalidated it, so we will ask the client to always validate its local cache before using it. To let the client’s browser do this, it is important to send the Last-Modified and Expires HTTP headers properly.

Let’s implement client-side caching in the previous example:

<?php
require_once "loader.php";

$aExpensiveData = null;

if(CACHE_ENABLED) {

    //! Calculate Cache path based on your naming convention
    $sCachePath = CACHE_PATH."obj_{$oid}_uid_{$iUserID}.cache";
    if(file_exists($sCachePath)) {

        //! Get the last-modified-date of this very file
        $lastModified=filemtime($sCachePath);

        //! Calculate the expiry threshold (CACHE_EXPIRY is already in seconds)
        $sExpiry = time() - CACHE_EXPIRY;

        if($lastModified<$sExpiry) {
            unlink($sCachePath);
            $lastModified = time();
        }
       
        //Get the HTTP_IF_MODIFIED_SINCE header if set
        $ifModifiedSince=(isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) ? $_SERVER['HTTP_IF_MODIFIED_SINCE'] : false);

        //Set last-modified header.
        header("Last-Modified: ".gmdate("D, d M Y H:i:s", $lastModified)." GMT");
        
         //tell client to revalidate local cache before using it
        header('Cache-Control: must-revalidate');

        header('Expires: '.gmdate('D, d M Y H:i:s \G\M\T', $lastModified + CACHE_EXPIRY));

        //check if page has changed. If not, send 304 and exit. Client will use its own cache
        if ($ifModifiedSince !== false && strtotime($ifModifiedSince) == $lastModified)
        {
               header("HTTP/1.1 304 Not Modified");
               exit;
        }

        //! It didn't exit, which means the client doesn't have the latest copy of your data.
        //! Re-check existence: the expired cache may just have been purged above.
        if(file_exists($sCachePath)) {
            $aExpensiveData = unserialize(file_get_contents($sCachePath));
        }


    }
}

//! If Cache Miss, let's do the expensive calculation
if($aExpensiveData===null) {
    $aExpensiveData = doExpensiveCalculation();

    //! Cache the result if cache enabled
    if(CACHE_ENABLED) {
        //! Don't forget to serialize the object, before writing
        file_put_contents($sCachePath, serialize($aExpensiveData));

        //! The cache was just (re)created
        $lastModified = time();

         //Set last-modified header.
        header("Last-Modified: ".gmdate("D, d M Y H:i:s", $lastModified)." GMT");

        //tell client to revalidate local cache before using it
        header('Cache-Control: must-revalidate');

        header('Expires: '.gmdate('D, d M Y H:i:s \G\M\T', $lastModified + CACHE_EXPIRY));
    }
}

header('Content-Type: application/json');
$json = json_encode($aExpensiveData);
echo $json;
?>

This time, we set the Last-Modified header to the time the cache was created, and the Expires header based on CACHE_EXPIRY.

If the client has a local cache, its modification date will be sent back in `$_SERVER['HTTP_IF_MODIFIED_SINCE']`.

In that case you only send HTTP code 304, saying “Content isn’t modified, use your local cache.”


These examples target a single scenario, but it is easy to apply the same concept to other scenarios with small modifications, because the technique remains the same.

The extent of the benefit of caching will depend greatly on what you cache, how expensive your calculation is, and how often your cache becomes invalid. I got a several-fold performance improvement in my application after implementing these caching techniques. Feel free to drop your questions and feedback in the comments.

PS: These are very quick-and-dirty examples of the techniques. The main motive of this blog post is to familiarize developers with caching. If your application is going to rely on caching seriously, I suggest investing some time in learning a popular extension or framework. And if you plan to implement your own caching, it is better to define proper classes and methods to make your code more structured and easier to maintain.

How PHP Sessions Can Cause Concurrency Issues

A web application without sessions is hard to imagine. People use them very liberally to maintain per-user data, and so do I. But what most people don’t know is that sessions can cause issues with your application’s concurrency if not used properly. Even though it is an obvious thing, I never knew (or thought!) about it until today.

What’s the problem?

The PHP session lock can block your application’s concurrent requests from a single client. If a client sends multiple requests concurrently and each request uses the session, the requests will be served sequentially instead of being processed concurrently.

Why does this happen?

PHP, by default, uses files to store session data. For each new session, PHP creates a file and keeps writing session data to it (hint: blocking I/O). So every time you call the `session_start()` function, PHP opens your session file and acquires an exclusive lock on it. If one of your scripts is taking time to process a request and your client sends another request that also uses the session, that request will be blocked until the previous one completes. The second request also calls `session_start()`, but it has to wait because the first request already holds the exclusive lock on the session file. Once the first request is fulfilled, PHP closes the session file at the end of script execution and releases the lock. Only then does the second process get a chance to acquire the lock on the session file and proceed with its execution.

However, this causes concurrency issues for the same client only. A request from one client cannot block another client’s request, because the two clients have different sessions and hence different session files.

When can it become a bottleneck?

This blocking is hard to notice if your scripts are short (in terms of execution time!). But if you have slightly long-running scripts, you are in trouble. It can become a bottleneck if you are working with AJAX and fetching data through several requests on the same page, which is quite common in today’s web applications.

Consider a scenario where you fetch several pieces of data through different background AJAX requests and display them in the UI. These requests use the session. Each asynchronous request is fired immediately and together, but only the first request to reach the server gets the session lock; the others have to wait. So all these requests are processed sequentially even though they are not dependent on each other. As an example, say 5 requests, each taking approximately 500ms to complete, are sent concurrently. Because of this blocking, they do not execute concurrently, so the last, 5th, request only starts executing at the 2-second mark and completes after 2.5 seconds, even though it needed only 500ms of processing. This can become a serious problem if some of the scripts require more processing or the number of requests is greater.

I wouldn’t have noticed this if I hadn’t added `sleep(2);` to my code on my local machine to simulate natural use over a slow connection. My page was sending 5 requests, and each request was being served every 2 seconds, sequentially!

So what’s the solution?

Close the session once you are done using it!

PHP has a method to close the session for writing: `session_write_close()`. Calling it ends the current session, writes the session data to the file, and releases the lock on it. After that, further requests are not blocked even if the current script is still processing.

The important thing to note is that once you close the session using `session_write_close()`, any changes you make to `$_SESSION` later in the currently executing script will not be saved (unless you start the session again).


How do I simulate this problem?

If you want to see this problem in action, try the following code:

Blocking Example
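A minimal script that demonstrates the lock might look like this (a sketch; `sleep(2)` stands in for real slow processing, and the file name `blocking.php` is only a suggestion):

```php
<?php
// blocking.php — holds the session lock for the whole request.
session_start();   // acquires an exclusive lock on the session file

// touch the session so it actually gets used
$_SESSION['hits'] = isset($_SESSION['hits']) ? $_SESSION['hits'] + 1 : 1;

sleep(2);          // simulate slow processing; the lock is STILL held here

echo json_encode(array('hits' => $_SESSION['hits']));
// the lock is released only when the script finishes
```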


 

Now send 5 concurrent AJAX requests to this file (for example with jQuery’s `$.get()` in a loop) and log when each one completes: each response arrives roughly 2 seconds after the previous one.


	

Non-Blocking Example
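The non-blocking version is the same sketch with one extra call: close the session as soon as you are done writing to it (again, `sleep(2)` simulates the slow part and the file name is only a suggestion):

```php
<?php
// nonblocking.php — releases the session lock before the slow work starts.
session_start();   // lock acquired

$_SESSION['hits'] = isset($_SESSION['hits']) ? $_SESSION['hits'] + 1 : 1;
$iHits = $_SESSION['hits'];   // copy out what we still need

session_write_close();        // data written, lock released — right here

sleep(2);          // slow work no longer blocks the client's other requests

// note: writes to $_SESSION after this point would not be saved
echo json_encode(array('hits' => $iHits));
```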


Now send 5 concurrent AJAX requests to this file in the same way and compare the completion times: all responses come back after roughly 2 seconds, together.


You will be able to see how this small thing can greatly affect your application.