Greg Hughes

Using Varnish with Django for high performance caching

Varnish is an open source web cache. It sits in front of Apache (or nginx, gunicorn, etc.) and caches entire pages by URL, vastly improving the speed of any dynamic website. Instead of repeatedly generating a page that makes many expensive SQL queries or performs some other computationally expensive task, Varnish stores the resulting page and its headers in memory or in a file on disk. Subsequent requests for that URL are served directly from the cache. This provides a significant speed boost to any non-trivial Django project.

Of course, one can use Django’s built-in cache framework with or without Varnish. Django is great at caching parts of generated pages, but in my opinion it’s better to use a dedicated HTTP accelerator if you want to cache whole pages.

Configuring Varnish for use with Django

Varnish uses VCL files for configuration. These give precise control over every aspect of the request pipeline; VCL is essentially a mini language for specifying how an incoming request should be handled.

I use the following VCL for production Django sites, most of which is borrowed from Chase Seibert:

It makes the following adjustments to Varnish’s behaviour:

  • All cookies except for csrftoken and sessionid are ignored. Varnish will stop serving cached pages when cookies are set; this includes cookies set by Google Analytics or other client-side JavaScript code. By ignoring these cookies, Varnish will still serve cached content to unauthenticated users. Authenticated users will never see a cached page.

  • Accept-Encoding is normalized to ensure that Varnish doesn’t cache multiple copies of the same page. Browsers tend to differ slightly in how they specify Accept-Encoding, and Varnish uses it as a differentiator when calculating the hash key for a given request.

  • URLs beginning with /static are always cached. Static content is likely to remain the same whether a user is logged in or not, so we remove cookies from these requests to ensure that they get cached.
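To make these three adjustments concrete, here is a rough Python model of the logic. It is not VCL and not the actual Varnish configuration; the function names are made up for illustration, and it only shows what happens to a request's headers before the cache lookup.

```python
import re

# Cookies Django itself needs; everything else (analytics etc.) is dropped.
KEPT_COOKIES = ("csrftoken", "sessionid")


def filter_cookies(cookie_header):
    """Keep only the csrftoken and sessionid cookies from a Cookie header."""
    kept = []
    for part in cookie_header.split(";"):
        part = part.strip()
        name = part.split("=", 1)[0]
        if name in KEPT_COOKIES:
            kept.append(part)
    return "; ".join(kept)


def normalize_accept_encoding(value):
    """Collapse browser variations into 'gzip', 'deflate' or None."""
    if re.search(r"\bgzip\b", value):
        return "gzip"
    if re.search(r"\bdeflate\b", value):
        return "deflate"
    return None


def prepare_request(url, headers):
    """Return the headers the cache would see for this request."""
    result = dict(headers)
    if "Accept-Encoding" in result:
        normalized = normalize_accept_encoding(result["Accept-Encoding"])
        if normalized is None:
            del result["Accept-Encoding"]
        else:
            result["Accept-Encoding"] = normalized
    if "Cookie" in result:
        if url.startswith("/static"):
            # Static files do not depend on the user, so drop every cookie.
            cookies = ""
        else:
            cookies = filter_cookies(result["Cookie"])
        if cookies:
            result["Cookie"] = cookies
        else:
            # No cookies left: the request is cacheable again.
            del result["Cookie"]
    return result
```

A request that carries only analytics cookies ends up with no Cookie header at all, which is what lets it be served from the cache.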

Caveats

Even with these adjustments in place, there are still two major caveats:

  1. If an object changes after being cached, those changes will not be visible to unauthenticated users until the cached copy expires or is purged.

  2. Once a user logs in or accesses a form with CSRF protection, they will never see a cached page again unless they erase the csrftoken and/or sessionid cookies, even after logging out. Django doesn’t delete the sessionid cookie upon logout; instead, the session is regenerated and a new sessionid is set.

python-varnish and django-varnish

To solve caveat #1, we must purge an object’s URL from the Varnish cache after it has been changed. django-varnish does just that: it monitors certain models and purges an object’s get_absolute_url when the object is updated. I use a fork which can also purge the entire cache when certain models are updated.

django-varnish depends on python-varnish, which provides a simple Python API for management of Varnish servers via their management ports.
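The purge-on-save idea can be sketched in a few lines of plain Python. This is not django-varnish's code: make_purge_handler and the purge_url callable are hypothetical names, and the sketch only shows the decision of when to purge, not how to talk to Varnish.

```python
def make_purge_handler(watched_models, purge_url):
    """Build a handler that purges a saved object's URL if its model is watched.

    watched_models: labels such as "auth.user" (app label, dot, model name).
    purge_url: hypothetical callable that asks the cache to drop one URL.
    """
    watched = {label.lower() for label in watched_models}

    def handler(instance, label):
        # Ignore models nobody asked us to watch.
        if label.lower() not in watched:
            return False
        # Only objects that know their own URL can be purged.
        get_url = getattr(instance, "get_absolute_url", None)
        if get_url is None:
            return False
        purge_url(get_url())
        return True

    return handler
```

In a real project, a handler along these lines would be connected to Django's post_save signal so that it runs every time a watched model is saved.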

Installation

Unfortunately the original maintainer’s code no longer works with the current version of Varnish, but alternative up-to-date forks are available. These can be installed via pip:

pip install -e git+https://github.com/kennu/python-varnish.git#egg=varnish
pip install -e git+https://github.com/kennu/django-varnish.git#egg=django-varnish

Configuration

Once installed, just add django.contrib.humanize and varnishapp to INSTALLED_APPS, add (r'^admin/varnish/', include('varnishapp.urls')) to your URLconf, and add some settings:

  • VARNISH_WATCHED_MODELS is a list of installed models whose get_absolute_urls you want to purge from the Varnish cache upon saving. Example: ('auth.user','profiles.profile')

  • VARNISH_GLOBAL_WATCHED_MODELS is a list of installed models that purge the entire Varnish cache whenever one of their objects is saved.

  • VARNISH_MANAGEMENT_ADDRS is a list of Varnish cache addresses with their management ports. Example: ('server1:6082','server2:6082')

  • VARNISH_SECRET is the shared secret used to authenticate with the Varnish server. This can be found at /etc/varnish/secret on an Ubuntu/Debian installation of Varnish.

Once you’ve set all of that up, changes to an object will result in the object’s absolute URL being purged from your Varnish cache.
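Put together, the relevant part of a settings file might look roughly like the excerpt below. The watched models and server addresses are the examples from the list above; the flatpages entry and the secret value are placeholders I have made up for illustration.

```python
# settings.py (excerpt)

INSTALLED_APPS = (
    # ... your existing apps go here ...
    "django.contrib.humanize",
    "varnishapp",
)

# Saving one of these purges that object's get_absolute_url.
VARNISH_WATCHED_MODELS = ("auth.user", "profiles.profile")

# Saving one of these purges the whole cache (placeholder example model).
VARNISH_GLOBAL_WATCHED_MODELS = ("flatpages.flatpage",)

# Varnish servers with their management ports.
VARNISH_MANAGEMENT_ADDRS = ("server1:6082", "server2:6082")

# Placeholder: use the secret from /etc/varnish/secret on your server.
VARNISH_SECRET = "replace-me"
```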

Delete Django session cookies upon logout

Solving caveat #2 requires a behavioural change to Django: we need to delete the csrftoken and sessionid cookies after the django.contrib.auth.views.logout view is called. We could try to monkey patch this view, or we can write a small piece of middleware to do the job. Monkey patching is a terrible, terrible crime, so I went for the middleware option instead.

Create a file named middleware.py in your project’s root directory:

Then add this new middleware to the beginning of MIDDLEWARE_CLASSES in your project’s settings file.
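A minimal sketch of such middleware follows. It makes a few assumptions: the logout view is mounted at /accounts/logout/ (adjust LOGOUT_PATH to match your URLconf), the cookies use Django's default names, and the class name is my own invention. It uses the old-style middleware interface that MIDDLEWARE_CLASSES expects.

```python
# middleware.py

# Assumption: the URL your URLconf maps to the logout view.
LOGOUT_PATH = "/accounts/logout/"

# Django's default cookie names; change these if your settings override them.
COOKIES_TO_DELETE = ("csrftoken", "sessionid")


class DeleteSessionCookiesOnLogout(object):
    """Expire the session and CSRF cookies on responses from the logout URL."""

    def process_response(self, request, response):
        if request.path == LOGOUT_PATH:
            for name in COOKIES_TO_DELETE:
                # Tells the browser to discard the cookie.
                response.delete_cookie(name)
        return response
```

Placing it first in MIDDLEWARE_CLASSES matters because response middleware runs in reverse order: this class gets the response last, after the session and CSRF middleware have had their chance to set cookies. Assuming middleware.py is importable from the project root, the entry would be something like 'middleware.DeleteSessionCookiesOnLogout'.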

Results

I ran ab against the homepage of a project I’ve been working on recently, testing against both Varnish and Apache. At the time of testing, each direct request for the page resulted in a total of 50 database queries. I tested from the machine where the site is hosted.

I used the following arguments: -n 1000 -c 50 -k. This instructs ab to perform 1000 requests with a concurrency level of 50, with HTTP persistent connections enabled.
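If you want to script the same benchmark, the command line can be assembled like this. The helper name and the URL are placeholders; the sketch only builds the argument list and leaves running it (for example with subprocess) to you.

```python
def build_ab_command(url, requests=1000, concurrency=50, keepalive=True):
    """Return the ab argument list for the test described above."""
    cmd = ["ab", "-n", str(requests), "-c", str(concurrency)]
    if keepalive:
        # -k enables HTTP persistent connections.
        cmd.append("-k")
    cmd.append(url)
    return cmd


# Placeholder URL: point this at the page you want to benchmark.
command = build_ab_command("http://localhost/")
```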

The results (below) highlight why it’s absolutely necessary to employ a proper caching strategy for your production Django deployments. Whether you use Varnish, or another HTTP accelerator, or even Django’s built-in whole page cache, the implications are clear: if you allow users to hit Apache directly on every page load, your site won’t stay online for long.

                 requests/sec   50% served within   95% served within
Apache 2.2.21            4.10             2441 ms             3096 ms
Varnish 3.0.2        20204.88                1 ms                5 ms

(Update: It was suggested within the comments that I run ab with the -k argument to enable HTTP persistent connections, so I repeated the test and amended the results above.)
