Varnish is an open source web cache. It sits in front of Apache (or nginx, gunicorn, etc.) and caches entire pages by URL, which can vastly improve the speed of a dynamic website. Instead of repeatedly generating a page which makes many expensive SQL queries or performs some other computationally expensive task, the resulting page and its headers are cached in memory or in a file on disk. Subsequent requests for the same URL are served directly from the cache. This provides a significant speed boost to any non-trivial Django project.
Of course, one can use Django’s built-in cache framework with or without Varnish. Django is great at caching parts of generated pages, but in my opinion it’s better to use a dedicated HTTP accelerator if you want to cache whole pages.
Configuring Varnish for use with Django
Varnish uses VCL files for configuration. These give precise control over every aspect of the request pipeline; VCL is essentially a mini language for specifying how an incoming request should be handled.
I use the following VCL for production Django sites, most of which is borrowed from Chase Seibert:
It makes the following adjustments to Varnish’s behaviour:
All cookies except for csrftoken and sessionid are ignored. Varnish will stop serving cached pages when cookies are set; this includes cookies set by Google Analytics or other client-side JavaScript code. By ignoring these cookies, Varnish will still serve cached content to unauthenticated users. Authenticated users will never see a cached page.
Accept-Encoding is normalized to ensure that Varnish doesn’t cache multiple copies of the same page. Browsers tend to differ slightly in how they specify Accept-Encoding, and Varnish uses it as a differentiator when calculating the hash key for a given request.
URLs beginning with /static are always cached. Static content is likely to remain the same whether a user is logged in or not, so we remove cookies from these requests to ensure that they get cached.
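To make the three adjustments concrete, here is a rough sketch of the same logic in Python rather than VCL. It is not the VCL itself: the function names (filter_cookies, normalize_accept_encoding, prepare_request) are made up for illustration, and the Accept-Encoding handling is a simplification that ignores q-values.

```python
# Illustration only: mimics what the VCL does to a request before the cache lookup.
# All function names here are hypothetical, not part of Varnish or Django.

# Cookies Django needs; everything else (e.g. analytics cookies) is dropped.
KEPT_COOKIES = {"csrftoken", "sessionid"}


def filter_cookies(cookie_header):
    """Reduce a Cookie header to the cookies in KEPT_COOKIES.

    Returns None when nothing is left, i.e. the request looks anonymous.
    """
    kept = []
    for part in cookie_header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name in KEPT_COOKIES:
            kept.append("%s=%s" % (name, value))
    return "; ".join(kept) or None


def normalize_accept_encoding(header):
    """Collapse Accept-Encoding variants to 'gzip', 'deflate' or None."""
    lowered = header.lower()
    if "gzip" in lowered:
        return "gzip"
    if "deflate" in lowered:
        return "deflate"
    return None


def prepare_request(path, headers):
    """Return a copy of the headers as the cache would see them."""
    headers = dict(headers)
    if path.startswith("/static"):
        # Static files never depend on the user, so drop all cookies.
        headers.pop("Cookie", None)
    elif "Cookie" in headers:
        filtered = filter_cookies(headers["Cookie"])
        if filtered is None:
            del headers["Cookie"]
        else:
            headers["Cookie"] = filtered
    if "Accept-Encoding" in headers:
        encoding = normalize_accept_encoding(headers["Accept-Encoding"])
        if encoding is None:
            del headers["Accept-Encoding"]
        else:
            headers["Accept-Encoding"] = encoding
    return headers
```

A request that ends up with no Cookie header is one Varnish can answer from the cache; a request that still carries csrftoken or sessionid is passed to the backend.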
Caveats
Even with these adjustments in place, there are still two major caveats:
If an object changes after being cached, there will be an unspecified delay before those changes become visible to unauthenticated users.
Once a user logs in or accesses a form with CSRF protection, they will never see a cached page again unless they erase the csrftoken and/or sessionid cookies, even after logging out. Django doesn’t delete the sessionid cookie upon logout; instead, the session is regenerated and a new sessionid is set.
python-varnish and django-varnish
To solve caveat #1, we must purge an object’s URL from the Varnish cache after it has been changed. django-varnish does just that: it monitors certain models and purges an object’s get_absolute_url when the object is updated. I use a fork which can also purge the entire cache when certain models are updated.
django-varnish depends on python-varnish, which provides a simple Python API for management of Varnish servers via their management ports.
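The purge decision described above can be sketched in a few lines of plain Python. This is not django-varnish’s actual code: the function name purge_targets and the PURGE_ALL marker are hypothetical, and the real app wires this kind of logic to model save events and sends the purge over the Varnish management port.

```python
# Hypothetical sketch of the purge decision; not the django-varnish implementation.

# Made-up marker meaning "purge everything in the cache".
PURGE_ALL = "<entire cache>"


def purge_targets(model_label, absolute_url, watched, global_watched):
    """Decide what to purge after an object of type `model_label` is saved.

    model_label    -- "app_label.model" string, e.g. "auth.user"
    absolute_url   -- the object's get_absolute_url() result, or None
    watched        -- labels whose object URL should be purged
    global_watched -- labels whose change should purge the whole cache
    """
    label = model_label.lower()
    if label in global_watched:
        return [PURGE_ALL]
    if label in watched and absolute_url:
        return [absolute_url]
    return []  # model is not watched: leave the cache alone
```

For example, saving a watched user profile would yield just that profile’s URL, while saving a globally watched model would yield the purge-everything marker.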
Installation
Unfortunately the original maintainer’s code no longer works with the current version of Varnish, but alternative up-to-date forks are available. These can be installed via pip.
Configuration
Once installed, just add django.contrib.humanize and varnishapp to INSTALLED_APPS, add (r'^admin/varnish/', include('varnishapp.urls')) to your URLconf, and add some settings:
VARNISH_WATCHED_MODELS is a list of installed models whose get_absolute_url URLs you want to purge from the Varnish cache upon saving. Example: ('auth.user','profiles.profile')
VARNISH_GLOBAL_WATCHED_MODELS is a list of installed models that will purge the entire Varnish cache upon saving.
VARNISH_MANAGEMENT_ADDRS is a list of Varnish cache addresses with their management ports. Example: ('server1:6082','server2:6082')
VARNISH_SECRET is the shared secret used to authenticate with the Varnish server. This can be found at /etc/varnish/secret on an Ubuntu/Debian installation of Varnish.
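Put together, the relevant part of a settings file might look something like the fragment below. The setting names come from the list above; the values are placeholders, the globally watched model is a made-up example, and whether VARNISH_SECRET expects the secret string itself (as assumed here) should be checked against the fork you install.

```python
# settings.py (fragment) -- example values only.

INSTALLED_APPS = (
    # ... your existing apps ...
    "django.contrib.humanize",
    "varnishapp",
)

# Saving one of these purges that object's get_absolute_url from the cache.
VARNISH_WATCHED_MODELS = ("auth.user", "profiles.profile")

# Saving one of these purges the entire cache (hypothetical example model).
VARNISH_GLOBAL_WATCHED_MODELS = ("flatpages.flatpage",)

# host:port pairs for each Varnish management interface.
VARNISH_MANAGEMENT_ADDRS = ("server1:6082", "server2:6082")

# Placeholder: assumed to be the contents of /etc/varnish/secret.
VARNISH_SECRET = "replace-with-your-varnish-secret"
```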
Once you’ve set all of that up, changes to an object will result in the object’s absolute URL being purged from your Varnish cache.
Delete Django session cookies upon logout
Solving caveat #2 requires a behavioural change to Django: we need to delete the csrftoken and sessionid cookies after the django.contrib.auth.views.logout view is called. We could try to monkey patch this view, or we can write a small piece of middleware to do the job. Monkey patching is a terrible, terrible crime, so I went for the middleware option instead.
Create a file named middleware.py in your project’s root directory:
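A minimal sketch of such middleware follows, written in the old MIDDLEWARE_CLASSES style. The class name DeleteSessionOnLogoutMiddleware is an illustrative choice, and the cookie names are hard-coded to Django’s defaults on the assumption that SESSION_COOKIE_NAME and CSRF_COOKIE_NAME have not been changed.

```python
# middleware.py -- sketch only; the class name is hypothetical.

# Dotted path of the logout view whose responses should clear the cookies.
LOGOUT_VIEW = "django.contrib.auth.views.logout"

# Django's default cookie names (assumption: not overridden in settings).
COOKIES_TO_DELETE = ("sessionid", "csrftoken")


class DeleteSessionOnLogoutMiddleware(object):
    """Delete the session and CSRF cookies once the logout view has run."""

    def process_view(self, request, view_func, view_args, view_kwargs):
        # Remember whether this request is being routed to the logout view.
        view_name = "%s.%s" % (
            getattr(view_func, "__module__", ""),
            getattr(view_func, "__name__", ""),
        )
        request._delete_session_cookies = view_name == LOGOUT_VIEW
        return None  # let Django carry on and call the view

    def process_response(self, request, response):
        # Only touch responses produced by the logout view.
        if getattr(request, "_delete_session_cookies", False):
            for name in COOKIES_TO_DELETE:
                response.delete_cookie(name)
        return response
```

With the file in the project root, the settings entry would be 'middleware.DeleteSessionOnLogoutMiddleware'. Response middleware runs in reverse order, so listing this class first means its process_response runs last, after the session middleware has set its new sessionid cookie.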
Then add this new middleware to the beginning of MIDDLEWARE_CLASSES in your project’s settings file.
Results
I ran ab against the homepage of a project I’ve been working on recently, testing against both Varnish and Apache. At the time of testing, each direct request for the page resulted in a total of 50 database queries. I tested from the machine where the site is hosted.
I used the following arguments: -n 1000 -c 50 -k. This instructs it to perform 1000 requests with a concurrency level of 50, with HTTP persistent connections enabled.
The results (below) highlight why it’s absolutely necessary to employ a proper caching strategy for your production Django deployments. Whether you use Varnish, or another HTTP accelerator, or even Django’s built-in whole page cache, the implications are clear: if you allow users to hit Apache directly on every page load, your site won’t stay online for long.
| Server | requests/sec | 50% served within | 95% served within |
|---|---|---|---|
| Apache 2.2.21 | 4.10 | 2441ms | 3096ms |
| Varnish 3.0.2 | 20204.88 | 1ms | 5ms |
(Update: It was suggested within the comments that I run ab with the -k argument to enable HTTP persistent connections, so I repeated the test and amended the results above.)