
Performance Tuning Tactics For A Django SaaS

James Broad

--

Over the last few weeks, I’ve been working on improving the overall performance of our SaaS platform (OnCare) for user experience gains and profit. Hopefully, some of my fresh musings could be useful to you.

Philosophy on performance focus

TLDR; Focus on building and marketing features, then fix performance issues if and when they emerge.

We’ve been running the business lean: taking in requirements, building solutions from first principles and iterating on customer feedback once they’ve shipped. It’s a tricky balance spinning all the technical plates while growing your business, and performance is one of those plates. Performance is something we focus on sparingly during the early product lifecycle, as premature optimisation is a killer if you build something people don’t end up wanting or using.

Given that performance can take a back seat, tech debt will need to be paid off from time to time. Here’s how I tackle it.

Start with the numbers

TLDR; find a cloud provider where you can profile live traffic. Typically known as application performance management (APM).

Our infrastructure provider, AWS, offers a useful live profiling tool called X-Ray that’s been fairly core to learning what bottlenecks we have and, to some extent, why they’re a problem.

A piece of minutiae on X-Ray: it only profiles a fraction of your traffic, so occasionally I’ll increase the sampling rate for certain endpoints I’ve earmarked for improvement to ensure I’m looking at statistically significant traces.
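
As an illustration, here’s a minimal sketch of bumping the sampling rate for one group of endpoints via boto3 (the rule name, path and rates are hypothetical, and the same can be done through the X-Ray console):

import boto3

xray = boto3.client('xray')

# Sample half of all matching requests (plus a small guaranteed reservoir),
# rather than relying on the default catch-all rule
xray.create_sampling_rule(
    SamplingRule={
        'RuleName': 'reports-endpoints',  # hypothetical rule name
        'Priority': 10,                   # lower numbers are evaluated first
        'FixedRate': 0.5,                 # sample 50% of matching requests
        'ReservoirSize': 5,               # plus up to 5 requests per second
        'ServiceName': '*',
        'ServiceType': '*',
        'Host': '*',
        'HTTPMethod': '*',
        'URLPath': '/api/reports*',       # hypothetical endpoint path
        'ResourceARN': '*',
        'Version': 1,
    }
)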

Once I’ve got a handle on the slow requests, it’s time to dig deeper and look at the waterfall charts in individual traces. There tends to be a theme at this stage: N+1 inefficiencies or slow SQL queries (complex joins), all originating at the DB level.

Beyond AWS I’ve found DataDog and New Relic to be useful services with their APM offerings but I favour the convenience, cost and consolidation AWS provides.

Locally, Django Silk has been of some use, but given how much of a slowdown it incurs on response times, it’s seldom used. It’s also not representative of users’ problems, as our dev environments are highly unoptimised and have comparably little data to work off.

N+1 issues

TLDR; optimise queries by taking advantage of Django’s slightly hidden ORM helpers (prefetch_related, _id & annotate).

I’ll tend to chip away at N+1 queries first as they can often be resolved with a prefetch_related or select_related on the queryset.

A word of caution here: overzealously “optimising” querysets with prefetch_related and select_related can lead to worse results, as you may overfetch and allocate memory inefficiently.
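
To make the N+1 shape concrete, here’s a minimal sketch assuming a hypothetical comments relation on the Report model:

# N+1: one query for the reports, then one query per report for its comments
reports = Report.objects.all()
for report in reports:
    print(report.comments.count())  # hits the DB on every iteration

# Two queries in total: one for the reports, one prefetching all their comments
reports = Report.objects.prefetch_related('comments')
for report in reports:
    print(len(report.comments.all()))  # served from the prefetch cache, no extra query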

Fixing inefficient queries

This is a big topic that probably deserves a book by an expert in the field, but there are a few simple things you can do to get by.

Use _id over .pk on your foreign key model lookups. Idiomatic Django would suggest using code like this to get related IDs:

reports = Report.objects.all()
# Get all the reports' creator IDs (one extra query per report)
[report.creator.pk for report in reports]

Django is lazy here and assumes you want a full creator instance, so it will iterate over each report and do a separate lookup for each creator just to pluck out its PK.

Instead, just use the somewhat hidden _id convention:

reports = Report.objects.all()
# Get all the reports' creator IDs with no extra queries
[report.creator_id for report in reports]

Taking this foreign key example one step further, let’s say you wanted to get a list of all the related creators’ email addresses. You could start with this:

reports = Report.objects.all()
# Get all the reports' creator emails
[report.creator.email for report in reports]

This, of course, will lead to the same N+1 problem as above but now we can’t simply use the _id trick as we don’t want the ID. Instead, bring in the select_related method:

reports = Report.objects.select_related('creator').all()
# Get all the reports' creator emails without N+1 lookups
[report.creator.email for report in reports]

This will be much faster if you’re covering a big dataset. You can take this one step further, though, and be explicit about which related fields you’re pulling through:

reports = Report.objects.select_related(
    'creator'
).only(
    'creator__email'
).all()
# Get all the reports' creator emails without N+1 lookups,
# fetching only the creator's email column
[report.creator.email for report in reports]

Or, using annotate to pull the email through as a plain attribute:

from django.db.models import F

reports = Report.objects.annotate(
    creator_email=F('creator__email')
).all()
[report.creator_email for report in reports]

GraphQL

TLDR; GraphQL will create inefficient queries; use a library to help improve them.

We’ve embraced the benefits of GraphQL to simplify our interfaces when getting and mutating data, but it’s come with a cost: maintaining efficient queries.

Using Graphene, one of the most popular Python GraphQL frameworks, it won’t take you long to hit N+1 issues (e.g. “Is there `n+1` queries issue?”). This is inherent because Graphene will cleverly traverse your models and lazily fetch related M2M and foreign keys through Django’s models, but it provides no mechanism out of the box to annotate which fields could be optimised.

I found a good library to help with most of the Graphene performance issues: graphene-django-optimizer. The main benefit we’ve taken from this library is the resolver hints, which give more explicit control over optimisations. E.g.:

import graphene
import graphene_django_optimizer as gql_optimizer


class ItemType(gql_optimizer.OptimizedDjangoObjectType):
    # (the usual "class Meta: model = ..." is omitted here for brevity)
    name = graphene.String()

    # Tell the optimizer exactly which joins and columns this resolver needs
    @gql_optimizer.resolver_hints(
        select_related=('product', 'shipping'),
        only=('product__name', 'shipping__name'),
    )
    def resolve_name(root, info):
        return '{} {}'.format(root.product.name, root.shipping.name)
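
The other half of the library, per its docs, is wrapping the root queryset in your query resolver so the hints get applied; a minimal sketch assuming an Item model:

class Query(graphene.ObjectType):
    items = graphene.List(ItemType)

    def resolve_items(root, info):
        # The optimizer inspects the incoming GraphQL query and applies
        # select_related/prefetch_related/only accordingly
        return gql_optimizer.query(Item.objects.all(), info)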

The slow emergence of performance issues

TLDR; anticipate that any features that gain adoption will have a performance tax you’ll need to pay after a few weeks or months.

Performance issues tend to sneak up on you at an exponential rate and carry on wreaking havoc until they’re addressed.

Here’s the scenario: you release a new feature that demands constant read & write throughput on the system (end clients through to the DB). During development, testing and, to some extent, early live adoption, you won’t spot many performance issues, as the corpus is too small to put a strain on the system. E.g. you’ll have too few rows in a DB table for a slow query to be noticeable.

Over time, as new users are signed up to the new feature, the corpus grows and you get a slow but steady degradation of the system as DB transactions put locks on tables, CPU load rises, etc., all of which leads to requests backing up.

The way I’ve handled this challenge is by setting up alarms (AWS CloudWatch alarms) around spikes in metrics such as average response time and sustained DB CPU load, plus general anomaly detection. When alarms trigger, I’m typically looking for isolated faults, but serendipitously I’ll discover high load on services in the same way you’d spot it in your OS process monitor. Once a performance issue is detected, it’s noted and triaged with the product & bug backlog; in the meantime, it’s normally best to scale horizontally or vertically to temporarily stymie the issue.
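
For illustration, a minimal sketch of one such alarm created with boto3 (the instance identifier, threshold and SNS topic are hypothetical):

import boto3

cloudwatch = boto3.client('cloudwatch')

# Alert when the database CPU stays above 80% for three consecutive 5-minute periods
cloudwatch.put_metric_alarm(
    AlarmName='db-sustained-cpu',
    Namespace='AWS/RDS',
    MetricName='CPUUtilization',
    Dimensions=[{'Name': 'DBInstanceIdentifier', 'Value': 'my-db-instance'}],
    Statistic='Average',
    Period=300,
    EvaluationPeriods=3,
    Threshold=80.0,
    ComparisonOperator='GreaterThanThreshold',
    AlarmActions=['arn:aws:sns:eu-west-1:123456789012:ops-alerts'],  # hypothetical topic
)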

Detecting problems early

TLDR; Enable verbose DB logging and when you spot more activity than normal, you know you’ve got a problem.

I’ve found logging SQL queries on dev machines to be the most effective indicator of inefficient lookups. Simply add this to your settings (note that django.db.backends only logs queries when DEBUG is True):

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
        },
    },
    'loggers': {
        'django.db.backends': {
            'level': 'DEBUG',
            'handlers': ['console'],
        },
    },
}

Now, when calling any API endpoint or loading whole pages, you’ll see a lot of activity when things are bad.

The Django Debug Toolbar is also great for surfacing problems with queries, but you have to be looking at it consciously and it doesn’t work with API (GraphQL & REST) requests, so I favour the serendipitous discovery through a noisy terminal.

Lateral problem solving

TLDR; When faced with a performance issue, it might be cheaper and more reliable to re-architect whole parts of the system.

It’s helpful to have a grasp of the end-to-end service when approaching performance issues as you can find solutions that scale well into the future.

A recent example was our mobile app’s data-fetching lifecycle. Without dropping into our domain model, let’s use a fictional app called InstaCalendar which gets all your friends and their calendar events for the day. We were (analogously) loading a list of all your friends, then, for each of those friends, loading their calendar events. Performance issues aren’t felt if you have few friends and those friends have little going on in their lives, but the popular ones (the power users) will quickly feel the performance hit, as they’ll have a lot of friends and a lot of events to fetch because those friends are also busy.

In this scenario, you’ll pick up on this when profiling the app after getting reports that the power users are suffering bad UX, and you try to replicate their data conditions.

Once you spot this problem and realise how badly it scales (as you’re at the mercy of lots of network requests), you have two main options:

  1. Add realtime/push updates (e.g. WebSockets) to only fetch data (event) deltas
  2. Batch the event fetching into one request, ideally decoupled from the loading of friends

In this decision-making phase, I’d err towards the boring but reliable tech choice (2), as it incurs less risk and you’ll get the solution to users faster. Here, you’ll need to orchestrate a change across both the mobile app and the backend which, if you’re broken up into functional teams, will yield some friction, but it’s a useful route to familiarise yourself with, given the macro benefits and its tendency to recur.
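
A minimal sketch of what option 2 might look like on the backend, assuming a hypothetical Event model with an owner foreign key and the fields shown:

from django.http import JsonResponse
from django.utils import timezone

def events_for_friends(request):
    # One request (and one query) for every friend's events today,
    # instead of one request per friend
    friend_ids = request.GET.getlist('friend_id')
    events = (
        Event.objects  # hypothetical model
        .filter(owner_id__in=friend_ids, starts_at__date=timezone.localdate())
        .values('id', 'owner_id', 'title', 'starts_at')
    )
    return JsonResponse({'events': list(events)})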

Async task processing

Another example was around requests that require a lot of processing and data transfer. In our case, this was CSV generation and download.

We initially had a simple view & controller for generating CSV files, but once new fields, filters and computation were added, it became untenable in its original form, even with heavy use of memory-efficient generators.

When addressing any request that’s likely to invoke a non-trivial amount of processing (you should be able to predict this), I’d advise moving to an asynchronous processing model, which will typically be a task queue (e.g. Celery) or something more exotic (Node.js/Twisted/AWS Lambda).

We’re using Zappa for serverless infrastructure, so we ported our view to an async task (a Zappa task) which puts the results on S3, and a frontend script polls and downloads them from S3 when ready. This saved a heap of issues with server load but required a fair amount of engineering to make it all work seamlessly and securely.
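
The shape of that pattern, sketched here with a Celery task rather than our actual Zappa setup (the bucket name, model and columns are hypothetical):

import csv
import io

import boto3
from celery import shared_task

@shared_task
def export_reports_csv(user_id):
    # Build the CSV off the request/response cycle
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(['id', 'creator_email', 'created_at'])
    rows = Report.objects.filter(creator_id=user_id).values_list('id', 'creator__email', 'created_at')
    for row in rows.iterator():
        writer.writerow(row)

    # Put the result on S3; the frontend polls for this key and downloads it when ready
    key = f'exports/reports-{user_id}.csv'
    boto3.client('s3').put_object(
        Bucket='my-exports-bucket',  # hypothetical bucket
        Key=key,
        Body=buffer.getvalue().encode('utf-8'),
    )
    return key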

Caching

No performance post would be complete without talking about caching but I tend to avoid caching data where possible despite the huge gains it can provide. This comes with the caveat that we operate at a modest scale in a regulated space, handling sensitive data and have a ~3:1 read-write ratio of data.

I take this stance as I feel there’s a much bigger cost of serving stale or incorrect data, managing cache invalidation and maintaining more subsystems vs. the gain you can get from tuning your queries. It turns out most databases can be blisteringly fast at getting data if you have the right indexes and queries.
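
On the indexing point, adding the right index is often a one-line change on the model; a minimal sketch against the Report model used in the earlier examples (the fields here are hypothetical):

from django.db import models

class Report(models.Model):
    creator = models.ForeignKey('auth.User', on_delete=models.CASCADE)  # FK columns are indexed by default
    created_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        indexes = [
            # Speeds up the common "latest reports by creator" lookups
            models.Index(fields=['creator', '-created_at']),
        ]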

The only data caches we use tend to be ephemeral, low-risk lookups of things like internal metrics and deterministic, expensive calculations. I’ve found lru_cache to be super easy to drop into projects of this kind.
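
For example, a minimal sketch of caching a deterministic, expensive-ish calculation (the function itself is hypothetical):

from datetime import timedelta
from functools import lru_cache

@lru_cache(maxsize=128)
def working_days_between(start_date, end_date):
    # Pure function: same inputs always give the same output, so repeat calls
    # within the same process are served straight from the in-memory cache
    return sum(
        1
        for offset in range((end_date - start_date).days + 1)
        if (start_date + timedelta(days=offset)).weekday() < 5
    )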

Our static assets, on the other hand, are highly cached, making use of hashed filenames with long-lived cache expiry and ETags fronted by CloudFront CDN.
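
In Django terms, the hashed-filename part of that is a one-line settings change (the long-lived expiry and ETags themselves are configured at the CDN/web-server layer):

# settings.py: collectstatic emits hashed filenames (e.g. app.3f2a1cdeadbe.css),
# so assets can be cached with very long expiries and busted by name on deploy
STATICFILES_STORAGE = 'django.contrib.staticfiles.storage.ManifestStaticFilesStorage'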

Generators

My preference is to avoid generators, as I feel they lead to obtuse code that favours the machine over the human, but I’ve found them a necessary evil for memory-hungry operations like CSV generation. Here it’s a case of using the right tool for the job, even if you’re not fond of the tool.
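
For reference, the classic Django pattern here is streaming the CSV through a generator and a pseudo-buffer, roughly as in the Django docs (the model and columns are hypothetical):

import csv

from django.http import StreamingHttpResponse


class Echo:
    # Pseudo-buffer: csv.writer calls write(), and we simply hand the value back
    def write(self, value):
        return value


def export_reports(request):
    writer = csv.writer(Echo())
    rows = Report.objects.values_list('id', 'creator__email', 'created_at').iterator()
    response = StreamingHttpResponse(
        (writer.writerow(row) for row in rows),  # rows are produced lazily, never all held in memory
        content_type='text/csv',
    )
    response['Content-Disposition'] = 'attachment; filename="reports.csv"'
    return response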

Results

We spent a 2-week “sprint” fixing performance issues, as we were spending a lot on our infrastructure (relative to the past) and customers were complaining. The main objective of the sprint was a faster experience for the end user (measured by the speed of the slowest, most frequent requests), as we tend to optimise everything for the user. The infrastructure and cost issues should benefit from the same fixes that improve the user’s UX.

The process largely followed what’s been described above: start with the metrics, fix the problems and repeat until you run out of time in the sprint.

The results from our latest performance sprint were significant:

  • We saw a big decrease in the number of complaints about performance. Yay!
  • We saw a net decrease in the amount of time it took our users to get the data they required.
  • We saw a significant reduction in required AWS capacity, which led to a big cost saving.

In summary, there’s a certain art to performance tuning, but there’s no short supply of resources to help. The Django docs themselves are great further reading: https://docs.djangoproject.com/en/3.1/topics/db/optimization/

A reminder that this post was written with a bias towards a service that values up-to-date data and security over absolute speed. If you’re a media publisher, I’d advise finding a post that goes deep on caching strategies.

Say hello to me on Twitter @kulor
