Announcement

Saturday, February 05, 2011

Optimizing Django database access : some ideas/experiments

Originally published on BootstrapToday Blog  

As we added more features to BootStrapToday, we started facing issues of performance. Page access was getting progressively slower. Recently we analyzed page performance using excellent Django Debug Toolbar and discovered that in worst there were more than 500 database calls in a page. Obviously that was making page display slow. After looking at various calls, analyzing the code and making changes, we were able to bring it down to about 80 calls and dramatically improving the performance.  Django has excellent general purpose caching framework. However, it’s better to minimize the calls and then add caching for remaining queries. In this article, I am going to show a simple but effective idea for improving the database access performance.

In Django, a common reason for increased database calls in ForeignKey fields. When you try to access the variable representing foreign key typically it results in a database access. Usually suggested solution to this problem is use of ‘select_related’ function of Django query set API. It can substantially reduce the number of database calls. Sometimes it is not possible to use ‘select_related’ (e.g. you don’t want to change how a third party app works or changing the queries requires significant change in the logic etc).

In our case, we have Foreign Key fields like Priority, Status etc on Ticket models. These fields are there because later we want to add ability to ‘customize’ these fields. However, the values in these tables rarely change. Hence usually this results in multiple queries to get the same data.  Usually these are ‘get’ calls. If we can add a simple caching to ‘get’ queries for status, priority etc, then we can significantly reduce the number of database calls.

In Django, a new connection is opened to handle a new request. Hence if we add model instance cache to ‘connection’ then query results will be cached during handling of one request. New request will make the database query again. With this strategy we don’t need complicated logic to clear ‘stale’ items from the cache.



from django.db import connection
from django.core import signals

def install_cache(*args, **kwargs):
    setattr(connection, 'model_cache', dict())

def clear_cache(*args, **kwargs):
    delattr(connection, 'model_cache')

signals.request_started.connect(install_cache)
signals.request_finished.connect(clear_cache)

class YourModelManager(models.Manager):
    def get(self, *args, **kwargs):
        '''
        Added single row level caching for per connection. Since each request
        opens a new connection, this essentially translates to per request
        caching
        '''
        model_key = (self.model, repr(kwargs))

        model = connection.model_cache.get(model_key)
        if model is not None:
            return model

        # One of the cache queries missed, so we have to get the object from the database:
        model = super(YourModelManager, self).get(*args, **kwargs)
        if model is not None and model.pk is not None:
            connection.model_cache[model_key]=model
        return model


As a side benefit, since the same model instance is returned for same query in subsequent calls, number of duplicate instances is reduced and hence memory foot print is also reduced.

9 comments:

Tweets that mention Optimizing Django database access : some ideas/experiments « BootStrapToday's Official Blog -- Topsy.com said...

[...] This post was mentioned on Twitter by vishwajeets, vishwajeets and Django Ireland, Anand Agarwal. Anand Agarwal said: Optimizing Django database access : some ideas/experiments: http://t.co/osczqTF [...]

Brian Luft said...

Based on what you've described, johnny-cache might work well with the added benefit of not needing to override your model managers. You just drop it in and for read-heavy sites you get a tremendous benefit.

Robert Smol said...

Hi,

sounds nice, but I am getting Error during clear_cache.

To me it looks like the connection does not have the 'model_cache' when the 'request_finished' signal is fired...

nitinbhide said...

'model_cache' attribute is added when during 'install_cache' function which will get called during 'request_started' signal. It seems that has not happened. Try changing the install_cache code to

def install_cache(*args, **kwargs):
setattr(connection, 'model_cache', dict())

Enhancement in BootStrapToday Release 1.7 « BootStrapToday's Official Blog said...

[...] 1) There has been significant improvement in the performance. [...]

Mike said...

I'm getting the same problem in clear_cache. I put in a breakpoint, and I can getattr on the connection, but delattr fails:

(Pdb) getattr(connection, 'model_cache')
{}
(Pdb) delattr(connection, 'model_cache')
*** AttributeError: 'DefaultConnectionProxy' object has no attribute 'model_cache'

Any ideas?

Mike said...

Ok, I figured it out. connection is now a DefaultConnectionProxy in Django (even more magic!) which has an overriden __getattr__ that puts the models cache on a different object. So trying to delete it doesn't work, becuase it's not actually on the connection object.

Potential fix: add a __delattr__ override to DefaultConnectionProxy similar to get/set that passes the command to Django's connections array.

nitinbhide said...

Hi Mike,
I think you are using Django 1.4. We have tested code in this article till Django 1.3. Django 1.4 has many changes and this code may require changes for support 1.4.

nitinbhide said...

Hi Mike,
Thanks for the suggestion. I will try it out and update the article.