VSzA techblog

Optimizing Django ORM in f33dme


As I was hacking around with django-oursql vs f33dme, I started sniffing the network traffic between the Python process and the MySQL server to follow up on a bug in oursql. I found that the following queries (yes, plural!) ran every time a feed item was marked as read.

SELECT `f33dme_item`.`id`, `f33dme_item`.`title`, `f33dme_item`.`content`,
  `f33dme_item`.`url`, `f33dme_item`.`date`, `f33dme_item`.`added`,
  `f33dme_item`.`feed_id`, `f33dme_item`.`score`, `f33dme_item`.`archived`
FROM `f33dme_item` WHERE `f33dme_item`.`id` = ?

SELECT (1) AS `a` FROM `f33dme_item` WHERE `f33dme_item`.`id` = ?  LIMIT 1

UPDATE `f33dme_item` SET `title` = ?, `content` = ?, `url` = ?, `date` = ?,
  `added` = ?, `feed_id` = ?, `score` = ?, `archived` = ?
WHERE `f33dme_item`.`id` = ?

The above queries not only multiply the round-trip overhead by three, but the first and the last ones generate quite a bit of a traffic, by sending the content of all the fields (including the content which might contain a full-blown blog post like this) to and from the ORM, respectively. The innocent-looking lines of code that generated them were the following ones.

item = Item.objects.get(id=item_id)
if not item:
  return HttpResponse('No item found')
item.archived = True

By looking at the queries above first, it's pretty clear, that the get method needs to query all the columns, since later code might access any of the fields. The same can be said about the update, which knows nothing about the contents of the database – it even has to check if a row with the ID specified exists to figure out whether to use an INSERT or and UPDATE DML query.

Of course, the developers of the Django ORM met this situation as well, and even documented it along with nice examples in the QuerySet API reference. All I needed was to adapt the code a little bit, and as the documentation states, the update method even “returns the number of affected rows”, so the check for the existence of the item can be preserved. The improved code is the following.

if Item.objects.filter(id=item_id).update(archived=True) != 1:
  return HttpResponse('No item found')

The first line is a bit longish (although still 62 characters only), but replaced four lines of the original code. When read carefully, one might even find it readable, and it produces the following SQL queries in the background.

UPDATE `f33dme_item` SET `archived` = ? WHERE `f33dme_item`.`id` = ?


next posts >
< prev post

Proudly powered by Utterson