Friends are expensive »
FERDY CHRISTANT - NOV 13, 2009 (02:08:29 PM)
Friends are expensive, in a web application. Allow me to explain. Like many other web applications, JungleDragon allows users to make friends with other users. It is trivial to model this relationship in a database:
The UserFriend table simply maps the relationship between a user and his friend. This makes it very easy to get a list of friends for a specific user. For one particular user, it is also easy to check if it concerns a friend of the current user.
However, there is a case where this model becomes highly ineffective and you are left with a scalability problem. The case concerns the situation where a page shows a large number of users who may or may not be a friend of the user, and you want to indicate whether this is a friend or not. An example of such a case is a large comment thread.
The problem with a scenario like that is that either you have to trigger a "friend" query for each user you render on the screen, or you could do this all in a single, very expensive join query. The cost of this query is N x M and will put a serious load on your database. Since the database is the hardest thing to scale, I decided to drop this feature, since it is a nice-to-have. Note that I'm not dropping the friends feature, only the friend indicator in lists of users. You can still see who your friends are from your profile.
The scalability issue presented above is quite common, this article explains how Digg has the same issue.Of course their user base is huge, but the issue is the same. They are even looking into exotic non-relational systems just in order to be able to mark a user, story or comment as green, meaning a friend of the current user.
Case two
Whilst on the subject of optimizing the database, I decided to denormalize a small part of the database too. Everywhere you go in JungleDragon, a user's karma, level and class are shown. In the database, this was modeled like so:
Based on a user's karma, the level is calculated. The level is determined by the minimum and maximum karma. For example, if you have 550 karma, it could mean that you are level 3, because level 3 concerns the range of 200 - 600 karma. Based on your level, you are assigned a class. Again this works based on a range. For example, level 3-5 could mean you have the class "Grizzly Bear".
Functionally, this model is nice. I can rearrange the level and class ranges and all users will automatically get assigned the correct level and class. However, displaying a user's level and class occurs multiple times on each page. You always see your own level and class, but also those of your friends, commenters, image uploaders, etc. For each user to display, I need to do a double inner join to calculate the level and class, again putting a large load on the database. Therefore, I decided to denormalize this. The new model looks like this:
As you can see, the user table now holds the level and class. The Level and Class tables are not gone, they are used to calculate the latest level and class of the user each time they get assigned new karma. This way, I can directly query the user table for these values, without the need of an expensive query. In the rare event that I change the Level and Class definitions, I run an update script that stamps the corrected level and class on the profile of each user.
Premature optimization is considered the root of all evil, yet I consider these cases to be best practices.



Comments: 4
COMMENT: GARTH THOMAS
NOV 17, 2009 - 08:18:08 PM
COMMENT: FERDY
NOV 18, 2009 - 08:09:57 AM
It's a nice suggestion, but some people may have a lot of friends. I guess it depends on how you store your session. Cookies only allow for 3KB of data. Still, it's an interesting suggestion. Perhaps memcached can help too. «
COMMENT: GARTH THOMAS
DEC 9, 2009 - 06:46:13 PM
If you want to you can update the cache on the server side as people change their friends list and the javascript can rerun the AJAX routine say every 10 mins so people can see updates in near realtime without ever hitting the backend. «
COMMENT: GARTH THOMAS
DEC 9, 2009 - 06:53:19 PM
i guess you could go further here and then add javascript events for creating online chat and presence awareness also «