Monday, April 23, 2007

The problem with conventional databases

They were designed for old computers, and a lot has changed in the last 10 or 20 years.

Here are stats on some random hard disk from 1991, the Seagate ST-2106N:
Capacity: 106 MB
Average seek: 18 ms
Max full seek: 35 ms

And here are some numbers from a newer drive, the Seagate ST3500830NS:
Capacity: 500,000 MB
Random write seek: less than 10 ms

In the past 15 years, hard disk capacity has increased about 5000x. However, the ability to seek has improved by only about 2x (from 18 ms down to under 10 ms).

In those same 15 years, DRAM capacity has also increased dramatically, probably more than 5000x. My friend just bought a computer with 32 GB of RAM -- that's roughly 300x the capacity of that entire 1991 hard disk!

Here's another important stat:
1991: strlen("Paul Buchheit") = 13
2007: strlen("Paul Buchheit") = 13

Surprised? Probably not. My point? Our databases are still storing a lot of the same things they were in 1991 (or 1970).

The problem? Most databases are still storing my name using the same technique they used 15 or more years ago -- they seek the disk head to some specific location and then read or write my name at that location (the seek can be deferred with write-ahead logging, but it will need to happen eventually). These databases rely on the one operation that DID NOT dramatically improve in the past 15 years! They still perform like it's 1991.
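To make that concrete, here's a rough Python sketch of the update-in-place pattern (this is not any particular database's code -- the file name, record size, and layout are all made up for illustration):

import os

RECORD_SIZE = 64  # hypothetical fixed-size record

def update_record(fd, record_id, name):
    # Compute where this record lives and write it there. On a spinning disk,
    # each write to a "random" offset implies a head seek (on the order of 10 ms).
    data = name.encode("utf-8").ljust(RECORD_SIZE, b"\x00")
    os.pwrite(fd, data, record_id * RECORD_SIZE)

fd = os.open("records.dat", os.O_RDWR | os.O_CREAT, 0o644)
update_record(fd, 12345, "Paul Buchheit")  # one tiny write, one seek
os.close(fd)

Every update like that implies a head movement; a write-ahead log just batches them into one sequential append at commit time and pays the random writes back later.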

Here's another way of looking at what has happened:

[Chart: disk seeks per second relative to capacity, trending toward zero over time]
Relative to storage capacity, seeks/sec is approaching zero. Disks should no longer be thought of as "random access" devices.

Some people try to beat that exponential curve by simply buying more disks. Here are a few more stats to consider (actual measurements from my computer):
Disk - sequential read/write: 52 MB/sec
Disk - random seeks: 100/sec
DRAM - sequential read/write: 3237 MB/sec
DRAM - random read/write: 10,000,000/sec

For sequential access, DRAM is about 62x the speed of disk -- disk is much slower, but by less than two orders of magnitude.

However, for random access, DRAM is 100,000 times the speed of disk! Buying and maintaining 100,000 disks would certainly be a hassle -- I don't recommend it.

What can you do instead of buying 100,000 disks? Keep your data in memory, or at least all of the small items, such as names, tags, etc. Fortunately, there are some easy tools available for doing that, such as memcached. If your db is rarely updated, that may be enough. However, if you also have frequent updates, then your database will be back to thrashing the disk around updating its b-trees, and you will be back to 1991 performance. To fix that, you may need to switch to a different method of storing data, one which is log based (meaning that the db updates are all written to sequential locations instead of random locations). Maybe I'll address that in another post.
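Here's a toy sketch of that log-based idea (my own simplification -- a real system also needs durability, recovery, and compaction). Every update is appended to the end of a single file, so writes are sequential, and a small in-memory index remembers where the latest value for each key lives:

import os

class LogStore:
    def __init__(self, path):
        self.f = open(path, "ab+")
        self.index = {}  # key -> (offset, length) of the newest record

    def put(self, key, value):
        record = ("%s\t%s\n" % (key, value)).encode("utf-8")
        self.f.seek(0, os.SEEK_END)
        offset = self.f.tell()
        self.f.write(record)            # sequential append -- no random seek
        self.index[key] = (offset, len(record))

    def get(self, key):
        offset, length = self.index[key]
        self.f.seek(offset)             # reads can still seek, but RAM or the
        line = self.f.read(length)      # OS cache usually absorbs the hot ones
        return line.decode("utf-8").split("\t", 1)[1].rstrip("\n")

store = LogStore("names.log")
store.put("user:42", "Paul Buchheit")
print(store.get("user:42"))   # -> Paul Buchheit

That's the basic trade the log-structured approach makes: writes become cheap sequential appends, at the cost of keeping an index in memory and occasionally compacting the log.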

Finally, one more interesting stat: 8 GB of flash memory costs about $80.

Flash has some weird performance characteristics, but those can be overcome with smarter controllers. I expect that flash will replace disk for all applications other than large object storage (such as video streams) and backup.

Update: I'm not suggesting that everyone should get rid of their databases and replace them with some kind of custom data storage. That would be a big mistake. If your database is working fine, then you shouldn't waste much time worrying about these issues. The intent of this post is to help you understand the performance challenges faced by databases that rely on disk. If your database is having performance problems, the correct solution will depend on your situation, and may be as simple as tuning the db configuration.

I also neglected to mention something very important, which is that most databases have a configurable buffer cache. Increasing the size of that cache may be one of the easiest and most effective ways to improve db performance, since it can reduce the number of disk reads.
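For example, the knob is called innodb_buffer_pool_size in MySQL/InnoDB and shared_buffers in PostgreSQL. The sizes below are made-up examples -- the right values depend on your workload and how much RAM you can spare:

# my.cnf (MySQL / InnoDB)
innodb_buffer_pool_size = 4G

# postgresql.conf
shared_buffers = 2GB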
