So, when I was working back at Pronto.com we had a scaling problem.

Product images (images of the products that we had crawled and indexed) were always growing and never pruned. We started out with a pair of Linux machines using reiserfs and apache with some tweaked syncing software that I wrote. This worked until we had a backlog of about 2M images per day that needed to be sync’ed and the sync service couldn’t keep up.

We bought a SUN 7410 “SAN” (not really a SAN, but we called it a SAN) due to a recommendation from the sysadmin for Gifts.com. It had 22TB of raw storage and used OpenSolaris and ZFS under the covers with 2x 100GB SSD’s for cache. Should be more than enough horsepower to serve-up some images, right, but alas, there became a time where the number of files in a directory started to choke ZFS. Yea, I know ZFS rocks and there is no better … well, under these circumstances (nothing could be cached - all of the caching was taken care of upstream of the storage engine). Alas, I was not around long enough to move the directory structure around like I had planned to fix the issue using the SAN.

So, just weeks before I was laid off, I put together the requirements for a system that would be the ‘next generation’ of the image storing/serving platform. A very capable engineer and friend of mine, Tony Cassandra, wrote it in Java. The features were: key/value, distributed, scalable, low-latency, HTTP interface, redundant, … He built a system that worked well and was very well designed and is currently in production today. There were a few problems with the system: any failed writes or writes to a single system were just logged and not actually handled automatically, so logging had to be monitored and then items had to be re-synced later, and consistency was just assumed.

Enter Riak.

About 4 months after leaving Pronto, I was researching some erlang-based database engines (there are many of them out now, couchdb and others) and I came upon Riak by http://www.basho.com/ . It’s they exact system that Tony had built at Pronto to do image serving, however it wasn’t written in Java, and had some more interesting features (links, buckets, automatic eventual consistency, metadata, charset, encoding, adding a node increases performance (reduces latency), hooks directly into hadoop’s DFS and it’s written in Erlang which makes it F’ing Fast). Riak was inspired by Amazon’s Dynamo database engine (hmmm, could they be having similar problems to Pronto?) because Amazon’s relational DBs weren’t able to scale very well over a large number of machines.

Riak pushes multiple copies to its neighbors, so it tolerates multiple node failures. Reads and writes can be done to any node, searching the DB can be done by uploading a javascript snippet as a query, so perfect for a load-balanced and/or proxied environment.

I’m currently implementing Riak for a personal project of my own because I love key/value databases and I love having two databases (one local for development, and one remote for production) that sync over the Internet and are eventually consistent.

Tweet
submit to reddit