AWS Official Blog

Amazon DevCon – Margo Seltzer – Another View

by Jeff Barr

New Amazon CTO Werner Vogels, who called himself a “recovering academic,” introduced Margo Seltzer from Sleepycat Software and lauded the model behind Berkeley DB. “Try to do less, but better,” he told us, rather than trying to do everything poorly.

Margo went on to describe what it means for Berkeley DB to be an “enterprise data management” system, sketching out its high-level architecture and feature set.

How do people use Berkeley DB? Margo showed a slide full of customers, from Google to Nokia and Sun to Cisco.

Margo was happy to finally be able to tell people what she does, a problem programmers often face when explaining nitty-gritty technical work to everyday people.

She simply tells them that when they visit the Amazon website and browse items, she helped make that possible.

Diving into technical details, Margo explained that Berkeley DB is neither a relational database nor an object store. It has no intrinsic schema: it stores key/data pairs and leaves their interpretation to the application.

It doesn’t yet support partitioning, which is often useful at terabyte scale and beyond.
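To make “no intrinsic schema” concrete, here is a minimal sketch of the idea, not Berkeley DB’s actual API: a hypothetical `ByteStore` that treats keys and values as opaque bytes, plus hypothetical application-side `encode_item`/`decode_item` helpers that define the record layout. The store never knows what the bytes mean; the layout (a 4-byte quantity followed by a UTF-8 title) is an assumption chosen for illustration.

```python
import struct
from typing import Dict, Optional, Tuple


class ByteStore:
    """Hypothetical key/value store: keys and values are opaque bytes, no schema."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)


# The application, not the store, owns the record layout.
# Hypothetical layout: a 4-byte big-endian quantity followed by a UTF-8 title.
def encode_item(quantity: int, title: str) -> bytes:
    return struct.pack(">I", quantity) + title.encode("utf-8")


def decode_item(raw: bytes) -> Tuple[int, str]:
    (quantity,) = struct.unpack(">I", raw[:4])
    return quantity, raw[4:].decode("utf-8")


store = ByteStore()
store.put(b"item:42", encode_item(3, "Example Title"))
raw = store.get(b"item:42")
assert raw is not None
print(decode_item(raw))  # → (3, 'Example Title')
```

Because the schema lives entirely in the application, two programs sharing the same store must agree on the encoding themselves; the store will happily return bytes that one of them cannot decode.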

As for how data in such stores actually gets accessed, Margo told us that almost nobody uses truly random access, even when developers claim otherwise: there is nearly always locality.
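Locality matters because it lets a small cache absorb most requests. The sketch below is a simulation under stated assumptions, not a model of Berkeley DB’s own cache: a simple LRU page cache (the hypothetical `LRUCache` class) is fed two made-up traces, one where each access stays on or next to the previous page (`local_trace`) and one that picks pages uniformly at random (`random_trace`), and compares hit rates for the same cache size.

```python
import random
from collections import OrderedDict
from typing import List


class LRUCache:
    """Fixed-capacity page cache that evicts the least recently used page."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.pages: "OrderedDict[int, None]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, page: int) -> None:
        if page in self.pages:
            self.pages.move_to_end(page)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.pages[page] = None
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)  # evict least recently used


def hit_rate(trace: List[int], capacity: int) -> float:
    cache = LRUCache(capacity)
    for page in trace:
        cache.access(page)
    return cache.hits / len(trace)


def local_trace(n: int, num_pages: int, rng: random.Random) -> List[int]:
    """Hypothetical workload: each access stays on, or next to, the previous page."""
    page, trace = 0, []
    for _ in range(n):
        page = (page + rng.choice([-1, 0, 0, 1])) % num_pages
        trace.append(page)
    return trace


def random_trace(n: int, num_pages: int, rng: random.Random) -> List[int]:
    """Hypothetical workload: every access picks a page uniformly at random."""
    return [rng.randrange(num_pages) for _ in range(n)]


rng = random.Random(42)
local = hit_rate(local_trace(10_000, 10_000, rng), capacity=64)
uniform = hit_rate(random_trace(10_000, 10_000, rng), capacity=64)
print(f"local: {local:.2f}  random: {uniform:.2f}")
```

With 64 cached pages out of 10,000, the uniform trace can only hit about 64/10,000 of the time, while the local trace hits on most accesses; real workloads sit somewhere in between, usually much closer to the local end.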

During the rest of the talk, Margo fielded highly technical questions from a number of Amazon developers. They wanted to know about practical limitations of BDB, how garbage collection affects performance, and about other detailed issues.

Margo was proud that Berkeley DB developers live, eat, and breathe the mantra “separate mechanism from policy.” That is, the library supplies the mechanism and lets the application developer set the policy, whether for concurrency, transactions, cache sizes, or locking. This flexibility lets a wide range of developers apply the technology to their own niches.

“The good news is that it’s flexible,” Margo told us. “The bad news is that it’s flexible.”
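“Separate mechanism from policy” can be sketched in a few lines. In this hypothetical example (the names `TxnLog`, `flush_every_commit`, and `flush_every_n` are illustrative, not Berkeley DB’s API), the log is the mechanism: it buffers commit records and knows how to flush them. The policy, a function the application passes in, decides when to flush, trading durability against throughput.

```python
from typing import Callable, List

# A policy maps "records pending since the last flush" to "flush now?".
FlushPolicy = Callable[[int], bool]


def flush_every_commit(pending: int) -> bool:
    return True  # every commit is durable immediately, at the cost of throughput


def flush_every_n(n: int) -> FlushPolicy:
    def policy(pending: int) -> bool:
        return pending >= n  # batch commits: faster, but a crash can lose up to n - 1
    return policy


class TxnLog:
    """Mechanism: buffers commit records and flushes them; the policy decides when."""

    def __init__(self, should_flush: FlushPolicy) -> None:
        self.should_flush = should_flush
        self.buffer: List[str] = []
        self.durable: List[str] = []
        self.flushes = 0

    def commit(self, record: str) -> None:
        self.buffer.append(record)
        if self.should_flush(len(self.buffer)):
            self.flush()

    def flush(self) -> None:
        # A real log would write and sync to disk here; this sketch just moves records.
        self.durable.extend(self.buffer)
        self.buffer.clear()
        self.flushes += 1


strict = TxnLog(flush_every_commit)
batched = TxnLog(flush_every_n(10))
for i in range(100):
    strict.commit(f"txn-{i}")
    batched.commit(f"txn-{i}")
print(strict.flushes, batched.flushes)  # → 100 10
```

The same `TxnLog` code serves both applications unchanged; only the policy differs. That is the good news and the bad news Margo described: nothing is decided for you, so every application has to choose.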

Amazon developers cared a lot about replication and performance, and Margo tried to field all the questions that we threw at her.

Toward the end, she described how to approach database replication. Don’t build replication in from day one, she said. First get the transaction system working, then add a communication infrastructure, and only then add the modification architecture.

She ended the talk with a cycle that we here at Amazon go through every single day: “Build, test, test, test, test and deploy.”