Amazon DevCon – Brian Aker

Brian Aker is the Director of Architecture at MySQL. Brian was invited to speak at Amazon DevCon and told that he could talk about anything. Brian chose Writing Storage Engines, because his favorite thing to talk aobut is the guts of MySQL and the storage engine. Brian is the author of lots of stuff. See Freshmeat…

A bit about MySQL: 5 million installations, 180 employees, 20+ countries. Most North American developers are in Seattle. MySQL is a high performance relational DB. Aiming to be SQL-99 compliant. Stable, scalable, usable, modular. This last characteristic makes MySQL different than most relational DBs, but similar to most open source technology. There are client support libraries in many languages. All MySQL code written in simplified C++ — don’t look inside the code if you’re a C++ purist.

Brian asked the crowd: How many people are using MySQL today? (quite a few hands went up) How many are using it but haven’t told their bosses yet? (fewer hands, but still some… this got a good laugh)

Overview of architecture

Kernel: Parser, Optimizer (cost-based), Storage Engine

The storage engine is separate from Optimizer. You don’t have to worry about what a join is. Don’t have to parse SQL syntax yourself. You get basic set of primitive operations.

Dozens of storage engines have been written. Some examples are:

MyISAM
Innodb – Typical transaction engine. Think about it as Oracle. Easy for Oracle DBAs to get it.
NDB – Different, designed for telecom environment. In telecom environment, you pick up the phone and you dial – response has to be instant. High availability also required. Storage arena with storage nodes in clustered environment. Just keep throwing more boxes at it and you’ll scale. Interconnects supported. All data and indexes in memory – it’s a memory hog.
HEAP – Design in env where you don’t care if DB goes down. Need fastest lookup possible. You throw data in it – if your data goes away, you know where else you can find it. Used in combination with MyISAM for Slashdot. Mixing and matching is useful.
Merge – Primitive partitioning engine.

The list of engines goes on and one. CSV engines, archiving engines, etc.

More on storage engines. We hand them data, and they store data formats on disk. Don’t make the assumption that there’s one way to get data in and out. There are many ways and you may need different storage engines based on your app. Operating systems dictate how the storage engine should behave.

Understanding storage engines and their architecture

Every row is made up of fields. Each field is an object. Optimize for the smallest amount of space possible. Integer, tiny integer, etc. Faster writes, faster reads, etc.

Knowledge of required methods for coding

Berkeley Engine – SQL interface to BDBs

Storage Engine Methods

There are four groups of methods in the storage engine interface.

1) Table control methods
::create() – whatever you need to do to create a table, i.e., write a description — see ha_example

::open() – a little more difficult, header code creates a char – for every table open, create a char – manage state of table (any level of lock – table, row, page, etc.), for the most part, don’t need to touch get char code – open any files you need to access

::close()

::delete() – can skip depending on how you code – pass in name of table

2 – Optimizer methods
::info() – Feed information to your engine so that it can decide whether to talk to indexes or scan all the data. It’s a cost-based optimizer. info() asks basic questions: What table is being deleted? How big are the records? How much data? How big are the indexes?

::records_in_range() – You have enough rows, there’s an index. Returns an estimate of how many rows you’ll touch based on the index. Cost optimizer tuned to 30%. It does a complete scan if it is already touching more than 30%, rather than using indexes. Can tweak that 30% number too.

3 – SQL Modifier methods
::delete_row()

::insert_row() Pass in write_row record. Let engines decide how timestamp will be applied. Allow you to do basic increments and sequences. Write out data however you need to write it.

::update_row() Handle old data and new data. Some storage engines can fix rows in place (fixed row format) and avoid updating all rows – basically non-existent fragmentation. Otherwise, dynamic indexing.

4 – SQL Read methods – two types

Scan Reads
::rnd_init()

::rnd_next()

::position()

::rnd_pos()

Index Reads – a little trickier
::index_read()

::index_next()

Brian showed this table Scan example that shows sequence of methods:

ha_example::store_lock
ha_example::external_lock
ha_example::info
ha_example::rnd_init
ha_example::extra Cash record in HA_rnd()
ha_example::rnd_next
ha_example::rnd_next
ha_example::rnd_next
ha_example::extra End cacheing of records(def)
ha_example::external_lock
ha_example::extra Reset database to after open

Brian said that to create a storage engine, you simply take the interface and write these very easy methods described above.

There are some more methods you can write:

Transaction methods
Bulk load methods
Defrag methods
And many more (read handler.h for more of the interface)

You’ll need to touch a few more files to get your storage engine going:

autoconf files
sql/Makefile.am
sql/handler.h
sql/mysql_priv.h
sql/handler.cc
sql/myseld.cc
sql/set_var.cc

You can also take advantage of the test case system — open source and easy to use, highly recommended for testing. See mysql-test/.

Other things to consider: Read only? Do you need durable data? Do you have a network database? If so, you can do optimizing for a federated environment.

Brian recommended looking into sql/ha_example if you want to explore more.

Q&A

Q: Roadmap for full text support?

A: Goal is to get full text searching performance up in 5.1. full text support not performing well in 4.1. Timeline is end of 2005 or first quarter of 2006.

Q: What’s the next big thing in 5.0?

A: 5.0 is views, triggers, stored procedures, XA support. additional precision map components. also federated support. 5.1 – NDB will have attribute storage for blogs, moving to hybrid replication model, XML/XPath support for native XML types, event based triggers inside MySQL, partition engine that can use any other storage engine underneath (partition based on range, hash, etc.).

AWS News Blog

Amazon DevCon – Brian Aker

Resources

Follow

Learn

Resources

Developers

Help