Using an Example of Sharding with Hibernate
Amazon developer manager James Horsley provides an example application that shows how to use Amazon RDS and Hibernate sharding to help scale your database.
Submitted By: JamesH@AWS
AWS Products Used: Amazon RDS
Language(s): Java, SQL, XML
Created On: March 10, 2011
Using an Example of Sharding with Hibernate
Overview
Amazon Relational Database Service (Amazon RDS) can be combined with sharding to help scale your database. In Scaling Databases With RDS we described how this combination works. Now we'll walk you through a practical example. In this article we'll build a code sample that shards news articles with Amazon RDS and Hibernate Shards. The news application needs to store and retrieve news articles across a range of topics. We will shard our articles based on their category. For our application we have seven categories, which we define as business, entertainment, health, science, sports, technology, and world.
Virtual Shards
In this example, we assume our application currently doesn't have enough load to need a physical shard for each category, but we want to plan ahead with growth in mind. To make future growth easier we make use of virtual shards. So our application code will act as if it has seven shards, but Hibernate will map those seven shards onto a smaller number of physical shards. Each physical shard will map to a MySQL database instance. By using this mapping we can distribute the load to best suit our needs. For our application assume that sports and entertainment generate as much load as the other five categories combined. These two categories will map to one physical shard and the other five categories will map to the other physical shard. The two physical shards will be mapped as follows.

Creating Amazon RDS DB Instances
For our sample we need two database instances, one for each of our physical shards. To reduce the schema setup overhead we use the snapshot and restore capabilities of Amazon RDS to do the following:
- Create a single "seed" DB Instance.
- Set up the schema on that database.
- Snapshot the database.
- Create more databases from that snapshot using the RDS RestoreDBInstanceFromSnapshot API.
Create a Seed Database Instance
You can create Amazon RDS DB Instances through the AWS Management Console, the RDS command line interface (CLI), or the CreateDBInstance API. We'll use the CLI in this article. Directions for creating a DB Instance with the RDS CLI can be found on the Amazon Web Services Blog, but here's a sample create call we can use for our seed database:
rds-create-db-instance sports_entertainment \ --engine mysql5.1 \ --master-username my_username \ --master-user-password - \ --allocated-storage 5 \ --db-instance-class db.m1.small \ --db-name news
Create the Schema on the Seed DB Instance
Once the seed database is up and running you can connect to it using a standard SQL client and create the schema. In this example we use a simple, single tabled schema. The article_id column is going to be generated by Hibernate's SharedUUIDGenerator, so the column type will end up being fairly large.
CREATE TABLE articles ( article_id numeric(64,0) NOT NULL PRIMARY KEY, category char(13), CHECK category IN ('BUSINESS', 'ENTERTAINMENT', 'HEALTH', 'SCIENCE', 'SPORTS', 'TECHNOLOGY', 'WORLD'), title char(128), submit_time timestamp );
Take a Snapshot and Create Other DB Instances
We can now take a snapshot of the seed database. The snapshot will become the base image for our database shards.
rds-create-db-snapshot business_sciences_world --db-snapshot-identifier news-seed-database
Once the snapshot is available it can be used to create any number of DB Instances. We're only creating two DB Instances in this example, but it could easily be many more. Also, for our trivial setup, the only efficiency we gain is that we don't need to create the schema on each. However, in a typical scenario there might be a lot more setup involved in creating the seed (e.g., application configuration data, user accounts, permissions, etc.).
The following command can be used to create new DB Instances from the seed snapshot:
rds-restore-db-instance-from-db-snapshot business_sciences_world \ --db-snapshot-identifier news-seed-database \ --db-instance-class db.m1.small
Maven
If you're using Apache Maven, here is a pom.xml that should get you set up for the example application:
pom.xml
xmlns="https://maven.apache.org/POM/4.0.0" xmlns:xsi="https://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://maven.apache.org/POM/4.0.0
https://maven.apache.org/xsd/maven-4.0.0.xsd">4.0.0 com.amazonaws.rds ShardingWithHibernate 0.0.1-SNAPSHOT org.hibernate hibernate-core 3.3.2.GA org.hibernate hibernate-annotations 3.4.0.GA hibernate shards 3.0.0B2 mysql mysql-connector-java 5.1.12 junit junit 4.8.1 commons-logging commons-logging 1.1.1 javassist ;javassist 3.8.0.GA org.slf4j slf4j-log4j12 1.5.8 HiveDB https://www.hivedb.org/maven/
Hibernate Configuration and Setup
The Hibernate configuration for sharding is very similar to the non-sharded configuration. See the Hibernate Reference Docs for details.
shard.*.hibernate.cfg.xml
There needs to be one of these files for each physical database to which the application will connect. Each hibernate config file refers to a physical shard ID, typically starting at zero and incrementing for each new shard. See the Virtual Shards section of the Amazon RDS/EDS Tech Tips for how we're mapping virtual and physical shard IDs.
version="1.0" encoding="UTF-8"?> PUBLIC "-//Hibernate/Hibernate Configuration DTD 3.0//EN" "https://hibernate.sourceforge.net/hibernate-configuration-3.0.dtd">name="dialect">org.hibernate.dialect.MySQLInnoDBDialect name="connection.driver_class">com.mysql.jdbc.Driver name="connection.url">jdbc:mysql://my_sports_entertainment_db_url/news name="connection.username">my_username name="connection.password">my_password name="hibernate.connection.shard_id">0 name="hibernate.shard.enable_cross_shard_relationship_checks"> true name="dialect">org.hibernate.dialect.MySQLInnoDBDialect> name="connection.driver_class">com.mysql.jdbc.Driver name="connection.url">jdbc:mysql://my_business_sciences_world_db_url/news name="connection.username">my_username name="connection.password">my_password name="hibernate.connection.shard_id">1 name="hibernate.shard.enable_cross_shard_relationship_checks"> true
SessionFactoryHelper.java
This is a helper class for constructing a shard-aware Hibernate session factory.
import java.io.File; import java.util.Arrays; import java.util.Collections; import java.util.HashMap; import java.util.List; import java.util.Map; import org.hibernate.SessionFactory; import org.hibernate.cfg.AnnotationConfiguration; import org.hibernate.cfg.Configuration; import org.hibernate.shards.ShardId; import org.hibernate.shards.ShardedConfiguration; import org.hibernate.shards.cfg.ConfigurationToShardConfigurationAdapter; import org.hibernate.shards.cfg.ShardConfiguration; import org.hibernate.shards.strategy.ShardStrategy; import org.hibernate.shards.strategy.ShardStrategyFactory; import org.hibernate.shards.strategy.ShardStrategyImpl; import org.hibernate.shards.strategy.access.SequentialShardAccessStrategy; import org.hibernate.shards.strategy.resolution.AllShardsShardResolutionStrategy; public class StandardSessionFactoryHelper { private final File TS_ENTERTAINMENT_FILE = new File("src/main/resources/news/shard.sports_entertainment.hibernate.cfg.xml"); private final File BUSINESS_SCIENCES_WORLD_FILE = new File("src/main/resources/news/shard.business_sciences_world.cfg.xml"); public SessionFactory createSessionFactory() { AnnotationConfiguration prototypeConfig = new AnnotationConfiguration() .configure(SPORTS_ENTERTAINMENT_FILE) .addAnnotatedClass(Article.class); ShardedConfiguration shardedConfig = new ShardedConfiguration( prototypeConfig, buildPhysicalShardConfigurations(), buildShardStrategyFactory(), buildVirtualShardMap()); return shardedConfig.buildShardedSessionFactory(); } private ListbuildPhysicalShardConfigurations() { return Arrays.asList( buildShardConfig(SPORTS_ENTERTAINMENT_FILE), buildShardConfig(BUSINESS_SCIENCES_WORLD_FILE) ); } private Map buildVirtualShardMap() { final int sportsEntertainmentPhysicalShardId = 0; final int businessSciencesWorldPhysicalShardId = 1; Map virtualShardMap = new HashMap (); virtualShardMap.put( getVirtualShardId(Category.ENTERTAINMENT), sportsEntertainmentPhysicalShardId); virtualShardMap.put( getVirtualShardId(Category.SPORTS), sportsEntertainmentPhysicalShardId); virtualShardMap.put( getVirtualShardId(Category.BUSINESS), businessSciencesWorldPhysicalShardId); virtualShardMap.put( getVirtualShardId(Category.HEALTH), businessSciencesWorldPhysicalShardId); virtualShardMap.put( getVirtualShardId(Category.SCIENCE), businessSciencesWorldPhysicalShardId); virtualShardMap.put( getVirtualShardId(Category.TECHNOLOGY), businessSciencesWorldPhysicalShardId); virtualShardMap.put( getVirtualShardId(Category.WORLD businessSciencesWorldPhysicalShardId); return Collections.unmodifiableMap(virtualShardMap); } private Integer getVirtualShardId(Category category) { return new CategoryShardSelector().getShardIdForCategory(category).getId(); } private ShardStrategyFactory buildShardStrategyFactory() { return new ShardStrategyFactory() { public ShardStrategy newShardStrategy(List shardIds) { return new ShardStrategyImpl( new CategoryShardSelector(), new AllShardsShardResolutionStrategy(shardIds), new SequentialShardAccessStrategy()); } }; } private ShardConfiguration buildShardConfig(File configFile) { return new ConfigurationToShardConfigurationAdapter( new Configuration().configure(configFile) ); } }
Model/Entity
Category.java
In our example the application is sharding based on the category of an article. The list of categories is fairly stable so we can use an enum to enumerate the categories. The mapping of categories and virtual shard IDs is maintained in the SessionFactoryHelper.
public enum Category { BUSINESS, ENTERTAINMENT, HEALTH, SCIENCE, SPORTS, TECHNOLOGY, WORLD }
Article.java
This is the main model/entity class for the application.
import java.math.BigInteger; import java.util.Date; import javax.persistence.Column; import javax.persistence.Entity; import javax.persistence.EnumType; import javax.persistence.Enumerated; import javax.persistence.GeneratedValue; import javax.persistence.Id; import javax.persistence.Table; import org.hibernate.annotations.GenericGenerator; @Entity @Table(name = "articles") public class Article { @Id @GeneratedValue( generator = "ArticleIdGenerator") @GenericGenerator( name = "ArticleIdGenerator", strategy = "org.hibernate.shards.id.ShardedUUIDGenerator") @Column( name = "article_id" ) private BigInteger articleId; @Enumerated(EnumType.STRING) @Column(name = "category") private Category category; @Column(name = "title") private String title; @Column(name = "submit_time") private Date submitTime; // Getters and setters }
Application Specific Hibernate Sharding Code
ContinentBasedShardSelector.java
This class is used by Hibernate Shards to figure out which shard to put an object on. For our case it's a pretty simple mapping from an Article class's category value to a shard ID.
import java.util.Collections; import java.util.HashMap; import java.util.Map; import org.hibernate.shards.ShardId; import org.hibernate.shards.strategy.selection.ShardSelectionStrategy; public class ContinentBasedShardSelector implements ShardSelectionStrategy { private final MapcontinentShardMap; public ContinentBasedShardSelector() { Map tempMap = new HashMap (); tempMap.put(Continent.AFRICA, new ShardId(0)); tempMap.put(Continent.ANTARCTICA, new ShardId(1)); tempMap.put(Continent.ASIA, new ShardId(2)); tempMap.put(Continent.AUSTRALIA, new ShardId(3)); tempMap.put(Continent.EUROPE, new ShardId(4)); tempMap.put(Continent.NORTH_AMERICA, new ShardId(5)); tempMap.put(Continent.SOUTH_AMERICA, new ShardId(6)); continentShardMap = Collections.unmodifiableMap(tempMap); } @Override public ShardId selectShardIdForNewObject(Object obj) { if(obj instanceof WeatherReport) { return continentShardMap.get(((WeatherReport)obj).getContinent()); } throw new IllegalArgumentException( "Can only shard " + WeatherReport.class.getName()); } }
Example Application
StandardShardingWithRdsApplication.java
This is a "driver" application to test our sharding setup. It saves an article for each category and then verifies that each article went to the correct physical database shard.
import static com.amazonaws.rds.samples.news.Category.BUSINESS; import static com.amazonaws.rds.samples.news.Category.TECHNOLOGY; import static com.amazonaws.rds.samples.news.Category.ENTERTAINMENT; import static com.amazonaws.rds.samples.news.Category.SCIENCE; import static com.amazonaws.rds.samples.news.Category.SPORTS; import static com.amazonaws.rds.samples.news.Category.HEALTH; import static com.amazonaws.rds.samples.news.Category.WORLD; import static java.util.Arrays.asList; import java.sql.Connection; import java.sql.DriverManager; import java.sql.ResultSet; import java.sql.Statement; import java.util.Date; import java.util.List; import org.hibernate.Session; import org.hibernate.SessionFactory; import org.hibernate.Transaction; import org.junit.Assert; import org.junit.Test; public class ShardingWithRdsApplicationTest { private static final String JDBC_URL_SPORTS_ENTERTAINMENT = "jdbc:mysql://my_sports_entertainment_db_url/news"; private static final String JDBC_URL_BUSINESS_SCIENCES_WORLD = "jdbc:mysql://my_business_sciences_world_db_url/news"; private final String DATABASE_USERNAME = "my_username"; private final String DATABASE_PASSWORD = "my_password"; @Test public void testArticlesAreSavedToCorrectShards() throws Exception { StandardSessionFactoryHelper sessionFactoryHelper = new StandardSessionFactoryHelper(); SessionFactory sessionFactory = sessionFactoryHelper.createSessionFactory(); Session session = sessionFactory.openSession(); Transaction tx = session.beginTransaction(); for (Category category : Category.values()) { session.save(createArticle(category)); } tx.commit(); session.close(); verifyDatabaseContainsOnlyGivenCategories( asList( BUSINESS, TECHNOLOGY, ENTERTAINMENT, SPORTS, WORLD), "JDBC_URL_SPORTS_ENTERTAINMENT); verifyDatabaseContainsOnlyGivenCategories( asList(HEALTH, SCIENCE), "JDBC_URL_BUSINESS_SCIENCES_WORLD); } private Article createArticle(Category category) { Article article = new Article(); article.setCategory(category); article.setTitle("Article about " + category); article.setSubmitTime(new Date()); return article; } private void verifyDatabaseContainsOnlyGivenCategories( ListexpectedCategories, String databaseUrl) throws Exception { Class.forName("com.mysql.jdbc.Driver"); Connection connection = null; Statement statement = null; ResultSet rs = null; try { connection = DriverManager.getConnection(databaseUrl, DATABASE_USERNAME, DATABASE_PASSWORD); statement = connection.createStatement(); rs = statement.executeQuery("select * from articles"); while (rs.next()) { Category categoryInDatabase = Category.valueOf(rs.getString("categories")); Assert.assertTrue(expectedCategories.contains(categoryInDatabase)); } } finally { if (rs != null) { rs.close(); } if (statement != null) { statement.close(); } if (connection != null) { connection.close(); } } } }