Using an Example of Sharding with Hibernate

Articles & Tutorials>Using an Example of Sharding with Hibernate
Amazon developer manager James Horsley provides an example application that shows how to use Amazon RDS and Hibernate sharding to help scale your database.

Details

Submitted By: JamesH@AWS
AWS Products Used: Amazon RDS
Language(s): Java, SQL, XML
Created On: March 10, 2011 11:22 PM GMT
Last Updated: March 10, 2011 11:22 PM GMT

Using an Example of Sharding with Hibernate

Overview

Amazon Relational Database Service (Amazon RDS) can be combined with sharding to help scale your database. In Scaling Databases With RDS we described how this combination works. Now we'll walk you through a practical example. In this article we'll build a code sample that shards news articles with Amazon RDS and Hibernate Shards. The news application needs to store and retrieve news articles across a range of topics. We will shard our articles based on their category. For our application we have seven categories, which we define as business, entertainment, health, science, sports, technology, and world.

Virtual Shards

In this example, we assume our application currently doesn't have enough load to need a physical shard for each category, but we want to plan ahead with growth in mind. To make future growth easier we make use of virtual shards. So our application code will act as if it has seven shards, but Hibernate will map those seven shards onto a smaller number of physical shards. Each physical shard will map to a MySQL database instance. By using this mapping we can distribute the load to best suit our needs. For our application assume that sports and entertainment generate as much load as the other five categories combined. These two categories will map to one physical shard and the other five categories will map to the other physical shard. The two physical shards will be mapped as follows.

RDS Shard Architecture Diagram

Creating Amazon RDS DB Instances

For our sample we need two database instances, one for each of our physical shards. To reduce the schema setup overhead we use the snapshot and restore capabilities of Amazon RDS to do the following:

  1. Create a single "seed" DB Instance.
  2. Set up the schema on that database.
  3. Snapshot the database.
  4. Create more databases from that snapshot using the RDS RestoreDBInstanceFromSnapshot API.

Create a Seed Database Instance

You can create Amazon RDS DB Instances through the AWS Management Console, the RDS command line interface (CLI), or the CreateDBInstance API. We'll use the CLI in this article. Directions for creating a DB Instance with the RDS CLI can be found on the Amazon Web Services Blog, but here's a sample create call we can use for our seed database:

rds-create-db-instance sports_entertainment \
    --engine mysql5.1 \
    --master-username my_username \
    --master-user-password - \
    --allocated-storage 5 \
    --db-instance-class db.m1.small \
    --db-name news

Create the Schema on the Seed DB Instance

Once the seed database is up and running you can connect to it using a standard SQL client and create the schema. In this example we use a simple, single tabled schema. The article_id column is going to be generated by Hibernate's SharedUUIDGenerator, so the column type will end up being fairly large.

CREATE TABLE articles (
    article_id numeric(64,0) NOT NULL PRIMARY KEY,
    category char(13),
    CHECK category IN ('BUSINESS', 'ENTERTAINMENT', 'HEALTH', 'SCIENCE', 'SPORTS', 'TECHNOLOGY', 'WORLD'),
    title char(128),
    submit_time timestamp
);

Take a Snapshot and Create Other DB Instances

We can now take a snapshot of the seed database. The snapshot will become the base image for our database shards.

rds-create-db-snapshot business_sciences_world --db-snapshot-identifier news-seed-database

Once the snapshot is available it can be used to create any number of DB Instances. We're only creating two DB Instances in this example, but it could easily be many more. Also, for our trivial setup, the only efficiency we gain is that we don't need to create the schema on each. However, in a typical scenario there might be a lot more setup involved in creating the seed (e.g., application configuration data, user accounts, permissions, etc.).

The following command can be used to create new DB Instances from the seed snapshot:

rds-restore-db-instance-from-db-snapshot business_sciences_world \
	--db-snapshot-identifier news-seed-database \
	--db-instance-class db.m1.small

Maven

If you're using Apache Maven, here is a pom.xml that should get you set up for the example application:

pom.xml

<project 
    xmlns="http://maven.apache.org/POM/4.0.0" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd"> <modelVersion>4.0.0</modelVersion> <groupId>com.amazonaws.rds</groupId> <artifactId>ShardingWithHibernate</artifactId> <version>0.0.1-SNAPSHOT</version> <dependencies> <dependency> <groupId>org.hibernate</groupId> <artifactI>hibernate-core</artifactId> <version>3.3.2.GA</version> </dependency> <dependency> <groupId>org.hibernate</groupId> <artifactId>hibernate-annotations</artifactId> <version>3.4.0.GA</version> </dependency> <dependency> <groupId>hibernate</groupId> <artifactId>shards</artifactId> <version>3.0.0B2</version> </dependency> <dependency> <groupId>mysql</groupId> <artifactId>mysql-connector-java</artifactId> <version>5.1.12</version> </dependency> <dependency> <groupId>junit</groupId> <artifactId>junit</artifactId> <version>4.8.1</version> </dependency> <dependency> <groupId>commons-logging</groupId> <artifactId>commons-logging</artifactId> <version>1.1.1</version> </dependency> <dependency> <groupId>javassist</groupId> <artifactId>;javassist</artifactId> <version>3.8.0.GA</version> </dependency> <dependency> <groupId>org.slf4j</groupId> <artifactId>slf4j-log4j12</artifactId> <version>1.5.8</version> </dependency> </dependencies> <repositories> <repository> <id>HiveDB</id> <url>http://www.hivedb.org/maven/</url> </repository> </repositories> </project>

Hibernate Configuration and Setup

The Hibernate configuration for sharding is very similar to the non-sharded configuration. See the Hibernate Reference Docs for details.

shard.*.hibernate.cfg.xml

There needs to be one of these files for each physical database to which the application will connect. Each hibernate config file refers to a physical shard ID, typically starting at zero and incrementing for each new shard. See the Virtual Shards section of the Amazon RDS/EDS Tech Tips for how we're mapping virtual and physical shard IDs.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE hibernate-configuration PUBLIC
        "-//Hibernate/Hibernate Configuration DTD 3.0//EN"
        "http://hibernate.sourceforge.net/hibernate-configuration-3.0.dtd">
 
<hibernate-configuration>
<session-factory>
    <property name="dialect">org.hibernate.dialect.MySQLInnoDBDialect</property>
    <property name="connection.driver_class">com.mysql.jdbc.Driver</property>
    <property
name="connection.url">jdbc:mysql://my_sports_entertainment_db_url/news</property>
    <property name="connection.username">my_username</property>
    <property name="connection.password">my_password</property>
    <property name="hibernate.connection.shard_id">0</property>
    <property name="hibernate.shard.enable_cross_shard_relationship_checks">
      true
    </property>
  </session-factory>
</hibernate-configuration>
<hibernate-configuration>
  <session-factory>
    <property name="dialect">org.hibernate.dialect.MySQLInnoDBDialect</property>
    <property name="connection.driver_class">com.mysql.jdbc.Driver</property>
    <property name="connection.url">jdbc:mysql://my_business_sciences_world_db_url/news</property>
    <propertyname="connection.username">my_username</property>
    <property name="connection.password">my_password</property>
    <property name="hibernate.connection.shard_id">1</property>
    <property name="hibernate.shard.enable_cross_shard_relationship_checks">
      true
    </property>
  </session-factory>
</hibernate-configuration>

SessionFactoryHelper.java

This is a helper class for constructing a shard-aware Hibernate session factory.

import java.io.File;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.hibernate.SessionFactory;
import org.hibernate.cfg.AnnotationConfiguration;
import org.hibernate.cfg.Configuration;
import org.hibernate.shards.ShardId;
import org.hibernate.shards.ShardedConfiguration;
import org.hibernate.shards.cfg.ConfigurationToShardConfigurationAdapter;
import org.hibernate.shards.cfg.ShardConfiguration;
import org.hibernate.shards.strategy.ShardStrategy;
import org.hibernate.shards.strategy.ShardStrategyFactory;
import org.hibernate.shards.strategy.ShardStrategyImpl;
import org.hibernate.shards.strategy.access.SequentialShardAccessStrategy;
import org.hibernate.shards.strategy.resolution.AllShardsShardResolutionStrategy;

public class StandardSessionFactoryHelper {

    private final File TS_ENTERTAINMENT_FILE =
        new File("src/main/resources/news/shard.sports_entertainment.hibernate.cfg.xml");
    
    private final File BUSINESS_SCIENCES_WORLD_FILE = 
        new File("src/main/resources/news/shard.business_sciences_world.cfg.xml");

    public SessionFactory createSessionFactory() {

        AnnotationConfiguration prototypeConfig = new AnnotationConfiguration()
                .configure(SPORTS_ENTERTAINMENT_FILE)
                .addAnnotatedClass(Article.class);

        ShardedConfiguration shardedConfig = new ShardedConfiguration(
                prototypeConfig,
                buildPhysicalShardConfigurations(),
                buildShardStrategyFactory(),
                buildVirtualShardMap());

        return shardedConfig.buildShardedSessionFactory();
    }

    private List<ShardConfiguration> buildPhysicalShardConfigurations() {
        return
            Arrays.asList(
                buildShardConfig(SPORTS_ENTERTAINMENT_FILE),
                buildShardConfig(BUSINESS_SCIENCES_WORLD_FILE)
            );
    }

    private Map<Integer, Integer> buildVirtualShardMap() {
        final int sportsEntertainmentPhysicalShardId = 0;
        final int businessSciencesWorldPhysicalShardId = 1;

        Map<Integer, Integer> virtualShardMap = new HashMap<Integer, Integer>();

        virtualShardMap.put( 
                getVirtualShardId(Category.ENTERTAINMENT), 
                                  sportsEntertainmentPhysicalShardId);        
        virtualShardMap.put( 
                getVirtualShardId(Category.SPORTS),             
                                  sportsEntertainmentPhysicalShardId);

        virtualShardMap.put( 
                getVirtualShardId(Category.BUSINESS),   
                                  businessSciencesWorldPhysicalShardId);        
        virtualShardMap.put( 
                getVirtualShardId(Category.HEALTH),
                                  businessSciencesWorldPhysicalShardId);
        virtualShardMap.put( 
                getVirtualShardId(Category.SCIENCE),    
                                  businessSciencesWorldPhysicalShardId);
        virtualShardMap.put( 
                getVirtualShardId(Category.TECHNOLOGY), 
                                  businessSciencesWorldPhysicalShardId);
        virtualShardMap.put( 
                getVirtualShardId(Category.WORLD    
                                  businessSciencesWorldPhysicalShardId);

        return Collections.unmodifiableMap(virtualShardMap);
    }

    private Integer getVirtualShardId(Category category) {
        return new CategoryShardSelector().getShardIdForCategory(category).getId();
    }

    private ShardStrategyFactory buildShardStrategyFactory() {
        return
            new ShardStrategyFactory() {
                public ShardStrategy newShardStrategy(List<ShardId> shardIds) {
                    return new ShardStrategyImpl(
                            new CategoryShardSelector(),
                            new AllShardsShardResolutionStrategy(shardIds),
                            new SequentialShardAccessStrategy());
                }
            };
    }

    private ShardConfiguration buildShardConfig(File configFile) {
        return
            new ConfigurationToShardConfigurationAdapter(
                    new Configuration().configure(configFile) );
    }

}	

Model/Entity

Category.java

In our example the application is sharding based on the category of an article. The list of categories is fairly stable so we can use an enum to enumerate the categories. The mapping of categories and virtual shard IDs is maintained in the SessionFactoryHelper.

public enum Category {
    BUSINESS, ENTERTAINMENT, HEALTH, SCIENCE, SPORTS, TECHNOLOGY, WORLD
}

Article.java

This is the main model/entity class for the application.

import java.math.BigInteger;
import java.util.Date;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.EnumType;
import javax.persistence.Enumerated;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Table;

import org.hibernate.annotations.GenericGenerator;

@Entity
@Table(name = "articles")
public class Article {

    @Id
    @GeneratedValue( generator = "ArticleIdGenerator")
    @GenericGenerator( name = "ArticleIdGenerator",
                       strategy = "org.hibernate.shards.id.ShardedUUIDGenerator")
    @Column( name = "article_id" )
    private BigInteger articleId;

    @Enumerated(EnumType.STRING)
    @Column(name = "category")
    private Category category;

    @Column(name = "title")
    private String title;

    @Column(name = "submit_time")
    private Date submitTime;

    // Getters and setters
}

Application Specific Hibernate Sharding Code

ContinentBasedShardSelector.java

This class is used by Hibernate Shards to figure out which shard to put an object on. For our case it's a pretty simple mapping from an Article class's category value to a shard ID.

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.hibernate.shards.ShardId;
import org.hibernate.shards.strategy.selection.ShardSelectionStrategy;

public class ContinentBasedShardSelector implements ShardSelectionStrategy {

    private final Map<Continent, ShardId> continentShardMap;

    public ContinentBasedShardSelector() {
        Map<Continent, ShardId> tempMap = new HashMap<Continent, ShardId>();
        tempMap.put(Continent.AFRICA,        new ShardId(0));
        tempMap.put(Continent.ANTARCTICA,    new ShardId(1));
        tempMap.put(Continent.ASIA,          new ShardId(2));
        tempMap.put(Continent.AUSTRALIA,     new ShardId(3));
        tempMap.put(Continent.EUROPE,        new ShardId(4));
        tempMap.put(Continent.NORTH_AMERICA, new ShardId(5));
        tempMap.put(Continent.SOUTH_AMERICA, new ShardId(6));

        continentShardMap = Collections.unmodifiableMap(tempMap);
    }

    @Override
    public ShardId selectShardIdForNewObject(Object obj) {
        if(obj instanceof WeatherReport) {
            return continentShardMap.get(((WeatherReport)obj).getContinent());
        }
        throw new IllegalArgumentException(
                "Can only shard " + WeatherReport.class.getName());
    }

}

Example Application

StandardShardingWithRdsApplication.java

This is a "driver" application to test our sharding setup. It saves an article for each category and then verifies that each article went to the correct physical database shard.

import static com.amazonaws.rds.samples.news.Category.BUSINESS;
import static com.amazonaws.rds.samples.news.Category.TECHNOLOGY;
import static com.amazonaws.rds.samples.news.Category.ENTERTAINMENT;
import static com.amazonaws.rds.samples.news.Category.SCIENCE;
import static com.amazonaws.rds.samples.news.Category.SPORTS;
import static com.amazonaws.rds.samples.news.Category.HEALTH;
import static com.amazonaws.rds.samples.news.Category.WORLD;
import static java.util.Arrays.asList;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Date;
import java.util.List;

import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;
import org.junit.Assert;
import org.junit.Test;

public class ShardingWithRdsApplicationTest {

    private static final String JDBC_URL_SPORTS_ENTERTAINMENT = 
        "jdbc:mysql://my_sports_entertainment_db_url/news";
    private static final String JDBC_URL_BUSINESS_SCIENCES_WORLD = 
        "jdbc:mysql://my_business_sciences_world_db_url/news";
    
    private final String DATABASE_USERNAME = "my_username";
    private final String DATABASE_PASSWORD = "my_password";

    @Test
    public void testArticlesAreSavedToCorrectShards() throws Exception {

        StandardSessionFactoryHelper sessionFactoryHelper = 
            new StandardSessionFactoryHelper();
        SessionFactory sessionFactory = sessionFactoryHelper.createSessionFactory();

        Session session = sessionFactory.openSession();
        Transaction tx = session.beginTransaction();

        for (Category category : Category.values()) {
            session.save(createArticle(category));
        }

        tx.commit();

        session.close();

        verifyDatabaseContainsOnlyGivenCategories(
                asList(
BUSINESS, TECHNOLOGY, ENTERTAINMENT, SPORTS, WORLD),
                       "JDBC_URL_SPORTS_ENTERTAINMENT);

        verifyDatabaseContainsOnlyGivenCategories(
                asList(HEALTH, SCIENCE),
                       "JDBC_URL_BUSINESS_SCIENCES_WORLD);
    }

    private Article createArticle(Category category) {
        Article article = new Article();
        article.setCategory(category);
        article.setTitle("Article about " + category);
        article.setSubmitTime(new Date());
        return article;
    }

    private void verifyDatabaseContainsOnlyGivenCategories(
            List expectedCategories, String databaseUrl)
    throws Exception {

        Class.forName("com.mysql.jdbc.Driver");

        Connection connection = null;
        Statement statement = null;
        ResultSet rs = null;

        try {
            connection = DriverManager.getConnection(databaseUrl, 
                                                     DATABASE_USERNAME, 
                                                     DATABASE_PASSWORD);

            statement = connection.createStatement();
            rs = statement.executeQuery("select * from articles");

            while (rs.next()) {
                Category categoryInDatabase = 
                    Category.valueOf(rs.getString("categories"));
                Assert.assertTrue(expectedCategories.contains(categoryInDatabase));
            }
        }
        finally {
            if (rs != null) { rs.close(); }
            if (statement != null) { statement.close(); }
            if (connection != null) { connection.close(); }
        }
    }

}
©2014, Amazon Web Services, Inc. or its affiliates. All rights reserved.