Migrating your databases using AWS Database Migration Service

by Zhaoxi Zhang | in Java

In this blog post, I will introduce a simple workflow using the AWS SDK for Java to perform a database migration with the AWS Database Migration Service (AWS DMS). AWS DMS helps you migrate databases to AWS easily and securely. With AWS DMS, the source database remains fully operational during the migration, minimizing downtime to applications that rely on the database. 

  1. Create an AWS DMS service client.

    AWSDatabaseMigrationService dms = new AWSDatabaseMigrationServiceClient();
  2. Use the AWS CLI to create the IAM role you will use with the AWS DMS API.
    First, create a JSON file, dmsAssumeRolePolicyDocument.json, with the following policy:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": "dms.amazonaws.com"
                },
                "Action": "sts:AssumeRole"
            }
        ]
    }
        

    Second, use the following AWS CLI command to create the IAM role:

    $ aws iam create-role --role-name dms-vpc-role \
    --assume-role-policy-document file://dmsAssumeRolePolicyDocument.json
        

    Third, use the AWS CLI command to attach the AmazonDMSVPCManagementRole policy to the dms-vpc-role.

    aws iam attach-role-policy --role-name dms-vpc-role --policy-arn \
    arn:aws:iam::aws:policy/service-role/AmazonDMSVPCManagementRole
        
  3. Create a replication instance.

    You need to create the replication instance in a subnet group, so you first need to create a replication subnet group. Keep in mind that the subnet group must include at least two subnets in two different Availability Zones, and that the replication instance is launched in one of those Availability Zones.

    ReplicationSubnetGroup rsg = dms.createReplicationSubnetGroup(
            new CreateReplicationSubnetGroupRequest()
                    .withReplicationSubnetGroupDescription("foo description.")
                    .withSubnetIds(subnet1, subnet2)
                    .withReplicationSubnetGroupIdentifier(rsgIdentifier))
            .getReplicationSubnetGroup();
    

    The replication instance must have sufficient storage and processing power to perform the tasks you assign and migrate data from your source database to the target database. The size of this instance varies depending on the amount of data you need to migrate and the tasks you need the instance to perform.

    CreateReplicationInstanceResult replicationInstance = dms.createReplicationInstance(
            new CreateReplicationInstanceRequest()
                    .withReplicationSubnetGroupIdentifier(rsg.getReplicationSubnetGroupIdentifier())
                    .withAllocatedStorage(50)
                    .withReplicationInstanceClass("dms.t2.large")
                    .withReplicationInstanceIdentifier(replicationInstanceId)
                    .withAvailabilityZone(availabilityZone));
    
  4. Specify database endpoints.
    While your replication instance is being created, you can specify the source and target databases. The databases can be on an Amazon Elastic Compute Cloud (Amazon EC2) instance or an Amazon Relational Database Service (Amazon RDS) DB instance. They can also be on-premises databases.

    CreateEndpointResult sourceEndpoint = dms.createEndpoint(
            new CreateEndpointRequest()
                    .withDatabaseName(SOURCE_DATABASE_NAME)
                    .withEndpointIdentifier(SOURCE_ENDPOINT_IDENTIFIER)
                    .withEndpointType(ReplicationEndpointTypeValue.Source)
                    .withEngineName(SOURCE_ENGINE_NAME)
                    .withUsername(SOURCE_USER_NAME)
                    .withPassword(SOURCE_PASSWORD)
                    .withServerName(SOURCE_SERVER_NAME)
                    .withPort(SOURCE_PORT));
    
    CreateEndpointResult targetEndpoint = dms.createEndpoint(
            new CreateEndpointRequest()
                   .withDatabaseName(TARGET_DATABASE_NAME) 
                   .withEndpointIdentifier(TARGET_ENDPOINT_IDENTIFIER) 
                   .withEndpointType(ReplicationEndpointTypeValue.Target) 
                   .withEngineName(TARGET_ENGINE_NAME) 
                   .withUsername(TARGET_USER_NAME) 
                   .withPassword(TARGET_PASSWORD) 
                   .withServerName(TARGET_SERVER_NAME) 
                   .withPort(TARGET_PORT));
    
  5. Create a replication task.
    Create a task to specify which tables to migrate, to map data using a target schema, and to create new tables on the target database. As part of creating a task, you can choose the type of migration: migrate existing data (full load), migrate existing data and replicate ongoing changes (full load and cdc), or replicate data changes only (cdc). These are the three enum values for MigrationTypeValue. Using AWS DMS, you can specify a precise mapping of your data between the source and the target database. Before you specify your mapping, make sure you review the documentation section on data type mapping for your source and your target database.

    CreateReplicationTaskResult rtResult = dms.createReplicationTask(
            new CreateReplicationTaskRequest()
                    .withTableMappings(tableMapping)
                    .withMigrationType(MigrationTypeValue.FullLoad)
                    .withReplicationInstanceArn(replicationInstanceArn)
                    .withSourceEndpointArn(sourceEndpointArn)
                    .withTargetEndpointArn(targetEndpointArn)
                    .withTaskIdentifier(taskIdentifier));
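
    The tableMapping argument is a JSON document that selects the schemas and tables to migrate. The following is a minimal sketch of a single selection rule; the schema name is a placeholder, and the "%" wildcard matches all tables:

    String tableMapping =
            "{"
          + "  \"rules\": ["
          + "    {"
          + "      \"rule-type\": \"selection\","
          + "      \"rule-id\": \"1\","
          + "      \"rule-name\": \"1\","
          + "      \"object-locator\": {"
          + "        \"schema-name\": \"myschema\","
          + "        \"table-name\": \"%\""
          + "      },"
          + "      \"rule-action\": \"include\""
          + "    }"
          + "  ]"
          + "}";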
    
  6. Execute the migration and monitor the status.
    To start the task, specify StartReplication for StartReplicationTaskType:

    StartReplicationTaskResult startReplicationTask = dms.startReplicationTask(
            new StartReplicationTaskRequest()
                .withStartReplicationTaskType(StartReplicationTaskTypeValue.StartReplication)
                .withReplicationTaskArn(replicationTaskArn));
    

    You can use the following code to check the migration status.

    ReplicationTaskStats stats = startReplicationTask.getReplicationTaskStats();
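
    If you want to wait for the migration to finish, you can also poll the task status with DescribeReplicationTasks. The following sketch assumes the task ARN from the previous step and uses the "replication-task-arn" filter from the AWS DMS API; the status strings you stop on depend on your migration type:

    ReplicationTask task;
    do {
        // Poll every 30 seconds; assumes the enclosing method declares throws InterruptedException.
        Thread.sleep(30 * 1000);
        task = dms.describeReplicationTasks(new DescribeReplicationTasksRequest()
                .withFilters(new Filter()
                        .withName("replication-task-arn")
                        .withValues(replicationTaskArn)))
                .getReplicationTasks().get(0);
        System.out.println("Replication task status: " + task.getStatus());
    } while (!"stopped".equals(task.getStatus()) && !"failed".equals(task.getStatus()));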

This is how a typical migration task is performed using the AWS SDK for Java. Of course, you can customize the migration, including the replication instance type, replication instance storage, the source and target database type, and so on. For more information about customization, see the AWS DMS documentation.

Using Amazon SQS with Spring Boot and Spring JMS

by Magnus Bjorkman | in Java

By favoring convention over configuration, Spring Boot reduces complexity and helps you start writing applications faster. Spring Boot allows you to bootstrap a framework that abstracts away many of the recurring patterns used in application development. You can leverage the simplicity that comes with this approach when you use Spring Boot and Spring JMS with Amazon SQS. Spring can be used to manage things like polling, acknowledgement, failure recovery, and so on, so you can focus on implementing application functionality.

In this post, we will show you how to implement the messaging of an application that creates thumbnails. In this use case, a client system will send a request message through Amazon SQS that includes an Amazon S3 location for the image. The application will create a thumbnail from the image and then notify downstream systems about the new thumbnail that the application has uploaded to S3.

First we define the queues that we will use for incoming requests and sending out results:

Screenshots of Amazon SQS queue configurations

Here are a few things to note:

  • For thumbnail_requests, we need to make sure the Default Visibility Timeout matches the upper limit of the estimated processing time for the requests. Otherwise, the message might become visible again and retried before we have completed the image processing and acknowledged (that is, deleted) the message.
  • The Amazon SQS Client Libraries for JMS explicitly set the wait time to 20 seconds when polling, so the Receive Message Wait Time setting on these queues will not change polling behavior.
  • Because failures can happen as part of processing the image, we define a Dead Letter Queue (DLQ) for thumbnail_requests: thumbnail_requests_dlq. Messages will be moved to the DLQ after three failed attempts. Although you can implement a separate process that handles the messages that end up on that DLQ, that is beyond the scope of this post. A sketch of creating these queues with the SDK follows this list.
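
Although the queues in this post are created in the console, you can create the same configuration with the AWS SDK for Java. The following is a sketch; the queue names match the ones used above, while the visibility timeout value is a placeholder you should size to your own processing time:

AmazonSQS sqs = new AmazonSQSClient();

// Create the DLQ first so we can reference its ARN in the redrive policy.
String dlqUrl = sqs.createQueue(new CreateQueueRequest("thumbnail_requests_dlq")).getQueueUrl();
String dlqArn = sqs.getQueueAttributes(new GetQueueAttributesRequest(dlqUrl)
        .withAttributeNames("QueueArn")).getAttributes().get("QueueArn");

// Request queue: visibility timeout sized for image processing, three receives before the DLQ.
sqs.createQueue(new CreateQueueRequest("thumbnail_requests")
        .addAttributesEntry("VisibilityTimeout", "300")
        .addAttributesEntry("RedrivePolicy",
                "{\"maxReceiveCount\":\"3\",\"deadLetterTargetArn\":\"" + dlqArn + "\"}"));

// Result queue used for notifications.
sqs.createQueue(new CreateQueueRequest("thumbnail_results"));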

Now that the queues are created, we can build the Spring Boot application. We start by creating a Spring Boot application, either command-line or web application. Then we need to add some dependencies to make this work with Amazon SQS.

The following shows the additional dependencies in Maven:

    <dependency>
        <groupId>org.springframework</groupId>
        <artifactId>spring-jms</artifactId>
    </dependency>

    <dependency>
        <groupId>com.amazonaws</groupId>
        <artifactId>aws-java-sdk</artifactId>
        <version>1.9.6</version>
    </dependency>

    <dependency>
      <groupId>com.amazonaws</groupId>
      <artifactId>amazon-sqs-java-messaging-lib</artifactId>
      <version>1.0.0</version>
      <type>jar</type>
    </dependency>

This will add the Spring JMS implementation, the AWS SDK for Java, and the Amazon SQS Client Libraries for JMS. Next, we will configure Spring Boot with Spring JMS and Amazon SQS by defining a Spring configuration class using the @Configuration annotation:

@Configuration
@EnableJms
public class JmsConfig {

    SQSConnectionFactory connectionFactory =
            SQSConnectionFactory.builder()
                    .withRegion(Region.getRegion(Regions.US_EAST_1))
                    .withAWSCredentialsProvider(new DefaultAWSCredentialsProviderChain())
                    .build();


    @Bean
    public DefaultJmsListenerContainerFactory jmsListenerContainerFactory() {
        DefaultJmsListenerContainerFactory factory =
                new DefaultJmsListenerContainerFactory();
        factory.setConnectionFactory(this.connectionFactory);
        factory.setDestinationResolver(new DynamicDestinationResolver());
        factory.setConcurrency("3-10");
        factory.setSessionAcknowledgeMode(Session.CLIENT_ACKNOWLEDGE);
        return factory;
    }

    @Bean
    public JmsTemplate defaultJmsTemplate() {
        return new JmsTemplate(this.connectionFactory);
    }

}

Spring Boot will find the configuration class and instantiate the beans defined in the configuration. The @EnableJms annotation will make Spring JMS scan for JMS listeners defined in the source code and use them with the beans in the configuration:

  • The Amazon SQS Client Libraries for JMS provide the SQSConnectionFactory class, which implements the ConnectionFactory interface defined by the JMS standard, allowing it to be used with standard JMS interfaces and classes to connect to Amazon SQS.

    • Using DefaultAWSCredentialsProviderChain will give us multiple options for providing credentials, including using the IAM Role of the EC2 instance.
  • The JMS listener factory will be used when we define the thumbnail service to listen to messages.

    • Using the DynamicDestinationResolver will allow us to refer to Amazon SQS queues by their names in later classes.
    • The concurrency value of "3-10" means we will create a minimum of 3 listeners and scale up to a maximum of 10 listeners.
    • Session.CLIENT_ACKNOWLEDGE will make Spring acknowledge (delete) the message after our service method is complete. If the method throws an exception, Spring will recover the message (that is, make it visible).
  • The JMS template will be used for sending messages.

Next we will define the service class that will listen to messages from our request queue.

@Service
public class ThumbnailerService {

    private Logger log = Logger.getLogger(ThumbnailerService.class);

    @Autowired
    private ThumbnailCreatorComponent thumbnailCreator;

    @Autowired
    private NotificationComponent notification;

    @JmsListener(destination = "thumbnail_requests")
    public void createThumbnail(String requestJSON) throws JMSException {
        log.info("Received ");
        try {
            ThumbnailRequest request=ThumbnailRequest.fromJSON(requestJSON);
            String thumbnailUrl=
                     thumbnailCreator.createThumbnail(request.getImageUrl());
            notification.thumbnailComplete(new ThumbnailResult(request,thumbnailUrl));
        } catch (IOException ex) {
            log.error("Encountered error while parsing message.",ex);
            throw new JMSException("Encountered error while parsing message.");
        }
    }

}

The @JmsListener annotation marks the createThumbnail method as the target of a JMS message listener. The method definition will be consumed by the processor for the @EnableJms annotation mentioned earlier. The processor will create a JMS message listener container using the container factory bean we defined earlier and the container will start to poll messages from Amazon SQS. As soon as it has received a message, it will invoke the createThumbnail method with the message content.

Here are some things to note:

  • We define the SQS queue to listen to (in this case thumbnail_requests) in the destination parameter.
  • If we throw an exception, the container will put the message back into the queue. In this example, the message will eventually be moved to the DLQ after three failed attempts.
  • We have autowired two components to process the image (ThumbnailCreatorComponent) and to send notifications (NotificationComponent) when the image has been processed.

The following shows the implementation of the NotificationComponent that is used to send an SQS message:

@Component
public class NotificationComponent {

    @Autowired
    protected JmsTemplate defaultJmsTemplate;

    public void thumbnailComplete(ThumbnailResult result) throws IOException {
        defaultJmsTemplate.convertAndSend("thumbnail_results", 
                                          result.toJSON());
    }

}

The component uses the JMS template defined in the configuration to send SQS messages. The convertAndSend method takes the name of the SQS queue that we want to send the message to.

We use JSON as the format for the body of our messages, and we can easily convert it to and from our value objects. Here is an example with the class for the thumbnail request:

public class ThumbnailRequest {

    String objectId;
    String imageUrl;

    public String getObjectId() {
        return objectId;
    }

    public void setObjectId(String objectId) {
        this.objectId = objectId;
    }

    public String getImageUrl() {
        return imageUrl;
    }

    public void setImageUrl(String imageUrl) {
        this.imageUrl = imageUrl;
    }

    public static ThumbnailRequest fromJSON(String json) 
                               throws JsonProcessingException, IOException {
        ObjectMapper objectMapper=new ObjectMapper();
        return objectMapper.readValue(json, ThumbnailRequest.class);
    }
}
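
The ThumbnailResult value object referenced in the service and notification components is not shown in the original post; a minimal sketch that is consistent with how it is used (a constructor taking the request and the thumbnail URL, plus a toJSON method) might look like this:

public class ThumbnailResult {

    String objectId;
    String imageUrl;
    String thumbnailUrl;

    public ThumbnailResult(ThumbnailRequest request, String thumbnailUrl) {
        this.objectId = request.getObjectId();
        this.imageUrl = request.getImageUrl();
        this.thumbnailUrl = thumbnailUrl;
    }

    public String getObjectId() {
        return objectId;
    }

    public String getImageUrl() {
        return imageUrl;
    }

    public String getThumbnailUrl() {
        return thumbnailUrl;
    }

    public String toJSON() throws IOException {
        // Jackson uses the getters above to serialize the three fields.
        return new ObjectMapper().writeValueAsString(this);
    }
}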

We will skip implementing the component for scaling the image. It’s not important for this demonstration.

Now we need to start the application and begin sending and receiving messages. When everything is up and running, we can test the application by submitting a message like this to the thumbnail_requests queue:

{
    "objectId":"12345678abcdefg",
    "imageUrl":"s3://mybucket/images/image1.jpg"
}
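
You can send this test message from the SQS console as described below, or programmatically with the JmsTemplate bean defined earlier (a sketch; the payload matches the JSON above):

defaultJmsTemplate.convertAndSend("thumbnail_requests",
        "{\"objectId\":\"12345678abcdefg\"," +
        "\"imageUrl\":\"s3://mybucket/images/image1.jpg\"}");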

To send a message, go to the SQS console and select the thumbnail_requests queue. Choose Queue Actions and then choose Send a Message. You should see something like this in thumbnail_results:

{
    "objectId":"12345678abcdefg",
    "imageUrl":"s3://mybucket/images/image1.jpg",
    "thumbnailUrl":"s3://mybucket/thumbnails/image1_thumbnail.jpg"
}

To see the message, go to the SQS console and select the thumbnail_results queue. Choose Queue Actions and then choose View/Delete Messages.

Parallelizing Large Uploads for Speed and Reliability

by Magnus Bjorkman | in Java

As Big Data grows in popularity, it becomes more important to move large data sets to and from Amazon S3. You can improve the speed of uploads by parallelizing them. You can break an individual file into multiple parts and upload those parts in parallel by setting the following in the AWS SDK for Java:

TransferManager tx = new TransferManager(
        new AmazonS3Client(new DefaultAWSCredentialsProviderChain()),
        Executors.newFixedThreadPool(5));

TransferManagerConfiguration config = new TransferManagerConfiguration();
config.setMultipartUploadThreshold(5 * 1024 * 1024);
config.setMinimumUploadPartSize(5 * 1024 * 1024);
tx.setConfiguration(config);

There are a few things to note:

  • When we create the TransferManager, we give it an execution pool of 5 threads. By default, the TransferManager creates a pool of 10 threads, but you can set this to scale the pool size up or down.
  • The TransferManagerConfiguration allows us to set the limits used by the AWS SDK for Java to break large files into smaller parts.
  • MultipartUploadThreshold defines the size at which the AWS SDK for Java should start breaking apart the files (in this case, 5 MiB).
  • MinimumUploadPartSize defines the minimum size of each part. It must be at least 5 MiB; otherwise, you will get an error when you try to upload it.

In addition to improved upload speeds, an advantage to doing this is that your uploads will become more reliable, because if you have a failure, it will occur on a small part of the upload, rather than the entire upload. The retry logic built into the AWS SDK for Java will try to upload the part again. You can control the retry policy when you create the S3 client.

ClientConfiguration clientConfiguration = new ClientConfiguration();
clientConfiguration.setRetryPolicy(
        PredefinedRetryPolicies.getDefaultRetryPolicyWithCustomMaxRetries(5));
TransferManager tx = new TransferManager(
        new AmazonS3Client(new DefaultAWSCredentialsProviderChain(), clientConfiguration),
        Executors.newFixedThreadPool(5));

Here we change the default setting of 3 retry attempts to 5. You can implement your own back-off strategies and define your own retry-able failures.
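
For example, the pieces of the default policy can be recombined explicitly, which gives you a convenient place to plug in your own retry condition or backoff strategy (a sketch; the values are illustrative):

ClientConfiguration clientConfiguration = new ClientConfiguration();
clientConfiguration.setRetryPolicy(new RetryPolicy(
        PredefinedRetryPolicies.DEFAULT_RETRY_CONDITION,   // decides which failures are retryable
        PredefinedRetryPolicies.DEFAULT_BACKOFF_STRATEGY,  // swap in your own BackoffStrategy here if needed
        5,                                                 // maximum number of retry attempts
        true));                                            // honor maxErrorRetry set on ClientConfiguration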

Although this is useful for a single file, especially a large one, people often have large collections of files. If those files are close in size to the multipart threshold, you need to submit multiple files to the TransferManager at the same time to get the benefits of parallelization. This requires a little more effort but is straightforward. First, we define a list of uploads.

List<PutObjectRequest> objectList = new ArrayList<PutObjectRequest>();
objectList.add(new PutObjectRequest("mybucket", "folder/myfile1.dat",
        new File("/localfolder/myfile1.dat")));
objectList.add(new PutObjectRequest("mybucket", "folder/myfile2.dat",
        new File("/localfolder/myfile2.dat")));

Then we can submit the files for upload:

CountDownLatch doneSignal = new CountDownLatch(objectList.size());
ArrayList<Upload> uploads = new ArrayList<Upload>();
for (PutObjectRequest object : objectList) {
    object.setGeneralProgressListener(new UploadCompleteListener(object.getFile(),
            object.getBucketName() + "/" + object.getKey(), doneSignal));
    uploads.add(tx.upload(object));
}
try {
    doneSignal.await();
} catch (InterruptedException e) {
    throw new UploadException("Couldn't wait for all uploads to be finished");
}

The upload command is simple: just call the upload method on the TransferManager. That method is not blocking, so it will just schedule the upload and immediately return. To track progress and figure out when the uploads are complete:

  • We use a CountDownLatch, initializing it to the number of files to upload.
  • We register a general progress listener with each PutObjectRequest, so we can capture major events, including completion and failures that will count down the CountDownLatch.
  • After we have submitted all of the uploads, we use the CountDownLatch to wait for the uploads to complete.

The following is a simple implementation of the progress listener that allows us to track the uploads. It also contains some print statements to allow us to see what is happening when we test this:

class UploadCompleteListener implements ProgressListener {

    private static Log log = LogFactory.getLog(UploadCompleteListener.class);

    CountDownLatch doneSignal;
    File f;
    String target;

    public UploadCompleteListener(File f, String target,
                                  CountDownLatch doneSignal) {
        this.f = f;
        this.target = target;
        this.doneSignal = doneSignal;
    }

    public void progressChanged(ProgressEvent progressEvent) {
        if (progressEvent.getEventType()
                == ProgressEventType.TRANSFER_STARTED_EVENT) {
            log.info("Started to upload: " + f.getAbsolutePath()
                    + " -> " + this.target);
        }
        if (progressEvent.getEventType()
                == ProgressEventType.TRANSFER_COMPLETED_EVENT) {
            log.info("Completed upload: " + f.getAbsolutePath()
                    + " -> " + this.target);
            doneSignal.countDown();
        }
        if (progressEvent.getEventType()
                == ProgressEventType.TRANSFER_FAILED_EVENT) {
            log.info("Failed upload: " + f.getAbsolutePath()
                    + " -> " + this.target);
            doneSignal.countDown();
        }
    }
}

After you have finished, don’t forget to shut down the transfer manager.

tx.shutdownNow();
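
Note that shutdownNow() also releases the underlying Amazon S3 client by default. If you want to keep using that client for other work, the overload that takes a boolean lets you leave it running:

tx.shutdownNow(false); // shut down the TransferManager but keep the wrapped Amazon S3 client alive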

Another great option for moving very large data sets is the AWS Import/Export Snowball service, a petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data into and out of the AWS cloud.

Deploying Java Applications on Elastic Beanstalk from Maven

by Zhaoxi Zhang | in Java

The Beanstalker open source project now supports Java SE application development and deployment directly to AWS Elastic Beanstalk using the Maven archetype elasticbeanstalk-javase-archetype. With just a few commands in a terminal, you can create and deploy a Java SE application. This blog post provides step-by-step instructions for using this archetype.

First, in the terminal, type the mvn archetype:generate command. Use elasticbeanstalk as the filter, choose elasticbeanstalk-javase-archetype as the target archetype, and set the required properties when prompted. The following listing shows the execution of this command.

$ mvn archetype:generate -Dfilter=elasticbeanstalk
[INFO] Scanning for projects...
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building Maven Stub Project (No POM) 1
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] >>> maven-archetype-plugin:2.3:generate (default-cli) >
 generate-sources @ standalone-pom >>>
[INFO] 
[INFO] <<< maven-archetype-plugin:2.3:generate (default-cli) <
 generate-sources @ standalone-pom <<<
[INFO] 
[INFO] --- maven-archetype-plugin:2.3:generate (default-cli) @
 standalone-pom ---
[INFO] Generating project in Interactive mode
[INFO] No archetype defined. Using maven-archetype-quickstart
 (org.apache.maven.archetypes:maven-archetype-quickstart:1.0)
Choose archetype:
1: remote -> 
br.com.ingenieux:elasticbeanstalk-docker-dropwizard-webapp-archetype
 (A Maven Archetype for Publishing Dropwizard-based Services on AWS'
 Elastic Beanstalk Service)
2: remote -> 
br.com.ingenieux:elasticbeanstalk-javase-archetype
 (A Maven Archetype Encompassing Jetty for Publishing Java SE
 Services on AWS' Elastic Beanstalk Service)
3: remote -> 
br.com.ingenieux:elasticbeanstalk-service-webapp-archetype
 (A Maven Archetype Encompassing RestAssured, Jetty, Jackson, Guice
 and Jersey for Publishing JAX-RS-based Services on AWS' Elastic
 Beanstalk Service)
4: remote -> 
br.com.ingenieux:elasticbeanstalk-wrapper-webapp-archetype
 (A Maven Archetype Wrapping Existing war files on AWS' Elastic
 Beanstalk Service)
Choose a number or apply filter
 (format: [groupId:]artifactId, case sensitive contains): : 2
Choose br.com.ingenieux:elasticbeanstalk-javase-archetype version: 
1: 1.4.3-SNAPSHOT
2: 1.4.3-foralula
Choose a number: 2: 1
Define value for property 'groupId': : org.demo.foo
Define value for property 'artifactId': : jettyjavase
Define value for property 'version':  1.0-SNAPSHOT: : 
Define value for property 'package':  org.demo.foo: : 
Confirm properties configuration:
groupId: org.demo.foo
artifactId: jettyjavase
version: 1.0-SNAPSHOT
package: org.demo.foo
 Y: : 
[INFO] ------------------------------------------------------------------------
[INFO] Using following parameters for creating project from Archetype:
 elasticbeanstalk-javase-archetype:1.4.3-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] Parameter: groupId, Value: org.demo.foo
[INFO] Parameter: artifactId, Value: jettyjavase
[INFO] Parameter: version, Value: 1.0-SNAPSHOT
[INFO] Parameter: package, Value: org.demo.foo
[INFO] Parameter: packageInPathFormat, Value: org/demo/foo
[INFO] Parameter: package, Value: org.demo.foo
[INFO] Parameter: version, Value: 1.0-SNAPSHOT
[INFO] Parameter: groupId, Value: org.demo.foo
[INFO] Parameter: artifactId, Value: jettyjavase
[INFO] project created from Archetype in dir: /current/directory/jettyjavase
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 24:07 min
[INFO] Finished at: 2016-02-19T09:53:54+08:00
[INFO] Final Memory: 14M/211M
[INFO] ------------------------------------------------------------------------

A new folder, jettyjavase, will be created under your current working directory. If you go to this folder and use the tree command to explore the structure, you will see the following:

$ tree
.
├── Procfile
├── pom.xml
└── src
    └── main
        ├── assembly
        │   └── zip.xml
        ├── java
        │   └── org
        │       └── demo
        │           └── foo
        │               └── Application.java
        └── resources
            └── index.html

8 directories, 5 files

The archetype uses java-se-jetty-maven, the sample project provisioned by AWS Elastic Beanstalk, as the template. This sample project generates a single executable jar file. Instructions that tell the server how to run the jar file are included in the Procfile.

As described in the AWS Elastic Beanstalk documentation, there are basically three ways to deploy a Java SE application to AWS Elastic Beanstalk:

  • deploying a single executable jar.
  • deploying a single zip source bundle file that contains multiple jars and a Procfile.
  • deploying a single zip source bundle file that contains the source code, as-is, and a Buildfile.

This archetype supports the second option only. Before you deploy this sample project to Elastic Beanstalk, you might want to check that the default configurations are what you want. You’ll find them in the <properties> section of the pom.xml file.

  • beanstalk.s3Bucket and beanstalk.s3Key properties. The default Amazon S3 bucket used to store the zip source bundle file is configured as ${project.groupId} (in this case, org.demo.foo). If you don’t have this bucket in your S3 account, use the AWS CLI to create one, as shown here. You can also change the default setting of beanstalk.s3Bucket to your preferred bucket. You must have permission to upload to the specified bucket.

    $ aws s3 mb s3://org.demo.foo
    make_bucket: s3://org.demo.foo/
    
  • mainApplication property. This is the application entry point used to generate the executable jar file.
  • beanstalk.mainJar property. This is the jar file built from your project and referenced on the first line of the Procfile.
  • beanstalk.sourceBundle property. This is the final zip source bundle file. This property should not be changed because it is derived from the maven-assembly-plugin file-naming logic.
  • beanstalk.artifactFile property. This is the target artifact file to be uploaded to the S3 bucket.
  • beanstalk.solutionStack property. This is the solution stack used by the Java SE platform. You can change the default value, but make sure it is one of the supported solution stacks.
  • beanstalk.environmentName property. This is the name of the environment to which your application will be deployed. The default value is the artifact ID.
  • beanstalk.cnamePrefix property. The default value for the CNAME prefix is the artifact ID. Reconfigure it if the default value is already in use.

This archetype uses the maven-assembly-plugin to create the uber jar file and the zip source bundle file. Go to src/main/assembly/zip.xml and make changes, as necessary, for files you want to include or exclude from the zip source bundle file. Update the Procfile if you want to run multiple applications on the server. Make sure the first line in the file starts with "web: ".
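
For reference, a minimal Procfile for this sample might contain a single line like the following (the jar name is derived from the artifact ID and version used in this walkthrough):

web: java -jar jettyjavase-1.0-SNAPSHOT.jar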

After you have finished the configuration, you are ready to deploy the sample project to AWS Elastic Beanstalk. A single command, mvn -Ps3-deploy package deploy, will do the rest of the work. If the sample project is deployed successfully, you will be able to access the http://jettyjavase.us-east-1.elasticbeanstalk.com/ endpoint and should see the Congratulations page. The following are excerpts from the command output.

$ mvn -Ps3-deploy package deploy
[INFO] Scanning for projects...
...
[INFO] --- maven-assembly-plugin:2.2-beta-5:single (package-jar) @
 jettyjavase ---
...
[WARNING] Replacing pre-existing project main-artifact file:
 /current/directory/jettyjavase/target/jettyjavase-1.0-SNAPSHOT.jar
with assembly file:
 /current/directory/jettyjavase/target/jettyjavase-1.0-SNAPSHOT.jar
...
[INFO] --- maven-assembly-plugin:2.2-beta-5:single (package-zip) @
 jettyjavase ---
[INFO] Reading assembly descriptor: src/main/assembly/zip.xml
[INFO] Building zip:
 /current/directory/jettyjavase/target/jettyjavase-1.0-SNAPSHOT.zip
...
[INFO] --- beanstalk-maven-plugin:1.4.2:upload-source-bundle (deploy) @
 jettyjavase ---
[INFO] Target Path:
 s3://org.demo.foo/jettyjavase-1.0-SNAPSHOT-20160219040404.zip
[INFO] Uploading artifact file:
 /current/directory/jettyjavase/target/jettyjavase-1.0-SNAPSHOT.zip
  100.00% 945 KiB/945 KiB                        Done
[INFO] Artifact Uploaded
[INFO] SUCCESS
[INFO] null/void result
[INFO] 
[INFO] --- beanstalk-maven-plugin:1.4.2:create-application-version (deploy) @
 jettyjavase ---
[INFO] SUCCESS
[INFO] {
[INFO]   "applicationName" : "jettyjavase",
[INFO]   "description" : "Update from beanstalk-maven-plugin",
[INFO]   "versionLabel" : "20160219040404",
[INFO]   "sourceBundle" : {
[INFO]     "s3Bucket" : "org.demo.foo",
[INFO]     "s3Key" : "jettyjavase-1.0-SNAPSHOT-20160219040404.zip"
[INFO]   },
[INFO]   "dateCreated" : 1455854658750,
[INFO]   "dateUpdated" : 1455854658750
[INFO] }
[INFO] 
[INFO] --- beanstalk-maven-plugin:1.4.2:put-environment (deploy) @
 jettyjavase ---
[INFO] ... with cname set to 'jettyjavase.elasticbeanstalk.com'
[INFO] ... with status *NOT* set to 'Terminated'
[INFO] Environment Lookup
[INFO] ... with environmentId equal to 'e-pa2mn9iqkw'
[INFO] ... with status   set to 'Ready'
[INFO] ... with health equal to 'Green'
...
[INFO] SUCCESS
[INFO] {
[INFO]   "environmentName" : "jettyjavase",
[INFO]   "environmentId" : "e-pa2mn9iqkw",
[INFO]   "applicationName" : "jettyjavase",
[INFO]   "versionLabel" : "20160219040404",
[INFO]   "solutionStackName" : "64bit Amazon Linux 2015.09 v2.0.4 running Java 7",
[INFO]   "description" : "Java Sample Jetty App",
[INFO]   "dateCreated" : 1455854662359,
[INFO]   "dateUpdated" : 1455854662359,
[INFO]   "status" : "Launching",
[INFO]   "health" : "Grey",
[INFO]   "tier" : {
[INFO]     "name" : "WebServer",
[INFO]     "type" : "Standard",
[INFO]     "version" : " "
[INFO]   },
[INFO]   "cname" : "jettyjavase.us-east-1.elasticbeanstalk.com"
[INFO] }
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 05:14 min
[INFO] Finished at: 2016-02-19T12:09:19+08:00
[INFO] Final Memory: 26M/229M
[INFO] ------------------------------------------------------------------------

As you can see from the output and the pom.xml file, the s3-deploy profile executes three mojos in sequence: upload-source-bundle, create-application-version, and put-environment.

We welcome your feedback on the use of this Maven archetype. We want to add more features and continuously improve the user experience.

Event-driven architecture using Scala, Docker, Amazon Kinesis Firehose, and the AWS SDK for Java (Part 2)

by Sascha Moellering | in Java

In the first part of this blog post, we used the AWS SDK for Java to create a Scala application that writes data into Amazon Kinesis Firehose, Dockerized the application, and then tested and verified that the application works. Now we will roll out our Scala application on Amazon EC2 Container Service (Amazon ECS) and use Amazon EC2 Container Registry (Amazon ECR) as our private Docker registry.

To roll out our application on Amazon ECS, we have to set up a private Docker registry and an Amazon ECS cluster. First, we have to create IAM roles for Amazon ECS. Before we can launch container instances and register them into a cluster, we must create an IAM role for those container instances to use when they are launched. This requirement applies to container instances launched with the Amazon ECS-optimized Amazon Machine Image (AMI) or with any other AMI where you run the Amazon ECS container agent.

aws iam create-role --role-name ecsInstanceRole --assume-role-policy-document file://<path_to_json_file>/ecsInstanceRole.json

ecsInstanceRole.json:

{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

 

aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role --role-name ecsInstanceRole

We have to attach an additional policy so the ecsInstanceRole can pull Docker images from Amazon ECR:

aws iam put-role-policy --role-name ecsInstanceRole --policy-name ecrPullPolicy --policy-document file://<path_to_json_file>/ecrPullPolicy.json

ecrPullPolicy.json:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr:BatchGetImage",
                "ecr:GetDownloadUrlForLayer",
                "ecr:GetAuthorizationToken"
            ],
            "Resource": "*"
        }
    ]
}

This role needs permission to write into our Amazon Kinesis Firehose stream, too:

aws iam put-role-policy --role-name ecsInstanceRole --policy-name firehosePolicy --policy-document file://<path_to_json_file>/firehosePolicy.json

firehosePolicy.json:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Action": [
                "firehose:DescribeDeliveryStream",
                "firehose:ListDeliveryStreams",
                "firehose:PutRecord",
                "firehose:PutRecordBatch"
            ],
            "Resource": [
                "arn:aws:firehose:aws-region:<account-ID>:deliverystream/<delivery-stream-name>"
            ]
        }
    ]
}

The Amazon ECS service scheduler makes calls on our behalf to the Amazon EC2 and Elastic Load Balancing APIs to register and deregister container instances with our load balancers. Before we can attach a load balancer to an Amazon ECS service, we must create an IAM role for our services to use. This requirement applies to any Amazon ECS service that we plan to use with a load balancer.

aws iam create-role --role-name ecsServiceRole --assume-role-policy-document file://<path_to_json_file>/ecsServiceRole.json

ecsServiceRole.json:

{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ecs.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

 

aws iam put-role-policy --role-name ecsServiceRole --policy-name ecsServicePolicy --policy-document file://<path_to_json_file>/ecsServicePolicy.json

ecsServicePolicy.json:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticloadbalancing:Describe*",
        "elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
        "elasticloadbalancing:RegisterInstancesWithLoadBalancer",
        "ec2:Describe*",
        "ec2:AuthorizeSecurityGroupIngress"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}

Now we have set up the IAM roles and permissions required for a fully functional Amazon ECS cluster. Before setting up the cluster, we will create an ELB load balancer to be used for our akka-firehose service. The load balancer is called “akkaElb.” It maps port 80 to port 80 and uses the specified subnets and security groups.

aws elb create-load-balancer --load-balancer-name akkaElb --listeners "Protocol=HTTP,LoadBalancerPort=80,InstanceProtocol=HTTP,InstancePort=80" --subnets subnet-a,subnet-b --security-groups sg-a --region us-east-1

The health check configuration of the load balancer contains information such as the protocol, ping port, ping path, response timeout, and health check interval.

aws elb configure-health-check --load-balancer-name akkaElb --health-check Target=HTTP:80/api/healthcheck,Interval=30,UnhealthyThreshold=5,HealthyThreshold=2,Timeout=3 --region us-east-1

We should enable connection draining for our load balancer to ensure it will stop sending requests to deregistering or unhealthy instances, while keeping existing connections open.

aws elb modify-load-balancer-attributes --load-balancer-name akkaElb --load-balancer-attributes '{"ConnectionDraining":{"Enabled":true,"Timeout":300}}' --region us-east-1

After setting up the load balancer, we can now create the Amazon ECS cluster:

aws ecs create-cluster --cluster-name "AkkaCluster" --region us-east-1

To set up our Amazon ECR repository correctly, we will sign in to Amazon ECR and receive a token that is stored in /home/ec2-user/.docker/config.json. The token is valid for 12 hours.

aws ecr get-login --region us-east-1

To store the Docker image, we will create a repository in Amazon ECR:

aws ecr create-repository --repository-name akka-firehose --region us-east-1

Under most circumstances, the Docker image will be created at the end of a build process triggered by a continuous integration server like Jenkins. Therefore, we have to create a repository policy so that the Jenkins IAM role can push and pull Docker images from our newly created repository:

aws ecr set-repository-policy --repository-name akka-firehose --region us-east-1 --policy-text '{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Sid": "jenkins_push_pull",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<account-id>:role/<Jenkins-role>"
            },
            "Action": [
                "ecr:DescribeRepositories",
                "ecr:GetRepositoryPolicy",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:ListImages",
                "ecr:BatchGetImage",
                "ecr:PutImage",
                "ecr:InitiateLayerUpload",
                "ecr:UploadLayerPart",
                "ecr:CompleteLayerUpload"
            ]
        }
    ]
}"

We have to add a similar repository policy for Amazon ECS because the Amazon EC2 instances in our Amazon ECS cluster have to be able to pull Docker images from our private Docker registry:

aws ecr set-repository-policy --repository-name akka-firehose --region us-east-1 --policy-text '{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Sid": "ecs_instance_pull",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<account-id>:role/ecsInstanceRole"
            },
            "Action": [
                "ecr:DescribeRepositories",
                "ecr:GetRepositoryPolicy",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:ListImages",
                "ecr:BatchGetImage"
            ]
        }
    ]
}"

Now we can tag and push the Docker container into our repository in Amazon ECR:

docker tag akka-firehose <account-id>.dkr.ecr.us-east-1.amazonaws.com/akka-firehose


docker push <account-id>.dkr.ecr.us-east-1.amazonaws.com/akka-firehose

To populate our Amazon ECS cluster, we have to launch a few Amazon EC2 instances and register them in our cluster. It is important to choose either the Amazon ECS-optimized AMI or an AMI for another operating system (such as CoreOS or Ubuntu) with the Amazon ECS container agent installed. (In this example, we will use the ECS-optimized AMI of Amazon Linux.) In the first step, we will create an instance profile and then attach the ecsInstanceRole to this profile:

aws iam create-instance-profile --instance-profile-name ecsServer


aws iam add-role-to-instance-profile --role-name ecsInstanceRole --instance-profile-name ecsServer

Now we will use the following user data script to launch a few EC2 instances in different subnets:

ecs-userdata.txt:

#!/bin/bash
yum update -y
echo ECS_CLUSTER=AkkaCluster >> /etc/ecs/ecs.config

This user data script updates the Linux packages of the Amazon EC2 instance and registers it in the Amazon ECS cluster. By default, the container instance is launched into your default cluster if you don’t specify another one.

aws ec2 run-instances --image-id ami-840e42ee --count 1 --instance-type t2.medium --key-name <your_ssh_key> --security-group-ids sg-a --subnet-id subnet-a --iam-instance-profile Name=ecsServer --user-data file://<path_to_user_data_file>/ecs-userdata.txt --region us-east-1


aws ec2 run-instances --image-id ami-840e42ee --count 1 --instance-type t2.medium --key-name <your_ssh_key> --security-group-ids sg-a --subnet-id subnet-b --iam-instance-profile Name=ecsServer --user-data file://<path_to_user_data_file>/ecs-userdata.txt --region us-east-1

Now we will register our task definition and service:

aws ecs register-task-definition --cli-input-json file://<path_to_json_file>/akka-firehose.json --region us-east-1

akka-firehose.json:

{
  "containerDefinitions": [
    {
      "name": "akka-firehose",
      "image": "<your_account_id>.dkr.ecr.us-east-1.amazonaws.com/akka-firehose",
      "cpu": 1024,
      "memory": 1024,
      "portMappings": [{
                      "containerPort": 8080,
                      "hostPort": 80
              }],
      "essential": true
    }
  ],
  "family": "akka-demo"
}

The task definition specifies which image you want to use, how many resources (CPU and RAM) are required, and the port mappings between the Docker container and the host.

aws ecs create-service --cluster AkkaCluster --service-name akka-firehose-service --cli-input-json file://<path_to_json_file>/akka-elb.json --region us-east-1

akka-elb.json:

{
    "serviceName": "akka-firehose-service",
    "taskDefinition": "akka-demo:1",
    "loadBalancers": [
        {
            "loadBalancerName": "akkaElb",
            "containerName": "akka-firehose",
            "containerPort": 8080
        }
    ],
    "desiredCount": 2,
    "role": "ecsServiceRole"
}

Our service uses version 1 of the task definition and connects the containers on port 8080 to our previously defined ELB load balancer. The configuration sets the desired count to two, so if we have registered two Amazon EC2 instances in our Amazon ECS cluster, each of them should run one task. After a short amount of time, the service should be running successfully on the cluster. We can test the current setup by sending a POST request to our ELB load balancer:

curl -v -H "Content-type: application/json" -X POST -d '{"userId":100, "userName": "This is user data"}' http://<address_of_elb>.us-east-1.elb.amazonaws.com/api/user

After sending data to our application, we can list the files in the S3 bucket we created as a target for Amazon Kinesis Firehose:

aws s3 ls s3://<your_name>-firehose-target --recursive

In this blog post we created the infrastructure to roll out our Scala-based microservice in Amazon ECS and Amazon ECR. We hope we’ve given you ideas for creating your own Dockerized Scala-based applications in AWS. Feel free to share your ideas in the comments below! 

Event-driven architecture using Scala, Docker, Amazon Kinesis Firehose, and the AWS SDK for Java (Part 1)

by Sascha Moellering | in Java

The key to developing a highly scalable architecture is to decouple functional parts of an application. In the context of an event-driven architecture, those functional parts are single-purpose event processing components (“microservices”). In this blog post, we will show you how to build a microservice using Scala, Akka, Scalatra, the AWS SDK for Java, and Docker. The application uses the AWS SDK for Java to write data into Amazon Kinesis Firehose, which can capture and automatically load streaming data into Amazon S3 and Amazon Redshift. Amazon S3 will be the target of the data we put into the Firehose delivery stream.

In a two-part series, this blog post will cover the following topics:

Part 1: How to use the AWS SDK for Java to get started with Scala development, how to set up Amazon Kinesis Firehose, and how to test your application locally.

Part 2: How to use Amazon EC2 Container Service (Amazon ECS) and Amazon EC2 Container Registry (Amazon ECR) to roll out your Dockerized Scala application.

After you have downloaded your IDE, set up your AWS account, created an IAM user, and installed the AWS CLI, you can check out the example application from https://github.com/awslabs/aws-akka-firehose.

Accessing Java classes from Scala is no problem, but Scala has language features (for example, function types and traits) that can’t be applied to Java directly. The core of this application is an Akka actor that writes JSON data into Amazon Kinesis Firehose. Akka implements the actor model, a model of concurrent programming: actors receive messages and take actions based on those messages. With Akka, it is easy to build a distributed system using remote actors. In this example, the FirehoseActor receives a message from a REST interface that is written with Scalatra, a small, efficient, Sinatra-like web framework for Scala. Scalatra implements the servlet specification, so Scalatra apps can be deployed in Tomcat, Jetty, or other servlet engines, or in Java EE application servers. To reduce dependencies and complexity, the application uses an embedded Jetty servlet engine that is bundled with the application. To bundle Jetty with our application, we have to add Jetty as a dependency in build.scala:

"org.eclipse.jetty" % "jetty-webapp" % "9.2.10.v20150310" % "container;compile",

In this example, we use sbt and the sbt-assembly plugin, which was inspired by Maven’s assembly plugin, to build a fat JAR containing all dependencies. We have to add sbt-assembly as a dependency in project/assembly.sbt and specify the main class in build.scala:

.settings(mainClass in assembly := Some("JettyLauncher"))

In this case, the main class is called JettyLauncher. It is responsible for bootstrapping the embedded Jetty servlet engine.

def main(args: Array[String]) {
    val port = if (System.getenv("PORT") != null) System.getenv("PORT").toInt else 8080

    val server = new Server()
    val connector = new ServerConnector(server)
    connector.setHost("0.0.0.0");
    connector.setPort(port);
    server.addConnector(connector);

    val context = new WebAppContext()
    context setContextPath "/"
    context.setResourceBase("src/main/webapp")
    context.addEventListener(new ScalatraListener)
    context.addServlet(classOf[DefaultServlet], "/")
    server.setHandler(context)
    server.start
    server.join
  }

The ScalatraBootstrap file initializes the actor-system and mounts the FirehoseActorApp servlet under the context /api/:

class ScalatraBootstrap extends LifeCycle {
  val system = ActorSystem()
  val myActor = system.actorOf(Props[FireHoseActor])

  override def init(context: ServletContext) {
    context.mount(new FirehoseActorApp(system, myActor), "/api/*")
  }
}

The servlet exposes a REST API that accepts POST requests to /user with the parameters userId, userName, and timestamp. This API maps the passed values into a UserMessage object and sends this object as a message to the FireHoseActor.

class FirehoseActorApp(system: ActorSystem, firehoseActor: ActorRef) extends ScalatraServlet with JacksonJsonSupport {
  protected implicit lazy val jsonFormats: Formats = DefaultFormats
  implicit val timeout = new Timeout(2, TimeUnit.SECONDS)
  protected implicit def executor: ExecutionContext = system.dispatcher

  post("/user") {
    val userMessage = parsedBody.extract[UserMessage]
    firehoseActor ! userMessage
    Ok()
  }
}

This FirehoseActor uses the AWS SDK for Java to create an Amazon Kinesis Firehose client and send received messages asynchronously to a Firehose stream:

def createFireHoseClient(): AmazonKinesisFirehoseAsyncClient = {
    log.debug("Connect to Firehose Stream: " + streamName)
    val client = new AmazonKinesisFirehoseAsyncClient
    val currentRegion = if (Regions.getCurrentRegion != null) Regions.getCurrentRegion else Region.getRegion(Regions.EU_WEST_1)
    client.withRegion(currentRegion)
    return client
  }


def sendMessageToFirehose(payload: ByteBuffer, partitionKey: String): Unit = {
   val putRecordRequest: PutRecordRequest = new PutRecordRequest
   putRecordRequest.setDeliveryStreamName(streamName)
   val record: Record = new Record
   record.setData(payload)
   putRecordRequest.setRecord(record)

   val futureResult: Future[PutRecordResult] = firehoseClient.putRecordAsync(putRecordRequest)

   try {
     val recordResult: PutRecordResult = futureResult.get
     log.debug("Sent message to Kinesis Firehose: " + recordResult.toString)
   }

   catch {
     case iexc: InterruptedException => {
       log.error(iexc.getMessage)
     }

     case eexc: ExecutionException => {
       log.error(eexc.getMessage)
     }
   }
 }

Using sbt to build the application is easy: the command sbt assembly compiles and builds a fat JAR containing all required libraries.

Now we should focus on setting up the infrastructure used by the application. First, we create the S3 bucket:

aws s3 mb --region us-east-1 s3://<your_name>-firehose-target --output json

Second, we create an IAM role that Amazon Kinesis Firehose can assume:

aws iam create-role --query "Role.Arn" --output json \
    --role-name FirehoseDefaultDeliveryRole \
    --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PermitFirehoseAccess",
            "Effect": "Allow",
            "Principal": {
                "Service": "firehose.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}"

Now we need to attach an IAM policy to this role so that Firehose can access the S3 bucket:

aws iam put-role-policy \
    --role-name FirehoseDefaultDeliveryRole \
    --policy-name FirehoseDefaultDeliveryPolicy \
    --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PermitFirehoseUsage",
            "Effect": "Allow",
            "Action": [
                "s3:AbortMultipartUpload",
                "s3:GetBucketLocation",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::<your_name>-firehose-target",
                "arn:aws:s3:::<your_name>-firehose-target/*"
            ]
        }
    ]
}"

The last step is to create the Firehose stream:

aws firehose create-delivery-stream --region eu-west-1 --query "DeliveryStreamARN" --output json \
    --delivery-stream-name firehose_stream \
    --s3-destination-configuration "RoleARN=<role_arn>,BucketARN=arn:aws:s3:::<your_name>-firehose-target"

To roll out the application in Amazon ECS, we need to build a Docker image containing the fat JAR and a JRE:

FROM phusion/baseimage

# Install Java.
RUN \
  echo oracle-java8-installer shared/accepted-oracle-license-v1-1 select true | debconf-set-selections && \
  add-apt-repository -y ppa:webupd8team/java && \
  apt-get update && \
  apt-get install -y oracle-java8-installer && \
  rm -rf /var/lib/apt/lists/* && \
  rm -rf /var/cache/oracle-jdk8-installer

WORKDIR /srv/jetty

# Define commonly used JAVA_HOME variable
ENV JAVA_HOME /usr/lib/jvm/java-8-oracle

ADD target/scala-2.11/akka-firehose-assembly-*.jar srv/jetty/
CMD java -server \
   -XX:+DoEscapeAnalysis \
   -XX:+UseStringDeduplication -XX:+UseCompressedOops \
   -XX:+UseG1GC -jar srv/jetty/akka-firehose-assembly-*.jar

This Dockerfile is based on phusion/baseimage and installs Oracle’s JDK 8. It also sets the JAVA_HOME variable, copies the fat JAR to /srv/jetty, and starts the application using java -jar. Building the Docker image is pretty straightforward:

docker build -t smoell/akka-firehose .

There are two options for testing the application: by using the JAR file directly or by using the Docker container. To start the application by using the JAR file:

java -jar target/scala-2.11/akka-firehose-assembly-0.1.0.jar

With the following curl command, we can post data to our application to send data to our Firehose stream:

curl -v -H "Content-type: application/json" -X POST -d '{"userId":100, "userName": "This is user data"}' http://127.0.0.1:8080/api/user

The test looks a little bit different when we use the Docker container. First, we have to start the container and pass the AWS access key and secret access key as environment variables. (This is not necessary if we run on an EC2 instance, because the AWS SDK for Java uses the instance metadata.)

docker run --dns=8.8.8.8 --env AWS_ACCESS_KEY_ID="<your_access_key>" --env AWS_SECRET_ACCESS_KEY="<your_secret_access_key>" -p 8080:8080 smoell/akka-firehose

The curl command looks a little bit different this time, because we have to replace 127.0.0.1 with the IP address Docker is using on our local machine:

curl -v -H "Content-type: application/json" -X POST -d '{"userId":100, "userName": "This is user data"}' http://<you_docker_ip>:8080/api/user

After sending data to our application, we can list the files in the S3 bucket we’ve created as a target for Amazon Kinesis Firehose:

aws s3 ls s3://<your_name>-firehose-target --recursive

In this blog post, we used the AWS SDK for Java to create a Scala application that writes data into Amazon Kinesis Firehose, Dockerized the application, and then tested and verified that the application works. In the second part of this blog post, we will roll out our application on Amazon ECS by using Amazon ECR as our private Docker registry.

Managing Dependencies in Gradle with AWS SDK for Java – Bill of Materials module (BOM)

by Manikandan Subramanian | in Java

In an earlier blog post, I discussed how a Maven bill of materials (BOM) module can be used to manage your Maven dependencies on the AWS SDK for Java.

In this blog post, I will provide an example of how you can use the Maven BOM in your Gradle projects to manage the dependencies on the SDK. I will use an open source Gradle dependency management plugin from Spring to import a BOM and then use its dependency management.

Here is the build.gradle snippet to apply the dependency management plugin to the project:

buildscript {
    repositories {
        mavenCentral()
    }
    dependencies {
        classpath "io.spring.gradle:dependency-management-plugin:0.5.4.RELEASE"
    }
}

apply plugin: "io.spring.dependency-management"

Now, import the Maven BOM into the dependencyManagement section and specify the SDK modules in the dependencies section, as shown here:

dependencyManagement {
    imports {
        mavenBom 'com.amazonaws:aws-java-sdk-bom:1.10.47'
    }
}

dependencies {
    compile 'com.amazonaws:aws-java-sdk-s3'
    testCompile group: 'junit', name: 'junit', version: '4.11'
}

Gradle resolves the aws-java-sdk-s3 module to the version specified in the BOM.

(Dependency resolution diagram)

Have you been using the AWS SDK for Java in Gradle? If so, please leave us your feedback in the comments.

Tuning the AWS SDK for Java to Improve Resiliency

by Andrew Shore | on | in Java | Permalink | Comments |  Share

In this blog post, we will discuss why it’s important to protect your application from downstream service failures, offer advice for tuning configuration options in the SDK to fit the needs of your application, and introduce new configuration options that can help you set stricter SLAs on service calls.

Service failures are inevitable. Even AWS services, which are highly available and fault-tolerant, can have periods of increased latency or error rates. When there are problems in one of your downstream dependencies, latency increases, retries start, and generally API calls take longer to complete, if they complete at all. This can tie up connections, preventing other threads from using them, congest your application’s thread pool, and hold onto valuable system resources for a call or connection that may ultimately be doomed. If the AWS SDK for Java is not tuned correctly, then a single service dependency (even one that may not be critical to your application) can end up browning out or taking down your entire application. We will discuss techniques you can use to safeguard your application and show you how to find data to tune the SDK with the right settings.

Gathering Metrics

The metrics system in the AWS SDK for Java has several predefined metrics that give you insight into the performance of each of your AWS service dependencies. Metric data can be aggregated at the service level or per individual API action. There are several ways to enable the metrics system. In this post, we will take a programmatic approach on application startup.

To enable the metrics system, add the following lines to the startup code of your application.

AwsSdkMetrics.enableDefaultMetrics();
AwsSdkMetrics.setCredentialProvider(credentialsProvider);
AwsSdkMetrics.setMetricNameSpace("AdvancedConfigBlogPost");

Note: The metrics system is geared toward longer-lived applications. It uploads metric data to Amazon CloudWatch at one-minute intervals. If you are writing a simple program or test case to test-drive this feature, it may terminate before the metrics system has a chance to upload anything. If you aren’t seeing metrics in your test program, try adding a sleep interval of a couple of minutes before terminating to allow metrics to be sent to CloudWatch.
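
For example, a throwaway test program might look like the following sketch; the two-minute sleep exists only to give the background uploader time to publish at least one batch of metrics:

public static void main(String[] args) throws InterruptedException {
    AwsSdkMetrics.enableDefaultMetrics();
    AwsSdkMetrics.setMetricNameSpace("AdvancedConfigBlogPost");

    AmazonDynamoDBClient ddb = new AmazonDynamoDBClient();
    ddb.listTables();   // make at least one service call so there is metric data to report

    // Give the metrics system time to upload to CloudWatch before the JVM exits
    Thread.sleep(2 * 60 * 1000);
}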

For more information about the features of the metrics system and other ways to enable it, see this blog post.

Interpreting Metrics to Tune the SDK

After you have enabled the metrics system, the metrics will appear in the CloudWatch console under the namespace you’ve defined (in the preceding example, AdvancedConfigBlogPost).

Let’s take a look at the metrics one by one to see how the data can help us tune the SDK.

HttpClientGetConnectionTime: Time, in milliseconds, for the underlying HTTP client library to get a connection.

  • Typically, the time it takes to establish a connection won’t vary much within a service (that is, all APIs in the same service should have similar SLAs for establishing a connection). For this reason, it is valid to look at this metric aggregated across each AWS service.
  • Use this metric to determine a reasonable value for the connection timeout setting in ClientConfiguration.

    • The default value for this setting is 50 seconds, which is unreasonably high for most production applications, especially those hosted within AWS itself and making service calls to the same region. Connection latencies, on average, are on the order of milliseconds, not seconds.
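
For example, if HttpClientGetConnectionTime shows connections being established in tens of milliseconds, a much tighter connection timeout is reasonable. The following is a sketch; the one-second value is illustrative, not a recommendation for every workload, and credentialsProvider is assumed to be constructed as in the earlier snippet:

ClientConfiguration clientConfig = new ClientConfiguration();
// Fail fast if a TCP connection cannot be established; the default is 50,000 ms
clientConfig.setConnectionTimeout(1000);
AmazonDynamoDBClient ddb = new AmazonDynamoDBClient(credentialsProvider, clientConfig);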

HttpClientPoolAvailableCount: Number of idle persistent connections in the underlying HTTP client’s pool. This metric is collected from the pool’s PoolStats before a connection is obtained for a request.

  • A high number of idle connections is typical of applications that perform a batch of work at intervals. For example, consider an application that uploads all files in a directory to Amazon S3 every five minutes. While the application is uploading files, it creates several connections to S3; it then does nothing with the service for five minutes, so the connections sit in the pool and eventually become idle. If this is the case for your application, and there are constantly idle connections in the pool that aren’t serving a useful purpose, you can tune the connectionMaxIdleMillis setting and rely on the idle connection reaper (enabled by default) to purge these connections from the pool more aggressively (see the sketch after this list).
  • Setting the connectionMaxIdleMillis too low can result in having to establish connections more frequently, which can outweigh the benefits of freeing up system resources from idle connections. Take caution before acting on the data from this metric.
  • If your application does have a bursty workload and you find that the cost of establishing connections is more damaging to performance than keeping idle connections, you can also increase the connectionMaxIdleMillis setting to allow the connections to persist between periods of work.

    • Note: The connectionMaxIdleMillis setting is limited by the Keep-Alive time specified by the service. For example, if you set connectionMaxIdleMillis to five minutes but the service only keeps connections alive for sixty seconds, the SDK will still discard connections after sixty seconds, when they are no longer usable.
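
As a sketch, a bursty uploader like the one described above might shorten the idle window; the fifteen-second value is illustrative, and the right number should come from your own metrics:

ClientConfiguration clientConfig = new ClientConfiguration();
// Discard connections that have been idle for 15 seconds; the default is 60 seconds
clientConfig.setConnectionMaxIdleMillis(15 * 1000);
// The idle connection reaper is enabled by default; shown here for clarity
clientConfig.setUseReaper(true);
AmazonS3Client s3 = new AmazonS3Client(credentialsProvider, clientConfig);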

HttpClientPoolPendingCount: Number of connection requests blocked while waiting for a free connection from the underlying HTTP client’s pool. This metric is collected from the pool’s PoolStats before a connection is obtained for a request.

  • A high value for this metric can indicate a problem with your connection pool size or improper handling of service failures.
  • If your usage of the client exceeds what the default connection pool size can satisfy, you can increase the pool size through the maxConnections setting in ClientConfiguration (see the sketch after this list).
  • Connection contention can also occur when a service is experiencing increased latency or error rates and the SDK is not tuned properly to handle it. Connections can quickly be tied up waiting for a response from a faulty server or waiting for retries per the configured retry policy. Increasing the connection pool size in this case might only make things worse by allowing the application to hog more threads trying to communicate with a service in duress. If you suspect this may be the case, look at the other metrics to see how you can tune the SDK to handle situations like this in a more robust way.
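
If the metric does point to a genuinely undersized pool, the pool size can be raised through ClientConfiguration. The following is a sketch; the value of 100 is illustrative, and the default is 50 connections:

ClientConfiguration clientConfig = new ClientConfiguration();
// Allow up to 100 concurrent connections to the service; the default is 50
clientConfig.setMaxConnections(100);
AmazonDynamoDBClient ddb = new AmazonDynamoDBClient(credentialsProvider, clientConfig);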

HttpRequestTime: Number of milliseconds for a logical request/response round-trip to AWS. Captured on a per request-type level.

  • This metric records the time it takes for a single HTTP request to a service. This metric can be recorded multiple times per operation, depending on the retry policy in use.
  • We’ve recently added a new configuration setting that allows you to specify a timeout on each underlying HTTP request made by the client. SLAs can vary widely from API to API, and even from request to request, so it’s important to use the provided metrics and choose the timeout setting carefully.

    • This new setting can be specified per request or for the entire client (through ClientConfiguration). Although it’s hard to choose a single timeout that suits every API, it often makes sense to set a default on the client and override it per request, where needed.
    • By default, this feature is disabled.
    • Request timeouts are only supported in Java 7 and later.

Using the DynamoDB client as an example, let’s look at how you can use this new feature.

ClientConfiguration clientConfig = new ClientConfiguration();
clientConfig.setRequestTimeout(20 * 1000);
AmazonDynamoDBClient ddb = new AmazonDynamoDBClient(credentialsProvider, clientConfig);
// Will inherit 20 second request timeout from client level setting
ddb.listTables();
// Request timeout overridden on the request level
ddb.listTables(new ListTablesRequest().withSdkRequestTimeout(10 * 1000));
// Turns off request timeout for this request
ddb.listTables(new ListTablesRequest().withSdkRequestTimeout(-1));

ClientExecuteTime: Total number of milliseconds for a request/response including the time to execute the request handlers, the round-trip to AWS, and the time to execute the response handlers. Captured on a per request-type level.

  • This metric includes any time spent executing retries per the configured retry policy in ClientConfiguration.
  • We have just launched a new feature that allows you to specify a timeout on the entire execution time, which matches up very closely to the ClientExecuteTime metric.

    • This new timeout configuration setting can be set for the entire client in ClientConfiguration or per request.
    • By default, this feature is disabled.
    • Client execution timeouts are only supported in Java 7 and later.

Using the DynamoDB client as an example, let’s look at how you would enable a client execution timeout.

ClientConfiguration clientConfig = new ClientConfiguration();
clientConfig.setClientExecutionTimeout(20 * 1000);
AmazonDynamoDBClient ddb = new AmazonDynamoDBClient(credentialsProvider, clientConfig);
// Will inherit 20 second client execution timeout from client level setting
ddb.listTables();
// Client Execution timeout overridden on the request level
ddb.listTables(new ListTablesRequest().withSdkClientExecutionTimeout(10 * 1000));
// Turns off client execution timeout for this request
ddb.listTables(new ListTablesRequest().withSdkClientExecutionTimeout(-1));

The new settings for request timeouts and client execution timeouts are complementary. Using them together is especially useful because you can use client execution timeouts to set harder limits on the API’s total SLA and use request timeouts to prevent one bad request from consuming too much of your total time to execute.

ClientConfiguration clientConfig = new ClientConfiguration();
clientConfig.setClientExecutionTimeout(20 * 1000);
clientConfig.setRequestTimeout(5 * 1000);
// Allow as many retries as possible until the client execution timeout expires
clientConfig.setMaxErrorRetry(Integer.MAX_VALUE);
AmazonDynamoDBClient ddb = new AmazonDynamoDBClient(credentialsProvider, clientConfig);
// Will inherit timeout settings from client configuration. Each HTTP request 
// is allowed 5 seconds to complete and the SDK will retry as many times as 
// possible (per the retry condition in the retry policy) within the 20 second 
// client execution timeout
ddb.listTables();

Conclusion

Configuring the SDK with aggressive timeouts and appropriately sized connection pools goes a long way toward protecting your application from downstream service failures, but it’s not the whole story. There are many techniques you can apply on top of the SDK to limit the negative effects of a dependency’s outage on your application. Hystrix is an open source library specifically designed to make fault tolerance easier and your application even more resilient. To use Hystrix to its fullest potential, you’ll need data to tune it to the actual service SLAs in your environment. The metrics we discussed in this blog post can give you that information. Hystrix also has an embedded metrics system that can complement the SDK’s metrics.
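
For example, a service call can be wrapped in a HystrixCommand so that a struggling dependency is isolated behind its own thread pool and circuit breaker. The following is a minimal sketch, assuming Hystrix is on the classpath; the command and group names are illustrative:

import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient;
import com.amazonaws.services.dynamodbv2.model.ListTablesResult;

public class ListTablesCommand extends HystrixCommand<ListTablesResult> {
    private final AmazonDynamoDBClient ddb;

    public ListTablesCommand(AmazonDynamoDBClient ddb) {
        // Commands in the same group share a thread pool, isolating this dependency
        super(HystrixCommandGroupKey.Factory.asKey("DynamoDB"));
        this.ddb = ddb;
    }

    @Override
    protected ListTablesResult run() {
        return ddb.listTables();
    }

    @Override
    protected ListTablesResult getFallback() {
        // Degrade gracefully instead of tying up resources when the service is in duress
        return new ListTablesResult();
    }
}

A caller invokes it with new ListTablesCommand(ddb).execute(); Hystrix then applies its own timeout, circuit breaker, and thread-pool isolation around the SDK call, and uses getFallback() when the call fails or the circuit is open.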

We would love your feedback on the configuration options and metrics provided by the SDK and what you would like to see in the future. Do we provide enough settings and hooks to allow you to tune your application for optimal performance? Do we provide too many settings and make configuring the SDK overwhelming? Does the SDK provide you with enough information to intelligently handle service failures?

Building a serverless developer authentication API in Java using AWS Lambda, Amazon DynamoDB, and Amazon Cognito – Part 4

by Jason Fulghum | on | in Java | Permalink | Comments |  Share

In parts 1, 2, and 3 of this series, we used the AWS Toolkit for Eclipse to create a Java Lambda function. This function authenticated a user against an Amazon DynamoDB table (representing a directory of users) and then connected to Amazon Cognito to obtain an OpenID token. This token could then be used to obtain temporary AWS credentials for your mobile app. We tested the function locally in our development environment and used the AWS Toolkit for Eclipse to upload it to AWS Lambda. We will now integrate this Lambda function with Amazon API Gateway so it can be accessed through a RESTful interface from your applications.
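
As a quick reminder, the heart of that Lambda function is a call to Amazon Cognito’s GetOpenIdTokenForDeveloperIdentity API. The following is a rough sketch of such a call; the identity pool ID and developer provider name are placeholders, and userName is assumed to come from the incoming request:

AmazonCognitoIdentityClient cognito = new AmazonCognitoIdentityClient();
GetOpenIdTokenForDeveloperIdentityRequest tokenRequest =
        new GetOpenIdTokenForDeveloperIdentityRequest()
                .withIdentityPoolId("us-east-1:00000000-0000-0000-0000-000000000000")
                .withLogins(Collections.singletonMap("login.yourcompany.yourapp", userName));
String openIdToken =
        cognito.getOpenIdTokenForDeveloperIdentity(tokenRequest).getToken();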

First, let’s create a new API. Open the AWS Management Console. From the Application Services drop-down list, choose API Gateway. If this is your first API, choose Get Started. Otherwise, choose the blue Create API button. Type a name for this API (for example, AuthenticationAPI), and then choose the Create API button.

Now that your API has been created, you will see the following page, which shows, as expected, that there are no resources and no methods defined for this API.

We will create methods for our authentication API in a moment, but first, we will create models that define the data exchange for our API. The models are also used by API Gateway to create SDK classes for use in Android, iOS, and JavaScript clients.

To access the navigation menu, choose Resources. From the navigation menu, choose Models. By default, there are two models included with every API: Error and Empty. We will leave these models as they are and create two models that will define objects and attributes for use in our mobile app to send requests and interpret responses to and from our API.

To define the request model, choose the Create button. Set up the model as shown here:

Model Name: AuthenticationRequestModel
Content type: application/json 
Model Schema: 
{ 
  "$schema": "http://json-schema.org/draft-04/schema#", 
  "title": " AuthenticationRequestModel ", 
  "type": "object", 
  "properties": { 
    "userName": { 
      "type": "string" 
    }, 
    "passwordHash": { 
      "type": "string" 
    }
  } 
}

Choose the Create Model button. You may need to scroll down to see it.

Next, create the response model.

Model Name: AuthenticationResponseModel
Content type: application/json
Model Schema: 
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "title": " AuthenticationResponseModel ",
  "type": "object",
  "properties": 
  {
    "userId": { "type": "integer" },
    "openIdToken": { "type": "string" },
    "status": { "type": "string" }
 }
}

We need to create methods for our API. In true REST fashion, the URI path is composed of resource names. Each resource can have one or more HTTP methods. For our authentication example, we will add one resource, and then attach one method to it. Let’s start by creating a resource called “login.”

From the Models drop-down list, choose Resources.

Choose the Create Resource button. For the name of your resource, type login, and then choose the Create Resource button. This will refresh the dashboard. The new resource will appear in the navigation pane under the root of the URI path.

Choose the Create Method button. In the navigation pane, a drop-down list will appear under the login resource. From the drop-down list, choose POST, and then choose the check icon to confirm. The page will be updated, and the POST action will appear in the navigation pane. On the Setup page for your method, choose the Lambda Function option:

The page will be updated to display Lambda-specific options. From the Lambda Region drop-down list, choose the region (us-east-1) where you created the Lambda function. In the Lambda Function text box, type AuthenticateUser. This field will autocomplete. When a dialog box appears to show the API method has been given permission to invoke the Lambda function, choose OK.

The page shows the flow of the API method.

We will now set the request and response models for the login method. Choose Method Request. The panel will scroll and be updated. In the last row of the panel, choose the triangle next to Request Models. This will display a new panel. Choose Add Model. Choose the AuthenticationRequestModel you created earlier. Make sure you manually add “application/json” as the content type before you apply the changes, and then choose the check icon to confirm.

The procedure to add a model to our method response is slightly different. API Gateway lets you configure responses based on the HTTP response code of the REST invocation. We will restrict ourselves to HTTP 200 (the “OK” response code that signifies successful completion). Choose Method Response. In the response list that appears, choose the triangle next to the 200 response. The list of response models will appear on the right. The Empty model has been applied by default. Because multiple response models can be applied to a method, just as you did with the request model, add the AuthenticationResponseModel to the 200 response. Again, make sure to enter the content type (“application/json”) before you apply the change.

We now have an API resource method tied to a Lambda function that has request and response models defined. Let’s test this setup. In the navigation pane, choose the POST child entry of the /login resource. The flow screen will appear. Choose the Test icon.

Choose the triangle next to Request Body and type a sample JSON body (the same text used to test the Lambda function earlier).

{
   "username":"Dhruv",
   "passwordHash":"8743b52063cd84097a65d1633f5c74f5"
}

Choose the Test button. You may have to scroll down to see it. The function will be executed. The output should be the same as your response model.

The results show the HTTP response headers as well as a log snippet. The flow screen is a powerful tool for fine-tuning your API and testing use cases without having to deploy code.

At this point, you would continue to test your API, and when finished, use the Deploy API button on the Resources page to deploy it to production. Each deployment creates a stage, which is used to track the versions of your API. From the Stage page (accessible from the navigation menu), you can select a stage and then use the Stage Editor page to create a client SDK for your mobile app.
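
After deployment, the login method can also be exercised from plain Java code against the stage’s invoke URL. The following is a minimal sketch using HttpURLConnection; the URL is a placeholder for the invoke URL shown on your stage page, and the JSON body is the same one used in the test above:

URL url = new URL("https://your-api-id.execute-api.us-east-1.amazonaws.com/prod/login");
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("POST");
connection.setRequestProperty("Content-Type", "application/json");
connection.setDoOutput(true);
String body = "{\"username\":\"Dhruv\",\"passwordHash\":\"8743b52063cd84097a65d1633f5c74f5\"}";
try (OutputStream out = connection.getOutputStream()) {
    out.write(body.getBytes(StandardCharsets.UTF_8));
}
System.out.println("HTTP " + connection.getResponseCode());
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
    String line;
    while ((line = reader.readLine()) != null) {
        System.out.println(line);   // JSON shaped like the AuthenticationResponseModel
    }
}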

This wraps up our sample implementation of developer authenticated identities. By using the AWS Toolkit for Eclipse, you can do server-side coding and testing in Eclipse, and even deploy your server code, without leaving your IDE. You can test your Lambda code both in the IDE and again after deployment by using the console. In short, we’ve covered a lot of ground in the last four blog posts. We hope we’ve given you plenty of ideas for creating your own serverless applications in AWS. Feel free to share your ideas in the comments below!

AWS re:Invent 2015 Practical DynamoDB Working Together with AWS Lambda

by Zhaoxi Zhang | on | in Java | Permalink | Comments |  Share

Today, I’m excited to announce that the Practical DynamoDB Programming in Java demo from AWS re:Invent 2015 is available on GitHub. This project demonstrates how Amazon DynamoDB can be used together with AWS Lambda to perform real-time and batch analysis of domain-specific data. Real-time analysis is performed by using DynamoDB Streams as an event source for a Lambda function. Batch processing uses the parallel scan operation in DynamoDB to distribute work to Lambda.
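
To illustrate the parallel scan idea, each worker (in the demo, a Lambda invocation) scans one segment of the table. The following is a minimal sketch with placeholder names:

AmazonDynamoDBClient ddb = new AmazonDynamoDBClient();
ScanRequest scanRequest = new ScanRequest()
        .withTableName("YourTable")      // placeholder table name
        .withTotalSegments(4)            // split the table into 4 logical segments
        .withSegment(0);                 // this worker handles segment 0
ScanResult result = ddb.scan(scanRequest);
System.out.println("Segment 0 returned " + result.getCount() + " items");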

To download the project from GitHub, use:
git clone https://github.com/awslabs/reinvent2015-practicaldynamodb.git .

Follow the instructions in the README file and play with the demo code. You’ll see how simple it is to use the AWS Toolkit for Eclipse to upload AWS Lambda functions and invoke them with the AWS SDK for Java.