Scale your Amazon AppStream 2.0 fleets

AppStream 2.0 customers have told us that they appreciate the ability to scale their fleets based on user demand. In AppStream 2.0, you can scale your application streaming for any number of users across the globe without purchasing, provisioning, and operating on-premises hardware or infrastructure. You pay only for the streaming resources that you use, and a small fee per monthly authorized user.

This blog post describes the techniques that you can use to scale your AppStream 2.0 fleets. If you’re getting started with AppStream 2.0, we recommend using a Getting Started project before reading further. For more information, see Getting started with Amazon AppStream 2.0.

Using fleets in AppStream 2.0

An AppStream 2.0 fleet consists of streaming instances launched from an image with a specified instance type, domain, and VPC, and governed by scaling policies. Key points about fleets:

Each streaming instance in a fleet supports a single streaming connection, so the number of instances in a fleet at any time equals the number of concurrent users that the fleet can support.
Each instance is based on the same image, containing the same application catalog.
Instances in a fleet are non-persistent and are terminated after each use: when a user's streaming session ends, the instance that served it is terminated. Terminated instances are replaced to maintain the desired fleet size.
Instances that aren't used for a session are automatically terminated and replaced after approximately a day.

A fleet can be created in On-Demand or Always-On mode. In On-Demand mode, instances in the fleet are in a stopped state, waiting for a connection. When a streaming request is assigned to an instance, the instance is started, and then a connection is established with the user who made the request. A connection takes 1–2 minutes to start in an On-Demand fleet. Stopped instances are charged an hourly stopped fee, and instances with an active connection are charged an hourly running fee. The stopped fee is the same for all instance types in an AWS Region, while the running fee varies by instance type.

In Always-On mode, instances in the fleet are in a running state waiting for a connection. Once a streaming connection is assigned to an instance, the connection is immediately started. It usually takes about 10 seconds for a connection to start using an Always-On fleet. Instances in an Always-On fleet are always charged a running fee per hour. The running fee changes based on instance type. For more information about instance pricing, see Amazon AppStream 2.0 pricing.
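To make the cost trade-off between the two modes concrete, the following Python sketch estimates one hour of fleet cost in each mode. The rates are hypothetical placeholders, not actual AppStream 2.0 prices. The model is also simplified: it ignores per-user fees and assumes that, in an On-Demand fleet, instances with a connected session pay the running rate and all other instances pay the stopped rate.

```python
def on_demand_hourly_cost(fleet_size: int, active_sessions: int,
                          running_rate: float, stopped_rate: float) -> float:
    """Estimated cost of one hour for an On-Demand fleet (simplified model).

    Instances with a connected session pay the running rate; the remaining
    (stopped) instances pay the stopped rate.
    """
    if not 0 <= active_sessions <= fleet_size:
        raise ValueError("active_sessions must be between 0 and fleet_size")
    stopped = fleet_size - active_sessions
    return active_sessions * running_rate + stopped * stopped_rate


def always_on_hourly_cost(fleet_size: int, running_rate: float) -> float:
    """Estimated cost of one hour for an Always-On fleet: every instance pays the running rate."""
    return fleet_size * running_rate


# Hypothetical rates in USD per instance-hour, for illustration only.
RUNNING_RATE = 0.10
STOPPED_RATE = 0.02

# A 10-instance fleet with 4 users streaming during the hour.
print(f"On-Demand: ${on_demand_hourly_cost(10, 4, RUNNING_RATE, STOPPED_RATE):.2f}")
print(f"Always-On: ${always_on_hourly_cost(10, RUNNING_RATE):.2f}")
```

When the stopped rate is below the running rate, Always-On costs more whenever part of the fleet is idle. In exchange, it offers a faster connection time: about 10 seconds instead of 1–2 minutes.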

Scaling policies

Scaling policies determine the size of a fleet. You can automatically increase or decrease the size of the fleet with step scaling, target tracking, and schedule-based scaling policies. Step scaling and target tracking policies use fleet metrics, collected in Amazon CloudWatch, as inputs for changing the fleet size. AppStream 2.0 uses four key metrics:

Actual Capacity – The total number of instances that are available for streaming or are currently streaming.
Capacity Utilization – The percentage of instances in a fleet that are being used. Use this metric to scale your fleet based on usage of the fleet.
Available Capacity – The number of instances in your fleet that are available for user sessions. Use this metric to maintain a buffer in your capacity available for users to start streaming sessions.
Desired Capacity – The number of instances in your fleet that are either running or pending. Scaling policies change the size of your fleet by increasing or decreasing Desired Capacity.

For more information about the metrics emitted by AppStream 2.0, see Monitoring Amazon AppStream 2.0 resources.
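The relationship between the utilization metrics can be sketched as follows. This is a simplified model for intuition, not AppStream 2.0's internal calculation. It assumes that Capacity Utilization is the share of Actual Capacity with an active session. It treats Available Capacity as the idle remainder, ignoring pending instances, and it reports 0% for an empty fleet. `FleetSnapshot` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class FleetSnapshot:
    """Simplified point-in-time view of an AppStream 2.0 fleet (hypothetical helper)."""
    actual_capacity: int  # instances available for streaming or currently streaming
    in_use: int           # instances with an active streaming session

    @property
    def capacity_utilization(self) -> float:
        """Percentage of instances in use; assumed to be 0% for an empty fleet."""
        if self.actual_capacity == 0:
            return 0.0
        return 100.0 * self.in_use / self.actual_capacity

    @property
    def available_capacity(self) -> int:
        """Idle instances that can accept a new session (pending instances ignored)."""
        return self.actual_capacity - self.in_use


snapshot = FleetSnapshot(actual_capacity=4, in_use=2)
print(snapshot.capacity_utilization, snapshot.available_capacity)  # 50.0 2
```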

Step Scaling based on usage

You can define step scaling policies that increase or decrease fleet capacity by a specific number of instances or by a percentage of the current fleet size. These policies can use the Capacity Utilization, Available Capacity, or Insufficient Capacity Error metric to make automatic scaling decisions.

These utilization-based automatic scaling policies operate between two fleet capacity boundaries:

Minimum capacity – The minimum size of the fleet. Scaling policies do not scale your fleet below this value. For example, if you specify a minimum of 2, your fleet never has fewer than two instances available.
Maximum capacity – The maximum size of the fleet. Scaling policies do not scale your fleet above this value. For example, if you specify 4, your fleet never has more than four instances available.
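In effect, the two boundaries clamp whatever capacity a scaling policy requests. Here is a minimal sketch using the example values of 2 and 4 (`clamp_capacity` is a hypothetical name):

```python
def clamp_capacity(requested: int, min_capacity: int, max_capacity: int) -> int:
    """Bound the capacity requested by a scaling policy to [min_capacity, max_capacity]."""
    if min_capacity > max_capacity:
        raise ValueError("min_capacity must not exceed max_capacity")
    return max(min_capacity, min(requested, max_capacity))


print(clamp_capacity(1, 2, 4))  # 2: a scale-in policy cannot go below the minimum
print(clamp_capacity(6, 2, 4))  # 4: a scale-out policy cannot go above the maximum
```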

You can create step scaling policies for your fleet by using the AWS Management Console, AWS SDK, or AWS Command Line Interface (AWS CLI). Begin by setting a minimum and maximum capacity for your fleet, as shown in the following example. You can enter this information in the AWS Management Console, on the Scaling Policies tab under AppStream 2.0 Fleets.

You can then create scale out policies to increase the fleet size when user demand grows. Likewise, create scale in policies to decrease the fleet size when user demand drops. The following section shows how to create an example scale out policy in the AWS Management Console.

Create an example scale out policy

Create a scale out policy by entering the following information:

Policy Name – A unique name for the scaling policy. For this example, it is set to default-scale-out.
If – The fleet metric that triggers scaling: Capacity Utilization, Available Capacity, or Insufficient Capacity Error. For this example, select Capacity Utilization.
Is – The condition that must be met to trigger a scaling action. For this example, the trigger is Capacity Utilization > 75%.
Then add – The scaling action to perform. You can increase or decrease the fleet capacity by an absolute number of instances or by a percentage of the fleet capacity. In general, it's a best practice to add a percentage of capacity rather than a fixed number of instances, so that your scaling actions are proportional to the size of your fleet. For this example, two instances are added.
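The difference between adding a fixed number of instances and adding a percentage becomes clearer as the fleet grows. The following sketch compares the two approaches for a small and a large fleet. Percentage adjustments are rounded as follows: results above 1 are rounded down, results between 0 and 1 become 1, and negative results mirror this. That rule follows the rounding AWS documents for EC2 Auto Scaling percentage adjustments; treat it as an assumption for AppStream 2.0 fleets. The function names are hypothetical.

```python
import math


def percent_adjustment(current: int, percent: float) -> int:
    """Instances added (positive) or removed (negative) by a percentage step.

    Assumed rounding: magnitudes above 1 are rounded down, magnitudes
    between 0 and 1 become 1, and the sign is preserved.
    """
    raw = current * percent / 100.0
    if raw == 0:
        return 0
    magnitude = abs(raw)
    rounded = 1 if magnitude < 1 else math.floor(magnitude)
    return int(math.copysign(rounded, raw))


def apply_step(current: int, adjustment: float, is_percent: bool,
               min_capacity: int, max_capacity: int) -> int:
    """Fleet size after one step adjustment, kept within the fleet's boundaries."""
    delta = percent_adjustment(current, adjustment) if is_percent else int(adjustment)
    return max(min_capacity, min(current + delta, max_capacity))


# Fixed +2 instances vs. +50% of capacity, for a small and a large fleet.
for size in (2, 40):
    fixed = apply_step(size, 2, False, 1, 200)
    percent = apply_step(size, 50, True, 1, 200)
    print(f"fleet of {size}: +2 instances -> {fixed}, +50% -> {percent}")
```

With a fixed step of two instances, the 40-instance fleet grows by only 5% per step, while the percentage step scales with the fleet.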

Similarly, the following example uses a scale in policy to reduce fleet size when CapacityUtilization is low.

After these policies are set, they act on the AppStream 2.0 fleet to increase or decrease fleet capacity, as shown in the following screenshot. The charts in the following examples plot ActualCapacity in blue on the left axis and CapacityUtilization, the percentage of capacity in use, in brown on the right axis, as streaming connections are created and ended.

At 15:06 on November 6, the ActualCapacity is 2 and CapacityUtilization is 0 percent. There are no streaming connections active.

At 15:44 on November 6, the CapacityUtilization has increased to 100 percent. There are two active streaming connections using the entire ActualCapacity (2 instances). This triggers the default-scale-out policy, which adds two more instances.

At 15:57 on November 6, the CapacityUtilization has decreased to 50 percent while the ActualCapacity has increased to four.

At 16:18 on November 6, one of the two streaming connections has ended. Because instances are terminated after each session, ActualCapacity decreases from 4 to 3, and CapacityUtilization drops to 33 percent (one active connection on three instances).
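The arithmetic behind this timeline can be replayed with a short sketch. The snapshots are read from the narrative above (ActualCapacity and the number of active connections). The sketch only checks the default-scale-out condition (utilization above 75%). It ignores the alarm evaluation periods and cooldowns that govern when a real scaling action happens.

```python
# Snapshots from the timeline above: (time, ActualCapacity, active connections).
TIMELINE = [
    ("15:06", 2, 0),
    ("15:44", 2, 2),
    ("15:57", 4, 2),
    ("16:18", 3, 1),
]

SCALE_OUT_THRESHOLD = 75.0  # default-scale-out: Capacity Utilization > 75%


def utilization(actual: int, in_use: int) -> float:
    """Capacity utilization as a percentage (0 for an empty fleet)."""
    return 100.0 * in_use / actual if actual else 0.0


def replay(timeline):
    """Yield (time, rounded utilization %, whether default-scale-out's condition is met)."""
    for time, actual, in_use in timeline:
        pct = utilization(actual, in_use)
        yield time, round(pct), pct > SCALE_OUT_THRESHOLD


for time, pct, triggers in replay(TIMELINE):
    print(f"{time}  {pct:3d}%  scale-out condition met: {triggers}")
```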

You can set policies to both scale out (increase) and scale in (decrease) the number of instances in a fleet. For more information about utilization-based scale out and scale in policies, see Fleet Auto Scaling for Amazon AppStream 2.0.
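If you manage scaling with the AWS SDK instead of the console, you create a step scaling policy with the Application Auto Scaling PutScalingPolicy operation. A CloudWatch alarm on the fleet's CapacityUtilization metric then invokes the policy. The following Python sketch builds the request parameters for boto3 (the AWS SDK for Python) for a percentage-based scale-out or scale-in policy. It assumes that the fleet is already registered as a scalable target and that fleet metrics are published in the AWS/AppStream namespace with a Fleet dimension. The alarm period, evaluation periods, and cooldown are illustrative assumptions, and the function names are hypothetical.

```python
def step_policy_request(fleet_name: str, policy_name: str, percent: int,
                        cooldown_seconds: int = 120) -> dict:
    """Parameters for application-autoscaling put_scaling_policy (StepScaling).

    A positive percent builds a scale-out policy and a negative percent a
    scale-in policy. Step bounds are relative to the alarm threshold.
    """
    if percent == 0:
        raise ValueError("percent must be non-zero")
    if percent > 0:
        step = {"MetricIntervalLowerBound": 0.0, "ScalingAdjustment": percent}
    else:
        step = {"MetricIntervalUpperBound": 0.0, "ScalingAdjustment": percent}
    return {
        "PolicyName": policy_name,
        "ServiceNamespace": "appstream",
        "ResourceId": f"fleet/{fleet_name}",
        "ScalableDimension": "appstream:fleet:DesiredCapacity",
        "PolicyType": "StepScaling",
        "StepScalingPolicyConfiguration": {
            "AdjustmentType": "PercentChangeInCapacity",
            "StepAdjustments": [step],
            "Cooldown": cooldown_seconds,  # illustrative value
        },
    }


def utilization_alarm_request(fleet_name: str, alarm_name: str, threshold: float,
                              scale_out: bool, policy_arn: str) -> dict:
    """Parameters for cloudwatch put_metric_alarm that invokes the scaling policy."""
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/AppStream",
        "MetricName": "CapacityUtilization",
        "Dimensions": [{"Name": "Fleet", "Value": fleet_name}],
        "Statistic": "Average",
        "Period": 60,            # illustrative value
        "EvaluationPeriods": 3,  # illustrative value
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold" if scale_out else "LessThanThreshold",
        "AlarmActions": [policy_arn],
    }


def create_step_policy(fleet_name: str, policy_name: str, percent: int,
                       threshold: float) -> str:
    """Create the policy and its alarm with boto3; returns the policy ARN."""
    import boto3  # imported here so the request builders work without boto3 installed

    policy = boto3.client("application-autoscaling").put_scaling_policy(
        **step_policy_request(fleet_name, policy_name, percent))
    boto3.client("cloudwatch").put_metric_alarm(
        **utilization_alarm_request(fleet_name, f"{policy_name}-alarm", threshold,
                                    percent > 0, policy["PolicyARN"]))
    return policy["PolicyARN"]
```

For example, `create_step_policy("sample-fleet", "default-scale-out", 50, 75.0)` would pair a +50% step with an alarm that fires when average utilization exceeds 75%.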

Scaling with Target Tracking

In November 2019, target tracking became available as another option for scaling AppStream 2.0 fleets.

With target tracking scaling, you specify a capacity utilization level for your fleet. Application Auto Scaling automatically creates and manages CloudWatch alarms that trigger the scaling policy. The scaling policy adds or removes capacity as required to keep capacity utilization at, or close to, the specified target value. To ensure application availability, your fleet scales out proportionally to the metric as fast as it can but scales in more gradually.

For example, you might create a target tracking policy with a target Capacity Utilization of 75%. Application Auto Scaling will add and remove capacity in an attempt to track this utilization target.
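To see roughly what a 75% target implies, consider that the fleet needs enough instances for active sessions to fill no more than 75% of them. The following sketch computes that number. It's a simplified model for intuition, not the algorithm that Application Auto Scaling uses; the real service acts through CloudWatch alarms and scales in more gradually than it scales out.

```python
import math


def capacity_for_target(in_use: int, target_utilization: float) -> int:
    """Smallest fleet size whose utilization is at or below the target percentage."""
    if not 0 < target_utilization <= 100:
        raise ValueError("target_utilization must be in (0, 100]")
    return math.ceil(in_use * 100 / target_utilization)


for sessions in (30, 31, 75):
    print(sessions, "sessions ->", capacity_for_target(sessions, 75), "instances")
```

For 31 sessions the sketch returns 42 instances: 41 instances would put utilization slightly above 75%.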

Currently, target tracking can be configured only through the AWS CLI or AWS SDK. For details on how to create a target tracking scaling policy, see the target tracking example in the Fleet Auto Scaling documentation.
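A minimal boto3 sketch of such a policy follows. It uses the `AppStreamAverageCapacityUtilization` predefined metric and assumes the fleet is already registered as a scalable target. The policy name format and the cooldown values are illustrative assumptions. The request builder is kept separate from the API call so that you can inspect it without AWS credentials.

```python
def target_tracking_policy_request(fleet_name: str, target_utilization: float,
                                   scale_out_cooldown: int = 120,
                                   scale_in_cooldown: int = 360) -> dict:
    """Parameters for application-autoscaling put_scaling_policy (TargetTrackingScaling)."""
    if not 0 < target_utilization <= 100:
        raise ValueError("target_utilization must be in (0, 100]")
    return {
        "PolicyName": f"{fleet_name}-target-tracking",  # hypothetical naming scheme
        "ServiceNamespace": "appstream",
        "ResourceId": f"fleet/{fleet_name}",
        "ScalableDimension": "appstream:fleet:DesiredCapacity",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(target_utilization),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "AppStreamAverageCapacityUtilization",
            },
            # Illustrative cooldowns in seconds: scale out sooner than scaling in.
            "ScaleOutCooldown": scale_out_cooldown,
            "ScaleInCooldown": scale_in_cooldown,
        },
    }


def apply_target_tracking(fleet_name: str, target_utilization: float) -> str:
    """Create or update the policy with boto3 and return its ARN."""
    import boto3  # imported here so the request builder works without boto3 installed

    client = boto3.client("application-autoscaling")
    response = client.put_scaling_policy(
        **target_tracking_policy_request(fleet_name, target_utilization))
    return response["PolicyARN"]
```

For example, `apply_target_tracking("sample-fleet", 75)` would create the 75% policy described above. Application Auto Scaling then creates and manages the associated CloudWatch alarms.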

Note: Do not edit or delete the CloudWatch alarms that are configured for the target tracking scaling policy. CloudWatch alarms that are associated with your target tracking scaling policies are managed by AWS and deleted automatically when no longer needed.

Schedule-based scaling

You can create scaling policies that set the fleet's minimum and maximum capacity according to a time-based schedule. For example, you can increase or decrease the fleet size at a particular time of day, within a given date range, or at a recurring interval, such as every few hours. Scheduled scaling policies for an AppStream 2.0 fleet can only be created or edited by using the AWS SDK or AWS CLI. For information about installing the AWS CLI, see Installing AWS CLI.

To create a scheduled action:

Register a scalable target with the Application Auto Scaling service by using the register-scalable-target API operation. The scalable target is the AppStream 2.0 fleet resource whose capacity is adjusted. You only need to perform this action once per fleet.
```
aws application-autoscaling register-scalable-target --service-namespace appstream --resource-id fleet/sample-fleet --scalable-dimension appstream:fleet:DesiredCapacity --min-capacity 2 --max-capacity 5
```
Create a scheduled action for the AppStream 2.0 fleet by using the put-scheduled-action API operation.
```
aws application-autoscaling put-scheduled-action --service-namespace appstream --scheduled-action-name ExamplePolicy --resource-id fleet/sample-fleet --scalable-dimension appstream:fleet:DesiredCapacity --scalable-target-action MinCapacity=2,MaxCapacity=5 --schedule <schedule-expression> [--start-time <value>] [--end-time <value>]
```
The following parameters are part of put-scheduled-action:
1. service-namespace – The namespace of the service whose resources are scaled. For AppStream 2.0, this is appstream.
2. scheduled-action-name – The name of the scheduled action. In this example, it is ExamplePolicy.
3. resource-id – The identifier of the AppStream 2.0 fleet, in the form fleet/<fleet-name>. In this example, it is fleet/sample-fleet; replace sample-fleet with your fleet name.
4. scalable-dimension – The fleet capacity dimension to adjust. For AppStream 2.0 fleets, this is appstream:fleet:DesiredCapacity.
5. scalable-target-action – The minimum and maximum capacity to apply to the fleet when the action runs.
6. schedule – When the action runs. This can be an at expression for a one-time action, a rate expression for a regular interval, or a cron expression for a recurring date and time.
7. start-time and end-time – Optional. The date and time range during which the scheduled action is active.
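You can make the same request from the AWS SDK. The following sketch builds the parameters for boto3's `put_scheduled_action`, mirroring the CLI parameters listed above. It rejects cron expressions that don't have the six fields Application Auto Scaling expects: minutes, hours, day of month, month, day of week, and year. The function name is hypothetical, and `StartTime` and `EndTime` are omitted for brevity.

```python
import re

# Application Auto Scaling cron expressions have six fields:
# minutes hours day-of-month month day-of-week year
_SIX_FIELD_CRON = re.compile(r"cron\(\S+(\s+\S+){5}\)")


def scheduled_action_request(fleet_name: str, action_name: str, schedule: str,
                             min_capacity: int, max_capacity: int) -> dict:
    """Parameters for application-autoscaling put_scheduled_action."""
    if schedule.startswith("cron(") and not _SIX_FIELD_CRON.fullmatch(schedule):
        raise ValueError(f"expected a six-field cron expression, got {schedule!r}")
    if min_capacity > max_capacity:
        raise ValueError("min_capacity must not exceed max_capacity")
    return {
        "ServiceNamespace": "appstream",
        "ScheduledActionName": action_name,
        "ResourceId": f"fleet/{fleet_name}",
        "ScalableDimension": "appstream:fleet:DesiredCapacity",
        "Schedule": schedule,
        "ScalableTargetAction": {"MinCapacity": min_capacity, "MaxCapacity": max_capacity},
    }


request = scheduled_action_request("sample-fleet", "ExamplePolicy",
                                   "cron(0 8 ? * MON-FRI *)", 2, 5)
print(request["ScalableTargetAction"])  # {'MinCapacity': 2, 'MaxCapacity': 5}
```

To send the request, pass the result to `boto3.client("application-autoscaling").put_scheduled_action(**request)`.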

Review the scheduled actions associated with the fleet by using the describe-scheduled-actions API operation.

```
aws application-autoscaling describe-scheduled-actions --service-namespace appstream --resource-id fleet/sample-fleet
```

For more information about these API operations, see Application Auto Scaling CLI Reference or Application Auto Scaling API reference.

Schedule-based scaling example

Example Corp. is a software vendor that wants to use AppStream 2.0 to deliver online trials of their desktop application to a browser. Any customer can visit their website, register for an account, sign in, and start a trial. For this scenario:

Example Corp. wants to provide users with instant access to their application without any wait time. To do this, they use an Always-On fleet.
Example Corp. expects the following usage patterns from their customers:
1. Weekdays – 8:00 to 20:00 – 50 to 100 users
2. Weekdays – 20:00 to 00:00 – 25 to 50 users
3. Weekdays – 00:00 to 8:00 – 10 to 25 users
4. Weekends – all day – 10 to 25 users
5. Within each time window, scale automatically based on demand to accommodate spikes.

Their scheduled actions would be as shown in the following example. Times in scheduled actions are interpreted in UTC, so convert your schedules to UTC before making the API calls. For simplicity, this example assumes that Example Corp.'s usage windows are already expressed in UTC.
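Converting a local schedule to UTC can shift the day of the week as well as the hour. For example, 20:00 Monday to Friday at UTC-5 is 01:00 UTC Tuesday to Saturday. Here is a minimal sketch, assuming a fixed UTC offset that ignores daylight saving time. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Sequence

DAYS = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]


def local_to_utc_cron(hour: int, days: Sequence[str], utc_offset_hours: int) -> str:
    """Convert 'at hour:00 local time on these days' into a six-field UTC cron expression.

    Assumes a fixed UTC offset, so daylight saving time changes are ignored.
    """
    if not days:
        raise ValueError("at least one day is required")
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    utc_hour = None
    utc_days = []
    for day in days:
        # 2024-01-01 was a Monday, so this picks a date with the requested weekday.
        local = datetime(2024, 1, 1 + DAYS.index(day), hour, tzinfo=local_tz)
        utc = local.astimezone(timezone.utc)
        utc_hour = utc.hour
        utc_days.append(DAYS[utc.weekday()])
    return f"cron(0 {utc_hour} ? * {','.join(utc_days)} *)"


weekdays = ["MON", "TUE", "WED", "THU", "FRI"]
print(local_to_utc_cron(8, weekdays, -5))   # cron(0 13 ? * MON,TUE,WED,THU,FRI *)
print(local_to_utc_cron(20, weekdays, -5))  # cron(0 1 ? * TUE,WED,THU,FRI,SAT *)
```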

```
# Register the fleet as a scalable target with Application Auto Scaling
aws application-autoscaling register-scalable-target --service-namespace appstream --resource-id fleet/samplefleet --scalable-dimension appstream:fleet:DesiredCapacity --min-capacity 10 --max-capacity 100
# Scheduled actions (times in UTC; cron fields: minutes hours day-of-month month day-of-week year)
aws application-autoscaling put-scheduled-action --service-namespace appstream --resource-id fleet/samplefleet --scheduled-action-name Policy1 --schedule="cron(0 8 ? * MON-FRI *)" --scalable-target-action MinCapacity=50,MaxCapacity=100 --scalable-dimension appstream:fleet:DesiredCapacity
aws application-autoscaling put-scheduled-action --service-namespace appstream --resource-id fleet/samplefleet --scheduled-action-name Policy2 --schedule="cron(0 20 ? * MON-FRI *)" --scalable-target-action MinCapacity=25,MaxCapacity=50 --scalable-dimension appstream:fleet:DesiredCapacity
aws application-autoscaling put-scheduled-action --service-namespace appstream --resource-id fleet/samplefleet --scheduled-action-name Policy3 --schedule="cron(0 0 ? * MON-FRI *)" --scalable-target-action MinCapacity=10,MaxCapacity=25 --scalable-dimension appstream:fleet:DesiredCapacity
aws application-autoscaling put-scheduled-action --service-namespace appstream --resource-id fleet/samplefleet --scheduled-action-name Policy4 --schedule="cron(0 0 ? * SAT,SUN *)" --scalable-target-action MinCapacity=10,MaxCapacity=25 --scalable-dimension appstream:fleet:DesiredCapacity
# List the scheduled actions associated with the fleet
aws application-autoscaling describe-scheduled-actions --service-namespace appstream --resource-id fleet/samplefleet
```
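To check that four scheduled actions cover every hour of the week, the following sketch finds which action last fired for a given UTC day and hour. It assumes that each scheduled action stays in effect until the next one fires. It uses 20:00 for Policy2 to match the 20:00 to 00:00 usage window. This is a local model for reasoning about the schedule, not an AWS API, and the names are hypothetical.

```python
# Example Corp.'s scheduled actions: (name, UTC hour, days, MinCapacity, MaxCapacity).
ACTIONS = [
    ("Policy1", 8, {"MON", "TUE", "WED", "THU", "FRI"}, 50, 100),
    ("Policy2", 20, {"MON", "TUE", "WED", "THU", "FRI"}, 25, 50),
    ("Policy3", 0, {"MON", "TUE", "WED", "THU", "FRI"}, 10, 25),
    ("Policy4", 0, {"SAT", "SUN"}, 10, 25),
]
DAYS = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]
HOURS_PER_WEEK = 7 * 24


def capacity_bounds_at(day: str, hour: int) -> tuple:
    """Return (action name, min, max) of the most recent action at or before day/hour."""
    now = DAYS.index(day) * 24 + hour  # hours since Monday 00:00 UTC
    best = None
    for name, at_hour, days, low, high in ACTIONS:
        for d in days:
            fired = DAYS.index(d) * 24 + at_hour
            elapsed = (now - fired) % HOURS_PER_WEEK  # hours since it last fired
            if best is None or elapsed < best[0]:
                best = (elapsed, name, low, high)
    return best[1:]


for day, hour in [("MON", 10), ("MON", 22), ("TUE", 3), ("SAT", 12)]:
    print(day, f"{hour:02d}:00 UTC ->", capacity_bounds_at(day, hour))
```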

These scheduled actions set the fleet's minimum and maximum capacity to the values for each time window. Between those boundaries, Example Corp. can also layer utilization-based scaling policies to scale their AppStream 2.0 fleet out or in based on user demand. They can add the utilization-based scaling policies by using the AWS Management Console.

Other Considerations

Many customers combine different types of scaling policies in a single fleet to get the benefits of each. For example, you might configure a scheduled scaling policy that increases your fleet minimum at 6:00 am in anticipation of users starting their work day and decreases it at 4:00 pm before users stop working. You could combine this scheduled scaling policy with target tracking or step scaling policies that maintain a specific level of utilization and scale in or out during the day to handle spiky usage. Combining scheduled scaling with target tracking can also reduce the impact of a sharp increase in utilization, when capacity is needed immediately.
Consider whether your fleet might experience a high degree of “churn” (that is, a large number of users starting or ending sessions in a short period of time). This might occur when many users simultaneously access an application in your fleet for just a few minutes before signing off. In such situations, your fleet size may drop far below the Desired Capacity, because instances are terminated when users end their sessions. You can identify churn by examining CloudWatch metrics for your fleet: periods when your fleet has non-zero Pending Capacity with little or no change in Desired Capacity indicate that high churn is likely occurring. In high-churn situations, configure target tracking policies so that (100 – target utilization percentage) is at least the percentage of the fleet that churns in a 15-minute period. For example, if 10% of your fleet will be terminated in 15 minutes due to user turnover, set a capacity utilization target of 90% or less to offset high churn.
When using step scaling policies, we recommend that you add a percentage of capacity rather than a fixed number of instances. This keeps your scaling actions proportional to the size of your fleet. It helps you avoid scaling out too slowly when your fleet is large (because you add few instances relative to its size) or adding too many instances when your fleet is small.
Insufficient Capacity Error is a CloudWatch metric for AppStream 2.0 fleets. This metric specifies the number of session requests rejected due to lack of capacity. When you are making changes to your scaling policies, it is helpful to create a CloudWatch alarm to notify you when Insufficient Capacity Errors occur. This will allow you to quickly adjust your scaling policies to optimize availability for users.
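Two of these considerations lend themselves to a short sketch: checking a utilization target against observed churn, and building the CloudWatch alarm for Insufficient Capacity Errors. The following Python sketch assumes that the metric is published as `InsufficientCapacityError` in the `AWS/AppStream` namespace with a `Fleet` dimension. The alarm name, period, and SNS topic ARN are hypothetical placeholders.

```python
def churn_percent(sessions_ended_in_15_min: int, fleet_size: int) -> float:
    """Percentage of the fleet terminated by user turnover in a 15-minute window."""
    if fleet_size <= 0:
        raise ValueError("fleet_size must be positive")
    return 100.0 * sessions_ended_in_15_min / fleet_size


def max_target_for_churn(churn_pct: float) -> float:
    """Highest utilization target such that (100 - target) is at least the churn."""
    if not 0 <= churn_pct < 100:
        raise ValueError("churn_pct must be in [0, 100)")
    return 100.0 - churn_pct


def insufficient_capacity_alarm_request(fleet_name: str, sns_topic_arn: str) -> dict:
    """Parameters for cloudwatch put_metric_alarm: notify on any rejected session request."""
    return {
        "AlarmName": f"{fleet_name}-insufficient-capacity",  # hypothetical name
        "Namespace": "AWS/AppStream",
        "MetricName": "InsufficientCapacityError",
        "Dimensions": [{"Name": "Fleet", "Value": fleet_name}],
        "Statistic": "Sum",
        "Period": 60,  # illustrative value
        "EvaluationPeriods": 1,
        "Threshold": 0.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


# 5 sessions ended in 15 minutes on a 40-instance fleet -> 12.5% churn.
churn = churn_percent(5, 40)
print(churn, max_target_for_churn(churn))  # 12.5 87.5
```

To create the alarm, pass the alarm parameters to `boto3.client("cloudwatch").put_metric_alarm(**request)`.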

Conclusion

This post described techniques for scaling your AppStream 2.0 fleets by using utilization-based (step scaling and target tracking) and schedule-based scaling policies. For more information, see:

Desktop and Application Streaming