AWS Compute Blog

Archiving and replaying messages with Amazon SNS FIFO

This post is written by Mohammed Atiq, Solutions Architect and Mithun Mallick, Principal Solutions Architect, Serverless

Amazon Simple Notification Service (SNS) offers a flexible, fully managed messaging service, allowing applications to send and receive messages. SNS acts as a channel, delivering events from publishers to subscribers.

Today, AWS is announcing a new capability that enables you to archive and replay messages published to SNS FIFO (first-in first-out) topics. Now, when enabled with an archive policy, SNS FIFO topics automatically:

  • Archives events, with a no-code, in-place message archive that doesn’t require any external resources. You only need to define an archive policy on your topic, including the required retention period (from 1 to 365 days).
  • Replays events: subscribers benefit from a managed, no-code message replay functionality, with built-in progress reporting and message filtering capabilities. To initiate a replay, subscribers simply apply a replay policy to their subscription, defining a starting point and an ending point using timestamps.

This feature can be useful in failure recovery and state replication scenarios.

Failure recovery

In failure recovery scenarios, developers can use this to reprocess a subset of messages and recover from a downstream application failure or a dependency issue.

Consider a situation where a search application needs to reprocess messages because the search engine’s index has been erased. To initiate recovery, the search application would update the ReplayPolicy attribute in its existing subscription using the SetSubscriptionAttributes API action, to start receiving messages from a specific point in time, rather than from when the archive policy was applied to the topic.

State replication

For state replication scenarios, this feature enables new applications to duplicate the state of previously subscribed applications.

Consider an internal data warehouse application that must replicate the state of an external search application to make the data indexed in the search engine available to product managers and other internal staff. The data warehouse application subscribes its newly created endpoint (for example, an Amazon SQS FIFO queue) to the topic using the Subscribe API action and sets the ReplayPolicy subscription attribute.

If it opts to replicate the full state of the search engine, it might set the timestamp in its ReplayPolicy to coincide with the search engine’s subscription’s creation date and time, ensuring all data ever delivered to the search engine is also delivered to the data warehouse tool.

Enabling the archive policy via the SNS console

When creating a new SNS FIFO topic, you see an option for the archive policy. This policy determines how long SNS stores your messages, making them available for potential resending to a subscription if necessary. The Archive policy does not activate by default – you must enable it for each topic manually or automate the operation.

For instance, the retention period for this FIFO topic is set at 30 days. However, you can adjust this duration anywhere from 1 to 365 days. Once you activate the archive policy, messages sent to this topic are archived for the defined period.

To confirm that the archive policy is in effect after creating the topic, check the topic details. Next to the retention policy, and its status is displayed as Active.

By subscribing an SQS FIFO queue to an SNS FIFO topic, you can replay messages, and the Replay status shows Not running. You can subscribe both FIFO and standard SQS queues to their SNS FIFO topics, providing flexibility for various use-case requirements. To initiate a replay, navigate to the SNS console, choose Replay, and then choose Start replay.

When you initiate a replay, a window appears, allowing you to specify the start and end dates, as well as the exact time from which messages are archived. This feature affords the flexibility to replay only messages of interest, instead of every archived message, by allowing you to set on a specific schedule. When you choose Start replay, the service begins sending messages to the subscriber.

You can also define settings for the SNS FIFO archive and replay features with both AWS CloudFormation and the AWS Serverless Application Model (AWS SAM).

Use Cases

Replaying events for error recovery in microservices

In a scenario where an insurance application uses multiple microservices, consider one claims processing microservice encounters an error and drops a claim. Such an oversight can cause the workload to be out of sync.

With the archive and replay feature, you can revisit and replay events from the time the error was detected. This allows the microservice to recognize the missed event and complete the necessary actions, ensuring the system remains updated and accurate.

  1. Messages are published to an SNS FIFO topic from an application.
  2. Messages are delivered to an SQS FIFO queue containing claim details to be processed by a downstream microservice.
  3. The microservice fails to process a series of messages due to an exception and discards all of the messages.
  4. The user then initiates a replay from the SNS FIFO topic, specifies the time frame of messages to replay based on when the failure occurred.
  5. The microservice is now able to successfully process the replayed messages and persists data into a DynamoDB table.

Replicating state across Regions

In situations where an application spans multiple Regions, and a microservice encounters difficulties in its primary Region, you can replicate the infrastructure to another Region using an active/standby setup.

You can reroute traffic to the standby microservice in the secondary Region, maintaining synchronization through event replays. You can set an end time in the SNS replay policy but if this isn’t defined, replaying continues until all the most recent messages are sent.

After, the SNS subscription resumes normal functioning, capturing all new messages. This approach is suitable for many state replication scenarios, such as cross-Region backup strategies, as it helps minimize downtime and prevent message loss.

  1. Messages are published to an SNS FIFO topic from an application.
  2. Messages are delivered to an SQS FIFO queue containing claim details to be processed by a downstream microservice.
  3. The microservice failed to process a series of messages due to an exception and discarded all of the messages.
  4. The user then subscribes a new SQS FIFO queue in another region, initiates a replay from the SNS FIFO topic and specifies the time frame of messages to replay based on when the failure occurred.
  5. The microservice in a different region is able to retrieve the replayed messages from the new SQS FIFO queue, successfully processes the series of messages and persists data into a DynamoDB table.

Configuring SNS FIFO archive and replay for auto insurance processing

Managing auto insurance claims requires timely coordination. This walkthrough shows the combined benefits of SNS FIFO and SQS FIFO to process claims in the correct sequence.

Both SQS FIFO and SQS standard queues can be subscribed to the SNS FIFO topic, offering versatility in handling claims. The archive and replay functionality of SNS FIFO is paramount; disruptions in downstream microservices don’t compromise claim integrity due to the replay capability.

This walkthrough guides you through deploying an auto insurance claims processing example using the AWS CLI. You create an SNS FIFO topic for claim submissions and two SQS FIFO queues. The first queue is for primary processing of the claims, while the second is specifically for message replays to support application state replication across various system instances.

Prerequisites

Step 1 – Creating resources using the AWS CLI and storing variables

Run the following commands in the terminal.


# Define the AWS Region
REGION=$(aws configure get region)

# Create an SNS FIFO topic for auto insurance claims
AUTO_INSURANCE_TOPIC_ARN=$(aws sns create-topic --name "AutoInsuranceClaimsTopic.fifo" --attributes "FifoTopic=true,ContentBasedDeduplication=true,DisplayName=Auto Insurance Claims Topic" --region $REGION | jq -r '.TopicArn')

# Create primary and replay SQS FIFO queues
AUTO_INSURANCE_QUEUE_URL=$(aws sqs create-queue --queue-name "AutoInsuranceClaimsQueue.fifo" --attributes "FifoQueue=true" --region $REGION | jq -r '.QueueUrl')
AUTO_INSURANCE_REPLAY_QUEUE_URL=$(aws sqs create-queue --queue-name "AutoInsuranceReplayQueue.fifo" --attributes "FifoQueue=true" --region $REGION | jq -r '.QueueUrl')

# Get ARNs for both SQS queues
AUTO_INSURANCE_QUEUE_ARN=$(aws sqs get-queue-attributes --queue-url $AUTO_INSURANCE_QUEUE_URL --attribute-names QueueArn --region $REGION | jq -r '.Attributes.QueueArn')
AUTO_INSURANCE_REPLAY_QUEUE_ARN=$(aws sqs get-queue-attributes --queue-url $AUTO_INSURANCE_REPLAY_QUEUE_URL --attribute-names QueueArn --region $REGION | jq -r '.Attributes.QueueArn')

# Define a policy allowing the topic to publish to both queues
SQS_POLICY_TEMPLATE="{\"Policy\" : \"{ \\\"Version\\\": \\\"2012-10-17\\\", \\\"Statement\\\": [ { \\\"Sid\\\": \\\"1\\\", \\\"Effect\\\": \\\"Allow\\\", \\\"Principal\\\": { \\\"Service\\\": \\\"sns.amazonaws.com\\\" }, \\\"Action\\\": [\\\"sqs:SendMessage\\\"], \\\"Resource\\\": [\\\"$AUTO_INSURANCE_QUEUE_ARN\\\", \\\"$AUTO_INSURANCE_REPLAY_QUEUE_ARN\\\"], \\\"Condition\\\": { \\\"ArnLike\\\": { \\\"aws:SourceArn\\\": [\\\"$AUTO_INSURANCE_TOPIC_ARN\\\"] } } } ]}\"}"

# Apply the access policy to the queues
aws sqs set-queue-attributes --queue-url $AUTO_INSURANCE_QUEUE_URL --attributes file://<(echo $SQS_POLICY_TEMPLATE)
aws sqs set-queue-attributes --queue-url $AUTO_INSURANCE_REPLAY_QUEUE_URL --attributes file://<(echo $SQS_POLICY_TEMPLATE)

# Subscribe the primary queue to the created SNS FIFO topic
aws sns subscribe --topic-arn $AUTO_INSURANCE_TOPIC_ARN --protocol sqs --notification-endpoint $AUTO_INSURANCE_QUEUE_ARN --region $REGION

Step 2 – Setting the archive policy on the SNS FIFO topic

Modify the attributes of the SNS FIFO topic to set a retention period. This determines how long a message is retained in the topic archive. This example uses 30 days.

# Set a 30-day retention period for the SNS FIFO topic

aws sns set-topic-attributes --region $REGION --topic-arn $AUTO_INSURANCE_TOPIC_ARN --attribute-name ArchivePolicy --attribute-value "{\"MessageRetentionPeriod\":\"30\"}"

Step 3- Publishing auto insurance claim details

Publish a sample claim to the SNS FIFO topic. This step mimics a real-world scenario where an insurance claim must be processed by subscribers of the topic.

# Get the current timestamp and publish a sample insurance claim
TIMESTAMP_START=$(date -u +%FT%T.000Z)
aws sns publish --region $REGION --topic-arn $AUTO_INSURANCE_TOPIC_ARN --message "{ \"claim_type\": \"collision\", \"registration\": \"AB123CDE\" }" --message-group-id "group1"

Step 4 – Reading auto insurance claim details

Retrieve the insurance claim details from the primary SQS FIFO queue. This simulates a process reading the insurance claim to take action. After reading the message, the claim is deleted from the queue to avoid reprocessing.

# Fetch the claim details from the primary queue, then delete to avoid redundancy
MESSAGE=$(aws sqs receive-message --region $REGION --queue-url $AUTO_INSURANCE_QUEUE_URL --output json)
MESSAGE_TEXT=$(echo "$MESSAGE" | jq -r '.Messages[0].Body')
MESSAGE_RECEIPT=$(echo "$MESSAGE" | jq -r '.Messages[0].ReceiptHandle')
aws sqs delete-message --region $REGION --queue-url $AUTO_INSURANCE_QUEUE_URL --receipt-handle $MESSAGE_RECEIPT
echo "Received claim details: ${MESSAGE_TEXT}"

Step 5 – Subscribing the replay SQS queue to the SNS FIFO topic

To ensure no claims are lost, configure a replay policy for your SQS FIFO queue subscription. This policy sets the schedule from which messages are replayed to the SQS FIFO queue. Here, you subscribe a replay queue with a replay policy and then monitor the status of the replay queue. Once complete, read the replayed claim details from the secondary SQS FIFO queue. If any processing issues occurred initially, there is a second chance to process the claim.

Subscribe replay queue to SNS FIFO topic:

# Subscribe the replay queue to the topic and define its replay policy
NEW_SUBSCRIPTION_ARN=$(aws sns subscribe --region $REGION --topic-arn $AUTO_INSURANCE_TOPIC_ARN --protocol sqs --return-subscription-arn --notification-endpoint $AUTO_INSURANCE_REPLAY_QUEUE_ARN --attributes "{\"ReplayPolicy\":\"{\\\"PointType\\\":\\\"Timestamp\\\",\\\"StartingPoint\\\":\\\"$TIMESTAMP_START\\\"}\"}" --output json | jq -r '.SubscriptionArn')

To monitor the replay status:

# Wait for the replay to complete
while [[ $(aws sns get-subscription-attributes --region $REGION --subscription-arn $NEW_SUBSCRIPTION_ARN --output text | awk 'END{print $9}') != 'Completed' ]]; do printf "."; sleep 5; done; echo "Replay complete";

To read the replayed message and delete the message from the queue:

# Fetch the replayed message and then remove it from the queue
REPLAYED_MESSAGE=$(aws sqs receive-message --region $REGION --queue-url $AUTO_INSURANCE_REPLAY_QUEUE_URL --output json)
REPLAYED_MESSAGE_TEXT=$(echo "$REPLAYED_MESSAGE" | jq -r '.Messages[0].Body')
REPLAYED_MESSAGE_RECEIPT=$(echo "$REPLAYED_MESSAGE" | jq -r '.Messages[0].ReceiptHandle')
aws sqs delete-message --region $REGION --queue-url $AUTO_INSURANCE_REPLAY_QUEUE_URL --receipt-handle $REPLAYED_MESSAGE_RECEIPT
echo "Received replayed claim details: ${REPLAYED_MESSAGE_TEXT}"

Cleaning up

To avoid incurring unnecessary costs, clean up the resources created in this walkthrough:

# Delete the primary SQS FIFO queue
aws sqs delete-queue --queue-url $AUTO_INSURANCE_QUEUE_URL --region $REGION

# Delete the replay SQS FIFO queue
aws sqs delete-queue --queue-url $AUTO_INSURANCE_REPLAY_QUEUE_URL --region $REGION

# Unset the 'ArchivePolicy' attribute
aws sns set-topic-attributes --region $REGION --topic-arn $AUTO_INSURANCE_TOPIC_ARN --attribute-name ArchivePolicy --attribute-value "{}"

# Delete the SNS FIFO topic
aws sns delete-topic --topic-arn $AUTO_INSURANCE_TOPIC_ARN --region $REGION

Conclusion

The new SNS FIFO archive and replay feature provides a substantial foundation for event-driven applications, emphasizing failure recovery and application state replication. These features allow developers to efficiently manage and recover from disruptions, and ensure state replication across different application instances or environments.

Get started with this new SNS FIFO capability by using the AWS Management Console, AWS CLI, AWS Software Development Kit (SDK), or AWS CloudFormation. For information on cost, see SNS pricing and SQS pricing.

For more serverless learning resources, visit Serverless Land.