S4 Firewall - LLM Token Budget and Runaway Loop Control for Amazon VPC

S4 Firewall is an in-VPC forwarding proxy that puts a pre-emptive spend firewall in front of your LLM traffic. Point your application's base_url at it (OpenAI-compatible, Anthropic Messages-compatible, or Bedrock-compatible); every request runs a synchronous attribute -> reserve -> budget/anomaly -> forward -> reconcile pipeline. Its headline job is the runaway-loop circuit breaker: a deterministic hard cap that blocks a request before it is relayed once a tenant, feature, or customer budget would be exceeded, plus best-effort detection of agent loops and near-duplicate call chains to contain a runaway agent before it burns the budget. Token spend is attributed per feature/tenant/customer and emitted to CloudWatch and an optional counts-only audit ledger. Ships as an Amazon Linux 2023 arm64 AMI with a least-privilege IAM role and one-click CloudFormation - no control plane, no database. Billed per instance per hour, with an annual option.

View purchase options

Try for free

Overview

Try agent mode

Create proposal

Ask question

S4 Firewall is a forwarding proxy you run inside your own Amazon VPC to put a budget and a circuit breaker in front of your LLM token spend. Your application changes only its base_url: S4 Firewall exposes an OpenAI-compatible, an Anthropic Messages-compatible, and a Bedrock-compatible intake, relays each request to the upstream provider your application already uses, and returns the upstream response (including streaming responses, passed through chunk by chunk without buffering so time-to-first-token is preserved).

Headline capability - the runaway-loop circuit breaker. Every request runs through a synchronous in-memory pipeline before it is relayed: attribute the request to a feature, tenant, and customer; reserve the worst-case cost (input tokens counted now, output priced at max_tokens times the output rate); check the reservation against the hierarchy of budgets; then either forward or block. There are two layers, kept honestly distinct. Layer 1, the hard cap, is deterministic and pre-emptive: cumulative spend is known, so any request whose reservation would push the running total past a configured hard cap is blocked before it is relayed - a 100 percent pre-emptive block of over-cap requests, with the same state in producing the same decision out, fixed by chaos tests. Layer 2, the loop block, is best-effort and behavioral: a runaway is only knowable after a few calls (those few are already billed), so this layer detects agent loops, near-duplicate call chains, and in-session amplification and bounds the blast radius - containing the runaway within a small number of requests or a small dollar amount. Layer 2 is explicitly best-effort, not a 100 percent guarantee, and ships with a conservative default plus a dry-run shadow mode so you can measure the false-block rate before you enforce.

Honest budgeting under output uncertainty. The number of output tokens is unknowable until the response returns, so stopping before the bill is incurred is reserve-then-reconcile, not a flat estimate: the reservation uses the worst case to make the hard-cap decision, and when the response returns the provider's reported usage is taken as the source of truth and reconciled against the reservation. Token counts are normalized across providers and split into input, output, cached-read, and cache-write so the per-feature accounting reflects each provider's real rate card.

Attribution and metering. Tag requests by header or API key to attribute token spend to a feature, tenant, or customer - finer than IAM-principal granularity. Spend rolls up by dimension and is emitted to Amazon CloudWatch (namespace S4/Firewall) and, optionally, to a counts-only append-only S3 audit ledger that records token counts, quantities, and attribution metadata - never prompt or response bodies.

Data handling - the honest version. S4 Firewall itself does not persist or transmit your prompts or responses. Its only outbound call is the provider request your application would have made anyway; the firewall does not add an egress. The ledger and metrics carry token counts, not content (counts-not-content), fixed by property tests. Where prompts egress depends on the upstream you choose: when you route to Amazon Bedrock through a VPC interface endpoint (PrivateLink, which this AMI can provision), the Bedrock calls stay inside your VPC/AWS boundary; when you route to a third-party provider on the public internet, that traffic egresses to the internet and does not stay in your VPC. The value S4 Firewall provides is that the firewall does not hoard your prompts, does not send them to any third party of its own, and writes no bodies to the ledger - not that your prompts never leave the VPC.

Operations. No separate control plane and no external database: budget state is held in-memory per instance and re-derived from zero on restart. The data plane is a single static binary running under a hardened systemd unit with zero elevated capabilities and a least-privilege IAM role (upstream model invocation, CloudWatch PutMetricData scoped to the S4/Firewall namespace, and write-only PutObject to the ledger bucket). No telemetry home-call and no license-key check - billing is AMI hourly + annual. Deploy in minutes with the included CloudFormation template, which optionally creates the Bedrock VPC interface endpoint.

S4 Metrics bundle. S4 Firewall is sold standalone and is also offered as a bundle SKU with S4 Metrics (observability) as a separate Marketplace offer - observe plus enforce in one bill. The bundle is a distinct offer entity; this listing's pricing covers the standalone S4 Firewall product.

There is no lock-in - it is a normal Amazon Linux 2023 AMI you run in your own VPC, billed per instance per hour through your AWS bill, with an annual contract option.

Highlights

Runaway-loop circuit breaker, pre-emptive: a deterministic hard cap blocks any request before it is relayed once a feature/tenant/customer budget would be exceeded (100 percent pre-emptive block of over-cap requests), plus best-effort detection of agent loops and near-duplicate call chains that bounds the blast radius of a runaway agent before it burns the month's budget. (Layer 2 is best-effort, not a 100 percent guarantee; ships with a dry-run shadow mode.)
Per-feature/tenant/customer token attribution and budgets, in your VPC: a forwarding proxy you point your base_url at (OpenAI-compatible, Anthropic Messages-compatible, Bedrock-compatible). Reserve-then-reconcile accounting uses the provider's reported usage as the source of truth and splits input/output/cached/cache-write at each provider's real rate, emitted to Amazon CloudWatch and an optional counts-only audit ledger.
No prompts hoarded, no separate control plane: S4 Firewall does not persist or transmit your prompts - the only egress is the provider call your app already makes, and the ledger carries token counts, not content. Optional Amazon Bedrock VPC interface endpoint keeps the Bedrock path inside your AWS boundary. Runs as a single static binary on a standard Amazon Linux 2023 AMI with a least-privilege IAM role and one-click CloudFormation - no external database, no telemetry home-call.

Details

Sold by

abyo software

Introducing multi-product solutions

You can now purchase comprehensive solutions tailored to use cases and industries.

Learn more

Explore multi-product solutions

Features and programs

Financing for AWS Marketplace purchases

AWS Marketplace now accepts line of credit payments through the PNC Vendor Finance program. This program is available to select AWS customers in the US, excluding NV, NC, ND, TN, & VT.

View financing details

Pricing

Free trial

Try for free

Try this product free for 14 days according to the free trial terms set by the vendor. Usage-based pricing is in effect for usage beyond the free trial terms. Your free trial gets automatically converted to a paid subscription when the trial ends, but may be canceled any time before that.

S4 Firewall - LLM Token Budget and Runaway Loop Control for Amazon VPC

Info

View purchase options

Pricing is based on actual usage, with charges varying according to how much you consume. Subscriptions have no end date and may be canceled any time. Alternatively, you can pay upfront for a contract, which typically covers your anticipated usage for the contract duration. Any usage beyond contract will incur additional usage-based costs.

Additional AWS infrastructure costs may apply. Use the AWS Pricing Calculator to estimate your infrastructure costs.

Usage costs (18)

Info

Dimension	Cost/hour
c7g.large Recommended	$0.12
c6g.medium	$0.12
c6g.large	$0.12
c6g.xlarge	$0.12
c6g.2xlarge	$0.12
c6g.4xlarge	$0.12
c6g.8xlarge	$0.12
c7g.medium	$0.12
c7g.xlarge	$0.12
c7g.2xlarge	$0.12

Vendor refund policy

Email aws-support@abyo.net within 30 days of charge for refund requests; refunds are evaluated case by case.

How can we make this page better?

Tell us how we can improve this page, or report an issue with this product.

Legal

Vendor terms and conditions

Upon subscribing to this product, you must acknowledge and agree to the terms and conditions outlined in the vendor's End User License Agreement (EULA) .

Content disclaimer

Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

Usage information

Info

Delivery method

Version

Delivery details

64-bit (Arm) Amazon Machine Image (AMI)

Amazon Machine Image (AMI)

An AMI is a virtual image that provides the information required to launch an instance. Amazon EC2 (Elastic Compute Cloud) instances are virtual servers on which you can run your applications and workloads, offering varying combinations of CPU, memory, storage, and networking resources. You can launch as many instances from as many different AMIs as you need.

Version release notes

Adds a CloudFormation Quick Launch delivery option. Software identical to version 1.0.0.

Additional details

Usage instructions

Deploy via the included CloudFormation (cfn-single.yaml for a single instance, or cfn-ha.yaml for a redundant fleet behind an internal load balancer); point your application base_url at the firewall. See the runbook and docs on the AMI.

Support

Vendor support

Email support at aws-support@abyo.net .

AWS infrastructure support

AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.

Get support

Similar products

S4 - Squished S3: CPU S3 Compression Gateway (EC2 AMI)

By abyo software

Self-contained EC2 AMI of the S4 transparent S3 compression gateway with CPU codecs (zstd / gzip) preinstalled. Launch on any general-purpose or compute-optimized instance (t3 / m6i / m7i / c6i / c7i), point your S3 clients at it, and cut S3 storage bytes 50-80 percent for compressible data with zero application changes.

View product

S4 - Squished S3: GPU S3 Compression Gateway (EC2 AMI)

By abyo software

Self-contained EC2 AMI of the S4 transparent S3 compression gateway with NVIDIA nvCOMP GPU codecs preinstalled. Launch on a GPU instance (g4dn / g5 / g6), point your S3 clients at it, and cut S3 storage bytes 50-80 percent for compressible data with zero application changes.

View product

S4 - Squished S3: Transparent S3 Compression Gateway

By abyo software

Drop-in S3-compatible gateway that transparently compresses every object (CPU zstd or GPU nvCOMP), cutting S3 storage bytes 50-80 percent for compressible data with zero application changes. Includes pre-deployment savings estimation and measured-savings reporting.

View product

S4 - Squished S3: Compression Gateway (Metered Savings)

By abyo software

Drop-in S3-compatible gateway that transparently compresses every object, cutting S3 storage bytes 50-80 percent for compressible data with zero application changes. This edition bills by measured savings: you pay per GB of backend storage avoided, per hour, at roughly one third of the avoided storage cost.

View product

S4 Metrics Commercial

By abyo software

Cut your CloudWatch custom-metric bill: govern metric cardinality at ingest, then auto-baseline and roll up savings across your whole AWS Organization.

View product

Customer reviews

Leave a review

Ratings and reviews

Info

0 ratings

5 star

4 star

3 star

2 star

1 star

0 reviews

No customer reviews yet

Be the first to review this product . We've partnered with PeerSpot to gather customer feedback. You can share your experience by writing or recording a review, or scheduling a call with a PeerSpot analyst.