    S4 Weights - Lossless GPU Checkpoint Compression for PyTorch Training

    Deployed on AWS
    This product has charges associated with it for the S4 Weights software, in addition to the underlying AWS infrastructure costs. S4 Weights is a lossless GPU compression codec for PyTorch training checkpoints (weights and optimizer state). It splits each tensor into byte planes and routes the exponent plane through ANS and the mantissa through GDeflate on the GPU, so a checkpoint takes less storage in your Amazon S3 bucket, fewer bytes to transfer, and shorter save/load stalls, while staying bit-exact (lossless) for bf16, fp16, and fp32. It is billed per instance-hour with an annual option; at boot the AMI calls the AWS Marketplace Metering Service RegisterUsage API as a fail-closed entitlement check.

    Overview

    S4 Weights is a transparent, lossless compression codec for the checkpoints a PyTorch training run writes - the model weights and the optimizer state. It is delivered as a GPU AMI you launch on your own Amazon EC2 instances and run inside your own VPC; your checkpoints are written to your own Amazon S3 bucket and never leave your account.

    How it works: each tensor is split on the GPU into byte planes (sign / exponent / mantissa), and each plane is routed to the codec that suits it: the exponent plane goes through ANS entropy coding, the mantissa through GDeflate, and the sign plane is bit-packed. The codec is built on NVIDIA nvCOMP. For runs that checkpoint frequently, S4 Weights also stores the byte-XOR delta between consecutive checkpoints and compresses that; the delta is much smaller when most weights barely move between saves.
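
    The plane split and the XOR delta can be illustrated on the CPU. The following is a minimal sketch, not the product's implementation: it assumes bf16 values held as raw 16-bit patterns in NumPy, uses zlib as a stand-in for both ANS and GDeflate (the real codec runs on the GPU via nvCOMP), and all function names are hypothetical.

```python
import zlib

import numpy as np


def to_bf16_bits(x: np.ndarray) -> np.ndarray:
    """Truncate float32 values to bf16 and return the raw 16-bit patterns."""
    return (x.astype(np.float32).view(np.uint32) >> 16).astype(np.uint16)


def split_planes(bits: np.ndarray):
    """Split bf16 bit patterns into sign (1 bit), exponent (8), mantissa (7)."""
    sign = (bits >> 15).astype(np.uint8)
    exponent = ((bits >> 7) & 0xFF).astype(np.uint8)
    mantissa = (bits & 0x7F).astype(np.uint8)
    return sign, exponent, mantissa


def compress(bits: np.ndarray) -> dict:
    """Compress each plane separately (zlib is a CPU stand-in for ANS/GDeflate)."""
    sign, exponent, mantissa = split_planes(bits)
    return {
        "count": bits.size,
        "sign": np.packbits(sign).tobytes(),  # bit-packed, 8 signs per byte
        "exponent": zlib.compress(exponent.tobytes()),
        "mantissa": zlib.compress(mantissa.tobytes()),
    }


def decompress(blob: dict) -> np.ndarray:
    """Rebuild the original 16-bit patterns from the three planes."""
    n = blob["count"]
    sign = np.unpackbits(np.frombuffer(blob["sign"], dtype=np.uint8))[:n]
    exponent = np.frombuffer(zlib.decompress(blob["exponent"]), dtype=np.uint8)
    mantissa = np.frombuffer(zlib.decompress(blob["mantissa"]), dtype=np.uint8)
    return (
        (sign.astype(np.uint16) << 15)
        | (exponent.astype(np.uint16) << 7)
        | mantissa.astype(np.uint16)
    )


# Two consecutive "checkpoints" in which only 16 of 4096 weights changed.
rng = np.random.default_rng(0)
step_a = to_bf16_bits(rng.standard_normal(4096))
step_b = step_a.copy()
step_b[:16] ^= 1

# Base checkpoint: plane-split and compress. Next checkpoint: store the XOR delta.
base_blob = compress(step_a)
delta = np.bitwise_xor(step_a, step_b)
delta_blob = zlib.compress(delta.tobytes())

# Restore: decode the base, then XOR the decoded delta back in.
restored_a = decompress(base_blob)
restored_delta = np.frombuffer(zlib.decompress(delta_blob), dtype=np.uint16)
restored_b = np.bitwise_xor(restored_a, restored_delta)
```

    Because the delta is zero wherever a weight did not change, a general-purpose compressor sees long runs of zero bytes, which is the effect the delta store relies on.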

    Losslessness is the core guarantee: a restored checkpoint is byte-for-byte identical to the original, verified against adversarial bit patterns (NaN, +/-Inf, denormal, -0.0) on every supported dtype. The AMI build itself fails if a compress -> decompress round trip is not bit-exact on the build GPU, so a broken plane reassembly never reaches a customer image.
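
    A bit-exactness check has to compare bytes rather than floating-point values, because NaN never compares equal to itself and -0.0 compares equal to +0.0. The sketch below shows the idea for fp32 and fp16; it is an illustration only, with zlib standing in for the GPU codec and hypothetical helper names.

```python
import zlib

import numpy as np

# Raw bit patterns for the adversarial cases, per dtype.
ADVERSARIAL_BITS = {
    np.float32: np.array(
        [
            0x7FC00000,  # quiet NaN
            0x7FA00001,  # NaN with a non-default payload
            0x7F800000,  # +Inf
            0xFF800000,  # -Inf
            0x00000001,  # smallest positive denormal
            0x80000000,  # -0.0
            0x00000000,  # +0.0
        ],
        dtype=np.uint32,
    ),
    np.float16: np.array(
        [0x7E00, 0x7C00, 0xFC00, 0x0001, 0x8000, 0x0000],  # NaN, +/-Inf, denormal, -0.0, +0.0
        dtype=np.uint16,
    ),
}


def roundtrip(tensor: np.ndarray) -> np.ndarray:
    """Compress and decompress the raw bytes (zlib stands in for the GPU codec)."""
    restored = zlib.decompress(zlib.compress(tensor.tobytes()))
    return np.frombuffer(restored, dtype=tensor.dtype).reshape(tensor.shape)


def is_bit_exact(a: np.ndarray, b: np.ndarray) -> bool:
    """Compare dtype, shape, and raw bytes; never compare float values."""
    return a.dtype == b.dtype and a.shape == b.shape and a.tobytes() == b.tobytes()


results = {}
for dtype, bits in ADVERSARIAL_BITS.items():
    values = bits.view(dtype)  # reinterpret the bits as floats, no conversion
    results[np.dtype(dtype).name] = is_bit_exact(values, roundtrip(values))
```

    A value-level comparison such as `np.array_equal` would report a mismatch on these inputs even after a perfect round trip, since the NaN entries are unequal to themselves.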

    Savings depend on the data: all-bf16 checkpoints and checkpoints with low-precision optimizer state compress well (and better at scale), while fp32-heavy checkpoints saved far apart compress little; the documentation describes where the product helps and where it does not. Compression is always lossless and never expands a blob by more than a small fixed header.
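
    One common way to guarantee that output never grows by more than a fixed header is to fall back to storing the raw bytes whenever compression does not help. The listing does not say how S4 Weights achieves its bound, so the sketch below is an assumption: the 9-byte header layout, the mode flag, and the function names are all hypothetical, and zlib stands in for the GPU codec.

```python
import os
import struct
import zlib

# Hypothetical header: 1-byte mode flag + 8-byte raw length (little-endian, 9 bytes).
HEADER = struct.Struct("<BQ")
MODE_RAW, MODE_COMPRESSED = 0, 1


def pack(raw: bytes) -> bytes:
    """Compress, but store the raw bytes if compression would not shrink them."""
    compressed = zlib.compress(raw)
    if len(compressed) < len(raw):
        return HEADER.pack(MODE_COMPRESSED, len(raw)) + compressed
    return HEADER.pack(MODE_RAW, len(raw)) + raw


def unpack(blob: bytes) -> bytes:
    """Invert pack() and check the recorded length."""
    mode, raw_len = HEADER.unpack_from(blob)
    payload = blob[HEADER.size:]
    raw = zlib.decompress(payload) if mode == MODE_COMPRESSED else payload
    if len(raw) != raw_len:
        raise ValueError("length mismatch: blob is corrupt")
    return raw


incompressible = os.urandom(4096)  # random bytes: compression cannot help
compressible = bytes(4096)         # all zeros: compresses very well
packed_random = pack(incompressible)
packed_zeros = pack(compressible)
```

    With this layout the worst case is the raw payload plus `HEADER.size` bytes, regardless of how poorly the input compresses.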

    Billing: the AMI is a paid product, priced hourly per instance type with an annual option. AWS meters the running instance-hours automatically; the runner calls RegisterUsage once at boot as a fail-closed entitlement check (an unentitled instance refuses to start) and does not meter per use by default. The runner runs as a non-root service and persists compressed checkpoints to the S3 registry bucket you configure. Support: aws-support@abyo.net.
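
    A fail-closed entitlement check means that any failure of the RegisterUsage call, including a network error, is treated as "not entitled". The sketch below shows that pattern with the boto3 `meteringmarketplace` client; it is not the runner's actual code, and the function names and the placeholder product code are assumptions.

```python
import sys
import uuid


def check_entitlement(client, product_code: str, public_key_version: int = 1) -> bool:
    """Return True only if RegisterUsage succeeds; every error means 'not entitled'."""
    try:
        client.register_usage(
            ProductCode=product_code,
            PublicKeyVersion=public_key_version,
            Nonce=str(uuid.uuid4()),  # unique per call
        )
    except Exception as exc:  # fail closed: no error is treated as a pass
        print(f"entitlement check failed: {exc}", file=sys.stderr)
        return False
    return True


def main() -> None:
    """Boot-time entry point: exit non-zero if the instance is not entitled."""
    import boto3  # imported here so the check itself has no hard dependency

    client = boto3.client("meteringmarketplace")
    # "EXAMPLE_PRODUCT_CODE" is a placeholder, not the real product code.
    if not check_entitlement(client, product_code="EXAMPLE_PRODUCT_CODE"):
        sys.exit(1)
```

    Calling `main()` from a service's startup path and letting a non-zero exit stop the service gives the "refuses to start" behavior described above.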

    Highlights

    • Bit-exact lossless: restore is byte-for-byte identical for bf16/fp16/fp32 weights and fp32 optimizer state, verified against adversarial NaN / +/-Inf / denormal / -0.0 inputs. The AMI build fails unless a GPU compress->decompress round-trip is bit-exact, so a broken codec never ships.
    • GPU byte-plane codec built on NVIDIA nvCOMP: exponent plane -> ANS, mantissa -> GDeflate, sign bit-packed; plus byte-XOR delta between consecutive checkpoints for extra savings on frequent saves. Compression never expands a blob beyond a small fixed header.
    • Drop-in for PyTorch training: transparent save/load plus a base->delta checkpoint store, written to your own Amazon S3 bucket (you keep your data). Paid hourly + annual per instance type; RegisterUsage entitlement only; runs as a non-root service inside your VPC.

    Details

    Delivery method

    Delivery option
    64-bit (x86) Amazon Machine Image (AMI)

    Latest version

    Operating system
    Ubuntu 22.04

    Pricing

    Pricing is based on actual usage, with charges varying according to how much you consume. Subscriptions have no end date and may be canceled at any time. Alternatively, you can pay upfront for a contract, which typically covers your anticipated usage for the contract duration. Any usage beyond the contract incurs additional usage-based costs.
    Additional AWS infrastructure costs may apply. Use the AWS Pricing Calculator to estimate your infrastructure costs.

    Usage costs (16)

    Dimension                Description                                                    Cost/hour
    g6.xlarge (Recommended)  S4 Weights hourly software fee for instance type g6.xlarge     $0.16
    g6.16xlarge              S4 Weights hourly software fee for instance type g6.16xlarge   $0.16
    g6.48xlarge              S4 Weights hourly software fee for instance type g6.48xlarge   $1.28
    g6.24xlarge              S4 Weights hourly software fee for instance type g6.24xlarge   $0.64
    g6e.8xlarge              S4 Weights hourly software fee for instance type g6e.8xlarge   $0.16
    g6.12xlarge              S4 Weights hourly software fee for instance type g6.12xlarge   $0.64
    g6e.16xlarge             S4 Weights hourly software fee for instance type g6e.16xlarge  $0.16
    g6.2xlarge               S4 Weights hourly software fee for instance type g6.2xlarge    $0.16
    g6e.4xlarge              S4 Weights hourly software fee for instance type g6e.4xlarge   $0.16
    g6e.48xlarge             S4 Weights hourly software fee for instance type g6e.48xlarge  $1.28
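
    The hourly software fee scales linearly with running hours and instance count. The sketch below is a small estimator using the rates listed above; it covers the S4 Weights software fee only, not the EC2 or S3 infrastructure cost, and the 730-hour month is an assumed average.

```python
from decimal import Decimal

# Hourly software fees (USD) for the instance types listed above.
HOURLY_FEE_USD = {
    "g6.xlarge": Decimal("0.16"),
    "g6.2xlarge": Decimal("0.16"),
    "g6.12xlarge": Decimal("0.64"),
    "g6.16xlarge": Decimal("0.16"),
    "g6.24xlarge": Decimal("0.64"),
    "g6.48xlarge": Decimal("1.28"),
    "g6e.4xlarge": Decimal("0.16"),
    "g6e.8xlarge": Decimal("0.16"),
    "g6e.16xlarge": Decimal("0.16"),
    "g6e.48xlarge": Decimal("1.28"),
}


def software_fee(instance_type: str, hours, instances: int = 1) -> Decimal:
    """Software fee in USD = hourly rate x hours x number of instances."""
    return HOURLY_FEE_USD[instance_type] * Decimal(str(hours)) * instances


# One g6.xlarge running for an assumed 730-hour month.
monthly_single = software_fee("g6.xlarge", 730)
# Eight g6.48xlarge instances for a 24-hour training run.
one_day_cluster = software_fee("g6.48xlarge", 24, instances=8)
```

    Under these assumptions the first case comes to $116.80 and the second to $245.76 in software fees.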

    Vendor refund policy

    Refund requests are handled by email to aws-support@abyo.net. Refunds are reviewed case by case in line with the AWS Marketplace refund process.

    Legal

    Vendor terms and conditions

    Upon subscribing to this product, you must acknowledge and agree to the terms and conditions outlined in the vendor's End User License Agreement (EULA).

    Content disclaimer

    Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

    Usage information

    Delivery details

    64-bit (x86) Amazon Machine Image (AMI)

    Amazon Machine Image (AMI)

    An AMI is a virtual image that provides the information required to launch an instance. Amazon EC2 (Elastic Compute Cloud) instances are virtual servers on which you can run your applications and workloads, offering varying combinations of CPU, memory, storage, and networking resources. You can launch as many instances from as many different AMIs as you need.

    Version release notes

    Initial release: GPU lossless compression codec for PyTorch training checkpoints (weights + optimizer state). Byte-plane split with exponent->ANS / mantissa->GDeflate (nvCOMP) + byte-XOR delta between checkpoints; bit-exact for bf16/fp16/fp32, verified against adversarial NaN/Inf/denormal/-0.0. Compressed checkpoints persist to your own S3 bucket. Paid hourly + annual per instance type; RegisterUsage entitlement (fail-closed) at boot.

    Additional details

    Usage instructions

    Launch the AMI on a g6 or g6e GPU instance in your own VPC (the provided CloudFormation template deploy/cfn-train-runner.yaml wires it end-to-end). Set S4WEIGHTS_REGISTRY_S3_BUCKET to your S3 checkpoint bucket and grant the instance role access to that bucket plus aws-marketplace:RegisterUsage. The runner starts as a non-root systemd service, calls Marketplace RegisterUsage at boot (fail-closed entitlement), and serves a health endpoint on TCP 8080. Your PyTorch training code writes checkpoints with s4weights.save / s4weights.load (or the delta-chain save_checkpoint / load_checkpoint), which compress each tensor on the GPU and persist bit-exact compressed checkpoints to your S3 registry. SSH (port 22) is for administration only.
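
    The instance role needs two kinds of permission: access to the checkpoint bucket and the aws-marketplace:RegisterUsage action. The sketch below builds one possible IAM policy document; the listing only says "access to that bucket", so the specific S3 actions chosen here (object read/write plus bucket listing) are an assumption, as are the function name and the example bucket name.

```python
import json


def instance_role_policy(bucket: str) -> dict:
    """Build an IAM policy document for the runner's instance role (illustrative)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Assumed minimum for reading and writing checkpoint objects.
                "Sid": "CheckpointObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                # Assumed: listing is needed to discover existing checkpoints.
                "Sid": "ListCheckpointBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Required for the boot-time entitlement check.
                "Sid": "MarketplaceEntitlement",
                "Effect": "Allow",
                "Action": "aws-marketplace:RegisterUsage",
                "Resource": "*",
            },
        ],
    }


policy = instance_role_policy("my-checkpoint-bucket")
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

    The bucket passed here would be the same one set in S4WEIGHTS_REGISTRY_S3_BUCKET; the provided CloudFormation template may already create an equivalent role, in which case no separate policy is needed.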

    Support

    Vendor support

    Email support at aws-support@abyo.net.

    AWS infrastructure support

    AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.
