Learn
From YC to AWS: Tusk turns production traffic into AI-powered tests on AWS

From YC to AWS: Tusk turns production traffic into AI-powered tests on AWS

How was this content?

AI-generated code is rapidly reshaping software development. What once took days now takes hours, and what required teams can increasingly be done by individuals. The problem? More code is being generated than ever before. That means more pull requests, more edge cases, and more demand on engineering teams. Time-saved on writing is worth little if it’s simply swallowed up by increased requirements on quality assurance—a responsibility that increasingly sits with those building the software.

Tusk, a pioneering startup and Y Combinator (YC) alumnus, is helping businesses prevent bugs that would otherwise be missed by both coding agents and humans with AI-enabled tests based on real production traffic. By using high-performing foundation models (FMs) in Amazon Bedrock, Tusk automatically flags issues like unexpected regressions and API contract drift before code merge, allowing engineering teams to focus on higher-value work.

Software testing built on reality, not assumptions

Founded in 2023 by two UC Berkeley graduates, Tusk helps businesses ship quality code with AI-generated testing based on real user behavior. “Tusk turns your production traffic into realistic unit and API tests,” says Marcel Tan, CEO. “We do this by recording traces as users interact with your app in the real world, and we replay these traces against your code changes to find and prevent regressions.” This represents a significant shift in how businesses of all sizes can approach code testing in the AI era.

“If you look at all the top engineering teams right now, the people doing QA are typically the ones also building the feature,” says Tan. The reasoning behind this trend is sound. These teams have better context with which to approach testing as they are the ones actually updating and optimizing the code. However, as code volume surges, fixing bugs has become increasingly time-consuming. “In the past, QA would account for roughly half your release cycle. With coding agents today, we have top engineers spending 90 percent of their time on QA, which isn’t a good use of their time,” says Tan.

“Most tests written manually or with AI don’t actually reflect how users are interacting with your product in the real world,” says Tan. “Because we’re capturing real traffic, we provide coverage over the edge cases that would otherwise be missed.” That includes silent failures as a result of unintended semantic behavior. In these instances, an output appears valid but is functionally wrong. Tusk runs and iterates on the tests it generates, and by evaluating them against real production traffic, makes it easier to catch regressions that would be near-impossible to predict otherwise.

Incubating success from first pitch to product-market fit

Tusk began life as one of the first publicly available coding agents. “We wanted to build a coding agent that would allow product managers, software engineers, even non-technical people to go from a JIRA ticket all the way to a pull request,” says Tan. “We were arguably the first agent that was capable of doing that in a mature codebase.” After pitching this early version of its product, the company was accepted into YC W24 batch, which is where today’s Tusk started to take shape.

“The three months of YC is super intensive,” says Tan. “It’s basically a bootcamp and you’re not really thinking about anything aside from the startup.” For Tusk, one of the most valuable aspects of the YC experience was connecting with other founders, including a smaller, more curated group within the batch. These groups would meet regularly to discuss their goals and progress. “It’s really motivating because you can see how fast people can move in the span of three or four days. That sense of urgency gets baked into the startup—it gives you good DNA,” says Tan.

A lasting lesson from the incubator was the value of engaging directly with customers. “Instead of trying to intuit what our customers needed, we were encouraged to just ask them directly,” says Tan. “It sounds so obvious, right? Sometimes the simplest advice is the best advice.” In fact, it was after engaging with customers that the Tusk team began to rethink the direction of their business.

“Our customers then repeatedly pointed out that generating more pull requests was creating more work for their engineers,” says Tan. This, coupled with the growing availability of AI-powered coding companions, provided a clear signal of where the industry was heading. “Writing code was becoming a commodity,” says Tan. “We realized that in 18 months the bottleneck would be verifying that the code works.” As a result, the team shifted focus, reorienting the company around testing and laying the foundation for the product it offers today.

Freedom to focus on the customer, not cost

Shortly after coming out of YC, Tusk started collaborating with AWS. The company participated in AWS Activate, a dedicated program to support startups with technical expertise, go-to-market opportunities, and funding in the form of AWS Credits. “It’s been incredible,” says Sohil Kshirsagar, CTO. “The AWS team’s been very responsive, even when we were a lot smaller. On top of that, the amount of credits we’ve received has been really helpful. It’s essentially an investment that we’re getting for no equity.” This is particularly valuable for startups that rely on AI infrastructure.

“As a pre-AI startup, your cloud costs would be limited to things like hosting and storage, but today, large language models (LLMs) become your primary cost,” says Kshirsagar. “If we didn’t have those credits, every time we released something to the customer we would be thinking how much is this going to cost? Is this going to affect our runway? But now, we can actually just solve the problem and figure out how to optimize it after the fact.”

In addition to the cost savings, AWS Activate freed up the Tusk team to direct their attention to what matters most. “There are already so many things that we have to worry about every single day, you don’t really want your cloud usage or spend to be one of them,” says Kshirsagar. “Activate allows us to stay customer focused—what is the problem that they’re having, how can we best solve it—and not necessarily think about the cost implications down the road.”

Real-time observability meets scalable intelligence

Tusk uses a combination of AWS services for inference and monitoring. “Amazon Bedrock is our primary LLM inference solution,” says Kshirsagar. “One of the main benefits it gives us is scalable cross-region inference, which is important early on when you could go from one to ten customers in a couple of weeks and need to increase rate limits.”

The models Tusk uses in Amazon Bedrock drive semantic understanding and regression classification. “When Tusk look at differences in the outputs of an API response, it has to consider that you might be changing the structure of the API or slightly modifying the response,” says Kshirsagar. “We use reasoning models in Bedrock to determine whether that change is a regression, or an intended update based on the context of the pull request.”

Amazon Bedrock helps Tusk optimize model and token usage. “We often switch models depending on the complexity of the task,” says Kshirsagar. If a model change is needed, Amazon Bedrock makes that process easy—often as straightforward as updating the model ID.

Beyond the QA bottleneck, towards end-to-end assurance

As Tusk continues to grow and evolve, the customer-first mindset fostered during its time in YC remains central. “We’re seeing a lot of burnout among engineers,” says Tan. “We want to help them spend less time bogged down with testing and more time on the fun stuff, like designing solutions to complex problems or working on features that serve users.”

To realize that ambition, Tusk is deepening its collaboration with AWS use of Amazon Bedrock. “As we continue to push out new features and reach new customers, our Amazon Bedrock usage is likely to scale exponentially,” says Kshirsagar. “We’ve also spoken to AWS about potentially fine-tuning models or building and training our own models on AWS Trainium EC2 Instances.”

“We plan on becoming the all-in-one testing platform,” says Tan. “We will intelligently cover all major types of testing software companies rely on: unit, integration (API), and end-to-end testing. This would allow Tusk to function as a staff-level AI test engineer that anybody can hire—even a one-person startup—to QA any code change and pull request you create. That’s the ultimate vision.”

How was this content?