
Letting Nature Lead: How Sakana AI is Transforming Model Building


The explosion of generative artificial intelligence (AI) has created an astronomical pace of change. Companies are now hyper-focused on bringing higher-performing models to life, with vast numbers of new and improved large language models (LLMs) emerging every day. The tried-and-tested Transformer model has been at the heart of generative AI’s boom, empowering founders to rapidly scale and release new LLMs.

Yet these upgrades often come at a cost, demanding more processing power and resources with each new version. Meanwhile, older LLM versions can quickly become overshadowed by bigger, compute-hungry models. Against a backdrop of global GPU shortages, which set the upper bound on the practical scaling of model training, co-founders David Ha and Llion Jones were curious to find a more efficient way to push the frontiers of AI. They set out on a research journey to explore creative techniques for foundation model (FM) development inspired by a different source of power—the power of nature.

Sakana AI, their Tokyo-based startup, is now spearheading a new trend in AI model training by creating cutting-edge LLMs born from pre-existing ones. Since founding the company in 2023, they have already broken new ground by making the most of resources that are often overlooked. Using age-old ideas such as evolution and natural selection, the business is making leaps towards a future where FMs automatically inherit the strongest traits of their ancestors. Their vision? A training method where models constantly evolve and adapt to changing environments.

Embracing new generations of AI

In true entrepreneurial spirit, the startup isn’t just waiting for the next change in generative AI—they’re embracing the unknown to find what’s next. Jones, Chief Technology Officer at Sakana AI, explains why he and Ha left roles at major tech firms to found the startup: “David and I weren’t exploring the long-term speculative research we wanted to do, so we knew we had to start out on our own.” Noticing historical patterns in technology development, the co-founders saw an opportunity to make meaningful discoveries.

Jones adds: “The way I think about AI research is that it goes through exploration and exploitation phases. People try different approaches until they find something that works well—then everyone focuses on exploiting that technology. But while there’s lots of hype around how the Transformer model trains generative AI, it means we’re not exploring outside of that much.”

The Transformer model was a breakthrough in deep learning architecture in 2017 and has taken the world by storm ever since. Unlike the models that came before it, the Transformer can be trained on much larger datasets, applied to a wide variety of tasks, and develops a more accurate understanding of the text it reads and writes. But with much greater scalability comes the need for greater computation, so much so that hardware manufacturers have been unable to produce AI chips quickly enough to meet demand.

Sakana AI is exploring alternative, more sustainable model training methods. Takuya Akiba, Research Scientist at Sakana AI, explains: “Everyone is converging to similar goals when training models. Because of this, we’re not seeing much difference in the outcomes. At Sakana AI we’re creating a new paradigm inspired by nature. This is enabling us to find new applications that wouldn’t be possible by just scaling.”

Harnessing nature’s wisdom

‘Sakana’ is the Japanese word for fish, and the name alludes to the startup’s nature-inspired techniques and evolutionary influence. The logo aptly represents their pioneering methods: it shows a school of fish swimming in one direction while a red fish defiantly swims the opposite way. The graphic also captures the idea of collective intelligence that inspires their thinking—namely, the notion that many smaller models exchanging less information can work together more efficiently, and with fewer resources, than large, dense models passing around huge amounts of information.

With technology reaching an inflection point, Sakana AI is putting the idea of evolutionary computation to the test on FMs. Gradient descent is the well-established technique for training and optimizing models—but, like the Transformer model, it comes at a high computational cost. You’d be wrong to assume that resource efficiency is simply a necessity of their startup journey, though. The Sakana AI team see it as a strategic advantage that empowers them to think outside the box, maximize available resources, and nurture innovation. As Jones says, “I think that constraint means we can come up with some more interesting things.”

“Our philosophy is ‘learning always wins’. And to learn things, you can't just use the most popular algorithm. You must use different techniques like evolutionary computation to search these spaces,” he adds. With strategic technical support from AWS, Sakana AI has since planted ideas from nature into the technology sphere, and the team is already seeing the fruits of its labor.

Making waves with Evolutionary Model Merge

A key breakthrough to date is Sakana AI’s novel approach to model merging. The team observed that there’s a tremendous amount of value to gain from current models, yet hundreds of thousands of them sit unused or are discarded when they’re superseded by new versions. “There’s a very large ocean of unique, open-source LLMs already out there,” says Akiba.

By merging different models, rather than training from scratch, they can take the best qualities from each to create a new, more powerful model. Model merging isn’t a new notion in itself—people have long experimented with the art of ‘hacking’ models to create specialized LLMs—but what is new is how Sakana AI applies a nature-inspired algorithm to automate the process.

Just consider the process of natural selection. Species have evolved over time to pass on genes that help them to adapt and thrive in their environment. Meanwhile, traits that threaten species’ survival are eventually wiped out. Likewise, Sakana AI’s evolutionary algorithms can find the optimal combinations of different parts of FMs to produce new FMs that are naturally selected to perform well at a particular application. The new model inherits the winning traits of the previous models based on what the user has specified. It's a far cry from a Frankenstein-style approach of stitching together different model elements.
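To make the idea concrete, here is a minimal, hypothetical sketch of weight-space merging, in which two parent checkpoints with the same architecture are blended layer by layer according to a ‘recipe’ of mixing coefficients. The tiny NumPy ‘models’, the merge_models helper, and the example recipe are illustrative assumptions, not Sakana AI’s actual code, which operates on full foundation-model checkpoints and richer merge operations.

```python
# Illustrative sketch only: merge two "models" (dicts of weight arrays)
# layer by layer, with one mixing coefficient per layer in [0, 1].
import numpy as np

def merge_models(parent_a: dict, parent_b: dict, recipe: np.ndarray) -> dict:
    """Interpolate each layer: child = (1 - c) * parent_a + c * parent_b."""
    merged = {}
    for coeff, layer in zip(recipe, sorted(parent_a)):
        merged[layer] = (1.0 - coeff) * parent_a[layer] + coeff * parent_b[layer]
    return merged

# Two toy "parents" sharing the same architecture (three small weight matrices).
rng = np.random.default_rng(0)
layers = ["block_0", "block_1", "block_2"]
parent_a = {name: rng.normal(size=(4, 4)) for name in layers}
parent_b = {name: rng.normal(size=(4, 4)) for name in layers}

# One candidate recipe: lean on parent_a early, on parent_b late.
child = merge_models(parent_a, parent_b, np.array([0.1, 0.5, 0.9]))
print({name: weights.shape for name, weights in child.items()})
```

The open question is which recipe produces the best child model, and that is exactly the search space the evolutionary process explores.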

Previous model merging techniques relied on human experience, domain knowledge, and intuition—all of which have limits. “By evolving different ways of merging the algorithms, we end up with a better merged model than a human could design by hand,” Jones explains. “Anytime you can get a computer to search through a space of solutions for you, you win. That beats a human trying to do it manually because a computer can do it faster, try more things than you, and have more patience than you.”

Only the fittest FMs survive

The diversity of open models and generative AI tasks continues to surge, meaning Sakana AI’s much more systematic approach to model merging will only become more important. As Akiba says: “There are almost an infinite number of ways to combine different models—so we need these heuristic optimization models.” In their experiments, Sakana AI lets the evolution process run for a few hundred generations, with the highest-scoring models surviving to repopulate the next generation.
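As a rough illustration of that generational loop, the sketch below evolves a population of merge recipes with a simple survival-of-the-fittest cycle: score every candidate, keep the best, and repopulate by mutating the survivors. The score_recipe fitness function, population sizes, and mutation scheme are placeholders assumed for illustration; in a real run each candidate would be an actual merged model evaluated on benchmark tasks, and Sakana AI’s published method differs in its details.

```python
# Toy evolutionary loop: the highest-scoring merge recipes survive each
# generation and repopulate the next one through small random mutations.
# The fitness function is a stand-in for benchmarking a merged model.
import numpy as np

rng = np.random.default_rng(42)
NUM_LAYERS, POP_SIZE, SURVIVORS, GENERATIONS = 8, 32, 8, 300

def score_recipe(recipe: np.ndarray) -> float:
    """Placeholder fitness: pretend some unknown target recipe is ideal."""
    target = np.linspace(0.0, 1.0, NUM_LAYERS)  # purely illustrative
    return -float(np.sum((recipe - target) ** 2))

# Start from random recipes: one mixing coefficient per layer, in [0, 1].
population = rng.uniform(0.0, 1.0, size=(POP_SIZE, NUM_LAYERS))

for _ in range(GENERATIONS):
    scores = np.array([score_recipe(r) for r in population])
    survivors = population[np.argsort(scores)[-SURVIVORS:]]        # selection
    offspring = np.repeat(survivors, POP_SIZE // SURVIVORS, axis=0)
    offspring += rng.normal(0.0, 0.05, size=offspring.shape)       # mutation
    population = np.clip(offspring, 0.0, 1.0)

final_scores = np.array([score_recipe(r) for r in population])
print("best recipe found:", np.round(population[np.argmax(final_scores)], 2))
```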

The Evolutionary Model Merge approach has already proven it can evolve FMs in often unintuitive, but highly effective, ways. For example, while there’s a wealth of open-source models in Japan, none of them could previously handle mathematics because there’s no Japanese mathematics dataset. Instead of starting from scratch and training a new model, Sakana AI merged a model fluent in Japanese with an English model that is good at mathematics but doesn’t speak Japanese.

The result was a state-of-the-art LLM with both enhanced Japanese reasoning and strong mathematics capabilities—and it has performed exceptionally well against benchmarks in both areas. Manually combining these models would have been incredibly difficult, especially when handling such distinct domains. By automating the process, the startup can quickly transform existing FMs and bring their unique qualities to different cultures.

Sakana AI has discovered that evolutionary algorithms don’t just work for text LLMs: the team has successfully merged LLMs with Japanese vision-language models too. In fact, the resulting model improved accuracy on image-related questions and was even able to learn nuances and culturally specific knowledge about Japan. The team has also achieved promising results from applying the same method to different image-generation diffusion models.

The power to adapt and learn

Breaking new ground in generative AI requires specialist expertise combined with a robust technical foundation of flexible, cost-effective solutions. AWS provides Sakana AI with those solutions, in addition to strategic guidance and credits through the AWS Activate program. Access to funding has enabled them to experiment with their nature-inspired approach in the AWS Cloud without the barrier of upfront costs. Personalized technical support from the AWS Startups team has also empowered them to progress and publish results quickly.

Choosing the right Amazon EC2 instances is just one of the ways they are powering their research—renting instances with On-Demand or Capacity Blocks means they can stay agile and select the best ones at any moment. This approach to compute has also contributed to reduced costs and a much smaller memory footprint than would have been needed for gradient descent methods. Akiba commented: “AWS deeply understands our workload and what we’re trying to achieve. They have helped us quickly overcome challenges, such as capacity issues.”

Inspired by their ambition and intelligence, AWS has supported Sakana AI since day one. As Yoshitaka Haribara, Solutions Architect at AWS, says: “It’s a pleasure to work with such a talented team at the top of their game. We’re thrilled to see exciting results across their research and hope that AWS can continue to support their effort by offering resources, expertise, and creative thinking.”

Akiba noted how AWS’s partnership and services enabled the company to hit the ground running: “We are quite a small team, so we didn’t have a platform engineer to set up a cluster. It’s really easy to use AWS services which has made it simple to explore our research.”

Exploring new AI frontiers

While the generative AI space is hotly competitive and evolving at pace, Sakana AI’s research promises to accelerate progress even further. “Right now, there’s competition between proprietary models and open-source models, and many think proprietary models are leading the way. However, I believe our research can be a game-changer for accelerating open-source model development and unlocking new skills in the community,” says Akiba.

Sakana AI continues to avidly research how novel techniques can create faster innovation cycles. But as Jones points out, they’re not in this for quick rewards: “Our long-term, exploratory approach makes it much harder to see what the future is. But I'm very comfortable with that risk because it’s extremely exciting exploring fascinating topics.”

As Sakana AI builds momentum across multiple projects, they are examining how other AWS services can support proofs of concept, such as using Amazon Bedrock to scale their use of foundation models like Anthropic’s Claude. Beyond model merging techniques, the company is also researching how to evolve agent-based intelligent systems, and AWS is backing their vision in this exciting space.
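As a minimal sketch of what that could look like, the snippet below sends a single prompt to a Claude model through the Amazon Bedrock Converse API using boto3. The region, model ID, and prompt are assumptions chosen for illustration (model availability varies by account and AWS Region), and this is not a description of Sakana AI’s actual setup.

```python
# Illustrative sketch: sending one prompt to an Anthropic Claude model
# via the Amazon Bedrock Converse API. Region and model ID are examples only.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
    messages=[
        {"role": "user", "content": [{"text": "Explain model merging in one sentence."}]}
    ],
    inferenceConfig={"maxTokens": 200, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```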

Jones has high hopes based on technology’s current rate of advancement: “Since the amount of compute used to train models continues to double every six months, we could reach human-level intelligence if we keep improving the training algorithms and optimizing how we put them into an agent. If we’re then able to spin up 10,000 AI agents to solve a problem, it could be possible to do a couple of years’ worth of scientific research in a week.” From automating drug discovery to improving core operations in computer science, this research could solve some of the world’s most challenging problems.

Teaming up with partners like AWS has been crucial to Sakana AI’s journey—and it’s only the beginning of the long-term value they have yet to unlock. Their advice for other startups looking to expand generative AI’s potential? Jones would love to see other founders take advantage of their freedom by going deeper with technology: “Be ambitious with your ideas. Don’t race to the goldrush or push out a first version of an app for the sake of being the first—take the time to explore.”
