AWS and Anthropic ink deal to accelerate model development, enhance AI chips

The announcement that Amazon Web Services (AWS) will be Anthropic’s primary training partner confirms rumors of an even tighter partnership between the two companies.

They announced Friday that Anthropic will use AWS Trainium processors to train and deploy its Claude family of models. Further, as predicted earlier this month, Amazon will invest an additional $4 billion in the startup, making its total investment $8 billion.

AWS is already Anthropic’s primary cloud provider, and the OpenAI rival will now also primarily use Trainium and Inferentia chips to train and deploy its foundation models. Anthropic will also contribute to Trainium development in what the companies call a “hardware-software development approach.”

While it’s unclear whether the agreement requires Anthropic to use AWS chips exclusively, the deal is a move by Amazon to challenge Nvidia and other dominant players as the AI chip race accelerates.

“This is a first step in broadening the accessibility of generative AI and AI models,” Alvin Nguyen, Forrester senior analyst, told Computerworld.

Accelerating Claude development

Anthropic, which launched in 2021, has made significant progress with its Claude large language models (LLMs) this year as it takes on OpenAI. Its Claude 3 family comprises three LLMs: Sonnet, Haiku (the fastest and most compact), and Opus (for more complex tasks), all of which are available on Amazon Bedrock. The models have vision capabilities and a 200,000-token context window, meaning they can take in large volumes of data in a single prompt: roughly 150,000 words, or about 500 pages of material.
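The words-and-pages figures follow from simple rules of thumb. The sketch below reproduces the arithmetic, assuming roughly 0.75 English words per token and about 300 words per printed page; both ratios are illustrative assumptions, not Anthropic figures, and real values vary with the tokenizer and the text. The function name is hypothetical.

```python
# Rough conversion from a context window measured in tokens to words and pages.
# Assumptions (illustrative rules of thumb, not official figures):
#   ~0.75 English words per token, ~300 words per printed page.
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 300


def context_window_capacity(tokens: int) -> tuple[int, int]:
    """Return (approximate words, approximate pages) for a token budget."""
    words = round(tokens * WORDS_PER_TOKEN)
    pages = round(words / WORDS_PER_PAGE)
    return words, pages


words, pages = context_window_capacity(200_000)
print(f"{words:,} words, about {pages} pages")  # 150,000 words, about 500 pages
```

Changing either assumed ratio shifts the estimate proportionally, which is why such conversions are usually quoted as "roughly."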

Notably, last month Anthropic introduced “Computer Use” to Claude 3.5 Sonnet. In addition to its generative capabilities, the model can now use computers as people do: it can quickly move cursors, toggle between tabs, navigate websites, click buttons, type, and compile research documents. The company claims that Sonnet outperforms all other available models on agentic coding tasks.

Claude has seen rapid adoption since it was added to Amazon Bedrock, AWS’ fully managed service for building generative AI applications, in April 2023. According to AWS, it now supports “tens of thousands” of companies across numerous industries, which use the models to build chatbots, coding assistants, and complex business processes.

“This has been a year of breakout growth for Claude, and our collaboration with Amazon has been instrumental in bringing Claude’s capabilities to millions of end users on Amazon Bedrock,” Dario Amodei, co-founder and CEO of Anthropic, said in an announcement.

The expanded partnership is strategic for both sides. It signals that Anthropic’s models are performant and versatile, and that AWS’ infrastructure can handle intense generative AI workloads in a way that rivals Nvidia and other chip players.

From Anthropic’s point of view, the benefits are “guaranteed infrastructure, the ability to keep expanding models’ capabilities, and showcase them,” said Nguyen, noting that the deal also expands Anthropic’s footprint and access.

“It’s showing that they can work well with multiple others,” he said. “That increases comfort levels in their ability to get training done, to produce models, to get them utilized.”

AWS, meanwhile, gains “a premier client, one of the faces of AI” in Anthropic, said Nguyen.

From silicon through the full stack

As part of the expanded partnership, Anthropic will also help develop and optimize future versions of AWS’ purpose-built Trainium chip, a machine learning (ML) chip that supports deep learning training for models with more than 100 billion parameters.

Anthropic said it is working closely with AWS’ Annapurna Labs to write low-level kernels that allow it to interact with Trainium silicon. It is also contributing to the AWS Neuron software stack to help strengthen Trainium, and is collaborating with the chip design team on hardware computational efficiency.

“This close hardware-software development approach, combined with the strong price-performance and massive scalability of Trainium platforms, enables us to optimize every aspect of model training from the silicon up through the full stack,” Anthropic wrote in a blog post published Friday.

This approach provides an advantage over more general-purpose hardware, such as Nvidia’s GPUs, which does more than what is “absolutely necessary,” Nguyen pointed out. The companies’ long partnership also means they may have narrowed the performance-optimization advantage that Nvidia enjoys through its CUDA platform.

“This type of deep collaboration between the software and hardware engineers/developers allows for optimizations in both the hardware and software that is not always possible to find when working independently,” said Nguyen.