With genAI models, size matters (and smaller may be better)

As organizations continue to adopt generative AI (genAI) tools and platforms and explore how they can create efficiencies and boost worker productivity, they’re also grappling with the high costs and complexity of the technology.

Language models, the algorithms and neural networks that power chatbots like OpenAI’s ChatGPT and Google’s Bard, are the foundation of genAI and of AI in general. The most popular and widely used models today are known as large language models (LLMs).

LLMs can be massive. The technology is tied to large, diverse troves of information, and the models contain billions, sometimes even trillions, of parameters (or variables) that can make them both inaccurate and non-specific for domain tasks or vertical industry use.

Enter small language models (SLMs), which have gained traction quickly; some even believe they are already becoming mainstream enterprise technology. SLMs are designed to perform well for simpler tasks; they’re more accessible and easier to use for organizations with limited resources; they’re more natively secure because they exist in a fully self-manageable environment; they can be fine-tuned for particular domains and data security; and they’re cheaper to run than LLMs.

SLMs are well suited for organizations looking to build applications that can run locally on a device (as opposed to in the cloud) and “where a task doesn’t require extensive reasoning or when a quick response is needed,” said Ritu Jyoti, a group vice president of IDC’s AI research group.

Conversely, LLMs are better suited for applications that need orchestration of complex tasks involving advanced reasoning, data analysis and a better understanding of context.

SLMs can be built from scratch using open-source AI frameworks, which means an organization can create a highly customizable AI tool for any purpose without having to ask for permission. It can study how the system works, inspect its components, and modify the system for any purpose, including to change its output.

Open-source affords more freedom, customization

Dhiraj Nambiar, CEO of AI prototype developer Newtuple Technologies, said SLM adoption is growing because they can be fine-tuned or custom trained and have demonstrated “great performance for a narrow range of tasks, sometimes comparable to much larger LLMs.”

For example, he said, there are SLMs today that do “a great job” at optical character recognition (OCR) type tasks, and text-to-SQL tasks. “Some of the open-source ones are showing comparable performance to the LLMs,” Nambiar said.

In fact, the most popular SLMs today are open-source, IDC’s Jyoti said. They include:

  • Meta’s Llama3
  • Microsoft’s Phi-3
  • Google’s Gemma
  • Mistral AI’s Mixtral 8x7B
  • Apple’s OpenELM

The most popular non-open-source SLMs (which are proprietary and not freely available for public use) include:

  • DeepSeek AI’s Coder
  • Microsoft’s Phi-2
  • Microsoft’s Orca-2

“These models are typically used within specific organizations or offered as part of commercial services, providing advanced capabilities while maintaining control over their distribution and use,” Jyoti said.

An AI model infers from inputs the outputs it will generate, such as predictions, content, recommendations, or decisions that can influence physical or virtual environments. Different AI systems vary in their levels of autonomy and adaptiveness after deployment.

In the simplest of terms, a small language model (SLM) is a lightweight genAI model. The “small” in this context refers to the size of the model’s neural network, the number of parameters and the volume of data on which it is trained, according to Rosemary Thomas, a senior technical researcher in the AI lab at Version 1, a management consulting and software development firm. She said while some SLM implementations can require substantial compute and memory resources, several can run on a single GPU and have more than 5 billion parameters.

Those include Google Gemini Nano, Microsoft’s Orca-2-7b and Orca-2-13b, Meta’s Llama-2-13b, and others, Thomas noted in a recent article.
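To make the single-GPU point concrete, here is a rough back-of-the-envelope sketch of how much memory a model's weights alone would occupy. It assumes the common rule of thumb that weight memory is roughly the parameter count multiplied by the bytes used per parameter. The function name and the precision table are illustrative, and the estimate ignores activations, caches, and framework overhead, so real requirements will be higher.

```python
# Rough weight-only memory estimate: parameters x bytes per parameter.
# ASSUMPTION: activations, KV cache, and framework overhead are NOT included,
# so actual GPU memory needs will be higher than these figures.

BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str = "fp16") -> float:
    """Return approximate memory needed for model weights, in GB (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    # Parameter counts follow the "7b" and "13b" suffixes in the model names above.
    for label, params in [("7B model", 7e9), ("13B model", 13e9)]:
        for precision in ("fp16", "int8", "int4"):
            gb = weight_memory_gb(params, precision)
            print(f"{label} @ {precision}: ~{gb:.1f} GB of weights")
```

By this estimate, a 13-billion-parameter model stored at 16-bit precision needs about 26 GB for weights, which is why lower-precision formats are often what make single-GPU deployment feasible.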

Adoption of SLMs is growing, driven by the need for more efficient models and the speed at which they can be trained and set up, according to Thomas. “SLMs have gained popularity due to practical considerations such as computational resources, training time, and specific application requirements,” she said. “Over the past couple of years, SLMs have become increasingly relevant, especially in scenarios where sustainability and efficiency are crucial.”

When compared with LLMs, the key difference lies in scale. Larger models are trained on vast amounts of data from diverse sources, making them capable of capturing a broad range of language patterns, whereas SLMs are more compact and trained on smaller, often proprietary, datasets. That allows for quicker training and inference times.

LLMs also require more computational resources and longer training times. “This makes SLMs a more practical choice for applications with limited resources or where quick implementation is needed,” Thomas said.

Though LLMs shine in tasks like content generation, language translation, and understanding complex queries, small models can achieve comparable performance when correctly fine-tuned, according to Thomas.

“SLMs are particularly efficient for domain-specific tasks due to their smaller size and faster inference times,” she said.

Build or buy?

Organizations considering the use of an open-source framework to build their own AI models from scratch should understand that it’s both exorbitantly expensive and time consuming to fine-tune an existing model, according to Nambiar.

“There are a number of approaches in building your own AI model, from building it from scratch to fine-tuning an existing open-source model; the former requires an elaborate setup of GPUs, TPUs, access to lots of data, and a tremendous amount of expertise,” Nambiar said. “The software and hardware stack required for this is available, however, the main blocker is going to be the remaining components.

“...I highly recommend that for domain specific use cases, it’s best to ‘fine tune’ an existing SLM or LLM rather than building one from scratch,” he said. “There are many open-source SLMs available today, and many of them have very permissible licenses. This is the way to go about building your own model as of today. This broadly applies to all transformer models.”

It shouldn’t be an all-or-nothing SLM strategy, said Andrew Brown, senior vice president and chief revenue officer at Red Hat. For one, training a single, general purpose AI model requires a lot of resources.

“Some of the largest models can require some 10,000 GPUs, and those models may already be out of date. In fact, research shows that by 2026, the cost of training AI will be equivalent to the US GDP, which is $22 trillion,” Brown said. “The average CIO doesn’t have a US GDP-level IT budget, nor do they have thousands of spare GPUs lying around. So, what’s the answer? Specialized, smaller AI models driven by open-source innovation.”

One of the big challenges in comparing costs across AI providers is the use of different terms for pricing — OpenAI uses tokens, Google uses characters, Cohere uses a mix of “generations,” “classifications,” and “summarization units,” according to Nambiar, whose company builds AI for business automation.

Nambiar settled on “price per 1,000 tokens” to evaluate varying prices.
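A minimal sketch of that kind of normalization might look like the following. Everything here is illustrative: the provider labels and prices are made up, and the conversion assumes roughly four characters per token, a loose average for English text that varies by tokenizer and language.

```python
from dataclasses import dataclass

# ASSUMPTION: rough average for English text; real ratios vary by tokenizer and language.
CHARS_PER_TOKEN = 4.0


@dataclass
class Quote:
    provider: str     # hypothetical provider label
    price: float      # quoted price in USD
    per_units: float  # number of units that price covers
    unit: str         # "tokens" or "characters"


def price_per_1k_tokens(q: Quote, chars_per_token: float = CHARS_PER_TOKEN) -> float:
    """Convert a quote to USD per 1,000 tokens."""
    if q.unit == "tokens":
        tokens = q.per_units
    elif q.unit == "characters":
        tokens = q.per_units / chars_per_token
    else:
        # Units such as "generations" cannot be converted without more information.
        raise ValueError(f"unsupported unit: {q.unit}")
    return q.price / tokens * 1000


if __name__ == "__main__":
    # Hypothetical quotes, not real vendor pricing.
    quotes = [
        Quote("Provider A", 0.002, 1000, "tokens"),
        Quote("Provider B", 0.0005, 1000, "characters"),
    ]
    for q in quotes:
        print(f"{q.provider}: ${price_per_1k_tokens(q):.4f} per 1,000 tokens")
```

Pricing units that bundle a whole request, such as “generations,” have no fixed token equivalent, so the sketch rejects them rather than guessing.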

Fine-tuning an LLM for business purposes means organizations rely on AI providers to host the infrastructure. Nambiar said businesses should plan for a two-to-four-month project based on both infrastructure and manpower, with costs typically starting at $50,000 or more.

Fine-tuning SLMs will typically be more expensive, because an organization that hosts the open-source model will need to spin up the infrastructure (the GPU and/or TPU servers) as well as spend effort on fine-tuning and the labor costs. “Assume it will be more expensive than LLMs,” he said.

Clean data brings reliable results

Whether building your own or using a cloud-based SLM, data quality is critical when it comes to accuracy. As with LLMs, small models can still fall victim to hallucinations; these occur when an AI model generates erroneous or misleading information, often due to flawed training data or algorithms. They can, however, more easily be fine-tuned and have a better chance of being more grounded in an organization’s proprietary data.

As with LLMs, retrieval-augmented generation (RAG) techniques can reduce the possibility of hallucinations by retrieving relevant documents at query time and supplying them to the model, so responses are grounded in that material and become more accurate.
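The retrieval step in RAG can be sketched in a few lines. The example below is a toy illustration, not a production approach: it scores documents with simple word-count cosine similarity (real systems typically use learned embeddings and a vector store), the sample documents are invented, and the prompt wording is an assumption. It stops at building the prompt and does not call any model.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: List[str], k: int = 2) -> str:
    """Place the retrieved text ahead of the question so the model can ground its answer."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return (
        "Answer using only the context below. If the answer is not there, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    # Hypothetical company documents, invented for illustration.
    docs = [
        "Refunds are processed within 14 days of a return request.",
        "Support is available on weekdays from 9am to 5pm.",
        "Enterprise plans include a dedicated account manager.",
    ]
    print(build_prompt("How long do refunds take?", docs, k=1))
```

The resulting prompt would then be passed to whichever SLM or LLM the organization uses; the model itself is unchanged, only its input is enriched.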

At the same time, due to their smaller size and datasets, SLMs are less likely to capture a broader range of language patterns compared to LLMs — and that can reduce their effectiveness. And though SLMs can be fine-tuned for specific tasks, LLMs tend to excel in more complex, less-well-defined queries because of the massive data troves from which they can pull.

“In short, SLMs offer a more efficient and cost-effective alternative for specific domains and tasks, especially when fine-tuned to use their full potential, while LLMs continue to be powerful models for a wide range of applications,” Thomas said.

Adam Kentosh, Digital.ai’s field CTO for North America, said it is extremely important with SLMs to clean up data and fine-tune data stores for better performance, sustainability, and lower business risk and bias.

AI initiatives have been sliding into the “trough of disillusionment,” something that could be avoided by addressing data quality issues, according to Kentosh.

By 2028, more than 50% of enterprises that have built LLMs from scratch will abandon their efforts due to costs, complexity and technical debt in their deployments.

“One of the biggest challenges we continue to face with existing customers is diverse data sources, even across common areas in software development,” Kentosh said. “For instance, most companies own two or more agile planning solutions. Additionally, there is almost zero consistency as it pertains to releasing software. This makes data preprocessing incredibly important, something that many companies have not been historically good at.”

Getting well-curated, domain-specific data that works for fine-tuning models is not a trivial task, according to Nambiar. “Transformer models require a specific kind of prompt response pair data that is difficult to procure,” he said.
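For readers unfamiliar with the format, prompt-response pair data is often stored as one JSON record per line (JSONL). The sketch below shows what that might look like, along with a basic check that drops incomplete records. The field names “prompt” and “response,” the sample text-to-SQL records, and the table name are all assumptions for illustration; each fine-tuning framework defines its own expected schema.

```python
import json
from typing import Dict, List

# Hypothetical training examples; field names and contents are illustrative only.
EXAMPLES: List[Dict[str, str]] = [
    {
        "prompt": "Translate to SQL: count all orders placed in 2023.",
        "response": "SELECT COUNT(*) FROM orders WHERE order_year = 2023;",
    },
    {
        "prompt": "Translate to SQL: list the names of all customers.",
        "response": "SELECT name FROM customers;",
    },
    {"prompt": "Translate to SQL: show total revenue.", "response": ""},  # incomplete pair
]


def validate_pair(record: dict) -> bool:
    """A pair is usable only if both fields are non-empty strings."""
    return all(
        isinstance(record.get(key), str) and bool(record[key].strip())
        for key in ("prompt", "response")
    )


def to_jsonl(records: List[dict]) -> str:
    """Serialize valid pairs as JSON Lines, one record per line."""
    return "\n".join(json.dumps(r) for r in records if validate_pair(r))


if __name__ == "__main__":
    print(to_jsonl(EXAMPLES))
```

Even a check this simple hints at why curation is laborious: every record needs a correct, complete response, and at scale those have to be written or verified by people who know the domain.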

And, once an organization decides to fine-tune its own SLM, it will have to invest in consistently keeping up with benchmarks that come from the state-of-the-art models, Nambiar said. “With every new LLM model release, the standards of inference capabilities go up, and, thus, if you’re creating your own fine-tuned SLM, you have to also raise the inference capabilities of this model, or else there’s no use case for your model anymore,” he said.

Brown said open-source AI models are not uncommon now, with industry giants such as Meta earlier this year championing the importance of its Llama model being open source. “That’s great news for organizations as these open-source models offer a lot of benefits, such as preventing vendor lock-in, allowing for a broad ecosystem of partners, affordability for the performance and more,” he said. “But unfortunately, none of that really matters if you don’t have the data scientists needed to work with the model.”

Brown described data scientists as unicorns right now — rare and often demanding the pay of a mythical creature, as well. “And rightly so,” he said.

Most organizations can only employ a handful of data scientists at best, whether due to a scarcity of qualified talent or the cost of employing them, “which creates bottlenecks when it comes to effectively training and tuning the model,” he said.

A move to hybrid?

CIOs, Brown noted, have long been moving away from monolithic technologies, starting with the shift from UNIX to Linux in the early 2000s. He believes AI is at a similar turning point and argues that a hybrid strategy, similar to that of hybrid cloud, is most advantageous for deploying AI models. While the large, somewhat amorphous LLMs are in the spotlight today, Brown expects the future IT environment to be 50% applications and 50% SLMs.

“Data lives everywhere, whether it’s on-premises, in the cloud or at the edge. Therefore, data by nature is hybrid, and because AI needs to run where your data lives, it must also be hybrid,” Brown said. “In fact, we often tell customers and partners: AI is the ultimate hybrid workload.

“Essentially, a CIO will have as many AI models as applications. This means that training needs to be faster, tuning needs to speed up and costs need to be kept down. The key to this challenge lies in open source,” he continued. “Just as it democratized computing, open source will do so for AI; it already is.”
