New IBM Granite 3.0 AI models show strong benchmark performance

IBM has just announced a new collection of AI models, the third generation of its Granite LLMs. The foundation models of the new collection are the Granite 3.0 2B Instruct and Granite 3.0 8B Instruct models (Instruct means that these models can understand and execute instructions more accurately). The models were trained on over 12 trillion tokens spanning 12 human languages and 116 programming languages. All of these models are released under the Apache 2.0 open-source license. It is also worth noting that IBM indemnifies Granite models against legal issues arising from training data when they are used on the IBM watsonx AI platform.

Business uses for the smaller Granite models

IBM designed the new 2B and 8B Granite models to handle a wide range of common enterprise tasks. Think of these models as basic tools for everyday jobs like summarizing articles, extracting key information, writing code, and creating explanatory documents. The models also perform well on common language tasks such as entity extraction and retrieval-augmented generation (RAG), which improves the accuracy of generated text. According to IBM, by the end of 2024 Granite 3.0 models will be able to understand documents, interpret diagrams, and answer questions about a graphical interface or product screen.
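As an illustration of one such everyday task, here is a minimal sketch of document summarization with a Granite 3.0 Instruct model through the Hugging Face transformers library. The model identifier is an assumption based on IBM's naming; check the ibm-granite organization on Hugging Face for the exact name.

```python
# Minimal sketch: summarizing a document with a Granite 3.0 Instruct model
# via Hugging Face transformers. The model id below is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-3.0-8b-instruct"  # assumed model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

document = "..."  # the article or report to summarize
messages = [
    {"role": "user",
     "content": f"Summarize the following document in three sentences:\n\n{document}"}
]

# Apply the model's chat template so the instruction is formatted as expected.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=200)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```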

AI agents are becoming increasingly important, and support for agentic use cases is a new capability in Granite 3.0 that was not previously available in IBM language models. In agentic use cases, a model can proactively identify needs, use tools, and initiate actions within predefined parameters without human intervention. Typical agentic use cases include virtual assistants, customer service, decision support and recommendations, and a variety of other complex tasks.
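To make the idea concrete, here is a small, hypothetical sketch of an agentic loop in Python: the model emits a structured tool call, the agent executes it within predefined parameters, and the observation feeds the final answer. The tool, the dispatch logic, and the canned model responses are illustrative assumptions, not an IBM API.

```python
# Hypothetical agentic loop: parse a tool call from the model, run the tool,
# then ask the model to answer with the tool result. Nothing here is an IBM API.
import json

def get_weather(city: str) -> str:
    """Stand-in tool; a real agent would call an external service."""
    return f"Sunny, 22 C in {city}"

TOOLS = {"get_weather": get_weather}  # the predefined, allowed tools

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call, with canned responses for the sketch."""
    if "Tool result:" in prompt:
        return "It is sunny and 22 C in Helsinki today."
    return json.dumps({"tool": "get_weather", "arguments": {"city": "Helsinki"}})

def run_agent(user_request: str) -> str:
    reply = call_llm(user_request)
    try:
        request = json.loads(reply)          # the model asked for a tool
    except json.JSONDecodeError:
        return reply                         # no tool needed, answer directly
    tool = TOOLS.get(request.get("tool"))
    if tool is None:
        return "Requested tool is not allowed."
    observation = tool(**request["arguments"])
    return call_llm(f"{user_request}\nTool result: {observation}")

print(run_agent("What is the weather like today?"))
```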

Speculative decoding is also a new IBM offering. It speeds up the text an LLM generates by predicting likely future tokens ahead of time and then verifying them, rather than producing every token one at a time. IBM's speculative decoder, called Granite 3.0 8B Accelerator, can speed up text generation by up to 2x during inference.
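The general technique can be sketched with Hugging Face transformers' assisted generation, where a small draft model proposes tokens and the larger target model verifies them in a single pass. This is only an illustration of speculative decoding, not IBM's Granite 3.0 8B Accelerator itself, and the model identifiers are assumptions.

```python
# Sketch of speculative (assisted) decoding with Hugging Face transformers.
# A small draft model guesses several future tokens; the large target model
# accepts or rejects them in one forward pass. Model ids are assumptions, and
# both models must share the same tokenizer/vocabulary for this to work.
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "ibm-granite/granite-3.0-8b-instruct"  # assumed id of the large model
draft_id = "ibm-granite/granite-3.0-2b-instruct"   # assumed id of the draft model

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(target_id, device_map="auto")
draft = AutoModelForCausalLM.from_pretrained(draft_id, device_map="auto")

inputs = tokenizer("Explain speculative decoding in one paragraph.",
                   return_tensors="pt").to(target.device)

# assistant_model enables assisted generation, i.e. speculative decoding.
outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=120)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```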

Granite 3.0 models will get another update in a few weeks. IBM will increase the context window from 4,000 to 128,000 tokens, a key enabler for longer conversations as well as the RAG tasks and agentic use cases mentioned above. By the end of the year, IBM plans to add vision capabilities, which will increase the models' versatility and allow them to be used in more applications.

Benchmarks for performance and cyber security

Hugging Face’s Open LLM Leaderboard evaluates and ranks open-source LLMs and chatbots based on benchmark performance. Those rankings show how the IBM Granite 3.0 8B Instruct compares with the Llama 3.1 8B Instruct and the Mistral 7B Instruct. The smaller Granite 3.0 2B Instruct model also performs as well as other top models.

IBM Research’s cybersecurity team helped identify high-quality data sources that were used to train the new Granite 3.0 models. IBM Research also helped develop the public and proprietary benchmarks needed to measure the models’ cybersecurity performance. The IBM Granite 3.0 8B Instruct model was the best performer across all three cybersecurity benchmarks against the same Llama and Mistral models mentioned above.

Future Granite mixture of experts models

At some point in the future, IBM plans to release several smaller and more efficient models, including the Granite 3.0 1B A400M, a 1 billion parameter model, and the Granite 3.0 3B A800M, a 3 billion parameter model. Unlike the Granite 3.0 models discussed above, these future models will not be based on the dense transformer architecture but will instead use a mixture of experts (MoE) architecture.

The MoE architecture divides a model into several specialized expert subnetworks for greater efficiency. MoE models are small and light, yet considered best in class for efficiency, offering a good balance between cost and performance. These models use only a small fraction of their total parameters for inference: the 3 billion parameter MoE model activates only 800 million parameters during inference, and the 1 billion parameter MoE model activates only 400 million. IBM developed them for applications such as edge server deployments and CPU-based inference.
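The routing idea behind "active versus total" parameters can be sketched in a few lines of PyTorch. The following is a generic top-k mixture of experts layer, not IBM's actual architecture; all dimensions and expert counts are arbitrary.

```python
# Generic top-k mixture-of-experts layer: a router scores experts per token and
# only the selected experts run, so active parameters are far fewer than total.
import torch
import torch.nn as nn


class MoELayer(nn.Module):
    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.router = nn.Linear(dim, num_experts)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim). The router scores each expert for each token.
        scores = self.router(x).softmax(dim=-1)
        weights, indices = scores.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        # Only the chosen experts are evaluated, which is why a "3B" MoE model
        # may activate only ~800M parameters per token during inference.
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = indices[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * self.experts[e](x[mask])
        return out


layer = MoELayer(dim=64)
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64])
```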

In 2025, IBM plans to expand its largest MoE architecture models from 70 billion parameters to 200 billion parameters. Initially, the models will have language, code and multilingual capabilities. Vision and audio will be added later. All these future Granite models will also be available under Apache 2.0.

Granite Guardian models

Along with the Granite 3.0 2B and 8B models, IBM also announced a Granite Guardian 3.0 model, which acts as a guardrail for the inputs and outputs of other Granite 3.0 models. When monitoring inputs, Granite Guardian looks for jailbreak attacks and other potentially harmful requests. To ensure safety standards are met, Granite Guardian also monitors LLM outputs for bias, fairness issues, and violence.

These models also provide hallucination detection for grounded tasks that anchor model outputs to specific data sources. In a RAG workflow, Granite Guardian checks whether a response is based on the provided grounding context. If the response is not supported by the context, the model flags it as ungrounded.
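A hedged sketch of how such a groundedness check might be wired into a RAG pipeline is shown below. The model identifier and the plain-text prompt format are assumptions; the actual Guardian models define their own chat template and risk categories, so the model card should be consulted for the exact calling convention.

```python
# Hedged sketch of a RAG groundedness check with a Granite Guardian model.
# The model id and prompt format below are assumptions, not the documented API.
from transformers import AutoModelForCausalLM, AutoTokenizer

guardian_id = "ibm-granite/granite-guardian-3.0-2b"  # assumed model id
tokenizer = AutoTokenizer.from_pretrained(guardian_id)
guardian = AutoModelForCausalLM.from_pretrained(guardian_id, device_map="auto")

context = "Granite 3.0 models were trained on over 12 trillion tokens."
answer = "Granite 3.0 models were trained on 500 billion tokens."

# Ask the guardian whether the answer is supported by the retrieved context.
prompt = (
    "Context:\n" + context + "\n\n"
    "Answer:\n" + answer + "\n\n"
    "Is the answer fully supported by the context? Reply Yes or No."
)
inputs = tokenizer(prompt, return_tensors="pt").to(guardian.device)
verdict = guardian.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(verdict[0][inputs["input_ids"].shape[-1]:],
                       skip_special_tokens=True))
```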

By 2025, IBM plans to reduce the size of Granite Guardian models to somewhere between 1 billion and 4 billion parameters. The reduction in model size makes them more versatile and affordable. It will also enable wider deployment across industries and applications such as edge devices, healthcare, education, and finance.

Continued evolution of IBM Granite models

IBM Granite 3.0 models are high-performance open-source models, with benchmarks to back up their performance and security claims. IBM plans to add new developer-friendly features to these models, such as structured JSON prompting. As with previous Granite models, updates will be made regularly to keep the models current, so we can expect a steady stream of new features as they are developed. Unlike some competing open-source models released under custom licenses, Granite’s permissive Apache 2.0 license makes the models adaptable for a wide variety of applications.

It looks like IBM has big plans for the future of the entire Granite 3.0 collection.