
Cohere adds insight to its RAG search capabilities

Cohere has added multimodal embeddings to its search model, allowing users to include images in RAG-style enterprise search.

Embed 3, which launched last year, uses embedding models that transform data into numerical representations. Embeddings have become crucial to retrieval-augmented generation (RAG) because companies can embed their documents, letting the model compare those embeddings against a query to retrieve the information the prompt asks for.
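
To make that mechanism concrete, here is a minimal sketch of RAG-style text retrieval, assuming the Cohere Python SDK's embed endpoint; the API key placeholder, document snippets and query are all illustrative:

```python
import cohere
import numpy as np

co = cohere.Client("YOUR_API_KEY")  # illustrative placeholder key

docs = [
    "Q3 revenue grew 12% year over year.",
    "The onboarding guide covers SSO configuration.",
]
# Embed the corpus once; input_type tells Embed 3 these are documents.
doc_emb = np.array(
    co.embed(texts=docs, model="embed-english-v3.0",
             input_type="search_document").embeddings
)

# Embed the user's query with the matching input_type, then rank by cosine similarity.
query = "How do I set up single sign-on?"
q_emb = np.array(
    co.embed(texts=[query], model="embed-english-v3.0",
             input_type="search_query").embeddings[0]
)
scores = doc_emb @ q_emb / (np.linalg.norm(doc_emb, axis=1) * np.linalg.norm(q_emb))
print(docs[int(np.argmax(scores))])  # best-matching snippet, ready to pass to a generator
```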

The new multimodal version can generate embeddings from both images and text. Cohere claims that Embed 3 is “now the most capable multimodal embedding model on the market.” Aidan Gomez, co-founder and CEO of Cohere, posted a chart on X showing Embed 3’s performance improvements on image search.

“This breakthrough allows companies to unlock the true value of their vast amount of data stored in images,” Cohere said in a blog post. “Companies can now build systems that accurately and quickly search important multimodal assets such as complex reports, product catalogs and design files to increase workforce productivity.”

Cohere said a more multimodal approach expands the volume of data companies can access through a RAG search. Many organizations limit RAG searches to structured and unstructured text despite having multiple file formats in their data libraries. Customers can now bring in graphics, charts, product images and design templates as well.

Performance improvements

Cohere said Embed 3’s encoders “share a unified latent space,” allowing users to store both images and text in a single database. Some image embedding methods require maintaining separate databases for images and text. The company said its unified approach leads to better mixed-modality searches.

According to the company, “Other models tend to group text and image data into separate areas, leading to weak search results that are biased toward text-only data. Embed 3, on the other hand, prioritizes meaning behind the data without biasing towards a specific modality.”
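
As a sketch of what a single mixed-modality index looks like in practice, the code below assumes the Cohere SDK’s image support in Embed 3, where images are passed as base64 data URLs; the file name, text snippets and query are invented for illustration:

```python
import base64
import cohere
import numpy as np

co = cohere.Client("YOUR_API_KEY")  # illustrative placeholder key

# Encode an image as a base64 data URL, the format Embed 3's image input expects.
with open("quarterly_chart.png", "rb") as f:  # illustrative file name
    data_url = "data:image/png;base64," + base64.b64encode(f.read()).decode()

# Text and image embeddings land in the same latent space, so one index holds both.
text_vecs = co.embed(
    texts=["Design spec for the new dashboard layout"],
    model="embed-english-v3.0", input_type="search_document",
).embeddings
image_vecs = co.embed(
    images=[data_url], model="embed-english-v3.0", input_type="image",
).embeddings
index = np.array(text_vecs + image_vecs)  # one store for both modalities
labels = ["design spec (text)", "quarterly_chart.png (image)"]

# A text query can now retrieve the chart image or the spec, whichever is closer.
q = np.array(co.embed(
    texts=["revenue trend chart"], model="embed-english-v3.0",
    input_type="search_query",
).embeddings[0])
scores = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
print("best match:", labels[int(np.argmax(scores))])
```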

Embed 3 is available in over 100 languages.

Cohere said multimodal Embed 3 is now available on its platform and Amazon SageMaker.

Playing catch-up

Many consumers are quickly becoming familiar with multimodal search, thanks to the introduction of image-based search on platforms like Google and chat interfaces like ChatGPT. As individual users become accustomed to searching for information from images, it makes sense that they would want the same experience in their work lives.

Enterprises have also started to see the benefit, and other companies with embedding models now provide multimodal options. Model developers such as Google and OpenAI offer some form of multimodal embedding, and open-source models can also handle image embeddings and other modalities. The battle now is over which multimodal embedding model can operate at the speed, accuracy and security that businesses demand.

Cohere, which was founded by some of the researchers behind the Transformer architecture (Gomez is a co-author of the famous “Attention Is All You Need” paper), has struggled to stay top of mind for many in the business space. It updated its APIs in September to let customers switch easily from competitor models to Cohere models. At the time, Cohere said the move was meant to align with industry standards, where customers often switch between models.