Google proclaims Gemma, its open LLMs that may run locally

Google has released two models from its family of lightweight, open models called Gemma.

While Google’s Gemini models are proprietary, or closed models, the Gemma models have been released as “open models” and made freely available to developers.

Google released Gemma models in two sizes, 2B and 7B parameters, with pre-trained and instruction-tuned variants for every. Google is releasing the model weights in addition to a collection of tools for developers to adapt the models to their needs.

Google says the Gemma models were built using the identical tech that powers its flagship Gemini model. Several firms have released 7B models in an effort to deliver an LLM that retains useable functionality while potentially running locally as an alternative of within the cloud.

Llama-2-7B and Mistral-7B are notable contenders on this space but Google says “Gemma surpasses significantly larger models on key benchmarks,” and offered this benchmark comparison as evidence.

Benchmark results of Gemma-7B vs Llama-2-7B and Llama-2-12B. Source: Google

The benchmark results show Gemma beats even the larger 12B version of Llama 2 in all 4 capabilities.

The really exciting thing about Gemma is the prospect of running it locally. Google has partnered with NVIDIA to optimize Gemma for NVIDIA GPUs. If you have got a PC with one among NVIDIA’s RTX GPUs, you may run Gemma in your device.

NVIDIA says it has an installed base of over 100 million NVIDIA RTX GPUs. This makes Gemma a sexy option for developers who are attempting to determine which lightweight model to make use of as a basis for his or her products.

NVIDIA can even be adding support for Gemma on its Chat with RTX platform making it easy to run LLMs on RTX PCs.

While not technically open-source, it’s only the usage restrictions within the license agreement that keep Gemma models from owning that label. Critics of open models point to the risks inherent in keeping them aligned, but Google says it performed extensive red-teaming to make sure that Gemma was protected.

Google says it used “extensive fine-tuning and reinforcement learning from human feedback (RLHF) to align our instruction-tuned models with responsible behaviors.” It also released a Responsible Generative AI Toolkit to assist developers keep Gemma aligned after fine-tuning.

Customizable lightweight models like Gemma may offer developers more utility than larger ones like GPT-4 or Gemini Pro. The ability to run LLMs locally without the fee of cloud computing or API calls is becoming more accessible day by day.

With Gemma openly available to developers it’s going to be interesting to see the range of AI-powered applications that might soon be running on our PCs.