Tue, Feb 27, 2024

Google Unveils Gemini: A Multimodal Generative AI Family

In a highly anticipated move, Google has introduced Gemini, its next-generation generative AI model. The unveiling, conducted through a virtual press briefing by the Google DeepMind team, shed light on Gemini's capabilities, its variants, and its potential impact on various applications. The flagship model, Gemini Ultra, promises groundbreaking multimodal capabilities, while Gemini Pro, a lighter version, is set to debut this week. The family also includes Gemini Nano, designed for mobile devices.

Gemini: A Family of AI Models

Image Credit: Google


Google's Gemini is not a singular entity but a family of AI models, comprising three variants:

  1. Gemini Ultra: The flagship model designed for comprehensive multimodal tasks.
  2. Gemini Pro: A lighter version optimized for tasks such as summarization and reasoning.
  3. Gemini Nano: Tailored for mobile devices, available in two sizes – Nano-1 and Nano-2.

Gemini Pro Powers Bard

Gemini Pro takes center stage in Bard, Google's ChatGPT competitor. During the briefing, Sissie Hsiao, GM of Google Assistant and Bard, highlighted Gemini Pro's enhanced reasoning and planning capabilities. Users in the U.S. can try these improvements in English starting December 13.

Multimodal Prowess of Gemini Ultra

Gemini Ultra, the flagship model, is natively multimodal and, according to Google, surpasses rival models in its ability to comprehend nuanced information in various formats. Eli Collins, VP of Product at DeepMind, emphasized Gemini Ultra's strength in handling complex tasks, including math and physics problems.

Training Data Mystery

Image Credit: Google


Despite Gemini's impressive features, Google remained tight-lipped about its training data sources. Collins acknowledged that some data came from public web sources, but the specifics remain undisclosed, as does whether creators can opt out or receive compensation. This approach mirrors that of other industry players, which keep training data proprietary for competitive reasons.

Gemini's Environmental Impact

Collins claimed Gemini is Google's most efficient large generative AI model, but the briefing gave no details about the number of chips used for training, the costs, or environmental considerations. Concerns about the carbon footprint of training large models went unaddressed, leaving questions about the environmental impact unanswered.

Gemini Ultra's Benchmark Performance

The briefing highlighted Gemini Ultra's benchmark performance, with claims of superiority on various academic benchmarks. However, closer inspection reveals only marginal improvements over existing models like GPT-4. Questions about potential biases, hallucinations, and localization issues also lingered, with Collins admitting that some challenges remain unsolved.

Gemini's Unfulfilled Promise

The launch of Gemini, particularly Gemini Ultra, appears to have been met with skepticism. Reports suggest a rushed development, with challenges in handling non-English queries and an undefined monetization strategy. Gemini Ultra's release is limited, with select users gaining early access before a broader rollout next year.

In conclusion, while Google's Gemini holds promise in advancing generative AI capabilities, the lack of transparency, the unanswered environmental questions, and the unmet expectations surrounding Gemini Ultra's launch leave the industry and consumers waiting for more clarity and tangible results.


