Microsoft and Google release new AI models on the same day, spanning voice, image, and local open-source models



Microsoft and Google both announced new AI models on Thursday, but with a clear difference in approach. Microsoft released a new family of in-house foundation models, MAI, available only through Microsoft Foundry and the US-only MAI Playground. Google, meanwhile, launched Gemma 4, a new family of open-source models that can run locally, and has switched these models to the Apache 2.0 license.

Three “world-class” in-house MAI models

Microsoft’s “world-class” in-house MAI lineup consists of three models:

The first is MAI-Transcribe-1, an “advanced” speech-to-text model that supports the 25 most widely used languages in the world. Microsoft says its batch transcription is 2.5 times faster than Microsoft’s existing Azure Fast solution.

The second is MAI-Voice-1, a new speech-generation model that can produce 60 seconds of audio in just one second. It also supports creating custom voices in Microsoft Foundry from short audio samples.

The last is MAI-Image-2, a faster text-to-image model that has begun rolling out in Copilot and will soon be available in Bing and PowerPoint.

Microsoft stated:

“We are rapidly deploying these top-tier models to support our consumer and commercial products. Soon you will see more models in Foundry and across various Microsoft products and experiences.”

Google Releases Gemma 4 Open-Source Model

Google’s Gemma 4 open-source models are licensed under Apache 2.0 rather than the custom Gemma license used previously. Google says the models offer advanced reasoning, agentic workflows, code generation, and vision and audio capabilities. The family comes in four sizes, all optimized for local deployment, and Google says they can even run on “billions of Android devices.”

Google stated:

“Gemma 4 is built on the same world-class research and technology as Gemini 3, and is currently the most powerful series of models you can run on local hardware. They complement our Gemini models, offering developers the industry’s strongest combination of open-source and proprietary tools.”

The larger 26B and 31B Gemma 4 models are designed to run on consumer-grade GPUs and are suited to powering IDEs, coding assistants, and agentic workflows. The lighter E2B and E4B models focus on multimodal capabilities and low-latency processing, targeting mobile and IoT devices such as the Raspberry Pi. The models also support fully offline operation.
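Whether a model fits on a consumer GPU depends largely on how much memory its weights need. As a rough back-of-envelope sketch (not a figure published by Google), weight memory is approximately the parameter count times the bits stored per parameter, divided by 8. The snippet below applies that formula to the nominal 26B and 31B sizes at a few common precisions. It assumes the names reflect total parameter counts and ignores the KV cache, activations, and runtime overhead, all of which add to the real requirement.

```python
# Rough weight-memory estimate: parameters x bits per parameter / 8 bits per byte.
# Assumptions: nominal parameter counts taken from the model names (26B, 31B);
# KV cache, activations, and runtime overhead are NOT included.

PARAM_COUNTS = {"Gemma 4 26B": 26e9, "Gemma 4 31B": 31e9}
PRECISIONS_BITS = {"bf16": 16, "int8": 8, "int4": 4}


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Return approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    for name, params in PARAM_COUNTS.items():
        row = ", ".join(
            f"{label}: {weight_memory_gb(params, bits):.1f} GB"
            for label, bits in PRECISIONS_BITS.items()
        )
        print(f"{name} -> {row}")
```

At 4-bit precision this works out to roughly 13 GB of weights for 26B and 15.5 GB for 31B, versus 52 GB and 62 GB at 16-bit, which suggests why quantized builds matter for fitting these sizes onto a single consumer GPU.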

Google’s Gemma 4 open-source models can be downloaded from multiple platforms, including Hugging Face, Kaggle, and Ollama. Google emphasized:

“These models adhere to the same strict security protocols for infrastructure as our proprietary models.”
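Of the download options, Ollama also serves pulled models through a local HTTP API. The following is a minimal sketch, using only the Python standard library, of sending a prompt to that API's /api/generate endpoint. It assumes Ollama is running on its default port, 11434, and that a Gemma 4 model has been pulled under the tag "gemma4"; that tag is a hypothetical placeholder, so check the model's Ollama listing or the output of "ollama list" for the actual name.

```python
import json
import urllib.request

# Assumptions: Ollama is running locally on its default port (11434), and a
# Gemma 4 model has been pulled under the HYPOTHETICAL tag "gemma4".
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_TAG = "gemma4"  # hypothetical tag; substitute the real one


def build_generate_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = MODEL_TAG) -> str:
    """Send the prompt to the local model and return its generated text."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        body = json.load(resp)
    # With stream=False, the reply is a single JSON object; "response" holds the text.
    return body["response"]
```

With the server running, calling generate("Explain what the Apache 2.0 license permits.") returns the model's reply as a string. Setting "stream" to False keeps the reply as one JSON object rather than a sequence of partial chunks, which keeps the sketch short; because everything runs against localhost, no data leaves the machine.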

More updates coming soon

Risk Warning and Disclaimer: The market carries risks, and investment requires caution. This article does not constitute personal investment advice and does not consider the specific investment goals, financial situation, or needs of individual users. Users should consider whether any opinions, views, or conclusions herein are suitable for their particular situation. Investments made based on this article are at your own risk.