Nvidia announces new partnership results: Mistral's open-source models run faster, with gains in efficiency and accuracy at any scale.
```
On Tuesday, December 2, Eastern Time, Nvidia disclosed a major result of its collaboration with French AI startup Mistral AI. Running on Nvidia's latest chips, the new members of Mistral AI's open-source model family deliver gains in performance, efficiency, and deployment flexibility.
The centerpiece of the collaboration is that Mistral Large 3, running on Nvidia's GB200 NVL72 system, achieved a tenfold performance increase over the previous-generation H200 chip. That gain translates into a better user experience, lower cost per response, and higher energy efficiency: the model can process more than 5 million tokens per second per megawatt (MW) of power.
In addition to large models, the Ministral 3 small model series has also been optimized for Nvidia edge platforms and can run on RTX PCs, laptops, and Jetson devices. This allows enterprises to deploy AI applications anywhere from the cloud to the edge, without relying on a continuous network connection.
The new model family released by Mistral AI on Tuesday includes a large frontier model and nine small models, all accessible through open-source platforms like Hugging Face and major cloud providers. Industry insiders believe this series of releases marks a new stage of "distributed intelligence" for open-source AI, bridging the gap between research breakthroughs and practical applications.
GB200 system powers breakthrough in large model performance
Mistral Large 3 is a Mixture of Experts (MoE) model with 675 billion total parameters, 41 billion active parameters, and a 256,000-token context window. For each token, this architecture activates only the most relevant parts of the model rather than all of its neurons, which allows it to scale efficiently while maintaining accuracy.
Nvidia says that by using a range of optimization techniques tailored for advanced large MoEs, Mistral Large 3 achieved best-in-class performance on Nvidia's GB200 NVL72.

Nvidia attributes the performance gain to three key optimizations. The first is Wide Expert Parallelism, which takes full advantage of NVLink's coherent memory domain through optimized MoE kernels, expert distribution, and load balancing. The second is NVFP4 low-precision inference, which reduces compute and memory costs while maintaining accuracy. The third is the Dynamo distributed inference framework, which improves long-context performance by separating the prefill and decode stages.
The model is compatible with mainstream inference frameworks such as TensorRT-LLM, SGLang, and vLLM. Developers can deploy the model flexibly on Nvidia GPUs of different scales through these open-source tools, choosing suitable precision formats and hardware configurations for their needs.
Small models target edge device deployment
The Ministral 3 series includes nine dense, high-performance models in three parameter sizes: 3 billion, 8 billion, and 14 billion. Each size comes in base, instruct, and reasoning variants. All variants support vision input, context windows of 128,000 to 256,000 tokens, and multiple languages.
These small models can reach inference speeds of up to 385 tokens per second on an Nvidia RTX 5090 GPU. On Jetson Thor devices, the vLLM container reaches 52 tokens per second at a concurrency of one and scales up to 273 tokens per second at a concurrency of eight.
Nvidia optimized the edge performance of these models in collaboration with Ollama and llama.cpp. Developers can run these models on Nvidia edge platforms such as GeForce RTX AI PCs, DGX Spark, and Jetson devices, enabling faster iteration, lower latency, and stronger data privacy protection.
Since a single GPU is sufficient to run them, Ministral 3 can be deployed on robots, autonomous drones, cars, phones, and laptops. Such deployment flexibility allows AI applications to run in environments with limited or no network connectivity.
Commercialization of Mistral's new model family speeds up
The new model series released by Mistral AI on Tuesday is the company's latest move to catch up with leading AI labs such as OpenAI, Google, and DeepSeek. Founded in 2023, the company completed a €1.7 billion financing round last September, with Dutch chip equipment maker ASML contributing €1.3 billion and Nvidia also participating, resulting in a valuation of €11.7 billion.
Mistral AI co-founder and Chief Scientist Guillaume Lample said that although large closed-source models perform better on initial benchmarks, small models that have been fine-tuned for a specific task often match or even surpass large models in enterprise use cases. He emphasized that the vast majority of enterprise use cases can be handled by fine-tuned small models, at lower cost and higher speed.
Mistral AI has begun accelerating its commercialization process. On Monday, the company announced an agreement with HSBC to provide the multinational bank with model access for tasks ranging from financial analysis to translation. In addition, the company has signed contracts worth hundreds of millions of dollars with multiple enterprises and is also branching into physical AI, cooperating with Singapore's Home Team Science and Technology Agency, German defense technology startup Helsing, and automaker Stellantis on robotics, drones, and in-vehicle assistant projects.
Mistral Large 3 and Ministral-14B-Instruct are now available to developers via Nvidia's API catalog and preview API. Enterprise developers will soon be able to easily deploy these models on any GPU-accelerated infrastructure using Nvidia NIM microservices. All Mistral 3 family models can be downloaded from Hugging Face.
```
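The article's description of the Mixture of Experts architecture, in which each token activates only the most relevant parts of the model, can be illustrated with a small routing sketch. This is a toy example under stated assumptions, not Mistral's or Nvidia's implementation: the expert count, the top-k value, the scalar "experts", and the names `route_token` and `moe_layer` are all hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_scores, top_k):
    """Pick the top_k highest-scoring experts and renormalize their weights.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    kept = sum(probs[i] for i in chosen)
    return [(i, probs[i] / kept) for i in chosen]


def moe_layer(token, experts, router_scores, top_k):
    """Combine the outputs of only the selected experts for one token."""
    routing = route_token(router_scores, top_k)
    output = sum(weight * experts[i](token) for i, weight in routing)
    return output, routing


# Toy setup (hypothetical values): 8 scalar "experts", 2 active per token.
NUM_EXPERTS = 8
TOP_K = 2
experts = [lambda x, scale=k + 1: scale * x for k in range(NUM_EXPERTS)]

# Stand-in for a learned router: random scores with a fixed seed.
rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
output, routing = moe_layer(1.0, experts, scores, TOP_K)

active_fraction = TOP_K / NUM_EXPERTS
print("experts used:", [i for i, _ in routing])
print("fraction of experts active:", active_fraction)
```

In this sketch only two of the eight experts run for the token, so the compute per token tracks the active experts rather than the full model, which is the property the article attributes to MoE models.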