Elon Musk's xAI Unveils Grok 3 – An Overview of What Sets It Apart From Other LLMs
The progress made in a year and a half is astonishing.
Elon Musk's xAI has just unveiled Grok 3, the latest iteration of Grok. Grok 3 seems to be proof that scaling laws have not run their course. That view continues to gain ground in Silicon Valley, and several elements support it, starting with the computing capacity dedicated to Grok 3's training.
In the spring of 2024, xAI set out to build a cluster of 100,000 H100 GPUs. Faced with the 18- to 24-month lead times announced by data center suppliers, the company invested in a former Electrolux factory in Memphis, Tennessee. It took around four months to install 100,000 GPUs (120 MW), and three more to increase to 200,000 (250 MW). Tesla Megapack batteries were deployed to back up the generators and absorb the power fluctuations inherent in training Grok 3.
xAI says it is working on a cluster “5 times more powerful”, in this case, 1.2 GW. Elon Musk mentions NVIDIA GB200 accelerator cards, which combine a Grace CPU and Blackwell GPUs.
Reasoning and inference scaling
Inference-time scaling, much discussed in connection with DeepSeek, also appears to benefit Grok 3. It goes hand in hand with the model's reasoning capabilities, which xAI has integrated in the form of a “Big Brain” mode. When activated, the model is given more time and computational resources to deepen its thinking. This reasoning is exposed as a chain of thought accessible to the user, though not in its entirety. The aim is to prevent distillation, that is, using the outputs of one model to train another: a practice of which OpenAI, and with it Washington, accused DeepSeek.
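Distillation, as mentioned here, can be illustrated with a minimal sketch: a student model is trained to match a teacher's output distribution, typically by minimizing a KL-divergence loss over softened logits. The function names, temperature value, and shapes below are illustrative assumptions, not xAI's or DeepSeek's actual setup.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Convert logits to a probability distribution at a given temperature."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over softened distributions: the core of
    knowledge distillation, where a student learns from a teacher's outputs."""
    p = softmax(teacher_logits, temperature)  # teacher's softened targets
    q = softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean())

# A student that matches the teacher exactly incurs zero loss;
# a mismatched student incurs a positive loss to minimize.
teacher = np.array([[2.0, 0.5, -1.0]])
assert distillation_loss(teacher, teacher) < 1e-9
assert distillation_loss(teacher, np.array([[0.0, 0.0, 0.0]])) > 0.0
```

Hiding part of the chain of thought removes exactly the kind of rich target signal this loss exploits.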
DeepSeek used reinforcement learning almost exclusively (without supervised fine-tuning) to develop the reasoning capabilities of its latest models. xAI does not reveal its recipe but seems to have followed the same path. It does claim, however, to have limited the exercise to math and code problems, with Grok 3 then managing to generalize.
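What makes math and code problems suitable for this kind of reinforcement learning is that answers can be checked automatically. A toy sketch of such a verifiable reward is shown below; this is a generic illustration of the idea, not xAI's or DeepSeek's disclosed implementation.

```python
import re

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Binary reward for a math problem: 1.0 if the last number in the
    model's completion matches the ground truth, else 0.0. This checkable
    signal lets RL proceed without human-labeled preference data."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return 1.0 if numbers and numbers[-1] == ground_truth else 0.0

# Only the completion that ends with the correct answer is rewarded.
assert verifiable_reward("2 + 2 = 4, so the answer is 4", "4") == 1.0
assert verifiable_reward("the answer is 5", "4") == 0.0
```

Generalization to other domains then has to emerge from training on these checkable tasks alone.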
Following on from reasoning capabilities, xAI introduces, like many before it, a “deep search” feature. The promise is by now well known: to perform in a few minutes tasks that could take several hours. The model cites its sources and displays the stages of its progress.
Grok 3 stands out from the competition
xAI announces evaluation results on the AIME (math), GPQA (science), and LiveCodeBench (coding) benchmarks, but does not specify the conditions under which they were obtained. For more meaningful indicators, we can turn to Chatbot Arena. It works as follows: a user submits a query, receives responses from two anonymous models, and votes for the better one. After around 8,000 evaluations, Grok 3's Elo score exceeded 1,400, putting it in the lead.
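Chatbot Arena aggregates these pairwise votes into ratings using the standard Elo update, which can be sketched in a few lines (the K-factor and ratings below are illustrative, not the leaderboard's exact parameters):

```python
def elo_update(r_a: float, r_b: float, a_wins: bool, k: float = 32.0):
    """One Elo update after a head-to-head vote: the winner gains points
    in proportion to how unexpected the result was."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))  # win probability of A
    score_a = 1.0 if a_wins else 0.0
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

# A win moves the winner's rating up and the loser's down symmetrically;
# upsets against higher-rated models move ratings the most.
new_a, new_b = elo_update(1400.0, 1300.0, a_wins=True)
assert new_a > 1400.0 and new_b < 1300.0
```

Thousands of such updates across many users are what produce a stable leaderboard score.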
Towards a sharper separation between Grok and X
The rollout of Grok 3 on X began on Monday, February 17, 2025, for Premium+ subscribers. It will also be available in the Grok mobile app, currently offered on iOS. The same app will let subscribers take out a dedicated subscription called SuperGrok. Expected to cost $30/month (or $300/year), it will give access to more queries with reasoning and deep search, as well as unlimited image generation.
The most recent versions of Grok 3 will not be available in the mobile app but on the grok.com website. xAI intends to make the mini version of Grok 3 available “free to all in the next few days”. A native voice mode (one that does not pass through the text modality) should follow, before Grok 3 becomes available via the API. As for opening the Grok-2 weights, it is a matter of months, according to xAI.
Grok 3 still suffers from inconsistencies
During its presentation, Grok 3 appeared more at ease with physics simulation (calculating and rendering, in 3D, a viable round-trip trajectory between Earth and Mars) than with creating a video game. The latter was supposed to combine the rules of Tetris (among other things, clearing all complete lines) and Bejeweled (clearing all alignments of three jewels of the same color).
The reasoning mode enables Grok 3 to solve certain problems that state-of-the-art models cannot. For example, determining the amount of computation used to train GPT-2 from the scientific article OpenAI devoted to it. Or, more prosaically, identifying that 9.11 < 9.9 (which is not obvious for LLMs) and that there are three r's in “strawberry”.
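Both of these stumbling blocks are trivial to verify in code, which is partly why they have become standard probes of tokenizer-driven quirks in LLMs:

```python
# Decimal comparison: 9.11 is smaller than 9.9, even though "11" looks
# bigger than "9" when the decimals are read digit by digit.
assert 9.11 < 9.9

# Letter counting: "strawberry" really does contain three r's,
# which models working on subword tokens rather than characters can miss.
assert "strawberry".count("r") == 3
```

The difficulty for an LLM lies in how these strings are tokenized, not in the underlying arithmetic.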
On Simon Willison's (co-creator of Django) well-known test of generating a 2D vector image of a pelican riding a bicycle, Grok 3 does not fare as well as Claude, among others. Humor is not its strong point either.
Impressive results for xAI in just a year and a half of activity
In the space of a year and a half, xAI will have reached, if its announced performance is anything to go by, the level of state-of-the-art models. The company officially launched its activities in July 2023. Five months later, it opened its Grok chatbot in beta. The underlying LLM was Grok-1, built by training Grok-0 (a 33-billion-parameter model) and then improving its coding and, already, its reasoning skills. That brought it up to the level of GPT-3.5, and even Claude 2 in language processing.
At the end of that year, the chatbot arrived on X for Premium+ subscribers. In March 2024, xAI published the weights of Grok-1, in a base version dated October 2023, revealing a Mixture-of-Experts (MoE) architecture, in which specialized sub-networks coexist and are activated depending on the query. Released shortly afterward, Grok-1.5 had, at least on paper, improved problem-solving capabilities. At the same time, the context window was extended from 8k to 128k tokens. xAI then added vision.
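The MoE idea can be sketched minimally: a gating network scores the experts for each input, only the top-k are actually run, and their outputs are combined by the normalized gate weights. Everything below (shapes, expert count, linear experts) is an illustrative toy, not Grok-1's actual code.

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Toy Mixture-of-Experts layer: score all experts, run only the
    top-k, and mix their outputs with softmax-normalized gate weights."""
    scores = x @ gate_w                    # one gating score per expert
    top = np.argsort(scores)[-top_k:]      # indices of the k best-scoring experts
    weights = np.exp(scores[top])
    weights /= weights.sum()               # softmax over the selected experts only
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 4, 8
gate_w = rng.normal(size=(d, n_experts))
# Each "expert" here is just a distinct linear map, fixed via a default arg.
experts = [lambda x, W=rng.normal(size=(d, d)): x @ W for _ in range(n_experts)]
y = moe_forward(rng.normal(size=d), gate_w, experts)
assert y.shape == (d,)
```

The appeal of the design is that only the selected experts' parameters are exercised per query, so total capacity can grow without a proportional rise in per-token compute.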
Then, in the summer of 2024, came the Grok-2 beta, for X Premium subscribers. Its Elo on Chatbot Arena (around 1,280) was below that of GPT-4o and Gemini 1.5. Grok was opened up to all users of the social network in December 2024, with the addition of web search and citations, as well as a new image-generation model.