Who’s Behind xAI Grok 3, Elon Musk’s ‘Maximally Truth-Seeking A.I.’


Tesla and SpaceX CEO Elon Musk. Andrew Harnik/Getty Images

On Feb. 18, Elon Musk’s xAI unveiled Grok 3, marking the startup’s foray into the advanced A.I. reasoning race dominated by players like Google (GOOGL), OpenAI, and China’s DeepSeek. In a live stream on X, Musk described Grok 3 as a “maximally truth-seeking A.I.” that prioritizes accuracy even when it challenges political correctness, calling it a “major leap forward in reasoning and computational efficiency.”

Musk claims Grok 3 outperforms top A.I. models including GPT-4o, Gemini 2 Pro and DeepSeek-V3 on internal evaluations and scored over 1,400 points on LMArena’s Chatbot Arena—an open-source A.I. benchmarking leaderboard developed by UC Berkeley’s SkyLab.

Grok 3 introduces a new “Think Mode” feature for real-time problem-solving and “Big Brain Mode” for computation-heavy tasks. A standout feature is DeepSearch, an A.I.-powered research tool designed to rival Google Search and A.I.-search alternatives including OpenAI’s Deep Research, DeepSeek’s Search Mode and Perplexity AI’s Pro Search.

The A.I. model was built by xAI’s elite team of former Big Tech researchers and engineers. Key members include Jimmy Ba, a former student of A.I. pioneer Geoffrey Hinton, and Yuhuai “Tony” Wu, a former researcher at Google DeepMind. Leading the engineering effort is Igor Babuschkin, a former engineer at OpenAI, whom Musk personally recruited to build a ChatGPT rival. Babuschkin previously served as X’s senior director of engineering.

Grok 3 is currently exclusive to members of X Premium+, which costs $40 per month in the U.S. Access to advanced features like DeepSearch and Think Mode reasoning costs an extra $30 per month.

What’s behind Grok 3’s impressive capabilities?

According to xAI’s blog, Grok 3 leverages Test-Time Compute at Scale (TTCS), a specific implementation of test-time scaling, to power its reasoning. This machine learning strategy enables the A.I. model to dynamically adjust computational resources, ensuring higher accuracy for complex queries while maintaining speed for simpler tasks.
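xAI has not published implementation details, but the general idea of test-time scaling can be sketched in a few lines: spend more inference compute on queries judged harder, here by drawing more samples and taking a majority vote (a common technique known as self-consistency). Everything below (the toy model, the length-based difficulty proxy, and the sample counts) is an illustrative assumption, not xAI’s actual method.

```python
import random
from collections import Counter

def toy_model(query, rng):
    """Stand-in for one model sample; a real system would call an LLM.
    'Harder' (longer) queries are answered less reliably per sample."""
    correct = sum(ord(c) for c in query) % 10
    noise = 0.5 if len(query) > 20 else 0.1  # crude difficulty proxy
    if rng.random() < noise:
        return rng.randrange(10)  # a wrong or noisy sample
    return correct

def answer_with_test_time_compute(query, easy_samples=1, hard_samples=25):
    """Allocate more inference-time samples to hard queries, then take a
    majority vote (self-consistency) over the sampled answers."""
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    n = hard_samples if len(query) > 20 else easy_samples
    votes = Counter(toy_model(query, rng) for _ in range(n))
    return votes.most_common(1)[0][0]

print(answer_with_test_time_compute("2+2"))  # cheap path: one sample
print(answer_with_test_time_compute("a much longer, harder question"))  # many samples
```

The key trade-off the sketch captures is the one Grok 3 is said to exploit: extra samples raise accuracy on hard queries but cost proportionally more compute, which is why simple queries take the cheap path.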

Grok 3’s TTCS approach, if combined with dedicated compute clusters for extended reasoning periods, could unlock groundbreaking discoveries, including a potential cure for lung cancer, said Jeetu Patel, chief product officer at Cisco, in a LinkedIn post. “This is the first publicly shared use of Test-time compute cluster at an unprecedented scale, with a reasoning model that is multi-modal and can consume real-time data,” he wrote.

Moreover, Grok 3’s development relied on Colossus, a massive supercomputer cluster built by xAI in Memphis, Tenn. The system houses 200,000 Nvidia H100 GPU accelerators. xAI used 100,000 of those H100 GPUs to train Grok 3, a run that amounted to 200 million GPU-hours, a tenfold increase over the training setup used for Grok 2. Musk said last year that the next generation of the Colossus training cluster will be five times more powerful.
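As a back-of-the-envelope check on those figures (simple arithmetic on the numbers cited above, not a figure from xAI): 200 million GPU-hours spread across 100,000 GPUs implies roughly 2,000 hours, or about 83 days, of wall-clock training.

```python
# Implied wall-clock training time from the cited figures:
# 200 million GPU-hours across 100,000 GPUs.
gpus = 100_000
gpu_hours = 200_000_000
hours = gpu_hours / gpus   # hours per GPU if fully utilized
days = hours / 24
print(hours, round(days, 1))  # 2000.0 83.3
```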

Some are skeptical about the sustainability of xAI’s rapid progress, Musk’s lofty claims about Grok 3, and its ability to compete with A.I.-search rivals like OpenAI’s Deep Research.

“Scaling up computing power raises costs and makes model deployment too expensive; it is an unsustainable approach for business in the long run,” Lin Qiao, a former senior engineering director at Meta and now CEO of the A.I. infrastructure platform Fireworks AI, told Observer.

“Models absorb everything they’re trained on, including biases. I don’t believe an A.I. can ever be truly neutral,” Inna Tokarev Sela, a former machine learning executive at SAP and now CEO of the data intelligence platform Illumex, told Observer. She noted that Grok 3 can match Google Search on general topics—if xAI keeps accelerating training. “When it comes to specialized topics, Google Search doesn’t have an edge over domain-specific A.I. models (Grok), trained on focused data,” she said.

Likewise, early testing of Grok 3 by Andrej Karpathy, a founding member of OpenAI and former senior director of A.I. at Tesla, suggests that while DeepSearch outperforms Google’s Gemini models, it occasionally hallucinates citations and URLs. “The impression I get of DeepSearch is that it’s approximately around Perplexity DeepResearch offering, but not yet at the level of OpenAI’s recently released ‘Deep Research,’ which still feels more thorough and reliable,” he wrote in an X post.
