The significance of Gemini 3: AI has surpassed the "hallucination stage" and is approaching humans; "human-machine collaboration" will shift from "humans correcting AI errors" to "humans guiding AI's work."
Google’s newly released Gemini 3 model marks a key turning point in the field of artificial intelligence.
As WallstreetCN previously reported, on Tuesday, November 18 (Eastern Time), Google officially launched its most powerful and highly anticipated artificial intelligence (AI) model yet: Gemini 3. On the day of release, it was immediately available in Google Search, the Gemini App, and multiple developer platforms, and was integrated into various revenue-generating products.
Google executives emphasized at the press conference that Gemini 3 leads the pack on several popular industry rankings for measuring AI model performance. Demis Hassabis, CEO of Google’s AI research lab DeepMind, stated that Gemini 3 is “the world’s best multimodal understanding model,” and also the company’s most powerful agent and code generation model to date.
According to an in-depth review by Wharton professor Ethan Mollick, Gemini 3 and its companion tool “Antigravity” show astonishing “agent” capabilities. Compared with the GPT-3 model of three years ago, AI no longer just generates text: it can write code, build interactive applications, and execute multi-step tasks.
Mollick points out that this leap from “description” to “action” means AI is evolving from a conversational partner into a general-purpose tool capable of accessing computers and completing real work.
Mollick’s conclusion: we are transitioning from the “chatbot era” to the “digital coworker era.” Gemini 3 is not flawless, but the mistakes it makes are no longer groundless “hallucinations”; they are closer to the errors of judgment or misreadings of intent that humans make.
This indicates that the mode of human-machine collaboration is changing, and humans will no longer simply correct AI’s trivial mistakes, but rather provide it with high-level strategic guidance and direction. This may be the biggest shift in the AI field since the launch of ChatGPT.
The following is the full text of Ethan Mollick’s review article:
Three Years: From GPT-3 to Gemini 3
From Chatbot to Agent
Ethan Mollick
November 19, 2025
I have been testing Google’s newly released Gemini 3 model. It is truly impressive. But instead of listing benchmark numbers, I decided to let AI show, firsthand, how far we’ve come in three years—and that’s what I did.
It’s been nearly three years since ChatGPT launched. Days before its release, I wrote my first article on OpenAI’s earlier GPT-3 for this Substack. Then ChatGPT made its debut, and I immediately followed up with another piece: “I am usually cautious about technological predictions, but this time, it will change our world faster and more profoundly than we expect. It’s no longer just automating repetitive or dangerous work—the first jobs to be changed by AI may be those in analysis, creativity, writing, and communication.”
Looking back today, I was right.
(Illustration: screenshot of my first article after trying GPT-3.5 three years ago)
I could verbally explain how vast the gap is between the original ChatGPT and Google’s new Gemini 3, but it’s unnecessary. I tossed that screenshot above to Gemini 3, and simply wrote one sentence:
Me: “Make something that shows everyone how much AI has improved in three years.”
Gemini: “Of course. Your tweet is from November 2022—the eve of the modern AI boom. Back then the biggest ‘wow’ moment was that AI could write a paragraph of fluent text, or compose a silly poem about a ‘candy-powered superluminal spaceship escaping otters.’
To show the progress, I’ll not just describe things with words, but directly make you an interactive, playable ‘candy-powered superluminal spaceship simulator.’
In 2022, AI could only describe the engine; in 2025, AI can write the engine, design the interface, and let you pilot the spaceship yourself.”
It really made a mini-game: I had to use candy energy to escape the otters chasing me, with silly poems and funny updates along the way. Most readers here probably know AI can write code, but the more interesting question is this: if you stop treating Gemini 3 as a “chat box” and see it as “something else,” what else can it do?
A Tool That Writes Code, But Not “Just a Coding Tool”
Alongside Gemini 3, Google also released Antigravity. Programmers will immediately recognize its similarity to Claude Code or OpenAI Codex—it’s given computer access, and can autonomously write programs under instructions.
If you’re not a programmer, you might overlook it, but that would be a mistake. “Can write code” doesn’t just mean “can be a programmer”; it means “can complete anything you need a computer for.” That changes what these tools fundamentally are.
Gemini 3 is extremely good at writing code, and even if you don't see yourself as a "programmer," this matters to you. A fundamental view in AI is: anything you do on a computer ultimately is code; if AI can write code, it can build dashboards, scrape websites, make PPTs, read files... This makes “code-writing agents” general-purpose tools. Antigravity productizes this idea: I get an “inbox,” assign tasks to agents, and they notify me when they need approval or help.
(Illustration: four agents running at once, one working, one awaiting my feedback)
I don’t use code with them, just English; they use code to work for me. Gemini 3 plans well and knows what to do and when to check in. For example, I put all my past newsletter drafts into one folder and instructed:
“Make me a beautiful webpage that summarizes all my AI predictions, then search online to see which I got right and which I got wrong.”
It read all the files, ran code, and gave me an editable draft. This was the first time it had asked me anything, and its understanding of what I wanted was so accurate that it surprised me. I made a few tweaks and let it proceed.
It then searched the web, built the site, took over the browser to check the result, and packaged the finished product for me. I gave revision feedback like I would to a real coworker, and it kept iterating.
It’s not perfect; the agent isn’t quite there yet. I found no hallucinations, but there were parts I had to correct. However, those mistakes were more like the “errors of judgment or misunderstandings a human coworker might have” than the outlandish hallucinations of traditional AI. Importantly, I felt in control of the AI’s decisions because it regularly checked in to confirm its work, and its process was completely visible. It felt like managing a teammate, not just conversing with AI via chat.
PhD-level Intelligence?
Antigravity isn’t the only surprise. Another shock was—it demonstrated genuine “judgment.”
I often complain that AI benchmarking is a tangled mess. Gemini 3 leads most rankings (it may still lose to the $200 GPT-5 Pro, but that could flip once Gemini 3’s “Deep Think” version is released). The industry loves the marketing phrase “PhD-level intelligence.” I decided to put that to the test.
I fed it a pile of old files from a crowdfunding research project from a decade ago, with file names like “project_final_seriously_this_time_done.xls” and in an ancient STATA format. The only instruction:
“Figure out the data structure yourself, clean up the STATA files, get them ready for new analysis.”
It actually did it: it recovered corrupted data and worked out the complexities of the environment.
Then I gave it a classic “second-year PhD paper” assignment, no further hints:
“Great, now write an original paper with these data. Research the field deeply, make it a contribution to entrepreneurship or strategy theory, do rigorous statistical analysis, and format it for journal submission.”
It chose its own topic, proposed hypotheses, ran statistics, created charts, and formatted the paper. The hardest part, balancing how interesting a topic is against how feasible it is, it handled like a tightrope walker. I vaguely said, “Expand and improve,” and finally got a 14-page paper.
(Illustration: first two pages of the paper)
Even more impressive, it invented its own metric: it used NLP to mathematically compare project descriptions against large datasets, in order to measure how unique a crowdfunding idea was. It wrote the code and checked the result itself.
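The review does not say how Gemini 3 computed this metric. As a rough illustration of the general idea only, the sketch below assumes a TF-IDF and cosine-similarity approach: each description is scored as 1 minus its average similarity to every other description. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `uniqueness_scores`) and the sample descriptions are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a description and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            # Smoothed inverse document frequency keeps every weight positive.
            term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def uniqueness_scores(descriptions):
    """Score each description as 1 minus its mean similarity to all others."""
    vectors = tfidf_vectors(descriptions)
    scores = []
    for i, vec in enumerate(vectors):
        sims = [cosine(vec, other) for j, other in enumerate(vectors) if j != i]
        mean_sim = sum(sims) / len(sims) if sims else 0.0
        scores.append(1.0 - mean_sim)
    return scores


# Hypothetical project descriptions, for illustration only.
projects = [
    "a smart watch that tracks your fitness",
    "a smart watch that tracks your sleep",
    "a board game about medieval otters",
]
scores = uniqueness_scores(projects)
```

With these sample inputs, the board game description scores as the most unique because it shares almost no vocabulary with the two smart watch pitches. A real analysis would likely compare against a much larger reference corpus and might use embeddings rather than word counts.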
So, is this “PhD level?”
If you mean “can do the work of a competent grad student at a top university,” the answer is partly “yes.” But it has grad student flaws: some statistical methods are flawed, theory leaps are big, evidence is not rigorous enough... We have moved beyond the “hallucination” stage and entered a subtler, more “human” zone of deficiencies. What’s interesting is that when I guided it as I would a student (“add more literature on crowdfunding to establish methodology”), it improved dramatically—maybe with more guidance, it could approach true “PhD” level.
What is Gemini 3?
It’s a superb “thinking + execution” partner, instantly available to billions around the world; it’s also a mirror reflecting multiple trends: AI is not slowing down, agents are rising, and humans must learn to manage “smart AI.”
Three years ago, we marveled that “machines can write otter poems”; less than a thousand days later, I’m debating statistical methodology with an agent that built its own research environment.
The chatbot era is giving way to the “digital coworker” era.
True, Gemini 3 is still imperfect and needs a human manager who can direct it and check its work. But “humans in the loop” are evolving from “cleaning up AI’s mess” to “directing AI to do work,” which is perhaps the biggest paradigm shift since the release of ChatGPT.
Easter Egg:
I asked Gemini to “make me a Substack cover image purely with code, and check the size first.” It searched for the specs, drew the image mathematically, and completed everything in one go.
Obligatory warning:
Giving AI agents computer access is risky—it might move/delete files or even leak documents without asking. When these tools are designed for non-programmers, things will be much better; for now, proceed with caution.
Risk Warning and Disclaimer
The market has risks; investment requires caution. This article does not constitute personal investment advice and does not take into account the individual investment goals, financial situation, or needs of specific users. Users should consider whether any opinions, views, or conclusions in this article are appropriate for their particular circumstances. Investing accordingly is at your own risk.