Year of Intelligence
Dec 31, 2025

This is a review of 2025, but also my predictions for 2026 and how everything will change. These are my opinions. I'll be wrong on some of them. But I think it's worth putting stakes in the ground.
Take a moment to appreciate how absurd this year has been.
It's been just over a year since OpenAI dropped o1, the model that introduced "reasoning" as a product category. Fourteen months ago, watching a model think before answering felt like science fiction. Today, GPT-5.2 Pro is research-grade intelligence. Claude Opus 4.5 is the best coding model on the planet, and it's not particularly close. We have models that can use computers, voice AI that's indistinguishable from humans, and enterprise spending on generative AI that went from $1.7 billion to $37 billion in 24 months.
2025 wasn't the year AI arrived - that was 2023. It wasn't the year AI got good - that was 2024. 2025 was the year AI got fast. Fast to improve. Fast to deploy. Fast to change the assumptions we'd built careers on.
Looking back, a few themes defined the year: the efficiency revolution (smaller models, better results), the global race heating up, voice and audio AI crossing the uncanny valley, agents learning to actually do things, and money (so much money) finally flowing from hype into production, even as it fueled talk of a bubble.
Let me walk you through it.
The State of Frontier Models

A year ago, "reasoning models" were a novelty. OpenAI's o1 could think step-by-step, but it was slow, expensive, and felt more like a research preview than a product.
Fast forward to December 2025:
Claude Opus 4.5 from Anthropic is, in my view, the best model available for complex work—especially coding. It doesn't just autocomplete; it architects. It reads your codebase, understands intent, plans multi-file changes, and executes. I've talked to developers who describe working with it less like "using a tool" and more like "pair programming with someone who never gets tired." It's unbeatable right now.
GPT-5.2 Pro represents OpenAI's answer—research-grade intelligence optimized for deep analysis, long-context reasoning, and scientific work. It's iterative rather than revolutionary compared to earlier GPT-5 releases, but the ceiling keeps rising.
Gemini 3 (November) showed Google isn't out of the race. Their multimodal capabilities—especially for visual reasoning—remain best-in-class in some benchmarks.
And then there's Grok 4 and 4.1 from xAI, which iterated faster than anyone expected. Say what you will about Musk—his team shipped.
The pattern that emerged: release cycles compressed dramatically. What used to take 12-18 months now takes 3-6. The "next big model" stopped being an event and started being a regular occurrence.
My prediction for 2026: This pace continues, but the gains feel smaller. I think we're approaching diminishing returns on the current paradigm. Something architecturally new will emerge—not just "bigger transformer." The pressure for efficiency will force real innovation.
Tiny but mighty intelligence
For years, the playbook was simple: more parameters, more data, more GPUs, better models. 2025 broke that assumption.
The "scaling laws" that defined the GPT era might have started showing diminishing returns at the frontier. Meanwhile, smaller models kept getting dramatically better. Alibaba's Qwen3 family and Google's Gemma 3n proved that a 7B-parameter model in 2025 could outperform a 175B-parameter model from 2024.
This isn't academic. It means:
- AI running on your phone
- AI in your browser, no API calls
- AI on edge devices, in cars, in appliances
- Dramatically lower inference costs
Llama 4 (Meta, April) and Qwen3 (Alibaba, April) both delivered flagship performance with fully open weights. The moat around closed-source models narrowed considerably. If you're building an AI product today, you have real options beyond OpenAI and Anthropic.
My prediction for 2026: On-device AI becomes standard. Apple finally partners with a major lab—probably Anthropic or OpenAI—to ship a Siri that actually works. LLM-powered, on-device, private. The "your phone is smart" era finally arrives, years after it was promised.

The DeepSeek moment
January's biggest story wasn't a U.S. lab.
DeepSeek-R1 appeared on arXiv with a paper titled "Incentivizing Reasoning Capability in LLMs via Reinforcement Learning." The model demonstrated frontier-class reasoning at dramatically lower cost, using techniques that emphasized efficiency over brute-force scale. It was open-weight. And it came from China.
The response was seismic. Within days, Nvidia's market cap shed hundreds of billions. By March, the U.S. government had floated banning DeepSeek entirely. The lab got hit with cyberattacks so severe they had to limit signups.
The comfortable assumption that American labs held an insurmountable lead is not completely gone, but it is definitely under threat.
By August, DeepSeek released V3.1. The gap hadn't just closed—it had inverted in some domains.
The takeaway: Compute isn't everything. Algorithmic innovation can leapfrog hardware advantages.
My prediction for 2026: China accelerates further. Alibaba, Baidu, ByteDance, and new entrants close remaining gaps. The "two-track" AI world becomes entrenched—separate ecosystems, separate supply chains, separate regulatory regimes. DeepSeek specifically becomes a household name.

Voice and Audio
ElevenLabs had a 2025 that other startups only dream about.
Eleven v3 launched as the most expressive text-to-speech model ever publicly available—with audio tags, multi-speaker dialogue, emotion control, and support for 70+ languages. For the first time, synthetic voice became genuinely difficult to distinguish from human recording in blind tests. Not "pretty good for AI." Indistinguishable.
They raised $180M at a $3.3B valuation and kept shipping:
- Scribe (February): Speech-to-text competitive with Whisper
- ElevenReader Publishing (December): Authors generating and selling AI audiobooks
- ElevenLabs Agents (November): Conversational AI platform, rebranded and expanded
- Iconic Voice Marketplace: Licensed voices from recognizable figures
But it wasn't just generation—transcription and dictation hit a new level too.
Tools like SuperWhisper and Wispr Flow made voice-to-text so accurate and fast that I know developers who've stopped typing entirely for first drafts. The friction of "speaking to your computer" disappeared. You talk, it transcribes, it's accurate. Done.
This matters more than people realize. Voice is the interface layer most humans actually prefer. We've been typing because we had to, not because we wanted to.
My prediction for 2026: Voice-first workflows become mainstream for knowledge workers. Not everyone—but a meaningful chunk of emails, documents, and code comments get dictated rather than typed. The tooling catches up to the capability.
AI Hardware: Waiting for the iPhone Moment
Here's an uncomfortable truth: we still don't know what AI hardware should look like.
2024 and 2025 saw a parade of attempts. Most of them flopped. The ones that didn't are still... limited.
The Humane AI Pin launched in April 2024 with breathless hype—a screenless, voice-first wearable that would replace your phone. It shipped to brutal reviews. The projector was unreadable in sunlight. The battery died in hours. The AI was slow. It felt like a $700 proof-of-concept that escaped from the lab too early.
Rabbit R1 had a similar arc. A cute orange device with a physical scroll wheel, promising an "AI-native" interface. People bought it, played with it for a week, and put it in a drawer. Turns out "talk to a box" isn't a compelling interaction model when your phone already does it better.
The pendant wave emerged next—Limitless (formerly Rewind), Tab, Friend, Plaud Note. The pitch: wear a microphone, record everything, let AI summarize your life. Some of these are genuinely useful for meeting notes. But let's be honest: they're glorified microphones with a cloud subscription. The intelligence isn't in the device; it's in a server somewhere processing your audio.
Meta's Ray-Ban smart glasses are probably the closest thing to a hit. They look like normal glasses. The camera and audio quality are decent. The AI integration (via Meta AI) is improving. But they're still primarily a camera and speakers you wear on your face—not a new computing paradigm.
Brilliant Labs' Frame glasses attempted something more ambitious—a heads-up display with AI integration. Interesting concept, but early reviews suggest the display is hard to see and the use cases remain niche.

Here's the thing: this is exactly what computing looked like before the iPhone.
Remember the Apple Newton? The Palm Pilot? Windows Mobile phones with styluses? BlackBerries with their tiny keyboards? Each one was a genuine attempt to make portable computing work. Each one had passionate users. And each one was a stepping stone to something that hadn't been invented yet.
We didn't know we needed a capacitive touchscreen with no keyboard until Steve Jobs showed us. We didn't know we needed an App Store until it existed. The form factor, the interaction model, the ecosystem—it all had to be invented together.
AI hardware is in that pre-iPhone phase right now. We know the ingredients exist: powerful on-device models, great microphones and cameras, always-on connectivity, voice interfaces that actually work. But nobody has assembled them into a product that feels inevitable.
The current attempts share a common flaw: they're AI-as-accessory, not AI-as-primary-interface. The Pin and R1 tried to replace the phone and failed. The pendants and glasses augment the phone but don't transform anything. I think we're waiting for someone to figure out the right combination.
Maybe it's glasses that feel like glasses but see what you see and know what you need. Maybe it's earbuds that are always listening (with consent) and always helpful. Maybe it's something entirely different—a form factor we haven't imagined yet.
My prediction for 2026: We get more Newtons. More interesting failures. More "almost but not quite" products. The breakthrough probably doesn't come until 2027 or 2028. But when it does, it'll be obvious in retrospect—and everyone will wonder why it took so long.
The company that figures this out might be Apple (finally shipping useful AI glasses?), Meta (iterating on Ray-Bans), or a startup we haven't heard of yet. My gut says it won't be the incumbents—they're too committed to existing form factors.
Until then, we're in the experimentation phase. And honestly? That's fine. The Newton had to exist for the iPhone to exist.
Physical Intelligence: Robots Learning to Learn
2025 was the year robotics companies started betting big on the data flywheel. 1X began selling their robots, and others followed suit, all chasing the promise that more deployed robots means more data means better policies. There's been plenty of skepticism—learning robust robot policies still seems genuinely hard, and data scarcity remains a real bottleneck.
But Physical Intelligence's work this year makes me cautiously optimistic that there's a way out. π*0.6 directly addresses the core limitation of imitation learning: policies trained on demonstrations alone struggle to recover from their own mistakes, leading to compounding errors. By combining demonstrations with corrections and reinforcement learning from autonomous experience, they've shown robots can actually improve through practice—more than doubling throughput on hard tasks like espresso-making and box assembly.
Even more exciting is their work on human-to-robot transfer. The finding is striking: as you scale up the diversity of robot pre-training data, the model's ability to leverage human egocentric video emerges without any explicit transfer mechanism. The latent representations naturally align. This suggests we might not need perfectly curated robot demonstrations for everything—scaled-up foundation models could learn to extract useful signal from the massive amounts of human video that already exist.
My prediction for 2026: If these two ingredients—learning from experience and emergent human-to-robot transfer—continue to improve with scale (along with many others that I have no idea about), 2026 might finally have all the pieces required for robotics' GPT moment.

Agents Learn to Use Computers
The most underrated story of 2025 wasn't a model release—it was models learning to do things.
Claude Code from Anthropic emerged as perhaps the most transformative developer tool of the year. Not just autocomplete or chat—actual agentic coding. The model reads your codebase, plans changes, executes them, runs tests, and iterates. Developers I've talked to describe productivity gains that felt less like "assistance" and more like "multiplication."
Cursor's Composer followed a similar trajectory, evolving from a clever IDE feature into something closer to a junior engineer you could direct with natural language.
And then there's computer use—models that can see your screen, move your mouse, click buttons, and navigate UIs the way a human would. Anthropic shipped this capability with Claude. Startups like Orgo AI (founded 2025) built entire products around "computers for AI agents"—cloud desktops designed for agent fleets to operate.
The Manus Moment: Why Environments Matter
And then, just yesterday, Meta acquired Manus for over $2 billion.
If you haven't been paying attention to Manus, here's why this matters: they figured out that intelligence alone isn't enough. You need environments.
Manus launched in March 2025, just weeks after DeepSeek shocked the world. Their demo videos showed an AI agent doing things that felt qualitatively different from ChatGPT—screening job candidates from a ZIP file of resumes, planning entire vacations, analyzing stock portfolios, building websites. Not just answering questions. Completing work.
The key insight: Manus gives each task its own virtual computer—a cloud-hosted VM where the agent can browse, click, type, download files, and execute multi-step workflows without human supervision. They've processed over 147 trillion tokens and spun up 80 million virtual computers since launch. They hit $125 million in annual recurring revenue in under a year.
Meta struck the deal in about 10 days. They're integrating Manus into Facebook, Instagram, WhatsApp, and Meta AI. The team of ~100 will report to Meta's COO.
Here's what I think this signals: the next wave of AI isn't just about smarter models—it's about better execution environments. The model is the brain, but it needs hands, eyes, and a workspace. Manus built that layer. Now Meta owns it.
The geopolitics are interesting too. Manus was founded in Beijing in 2022, moved to Singapore this year, and was backed by Tencent and Sequoia China. Meta is severing all Chinese ownership ties post-acquisition. The U.S.-China AI divide just got another data point.
My prediction for 2026: Claude Code and Cursor become standard developer tools. Not optional power-user features, but default workflow. MCP (Model Context Protocol) and similar standards mature. Tool-calling gets faster and more reliable. "Agentic workflow" stops being a buzzword and starts being how work gets done.
My prediction for 2026: Agentic security becomes a field. As more actions online are taken by agents rather than humans, identity verification, rate limiting, and abuse prevention need fundamental rethinking.
METR: The Exponential Curve of Coding Capability
If there's one trend that's held most clearly this year, it's the METR long-horizon task eval for coding capabilities. The length of software tasks that models can complete has indeed been doubling roughly every 196 days, with Opus 4.5 sitting at the frontier at ~5 hours.
After o3 and GPT-5, I wasn't immediately confident the doubling would continue. Exponentials have a way of hitting walls. But Claude Opus 4.5 and GPT-5.2-Codex have made me a strong believer again—progress in coding capabilities is tracking the exponential remarkably well, and there's no obvious ceiling in sight.
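To make the doubling claim concrete, here is a minimal sketch that extrapolates the trend. It assumes the two figures quoted above (a 196-day doubling period and a ~5-hour frontier horizon) and assumes the exponential simply continues, which is exactly the open question. The function names are illustrative, not part of any METR tooling.

```python
import math

DOUBLING_DAYS = 196    # doubling period quoted in the text
FRONTIER_HOURS = 5.0   # approximate frontier task horizon quoted in the text


def projected_horizon(days_ahead: float,
                      start_hours: float = FRONTIER_HOURS,
                      doubling_days: float = DOUBLING_DAYS) -> float:
    """Task horizon in hours after `days_ahead` days of steady doubling."""
    return start_hours * 2 ** (days_ahead / doubling_days)


def days_until(target_hours: float,
               start_hours: float = FRONTIER_HOURS,
               doubling_days: float = DOUBLING_DAYS) -> float:
    """Days needed to reach `target_hours` if the trend holds unchanged."""
    return doubling_days * math.log2(target_hours / start_hours)


if __name__ == "__main__":
    print(f"Horizon one year out: {projected_horizon(365):.1f} hours")
    print(f"Days to a 40-hour horizon: {days_until(40):.0f}")
```

Under these assumptions, a 40-hour horizon (roughly a work week of human effort) is three doublings away, so 588 days. Any slowdown or speedup in the doubling period shifts that date proportionally.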

For research, the doubling of task horizon means we can run many more experiments, relieving the bottleneck of writing research code. Hypothesis testing is going to be dramatically sped up. Fully automating research remains a fever dream. It still might truly be "the age of research," just not in the way people expected.
Image Generation: The Year Aesthetics Met Accuracy
Image generation had a massive 2025—bigger than I expected. The tools got better, the trends went viral, and the battle lines between "beautiful" and "accurate" started to blur.

The Ghibli Moment
The cultural moment of the year came in March when OpenAI launched GPT-4o image generation—native image creation built directly into ChatGPT. Within 24 hours, the internet was flooded with Studio Ghibli-style portraits.
Everyone did it. Sam Altman changed his profile picture to a Ghibli-fied version of himself. Memes, historical photos, family portraits—all transformed into that soft, hand-painted aesthetic from Spirited Away and My Neighbor Totoro. The demand was so intense that Altman posted "our GPUs are melting" and they had to delay the free tier rollout.

The irony wasn't lost on anyone: Hayao Miyazaki, Studio Ghibli's co-founder, famously called AI-generated animation "an insult to life itself" back in 2016. Now his studio's style was being replicated at scale by millions of people who'd never touched a paintbrush.
But here's what the Ghibli trend actually showed: style transfer finally works. GPT-4o could take any image you uploaded and convincingly render it in a specific aesthetic—something that previous models struggled with. The photorealism, the prompt-following, the consistency across multiple generations—it was a step change.
The Landscape: Who's Winning What
Midjourney still owns pure aesthetics. If I want something beautiful—concept art, illustration, that ineffable "vibe"—Midjourney V7 remains the gold standard. The images feel designed, not generated. There's a reason professional artists still reach for it first.
But Midjourney has a weakness: text and logos. Ask it to render "Welcome to San Francisco" on a sign, and you'll get something that looks vaguely like letters arranged in a pleasing way. Ask for a logo with specific typography, and you'll spend an hour re-rolling and still end up in Illustrator. It's averaging shapes, not understanding language.

Nano Banana (Gemini 2.5 Flash Image) launched in August and immediately became a viral sensation—the "3D figurine" trend that took over Instagram came from people turning selfies into toy-like renders. Over 200 million image edits in weeks. But the real story was the editing. Upload a photo, describe a change in natural language, and Nano Banana just... does it. Change the background. Swap outfits. Blend two photos together. It maintains subject consistency in a way that felt like magic.
Then in November, Google dropped Nano Banana Pro (Gemini 3 Pro Image), and something clicked: diagrams and infographics that actually make sense. Because the model connects to Google Search for real-world knowledge, you can ask for a chart, a map, a technical diagram, and it gets the facts right. Students started using it to generate complex infographics from raw data. Professionals used it for presentations and training materials. Text rendering in multiple languages. Accurate typography. The "practical work" use case—mockups, posters, data visualization—finally had a champion.
Imagen 3 and Imagen 4 round out Google's arsenal for photorealism and enterprise use. Better text rendering, fewer hallucinations (hands, eyes, weird artifacts), and the kind of professional polish that makes them genuinely usable for marketing and product work.
FLUX.2 from Black Forest Labs emerged as the dark horse. Their multi-reference control is genuinely impressive—you can reference up to 10 images simultaneously and maintain character consistency across generations. Product placement, brand-accurate color matching (hex codes!), complex typography that actually works. FLUX.2 Pro and FLUX.2 Max are production-grade in a way that matters for commercial work: 4MP output, sub-10-second generation, and the kind of spatial reasoning that makes product photography at scale actually viable. The open weights (FLUX.2 Dev) also mean it's become a favorite for fine-tuning and custom deployments.
My prediction for 2026: The gap between beautiful and accurate closes. Design understanding improves dramatically across all models. Text rendering becomes table stakes; everyone will do it well. And crucially, all major models will achieve Midjourney-level aesthetics and customization. The moat around "beautiful images" erodes. The differentiation shifts to speed, control, and integration. The components that make up an image will also become separable and editable by hand (as in Qwen Image Layered).
Midjourney either evolves into a full creative suite or gets squeezed. Meanwhile, image generation gets built directly into design tools like Figma and Canva. Adobe starts to feel redundant.
My prediction for 2026: We will have really good SVG generation. Not just icons and illustrations, but detailed animations that websites can use directly.
The Money: AI Eats Venture Capital
Let's talk numbers.
AI captured nearly 50% of all global venture funding in 2025—up from 34% in 2024. Total invested: $202.3 billion, versus $114 billion the year before.
Foundation model companies alone raised $80 billion (~40% of all AI funding), more than doubling 2024's $31 billion. The poster child: SoftBank's $40B investment into OpenAI, valuing the company at $300 billion—with secondary market chatter suggesting $500B.
Geography concentrated further. 79% of global AI funding ($159B) went to U.S. companies, with the San Francisco Bay Area claiming $122 billion.
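The shares and multiples in this section can be sanity-checked with a few lines of arithmetic. The sketch below uses only the dollar figures quoted above (in billions). The helper names are illustrative.

```python
# Figures quoted in the text, in billions of US dollars.
AI_FUNDING_2025 = 202.3
AI_FUNDING_2024 = 114.0
FOUNDATION_2025 = 80.0
FOUNDATION_2024 = 31.0
US_FUNDING_2025 = 159.0
BAY_AREA_2025 = 122.0


def share(part: float, whole: float) -> float:
    """Return part / whole as a percentage."""
    return 100.0 * part / whole


def growth_multiple(new: float, old: float) -> float:
    """Return how many times larger `new` is than `old`."""
    return new / old


if __name__ == "__main__":
    print(f"Foundation models, share of AI funding: "
          f"{share(FOUNDATION_2025, AI_FUNDING_2025):.1f}%")
    print(f"U.S. share of AI funding: "
          f"{share(US_FUNDING_2025, AI_FUNDING_2025):.1f}%")
    print(f"Bay Area share of AI funding: "
          f"{share(BAY_AREA_2025, AI_FUNDING_2025):.1f}%")
    print(f"Total AI funding growth: "
          f"{growth_multiple(AI_FUNDING_2025, AI_FUNDING_2024):.2f}x")
    print(f"Foundation model funding growth: "
          f"{growth_multiple(FOUNDATION_2025, FOUNDATION_2024):.2f}x")
```

The quoted percentages round consistently with the dollar amounts, and the Bay Area figure alone works out to roughly 60% of all global AI funding.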
But here's what separates 2025 from previous hype cycles: enterprises actually started spending.
According to Menlo Ventures, companies spent $37 billion on generative AI in 2025—up from $11.5B in 2024 and just $1.7B in 2023. That's 22× growth in two years.
This isn't speculative infrastructure betting. This is demand. Companies deploying AI in production, paying for API calls, buying seats.
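The 22× figure is worth unpacking year by year. This is a small sketch using only the three spend figures quoted above. The function names are illustrative, and the annualized number is a simple geometric mean, not a figure from the Menlo Ventures report.

```python
# Enterprise generative-AI spend quoted in the text (billions of USD).
SPEND = {2023: 1.7, 2024: 11.5, 2025: 37.0}


def year_over_year(spend: dict[int, float]) -> dict[int, float]:
    """Map each year to its growth multiple over the previous year."""
    years = sorted(spend)
    return {y: spend[y] / spend[prev] for prev, y in zip(years, years[1:])}


def annualized_multiple(spend: dict[int, float]) -> float:
    """Geometric-mean yearly growth multiple from first to last year."""
    years = sorted(spend)
    total = spend[years[-1]] / spend[years[0]]
    return total ** (1 / (years[-1] - years[0]))


if __name__ == "__main__":
    for year, multiple in year_over_year(SPEND).items():
        print(f"{year}: {multiple:.1f}x over the prior year")
    print(f"Two-year total: {SPEND[2025] / SPEND[2023]:.1f}x")
    print(f"Annualized: {annualized_multiple(SPEND):.1f}x per year")
```

Run on these figures, the two-year multiple comes out just under 22×, with most of it front-loaded: spend grew faster from 2023 to 2024 than from 2024 to 2025, even though the absolute dollar increase was larger in the second year.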
The Jaggedness of Intelligence
For all the progress in coding and math capabilities this year, there's been a striking lag in how quickly models get adopted and prove useful out-of-the-box for real-world tasks. Models that can answer PhD-level questions still struggle to count how many times two lines cross.
I think as models get larger and more capable, this jaggedness is only going to get more pronounced, not less. The spikes will get spikier (superhuman at verifiable tasks like code and math) while the hollows persist in messier domains where reward signals are hard to define and context is hard to paste into a context window.

The raw intelligence is there, but extracting value requires scaffolding that's adapted to specific environments, whether that's agent frameworks or domain-specific integrations. Anthropic's Project Vend is a good benchmark for this: steady progress, but the failures remain confusingly basic.
My prediction for 2026: We'll see a lot more energy going into the "last mile" problem—not making models smarter, but making them actually usable. Startups that figure out how to smooth the jagged edges for specific domains will capture a lot of value.
My Predictions for 2026
I've scattered predictions throughout this piece, but here they are consolidated. These are raw, unhedged takes. I'll revisit them next December.
Models & Architecture
- A new architecture emerges that isn't just "better transformer." Efficiency pressure forces real innovation.
- GPT-6, Claude 5, and Gemini 4 will all still be marginal improvements over what we have today in terms of innovation.
- Cursor ships image/design generation alongside code. Composer 2 becomes the most-used model on the platform. Potentially very good SVG generation.
- Enterprise-specific models proliferate. Companies train on their own workflows, logs, and internal data.
- Unfortunately I do not see open source models catching up to SOTA models.
Agents & Tools
- Claude Code and Cursor become standard, not optional. Agentic workflows go mainstream.
- MCP and tool-calling get meaningfully faster and more reliable.
- Agentic security/identity becomes a funded category.
- Vibe coding crosses the quality threshold. AI-generated code stops being "good enough" and starts being better—more secure, better tested, fewer bugs born from human fatigue or shortcuts. Shipping velocity increases across the board. The scope expands beyond static Next.js apps into complex backends, distributed systems, mobile, infra. "Software is a solved problem" becomes less of a meme and more of a directional truth. The remaining hard problems get smaller and more specialized.
- METR coding capabilities continue doubling every ~196 days, but gains won't come purely from scaling test-time compute with Reinforcement Learning. The exponential curve reshapes hiring for software roles—the repercussions start being felt in 2026. Better coding models don't mean we'll need more engineers to write more code. The bottleneck shifts instead.
- The "last mile" problem gets more attention—not making models smarter, but making them actually usable. Startups that figure out how to smooth the jagged edges of intelligence for specific domains will capture a lot of value.
Platforms & Products
- Apple partners with a major lab for real on-device LLM-powered Siri. Finally.
- Perplexity AI gets acquired or fades—squeezed between Google and ChatGPT.
- Low-code platforms (Lovable, Replit Agent) struggle as users get comfortable with Claude Code and Cursor directly. The abstraction layer becomes less necessary.
Hardware & Physical AI
- More AI hardware Newtons. Interesting failures. The iPhone moment is probably 2027-2028.
- Autonomous vehicles expand. Waymo hits more cities. A major OEM acquires or partners with an autonomy startup like Comma.ai.
- Robotics gets real—not humanoid demos, but practical deployments. Matic vacuums, warehouse bots, agricultural automation. Maybe something intelligent from Dyson that ships to major households.
- If learning from experience and emergent human-to-robot transfer continue to improve with scale, 2026 might finally have all the pieces required for robotics' GPT moment.
Labor & Society
- Junior developer/analyst roles compress dramatically. AI doesn't replace seniors—it replaces the tasks juniors learned on.
- Real economic research on AI's labor impact finally emerges—actual data, not vibes. Expect 2-3 major papers from MIT, Stanford, or IMF economists.
- The junior pipeline problem becomes visible: big companies quietly stop hiring entry-level roles, then realize in 2-3 years they have no one to replace departing seniors.
- Senior devs who refuse to adopt AI tools fall behind—not because AI is smarter, but because AI-native colleagues ship 3x faster. Adapt or become expensive for what you deliver.
- Students who've been vibe coding since freshman year enter the workforce with a completely different relationship to the craft. Less precious about code. Faster iteration. They treat AI as a collaborator, not a threat.
- Startups explode in quantity, collapse in quality. Barrier to building drops to near-zero. The graveyard fills faster than ever.
- Counterintuitively, startups become the primary employers of junior devs—they need speed over polish and will bet on AI-augmented juniors who can vibe code to an MVP. Big companies won't hire juniors; scrappy ones will.
- The hiring conversation shifts from "will AI take jobs?" to "what do we train people for?"
- "Vibe coding" matures into something more reliable. Better feedback loops, better debugging, better design integration.
Geopolitics
- China accelerates. DeepSeek and GLM become household names (especially with z.ai going public on Jan 8, 2026). The two-track AI world entrenches.
2025 wasn't the year of one breakthrough. It was the year of compounding—where advances in reasoning, voice, agents, and efficiency all reinforced each other, and where the money finally followed the hype.
Fourteen months ago, o1 was novel. Today, Opus 4.5 writes better code than most humans. That's the pace we're on.
The question for 2026 isn't whether AI will keep improving—it will. The question is whether our institutions, businesses, and mental models can adapt as fast as the technology demands.
If 2024 was when AI became real, 2025 was when it became fast. 2026 is when it becomes mainstream.
Acknowledgements
Huge thanks to Marmik Chaudhari for proofreading, editing, and METR insights.
Thanks as well to Devanshi Gupta, Eric Farrall, Ishaan Narang, Idhant Gulati, Dhruva Nagesh, Dhruv Joshi, Manit Garg, Neelima Bayyapu, Claude Opus 4.5, and GPT-5.2 Pro for their help with review, proofreading, research, and editing.