From Autocomplete to Operator
How AI Crossed the Viability Threshold
You can’t position for the future if you misunderstand the present. This is the essential catch-up: what actually matured in the AI stack since 2022 and why it matters now.
In November 2022, ChatGPT went public. It felt like magic.
Around the same time, my daughter was born. I remember uninstalling X for a while because it felt like two different inflection points were happening at once — one personal, one technological.
Back then, AI felt like a clever toy. A better autocomplete. A smarter Google.
Fast forward to today. AI can:
Write and execute code.
Query databases.
Use APIs.
Plan multi-step workflows.
And increasingly, correct itself.
And that shift did not happen gradually. It happened because several layers of the stack matured at once.
If you run a business, this matters.
I wrote and researched this piece (with AI) mainly for myself, as a business owner and consultant, to wrap my head around what has changed so far and what is likely to be affected going forward.
This piece walks through what actually happened between 2022 and 2026.
Not the hype version. The structural, factual version. Simplified.
So you can understand what is likely to be impacted (Part 2) — and position yourself accordingly (Part 3).
Phase 1: “It’s Just a Chatbot”
(November 2022 – Mid 2023)
Trigger phrase: “Let’s play with this.”
When ChatGPT launched, it felt magical.
You typed a question. It responded fluently. It followed instructions.
What made it different wasn’t raw intelligence. The transformer architecture had existed since 2017. What changed was reinforcement learning from human feedback (RLHF) — meaning the model was trained to behave in a conversational, helpful way.
Suddenly, AI wasn’t an API in a research lab.
It was a chat window.
And entrepreneurs — myself included — started testing it against real work.
At this stage, most people thought:
“This is a smarter Google.”
It wasn’t. But it looked like one.
Meanwhile, open-weight models started emerging. Baseline language models became commoditized quickly. The foundation layer started flattening.
Phase 2: “Make It Bigger”
(Mid 2023 – Early 2024)
Trigger phrase: “What if we just give it more context?”
The next obsession was memory.
Context windows expanded from thousands of tokens to hundreds of thousands — and eventually into the million-token range.
On paper, that meant you could:
Upload entire codebases
Paste in financial data rooms
Feed hours of transcripts
It felt like the memory problem was solved.
But here’s the nuance most people missed:
Bigger memory ≠ better reasoning.
Research showed that models recall best from the beginning and end of long inputs — and struggle with information buried in the middle.
In other words:
A bigger desk doesn’t make you more organized. It just gives you more room to create a mess.
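The "lost in the middle" finding is typically measured with needle-in-a-haystack probes: plant one key fact at different positions in a long document, then ask the model to recall it. A minimal sketch of how such probes are constructed (the needle text, filler, and sizes here are illustrative, not from any published benchmark):

```python
def make_probe(needle: str, filler: str, position: str, total_chars: int = 20_000) -> str:
    """Build a long-context recall probe with the key fact ("needle")
    placed at the start, middle, or end of padding text."""
    pad = (filler * (total_chars // len(filler) + 1))[:total_chars]
    if position == "start":
        return needle + pad
    if position == "middle":
        half = total_chars // 2
        return pad[:half] + needle + pad[half:]
    if position == "end":
        return pad + needle
    raise ValueError(f"unknown position: {position!r}")

# You would send each probe plus a question ("What is the access code?")
# to the model and score recall per position. The research result is that
# accuracy dips when the needle sits in the middle.
probe = make_probe("The access code is 8241. ", "Lorem ipsum dolor sit amet. ", "middle")
```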
Phase 3: “Stop Stuffing. Start Using Tools.”
(Late 2023 – 2024)
Trigger phrase: “Can it actually do the thing?”
This is where things get interesting.
Instead of stuffing more data into the model, researchers and labs started asking:
What if the model doesn’t need to know everything? What if it just needs to use tools?
Early systems showed models could search the web and cite sources. Research demonstrated that models could learn when to call APIs. Architectures emerged that interleaved reasoning and action.
Translated into business terms:
Before:
AI describes how to solve a problem.
After:
AI executes parts of the solution.
That’s a structural shift.
Tool calling allowed models to output structured requests — which your software executes deterministically — and then feed the results back into the model.
Instead of:
Calculating in-text (error-prone)
Parsing raw logs
Guessing database answers
The model can:
Write SQL
Call a calculator
Execute code
Retrieve precise data
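The mechanics of that loop are simple to sketch. In a minimal form, the model emits a structured request, your software executes it deterministically, and the result goes back into the model's context. The tool names and the `{"tool": ..., "input": ...}` request shape below are illustrative, not any vendor's actual tool-calling API:

```python
import json

# Deterministic tools the model is allowed to call. In a real system these
# would be your database, calculator, internal APIs, etc.
TOOLS = {
    # eval with builtins stripped is fine for a demo;
    # never do this with untrusted input in production.
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def run_tool_call(model_output: str) -> str:
    """Parse the model's structured request, execute it deterministically,
    and return the result to append back into the model's context."""
    request = json.loads(model_output)  # e.g. {"tool": "calculator", "input": "19 * 23"}
    return TOOLS[request["tool"]](request["input"])
```

The key property: the arithmetic happens in your code, not in the model's text, so the answer is exact rather than a plausible-looking guess.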
Phase 4: “Let It Think Longer”
(Late 2024)
Trigger phrase: “What if we give it time?”
Instead of answering immediately, new model architectures were trained to “think” internally before responding — allocating more compute at inference time.
This concept is called test-time compute.
In simple terms:
Earlier models were reflexive. These models deliberate.
And the results were dramatic on reasoning benchmarks.
But here’s the nuance again:
This doesn’t mean the base intelligence exploded exponentially. Some researchers argue it’s a smarter search over existing knowledge.
Still — in practice — it improved:
Multi-step math
Logical deduction
Coding reliability
Business translation:
You can now pay for “slow thinking” when it matters. Just like hiring someone who doesn’t answer instantly, but answers correctly.
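One common, simplified form of spending extra inference-time compute is best-of-N sampling: draw several candidate answers and keep one that passes a cheap check. The sketch below stubs the model with an unreliable sampler; everything here (the sampler, the verifier, the question) is a hypothetical stand-in, not how any lab's reasoning models actually work internally:

```python
import random

def sample_answer(question: str, rng: random.Random) -> int:
    # Stand-in for one stochastic "reasoning path" from a model:
    # correct only some of the time, to mimic an unreliable sampler.
    return 437 if rng.random() < 0.4 else rng.randint(400, 500)

def verify(question: str, answer: int) -> bool:
    # Cheap deterministic check (here: recompute the arithmetic).
    return answer == 19 * 23

def best_of_n(question: str, n: int, seed: int = 0) -> int:
    """Spend more compute at inference time: draw n candidates and
    return the first that passes verification (else the last draw)."""
    rng = random.Random(seed)
    candidates = [sample_answer(question, rng) for _ in range(n)]
    for c in candidates:
        if verify(question, c):
            return c
    return candidates[-1]
```

Note where the leverage comes from: more samples only help because verification is cheap, which foreshadows the bottleneck discussed below.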
Strategic takeaway:
Inference became a new scaling axis. Compute is no longer just about training. It’s about thinking.
Phase 5: “Viability Threshold Crossed”
(2025)
Trigger phrase: “Okay… this actually works.”
This is where things started to feel different.
Several developments converged:
Tool calling matured
Reasoning modes stabilized
Benchmarks shifted from trivia to execution
Coding agents moved from novelty to viability
Now — let’s be precise.
Coding is not universally “solved.”
But it crossed a viability threshold for many structured tasks.
We moved from:
“This is impressive.”
to:
“We can build workflows on this.”
And that’s the inflection.
The Hidden Constraint: It’s No Longer Model IQ
At this point, something subtle happened. The bottleneck shifted.
It is no longer:
Can the model do it?
Is the context large enough?
Can it reason?
It is now:
Can we design reliable systems around it?
Can we verify cheaply?
Can we integrate into messy organizations?
Can we control blast radius?
Because here’s the uncomfortable truth: agentic systems fail in new ways:
Infinite loops
Silent hallucinations
Tool misuse
Security vulnerabilities
More agency = more blast radius.
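Controlling that blast radius is mostly unglamorous engineering. A minimal sketch of two cheap guardrails for an agent loop, an iteration cap (against infinite loops) and a tool allowlist (against tool misuse); the tool names and `step_fn` callback are hypothetical:

```python
MAX_STEPS = 8                            # hard cap: prevents runaway agent loops
ALLOWED_TOOLS = {"search", "sql_read"}   # allowlist: read-only tools limit blast radius

def run_agent(step_fn, is_done):
    """Run an agent loop under two guardrails: a step cap and a tool allowlist.
    step_fn stands in for a model call returning the next action."""
    history = []
    for _ in range(MAX_STEPS):
        action = step_fn(history)
        if action["tool"] not in ALLOWED_TOOLS:
            raise PermissionError(f"tool {action['tool']!r} not allowed")
        history.append(action)
        if is_done(history):
            return history
    raise TimeoutError("agent hit step cap without finishing")
```

Neither guardrail makes the agent smarter; both make its failures bounded and visible, which is the actual engineering problem.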
Intelligence is now cheap. Verification is expensive.
Better ≠ perfect. Capability ≠ reliability.
So Why Does It Feel Different Now?
Because five layers matured at once:
Usable interfaces
Large-enough memory
Reliable tool invocation
Inference-time reasoning
Standardization of integrations
The stack stabilized.
And when a stack stabilizes, applications explode.
We are entering that phase.
Overarching Lesson: Intelligence Is Becoming a Commodity
Let me steelman the skeptics for a second.
Yes:
Long context still degrades.
Agents are brittle without guardrails.
Security risks are real.
All true. And yet.
The cost of usable intelligence has collapsed.
And whenever a constraint collapses, value shifts elsewhere.
When storage became cheap, databases won.
When bandwidth became cheap, streaming won.
When compute became cheap, SaaS won.
When intelligence becomes cheap…
Orchestration wins.
Verification wins.
Context wins.
What This Means for You
If you run a business, here’s what you should internalize from this journey:
This is not a chatbot story anymore.
The unlock was tool use + reasoning, not just bigger models.
The bottleneck is now organizational design.
Leverage compounds fastest where:
Work is structured (or can be made so)
Verification is cheap (or can be made so)
Feedback loops exist (or can be created)
In Part 2, I’ll build a framework for evaluating where your company sits in this shift.
For now, sit with this:
AI is no longer a feature. It is becoming infrastructure.
And infrastructure changes competitive landscapes faster than people expect.


