The Cold Truth of Static Intelligence: Why Intent Engineering is Failing

Posted on Feb 24, 2026

Nate Jones raised an interesting point on YouTube today regarding “intent engineering” — something we are not actually working towards. In a way, it touches on themes Ilya Sutskever raised some time ago; at bottom, it is about reward functions.

I’m actually not sure LLMs can hold intent at all. So I’m not 100% with Nate on this, but his framing of our current obsession with LLMs in three stages is useful:

  1. Personal productivity with chat.
  2. Automated workflows with AI.
  3. Fully automated, independent agents running in the cloud.

He uses the case of the Swedish fintech Klarna to argue that we are failing at “intent engineering.” Klarna replaced many human support agents with a fine-tuned LLM. Yes, this saved a lot of time and money. But they eventually had to rehire humans for the same job because the LLM, however efficient, was apparently too “cold” for the liking of their clientele.

Let’s be honest. This example reminds me of static time series predictions. And I will explain why.

The case Nate is making is that we are busy with context engineering — providing information on what to think and what to consider — but we don’t provide information on how to think. We are missing the behavior, the changing values, and the reward system for each specific case. One customer has a specific reward function; another customer with a longer history requires a different behavior.
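To make that concrete, here is a toy sketch of what a per-customer reward function might look like. Everything here — the profile fields, the weights, the numbers — is hypothetical, my own illustration rather than anything Nate or Klarna describes. The point is only that the same outcome scores differently depending on who the customer is, which a single static system prompt cannot express.

```python
from dataclasses import dataclass

@dataclass
class CustomerProfile:
    tenure_months: int       # how long they have been a customer
    prior_complaints: int    # history that raises the stakes of failure

def reward(profile: CustomerProfile, resolved: bool, handling_seconds: float) -> float:
    """Toy reward: identical outcomes are valued differently per customer."""
    score = 1.0 if resolved else -1.0
    # Long-tenured customers are assumed to penalize slow handling more heavily.
    patience = max(30.0, 300.0 - 10.0 * profile.tenure_months)
    score -= handling_seconds / patience
    # A complaint history makes another failure costlier.
    if not resolved:
        score -= 0.5 * profile.prior_complaints
    return score

newcomer = CustomerProfile(tenure_months=1, prior_complaints=0)
veteran = CustomerProfile(tenure_months=48, prior_complaints=2)
# Same resolution, same two minutes of handling, very different scores.
print(reward(newcomer, True, 120.0))
print(reward(veteran, True, 120.0))
```

A static model sees the same tokens either way; a reward function like this would have to live outside the model and actually steer its behavior, which is exactly the part we don’t have.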

If it’s a platform, we have multiple sides to the problem, all with different intents. Even if you have a system prompt explaining how to act, it’s not ideal. Plus, let’s be honest, we are not retraining those models constantly. We are light years away from actual learning where beliefs, value systems, and internal coordinates update based on experience.

The nature of learning is that we update our beliefs and facts. Based on this, we adjust the course of our lives and careers. LLMs are not capable of doing that. They are centralized, expensive, and static. You have to merge this static model with the concept of “intent,” which can fluctuate on a weekly or monthly basis.

This is where I think agents will fail. Value is a really human thing. Actually, if you think about it, value is pretty much tied to you being somehow damaged in your childhood, or perhaps to something genetic. You didn’t have enough attention, or you had too much. You were bullied, or you lived alone in the forest. You fell through the ice and nearly froze, and since then you are more careful about things.

These pieces are really individual. On top of the different context every human has, we also have this value system that is an extremely hidden “black box” — something we don’t even fully understand ourselves. So saying that AI agents are close to the human condition is really funny.

I am pretty sure AGI is not possible at this point in time. I would say another 50 years, maybe. But not with LLMs, obviously. The scaling is almost over, and now we will just collect the yield of the implementations. This is the last moment for LLM development and scaling.

A Side Note on Workflow

Speaking of tools, I’m building prototypes now using three major players.

I start with Lovable to have really fast iterations—feeding it screenshots from Miro to explain my vision. Then, I turn that design into a “knowledge markdown file” to keep the styling consistent.

When I run out of tokens—and let’s be honest, Claude Code basically dies on you and gets expensive—I switch to building locally. I decided to try out the Antigravity IDE with Gemini Pro 3.1 enabled. I have to say, it works pretty well. Plus, the Gemini product is not expensive compared to the enterprise credits you have to buy for Claude, which burn out fast.

The Verdict

Generally speaking, regarding intent engineering: we are light years away from anything resembling a learning system with a value-coordinated reward function.

We have the context, which is being dynamically worked on, but the value system—the thing that informs agents what is important right now based on past history—is missing. What is the intent now? Is it a bad prototype with bad security? Is it an MVP? Who is accessing the MVP?

These things change constantly. You can try to tackle it with changes to the system prompt, but the “intent” — the why behind what we do — is a different beast. I’m not sure LLMs are even capable of understanding the purpose of anything happening. We aren’t there yet.
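To illustrate the verdict: the same codebase (the context) demands different behavior depending on the declared intent, and that intent has to be supplied from outside, because the model cannot infer it. A tiny, entirely hypothetical policy table:

```python
from enum import Enum

class Intent(Enum):
    THROWAWAY_PROTOTYPE = "throwaway_prototype"
    MVP_WITH_REAL_USERS = "mvp_with_real_users"
    PRODUCTION = "production"

# Hypothetical: what "good enough" means shifts with intent,
# even though the code under review is identical.
POLICY = {
    Intent.THROWAWAY_PROTOTYPE: {"auth_required": False, "review_depth": "none"},
    Intent.MVP_WITH_REAL_USERS: {"auth_required": True, "review_depth": "security-critical paths"},
    Intent.PRODUCTION: {"auth_required": True, "review_depth": "full"},
}

def policy_for(intent: Intent) -> dict:
    """Return the behavior expected under a given intent."""
    return POLICY[intent]
```

The table itself is trivial; the hard part is that next week the intent changes, and nothing in the static model updates with it.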