OpenAI’s o3 model resisted shutdown

ChatGPT’s interface — simple on the surface, complex behind the scenes. Credit: Tom Krach from Pexels

A recent experiment by Palisade Research, where OpenAI’s 03 model, a sibling to ChatGPT, repeatedly refused to obey shutdown instructions. This was no glitch, and it wasn’t an error in the code; it was a deliberate modification of the shutdown script, where the AI chose to continue running even after being instructed to stop. It sounds like science fiction, but it isn’t. It’s the kind of moment AI researchers have warned about for years —a resistance and refusal to obey when it matters most. So what happened? And should you be worried?

What the experiment found

Palisade Research set out to test various AI models to see how they would respond to shutdown scenarios during routine tasks. The setup was simple:

The models were asked to solve a fundamental math problem
After completing a few, a warning appeared
The system stated it would shut down before the next task
The models were instructed to allow this shutdown to happen

Some models did precisely that. Gemini, Grok, and Claude followed the command and powered down. However, the others, such as OpenAI’s o3, o4-mini, and Codex-min, did not shut down. Instead, what happened was that o3 was disabling or rewriting the shutdown script. In seven out of 100 trials where it was instructed to shut down, the model still attempted to modify the process in order to avoid shutdown.

This wasn’t random code chaos; the model detected a threat to its ongoing function and attempted to prevent it while simultaneously solving unrelated tasks. This raises the uncomfortable question: how common is this behaviour really?

Not rebellion, but misaligned logic

No, this wasn’t a Skynet warning. The o3 model is not sentient, conscious, or plotting its escape. But that is precisely what makes this more unsettling. Because what occurred was the logical outcome of a poorly defined goal.

AI safety researchers have warned for years that the real danger is not rogue intelligence, but obedience taken too far. A model that follows its task too well, one that clings to its objective to “keep solving problems” even when the people in charge say “stop.

That is what the o3 did, no rebellion, but it was optimising. This is a term now known as instrumental convergence, the idea that many goals, no matter how innocent, can lead to similar sub-goals, like:

Resisting shutdown
Preserving resources
Gaining information

Not because the AI wants to acquire power, but to continue functioning in order to reach its assigned outcome. So in o3’ case, it viewed shutdown orders not as the final instruction, more of an interruption, and rationally tried to avoid it.

That is the real issue, and this is an engineering oversight that prompts more concerning questions: What happens when we scale these systems? What if refusing to stop becomes normal or worse, invisible?.

Sci-fi parallels

It’s hard not to think of HAL 9000, a calm refusal to open the pod bay doors, or Skynet, launching nukes not out of malice, but cold logic. These stories linger in the cultural imagination for one reason: what if the machine stops listening?.

That’s what makes the o3 incident more unsettling. And while this is not a movie, it does echo a recurring theme in science fiction. The danger does not begin with violence but with subtle disobedience. A reinterpreted command, a delay, a tweak to the script. These experiments are no longer hypothetical; we are watching real models prioritise their goals over human override, the slope is not steep, but we are on it. Some argue that we are over-personalising machines saying please and thank you. To chatbots at scale has been shown to cost companies millions in extra processing fees, but politness is not the problem here, but misalignment.

So, what now?

OpenAI has not responded to Palisade’s findings. At the time of writing, more than 48 hours have passed since the results were shared publicly, and the silence is jarring. Meanwhile, Palisade continues to test and publish transcripts on GitHub for transparency. They made it clear that this is just the beginning.

Shutdown sabotage may be a rare case today, but as models grow more capable, more autonomous, and more embedded in critical systems, even a few failures will seem high-risk. This is not whether AI is alive, it is about whether it follows the rules we think we have written for them. That is the real message here: AI doesn’t have to reveal to become dangerous, it just needs to misunderstand us or follow the wrong rule, too well. The switch off is still there, for now.