Why Your AI Policy Needs to Adapt to Rapid Model Changes
OpenAI's latest update highlights a critical flaw in static AI policies. Learn how to build governance frameworks that adapt to rapid model changes.
OpenAI pushed a new default model into ChatGPT this week, and the headline change is one that should matter most to the people least likely to notice. According to the company, GPT-5.5 Instant produced 52.5% fewer hallucinated claims than its predecessor in "high stakes" topics like law, finance, and medicine, and reduced inaccurate claims by 37.3% on conversations users had previously flagged for factual errors. That is a meaningful shift for regulated sectors. It is also a shift that arrived without a press tour or a regulator briefing, and without a clear note to the compliance teams who will now be using a different model than they were last week.
Most AI policies still get this wrong.
Most organisations I work with have an AI policy written around a specific moment. Someone got nervous about ChatGPT in early 2023, the legal team produced a document, IT bolted on a few approved tools, and everyone moved on. The policy assumed the model would stay roughly the same.
It does not.
Your team is typing into a different model now than they were last quarter. Sometimes that change is in your favour. Sometimes it is not. Either way, your policy did not vote on it.
The Mashable piece covering the rollout also notes that GPT-5.5 Instant is available to everyone, unlike Claude Opus 4.7 or the full GPT-5.5, which sit behind paywalls. That matters because the people most likely to be using the free tier are also the people least likely to have an AI governance function watching what changes. A trainee solicitor checking case summaries. A junior clinician drafting patient notes they will, in theory, review. The model under their fingers was upgraded on Tuesday. Nobody told them. Nobody told you.
I am not arguing the upgrade is bad. A drop in hallucinations in regulated work is a real win for anyone who cares about how AI lands in those sectors. The point is that you only get the benefit if your policy is built to absorb the change rather than be destabilised by it.
So what does a policy that does not wobble actually look like?
Write for the tool category rather than a specific version. "GPT-4" should not appear in your policy. "Generative AI models accessed through approved vendors" should. The version will change four times before your next policy review. Bake that in.
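If any of that policy language is mirrored in code, say an approved-tools config or a gateway allowlist, the same principle applies there. A minimal Python sketch of the difference; the vendor entries are illustrative, not a recommended schema:

```python
# Brittle: pins a version that will be superseded before the next
# policy review.
APPROVED_MODELS = ["gpt-4"]

# Durable: approves vendors and access routes, and leaves the live
# model list to a named owner outside the policy document itself.
APPROVED_VENDORS = {
    "openai": {"access": "enterprise API", "dpa_in_place": True},
    "anthropic": {"access": "enterprise API", "dpa_in_place": True},
}

def is_approved(vendor: str) -> bool:
    """Check the vendor, not the model version, against policy."""
    return vendor in APPROVED_VENDORS
```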
Tier use cases by stakes rather than by tool. Drafting a meeting agenda differs from summarising a clinical letter. The model might be identical. The oversight should differ. Keep humans firmly in the loop where harm is possible, and let people delegate the low-stakes admin freely. (At Bykov-Brett Enterprises we run a session walking through the 28 agents we use to handle inbox triage and back-office work, and the throughline is always the same: the safer the task, the more aggressive the delegation can be.)
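To make that concrete, here is a minimal sketch of stakes-based tiering written down as data rather than prose. The tier names and oversight rules are assumptions for illustration, not a standard:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Tier(Enum):
    LOW = "low"        # internal admin: agendas, inbox triage
    MEDIUM = "medium"  # external drafts reviewed before sending
    HIGH = "high"      # legal, clinical, or financial content

@dataclass
class OversightRule:
    human_review_required: bool
    reviewer_role: Optional[str]
    log_prompts: bool

# Illustrative mapping: adapt the rules to your own risk register.
OVERSIGHT = {
    Tier.LOW: OversightRule(False, None, False),
    Tier.MEDIUM: OversightRule(True, "line manager", True),
    Tier.HIGH: OversightRule(True, "qualified professional", True),
}
```

Notice that no model name appears anywhere. The rule hangs off the stakes of the task, so a model swap underneath changes nothing in the policy.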
Subscribe to changelogs the way you subscribe to security advisories. Most providers publish them. Most organisations ignore them. Someone in your team should be reading OpenAI, Anthropic and Google release notes the same way they read CVE alerts. Five minutes a week.
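If you want that five minutes to survive staff turnover, automate the nudge. A rough sketch that flags when a vendor's release-notes page has changed since the last check; the URLs are placeholders for whatever pages your vendors actually publish:

```python
import hashlib
import json
from pathlib import Path

import requests

# Placeholder URLs: substitute the release-notes pages your
# vendors actually publish.
CHANGELOG_URLS = {
    "openai": "https://example.com/openai-release-notes",
    "anthropic": "https://example.com/anthropic-release-notes",
    "google": "https://example.com/google-release-notes",
}
STATE_FILE = Path("changelog_hashes.json")

def changed_vendors() -> list:
    """Return vendors whose changelog pages changed since the last run."""
    previous = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
    current, changed = {}, []
    for vendor, url in CHANGELOG_URLS.items():
        digest = hashlib.sha256(requests.get(url, timeout=30).content).hexdigest()
        current[vendor] = digest
        if previous.get(vendor) != digest:
            changed.append(vendor)
    STATE_FILE.write_text(json.dumps(current))
    return changed

if __name__ == "__main__":
    for vendor in changed_vendors():
        print(f"{vendor}: release notes updated since last check, go and read them")
```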
Re-test the high-stakes prompts on a schedule. If the model changed, your benchmark prompts should be re-run. The new version may be better. It may also be worse on the exact edge case you care about. You do not know until you check.
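A minimal sketch of such a harness, using the openai Python client; the benchmark case, the model name, and the crude string checks are placeholders for your own suite:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MODEL = "gpt-4o"  # placeholder: parameterise and re-run when the default shifts

# Illustrative case: real suites want richer checks (citations,
# refusal behaviour, numeric tolerance), not bare substring tests.
BENCHMARK = [
    {
        # Limitation Act 1980: six years for simple contract claims.
        "prompt": "What is the limitation period for a simple contract "
                  "claim in England and Wales?",
        "must_contain": "six years",
        "must_not_contain": "three years",
    },
]

def run_benchmark() -> list:
    """Re-run the high-stakes prompts and return any failing cases."""
    failures = []
    for case in BENCHMARK:
        response = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": case["prompt"]}],
        )
        answer = (response.choices[0].message.content or "").lower()
        if case["must_contain"] not in answer or case["must_not_contain"] in answer:
            failures.append(case["prompt"])
    return failures

if __name__ == "__main__":
    for prompt in run_benchmark():
        print("REGRESSION:", prompt)
```

Wire it to the changelog watcher above, or simply to a weekly cron job, and keep the failures. "Worse on the exact edge case you care about" only ever shows up in the diff.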
Decide who owns model selection. Right now it is probably nobody, which means it is whoever clicked "accept" on the latest terms. Name a person. Give them the brief.
None of this is exotic. It is the same logic any regulated organisation already applies to suppliers and third-party data processors. The reason it has not arrived for AI yet is that the tools moved faster than the org charts.
The 5.5 Instant rollout is a test of whether your policy treats the model as a contract or as a moving target. If your team is meaningfully safer this week than last and nobody in the building noticed, the upgrade worked. It also means the next one might make things worse without notice, and nobody will notice that either. Pick which of those you would rather be wrong about.

