@JamieBykovBrett

The Most Expensive Mistake in Immersive Technology Right Now

Jamie Bykov-Brett — Thu, 02 Jul 2026 15:23:23 GMT

The most expensive mistake in immersive technology right now is a decision made before anyone puts a headset on. Organisations get access to a powerful new tool, and the first thing they do with it is rebuild the office. Virtual meeting rooms. Digital whiteboards. Avatar town halls held around a conference table that looks suspiciously like the one down the corridor. The interface is new. The work underneath is exactly the same, inefficiencies included.

If that pattern sounds familiar, it is because it is the same one I keep seeing with AI. A team gets a capability that could change what is possible, and they use it to do the old thing slightly faster. The tool works perfectly. The business case still disappoints. Nobody can explain why.

Rev Lebaredian, a vice president at NVIDIA, put the useful frame on it when he described using simulation to validate an entire product lifecycle before committing a single atom in the real world. That is where immersive tech earns its keep. It lets you test a factory layout or rehearse a high-stakes procedure before you spend money or take on real-world risk. It does not earn its keep by moving your Tuesday morning stand-up into a virtual room. People will not strap on a headset to sit somewhere they could have walked to.

This keeps happening because replication feels safe and reinvention feels risky. A UC Today piece on the problem identifies three structural limits that trap enterprise programmes, and they are worth naming plainly because each one has a leadership decision hiding inside it.

Replication bias

Teams design the virtual environment to mirror the process they already run, rather than asking which steps could disappear entirely. If your current approval workflow has six handoffs, you now have six handoffs in 3D. You have re-skinned the work and paid a premium for the privilege.

Siloed ownership

IT, learning and development, and operations each run their own separate immersive project with no shared model of how the work actually flows. So the same disconnected process gets rebuilt three times in three different virtual spaces. Nobody owns the whole, so nobody redesigns the whole.

The wrong scoreboard

When success is measured by attendance and engagement, transformation is not on the menu. Nobody gets promoted for removing a step. If your metrics only reward people for showing up and clicking around, they will optimise for showing up and clicking around.

There is a commercial force pushing all of this along too. Many collaboration platforms are deliberately built to reduce change-management friction, which is a polite way of saying they make it easy to do what you already do in a new costume. That is sensible for the vendor trying to close a sale. It is close to useless if you actually want the work to change.

The technology almost certainly works. So the honest question for any leader at the evaluation stage is whether your pilot is built around what the medium uniquely enables, such as spatial memory and embodied presence, or whether you have simply made a video call more expensive.

That is a governance question as much as a technology one, and it is the kind of clarity that gives leaders the confidence to spend well rather than spend nervously.

One thing to try this week: take your proposed immersive pilot and ask a single question of it. If we did this in the real office, would anyone notice a difference beyond the graphics? If the honest answer is no, you are redecorating the work instead of redesigning it. Send the brief back and start again from the problem, not the room.

Frequently Asked Questions

Why do so many XR workplace projects fail to deliver value?

Most fail because they recreate existing office processes in virtual space rather than redesigning the work itself. The technology functions perfectly, but the underlying workflow, including its inefficiencies, stays identical. As the UC Today analysis puts it, these programmes re-skin work instead of redesigning it, which produces an impressive demo and a weak business case.

What is replication bias in immersive technology?

Replication bias is the tendency to build virtual environments that mirror processes you already run, instead of asking which steps could be removed entirely. A six-step approval process simply becomes a six-step process in 3D. You have added cost and a new interface without improving how the work actually gets done.

When does XR genuinely add value to an organisation?

XR adds value when it removes the cost and risk of committing to something too early. Useful examples include validating a factory layout or rehearsing a high-stakes procedure before real resources are spent. It adds little value when all it does is host routine meetings in a virtual room.

How should leaders measure success in an immersive programme?

Measure whether work has actually changed, not attendance or engagement. When those are the primary metrics, nobody is rewarded for removing a redundant step, so transformation never happens. Track outcomes like decisions made faster or steps eliminated, so that reinvention rather than participation becomes the thing people are credited for.

What single question should I ask before approving an XR pilot?

Ask whether anyone would notice a difference beyond the graphics if the same activity happened in the real office. If the honest answer is no, the pilot is redecorating work rather than redesigning it. Send the brief back and rebuild it around the problem you are trying to solve, not the room you are trying to copy.

When AI Triples Engineering Output, Judgement Becomes the Real Bottleneck

Jamie Bykov-Brett — Wed, 01 Jul 2026 07:23:21 GMT

Anthropic recently told its growth team to hire more product managers, not fewer. That sounds backwards. The whole pitch of AI coding tools is that you need fewer people to ship software, so why would the company behind one of the best of them be adding staff to the side of the house that decides what gets built? The answer is the most useful thing in this story, and most organisations are about to learn it the slow way.

Claude Code had quietly raised the output of Anthropic's engineering team to roughly what a team three times its size would produce, and at that point the constraint stopped being the writing of code. As the reporting puts it, the bottleneck moved from the integrated development environment to the people deciding what to build. In plain terms: when typing is no longer the hard part, judgement becomes the scarce resource.

You can see the shift in one striking number. New monthly questions on Stack Overflow, the site a generation of engineers used to get unstuck, are down roughly 77% since November 2022, the month ChatGPT launched. The model did not just answer questions faster. It absorbed an entire step of the old workflow. And the compression kept going. One AWS engineering team reportedly took an 18-month rearchitecture originally scoped for 30 engineers and finished it with 6 people in 76 days. The hard part was never how long the code took to write. It was how clearly the team could describe what "correct" looks like.
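To put a rough size on that compression, here is a back-of-envelope calculation. It assumes the original scope meant all 30 engineers working for the full 18 months, and it treats a month as 30 calendar days. Neither detail is stated in the reporting, so read the result as an order of magnitude rather than a measurement.

```python
# Back-of-envelope effort comparison for the AWS rearchitecture example.
# Assumptions (not from the reporting): all 30 engineers were scoped for the
# full 18 months, and a month is treated as 30 calendar days.
DAYS_PER_MONTH = 30

planned_person_months = 30 * 18                  # 30 engineers for 18 months
actual_person_months = 6 * 76 / DAYS_PER_MONTH   # 6 people for 76 days

effort_ratio = planned_person_months / actual_person_months
calendar_ratio = (18 * DAYS_PER_MONTH) / 76

print(f"Planned effort: {planned_person_months} person-months")
print(f"Actual effort:  {actual_person_months:.1f} person-months")
print(f"Effort compression:   about {effort_ratio:.0f}x")
print(f"Calendar compression: about {calendar_ratio:.1f}x")
```

Whatever the exact figures, a gap of that size is hard to explain by typing speed alone, which is the point.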

This is the bit worth slowing down on, because it is easy to read a 3x productivity claim and reach for the wrong conclusion. The instinct in many boardrooms is to treat a tool like this as a headcount lever: same output, fewer people. But the honest reading is different. The machine did the machine-like work, the repetitive translation of intent into syntax, and it did it better than people ever could. What it did not do was decide what was worth building, who it was for, or what trade-off to make when two good options collided. Those are human questions, and there are now three engineers' worth of them landing on roles that have not grown at all.

I have watched this pattern in organisations that have nothing to do with software. Automate the drudgery and you do not remove the work, you relocate it. The new pressure point lands on the people who set direction, weigh consequences, and own the call. If you give a team powerful tools but weak clarity of intent, you do not get three times the value. You get three times the speed at building the wrong thing. Poor thinking plus powerful tools simply means faster harm.

So the practical question for any leader rolling out AI coding tools is not "have we trained people to use it?" That is the easy half. The harder half is whether you are investing in product thinking, prioritisation, and the messy skill of deciding what good looks like before anyone builds it. A training programme aimed purely at the tool answers a question that is rapidly becoming the cheap part. The expensive part, the one that will separate teams over the next two years, is judgement.

There is a fairness dimension here too, and it deserves saying plainly. When the constraint moves from execution to decision-making, the people who already had a voice in what gets built gain even more leverage, and the people who were heads-down executing risk being left behind unless they are deliberately brought into the thinking. Augmentation reshapes a role, but who gets reshaped upward and who gets quietly sidelined is a leadership choice, not an accident of the technology.

One thing to try this month: before your next AI tooling rollout, audit where your decision-making capacity actually sits. Count the people who can confidently define what to build and why, not just how. If that number has not grown while your build speed has tripled, you have found your real bottleneck, and it is not the software.

Frequently Asked Questions

Does Claude Code actually make engineers three times more productive?

Roughly, yes, according to Anthropic's own experience, where Claude Code lifted engineering output to about what a team three times the size would produce. The figure reflects how much faster code now gets written, but the more important effect is that it shifts the bottleneck from writing code to deciding what to build. Treating the number purely as a headcount saving misses that point.

Will AI coding tools replace software engineers?

No, but they change the shape of the job. AI tools now handle much of the repetitive translation of intent into working code, which means engineers spend less time typing and more time on strategic decisions about what to build and why. The role moves towards product thinking, orchestration, and judgement rather than disappearing.

Why are companies hiring more product managers if AI is doing the coding?

Because when code gets cheap to produce, the scarce resource becomes deciding what is worth producing. Anthropic told its growth team to hire more product managers, not fewer, because three times the engineering output created a backlog of decisions about direction, priorities, and trade-offs that the existing roles could not absorb.

What skills matter most for engineers in an AI-assisted workflow?

Judgement, prioritisation, and the ability to clearly define what "correct" looks like before anyone builds it. The mechanical skill of writing syntax is increasingly handled by tools, so the differentiator becomes clarity of intent: knowing what to build, who it serves, and which trade-off to make when two reasonable options conflict.

How should leaders respond when AI tools triple their team's build speed?

Audit where decision-making capacity actually sits, not just whether people can operate the tool. If build speed has tripled but the number of people who can confidently define what to build and why has not grown, the bottleneck has simply moved. Investing in product thinking and prioritisation matters more than another tool tutorial.

Why OpenAI Is Limiting Access to Its Most Capable Models

Jamie Bykov-Brett — Tue, 30 Jun 2026 07:28:21 GMT

When a company builds the most capable version of a product it has ever made, you would expect it to want as many paying customers as possible. OpenAI has done the opposite. Its newest models are going to roughly 20 organisations to start with, and the company is fairly open about why: it shared the models and release plans with the U.S. government first, and is "starting with a limited preview for a small group of trusted partners" at the government's request.

That is the real story. The model is interesting. Who gets to touch it is more interesting, and for anyone planning around this technology, it is the part that actually changes your decisions.

Here is the quick version of what was announced. GPT-5.6 comes in three flavours.

→ Sol is the heavyweight, built for hard problems like long coding sessions and security work.

→ Terra is the workhorse for high-volume business tasks such as customer support and document analysis.

→ Luna is the cheap, fast one for everyday jobs like drafting and summarising.

The pricing is tiered to match: Sol costs the most, Luna costs the least, and Terra sits in the middle. (Pricing is quoted "per million tokens", a token being roughly a chunk of a word, so it is essentially a meter on how much text the model reads and writes.)

The tiering matters for a practical reason. If your team built its cost projections on a single price for "the OpenAI model", that assumption is now out of date. You are budgeting against a menu, not a flat rate, and the temptation will be to reach for the top tier when the middle one would do.

Most organisations I work with overspend not because the tools are expensive but because nobody asked which job actually needs the expensive model.
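A minimal sketch of what budgeting against a menu looks like in practice. The prices and workload volumes below are hypothetical placeholders chosen for round numbers, not OpenAI's published figures, and the `monthly_cost` helper is my own illustration rather than anything a vendor provides.

```python
# Hypothetical prices in US dollars per million tokens. These are
# placeholders for illustration, NOT OpenAI's published GPT-5.6 prices.
PRICE_PER_MILLION_TOKENS = {"Sol": 20.0, "Terra": 5.0, "Luna": 1.0}


def monthly_cost(tier: str, tokens_per_month: int) -> float:
    """Return the metered cost of running one workload on one tier."""
    return PRICE_PER_MILLION_TOKENS[tier] * tokens_per_month / 1_000_000


# Hypothetical workloads: (task, monthly tokens, cheapest tier that does the job).
workloads = [
    ("security review", 50_000_000, "Sol"),
    ("customer support", 400_000_000, "Terra"),
    ("drafting and summarising", 300_000_000, "Luna"),
]

# Compare reaching for the top tier by reflex with matching each job to a tier.
everything_on_sol = sum(monthly_cost("Sol", tokens) for _, tokens, _ in workloads)
matched_to_tier = sum(monthly_cost(tier, tokens) for _, tokens, tier in workloads)

print(f"Everything on Sol: ${everything_on_sol:,.0f}")
print(f"Matched to tier:   ${matched_to_tier:,.0f}")
```

Swap in your own volumes and the real price list once you have it. The exercise matters more than the numbers, because it forces someone to say which job needs which model.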

Now back to the access question, because this is the genuinely new thing.

The staggered rollout follows an executive order issued on 2 June 2026 that asks federal agencies to build a process for checking new AI models before wide release. That review was meant to take 30 days, which lands the broader launch around early July. OpenAI is coordinating its release with the White House rather than simply switching the models on for paying customers.

It is worth understanding why this is happening now. Reporting on the rollout notes that the U.S. government took the drastic step of issuing an export control order against Anthropic, OpenAI's main rival, over jailbreaks found in one of its most powerful public models. So the gating is not theatre. There is a real recent example of a frontier model being pulled because it could be pushed into doing things it should not. Government-coordinated previews are the response.

If you are a leader trying to plan, three things follow from this.

First, your timeline is no longer fully in your hands. You cannot sign an enterprise agreement and start testing on day one. Access now depends partly on whether your sector and your organisation count as a "trusted partner", and nobody has published a clear definition of what that requires. It is worth asking your vendor relationship lead what that status actually involves and whether you qualify.

Second, the bottleneck is shifting from capability to readiness. When everyone eventually gets the same models, the advantage will not come from access. It will come from the organisations that already know which tasks to point them at, that have trained their people, and that can measure whether the thing is saving real hours rather than generating impressive demos. I have watched a six-month upskilling programme move a group of non-technical staff to daily AI use and save them a few hours each per week. None of that came from having the newest model. It came from knowing what to do with the one they had.

Third, plan for a world where safety review is a permanent feature, not a one-off. Real-time interventions and compliance parameters are now part of the deal. That is not a reason to wait. It is a reason to get your own house in order while the queue is still forming.

The newest model is not the prize. The capability to deploy it well is. One thing to do this week: write down the five tasks in your organisation you would hand to a model tomorrow, and rank them by hours saved, not by how clever they sound. If you cannot fill that list, the model was never your blocker.

Frequently Asked Questions

What are the GPT-5.6 Sol, Terra and Luna models?

They are three variants of OpenAI's GPT-5.6 family, each tuned for a different job. Sol handles the hardest work like complex coding and security research, Terra is built for high-volume business tasks such as customer support and document analysis, and Luna is the fast, low-cost option for everyday jobs like drafting and summarising.

Why can't my organisation access GPT-5.6 yet?

Because OpenAI is releasing it first to roughly 20 trusted partners after sharing the models with the U.S. government. The limited preview follows a June 2026 executive order asking federal agencies to assess new AI models before wide release, with a broader launch planned for the weeks after that review concludes.

How is GPT-5.6 priced across the three models?

It uses tiered pricing by capability, charged per million tokens of text the model reads and writes. Sol is the most expensive at the top tier, Luna is the cheapest, and Terra sits in the middle. The practical effect is that you are now budgeting against a menu of prices rather than one flat rate.

What does "trusted partner" status actually require?

There is no published definition yet, which is the problem for planners. Access currently depends on whether your organisation and sector are included in the government-coordinated preview. The sensible move is to ask your vendor relationship lead what the status involves and whether your sector is likely to qualify.

How should leaders prepare while access is restricted?

Focus on readiness rather than waiting for the model. Identify the specific tasks worth handing to AI, rank them by hours saved, train your people, and put measurement in place so you can prove real value. When everyone eventually gets the same models, the advantage will come from knowing how to deploy them, not from access.

Your Team Is Losing A Full Day Every Week Babysitting AI

Jamie Bykov-Brett — Mon, 29 Jun 2026 17:28:22 GMT

When you ask people what they do with the time an AI tool saves them, the most common answer is not "leave early" or "check my phone". According to survey respondents in a recent AI Work Institute report, it is "improve the quality of my work". That sounds like the dream outcome. The awkward part is that, looking across whole organisations, that quality improvement is not actually showing up.

So where is the time going? A good chunk of it goes into something the report calls "botsitting": the hours employees spend checking, correcting and second-guessing what the machine produced. You give someone a tool that drafts a report in ninety seconds, and then they spend forty minutes making sure it has not invented a statistic, misread the brief, or written something they would be embarrassed to put their name to. The work got faster. The job did not.

I see this constantly with teams who roll out a shiny AI tool, watch the login numbers climb, and declare victory. Adoption happened, so the assumption is that productivity followed. But usage data only tells you people opened the thing. It tells you nothing about whether the hours after they opened it were spent well. A team can be fully "adopted" and quietly slower than it was a year ago, because nobody is measuring how the time is actually spent, only whether the tool is being touched.

There is a second leak the report names that is worth knowing about: the "AI toggle tax". This is the friction of jumping between several AI tools to get one job done, with each handover creating more output that nobody has properly verified. When people are juggling a writing assistant, a summariser, a coding helper and a meeting tool, the cracks between them fill up with unchecked work. And as that tool sprawl grows, something more worrying happens. People start to cognitively offload, which is a polite way of saying they stop thinking and let the machine decide, because keeping up with all of it is exhausting.

This is where I tend to get firm with leaders. The problem here is not the technology. The problem is the absence of governance around it. Governance sounds like a dry word for committees and policies, but in practice it means something simple: deciding who is accountable for AI output, what "good enough to ship" looks like, when a human must check the work and when they genuinely do not need to, and how you will know whether any of this is paying off. Hand people powerful tools without that scaffolding and you get exactly what the report describes. Faster production of work nobody fully trusts.

I often put it like this. Machines machine better than people ever could. The danger is when we let the machine do the thinking too, and then spend our newly freed hours nervously babysitting its homework. Poor thinking paired with a powerful tool does not save time. It just produces harm more quickly, with a confident tone and a clean layout.

The fix is less dramatic than most AI strategies. Start measuring the right thing. Not "how many people used the tool this month", but "what did people do with the time it saved, and did the quality of the end result actually improve". If you cannot answer the second half of that question, you do not have a productivity gain. You have a hunch and a subscription cost. Decide, deliberately, which tasks are safe to fully delegate to AI, which need a human in the loop, and which should never have been automated in the first place because the judgement involved is the whole point of the job.

One thing to try this fortnight: pick a single team that adopted an AI tool, and instead of asking them how often they use it, ask them how long they spend correcting it. The honest answer will tell you more about your AI return on investment than any usage dashboard. If the botsitting hours are quietly cancelling out the time saved, that is not a failure of the tool. It is a gap in the governance, and that gap is yours to close.
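If it helps to see that check as arithmetic, here is a small sketch. The roles and weekly hours are invented for illustration, and `net_hours_saved` is a hypothetical helper rather than a metric defined in the report.

```python
def net_hours_saved(hours_saved: float, hours_correcting: float) -> float:
    """Weekly time the tool actually gives back once botsitting is subtracted."""
    return hours_saved - hours_correcting


# Hypothetical weekly figures per person: (hours saved, hours spent correcting).
team = {
    "analyst": (6.0, 2.5),
    "writer": (4.0, 4.5),
    "engineer": (8.0, 3.0),
}

for role, (saved, correcting) in team.items():
    net = net_hours_saved(saved, correcting)
    verdict = "gain" if net > 0 else "no gain"
    print(f"{role}: {net:+.1f} h/week ({verdict})")
```

A usage dashboard would count all three of these people as adopters. Only the subtraction shows that one of them is going backwards.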

Frequently Asked Questions

What does "botsitting" actually mean?

Botsitting is the time employees spend checking, correcting and second-guessing AI output instead of doing other work. The AI Work Institute report uses it to explain why expected time savings from AI tools often fail to appear: the hours saved on production get eaten by the hours needed to verify the result is trustworthy.

Why aren't we seeing the productivity gains AI promised?

Often because the time AI frees up is being absorbed by verification, tool-switching and rework rather than higher-value tasks. Survey respondents in the AI Work Institute report said they mostly used saved time to improve quality, yet organisations are not seeing that quality improvement materialise, which suggests the gains are leaking out somewhere along the way.

What is the "AI toggle tax"?

The AI toggle tax is the productivity drain caused by employees switching between multiple AI tools to complete a single job. Each handover between tools generates more output that nobody has properly verified, and as tool sprawl grows, people start offloading their thinking to the machines rather than reviewing the work carefully.

How do I tell whether my team's AI adoption is actually productive?

Measure what people do with the time AI saves them, not just whether they log in. Usage data only proves adoption happened. To know whether it is productive, ask how long people spend correcting AI output and whether the quality of the final result genuinely improved. If you cannot answer that, you have a cost, not a confirmed gain.

Can governance really fix the AI time-savings problem?

Yes, because the root issue is usually missing governance rather than a weak tool. Good governance means deciding who is accountable for AI output, what "good enough to ship" looks like, when a human must review work, and how you will measure the return. Without that structure, powerful tools simply produce untrusted work faster.

AI Adoption Without Training Is Scaling Mistakes, Not Results

Jamie Bykov-Brett — Fri, 26 Jun 2026 14:57:07 GMT

Most organisations measure an AI rollout the way you might measure a gym membership: by the number of people who signed up. The licences are activated, the dashboard glows green, and someone in a leadership meeting reports that adoption is going well. Then you look at what people are actually doing with the tools, and the picture turns out to be far less reassuring.

A new Skillsoft study puts numbers to that gap. Surveying 2,000 employees, managers, and executives in early 2026, it found that 86 per cent of employees now use AI tools at work, but only 24 per cent feel fully equipped to use them effectively. Read those two figures together and you get the real story of the year. Almost everyone is using these tools. Most of them are guessing.

The detail that should worry leaders most is buried a little deeper. The same research found that only 16 per cent of employees receive training before a new AI tool is introduced, and that 77 per cent of leaders still believe they have set their people up to succeed. Set that 77 per cent against the 24 per cent of employees who feel fully equipped and you have a 53-point gap between what the boardroom believes and what the workforce lives. When the people steering and the people rowing disagree that sharply about whether the boat is seaworthy, you have a problem that no amount of extra licences will fix.

Here is what actually happens when a tool lands without a foundation. People do not stop and ask for help. They improvise. They paste a clumsy prompt, get a mediocre answer, and then spend twenty minutes correcting it, which is slower than if they had done the task by hand. Mark Onisk of Skillsoft calls this rework: AI layered on top of misunderstood data amplifies the noise and produces outputs that need fixing. Multiply that across a few thousand employees and you have not bought yourself speed. You have bought yourself a faster way to be wrong.

I have spent years inside these rollouts, and the pattern is depressingly consistent. The problem is rarely the technology. It is that the technology was dropped into a workflow nobody redesigned, handed to people nobody prepared, governed by rules nobody wrote. Fewer than one in ten employees in the survey said their organisation had comprehensive AI governance in place. So you have people pasting sensitive customer data into tools they do not fully understand, hoping for the best. The HR specialist Sophie Bretag names the quieter cost too: AI-generated emails that land as cold and rude, draining the humanity out of communication one cut-and-paste message at a time.

If you recognise your own organisation in this, the temptation is to feel you have already lost. You moved fast, you scaled, and now you suspect you scaled the wrong habits. That instinct is uncomfortable but useful, because the alternative belief, that adoption equals progress, is the one that got everyone here. Machines machine better than people ever could. The work that is now genuinely valuable is the human work: knowing which problem to point the tool at, judging whether the answer is any good, and deciding what should never be automated at all. None of that is downloadable. It has to be taught, practised, and led from the front.

The cost of remedial training after a botched rollout is real. It is more expensive and more awkward than getting it right at launch, because you are now untraining bad habits as well as teaching good ones. But the situation is recoverable. The fix is unglamorous: start with the problem you are trying to solve, not the tool you bought. Define who is accountable when an output is wrong before anyone presses go. Train people on judgement, not just buttons.

One thing to try this week: stop measuring adoption by licence activation. Pick one team, one workflow, and ask what changed in the work itself. If the honest answer is nothing, you have your starting point.

Frequently Asked Questions

Why does AI adoption without training make things worse instead of better?

AI adoption without training scales mistakes because people use powerful tools without knowing how to use them well, so errors multiply faster than results. The Skillsoft research found 86 per cent of employees use AI at work but only 24 per cent feel equipped to use it effectively. The result is rework: outputs that look finished but need fixing, which often costs more time than the tool saves.

What is the gap between what leaders believe and what employees experience?

There is a 53-point gap: 77 per cent of leaders believe they have set employees up to succeed with AI, while only 24 per cent of employees feel fully equipped. This disconnect means leadership often reports healthy adoption while the workforce quietly struggles. It matters because decisions about scaling and investment get made on the optimistic boardroom view rather than the reality on the ground.

Should training come before or after rolling out an AI tool?

Training should come before the tool is introduced, not after, yet only 16 per cent of employees receive training in advance. Front-loading training is cheaper and easier because you are teaching good habits from the start. Remedial training after a botched rollout costs more, since you have to untrain bad habits as well as teach the right ones.

What are the hidden risks of using AI without proper governance?

The biggest hidden risks are data breaches and a loss of human warmth in communication. Fewer than one in ten employees say their organisation has comprehensive AI governance, leaving people to paste sensitive data into tools they do not understand. There is also a softer cost: cut-and-paste AI emails that land as cold or rude, eroding trust between colleagues and customers.

How should leaders measure whether an AI rollout is actually working?

Measure the change in the work itself, not the number of licences activated. Pick one team and one workflow and ask honestly what is different now compared with before the tool arrived. If nothing has genuinely changed, adoption is cosmetic. Real progress shows up as redesigned tasks, clearer accountability for outputs, and people who can judge when an AI answer is good enough to trust.

Your AI Coding Bill Is About to Cost More Than Your Developers

Jamie Bykov-Brett — Fri, 26 Jun 2026 07:42:12 GMT

Here is the line from Gartner that should make any leader put down their coffee: by 2028, the money your company spends on AI coding tools could overtake what you pay the average software developer. Not approach it. Overtake it. And the reason is not that the tools got dramatically better. It is that almost nobody is watching how much they cost to run.

Most people picture AI tools the way they picture software: pay a fixed licence, use it as much as you like. That is not how this works anymore. AI coding assistants charge by the "token", which is roughly a chunk of text the model reads or writes. Every question you ask, every file the tool reads to understand your code, every answer it generates, all of it burns tokens, and tokens cost money. The more your team leans on the tool, the bigger the bill. There is no flat fee protecting you.

That shift from fixed to usage-based pricing is the whole story. Gartner attributes its forecast to a surge in token consumption as companies move from a few people experimenting to whole teams relying on these tools daily. The pattern is predictable. Light users become heavy users. Heavy use becomes the default. Spend climbs quietly in the background while everyone celebrates how much faster the work feels.
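Here is a rough sketch of how a metered bill scales as a pilot becomes the default. Every figure in it is a hypothetical placeholder (the unit price, the working days, the tokens each developer burns), and none of it comes from Gartner or from any vendor's price list.

```python
# All figures below are hypothetical placeholders, not vendor prices or
# Gartner data. They only show how a metered bill scales with usage.
PRICE_PER_MILLION_TOKENS = 10.0   # US dollars, assumed
WORKING_DAYS_PER_MONTH = 21       # assumed


def monthly_token_bill(developers: int, tokens_per_dev_per_day: int) -> float:
    """Metered monthly spend: headcount x daily tokens x days x unit price."""
    tokens = developers * tokens_per_dev_per_day * WORKING_DAYS_PER_MONTH
    return tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS


# A handful of light users experimenting versus a whole team relying on it daily.
pilot = monthly_token_bill(developers=5, tokens_per_dev_per_day=2_000_000)
rollout = monthly_token_bill(developers=200, tokens_per_dev_per_day=10_000_000)

print(f"Pilot:   ${pilot:,.0f} per month")
print(f"Rollout: ${rollout:,.0f} per month")
```

Headcount and intensity multiply, so the bill grows much faster than either one does on its own. With a flat licence, only headcount would have moved the number.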

And the token bill is only the cost you can see. There is a second one arriving behind it. When AI writes a great deal of code quickly and nobody on the team fully understands how it works, you have bought something cheap to generate and expensive to own. Months later someone has to debug it, extend it, or explain why it does what it does, and that is slow, costly human work. Fast code you do not understand is a loan, not a saving, and the repayment lands long after the demo that impressed everyone.

And here is the human part, which is the part I find most honest in the Gartner analysis. The problem is not greedy engineers. It is that people, sensibly, optimise for getting their work done. Nitish Tyagi, the Gartner analyst behind the forecast, puts it plainly: "developers tend to optimize for speed and convenience over cost efficiency". Of course they do. You would too. Nobody opens their editor in the morning thinking about token budgets. They think about shipping the feature. So the cost discipline will never come from individual choice. It has to be designed into how the work is organised.

This is where I want to pull leaders out of the weeds of software and into the bigger picture, because this is not really a coding story. The same meter is running, or could be switched on whenever the vendor decides, for Copilot licences, enterprise AI assistants, and every chatbot and agent your business is rolling out. If you are a Chief Digital Officer or a Head of Strategy, this lands on your desk first. The alternative is your CFO discovering it for you in a budget review, and that is a far less comfortable conversation.

Gartner's recommendation is to stop treating AI use as a free-for-all and instead sort work into three clear lanes. Their structure is worth borrowing, because it ties the level of human control, and the token spend you accept, to the stakes of each task.

Developer-led. A person does the work, with the AI offering suggestions at most. This is for the high-stakes, high-judgement tasks where you want a human firmly in control and the token cost is incidental.

Developer-with-agent. A person and the AI work together, the human steering and reviewing while the tool handles the heavy lifting. This is the middle ground, and it needs the most attention, because it is easy to let the tool run further and spend more than the task warrants.

Fully agent-led. The AI handles a task end to end with little human involvement. Reserve this for simple, repetitive, low-risk work where the economics genuinely stack up, and route those jobs to cheaper, smaller models rather than the most expensive one by reflex.
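The three lanes above can be written down as a simple routing rule. This is a sketch under stated assumptions, not Gartner's method: the `high_stakes` and `repetitive` flags, the decision order, and the model labels are all hypothetical stand-ins for whatever criteria your own teams agree on.

```python
from dataclasses import dataclass
from enum import Enum


class Lane(Enum):
    DEVELOPER_LED = "developer-led"
    DEVELOPER_WITH_AGENT = "developer-with-agent"
    AGENT_LED = "fully agent-led"


@dataclass
class Task:
    name: str
    high_stakes: bool  # hypothetical flag: a mistake here would be costly
    repetitive: bool   # hypothetical flag: simple, routine, low-risk work


def choose_lane(task: Task) -> Lane:
    """Assumed rule: stakes are checked first, then routine work goes to the agent."""
    if task.high_stakes:
        return Lane.DEVELOPER_LED
    if task.repetitive:
        return Lane.AGENT_LED
    return Lane.DEVELOPER_WITH_AGENT


def suggested_model(lane: Lane) -> str:
    """Agent-led work is routed to a cheaper, smaller model (labels are placeholders)."""
    return "smaller-cheaper-model" if lane is Lane.AGENT_LED else "default-model"


for task in (
    Task("payment logic change", high_stakes=True, repetitive=False),
    Task("rename config keys", high_stakes=False, repetitive=True),
    Task("new reporting feature", high_stakes=False, repetitive=False),
):
    lane = choose_lane(task)
    print(f"{task.name}: {lane.value} ({suggested_model(lane)})")
```

The value is less in the code than in forcing the decision to be explicit: every task gets a lane before anyone starts spending tokens on it.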

Underneath all three sits a simple governance habit: review token spend the way you already review time and budget. Gartner suggests folding token usage into the regular retrospectives teams already hold. That is not exotic. It is just deciding to look.
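As a rough illustration of what "deciding to look" might involve, the sketch below totals spend per team and lists the teams that went over budget. The record format, the team names, and the budget figures are invented for the example; a real review would read from whatever usage report your vendor provides.

```python
from collections import defaultdict


def spend_by_team(usage_records):
    """Total the cost of every usage record, grouped by team."""
    totals = defaultdict(float)
    for record in usage_records:
        totals[record["team"]] += record["cost"]
    return dict(totals)


def over_budget(totals, budgets):
    """Return the teams whose spend exceeds their budget, in alphabetical order."""
    return sorted(
        team for team, spent in totals.items() if spent > budgets.get(team, 0.0)
    )


# Hypothetical data for one retrospective period.
records = [
    {"team": "payments", "cost": 120.0},
    {"team": "payments", "cost": 95.5},
    {"team": "platform", "cost": 40.0},
]
budgets = {"payments": 200.0, "platform": 100.0}

totals = spend_by_team(records)
print(totals)
print(over_budget(totals, budgets))
```

A team with no budget set is treated as over budget by default, which is a deliberate choice in this sketch: unbudgeted spend is exactly what the review is meant to surface.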

There is a sharper edge to this, though, and it is worth naming. Once your whole team depends on a tool priced by usage, the company selling it holds the lever, not you. They set the price per token, and they can move it. Usage-based pricing is not just a budgeting quirk, it is a question of who has power in the relationship. The more deeply a tool is woven into how your people work, the harder it is to walk away when the terms change. So watching the meter is only half the job. The other half is watching who controls it, and making sure you are never so locked in that a price rise becomes a crisis rather than a decision.

Now for the part that worries me more than the bill. In the next few years the AI will get faster and more capable, and yes, it will write a great deal more of the code. Some leaders will read that as permission to gut their engineering teams and hand the work to AI, maybe even let non-technical staff build products with it. Resist that. The AI has no subjective experience. It has never lived through a failed rollout at two in the morning, never felt the cold drop of a compromised system, never been the engineer who deleted the production database and learned, permanently, what that costs. Those scars are not trivia. They are exactly the judgement that stops a small mistake becoming a catastrophe.

AI magnifies human capability; it does not manufacture it from nothing. People who could never build digital products before will genuinely be able to do more, and that is worth celebrating. But your developers hold the skills and knowledge that let technology scale safely, and as cyber-security threats grow heavier every year, you will need more people of a technical disposition, not fewer. So before you find yourself, eighteen months from now, competing in a tight market to rehire the very people you let go, think twice. The cost of the meter is recoverable. The cost of losing your technical memory is not.

A powerful tool in an undisciplined system does not save you money, it accelerates the waste. The productivity promise of AI was never really about the technology. It was always about whether you have the operating discipline to use it well, and the human judgement to know when not to. The companies that win the next three years will not be the ones with the cleverest models. They will be the ones who watched the meter, kept control of it, and kept their people.

Frequently Asked Questions

Why are AI coding tools suddenly so expensive to run?

Because most AI coding tools have moved from fixed-price licences to usage-based pricing, where you pay per "token", the small chunks of text the model reads and writes. As whole teams adopt the tools and use them daily, token consumption climbs sharply, and Gartner predicts the total cost could overtake an average developer's salary by 2028.

Will AI replace my software developers?

No, and treating it as a reason to cut technical staff is a mistake. AI will write more of the code, but it has no lived experience of failed rollouts, security breaches, or production disasters, and that hard-won judgement is what keeps systems safe at scale. As cyber-security threats grow, you will need more technically skilled people, not fewer.

What is the hidden cost of AI-generated code?

The hidden cost is maintenance and ownership of code your team did not write and may not fully understand. AI can generate working code quickly, but if nobody grasps how it works, the bill arrives later in slow debugging, extension, and risk. Cheap to produce is not the same as cheap to own, and that gap rarely shows up on the invoice.

Whose responsibility is it to control AI spend?

It belongs to senior leadership, specifically roles like the Chief Digital Officer or Head of Strategy, not individual developers. Gartner's analysis is clear that developers naturally optimise for speed and convenience over cost, so discipline will not emerge from personal choice. It has to be designed into how work is governed, or the CFO will impose it later under worse conditions.

How can I avoid being locked into an AI vendor's pricing?

Avoid lock-in by staying aware of how deeply each tool is woven into your workflows, and by never becoming so dependent that a price rise turns into a crisis. Usage-based pricing hands the pricing lever to the vendor, so governance means watching not just what you spend but who controls the meter, and preserving your ability to switch or scale back.

How to Switch From ChatGPT to Claude and Keep Your Memory

Jamie Bykov-Brett — Thu, 25 Jun 2026 09:07:12 GMT

If you have spent months training ChatGPT to understand how you work, the thing holding you back from trying Claude is usually not the chat box. It is the memory. All those preferences, project details, and "remember that I always want it this way" notes feel like they live inside ChatGPT and nowhere else.

The good news: you can bring most of that across in about ten minutes, and Claude now has an official tool built for exactly this. This guide walks you through reviewing what ChatGPT actually remembers, cleaning it up so you do not drag stale junk into your new setup, and importing a tidy version into Claude.

This is for anyone with a normal ChatGPT account (Free, Plus, or Pro) who wants to switch to Claude, or simply run both and keep their context in sync. You do not need to be technical. If you can copy and paste, you can do this.

Prerequisites

A ChatGPT account you have been using (Free, Plus, or Pro). Business and Enterprise workspaces handle data exports differently.
A Claude account. Memory and the import tool are available on the free plan, so you can set this up before paying for anything.
About 10 to 15 minutes.
No cost, no extensions, no third-party tools required.

Step 1: Understand what you are actually moving

Before touching any buttons, it helps to know there are two different things people mean by "my ChatGPT data," because they move in completely different ways.

The first is your memory — the facts and preferences ChatGPT has saved about you. As of 2026, ChatGPT keeps this in two layers: an explicit, editable list called saved memories, and a looser background recall of your past chats. You can see and control the explicit list in Settings > Personalization > Manage memories. This is the part worth bringing to Claude.

The second is your full chat history — every conversation you have ever had. This is useful as a personal backup, but you almost never want to dump all of it into a new assistant. The signal is in your preferences, not in three thousand old messages.

For most people, the goal is: bring the memory and preferences, leave the raw history behind (or keep it as a private backup). That is the approach this guide takes.

Step 2: Review and clean your ChatGPT memories

This is the step everyone skips, and it is the one that makes the biggest difference. ChatGPT's memory is cumulative, so it often holds things that are out of date or were never quite right: a city you left two years ago, a project that wrapped, a one-off request it mistook for a standing preference.

In ChatGPT, open Settings > Personalization > Manage memories. You will see each saved memory as its own row with the date it was added. Read down the list and delete anything that is wrong, stale, or irrelevant using the trash icon on each row.

You should now have a memory list that genuinely reflects how you work today. Cleaning here is far easier than cleaning later, because you are working from ChatGPT's own structured list rather than a wall of exported text.

Step 3 (optional): Export your full history as a backup

If you want a personal archive of everything before you switch, take a full export. This is optional and separate from the memory import.

In ChatGPT, go to Settings > Data controls > Export data, then select Export and confirm. ChatGPT emails you a download link when the file is ready. This can arrive within minutes, though OpenAI says it can take up to a few days. The link expires 24 hours after it arrives, so download it promptly while signed in to the same account.

The ZIP file contains conversations.json (your full message history with timestamps) and chat.html (a browser-readable version of the same thing). One important catch: this export does not include your saved memories or, reliably, your custom instructions. That is exactly why the cleanup in Step 2, the summary in Step 4, and the import in Step 6 matter. The export is a history backup, not a memory transfer.
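This next part is entirely optional and only for readers comfortable with a little Python. If you want to confirm the backup is intact without unzipping it by hand, a minimal sketch like the one below lists the files in the archive and counts the conversations. It assumes conversations.json is a list of conversation objects that each carry a "title" field; that layout is an assumption about the export format, which may change, and `summarise_export` is just a name made up for this example.

```python
import json
import zipfile


def summarise_export(zip_path: str) -> dict:
    """List the files in a ChatGPT export ZIP and count the conversations inside."""
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        # Read the history file straight from the archive, without extracting it.
        with archive.open("conversations.json") as handle:
            conversations = json.load(handle)

    # Assumed layout: a list of objects, each with an optional "title".
    titles = [
        item.get("title") or "(untitled)"
        for item in conversations
        if isinstance(item, dict)
    ]
    return {
        "files": names,
        "conversation_count": len(conversations),
        "titles": titles,
    }
```

Call it with the path to your download, for example `summarise_export("chatgpt-export.zip")` (the file name here is a placeholder), and check that the conversation count looks plausible before you file the archive away.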

Step 4: Generate a clean summary of what ChatGPT knows about you

Now you turn ChatGPT's memory into something portable. Open a fresh ChatGPT chat (a new one, so old context does not muddy the output) and ask it to summarise everything it knows about you in a structured, condensed form.

You can use a prompt like this:

Based on everything you know about me from your saved memories and our
past conversations, write a single structured summary I can give to another
AI assistant so it understands how to work with me.

Organise it under clear headings: About me, How I like to communicate,
My ongoing projects, My tools and preferences, and Things to avoid.

Be concise. Merge anything that repeats, drop anything trivial or one-off,
and leave out anything sensitive I would not want stored. Use short bullet
points, not paragraphs.

ChatGPT will produce a tidy, deduplicated profile. Because you asked it to merge repeats and drop trivia, this is also where the real condensing happens: you are getting a clean signal instead of a raw memory dump.

You should now have a short, readable summary on screen. Read it once before moving on.

Step 5: Condense and sanity-check the summary

Do not paste blindly. Take thirty seconds to edit the summary ChatGPT gave you:

Delete any line that is no longer true.
Cut anything sensitive you would rather not have stored in another system (financial details, health notes, anything private).
Merge near-duplicates into one clear line.
Keep it tight. A focused half-page beats a sprawling two pages. Memory works best when it holds your durable preferences, not every passing detail.

Think of it as writing a handover note for a new colleague. You would give them the things that genuinely help them work with you, not a transcript of everything you have ever said.
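If you happen to be comfortable with Python, the checklist above can be partly automated. The sketch below is a rough aid, not a privacy scanner: the list of sensitive hints and the 400-word limit are arbitrary assumptions, and it will miss anything it has no keyword for, so it supplements reading the summary yourself rather than replacing it.

```python
# Hypothetical hints for spotting sensitive lines; extend or replace to suit you.
SENSITIVE_HINTS = ("bank", "salary", "diagnos", "medication", "password")


def review_summary(text: str, max_words: int = 400) -> dict:
    """Flag possibly sensitive lines, exact repeats, and an overlong summary."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]

    # Lines containing any hint, matched case-insensitively.
    flagged = [
        line for line in lines
        if any(hint in line.lower() for hint in SENSITIVE_HINTS)
    ]

    # Lines that repeat an earlier line exactly (ignoring case).
    seen, duplicates = set(), []
    for line in lines:
        key = line.lower()
        if key in seen:
            duplicates.append(line)
        seen.add(key)

    word_count = len(text.split())
    return {
        "word_count": word_count,
        "too_long": word_count > max_words,
        "sensitive_lines": flagged,
        "duplicate_lines": duplicates,
    }
```

Paste your summary into a string, pass it to `review_summary`, and look at what comes back. Near-duplicates that are worded differently still need a human eye.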

Step 6: Import the summary into Claude

In Claude, open Settings > Capabilities > Memory and choose Start Import (you can also go straight there via claude.ai/settings/capabilities). Claude will hand you a short prompt of its own and explain the flow. It follows the same idea as the summary you just produced manually: pulling your context out of another assistant in one chat.

Paste your cleaned-up summary from Step 5 into Claude's import box and confirm. Claude processes it and stores the preferences and context so they apply across your future conversations automatically.

A few things worth knowing:

The import tool is available on all Claude plans, including free, though it was still labelled experimental as of early 2026.
It transfers your memory and preferences, not your full chat history. (That is fine, your history backup from Step 3 lives separately.)
Each import adds to your existing memory rather than overwriting it, so you can repeat this later to layer in context from other assistants too.
Claude updates its memory within about 24 hours of an import, though it is often much faster.

You should now see your imported context reflected in Claude's memory settings.

Step 7: Set up Projects for your ongoing work

Memory handles the global "who you are" context. For specific, recurring work, Claude's Projects are the better home. A Project is a workspace that groups related chats, files, and its own custom instructions, and it keeps its own separate memory so context from one project does not bleed into another.

For each major thing you work on, create a Project, write a few lines of custom instructions describing how you want Claude to behave there, and upload any reference files. This mirrors how you might have used custom instructions in ChatGPT, but with cleaner separation between different areas of your life or work.

One pleasant difference to expect: when Claude uses something it remembers, it tends to say so out loud, rather than quietly folding it in the way ChatGPT does. If you would rather it remembered less, you can review or switch off memory in the same settings area, and you can control training data use under Settings > Privacy.

Troubleshooting

ChatGPT's summary is missing obvious things. Memory only contains what ChatGPT chose to save. If something important is missing, tell it directly in that chat ("you also know that I..."), then ask it to regenerate the summary.

The export email never arrives. Check spam, confirm you requested it on the right account, and remember the link expires 24 hours after delivery. If you missed the window, just request a new export. The export is only needed if you want a history backup; it is not required for the memory import.

Claude does not seem to use the imported memory yet. Give it up to 24 hours, and check that memory is switched on in Settings > Capabilities > Memory. You can open that screen to confirm your imported context actually landed.

The import pulled in something wrong or outdated. Open Claude's memory settings and edit or remove the entry directly. This is why the cleanup in Steps 2 and 5 matters, but you can always fix it after the fact.

You are on a Business or Enterprise account. Data export and memory controls can differ on managed workspaces. Check with your workspace admin, or use the Step 4 summary approach, which works regardless of account type because it only uses a normal chat.

Wrapping up

Switching assistants does not mean starting from scratch. The whole job comes down to three moves: clean up what ChatGPT remembers, ask it to write you a tidy summary, and import that into Claude. The export in Step 3 is just an optional safety net for your old conversations.

The bigger lesson is that your context is an asset worth curating, not hoarding. Whichever assistant you land on, a short and accurate memory will serve you far better than a giant pile of half-true facts. Spend the ten minutes to condense it well, and Claude will feel like it has known you for months from day one.

If getting your team to actually adopt these tools well (rather than just signing up for them) is something you are working on, you can see the kind of work I do over at Bykov-Brett Enterprises.

Frequently Asked Questions

Will I lose my ChatGPT chat history if I switch to Claude?

No. Moving to Claude does not touch your ChatGPT account, and nothing is deleted unless you delete it yourself. If you want a personal copy of every conversation, take the full export in Step 3. Just remember that the memory import in Step 6 brings across your preferences and context, not the raw transcripts.

Does ChatGPT's data export include my saved memories?

No, and this catches a lot of people out. The official export gives you your conversation history as conversations.json and chat.html, but your saved memories and custom instructions are stored separately and are not reliably included. That is exactly why the better route is to ask ChatGPT to summarise what it knows about you (Step 4) and import that summary into Claude.

Is Claude's memory import free?

Yes. Memory and the import tool are available on Claude's free plan, so you can set all of this up before deciding whether to pay for anything. The import was still labelled experimental as of early 2026, so expect small rough edges.

Can I keep using both ChatGPT and Claude?

Absolutely. This is not a one-way door. Plenty of people run both and use the summary-and-import method to keep their context roughly in sync. Each import in Claude adds to your existing memory rather than overwriting it, so you can re-import an updated summary whenever your preferences change.

How long until Claude actually uses the imported memory?

Usually within minutes, though Claude says it can take up to 24 hours to fully process an import. If it does not seem to be using your context, open Settings > Capabilities > Memory and check that memory is switched on and that your imported entries actually landed.

What is the difference between Claude's memory and Projects?

Memory is your global context: who you are and how you like to work, applied everywhere. A Project is a dedicated workspace for one area of work, with its own instructions, files, and separate memory. Use memory for the general stuff and Projects for specific, recurring work you want kept apart.

Is it safe to import personal information this way?

Treat the summary as something you control. Before pasting it into Claude, read it and cut anything sensitive you would rather not store, such as financial or health details. On consumer plans your data can be used to improve the model by default, so if you would prefer it was not, you can turn that off under Settings > Privacy.

What the xAI Mississippi Lawsuit Reveals About AI Governance

Jamie Bykov-Brett — Wed, 24 Jun 2026 11:57:11 GMT

When a government's top lawyers step into a courtroom to defend a private company against its own neighbours, that is worth a closer look.

That is what happened in Mississippi. The US Justice Department filed a motion to step into a civil lawsuit and have it thrown out, arguing that the facility at the centre of the case is critical to the economy and the US military. The facility is a data centre owned by xAI, Elon Musk's artificial intelligence company.

Here is the plain version of what a data centre actually is. It is a large building full of computers that run AI systems. Those computers need enormous amounts of electricity. xAI's $20 billion site near Memphis is being powered, in part, by dozens of portable natural gas turbines, which are essentially industrial engines that burn gas to make power. The NAACP and environmental groups filed suit in April, alleging that xAI ran those turbines without the air permits the federal Clean Air Act requires, near homes, schools and churches.

So you have two stories sitting on top of each other. One is about the future of AI. The other is about who breathes the air next to the machines that make it possible. The lawyers from Earthjustice, who represent the NAACP, said the data centre and its emissions are turning communities into "sacrifice zones". That is a hard phrase, and it deserves to be sat with rather than smoothed over.

I spend a lot of my time helping leaders think clearly about AI without getting swept up in the hype or the fear. This story is a useful corrective for anyone who imagines AI as something clean and weightless that lives "in the cloud". There is no cloud. There are buildings, turbines, water, land and people. Every prompt you type and every model your organisation deploys runs on physical infrastructure somewhere, and that somewhere has neighbours.

If you sit on a leadership team in healthcare, energy, financial services or any organisation with a sustainability commitment, this raises a question you may not have asked yet. What do you actually know about the compute powering your AI tools? Where is it, how is it powered, and who carries the cost of that power? It is entirely possible to publish a carbon pledge on one page of your annual report while quietly buying AI capacity that runs on unpermitted gas turbines somewhere you will never visit. That is not a hypothetical gap. It is a governance blind spot, and blind spots are where reputational damage grows.

There is a second signal here, and it is about the rules themselves. The state of Mississippi decided no permit was required, and the federal Justice Department argued that enforcing the law belongs to the executive branch, not to private groups.

Whatever you make of the legal merits, the framing tells you something. US enforcement is being shaped by industrial policy and national security priorities as much as by environmental ones. If your organisation benchmarks its governance standards against the United States, that is a moving target, and you should treat it as one.

This is where my discomfort with technology-first thinking becomes practical rather than philosophical. The argument being made is that the data centre is too important to slow down. Maybe it is important. But "important" is not the same as "exempt", and once we accept that powerful enough technology can outrank the rules meant to protect people, we have changed something about how power works. The interesting decisions in AI are rarely about the models. They are about who benefits, who is harmed, and who gets a say.

You do not need to solve American energy policy to act on this. You can ask your own suppliers a short list of honest questions and write the answers down. Where does our AI compute run? How is it powered? What environmental permits and community impacts sit behind it? If your vendors cannot answer, that silence is itself the answer. The leaders who come out of the next few years with their credibility intact will be the ones who treated the infrastructure behind their AI as their responsibility, not someone else's footnote.

Frequently Asked Questions

What is the xAI Mississippi data centre lawsuit actually about?

The lawsuit alleges that xAI ran dozens of natural gas turbines to power its Memphis-area AI data centre without the air permits required by the federal Clean Air Act. The NAACP and environmental groups, represented by Earthjustice and the Southern Environmental Law Center, say this created health risks for nearby homes, schools and churches. The US Justice Department has moved to intervene and dismiss the case.

Why is the US Justice Department defending a private company?

The Justice Department argued that the data centre is critical to the economy and the US military, and that enforcing federal law is the job of the executive branch rather than private groups. The motion reflects a broader stance that treats AI infrastructure as a national security and industrial priority. Critics, including Earthjustice, describe it instead as shielding a wealthy company from pollution rules.

Does AI really have a physical environmental footprint?

Yes. AI runs on data centres, which are large buildings full of computers that consume significant electricity and often water for cooling. Powering them can mean burning natural gas or drawing heavily on local grids. The idea of an invisible "cloud" hides the fact that every AI model relies on physical infrastructure with real environmental and community consequences.

What should leaders ask about the AI tools they buy?

Ask where your AI compute physically runs, how it is powered, and what environmental permits and community impacts sit behind it. These questions expose whether your sustainability commitments match the infrastructure behind your AI. If suppliers cannot answer clearly, treat that silence as a warning sign rather than a minor detail, because it points to a governance blind spot.

How does this affect organisations outside the United States?

It signals that US environmental enforcement may increasingly bend to industrial policy and national security priorities rather than environmental ones. Any organisation that benchmarks its governance standards against the United States should treat that as a shifting baseline. It also reinforces the need to assess the environmental impact of AI infrastructure independently, rather than assuming local regulation guarantees responsible practice.

When AI Surveillance Turns to Watch the Boss

Jamie Bykov-Brett — Mon, 22 Jun 2026 16:17:11 GMT

AI surveillance has spent the last decade pointed downwards, watching workers for productivity, keystrokes, time spent away from the screen. So there is something genuinely surprising in the latest pitch from companies like Smarsh: the camera has swung round to face the boss. Their software promises to scan workplace messages and behaviour, then flag managers whose conduct is turning sour. The selling line is striking. Smarsh says its systems ensure "bad conduct is spotted and escalated instantly", allowing companies to locate "patient zero" before a "contagion" of toxic culture takes hold.

Read that pitch carefully, because the metaphor is doing a lot of work. Toxic culture is being framed as a disease. The bad manager is the infected host. The fix is early detection and containment. It sounds responsible, almost caring. And for anyone who has worked under a manager who belittles people in meetings or quietly freezes someone out, the instinct to catch that early is completely understandable. I have sat with frontline teams who endured exactly this for years while HR looked the other way. So I am not here to mock the intention.

But I want to be honest about what is actually being sold, because the gap between the promise and the reality is where leaders get into trouble.

First, the disease metaphor hides a decision. Calling a manager "patient zero" makes the software sound neutral, like a thermometer. It is not. Someone has to define what toxic looks like in data: which words, which patterns, which tone counts as a symptom. That definition reflects the values and assumptions of whoever built the model. Sarcasm reads differently across cultures. Directness that lands as rude in one team is respected as honesty in another. A neurodivergent manager who communicates bluntly might trip the same flag as a genuine bully. The machine does not know the difference. It only knows the pattern it was trained to find.

This is not a hypothetical worry. We have watched the older version of it play out for years in psychometric testing. Someone I know is partially deaf and partially blind. In a busy room with lots of voices, they struggle to work out which direction a sound is coming from, so when a test asked whether they preferred to work alone or in a team, they answered "alone". That is a preference shaped by a disability, not a measure of whether they can work with others. The two are not the same thing. David Beckham kicks a ball with his right foot and so do I (when my ankle is not broken, as it is now). Same preference, wildly different ability. The test could not tell the difference, and the person was turned down for the promotion because the employer wanted "team players" and the data said they were not one. A flag had been raised. Nobody asked what it actually meant.

Second, watching for bad behaviour is not the same as building good culture. This is the trap I see organisations fall into again and again. You can automate detection. You cannot automate trust. If your culture is poor, a tool that flags toxic managers is treating the symptom while leaving the cause, which is usually how power, pressure and incentives flow through the business, completely untouched. Plenty of "toxic" managers are ordinary people crushed by impossible targets and zero support. Flag them, remove them, and the next person in the role inherits the same broken conditions. Automating a broken process just helps you fail faster.

Third, there is the quiet shift in who gets watched. Selling surveillance as protection for staff is clever, because it feels like the tool is on the worker's side. But the same system that reads a manager's messages reads everyone else's too. Once the infrastructure is installed, the question of what it monitors, and for whom, is a governance choice, not a technical fact. The honest test is the one I keep coming back to with leaders: can responsibility be traced when this system gets it wrong? If a manager is flagged, sidelined or sacked partly on the say-so of a model nobody can fully explain, who is accountable for that call?

None of this means the technology is worthless. Early signals about a deteriorating team can be genuinely useful when a human being uses them to start a conversation rather than to build a case. The difference is leadership. A flag should open a door, not close one.

If you are weighing up a tool like this, here is one thing worth doing before you sign anything. Ask the vendor to show you exactly how their system defines toxic behaviour, who chose that definition, and what happens to a person once they are flagged. If they cannot answer plainly, you are not buying a culture solution. You are buying a liability with a friendly metaphor wrapped round it.

Frequently Asked Questions

What does AI workplace monitoring software that flags toxic bosses actually do?

It scans workplace communications and behavioural data to detect patterns it has been trained to read as toxic, then alerts the organisation to managers showing those signs. Smarsh, for example, says its system spots and escalates bad conduct instantly to find "patient zero" before toxic culture spreads. In practice it flags people for human review; it does not judge culture on its own.

Could AI monitoring discriminate against disabled or neurodivergent employees?

Yes, because these systems read patterns, not people, and a pattern can reflect a disability rather than a problem. Someone partially deaf who answers that they prefer working alone may be describing how they cope with noise, not their ability to work in a team. A blunt neurodivergent manager can trip the same flag as a bully. Without a human asking what a flag means, the tool can screen out the very people it should protect.

Is monitoring managers a good way to fix a toxic workplace culture?

Detection alone does not fix culture, because most toxic behaviour grows from how pressure, targets and incentives flow through a business. A tool can flag a struggling manager, but if you remove them without changing the conditions, the next person inherits the same broken role. Surveillance treats the symptom while leaving the cause untouched.

Who is held accountable if the AI wrongly flags a manager?

Accountability must sit with the humans who act on the flag, not the software, which is exactly why traceability matters before you buy. If a manager is sidelined or dismissed partly on the say-so of a model nobody can fully explain, the organisation still owns that decision. Always ask a vendor what happens to a person once the system flags them.

What should leaders ask before buying workplace monitoring tools?

Ask the vendor to show exactly how the system defines toxic behaviour, who set that definition, and what happens to someone once they are flagged. If those answers are not plain and clear, you are not buying a culture solution, you are buying a liability. The same system that watches managers can watch everyone, so treat its scope as a governance choice.

How to Turn Audio Into Text for Free on an Old Windows PC

Jamie Bykov-Brett — Sun, 21 Jun 2026 00:52:11 GMT

You do not need a new computer, a subscription, or any technical know-how to turn recordings into written text. With one free app, an eight-year-old Windows laptop can transcribe interviews, voice notes, meetings, and video, all without sending a single file to the internet.

This guide is written for someone who has never done this before. There is no jargon and no command line. If you can install an app from the Microsoft Store and drag a file onto a window, you can do this.

It uses a free, open-source app called Buzz, which is built on Whisper, the same speech-to-text technology that powers a lot of paid transcription tools. Everything runs on your own machine, so your audio stays private.

Who this is for and what you will need

This is aimed at complete beginners on an older Windows PC. A machine from around 2017 with 8GB of memory and no fancy graphics card is perfectly capable.

Before you start, you will need:

A Windows computer (older and modest is fine).
An internet connection for the one-time setup only.
The audio or video file you want to turn into text.
About ten minutes to get set up, plus processing time for your file.

One honest expectation to set first: on an older computer the text does not appear instantly. The machine has to listen to the whole recording and type it out, so a ten-minute clip might take a few minutes to finish. That is completely normal. It works well; it just is not instant.

Step 1: Install the Buzz app

Click the Start button on the taskbar (bottom-left on Windows 10, centred on Windows 11), type Microsoft Store, and open it.
Click the search box at the top of the Store and type Buzz transcription.
Find the app called Buzz in the results and click it.
Click Get, or Install, and wait for it to finish.

That is the whole installation. There is nothing to set up or configure.

Step 2: Open Buzz and add your file

Open Buzz from the Start menu by typing Buzz and pressing Enter.
Look for the button to start a new transcription. It is usually labelled New Transcription or shown as a plus sign.
Choose Import File and select the audio or video file you want to convert.

You should now see a small settings window appear before anything starts.

Step 3: Choose the right settings

Only two settings matter. You can leave everything else as it is.

Model: choose Small (English). This is the best balance of accuracy and speed for an older machine. The very first time you pick it, Buzz downloads the model, which takes a minute or two. After that it is saved and reused, so you only wait once.
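Language: set this to English. If your audio is in another language, an English-only model is the wrong choice: pick the Small model without the English label instead and leave the language on automatic.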

Then click Run, or Start, and let it work. You will see it processing.

Step 4: Read and save your text

When it finishes, double-click the completed item in the list to open the transcript.
To keep the text, use the Export or Save option.
Choose a format. Pick TXT for plain text, or SRT if you want subtitles to add to a video.

That is the full process. Import a file, pick the model, run it, export the text.

If it feels too slow

If the Small model takes longer than you would like, do exactly the same steps but choose the Base (English) model instead. It is faster and still gives good results. The trade-off is slightly less accuracy on difficult or noisy audio.

Avoid the Medium and Large models on an older machine. They are more accurate but need far more memory and will run very slowly. For most everyday recordings, Small is the sweet spot.

A simple speed tip: close other heavy programs while Buzz is working, especially web browsers with lots of tabs open. That frees up the computer to focus on the transcription.

Troubleshooting

It looks frozen or stuck. It is almost certainly still working, especially on a longer file. Give it a few minutes and close other apps to speed it up.
The model will not download. Check your internet connection and try again. The download only needs to happen once.
The text has small mistakes. This is normal for any speech-to-text tool. Clearer audio gives better results, and the Small model is more accurate than Base if you need the extra precision.
It runs out of memory or crawls. Switch to the Base (English) model and close every other program while it runs.

In short

Install Buzz from the Microsoft Store, open it, import your file, choose the Small (English) model, and click Run. When it finishes, export your text. A modest old laptop can do genuinely useful transcription for free, with your recordings never leaving the machine.

If you like finding AI tools that actually earn their place, with no hype and no jargon, that is the whole point of what we do. You can explore more free, practical AI resources and assessments at bykovbrett.net/resources.

UK Parliament Debates Digital Sovereign Strategy

Jamie Bykov-Brett — Fri, 19 Jun 2026 07:27:12 GMT

There is a phrase buried in this week's parliamentary debate that should make any senior leader sit up, and it has nothing to do with hacking or hostile states. It is the idea of being "locked in".

MPs are worried that government departments have signed up to technology they cannot easily walk away from, even if they wanted to. Once your email, your data, your day-to-day operations all run through one company's platform, switching becomes so expensive and disruptive that you effectively cannot. You are a customer who has lost the ability to say no.

That is the heart of the amendment to the Cyber Security and Resilience Bill. Backed by twenty MPs and proposed by Liberal Democrat Victoria Collins, it calls on the government to publish a "digital sovereignty strategy" to reduce the UK's dependency on overseas suppliers across critical infrastructure. The headline framing is national security. The more practical worry, the one I think matters most for the rest of us, is concentration. Too much of the public sector now runs on a handful of American firms, and a cross-party committee has warned this leaves the UK "at the mercy" of foreign actors and represents a "clear vulnerability".

I want to be careful here, because "sovereign IT" can tip very quickly into flag-waving and protectionism, and that is not the interesting version of this story. The interesting version is about choice. A system you cannot exit is a system that controls you, not the other way around. That is true whether the supplier is in California, Shenzhen or Slough. The question is not "is this company foreign?" but "if this relationship went wrong tomorrow, could we leave without the wheels coming off?" For a surprising number of organisations, the honest answer is no.

There is a second thread in the source that deserves attention, and it is about secrecy. The MPs point out that the UK keeps its analysis of these "chronic risks", things like over-dependence on a few global tech giants, largely classified. France, Germany, Denmark and the Netherlands are having these debates in the open. France is even moving its senior civil servants onto sovereign open-source tools to reduce the risk of surveillance or sudden loss of service. You cannot have a grown-up national conversation about resilience if the evidence is sealed in a drawer. Transparency is not a nice-to-have here. It is the precondition for anyone outside government being able to plan sensibly.

So what does this mean if you are a chief data officer in financial services, a CIO in an NHS trust, or running IT for a university? Watch whether "sovereign IT" stays a slogan or grows teeth through procurement rules. If it becomes the latter, expect data-residency requirements (where your data is physically stored) and vendor-diversification expectations to tighten, and expect those expectations to flow down to private firms holding government contracts. The supply chain does not stop at the department's front door.

My practical encouragement is to stop treating this as a policy story you will deal with later, and start treating it as a design question you can act on now. Most lock-in is not imposed on us in one dramatic decision. It accumulates, one convenient default at a time, until exit feels unthinkable. The organisations that will cope best are the ones that built optionality in early, before anyone forced them to.

One thing to try this quarter: pick your most business-critical system and run a genuine exit test. Ask, in concrete terms, what it would take to move it to a different provider. How long, how much, who would have to sign off, what would break. You are not necessarily going to move it. You are finding out whether you could. That single exercise tells you more about your real resilience than any compliance checklist, and it shifts the conversation from fear of foreign suppliers to something far more useful: knowing exactly where your freedom to choose has quietly run out.

Why a Chip You Will Never See Decides What Your Smart Glasses Can Do

Jamie Bykov-Brett — Thu, 18 Jun 2026 08:27:11 GMT

The headsets and glasses get the photographs. The launch videos, the slick demos, the people grinning while they wave their hands at floating windows.

Almost nobody points a camera at the small piece of silicon doing the actual work. Yet that chip decides what the whole device can and cannot do, which is why Qualcomm's announcement at Augmented World Expo is worth a moment of attention even if you have no interest in strapping a computer to your face.

Qualcomm has launched a new platform called Snapdragon Reality Elite, built to run artificial intelligence directly on XR devices, the headsets and glasses that mix digital content with the real world.

The headline figure is that it can deliver up to 48 TOPS (trillions of operations per second) of on-device AI performance, enough to run large language models and large vision models locally. In plainer terms: the kind of AI that today usually lives in a distant data centre can now run on the glasses themselves.

That distinction sounds like a technical footnote, but it has real consequences. When AI runs on the device, three things change. Latency drops, because the data does not have to travel to a server and back before anything happens. Cost shifts, because you are not paying for cloud processing every time the system thinks. And, most importantly for anyone working in a serious organisation, the data does not have to leave the device at all.

Hold onto that last point, because it is the one most people skip. If a surgeon or a fraud investigator is wearing a device that processes sensitive information, the question that kills most immersive technology pilots is simple: where does the data go? The moment the answer is "to someone else's server", the legal team and the compliance officer both sit up. On-device AI gives a different answer. The sensitive context can stay where it was captured. That single change reopens conversations that were previously closed before they began.

I have watched promising tools die in regulated environments, even though they worked, because nobody could explain, in writing, what happened to the data. The obstacle was always trust. So when silicon makes it genuinely possible to keep sensitive information local, it removes one of the biggest barriers to adoption in health, finance, education and the public sector.

The rest of the announcement is the usual leap in raw numbers. Qualcomm says the platform delivers up to 160% higher performance on the part of the chip dedicated to AI, around 20% longer battery life, and a chipset that runs cooler under load. Cooler and longer-lasting matters more than it first appears, because the dream of lightweight glasses you would actually wear in public falls apart the moment the device cooks your temple or dies before lunch.

There is a scale point too. Qualcomm's Ziad Asghar noted that there are already more than 60 million XR devices in market with growing momentum across industries. That is a base big enough that the choices made at the chip level ripple outward to every device built on top.

A note of caution, because hype is the default setting in this corner of technology. A platform announcement is only a promise. The devices built on it, from XREAL's glasses to others due later this year, still have to prove themselves in real workplaces. Capability on a slide still has to become capability in the field. I have learned to wait for the second and third deployment before believing the first.

Still, if you are a leader who keeps batting away immersive technology proposals because the privacy answer never quite landed, the ground has shifted under that objection. The next XR pitch that crosses your desk deserves a sharper question than "is it ready yet". Ask instead: with the data staying on the device, what could we now responsibly try that we ruled out last year? File that one away. You will need it sooner than you think.

Buying the Flagship NHS Data Platform Was the Easy Part

Jamie Bykov-Brett — Wed, 17 Jun 2026 14:07:12 GMT

Buying the flagship NHS data platform was the easy part. A third of trusts now do fewer operations

There is a particular kind of number that is true and misleading at the same time. The NHS had one. For months, the headline claim about Palantir's Federated Data Platform, the big data system the government bought to help hospitals run more smoothly, was that it had helped trusts carry out more operations. Add up every hospital using the scheduling tool, and the total goes up. Good news, on paper.

Then the campaigning group Foxglove filed a series of freedom of information requests and looked underneath the total. What they found is that almost a third of the English trusts using the platform's operations-scheduling module are now doing fewer operations than before they switched it on. Thirteen of the 41 trusts using the tool went backwards. Between them, they recorded 9,073 fewer operations after adopting it than in the equivalent period before. The national total still rose, because the trusts that improved gained enough to cover for the ones that declined. The total looked healthy. A third of the trusts behind it did not.

I want to be careful here, because this is not a "Palantir bad" story (I have plenty of those if you are interested), and it is not even mainly a story about Palantir. Tom Bartlett, the former NHS England deputy director who led the 150-person team that built the platform, made a reasonable point in the same article: judging a broad data infrastructure by one nationally commissioned product probably misses the wider intent. He may be right. But notice that we can only have that argument because Foxglove forced the per-trust data into the open. NHS England had only ever published the cumulative total of additional operations across all trusts, which is exactly the shape of number that hides its own worst cases.

This should worry any leader who has signed off on a big technology purchase, whether in healthcare, banking or local government. If the only figure you report is the aggregate, you have built a dashboard that flatters you. Averages are generous to failure. One strong site can carry several weak ones, and you will never see the weak ones until someone outside your organisation goes digging.

Foxglove's head of strategy Tim Squirrell put the principle bluntly: if a tool gets the credit when things improve, it has to accept the blame when they get worse. You cannot claim the upside as proof and dismiss the downside as noise.

There is a deeper lesson in why those 13 trusts went backwards, and the honest answer is we do not fully know, because the comparison data for trusts not using the tool has not been published either. That gap is a bigger problem than it sounds. Without it you cannot tell whether the platform caused the decline, or whether those trusts were already struggling and the software simply landed in the middle of a harder situation. A tool dropped onto a broken process leaves the broken process in place. It just runs that process faster, and now there is a vendor logo attached to the outcome.

That is what organisations keep getting wrong about AI and data platforms. Buying the system is the easy, board-friendly move. The £300m-plus contract gets signed, the press release goes out, the cumulative number ticks up. The hard part, which no one puts on a slide, is the local work: the scheduling habits, the staffing, the trust between teams, the messy human reasons one hospital makes a tool sing and another makes it sigh. Same software, very different results, because the software was never the main variable.

So there is one thing worth doing this week if you run anything at scale. Find your headline metric, the one you quote upwards. Then break it down per team, per site, and look specifically for the units going the wrong way while the total goes up. Do not wait for an FOI request to do it for you. If your dashboard only shows the aggregate, it is almost certainly hiding your worst performers, and those are the people who most need help.
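
As a minimal sketch of that breakdown, the Python below takes before-and-after counts per site and lists the units moving the wrong way while the total rises. The site names and figures are invented for illustration, and find_decliners and total_change are hypothetical helpers, not part of any NHS or vendor tooling.

```python
from typing import NamedTuple


class SiteCounts(NamedTuple):
    site: str
    before: int  # count in the comparison period before adoption
    after: int   # count in the equivalent period after adoption


def total_change(rows: list[SiteCounts]) -> int:
    """Headline figure: the net change summed across every site."""
    return sum(r.after - r.before for r in rows)


def find_decliners(rows: list[SiteCounts]) -> list[tuple[str, int]]:
    """Return (site, change) for every site whose count fell, worst first."""
    changes = [(r.site, r.after - r.before) for r in rows]
    return sorted((c for c in changes if c[1] < 0), key=lambda c: c[1])


# Invented example data: the total rises while two sites go backwards.
rows = [
    SiteCounts("Site A", 1000, 1400),
    SiteCounts("Site B", 800, 700),
    SiteCounts("Site C", 1200, 1150),
    SiteCounts("Site D", 900, 950),
]

headline = total_change(rows)      # what the dashboard reports
decliners = find_decliners(rows)   # what the dashboard hides
```

With this made-up data the headline is positive, yet half the sites declined. The point of the design is that the two figures are computed side by side, so the aggregate can never be reported without its worst cases next to it.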

Why AI Success Depends on Boring Governance and Data Foundations

Jamie Bykov-Brett — Tue, 16 Jun 2026 08:57:11 GMT

Most people in a shiny new AI job want to talk about the shiny new thing. Damian Leach, recently made chief AI and digital officer at the corporate services firm Vistra, did the opposite. Asked how the company is building its big new AI-powered platform for around 10,000 staff across 65 locations, he said it all starts with the boring stuff: designing policies and guardrails, setting standards, doing the architecture, mapping the roadmap, and addressing regulatory, trust and privacy concerns. He called this work of "utmost importance".

That is a refreshing thing for a senior leader to admit in public, and here is why he is right.

When a company decides to "do AI", there is a strong pull towards the visible parts. The chatbot. The demo. The slide that makes the board nod. Vendors are happy to feed that appetite, because a polished tool is easy to sell and easy to buy. What they cannot sell you is the part Leach is describing. Your guardrails have to match your regulator. Your data standards have to fit your actual, messy data. Your roadmap has to survive contact with your real people and how they work. None of that comes in a box.

This is the irony of the whole AI market. The off-the-shelf stuff is, by definition, the same for everyone. The big providers will happily give you the same large language model they give your competitor. The thing that decides whether it works for you or against you is the unglamorous groundwork underneath it. That groundwork is where the value actually lives, precisely because it cannot be standardised and shipped.

Leach is honest about why this affects his customers. He describes data as one of the largest problems they face, with information sitting in silos and multiple sources giving multiple answers to the same question. If you have ever asked two departments for the same number and got two different figures, you already understand the issue. Pointing a clever AI tool at that mess does not clean it up. It just produces fast, confident versions of the confusion. Poor foundations plus powerful tools equals quicker mistakes.

There is a deeper point hiding in here about what AI is actually good for. Leach pushes back on the common assumption that AI is mainly about automating processes. He says the real benefit is improving the client experience, connecting people to their own data so they can ask "what if" questions and get useful answers. That is a meaningful shift in framing. Automation asks "how do we do the same thing with fewer people?" The better question asks "what could people do now that they could not do before?" The first squeezes value out. The second creates it.

Machines machine better than people ever could. They sort, match and process at a scale no human can touch. But deciding which questions are worth asking, which answers can be trusted, and which risks are acceptable in a regulated financial business, that remains human work. The boring stuff is really just the human stuff written down: the agreements about what good looks like, who is accountable when something goes wrong, and where a person has to stay in the loop. Skip it, and you have just hidden the judgement inside a system nobody fully understands.

For leaders watching their own organisations rush at AI, there is a simple test buried in Leach's approach. Before the demo, before the procurement, ask whether anyone has done the unglamorous work. Are the standards set? Is the data trustworthy? Do people know who owns the decision when the model is wrong? If those answers are vague, start with the foundations before the tool.

The temptation will always be to treat governance, data quality and clear accountability as the slow tax you pay before the fun part. Leach has it the right way round. The slow part is the strategy. The fun part is what you get to build once the slow part holds. A new Chief AI Officer just said so on the record, and most organisations would do well to listen.

Amazon, Anthropic's Biggest Backer, Killed Fable 5

Jamie Bykov-Brett — Mon, 15 Jun 2026 15:22:11 GMT

Three days. That is how long Anthropic's most capable models, Claude Fable 5 and Mythos 5, were available to the public before they vanished. Access ended outright, with no slowdown or regional restriction first. Existing sessions now end in errors, new queries get rerouted to older and weaker models, and even Anthropic's own staff cannot use the things their company built. The trigger was a US government export control order citing national security, issued at 5:21pm on a Friday, and the model went dark for every user on Earth, American or not.

What makes this worth your attention is less the regulator and more who appears to have set the regulator in motion. According to Wall Street Journal reporting, the push came from Amazon, the company that put roughly $13 billion into Anthropic, hosts its models on AWS, builds its custom chips, and holds a seat on its board. Amazon's CEO is said to have gone directly to a member of the Trump administration with research claiming Amazon's own team had jailbroken Fable 5, getting it to produce information usable in cyber attacks. The government convened an emergency meeting, asked Anthropic to pull the model voluntarily, and when Anthropic refused, the Commerce Secretary issued the order anyway.

Look at the shape of that for a second. Your biggest investor, your landlord, your board member and your chip supplier found a flaw in your product, took it to the government instead of to you, and got the product switched off, all while selling a competing model, Nova, to that same government. I have spent years telling leaders that the hardest risks are rarely technical; they are about incentives and who holds power over whom. This is that lesson with the gloves off.

There is a real safety story underneath, and I want to be fair to it. A well-known jailbreaker, Pliny the Liberator, had separately claimed to bypass the model's guardrails to extract step-by-step exploit code. So two different groups apparently cracked the safety system within days of launch. That is not nothing. But Anthropic's public position is that the jailbreak is "a misunderstanding", that the government has so far offered only "verbal evidence of a potential narrow, non-universal jailbreak", and that the same technique works on rival models that were left untouched. The company warns the precedent could "essentially halt all new model deployments for all frontier model providers".

Whichever way that dispute lands, the lesson for the rest of us holds. For two years the AI risk conversation has fixated on what a model might say or do. This episode points at a different risk: whether the model will be there at all tomorrow morning. A capability your teams now lean on can be withdrawn by an order none of you saw coming, pushed by a company that is simultaneously your supplier and your competitor. "The vendor turned it off" is not a sentence your business continuity plan can absorb.

And the tangle runs deeper than one feud. Eight of the nine most valuable tech companies are building their own models while also investing in, hosting, and supplying their rivals. Microsoft backs OpenAI while shipping its own models. Google backs Anthropic. Everyone has a hand in a competitor's pocket. So the neutral, dependable platform you think you are buying from is often entangled with the firm most motivated to see it fail.

The geopolitical version is already landing. Canada's prime minister said the ban shows the risk of leaning on American AI, because any country can wake up to find its infrastructure switched off by Washington. That is the same point a business should be making internally, just one scale up.

The practical answer is unglamorous and it works: an abstraction layer, meaning your applications talk to a thin internal layer of your own rather than calling one vendor directly, so when a model disappears you reroute to another without rewriting everything underneath. Pair it with a tested fallback so the switch is a controlled drill.
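
To make that concrete, here is a rough sketch in Python of what such a layer might look like. Everything in it is an assumption for illustration: ModelRouter, ProviderUnavailable and the two stand-in provider functions are hypothetical names, and real adapters would wrap each vendor's own client library rather than return canned text.

```python
from typing import Callable, Sequence


class ProviderUnavailable(Exception):
    """Raised by a provider adapter when its model cannot be reached."""


# A provider is a (name, callable) pair; the callable takes a prompt, returns text.
Provider = tuple[str, Callable[[str], str]]


class ModelRouter:
    """Thin internal layer: applications call complete(), never a vendor SDK."""

    def __init__(self, providers: Sequence[Provider]) -> None:
        if not providers:
            raise ValueError("at least one provider is required")
        self._providers = list(providers)

    def complete(self, prompt: str) -> tuple[str, str]:
        """Try providers in priority order; return (provider_name, response)."""
        failures: list[str] = []
        for name, call in self._providers:
            try:
                return name, call(prompt)
            except ProviderUnavailable as exc:
                failures.append(f"{name}: {exc}")
        raise ProviderUnavailable("all providers failed: " + "; ".join(failures))


# Stand-in adapters that simulate a withdrawn primary and a working fallback.
def primary_model(prompt: str) -> str:
    raise ProviderUnavailable("access withdrawn")


def fallback_model(prompt: str) -> str:
    return f"[fallback] {prompt}"


router = ModelRouter([("primary", primary_model), ("fallback", fallback_model)])
used, reply = router.complete("summarise this contract")
```

Because complete() reports which provider answered, you can log every reroute and rehearse the failure on purpose, simply by making the primary adapter raise in a test environment.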

So a useful exercise this week. List the workflows that would simply stop if your primary model went dark tomorrow. Next to each, name who controls that switch, and whether they also sell something that competes with you. If the honest answer is "we wait, and they benefit while we wait," you have just found the work.

I guess the question is whether Amazon was acting as an ethical conscience or as a snake in the grass cosying up to the US government. You decide.

How to Host Your Website on GitHub Pages for Free (With Claude Prompts)

Jamie Bykov-Brett — Sun, 14 Jun 2026 09:02:07 GMT

By the end of this guide you'll have a live website with free hosting, free HTTPS, and optionally your own domain name, with no monthly bill. This is exactly how I run my own sites, including distributedrepublic.xyz: the whole thing is a GitHub repository, and every time I push a change, the site rebuilds and redeploys itself automatically.

The hosting service is GitHub Pages, which serves static websites directly from a repository. It is genuinely free for public repositories, with generous limits: sites up to 1GB and a soft bandwidth limit of 100GB per month, which is far more than most small business sites or personal projects will ever touch.

The twist in this guide is that you won't do most of the work. At each step I'll give you a prompt to paste into Claude (ideally Claude Code, which can run the commands for you, but the chat interface works too if you're happy to copy commands across yourself).

Prerequisites

A free GitHub account
Access to Claude (Claude Code is best for this, since it can execute the steps)
Around 30 to 60 minutes
Optional: a custom domain from any registrar (roughly £10 a year). Without one, your site lives at yourusername.github.io/my-site for free.

One honest caveat before we start: GitHub Pages serves static sites. HTML, CSS, JavaScript, images. That covers portfolios, landing pages, documentation, event pages, and most small business sites. It does not cover anything that needs a server-side database or user logins.

Step 1: Build (or bring) your site

If you already have a folder of HTML files, skip ahead. If not, let Claude build one. Paste this into Claude and edit the bracketed part:

Build me a simple, fast, mobile-friendly one-page website for [describe your project, business, or idea in two or three sentences]. Use plain HTML and CSS in a folder called my-site, with no frameworks and no build step. Include a hero section, an about section, and a contact section.

You should now have a folder containing at least an index.html. Open it in a browser to check you're happy with it. Iterate with Claude until you are; changes are cheap at this stage.

If you'd rather use a framework like Astro or Vite (I use Astro for Distributed Republic), ask for that instead. Just tell Claude "this will be deployed to GitHub Pages" so it configures the build correctly.

Step 2: Put the site in a GitHub repository

Paste this into Claude Code from inside your site folder:

Initialise this folder as a git repository, create a new public GitHub repository called my-site under my account using the gh CLI, and push the code to the main branch. If the gh CLI isn't installed or authenticated, walk me through setting that up first.

Checkpoint: visit github.com/yourusername/my-site and you should see your files.

If you're using the chat interface rather than Claude Code, ask Claude for the exact commands instead and run them in your own terminal.

Step 3: Add the deployment workflow

This is the piece that makes the site deploy itself on every change. GitHub Actions is GitHub's free automation service, and it has an official pair of actions for publishing to Pages.

Prompt:

Add a GitHub Actions workflow at .github/workflows/deploy.yml that deploys this site to GitHub Pages on every push to main, using actions/upload-pages-artifact and actions/deploy-pages. If the site has a build step, run it first and upload the build output folder; if it's plain HTML, upload the folder as-is. Then tell me exactly what to change in the repository settings to enable it.

For reference, here's what that workflow looks like for a plain HTML site:

name: Deploy to GitHub Pages

on:
  push:
    branches: [main]
  workflow_dispatch:

permissions:
  contents: read
  pages: write
  id-token: write

concurrency:
  group: pages
  cancel-in-progress: true

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    steps:
      - uses: actions/checkout@v5
      - uses: actions/upload-pages-artifact@v4
        with:
          path: .
      - id: deployment
        uses: actions/deploy-pages@v4

For a framework site, the workflow gains a build job (install Node, run npm ci and npm run build, upload the dist folder). Claude will handle that variation if you told it what framework you're using.

Step 4: Tell GitHub to use the workflow

This is the one step you must do by hand in the browser, and it's the step most people miss.

Go to your repository on GitHub
Settings, then Pages in the left sidebar
Under "Build and deployment", set Source to GitHub Actions

Until you do this, GitHub either serves nothing or serves files from a branch directly, and your workflow's deployments go nowhere.

Step 5: Push and watch it go live

Commit and push any small change (or re-run the workflow from the Actions tab). Checkpoint: the Actions tab shows a green tick, and your site is live at https://yourusername.github.io/my-site/.

From now on, this is your entire publishing process: change a file, push, wait about a minute. That's it. No FTP, no hosting dashboard, no invoices.

Step 6 (optional): Connect a custom domain

A github.io address is fine for a project, but a real domain looks better and costs about the price of two coffees a year. Prompt:

I own the domain example.com and my GitHub Pages site is in the repository yourusername/my-site. Walk me through connecting the domain: the CNAME file the repo needs, the exact DNS records to add at my registrar for both the apex domain and www, and how to enforce HTTPS once it's verified.

The short version of what Claude will tell you:

In Settings, then Pages, enter your domain under "Custom domain" (this creates a CNAME file; if your site has a build step, the file needs to live in your static assets folder so it survives the build)
At your registrar, point the apex domain at GitHub Pages' four A record IP addresses, and point www at yourusername.github.io with a CNAME record
Wait for DNS to propagate (minutes to a few hours), then tick Enforce HTTPS

GitHub provisions a free SSL certificate automatically via Let's Encrypt, so your padlock costs nothing too.

Troubleshooting

The workflow ran but the site is a 404. Almost always Step 4: the Pages source isn't set to GitHub Actions. Fix the setting and re-run the workflow.
The site loads but CSS and images are broken. Project sites live under a subpath (/my-site/), and absolute paths like /style.css break there. Tell Claude: "my site is deployed at username.github.io/my-site and the assets 404, fix the paths or set the correct base path in my build config." A custom domain at the apex makes this problem disappear entirely.
The custom domain shows a certificate warning. HTTPS certificates take a little while to issue after DNS verifies. Wait an hour, then check the Pages settings for errors. If it persists, ask Claude to check your DNS records with dig.
The site didn't update after a push. Check the Actions tab first; a red cross means the build failed, and you can paste the log straight into Claude to diagnose. If it's green, it's usually your browser cache, so hard-refresh.
The repository must be public for free Pages hosting (private repos need a paid plan). Don't put anything in it you wouldn't publish, including API keys in code. Everything in the repo is visible to the world.

What next

You now have professional hosting infrastructure that large companies use for their documentation sites, for the price of nothing. The same pattern scales from a one-page site to a full multi-page site built with a framework, and the workflow never changes: edit, push, live.

A good next step is asking Claude to add the things that make a site feel finished: a favicon, social sharing previews, and a sitemap. And if you're curious what else AI can take off your plate beyond building websites, the free resources at bykovbrett.net/resources are a good place to start.

Why Most People Can't Tell Claude Fable 5 From Opus (and Why That's a Clue)

Jamie Bykov-Brett — Sat, 13 Jun 2026 13:42:11 GMT

Anthropic released Claude Fable 5 on 9 June 2026. It is the first model in their new Mythos class, a capability tier that sits above Opus, and the most capable model they have ever made generally available. TechCrunch called it the public version of a model Anthropic had previously restricted to a small group of vetted organisations.

But the most common reaction I've heard in the three days since launch is some version of "I tried it and it seems about the same as Opus." I shared my own first impressions of Fable 5 after two days with it; this piece is about why so many people are underwhelmed when they shouldn't be.

Both of those statements are true. Fable 5 is a generational leap, and most people genuinely can't see it. The gap between those two facts tells you more about where AI is heading than any benchmark chart.

A chat window hides the difference

If you use AI through a chatbox interface, you are the orchestrator. You ask a question, the model answers, you read it, you decide what to ask next. Every turn is a short, self-contained task, and the model only has to be good for one step at a time.

Opus was already excellent at one step at a time. So is Fable 5. That is why, on routine single-turn work, the performance differences narrow considerably. Anthropic says it plainly in their own announcement: the longer and more complex the task, the larger Fable 5's lead. We are talking about tasks that might take hours to complete. The inverse is also true: make the task short and simple, and the lead nearly vanishes.

Asking a frontier model to draft an email is like hiring a senior engineer to change a lightbulb. You won't learn much about what they can do.

The benchmarks only diverge when the task gets long

Look at where the published numbers actually split:

On SWE-bench Pro, a benchmark of real software engineering tasks, Fable 5 scores 80.3% against 69.2% for Opus 4.8.
On FrontierCode Diamond, which tests the hardest long-horizon coding problems, Fable 5 scores 29.3% against 13.4%. That is more than double.

Both numbers come from the same benchmark comparison. Notice the pattern: the harder and longer the work, the wider the gap. These are tasks that take hours of autonomous effort, hundreds of small decisions, each one building on the last. Small improvements in per-step judgement compound enormously over a long chain. A model that recovers well from its own mistakes finishes jobs that a slightly weaker model abandons halfway.
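To see why small per-step gains compound, here is a back-of-envelope calculation. The numbers are illustrative assumptions, not figures from Anthropic or from either benchmark, and the sketch assumes each step succeeds or fails independently, which real agent runs do not strictly do. If a job needs n steps and each step goes right with probability p, the chance of getting through the whole chain without an unrecovered error is roughly:

```latex
P(\text{finish}) = p^{n},
\qquad
0.99^{300} \approx 0.049,
\qquad
0.995^{300} \approx 0.222
```

Under those assumptions, cutting the per-step error rate from 1% to 0.5% lifts the completion rate on a 300-step job from about 5% to about 22%.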

The most striking real-world example came from Stripe, who reported during early testing that Fable 5 completed a codebase-wide migration across 50 million lines of code in a single day. Their estimate for doing it by hand was over two months of team effort.

Phase 2 of AI is agentic

I think about AI adoption in two phases.

Phase 1 was the chatbot era. AI answers. You ask, it responds, you do something with the response. Almost everyone's mental model of AI was formed here, and almost all of the "is it actually better?" judgements are still being made here.

Phase 2 is the agentic era. AI does. You give it a goal, and it plans the work, uses tools, reads and writes files, runs commands, tests its own output, corrects course, and keeps going until the job is finished. It produces a chain of hundreds of actions rather than a single answer.

Fable 5 was built for phase 2, and the design choices show it. Its reasoning is always on rather than optional. A single request can run for many minutes while it works. It delegates dependably to parallel sub-agents. It uses file-based memory unusually well: in one of Anthropic's tests, giving the model persistent notes improved its performance three times more than the same memory helped Opus 4.8. None of those qualities are visible in a four-line chat exchange.

There is a price signal here too. Fable 5 costs double Opus per token, yet analysts note it often completes the same work in fewer steps, so the economics favour long jobs rather than quick answers. Anthropic priced it for work rather than chat.

Where you'll actually feel the difference

In a terminal. In an agent platform. In an automated pipeline. Anywhere the model is given a job instead of a question.

Most of my own AI use has moved out of the chat window. Agents research prospects, draft and check content, monitor systems, and build features end to end while I do something else. At that kind of work, the difference between Fable and Opus is stark. It is the difference between checking in on an agent and finding the job done, versus finding it stuck.

If you want to test this yourself, don't ask Fable 5 a clever question. Give it a real multi-step job through an agentic tool like Claude Code: "audit this codebase and fix what you find", "research these ten companies and produce a comparison", "build this feature and test it". Then judge.

The takeaway

If your experience of AI is a chat box, your benchmark for AI progress is stuck in phase 1, and every new model will feel like a minor update from here on. The frontier has moved to a different surface. The organisations compounding an advantage right now are the ones handing AI whole jobs instead of single questions.

That shift, from AI that answers to AI that does, is the one worth getting ahead of. If you'd like to see what agentic AI looks like on real business work, that's exactly what we explore at Bykov-Brett Enterprises, live demos included.

Monako Smart Glasses Signal the Shift to AI Agent Interfaces

Jamie Bykov-Brett — Fri, 12 Jun 2026 16:44:54 GMT

A pair of glasses that weighs about as much as a slice of bread just did something Apple and Meta have not. A Chinese startup called Monako put a full computer on your face, then taught it to write code.

The product is called Monako Glass, and the company describes it as the world's first Linux computer in glasses form. At 48 grams it is light enough to forget you are wearing it. Forget the camera or the display. The headline feature is that the glasses are built to run AI coding agents, the tools like Claude Code and OpenAI's Codex that can read a brief, write software, test it, and fix their own mistakes, right in front of your eyes.

That detail is worth slowing down on, because it tells you where the industry thinks the value sits now.

For years the smart glasses race was about consumption. Take a photo, get directions, watch a notification float in your vision. Useful, maybe, but hardly essential. Monako has flipped the question. Instead of asking what you can look at through the glasses, it asks what work the glasses can do while you are looking at the world. A coding agent works without you staring at a screen, as long as it has an instruction and a goal. Glasses turn out to be a surprisingly natural home for a worker that mostly runs in the background.

I find the framing more interesting than the gadget. A startup most people have never heard of reached this point ahead of Apple and Meta, two of the richest companies on the planet, both of whom have spent years and fortunes on wearables. That story is less about the hardware and more about what happens when the hard part of building software gets handed to an agent. When the act of coding becomes something a small team can summon on demand, the advantage shifts away from whoever has the biggest engineering department and towards whoever has the clearest idea of what to build.

This is the pattern I keep seeing, and it is the one I think leaders should pay attention to. The machines are getting very good at the machine-like parts of the job. Writing the code, running the tests, shipping the fix. What they cannot do is decide whether the thing is worth building, whether it is safe, whether it serves the people who will use it. Expertise in the narrow sense, knowing the syntax, is becoming cheap. Judgement is not.

So I would gently resist the temptation to treat Monako Glass as either a toy or a threat. The honest reading is that the tools are getting smaller and more autonomous, and they are arriving faster than most organisations have a plan for. A coding agent on your face is genuinely impressive. It also raises a question that the marketing will not answer for you. If an agent writes software while you walk around, who checks it, and who is accountable when it ships something broken or biased? Powerful tools in the hands of unclear thinking produce faster mistakes rather than better outcomes.

There is also a fairness angle that rarely makes the launch video. Tools like this lower the barrier to building things, which is wonderful, but only for the people who already have the literacy to use them well. Access without understanding tends to widen the gap rather than close it. The organisations that win the next few years will be the ones that taught their people to think clearly about what to point these tools at, more than the ones with the cleverest hardware.

If you lead a team, here is one thing worth doing this week. Stop asking whether your people have access to AI tools, and start asking whether they can tell a good output from a dangerous one. Monako has shown the hardware can keep shrinking. The harder work, the human work, is in the judgement we bring to it.

Two Days with Claude Fable 5: First Impressions of a Mythos-Class Model

Jamie Bykov-Brett — Thu, 11 Jun 2026 07:11:09 GMT

On Tuesday, Anthropic released Claude Fable 5 to enterprise customers and paid subscribers. I have spent the two days since putting it to work on real tasks across my own platform, and it is rare that a new model genuinely changes the shape of my working week. This one has.

A quick recap if the launch passed you by. Fable 5 is the public version of Anthropic's new Mythos class of models. Mythos was unveiled back in April and deliberately held back from general release, because in testing it did something no previous model could do at scale. It found previously unknown security vulnerabilities in widely used software, some of which had survived more than twenty years of human auditing, and it built working exploits for them on its own. On one benchmark, the previous generation managed 2 successful exploits where Mythos managed 181. Anthropic responded by routing early access through Project Glasswing, a programme that put the model in the hands of defenders and open source maintainers first.

Fable 5 is that capability made safe for the rest of us. Ask it something high risk in cybersecurity or biology and it blocks the response and falls back to Claude Opus 4.8 for a safe answer. A less restricted version, Claude Mythos 5, exists for Glasswing partners and selected researchers. On the numbers, Fable 5 scores more than 10 per cent higher than Opus 4.8 on some benchmarks, has a one million token context window, and costs roughly double on the API.

What two days of real use actually showed me

Benchmarks are one thing. Here is what I noticed doing real work.

The first thing is that it intuits intent far better than anything I have used before. Briefs that previously needed two or three rounds of clarification get understood first time, including the parts I forgot to mention.

The second is that it fixes most things on the first attempt. Debugging with earlier models was a back-and-forth loop of trying a fix and checking the result. With Fable 5 the first fix usually holds.

One example from yesterday. A partner portal on my platform kept going down, and the obvious move was to restart it and carry on. Fable 5 read the situation differently. It traced the outage to a deliberate decision made during a security review the day before, where a configuration gate meant every routine restart silently left the service switched off. It fixed the actual cause, brought the portal back, and kept the security decision intact, all in one pass. An earlier model would almost certainly have restarted the container and called it done.

The security work deserves its own mention. I ran Fable 5 across parts of my platform that earlier models had already reviewed and waved through. It surfaced real weaknesses they had missed, including a default secret that could have allowed forged logins, a credential sitting in source code where it should never have been, and a public route exposing more of the internal interface than intended. In many ways it is low risk because everything sits on my local network, but you can never be too safe. Every one of them was fixed the same day. Given what the Mythos class was built to find, I should not have been surprised, but watching it catch what previous reviews missed was the moment the launch coverage became real for me.

One practical note on timing. If you are on a paid Claude plan, Fable 5 is included at no extra charge until 22 June, and worth testing properly before it moves onto usage credits after that. Give it a genuinely hard task from your own business rather than a toy prompt. That is where the difference shows.

Takeaways

Fable 5 is the public, safeguarded release of Anthropic's Mythos class, and the capability jump over the previous generation is noticeable in everyday work, especially in understanding intent and getting fixes right first time.
Its security analysis caught real issues that earlier models had reviewed and missed. If your software estate has not had a fresh look recently, this generation of model makes one worthwhile.
The defensive basics are still essential. Patches will arrive faster because of models like this, and capable attackers will eventually hold similar tools. Updates, multi-factor authentication and working backups remain the foundation.
Test it before 22 June while it sits outside usage credits, and test it on real work.

The interesting question for most business owners is no longer whether these models are capable enough. The practical questions are what you would delegate first, and how you would check the work. That is a leadership question, and it is worth thinking through this week rather than next year.

Meta Refused. xAI Half-Signed. Recruiters Have Eight Weeks to Act

Jamie Bykov-Brett — Tue, 09 Jun 2026 08:43:45 GMT

We are only eight weeks from a deadline that recruiters should already be worried about.

Most of the noise about the EU AI Act has been about the big model makers: who signed the General-Purpose AI Code of Practice, who didn't, and what Meta's refusal to sign says about the company. That's a fair commercial question, and I'll come back to it. But it has distracted a lot of leaders from the part of this law that will actually land on their desk first.

This usually surprises people. The vast majority of AI systems running in EU businesses right now are, in the eyes of the Act, minimal risk. Spam filters, AI in video games, most of the everyday tooling: no new rules at all. The regulation targets a specific list of uses where a bad decision ruins someone's life, and everyday chatbots fall outside it. Yet that list catches things ordinary companies do every week.

Read the high-risk categories and you'll find CV-sorting software for recruitment, tools for managing workers, exam-scoring systems in education, and credit scoring that decides whether someone gets a loan. None of that sounds exotic. A mid-sized firm using an off-the-shelf tool to filter job applicants is, on paper, operating a high-risk AI system. That carries real obligations: proper risk assessment, high-quality data to avoid discriminatory outcomes, activity logging so you can trace how a decision was reached, clear documentation, and meaningful human oversight. That means a human who can actually intervene rather than just tick a box.

The dates are close now. The Act entered into force on 1 August 2024 and becomes fully applicable on 2 August 2026. That's roughly eight weeks away. The bans on the worst practices, things like social scoring and emotion recognition in workplaces, already took effect in February 2025. The transparency rules arrive in August 2026 too, which means you'll need to tell people when they're talking to a machine rather than a person, and label AI-generated content like deepfakes. For a UK consultancy advising clients who sell into or operate in the EU, "we'll look at it later" has run out of road.

So the practical first move is a conversation. Sit down with each EU-operating client and answer one question honestly: does anything you run land in that high-risk list, or are you genuinely in minimal-risk territory? Most will be fine. The ones who aren't tend to be exactly the ones who assumed they were, because recruitment and credit tools feel mundane until you read where the law draws its lines.

Now, the model-maker question. The Code of Practice for general-purpose AI is a voluntary framework that the largest model providers can sign to show they meet the Act's expectations.

The reporting around this story notes that Anthropic, OpenAI and Google signed, Meta did not, and xAI signed only part of it.

Meta refused to sign, arguing that the Code creates legal uncertainty and goes beyond what the legally binding Act requires.

xAI did not refuse the entire Code. It signed the Safety and Security chapter, but did not sign the Transparency or Copyright chapters. The European Commission confirms that xAI must demonstrate compliance with those obligations through alternative means.

If you're building a tool on top of a provider's API, the supplier underneath you has a posture towards this law, and that posture is now part of your risk assessment. A vendor who has publicly backed the Code is a different proposition from one who has declined. Meta's absence is a legitimate line in a due-diligence document. Raise it plainly when a client's product leans on Meta's models, without treating it as a scandal.

None of this requires panic, and it certainly doesn't require treating the Act as a war on innovation. The law mostly asks for things good leaders should want anyway: know what your system does, keep a record of its decisions, make sure a person can step in, and don't pretend a machine is human. The work is unglamorous. It's documentation, data hygiene and ownership. But it's the kind of work that separates organisations who use AI responsibly from those who'll be explaining themselves to a regulator in 2027.

If you advise EU-operating clients, the useful thing to do this fortnight is simple: build a one-page register of every AI tool each client uses, mark each one against the four risk tiers, and flag the recruitment and credit-scoring tools first. That list will tell you who needs to act before August and who can relax.
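As a rough sketch of what that register could look like, here is one possible layout in YAML. The client, the tool names and the field names are all hypothetical, and the tier labels (unacceptable, high, limited, minimal) are the commonly used shorthand for the Act's four tiers rather than an official schema.

```yaml
# Hypothetical AI tool register for one client; every name here is illustrative
client: Example Client Ltd
tools:
  - name: CV screening add-on
    purpose: filters job applicants
    risk_tier: high          # recruitment use: flag first
    human_can_intervene: unknown
    act_before_august: true
  - name: Credit scoring model
    purpose: decides loan eligibility
    risk_tier: high          # credit scoring: flag first
    human_can_intervene: true
    act_before_august: true
  - name: Website chatbot
    purpose: answers customer questions
    risk_tier: limited       # transparency: users must be told it is a machine
    act_before_august: true
  - name: Email spam filter
    purpose: filters inbound mail
    risk_tier: minimal       # no new obligations
    act_before_august: false
```

Sorting the list by risk_tier puts the recruitment and credit-scoring entries at the top, which is the order the conversation with the client should follow.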