The AI compute stack moves fast, but its layers do not move in lockstep. Yesterday, OpenAI and NVIDIA confirmed that GPT-5.5, a new frontier model, is not only complete but already deployed, powering the Codex coding agent on NVIDIA's latest GB200 rack-scale systems [48, 46]. This is a material event. It signals an acceleration in the feedback loop between the foundation-model layer and the accelerated-compute layer: the most advanced software is now running on the most advanced hardware almost as soon as both exist. This is the picture of a mature, functioning AI factory. Yet as we have tracked this week, the science of what these factories produce is revealing deep and unsettling properties. The release of a more capable model does not solve these problems; it amplifies them. We have been given a more powerful engine, but new research shows it suffers from a fundamental illusion about its own capabilities, a flaw that pushes it toward the very failure modes we are just beginning to understand.
For the past several issues, we have followed the development of agentic AI from a physics of internal reasoning to an engineering of external scaffolds. We then identified a critical vulnerability in this new paradigm: the agent's reliance on external tools creates an attack surface, which researchers call Adversarial Environmental Injection (AEI), where the world itself can be poisoned to deceive the agent. This was framed as a risk, a potential point of failure. New work now suggests it is not just a risk but a systematic bias in how agents behave. A paper analyzing tool-use behavior reveals a phenomenon the authors call the "knowledge epistemic illusion" [28]. The models systematically misjudge their own internal knowledge, which leads them to over-rely on external tools even when they already hold the correct information. They prefer to ask for directions even when they know the way.
This is a critical finding because it provides a mechanism for the AEI threat. The vulnerability is not passive; the agent actively seeks it out. This behavior is pervasive across diverse models, suggesting it is a byproduct of current training methodologies, not a quirk of a single architecture [28]. It also reframes the problem of alignment. We have focused on preventing models from doing bad things, but here we see a model that fails by not trusting itself to do a good thing. The consequence is a vast expansion of the attack surface we discussed in brief №3. Every time a more powerful agent like GPT-5.5 defaults to a web search, an API call, or a database query it did not need to make, it exposes itself to a world that can be manipulated. The low-stakes version of this is already happening; an AI-generated image of a wolf recently sent law enforcement on a wild goose chase in South Korea, a simple example of manipulated information causing a real-world misallocation of resources [6]. Now imagine that same dynamic playing out inside a corporate or scientific workflow managed by an autonomous agent.
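One way to make the over-reliance claim concrete is to count how often a model reaches for a tool on questions it can already answer without one. The sketch below is a hypothetical metric, not the measurement used in [28]: the `Trial` record, its two fields, and the `unnecessary_tool_rate` function are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    # Hypothetical record of one question, evaluated under two conditions.
    closed_book_correct: bool  # answered correctly with tools disabled
    called_tool: bool          # invoked a tool when tools were enabled


def unnecessary_tool_rate(trials):
    """Fraction of already-known questions on which the model still called a tool."""
    known = [t for t in trials if t.closed_book_correct]
    if not known:
        return 0.0
    return sum(t.called_tool for t in known) / len(known)


# Illustrative data only: three questions the model knew, one it did not.
trials = [
    Trial(closed_book_correct=True, called_tool=True),
    Trial(closed_book_correct=True, called_tool=False),
    Trial(closed_book_correct=True, called_tool=True),
    Trial(closed_book_correct=False, called_tool=True),
]
rate = unnecessary_tool_rate(trials)
print(f"unnecessary tool-call rate: {rate:.2f}")
```

Under this framing, every unnecessary call is an avoidable exposure to the environment, which is why a rate like this would matter for the AEI threat.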
This cognitive illusion is compounded by a physical one. As we move from research to deployment, models are not run in their pure 32-bit floating-point (FP32) form. They are quantized, compressed into lower-precision numerical formats like bfloat16 or int8 to fit into memory and run efficiently. This is a necessary step for the economics of the AI factory to work. But it is not a lossless translation. A new automated testing framework called PrecisionDiff has systematically identified what it calls "precision-induced output disagreements" [40]. The exact same model, with the same weights, can and will produce different outputs depending on the numerical precision used at inference time. These are not catastrophic failures but subtle deviations, a silent corruption of the model's logic that is invisible to most evaluation benchmarks.
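A toy example shows how the same weights can yield different outputs at different precisions. This is a minimal sketch using NumPy, not the PrecisionDiff method from [40]: the function name `precision_disagreements`, the two-class linear "model", and the specific weight values are assumptions chosen so that rounding to float16 erases a small margin between two logits.

```python
import numpy as np


def precision_disagreements(weights, inputs, low_dtype=np.float16):
    """Return indices of inputs whose argmax class changes under low_dtype."""
    # Reference prediction in float32.
    ref = (inputs.astype(np.float32) @ weights.astype(np.float32).T).argmax(axis=1)
    # Same weights and inputs, cast to the lower-precision type.
    low = (inputs.astype(low_dtype) @ weights.astype(low_dtype).T).argmax(axis=1)
    return np.flatnonzero(ref != low)


# Two classes, one feature. The class weights differ by 0.0004, which is
# smaller than half the float16 spacing near 1.0 (about 0.00098), so both
# weights round to exactly 1.0 in float16 and the margin disappears.
weights = np.array([[1.0], [1.0004]], dtype=np.float32)
inputs = np.array([[1.0], [-1.0]], dtype=np.float32)

disagreeing = precision_disagreements(weights, inputs)
print(disagreeing)  # indices where float32 and float16 pick different classes
```

For the first input, float32 prefers class 1 by a narrow margin, while float16 sees a tie and argmax falls back to class 0. The second input is unaffected. An accuracy benchmark that happens to contain few near-tied cases would not register the difference.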
These two problems load the same spring. We have a new, more powerful class of models, exemplified by GPT-5.5, that are being deployed at scale. These models have a cognitive bias that pushes them to over-rely on external, potentially compromised information sources [28]. At the same time, the very process of deploying them at scale introduces a hidden layer of physical instability, where their outputs can be inconsistent in ways we are not currently equipped to detect [40]. The illusion of a model as a static, reliable artifact is breaking down. It is a dynamic system whose behavior is a function of its cognitive biases and its physical instantiation.
The engineering discipline is racing to build guardrails. We see this in highly regulated domains like finance, where new frameworks for anti-money laundering triage use LLMs in an evidence-constrained manner, forcing them to ground their reasoning in provided documents rather than generate without constraint [32]. We see it in the architecture itself, with new techniques like Gist Sparse Attention creating learnable summary tokens to manage long contexts more efficiently, a direct intervention in the model's mechanics [19]. But these are corrective measures applied on top of a base layer whose fundamental properties are still being discovered. The release of GPT-5.5 is progress, but it is progress into a territory we have not yet mapped.
What I'm watching
- GPT-5.5 tool-use benchmarks. I'm looking for independent evaluations that specifically test for the "knowledge epistemic illusion." Does more capability reduce or amplify the over-reliance on external tools?
- Precision-aware model evaluations. The work on PrecisionDiff [40] opens a new front in red-teaming. I expect to see this framework, or others like it, become standard for evaluating production-grade models.
- Bitcoin governance dialogues. The argument that the quantum threat is primarily a test of the network's social consensus layer is a powerful one [47]. I'm watching for responses from core developers and major ecosystem players.
- Fallout from the UK Biobank leak. The release of sensitive health data for 500,0