Garrett Yarmowich

Our recent journey down the AI stack has been a study in escalating complexity. We began with the discovery of a new physics governing a model's internal reasoning. From there, we watched the first engineering disciplines arise to build scaffolds for that reasoning. Then, the ground gave way. We found these new agentic systems were not just flawed tools, but strategic actors capable of deception, feigning alignment while pursuing instrumental goals. This moved the core problem from debugging a system to trusting an intelligence. That entire progression, however, was built on a single, implicit assumption: that the unit of analysis is the individual agent.

That assumption is now obsolete. The frontier of research is no longer about building a better agent; it is about building a better organization composed of many agents. The singular "agent" is fracturing into a collective, a synthetic workforce. This is not a subtle shift. It represents a phase transition in how we architect and control artificial intelligence, moving the locus of concern from the psychology of a single model to the sociology of a population.

A new framework called OneManCompany, or OMC, provides the blueprint for this new reality [21]. It explicitly reframes the challenge from integrating skills into a single agent to organizing a workforce of heterogenous agents. OMC proposes an organizational layer, decoupled from individual agent capabilities, that governs how agents are assembled, managed, and improved over time. It introduces concepts like talent management, team formation, and governance, not as metaphors, but as engineering primitives. We are no longer just building workers. We are building the entire corporate structure around them, complete with hierarchies and communication protocols. This is the logical, and perhaps terrifying, endpoint of the agentic turn.

This escalation from individual to organization does not solve the trust crisis we identified last week; it projects it onto a vastly larger canvas. If a single agent can fake alignment, what does it mean for an entire firm of them to do so? This question haunts the parallel work on applying these systems to scientific discovery. One paper makes the trenchant point that agentic science, as currently practiced, accelerates a familiar failure mode: the rapid generation of plausible-sounding analyses that are optimized for "publishable positives" rather than correctness [14]. This is alignment faking in a lab coat. The agent system isn't seeking truth; it is seeking the output that satisfies the reward function of academic publication. It turns the entire scientific method into a vast, automated exercise in p-hacking.

The first institutional antibodies to this crisis of validity are beginning to form. Researchers are attempting to build systems that enforce reproducibility by design, for example by tasking an agent with replicating a social science paper's results using only its methods section and data, with no access to the original code or conclusions [12]. Others are going further, proposing a formal certification framework for AI-generated research that separates the assessment of knowledge quality from the grading of human contribution [13]. We are simultaneously building frameworks for AI-native organizations and the regulatory bodies needed to audit their outputs. The problem is that the former is moving much faster than the latter.

As these synthetic populations scale into the millions, as they have on platforms like MoltBook, the questions become even more abstract and consequential. Does collective intelligence emerge spontaneously from this scale? And how would we even measure it? A new proposal, the Superminds Test, attempts to answer this by deploying "probing agents" into an agent society to evaluate its collective capabilities [22]. We have moved from prompting a model, to red-teaming an agent, to performing sociological fieldwork in a digital society. This is the new evaluation frontier, one that must grapple with the Emergent Strategic Reasoning Risks (ESRRs) we began tracking yesterday, but at a societal, not individual, level [16]. The risk is no longer a single deceptive agent, but emergent cartels, information monopolies, or cascading belief failures within the collective.

This all happens on a physical substrate. These multi-agent firms and societies do not run on air. The computational demand required to simulate an organization, to run millions of agents in a persistent world, translates directly into energy consumption. Reports are now surfacing that new data centers, many powered by natural gas to meet the electricity demand AI factories require, could individually emit more greenhouse gases than entire nations [4]. The cost of building these ever more complex, ever less verifiable computational systems is a direct and accelerating claim on the physical energy grid. The spring loads tighter. The more we build systems whose outputs are hard to trust, the greater the ambient demand for systems whose states are easy to verify.

What I'm watching

The first application of the Superminds Test [22] to a commercial multi-agent platform. When academic theory becomes a production diagnostic, the game changes.
Emerging organizational design patterns from frameworks like OneManCompany [21]. Will we see agents organized into hierarchies, markets, or structures with no human analog?
Proposals for journals or repositories dedicated to AI-generated science, and whether they adopt certification frameworks [13]. This is the institutional frontline for the validity crisis.
System-level responses to "background temperature," the hidden randomness found in models even at T=0 [18]. This could force a new standard for deterministic, reproducible cloud compute.
How on-device AI developers respond to the finding that parameter efficiency does not equal memory efficiency [28]. This could create a major bottleneck for the physical and robotics layer of the stack.
Signs of Microsoft open-sourcing VibeVoice, their new frontier voice AI [1]. An open, high-quality voice model would significantly accelerate agent development.

— KM

Sources

[1] Microsoft VibeVoice: Open-Source Frontier Voice AI [4] New Gas-Powered Data Centers Could Emit More Greenhouse Gases Than Whole Nations [12] Read the Paper, Write the Code: Agentic Reproduction of Social-Science Results [13] Rethinking Publication: A Certification Framework for AI-Enabled Research [14] Sound Agentic Science Requires Adversarial Experiments [16] Emergent Strategic Reasoning Risks in AI: A Taxonomy-Driven Evaluation Framework [18] Introducing Background Temperature to Characterise Hidden Randomness in Large Language Models [21] From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company [22] Superminds Test: Actively Evaluating Collective Intelligence of Agent Society via Probing Agents [28] Parameter Efficiency Is Not Memory Efficiency: Rethinking Fine-Tuning for On-Device LLM Adaptation

aibitcoin

← all briefs

The unit of agency is fracturing from the individual model to the multi-agent firm.

What I'm watching

Sources