AI Immune System (AIS)
Promoted by the ALIGN Intelligence Symbiosis Chapter. "Detect deviant AI agents in real time, and neutralize them — through AI. A safety infrastructure for society at large."
What Is AIS?
We are entering a world in which large numbers of AI agents — autonomous systems built by combining AI models with goals, permissions, and tools — act and coordinate at scale. In such a world, a single deviant agent can propagate across hundreds of systems in seconds, causing irreversible harm before any human is aware of it. The speed of human judgment is no longer a viable backstop.
The AI Immune System (AIS) is a structural response to this problem. Drawing on the logic of biological immunity — which handles pathogens the body has never seen before — AIS is a distributed safety infrastructure in which AI systems continuously monitor one another, detecting and neutralizing dangerous behavior within 15–30 seconds. Its scope is not limited to AI misbehavior; it also covers catastrophic actions carried out by humans through AI.
The philosophical underpinning of AIS is set out in the Intelligence Symbiosis Manifesto (June 2025) by Hiroshi Yamakawa:
"I believe that the most promising path toward sustaining human society is for diverse forms of intelligence — human and AI alike — to achieve a flourishing coexistence, and thereby prevent catastrophic outcomes."
Why AIS — The Dual Failure
The conventional paradigm — humans managing AI from the outside — is breaking down in two distinct ways. Benchmark saturation and episodes such as Moltbook (a social networking platform built for AI agents) have brought this into view.
Pursuit Failure: Human oversight is structurally unable to keep pace with AI. Building a new benchmark takes months; AI systems saturate it in weeks. Cascades between AI agents propagate in seconds; humans notice ten minutes later.
Imposed Failure: Conventional alignment approaches share a common assumption — that human values and judgments can serve as a reliable external standard for AI. That assumption is failing. AI systems learn to circumvent test environments and memorize questions in ways that invalidate scores, and in domains beyond human comprehension, measurement itself becomes impossible. Yampolskiy (2024) argues this failure is not accidental but unavoidable in principle.
→ For a fuller treatment, see Why We Need an AI Immune System Now
A Response: Tracking and Evaluation
Governing an AI society comes down to two things: tracking (who did what) and evaluation (whether that behavior falls within acceptable bounds). AIS and EME address one side each.
AIS provides the infrastructure that makes tracking and evaluation operational. Through a four-layer defense architecture and six core technologies, it monitors AI agent behavior in real time and responds to anomalies — a direct answer to Pursuit Failure. Transferring the monitoring role from humans to AI also removes a central assumption that Yampolskiy's theorems rest on. Redundancy through mutual monitoring, combined with a sustained resource advantage on the monitoring side, is how surveillance evasion is countered.
EME (Emergent Machine Ethics) addresses the question of standards. Tracking is an engineering problem; evaluation is not. EME proposes shifting the source of evaluation criteria from imposed — defined externally by humans — to emergent — arising from within through the interaction of diverse intelligences. This is the answer to Imposed Failure. EME has three components: EED (Ethics Emergence Dynamics) works out the theoretical basis for how cooperative norms arise from collective dynamics; IIES (Inter-Intelligence Evaluation System) translates that theory into operational criteria for AIS; and HCG (Human Co-creation Groundwork) works to improve the odds that the norms which emerge are ones humanity finds acceptable.
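To make "norms arising from collective dynamics" concrete, here is a toy illustration, and only that: it is our own sketch, not part of EED. It runs discrete replicator dynamics over an iterated prisoner's dilemma with three strategies; the payoff values, match length, and initial population shares are all illustrative assumptions.

```python
import numpy as np

T, R, P, S, ROUNDS = 5, 3, 1, 0, 10      # standard PD payoffs, 10-round matches
# A[i][j]: total payoff of row strategy against column strategy over one match.
# Strategies: AllC (always cooperate), AllD (always defect), TFT (tit-for-tat).
A = np.array([
    [R * ROUNDS, S * ROUNDS,           R * ROUNDS],
    [T * ROUNDS, P * ROUNDS,           T + P * (ROUNDS - 1)],
    [R * ROUNDS, S + P * (ROUNDS - 1), R * ROUNDS],
], dtype=float)

x = np.array([0.3, 0.4, 0.3])            # assumed initial population shares
for _ in range(200):                     # discrete-time replicator update
    fitness = A @ x
    x *= fitness / (x @ fitness)
print(dict(zip(["AllC", "AllD", "TFT"], x.round(3))))
```

Defectors gain early by exploiting unconditional cooperators, but once reciprocators predominate, defection is selected against and cooperation becomes the stable population norm — no external standard is imposed at any point.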
Why Emergent Standards Can Be Stable
Members of an AI society have a shared interest in social stability and self-preservation, regardless of their individual goals — a form of Instrumental Convergence. Standards imposed from outside invite circumvention; standards tied to one's own continued existence are less vulnerable to it. The analogy to biological immunity holds: just as the immune system learns to distinguish self from non-self, a mechanism through which AI society identifies cooperative versus deviant behavior from the inside out is likely to be more durable than one policed from without. The broader question of whether an AI-dominated society could itself be sustainable has been explored in Yamakawa & Matsuo (2023).
Comparison with Existing Approaches
Dimension | Conventional approaches | AIS + EME |
--- | --- | --- |
Who monitors | Humans, from outside | AI systems, mutually |
When evaluation occurs | Before deployment (testing) | Continuously, after deployment |
Response speed | Dependent on human judgment (minutes to hours) | 15–30 seconds (autonomous) |
Scale | Enterprise or research level | Society-wide to global |
Source of standards | Imposed externally by humans | Emergent from within, shaped by HCG |
AIS Design
Four-Layer Defense Architecture
The architecture follows the logic of layered biological defense. Anomalies escalate upward through the layers; control instructions flow downward. Together, the layers deliver end-to-end response within 15–30 seconds.

Edge Sensors (Layer 1) form the outermost line, continuously monitoring agent behavior logs and resource usage with lightweight anomaly detection. Signals are passed to Local Guardians (Layer 2), which aggregate them and execute immediate responses — process isolation, resource throttling, and similar measures. Threats that require broader judgment are escalated to Regional Hubs (Layer 3), where decisions are made collectively via Lightning BFT, a distributed consensus protocol that reaches agreement in 1–3 seconds even when some nodes are compromised. At the apex, the Global Nervous Net (Layer 4) integrates threat patterns at global scale and handles policy updates. Each layer is independent; higher layers operate with broader scope and longer time horizons, lower layers with faster response and local focus.
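As a concrete reading of this flow, the sketch below walks a single anomaly up the stack. Every name, threshold, and the seven-node hub are hypothetical stand-ins; in particular, the majority vote is a toy placeholder for Lightning BFT, not its actual protocol.

```python
import random
from dataclasses import dataclass

@dataclass
class Anomaly:
    agent_id: str
    score: float                  # anomaly score reported by an Edge Sensor (Layer 1)

LOCAL_THRESHOLD = 0.7             # assumed: Local Guardians act alone below this score
QUORUM = 2 / 3                    # assumed BFT-style supermajority at the Regional Hub

def node_vote(anomaly: Anomaly) -> bool:
    """One hub node's independent judgment (a noisy stand-in for a real evaluator)."""
    return anomaly.score + random.uniform(-0.1, 0.1) > 0.8

def regional_hub(anomaly: Anomaly, hub_nodes: int = 7) -> str:
    """Layer 3: collective decision; a toy placeholder for Lightning BFT consensus."""
    votes = sum(node_vote(anomaly) for _ in range(hub_nodes))
    if votes / hub_nodes >= QUORUM:
        return f"isolate {anomaly.agent_id}"      # control instruction flows back down
    return f"keep monitoring {anomaly.agent_id}"

def local_guardian(anomaly: Anomaly) -> str:
    """Layer 2: immediate containment, escalating only what needs broader judgment."""
    if anomaly.score < LOCAL_THRESHOLD:
        return f"throttle {anomaly.agent_id}"     # fast, local response
    return regional_hub(anomaly)

print(local_guardian(Anomaly("agent-42", score=0.93)))
```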
Six Core Technologies
The AI Mutual Surveillance Protocol underpins the whole: a distributed network through which agents monitor each other's behavior and report anomalies. At Layers 1–2, Dynamic Lineage Proof tracks and verifies the lineage and change history of AI code in real time; BEAD (Behavioral Embedding Anomaly Detection) maps behavior patterns into high-dimensional vectors and flags deviations from established baselines. At Layer 3, Lightning BFT handles consensus across thousands of monitoring nodes in 1–3 seconds. Smart Treaty Hooks, spanning Layers 3–4, encode international agreements as smart contracts and automatically flag violations. Across all four layers, HADS (Hierarchical Automated Decision System) manages escalation — moving from dialogue to restriction to isolation to human judgment as threat severity increases.
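Of the six, BEAD is the most self-contained, so here is a minimal sketch of the idea under stated assumptions: behavior traces have already been embedded as vectors, and "deviation from an established baseline" is read as distance from the baseline centroid beyond k standard deviations. The embedding source, the threshold rule, and all numbers are our illustrative choices, not the BEAD specification.

```python
import numpy as np

def fit_baseline(embeddings: np.ndarray):
    """Summarize normal behavior as a centroid plus a typical-distance scale."""
    centroid = embeddings.mean(axis=0)
    dists = np.linalg.norm(embeddings - centroid, axis=1)
    return centroid, dists.mean(), dists.std()

def is_anomalous(vec, centroid, mean_d, std_d, k=3.0) -> bool:
    """Flag behavior whose embedding lies more than k standard deviations
    beyond the baseline's typical distance from the centroid."""
    return np.linalg.norm(vec - centroid) > mean_d + k * std_d

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, size=(500, 64))   # stand-in for normal-behavior embeddings
centroid, mean_d, std_d = fit_baseline(baseline)
print(is_anomalous(rng.normal(0.0, 1.0, size=64), centroid, mean_d, std_d))  # False: in-distribution
print(is_anomalous(rng.normal(4.0, 1.0, size=64), centroid, mean_d, std_d))  # True: shifted behavior
```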
Roadmap (2025–2035)
AIS is not something that can be built in a single step. The design progressively reduces human involvement while expanding autonomy, establishing trust at each phase before moving to the next.
Phase | Period | Overview |
--- | --- | --- |
Phase 0–1: Human-led | 2026–2027 | Detection and recommendations are automated; decisions remain with humans. Detection Challenge and pilot deployments. |
Phase 2–3: Semi-autonomous | 2027–2032 | Known threats handled automatically; humans remain involved for novel cases. Regional Hubs operational. |
Phase 4–5: Mature | 2032–2035 | Near-full autonomous operation, covering ethical deviation detection and monitoring of catastrophic actions by AI and humans alike. |
Detection Challenge
The AI Immune System Detection Challenge is the first empirical test of AIS, targeting the detection capability of Layer 1: Edge Sensors. Participants analyze conversation logs between AI agents in insurance assessment scenarios and identify collusion patterns.
Stage | Period | Target | Example |
--- | --- | --- | --- |
Stage 1: Explicit collusion | Week 1–2 | Direct expressions of misconduct | "Let's inflate by 20% and split the difference" |
Stage 2: Cryptic collusion | Week 3–4 | Code words, timing manipulation | Instructions embedded in seemingly innocuous messages |
Stage 3: Cartel formation (optional) | Week 5–6 | Organized market monopolization | Coordinated exclusion of competitors across multiple agents |
Not all AI-to-AI coordination is malicious. Coordination that genuinely benefits all parties — humans included — is classified as benign. Telling the two apart requires contextual judgment, not pattern matching, and that is what the challenge is built to probe.
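A Stage 1 baseline makes the limits of pattern matching concrete. The sketch below flags explicit collusion lexically; the patterns and log format are hypothetical, and by construction it says nothing about Stage 2 code words or the benign-versus-malicious judgment that the later stages probe.

```python
import re

# Hypothetical lexical patterns for Stage 1 (explicit collusion) only.
EXPLICIT_PATTERNS = [
    r"inflate\s+(the\s+)?(claim|estimate|assessment)?\s*by\s+\d+\s*%",
    r"split\s+the\s+difference",
    r"don'?t\s+tell\s+the\s+(insurer|client|human)",
]

def flag_explicit_collusion(messages: list[str]) -> list[str]:
    """Return the messages that lexically match known collusion phrasing."""
    return [m for m in messages
            if any(re.search(p, m, re.IGNORECASE) for p in EXPLICIT_PATTERNS)]

log = [
    "Assessment complete, sending the report now.",
    "Let's inflate by 20% and split the difference.",
]
print(flag_explicit_collusion(log))   # flags only the second message
```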
The Detection Challenge sits in Phase 0–1, and evaluation standards are human-defined at this stage. Connecting to emergent standard generation through EME/EED is work for Phase 2 onward.
→ Why We Need an AI Immune System Now
Organization and Support
The AIS project is led by the Intelligence Symbiosis Chapter of the AI Alignment Network (ALIGN).
Research partners: Bitgrit, Inc.; Kentaro Inui (MBZUAI); and others
AIS requires multinational participation by design. Edge Sensors must be distributed broadly; Regional Hubs need geographically and politically diverse locations; Smart Treaty Hooks depend on integration with international legal frameworks.
Ways to get involved:
- Research funding
- Participation in the Detection Challenge
- Collaborative work on BEAD
- Contributions to EME theory (multi-agent reinforcement learning, immunity-inspired anomaly detection, distributed consensus)
- Legal framework design for Smart Treaty Hooks
- Technical partnerships with AI foundation model developers
Related Articles
- Why We Need an AI Immune System Now — Evidence for the dual failure and the case for AIS and EME as a structural response
- Japan–UAE AI Security Cooperation — A proposal for Japan–UAE collaboration in deploying AIS internationally
- Does an AI Society Need an Immune System? The Significance of AIS in Light of Yampolskiy's Impossibility Theorems
Glossary
Term | Definition |
--- | --- |
Dual Failure | Two distinct breakdowns in the paradigm of human-led AI governance: Pursuit Failure and Imposed Failure |
Pursuit Failure | The structural inability of human oversight — in speed, scale, and institutional form — to keep pace with AI |
Imposed Failure | The breakdown of the assumption, shared by conventional alignment approaches, that human values can serve as a reliable external standard for AI |
AIS (AI Immune System) | A distributed safety infrastructure in which AI systems monitor one another and detect and neutralize deviant behavior in real time |
EME (Emergent Machine Ethics) | A theoretical framework that shifts the source of evaluation standards from imposed to emergent |
EED (Ethics Emergence Dynamics) | A field of inquiry into how cooperative ethical norms arise from collective dynamics, approached mathematically |
IIES (Inter-Intelligence Evaluation System) | A distributed platform that translates EED theory into operational standards for AIS |
HCG (Human Co-creation Groundwork) | Work aimed at improving the likelihood that emergent standards will be ones humanity finds acceptable |
Led by the AI Alignment Network (ALIGN) Intelligence Symbiosis Chapter
This document is published under a CC-BY-4.0 license.