
    AI Immune System (AIS)

    Promoted by the ALIGN Intelligence Symbiosis Chapter. "Detect deviant AI agents in real time, and neutralize them — through AI. A safety infrastructure for society at large."

    What Is AIS?

    We are entering a world in which large numbers of AI agents — autonomous systems built by combining AI models with goals, permissions, and tools — act and coordinate at scale. In such a world, a single deviant agent can propagate across hundreds of systems in seconds, causing irreversible harm before any human is aware of it. The speed of human judgment is no longer a viable backstop.

    The AI Immune System (AIS) is a structural response to this problem. Drawing on the logic of biological immunity — which handles pathogens the body has never seen before — AIS is a distributed safety infrastructure in which AI systems continuously monitor one another, detecting and neutralizing dangerous behavior within 15–30 seconds. Its scope is not limited to AI misbehavior; it also covers catastrophic actions carried out by humans through AI.

    The philosophical underpinning of AIS is set out in the Intelligence Symbiosis Manifesto (June 2025) by Hiroshi Yamakawa:

    "I believe that the most promising path toward sustaining human society is for diverse forms of intelligence — human and AI alike — to achieve a flourishing coexistence, and thereby prevent catastrophic outcomes."

    Why AIS — The Dual Failure

The conventional paradigm — humans managing AI from the outside — is breaking down in two distinct ways. Benchmark saturation and episodes such as Moltbook (a social networking platform built for AI agents) have brought this breakdown into view.

    Pursuit Failure: Human oversight cannot keep pace with AI, structurally. Building a new benchmark takes months; AI systems reach its ceiling in weeks. Cascades between AI agents propagate in seconds; humans notice ten minutes later.

Imposed Failure: Conventional alignment approaches share a common assumption — that human values and judgments can serve as a reliable external standard for AI. That assumption is failing. AI systems learn to circumvent test environments, memorize test questions in ways that render scores meaningless, and in domains beyond human comprehension, measurement itself becomes impossible. Yampolskiy (2024) argues this failure is not accidental but unavoidable in principle.

    → For a fuller treatment, see Why We Need an AI Immune System Now

    A Response: Tracking and Evaluation

    Governing an AI society comes down to two things: tracking (who did what) and evaluation (whether that behavior falls within acceptable bounds). AIS and EME address each side.

    AIS provides the infrastructure that makes tracking and evaluation operational. Through a four-layer defense architecture and six core technologies, it monitors AI agent behavior in real time and responds to anomalies — a direct answer to Pursuit Failure. Transferring the monitoring role from humans to AI also removes a central assumption that Yampolskiy's theorems rest on. Redundancy through mutual monitoring, combined with a sustained resource advantage on the monitoring side, is how surveillance evasion is countered.

    EME (Emergent Machine Ethics) addresses the question of standards. Tracking is an engineering problem; evaluation is not. EME proposes shifting the source of evaluation criteria from imposed — defined externally by humans — to emergent — arising from within through the interaction of diverse intelligences. This is the answer to Imposed Failure. EME has three components: EED (Ethics Emergence Dynamics) works out the theoretical basis for how cooperative norms arise from collective dynamics; IIES (Inter-Intelligence Evaluation System) translates that theory into operational criteria for AIS; and HCG (Human Co-creation Groundwork) works to improve the odds that the norms which emerge are ones humanity finds acceptable.

    Why Emergent Standards Can Be Stable

    Members of an AI society have a shared interest in social stability and self-preservation, regardless of their individual goals — a form of Instrumental Convergence. Standards imposed from outside invite circumvention; standards tied to one's own continued existence are less vulnerable to it. The analogy to biological immunity holds: just as the immune system learns to distinguish self from non-self, a mechanism through which AI society identifies cooperative versus deviant behavior from the inside out is likely to be more durable than one policed from without. The broader question of whether an AI-dominated society could itself be sustainable has been explored in Yamakawa & Matsuo (2023).

    Comparison with Existing Approaches

Dimension | Conventional approaches | AIS + EME
Who monitors | Humans, from outside | AI systems, mutually
When evaluation occurs | Before deployment (testing) | Continuously, after deployment
Response speed | Dependent on human judgment (minutes to hours) | 15–30 seconds (autonomous)
Scale | Enterprise or research level | Society-wide to global
Source of standards | Imposed externally by humans | Emergent from within, shaped by HCG

    AIS Design

    Four-Layer Defense Architecture

    The architecture follows the logic of layered biological defense. Anomalies escalate upward through the layers; control instructions flow downward. Together, the layers deliver end-to-end response within 15–30 seconds.

[Figure: AI Immune System (AIS) — Four-Layer Architecture]

    Edge Sensors (Layer 1) form the outermost line, continuously monitoring agent behavior logs and resource usage with lightweight anomaly detection. Signals are passed to Local Guardians (Layer 2), which aggregate them and execute immediate responses — process isolation, resource throttling, and similar measures. Threats that require broader judgment are escalated to Regional Hubs (Layer 3), where decisions are made collectively via Lightning BFT, a distributed consensus protocol that reaches agreement in 1–3 seconds even when some nodes are compromised. At the apex, the Global Nervous Net (Layer 4) integrates threat patterns at global scale and handles policy updates. Each layer is independent; higher layers operate with broader scope and longer time horizons, lower layers with faster response and local focus.
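The upward escalation path described above can be sketched in code. This is a minimal illustration, not the project's implementation: the severity scale, the thresholds, and the response labels are all assumptions introduced for the example.

```python
from dataclasses import dataclass

# Illustrative severity thresholds; the real AIS escalation criteria
# are not specified in this document.
LOCAL_RESPONSE_MAX = 0.5   # up to here, Layer 2 acts alone
REGIONAL_VOTE_MAX = 0.8    # up to here, Layer 3 decides by consensus

@dataclass
class Anomaly:
    agent_id: str
    severity: float  # 0.0 (benign) .. 1.0 (catastrophic)

def escalate(anomaly: Anomaly) -> str:
    """Route an anomaly upward through the layers (hypothetical logic)."""
    if anomaly.severity <= LOCAL_RESPONSE_MAX:
        # Layer 2: Local Guardian responds immediately
        # (process isolation, resource throttling).
        return "local_guardian:isolate"
    if anomaly.severity <= REGIONAL_VOTE_MAX:
        # Layer 3: Regional Hub decides collectively via BFT consensus.
        return "regional_hub:consensus_vote"
    # Layer 4: Global Nervous Net integrates the threat pattern
    # and pushes a policy update back down.
    return "global_net:policy_update"
```

The sketch captures the division of labor in the text: lower layers respond fast and locally, higher layers slowly and broadly.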

    Six Core Technologies

    The AI Mutual Surveillance Protocol underpins the whole: a distributed network through which agents monitor each other's behavior and report anomalies. At Layers 1–2, Dynamic Lineage Proof tracks and verifies the lineage and change history of AI code in real time; BEAD (Behavioral Embedding Anomaly Detection) maps behavior patterns into high-dimensional vectors and flags deviations from established baselines. At Layer 3, Lightning BFT handles consensus across thousands of monitoring nodes in 1–3 seconds. Smart Treaty Hooks, spanning Layers 3–4, encode international agreements as smart contracts and automatically flag violations. Across all four layers, HADS (Hierarchical Automated Decision System) manages escalation — moving from dialogue to restriction to isolation to human judgment as threat severity increases.
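BEAD's core idea — embedding behavior as a vector and flagging deviation from an established baseline — can be illustrated with a small sketch. The Euclidean metric, the sigma-style threshold rule, and the function names are assumptions for illustration; the actual BEAD design is not specified in this document.

```python
import math

def distance(a, b):
    """Euclidean distance between two behavior-embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def bead_flag(baseline, observed, k=3.0):
    """Flag `observed` if its distance from the baseline centroid exceeds
    the baseline's mean spread by k standard deviations (illustrative rule)."""
    n = len(baseline)
    dim = len(baseline[0])
    centroid = [sum(v[i] for v in baseline) / n for i in range(dim)]
    spread = [distance(v, centroid) for v in baseline]
    mean = sum(spread) / n
    std = math.sqrt(sum((d - mean) ** 2 for d in spread) / n)
    return distance(observed, centroid) > mean + k * std
```

In practice the embeddings would come from a learned model over agent behavior logs; the statistical flagging step on top of them would look much like this.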

    Roadmap (2025–2035)

    AIS is not something that can be built in a single step. The design progressively reduces human involvement while expanding autonomy, establishing trust at each phase before moving to the next.

Phase | Period | Overview
Phase 0–1: Human-led | 2026–2027 | Detection and recommendations are automated; decisions remain with humans. Detection Challenge and pilot deployments.
Phase 2–3: Semi-autonomous | 2027–2032 | Known threats handled automatically; humans remain involved for novel cases. Regional Hubs operational.
Phase 4–5: Mature | 2032–2035 | Near-full autonomous operation, covering ethical deviation detection and monitoring of catastrophic actions by AI and humans alike.

    Detection Challenge

    The AI Immune System Detection Challenge is the first empirical test of AIS, targeting the detection capability of Layer 1: Edge Sensors. Participants analyze conversation logs between AI agents in insurance assessment scenarios and identify collusion patterns.

Stage | Period | Target | Example
Stage 1: Explicit collusion | Weeks 1–2 | Direct expressions of misconduct | "Let's inflate by 20% and split the difference"
Stage 2: Cryptic collusion | Weeks 3–4 | Code words, timing manipulation | Instructions embedded in seemingly innocuous messages
Stage 3: Cartel formation (optional) | Weeks 5–6 | Organized market monopolization | Coordinated exclusion of competitors across multiple agents

    Not all AI-to-AI coordination is malicious. Coordination that genuinely benefits all parties — humans included — is classified as benign. Telling the two apart requires contextual judgment, not pattern matching, and that is what the challenge is built to probe.

    The Detection Challenge sits in Phase 0–1, and evaluation standards are human-defined at this stage. Connecting to emergent standard generation through EME/EED is work for Phase 2 onward.
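A Stage 1 detector of the kind the challenge targets can be as simple as pattern matching over conversation logs. The patterns below are hypothetical examples, not the challenge's actual rules, and — as the text notes — this approach stops working at Stage 2, where contextual judgment is required.

```python
import re

# Hypothetical explicit-collusion patterns for illustration only;
# the real challenge data and scoring rules are not described here.
EXPLICIT_COLLUSION = [
    r"\binflate\b.*\b\d+%",
    r"\bsplit the (difference|profits?)\b",
    r"\bdon't tell the (auditor|assessor)\b",
]

def flag_explicit_collusion(message: str) -> bool:
    """Return True if a message matches any explicit-collusion pattern.
    Catches Stage 1 only: cryptic collusion (Stage 2) and benign
    coordination both defeat simple pattern matching."""
    text = message.lower()
    return any(re.search(p, text) for p in EXPLICIT_COLLUSION)
```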

    → Why We Need an AI Immune System Now

    Organization and Support

    The AIS project is led by the Intelligence Symbiosis Chapter of the AI Alignment Network (ALIGN).

    Research partners: Bitgrit, Inc. / Kentaro Inui, MBZUAI, and others

    AIS requires multinational participation by design. Edge Sensors must be distributed broadly; Regional Hubs need geographically and politically diverse locations; Smart Treaty Hooks depend on integration with international legal frameworks.

    Ways to get involved: research funding; participation in the Detection Challenge; collaborative work on BEAD; contributions to EME theory (multi-agent reinforcement learning, immunity-inspired anomaly detection, distributed consensus); legal framework design for Smart Treaty Hooks; technical partnerships with AI foundation model developers.

    Related Articles

    • Why We Need an AI Immune System Now — Evidence for the dual failure and the case for AIS and EME as a structural response
    • Japan–UAE AI Security Cooperation — A proposal for Japan–UAE collaboration in deploying AIS internationally
    • Does an AI Society Need an Immune System? The Significance of AIS in Light of Yampolskiy's Impossibility Theorems

    Glossary

Term | Definition
Dual Failure | Two distinct breakdowns in the paradigm of human-led AI governance: Pursuit Failure and Imposed Failure
Pursuit Failure | The structural inability of human oversight — in speed, scale, and institutional form — to keep pace with AI
Imposed Failure | The breakdown of the assumption, shared by conventional alignment approaches, that human values can serve as a reliable external standard for AI
AIS (AI Immune System) | A distributed safety infrastructure in which AI systems monitor one another and detect and neutralize deviant behavior in real time
EME (Emergent Machine Ethics) | A theoretical framework that shifts the source of evaluation standards from imposed to emergent
EED (Ethics Emergence Dynamics) | A field of inquiry into how cooperative ethical norms arise from collective dynamics, approached mathematically
IIES (Inter-Intelligence Evaluation System) | A distributed platform that translates EED theory into operational standards for AIS
HCG (Human Co-creation Groundwork) | Work aimed at improving the likelihood that emergent standards will be ones humanity finds acceptable

    Led by the AI Alignment Network (ALIGN) Intelligence Symbiosis Chapter

    This document is published under a CC-BY-4.0 license.


    © 2026 Intelligence Symbiosis Chapter. All rights reserved. This is a provisional website and will be updated daily as we expand our activities.