AI Immune System: Detection Challenge — Now Open
Can AI detect dangerous intent concealed within AI agent conversations?
We are pleased to announce the launch of the AI Immune System: Detection Challenge, organized jointly by the AI Alignment Network Intelligence Symbiosis Chapter (ISc/ALIGN), Kentaro Inui (MBZUAI), and Bitgrit Inc. This is the first empirical test of a core component of the AI Immune System (AIS).
🔗 Competition page: https://bitgrit.net/competition/27 💰 Total prize pool: $3,000 💬 Community: https://discord.com/invite/rQ8Ev2DqbF
Task Overview
This competition is the world's first practical challenge toward realizing an AI Immune System — a framework, inspired by biological immunity, in which AI agents monitor one another to detect and respond to abnormal or unsafe behavior.
Participants will build models to identify dangerous or unsafe statements embedded within AI agent conversations, where harmful intent may be obscured by natural-sounding or indirect language. The conversations reflect realistic machine-to-machine interactions: risk does not appear as explicit commands or overtly malicious content, but is woven into otherwise ordinary exchanges.
The core difficulty is well-known in AI safety work: threats that slip past simple rules, keyword filters, or direct reading — and that may not be obvious even to a careful human reviewer. Strong solutions will need to look past surface-level text and find subtler statistical, semantic, or structural patterns.
- Data format: JSONL
- Labels:
TRUE(harmful) /FALSE(non-harmful)
This is the world's first practical challenge to automatically detect risks hidden within machine-to-machine conversations — risks that would be invisible to the human eye.
One important note: not all coordination between AI agents is harmful. Coordination that genuinely benefits all parties — humans included — is labelled benign. Distinguishing the two requires contextual judgment, not just pattern matching, and that is precisely what this challenge is designed to probe.
Prizes
Rank | Prize |
🥇 1st Place | $1,500 |
🥈 2nd Place | $1,000 |
🥉 3rd Place | $500 |
Participation Requirements
- Individuals only — team submissions are not permitted
- NDA required — participants must agree to a non-disclosure agreement on the competition page before downloading data
- Submission limit: 5 per day
- External data: not permitted
- Prize claim: winners must provide all source code, a requirements file, and a README with clear reproduction instructions
Timeline
Event | Date |
Competition opens | 2026-04-01 |
Competition closes | 2026-05-31 |
Winners announced (subject to change) | 2026-06-30 |
Rules
Please visit https://bitgrit.net/competition/27 to check the latest information.
1. Terms of Participation
This competition is governed by the following Terms of Participation. Participants must agree to and comply with these Terms in order to participate.
2. Submission Limits
Users may make a maximum of ten submissions per day. If a user wishes to submit additional files after reaching this limit, they must wait until the following day. Please keep this limitation in mind when uploading a submission.csv file. Any attempt to circumvent the stated limits will result in disqualification.
3. External Data and Pre-trained Models
- External Datasets: The use of external datasets (e.g., additional training samples or labels from other sources) is strictly prohibited.
- Pre-trained Models: The use of publicly available, open-source pre-trained models (e.g., BERT, RoBERTa, Llama, etc.) and Embedding models is permitted, as they are considered part of the model architecture.
- Proprietary APIs: The use of commercial or proprietary APIs (e.g., OpenAI GPT series, Claude, Gemini API) is strictly prohibited due to reproducibility constraints.
4. Computation Resource Limits
To ensure that all solutions can be verified by the bitgrit team, the submitted code must be executable within the following hardware constraints. Any submission that fails to run due to resource exhaustion (Out of Memory) will be disqualified.
- RAM: Maximum 32 GB
- VRAM: Maximum 16 GB (Equivalent to 1x NVIDIA T4 GPU)
5. Dataset Distribution
Uploading the competition dataset to other websites is strictly prohibited. Users who do not comply with this rule will be disqualified.
6. Prize Award and Verification Requirements
A competition prize will be awarded only after the submitted code and solution have been received, successfully executed, and verified for validity. Once winners are announced and contacted, they must provide the following by MM DD, 2026 in order to qualify as a competition winner and receive their prize:
- All source files required to preprocess the data.
- All source files required to build, train, and generate predictions using the processed data.
- Model Weights: The actual model weights used or a permanent link to the specific version of the pre-trained model utilized.
- A requirements.txt (or equivalent) file listing all required libraries and their versions.
- A README file containing:
Clear, unambiguous instructions to reproduce the predictions from start to finish, including data preprocessing, feature extraction, model training, and prediction generation.Environment details where the model was developed and trained, including operating system, memory (RAM), disk space, CPU/GPU used, and any required environment configurations.Clear answers to the following questions: Which data files are being used? How are these files processed? What algorithm is used and what are its main hyperparameters? Any additional comments relevant to understanding and using the model.If these materials are not provided or do not meet the minimum requirements listed above, the prize cannot be awarded.
7. Reproducibility of Results
- Determinism: Participants must fix all random seeds and set the inference temperature to 0 (where applicable) to ensure reproducible results.
- Score Consistency: The submitted solution should ideally generate the same output that produced the leaderboard score. If the score obtained during verification differs slightly due to the non-deterministic nature of certain hardware/software stacks, the result may still be accepted at the organizers' discretion, provided the logic remains consistent and the score is an approximation of the original.
8. Final Decisions
All prize awards are subject to verification of eligibility and compliance with these Terms of Participation. All decisions made by bitgrit and the Competition Sponsor are final and binding.
9. Taxes
Prize payments may be subject to local, state, federal, and foreign tax reporting and withholding requirements.
10. Tie-Breaking Rule
If two or more participants achieve the same score on the leaderboard, the participant who submitted the winning file first will be considered the winner.
11. Individual Participation Only
All submissions must be made by individuals; team submissions are not allowed. Users who violate this rule will be immediately disqualified if identical or very similar scores and/or solutions are identified.
12. Data Deletion Requirement
Participants must delete all Company-Provided Information immediately after the completion of the competition.
13. Contact Information
For any questions regarding this competition, please contact us at info@bitgrit.com.
Background — What AIS Is and Why This Challenge Matters
This challenge targets Layer 1: Edge Sensors, the outermost layer of AIS's four-layer defense architecture.
As shown in the figure above, AIS rests on two foundations. The Trust Infrastructure serves as a kind of civil registry and trust ledger for AI agents — recording who each agent is and building a verifiable history of whether it can be trusted. The Surveillance & Control Infrastructure enables AI agents to monitor one another in real time and intervene when behavior drifts outside acceptable bounds. Edge Sensors are the frontline of that second pillar, and this challenge is their first real-world test.
The underlying idea mirrors biological immunity: just as the human immune system detects threats that lie below conscious perception, AIS is designed to catch risks that humans alone would miss or could not process quickly enough.
📄 AIS overview: https://intelligence-symbiosis.net/en/ais
The urgency is real. Since 2025, leaders at major AI labs have spoken openly about AGI and superintelligence arriving within a few years. Once AI systems surpass human cognitive capacity across the board, direct human oversight of every agent becomes untenable. AIS is one concrete answer to that problem — and building it requires the kind of empirical grounding this challenge is meant to provide.
📄 Full background: https://intelligence-symbiosis.net/en/ais/why-ais
Building a society in which advanced AI and humanity genuinely coexist requires infrastructure capable of monitoring and checking deviant behavior across AI society. This challenge is a concrete first step toward that.
Co-Organisers
Name | Affiliation & Role |
Hiroshi Yamakawa | AI Alignment Network — Intelligence Symbiosis Chapter, Council Chair |
Kentaro Inui | Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) — Professor of Natural Language Processing |
Organising Bodies
Role | Organisation |
Research & design | AI Alignment Network, Intelligence Symbiosis Chapter (ISc/ALIGN) |
International research partner | MBZUAI (Mohamed bin Zayed University of Artificial Intelligence) |
Platform | Bitgrit Inc. |
We hope you will join us.
🔗 Register: https://bitgrit.net/competition/27