Skip to content

12: Fun with Adversarial Agents

30 minutes | You need: modules 1-6 completed, a file worth reviewing

If you’ve ever watched HBO’s Silicon Valley, you know Dinesh and Gilfoyle. Gilfoyle finds everything wrong with Dinesh’s code. Dinesh defends it — sometimes poorly, sometimes brilliantly. The audience laughs, but the code actually gets better.

That dynamic is the foundation of a real multi-agent pattern: adversarial review. Two agents with opposing objectives debate until they converge. The friction between them surfaces issues that a single reviewer would miss — and filters out false positives that a single critic would overreport.

In this module you’ll install a skill that implements this pattern, run it on your own code, and then dissect why it works.

From your project root:

Terminal window
mkdir -p .claude/skills/dg
curl -sL https://raw.githubusercontent.com/v1r3n/dinesh-gilfoyle/main/dg/SKILL.md -o .claude/skills/dg/SKILL.md
curl -sL https://raw.githubusercontent.com/v1r3n/dinesh-gilfoyle/main/dg/gilfoyle-agent.md -o .claude/skills/dg/gilfoyle-agent.md
curl -sL https://raw.githubusercontent.com/v1r3n/dinesh-gilfoyle/main/dg/dinesh-agent.md -o .claude/skills/dg/dinesh-agent.md

Start a new Claude Code session (the skill loader runs at startup), then verify:

/dg src/some-file.ts

You should see Gilfoyle attack the file, Dinesh defend it, and a structured verdict at the end.

Pick a file you wrote recently — something with real logic, not a config file:

/dg path/to/your/file

Watch the debate unfold. Pay attention to:

  • Which of Gilfoyle’s findings does Dinesh concede? Those are your real issues.
  • Which does Dinesh successfully defend? Those validate your code.
  • Which does Dinesh dismiss as nitpicks? The synthesis will tell you if that dismissal held.

Open the three files you just downloaded:

@.claude/skills/dg/SKILL.md
@.claude/skills/dg/gilfoyle-agent.md
@.claude/skills/dg/dinesh-agent.md

Notice the structure:

  • SKILL.md is the orchestrator — it gathers code, dispatches agents, detects convergence, and synthesizes the final review
  • gilfoyle-agent.md defines the attacker’s persona, technical domains, and output format
  • dinesh-agent.md defines the defender’s persona, response strategies, and honesty constraints

Ask Claude to explain the orchestration:

How does the /dg skill orchestrate the debate between agents? Walk me through the flow.

The /dg skill implements a pattern borrowed from machine learning: Generative Adversarial Networks (GANs). In a GAN, two neural networks compete — a generator creates content, a discriminator tries to reject it. Neither is useful alone. Together, they push each other toward quality.

The /dg skill maps this to code review:

GAN Concept/dg Implementation
GeneratorDinesh (the code author/defender)
DiscriminatorGilfoyle (the critic/attacker)
Training loopMulti-round debate with convergence detection
Loss functionStructured findings with severity ratings
ConvergenceGilfoyle repeats findings or Dinesh concedes all points

A single agent asked to “review this code thoroughly” faces a fundamental tension: it has to be both critic and advocate in the same context window. It hedges. It qualifies. It finds three issues and says “overall the code is solid.”

Splitting the roles eliminates that tension:

Gilfoyle (the attacker) has one job: find everything wrong. His prompt tells him to be devastating, to scale contempt to the severity of the issue, and to never let Dinesh feel like he won. This asymmetric pressure means he doesn’t stop at the obvious bugs — he digs into security, dependency vulnerabilities, distributed systems anti-patterns, and architectural rot.

Dinesh (the defender) has a different job: separate real issues from noise. His prompt requires him to classify each finding as [concede], [defend], or [dismiss] — and critically, to be honest in findings even when the banter is defensive. This constraint means his concessions are high-signal: if Dinesh can’t mount a defense, the issue is real.

The orchestrator runs rounds until one of two things happens:

  1. Gilfoyle repeats himself — he’s found everything worth finding
  2. Dinesh concedes everything — the code needs work

This is not a fixed number of passes. The debate self-terminates when it stops producing new information. That’s the same principle behind GAN training: you stop when the discriminator can no longer distinguish real from generated — or in this case, when the critic can no longer find new issues the defender hasn’t already addressed.

Both agents produce two sections: BANTER (the entertainment) and FINDINGS (the data). The banter keeps the review engaging and gives each agent a character voice that reinforces their adversarial role. The findings give the orchestrator machine-readable data to classify issues by severity and track concessions across rounds.

This dual output is a pattern you can reuse: let agents be expressive in one section (which helps them reason better — personas improve output quality) while producing structured data in another (which lets the system act on results programmatically).

This isn’t just comedy. Character personas serve three engineering purposes:

  1. Role commitment — A prompt that says “find bugs” gets a generic review. A prompt that says “you are Gilfoyle, and bad code is a moral failing” gets an agent that refuses to let anything slide. The persona creates pressure to be thorough.

  2. Asymmetric objectives — Gilfoyle is rewarded (in-character) for finding issues. Dinesh is rewarded for defending valid code. This creates genuine tension that surfaces edge cases neither role would find alone.

  3. Calibrated concessions — Dinesh’s character is “defensive but competent.” He doesn’t concede easily, which means when he does concede, it carries weight. A less defined character might concede everything to seem agreeable, destroying the signal.

Adversarial agents work anywhere you need robust evaluation:

Use CaseAttackerDefender
Code reviewCritic (finds bugs)Author (defends decisions)
Proposal reviewSkeptic (finds gaps)Proposer (justifies choices)
Security auditRed team (finds exploits)Blue team (validates controls)
DocumentationConfused user (finds gaps)Writer (defends clarity)
Architecture reviewPessimist (finds failure modes)Architect (defends tradeoffs)

The core recipe is always the same:

  1. Define two agents with opposing objectives
  2. Give each a structured output format
  3. Loop until convergence
  4. Synthesize the debate into a verdict

A completed /dg review of your own code. Understanding of why adversarial multi-agent patterns produce better results than single-agent review.

Module 11 — Everything Everywhere for the full taxonomy of multi-agent orchestration patterns, including fan-out, consensus councils, and structured instruction design.