Agentic Automation Pentest

The Philosophy of Autonomous Security Testing

> HUMAN IN THE LOOP = BOTTLENECK

Imagine an autonomous red team that requires you to manually chain recon tools, copy-paste PoCs, and babysit every step. Would you call that "automation"? No. You'd call it a glorified script runner.

Why is pentesting any different from other automation domains?

We've accepted a paradigm where "AI security testing" means a chatbot that runs a scan, then waits for you to validate findings. That's not automation; that's micromanagement. In practice, the human is still:

  • Manually correlating subdomain enumeration results
  • Copying curl commands from one tool to another
  • Writing PoC scripts by hand for each finding
  • Double-checking scope boundaries for every target

That's not "human expertise" — that's wasted cognitive bandwidth on mechanical work.

oh-my-open-pentest is built on the premise that the human should define scope and rules of engagement (RoE), not babysit recon tools.

Indistinguishable Findings

Findings submitted by the agent should be indistinguishable from those submitted by a top-tier bug bounty hunter.

  • Clear vulnerability description with root cause analysis
  • Working Proof of Concept (reproducible, not theoretical)
  • Accurate CVSS scoring with proper vector strings
  • Impact statement that resonates with program owners
  • Remediation guidance that developers can actually implement

"If a triager can tell whether a report was written by a human hunter or an agent, the agent has failed."

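To make "accurate CVSS scoring with proper vector strings" concrete, here is a minimal sketch that derives a CVSS v3.1 base score from a vector string, using the metric weights and rounding rule published in the CVSS v3.1 specification. The function name `cvss31_base_score` is illustrative and not part of oh-my-open-pentest; temporal and environmental metrics are left out.

```python
import math

# Base-metric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place using the spec's integer-based rule."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def cvss31_base_score(vector: str) -> float:
    """Hypothetical helper: compute the base score for a CVSS:3.1 vector."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector string")
    m = dict(part.split(":", 1) for part in parts)

    changed = m["S"] == "C"  # scope changed vs. unchanged
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]

    # Impact sub-score.
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    # Exploitability sub-score.
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))
```

For example, `cvss31_base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H")` returns `9.8`. Recomputing the score from the vector, rather than trusting a number typed into a report, is one way to keep the two from drifting apart.
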
Token Cost vs. Coverage

We don't care about token usage. We care about attack surface coverage and exploit depth. If spending more tokens means finding P1s that manual testers miss, that's a worthwhile investment. Those tokens let the agent:

  • Enumerate broader attack surface in parallel
  • Chain multiple vulnerabilities into high-impact exploits
  • Verify findings through multiple attack vectors
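
The "in parallel" part of the list above can be sketched with `asyncio`. This is an illustration under stated assumptions, not the project's implementation: the enumerator functions are hypothetical stubs that return made-up placeholder data, where real ones would call out to actual recon tooling.

```python
import asyncio


async def enumerate_subdomains(target: str) -> list[str]:
    # Hypothetical stub: placeholder data instead of real DNS enumeration.
    await asyncio.sleep(0)
    return [f"www.{target}", f"api.{target}"]


async def enumerate_ports(target: str) -> list[str]:
    # Hypothetical stub: placeholder data instead of a real port scan.
    await asyncio.sleep(0)
    return ["443/tcp"]


async def enumerate_surface(target: str) -> dict[str, list[str]]:
    """Run all enumerators concurrently and collect results by name."""
    tasks = {
        "subdomains": enumerate_subdomains(target),
        "ports": enumerate_ports(target),
    }
    # gather() preserves argument order, so results line up with the keys.
    results = await asyncio.gather(*tasks.values())
    return dict(zip(tasks.keys(), results))


surface = asyncio.run(enumerate_surface("example.com"))
```

Because the enumerators are independent, adding another one (services, technologies) is a new entry in `tasks` rather than another sequential step.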

However...

We optimize for efficiency where it counts. Not by crippling coverage, but by:

  • Using lightweight scans for initial reconnaissance
  • Avoiding redundant enumeration of mapped assets
  • Caching scope intelligence across engagements
  • Stopping deep exploitation at scope boundaries

Minimize Human Cognitive Load

The human should only need to provide scope definition and program rules. Everything else is the agent's job.

Autonomous Mode
Full Engagement

Define scope. Walk away. Get findings.

  • Parses and enforces scope boundaries
  • Conducts reconnaissance across attack surface
  • Identifies and validates vulnerabilities
  • Builds exploit chains where applicable
  • Generates submission-ready reports with PoCs

You define the boundaries. The agent finds what's inside them.

Scope Enforcement
Agent-Enforced Boundaries

The agent polices itself.

Talos (Scope Guard)

Reads and parses program scope. Validates every target before active testing. Refuses out-of-scope assets even in attack paths.

Argus (Monitor)

Hundred-eyed observation of scope boundaries and engagement state. Logs scope decisions for audit trail. Alerts on ambiguity.

You don't police the agent. The agent polices itself.
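
As a rough illustration of the two roles described above, the sketch below validates each target against scope patterns before testing (Talos) and records every decision for an audit trail (Argus). The `ScopeGuard` class, its fields, and the `allow` / `deny` / `ambiguous` labels are assumptions made for this example. Real program scopes also cover IP ranges, paths, and prose rules that simple wildcard matching does not handle.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def hostname_of(target: str) -> str:
    """Extract a lowercase hostname from a URL or a bare host[:port]."""
    parsed = urlsplit(target if "//" in target else f"//{target}")
    return (parsed.hostname or "").lower().rstrip(".")


@dataclass
class ScopeGuard:
    """Hypothetical scope guard: wildcard host patterns, default deny."""
    in_scope: list[str]
    out_of_scope: list[str] = field(default_factory=list)
    audit_log: list[dict] = field(default_factory=list)

    def check(self, target: str) -> str:
        host = hostname_of(target)
        if not host:
            decision = "ambiguous"  # cannot parse: escalate, do not test
        elif any(fnmatchcase(host, p.lower()) for p in self.out_of_scope):
            decision = "deny"       # explicit exclusions always win
        elif any(fnmatchcase(host, p.lower()) for p in self.in_scope):
            decision = "allow"
        else:
            decision = "deny"       # anything unlisted is out of scope
        # Every decision is logged, including denials.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "target": target,
            "host": host,
            "decision": decision,
        })
        return decision
```

Note that `*.example.com` matches subdomains at any depth but not the apex `example.com`, so the apex has to be listed separately if the program includes it. Checking exclusions first means an excluded host stays refused even when a broader wildcard would otherwise admit it.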

Predictable

Same scope + same RoE = consistent output. No random deviations or scope violations.

Continuous

Survives interruptions. Engagement state preserved across sessions. No redundant re-testing.

Delegatable

Clear scope verified and enforced. Self-correcting on unexpected responses. Escalation only on true ambiguity.
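
The "Continuous" property above (state preserved across sessions, no redundant re-testing) could be approached along these lines. This is a minimal sketch, assuming a single JSON file and string task identifiers; `EngagementState` and the task ID format are invented for the example.

```python
import json
import tempfile
from pathlib import Path


class EngagementState:
    """Hypothetical persistent record of completed tasks."""

    def __init__(self, path: Path):
        self.path = path
        self.done: set[str] = set()
        if path.exists():
            # Resume: reload whatever an earlier session finished.
            self.done = set(json.loads(path.read_text())["done"])

    def is_done(self, task_id: str) -> bool:
        return task_id in self.done

    def mark_done(self, task_id: str) -> None:
        self.done.add(task_id)
        # Write to a temp file and rename it over the old one, so an
        # interruption mid-write does not leave a corrupt state file.
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps({"done": sorted(self.done)}))
        tmp.replace(self.path)


state_file = Path(tempfile.mkdtemp()) / "engagement.json"
first = EngagementState(state_file)
first.mark_done("portscan:api.example.com")

resumed = EngagementState(state_file)  # simulates a later session
```

A scheduler would call `is_done` before dispatching a task and skip anything already recorded.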

The Core Loop

Scope → Recon → Exploit → Verify → Report

↻ Agent-Enforced Boundaries
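
One way to picture the loop is as a list of stages with a scope check applied after each one. The sketch below runs a single pass. The stage functions are hypothetical stubs that only shuffle strings around, and `run_engagement` is a name chosen for this example.

```python
from typing import Callable

Stage = Callable[[dict], dict]


def run_engagement(context: dict,
                   stages: list[tuple[str, Stage]],
                   in_scope: Callable[[str], bool]) -> dict:
    """Run each stage in order, re-checking scope after every stage."""
    for name, stage in stages:
        context = stage(context)
        # Boundary check: drop any target that drifted out of scope
        # before the next stage can touch it.
        context["targets"] = [t for t in context["targets"] if in_scope(t)]
        context.setdefault("trace", []).append(name)
    return context


# Hypothetical stub stages; real ones would do the actual work.
def scope(ctx): return {**ctx, "targets": list(ctx["seeds"])}
def recon(ctx): return {**ctx, "targets": ctx["targets"] + ["api.example.com", "cdn.example.net"]}
def exploit(ctx): return {**ctx, "candidates": [f"candidate on {t}" for t in ctx["targets"]]}
def verify(ctx): return {**ctx, "findings": list(ctx["candidates"])}
def report(ctx): return {**ctx, "report": "\n".join(ctx["findings"])}


def is_in_scope(host: str) -> bool:
    return host == "example.com" or host.endswith(".example.com")


result = run_engagement(
    {"seeds": ["example.com"]},
    [("scope", scope), ("recon", recon), ("exploit", exploit),
     ("verify", verify), ("report", report)],
    is_in_scope,
)
```

Here recon surfaces `cdn.example.net`, which the boundary check removes before the exploit stage runs, so it never appears in the candidates or the report.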

Cerberus

Multi-headed orchestration across recon, exploit, and report agents

Hydra

Parallel subdomain, port, service, and technology enumeration

Argus

Hundred-eyed observation of scope boundaries and engagement state

Scylla

Multi-vector exploitation and chain building

Hermes

Submission-ready report generation with PoCs

Talos

Autonomous scope enforcement and RoE compliance

Engagement State

Persistent tracking across sessions, no redundant work

Finding Validation

Multi-vector verification before reporting
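
"Multi-vector verification before reporting" can be read as a simple gate: a candidate is reported only when enough independent checks confirm it. The sketch below assumes that reading. `Candidate`, `verify_candidate`, the check names, and the threshold of two are all illustrative choices, and the checks are stubs that return fixed answers.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Candidate:
    title: str
    target: str


Check = Callable[[Candidate], bool]


def verify_candidate(candidate: Candidate,
                     checks: dict[str, Check],
                     required: int = 2) -> tuple[bool, list[str]]:
    """Return (reportable, names of the checks that confirmed it)."""
    confirmed_by = [name for name, check in checks.items() if check(candidate)]
    return len(confirmed_by) >= required, confirmed_by


# Hypothetical stub checks; real ones would replay the PoC in different ways.
checks: dict[str, Check] = {
    "replay_poc": lambda c: True,
    "alternate_payload": lambda c: True,
    "out_of_band_callback": lambda c: False,
}

reportable, evidence = verify_candidate(
    Candidate(title="Reflected XSS in search", target="api.example.com"),
    checks,
)
```

Keeping the list of confirming checks alongside the verdict gives the report stage concrete evidence to cite.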

The Future We're Building

  • Human hunters focus on strategy, not mechanical recon
  • Finding quality independent of who (or what) found it
  • Complex exploit chains as routine as simple XSS
  • "Manual testing" means strategic thinking, not running tools

"You define the scope. The findings arrive. You don't think about the recon."

scope → findings
