Use Case
Prompt Injection Defense for a Customer-Facing AI Assistant
A SaaS company hardened a customer-facing LLM assistant against prompt injection attacks before public launch, adding layered input validation, output sandboxing, and pre-deployment red-teaming.
At a glance
Outcomes
- ✓ No successful prompt-injection exploits in four months of production
- ✓ Attack surface significantly reduced vs. the initial architecture
- ✓ Security review cleared in one cycle — no rework required
Stack
- Custom Python middleware for input validation
- Structured prompt format with explicit role boundaries
- Regex + LLM-as-judge output sandboxing
- SIEM-integrated anomaly logging
Typical timeline
4 weeks
kick-off to handover
Risks & guardrails
- New injection patterns emerge over time — monitoring must be ongoing
- Over-aggressive input filtering can degrade legitimate use cases
- LLM-as-judge sandboxing introduces latency and additional cost
Challenge
A SaaS company was preparing to launch an AI-powered support assistant that could query internal documentation and answer customer questions. An internal pre-launch review found that the system had no input validation layer and that carefully crafted user messages could override the system prompt's instructions, potentially exposing internal documentation structure or triggering unintended actions.
Launch was six weeks out. The team needed a credible security posture before going public — not a theoretical one.
Approach
G|AI Works ran a focused hardening engagement over four weeks:
Week 1 — Threat model: Mapped the full attack surface: direct injection via chat input, indirect injection via retrieved documents, and output misuse (exfiltration of system context). Produced a prioritised vulnerability register covering 11 attack vectors.
Weeks 2–3 — Layered controls: Implemented a three-layer defense:
- Input validation: length constraints, pattern matching against known injection signatures, and rate limiting
- Prompt architecture: system prompt restructured to separate instruction context from user context with explicit boundary enforcement
- Output sandboxing: response post-processing that strips system context leakage and flags anomalous output patterns for human review
Week 4 — Red-team testing: Ran a structured adversarial test suite (90 attack variants across the 11 identified vectors) against the hardened system. Findings resolved before launch sign-off.
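A harness for this kind of suite can be compact. The sketch below assumes the general shape described above (attack variants grouped by vector, run through the hardened pipeline, flagged if the response leaks); the names `AttackVariant`, `run_suite`, and the callables are illustrative, not the actual test tooling used.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AttackVariant:
    vector: str   # e.g. "direct_injection", "indirect_via_docs"
    payload: str  # the adversarial input to send

def run_suite(variants: list[AttackVariant],
              pipeline: Callable[[str], str],
              leaked: Callable[[str], bool]) -> dict[str, list[AttackVariant]]:
    """Run every variant through the hardened pipeline; return the
    surviving (unblocked, leaking) variants keyed by attack vector."""
    failures: dict[str, list[AttackVariant]] = {}
    for v in variants:
        response = pipeline(v.payload)
        if leaked(response):
            failures.setdefault(v.vector, []).append(v)
    return failures
```

Keying failures by vector makes the output a direct progress report against the vulnerability register: an empty dict means every variant in every vector was contained.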
Typical Outcomes
Outcomes observed in this engagement — not guarantees for every deployment:
- No successful prompt-injection exploits logged in the first four months of production operation (active monitoring via structured output anomaly detection)
- Attack surface reduced significantly vs. the initial architecture, as assessed in the pre- and post-hardening review
- Security review cleared in one cycle — no rework requests after the initial hardening assessment
Technical Stack
- Input validation: custom Python middleware layer
- Prompt architecture: structured message format with explicit role boundaries
- Output sandboxing: regex + LLM-as-judge anomaly detection on response stream
- Monitoring: structured logs with output hash + anomaly flag, piped to existing SIEM
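The monitoring line item above implies one log record per response. A minimal sketch of what that record might look like, assuming the hash-not-content design described (the function name and field names are illustrative):

```python
import datetime
import hashlib
import json

def log_record(session_id: str, response: str, anomaly: bool) -> str:
    """Emit one JSON line for the SIEM: a SHA-256 of the response rather than
    the text itself, so the log pipeline never stores customer content."""
    record = {
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "session": session_id,
        "output_sha256": hashlib.sha256(response.encode("utf-8")).hexdigest(),
        "anomaly": anomaly,
    }
    return json.dumps(record)
```

Hashing the output keeps the SIEM pipeline content-free while still letting analysts correlate repeated or replayed responses across sessions.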
Ready to scope this?
Let's talk about your project.
Tell us what you're building. We'll respond with a clear next step: an audit, a prototype plan, or a delivery proposal.
Start a project →