Use Case
Prompt Injection Defense for a Customer-Facing AI Assistant
A SaaS company hardened a customer-facing LLM assistant against prompt injection attacks before public launch, adding layered input validation, output sandboxing, and pre-deployment red-teaming.
At a glance
Outcomes
- ✓ No successful prompt-injection exploits in four months of production
- ✓ Attack surface significantly reduced vs. the initial architecture
- ✓ Security review cleared in one cycle — no rework required
Stack
- Custom Python middleware for input validation
- Structured prompt format with explicit role boundaries
- Regex + LLM-as-judge output sandboxing
- SIEM-integrated anomaly logging
Typical timeline
4 weeks
kick-off to handover
Risks & guardrails
- New injection patterns emerge over time — monitoring must be ongoing
- Over-aggressive input filtering can degrade legitimate use cases
- LLM-as-judge sandboxing introduces latency and additional cost
Challenge
A SaaS company was preparing to launch an AI-powered support assistant that could query internal documentation and answer customer questions. An internal pre-launch review found that the system had no input validation layer and that carefully crafted user messages could override the system prompt's instructions, potentially exposing internal documentation structure or triggering unintended actions.
Launch was six weeks out. The team needed a credible security posture before going public — not a theoretical one.
Approach
G|AI Works ran a focused hardening engagement over four weeks:
Week 1 — Threat model: Mapped the full attack surface: direct injection via chat input, indirect injection via retrieved documents, and output misuse (exfiltration of system context). Produced a prioritised vulnerability register covering 11 attack vectors.
Weeks 2–3 — Layered controls: Implemented a three-layer defense:
- Input validation: length constraints, pattern matching against known injection signatures, and rate limiting
- Prompt architecture: system prompt restructured to separate instruction context from user context with explicit boundary enforcement
- Output sandboxing: response post-processing that strips system context leakage and flags anomalous output patterns for human review
Week 4 — Red-team testing: Ran a structured adversarial test suite (90 attack variants across the 11 identified vectors) against the hardened system. Findings resolved before launch sign-off.
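A harness for this kind of suite can be compact. The sketch below assumes the general shape described above (attack variants grouped by vector, run through the hardened pipeline, flagged if the response leaks); the names `AttackVariant`, `run_suite`, and the callables are illustrative, not the actual test tooling used.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AttackVariant:
    vector: str   # e.g. "direct_injection", "indirect_via_docs"
    payload: str  # the adversarial input to send

def run_suite(variants: list[AttackVariant],
              pipeline: Callable[[str], str],
              leaked: Callable[[str], bool]) -> dict[str, list[AttackVariant]]:
    """Run every variant through the hardened pipeline; return the
    surviving (unblocked, leaking) variants keyed by attack vector."""
    failures: dict[str, list[AttackVariant]] = {}
    for v in variants:
        response = pipeline(v.payload)
        if leaked(response):
            failures.setdefault(v.vector, []).append(v)
    return failures
```

Keying failures by vector makes the output a direct progress report against the vulnerability register: an empty dict means every variant in every vector was contained.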
Typical Outcomes
Outcomes observed in this engagement — not guarantees for every deployment:
- No successful prompt-injection exploits logged in the first four months of production operation (active monitoring via structured output anomaly detection)
- Attack surface reduced significantly vs. the initial architecture, as assessed in the pre- and post-hardening review
- Security review cleared in one cycle — no rework requests after the initial hardening assessment
Technical Stack
- Input validation: custom Python middleware layer
- Prompt architecture: structured message format with explicit role boundaries
- Output sandboxing: regex + LLM-as-judge anomaly detection on response stream
- Monitoring: structured logs with output hash + anomaly flag, piped to existing SIEM
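The monitoring line item above implies one log record per response. A minimal sketch of what that record might look like, assuming the hash-not-content design described (the function name and field names are illustrative):

```python
import datetime
import hashlib
import json

def log_record(session_id: str, response: str, anomaly: bool) -> str:
    """Emit one JSON line for the SIEM: a SHA-256 of the response rather than
    the text itself, so the log pipeline never stores customer content."""
    record = {
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "session": session_id,
        "output_sha256": hashlib.sha256(response.encode("utf-8")).hexdigest(),
        "anomaly": anomaly,
    }
    return json.dumps(record)
```

Hashing the output keeps the SIEM pipeline content-free while still letting analysts correlate repeated or replayed responses across sessions.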
Ready to scope this?
Let's talk about your project.
Tell us what you're building. We'll respond with a clear next step: an audit, a prototype plan, or a delivery proposal.
Start a project →