G|AI Works

Client case · redacted

Vertical comparison portal with an operator-bias problem.

An engagement where the hardest part was not the frontend or the matching — it was answering, in code and in scoring architecture, the question: can a comparison portal be credible when its operator is also one of the listed providers?

// Engagement shape

Client
[REDACTED] — B2B provider in a vertical industrial-services sector
Setting
Operator is also one of the listed providers on the portal
Scope
Comparison and matching portal, research to production
Role
End-to-end: research, data model, scoring, validation, frontend, release
Status
Live; iterating under a validated scoring baseline
Disclosure
Redacted by mutual agreement; mechanisms and architecture disclosed

Context

A self-operated comparison portal in a vertical B2B market.

The client operates in a narrow B2B services sector — narrow enough that buyers have a handful of candidates and almost no independent way to compare them. The client decided to build the comparison portal that the market was missing.

That decision created a governance problem before a technical one. The client is also one of the providers that the portal would list. A reader who notices this — and sophisticated B2B buyers notice — will not trust the portal unless the architecture makes the bias question answerable.

The engagement was therefore scoped around credibility as a property of the system, not a claim in the copy. Everything else — research template, data schema, frontend, matching logic — followed from that constraint.

Initial problem

Four things had to change before a single page could ship.

  • Self-operated comparison had a credibility problem before it had a technical one.

    The operator is also one of the listed providers. That structural bias has to be answered in the scoring architecture, not in copywriting — otherwise the portal is a marketing surface, not a comparison tool.

  • Field-level data quality was being quietly inferred.

    Provider profiles mixed verified facts, plausible estimates, and logical inferences without a marker on each field. That made downstream filtering and scoring unsound — and invisible to the reader.

  • Vocabulary drifted across provider profiles.

    Category labels, pricing models, and service terms were paraphrased inconsistently across profiles. Filter results depended on which synonym a researcher had used that day.

  • The scoring logic was opaque even to its authors.

    No validation harness; no worked examples against realistic buyer profiles; no way to catch the case where a small dimension quietly overwhelmed a major one.

What was built

Six artefacts, each traceable to a file in the repository.

Each item names the location in the project repo where the artefact lives. The repository itself is private; the architecture here is the disclosable part.

// Artefacts

  • 01

    Credibility-first scoring principles

    docs/scoring-strategy.md

Four principles written as constraints on the scoring code: evidence before self-interest; transparency of weights; user relevance as the measure; exclusion rather than devaluation. No criterion favours the operator unless supported by a public source.

  • 02

    Hard filters vs. soft differentiators

    scoring.ts · two-stage logic

    Hard filters (service area, minimum size, cashless-only, app-required, contract term) either include or exclude a provider — they do not nudge a score. Differentiation criteria only activate when the buyer explicitly signals the preference.
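The two-stage split can be sketched in TypeScript. This is an illustrative sketch, not the real scoring.ts: the type names, fields, and the `rank` function are hypothetical stand-ins that show the separation between hard filters (include/exclude only) and soft differentiators (score only when the buyer signals the preference).

```typescript
// Hypothetical shapes; the real schema and field names are not disclosed.
type Provider = {
  id: string;
  serviceArea: string[];   // normalized region codes
  cashless?: boolean;      // undefined means the fact is not established
  appSupport?: boolean;
};

type BuyerQuery = {
  region: string;
  requiresCashless: boolean;
  prefersApp: boolean;     // soft preference; only scores when signaled
};

// Stage 1: hard filters include or exclude -- they never nudge a score.
// A provider missing a required fact is excluded, not quietly penalised.
function passesHardFilters(p: Provider, q: BuyerQuery): boolean {
  if (!p.serviceArea.includes(q.region)) return false;
  if (q.requiresCashless && p.cashless !== true) return false;
  return true;
}

// Stage 2: differentiators only activate on an explicit buyer signal.
function softScore(p: Provider, q: BuyerQuery): number {
  let score = 0;
  if (q.prefersApp && p.appSupport === true) score += 1;
  return score;
}

function rank(providers: Provider[], q: BuyerQuery): Provider[] {
  return providers
    .filter((p) => passesHardFilters(p, q))
    .sort((a, b) => softScore(b, q) - softScore(a, q));
}
```

The design point is that stage 1 never returns a number: a provider either satisfies a hard constraint or leaves the result set entirely.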

  • 03

    Typed data-confidence taxonomy

    frontmatter: verified · estimated · inferred · unknown

    Every provider field carries a confidence status. "Unknown" is a first-class value, not a blank. The frontend surfaces the status where it affects a ranking claim — readers see the confidence, not just the outcome.
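A minimal sketch of what a typed confidence field looks like, assuming the four statuses named above. The field and type names are hypothetical; which statuses may back a hard-filter decision is likewise an assumption for illustration, not the disclosed rule.

```typescript
// "unknown" is a value a researcher records deliberately, never a blank.
type Confidence = "verified" | "estimated" | "inferred" | "unknown";

type Field<T> = {
  value: T | null;         // null when confidence is "unknown"
  confidence: Confidence;
  source?: string;         // citation; expected when "verified"
};

// Illustrative profile shape carrying per-field confidence.
type ProviderProfile = {
  id: string;
  minContractMonths: Field<number>;
};

// Assumed policy for the sketch: only verified or estimated facts may
// back a hard-filter decision; inferred and unknown fields exclude the
// provider from that query instead of silently scoring against it.
function usableForHardFilter<T>(f: Field<T>): boolean {
  return f.confidence === "verified" || f.confidence === "estimated";
}
```

Carrying the status on the field itself is what lets the frontend surface it next to a ranking claim instead of hiding it in the research notes.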

  • 04

    Normalization rules as code

    docs/normalization-rules.md · validate-frontmatter.py

A controlled vocabulary for categories, pricing models, service terms, and regional coverage. Invalid or legacy values fail the validator, CI-style. Paraphrasing stops being a source of drift.
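The enforcement idea is small enough to sketch. This is not the real validate-frontmatter.py; the vocabulary values and field name below are invented for illustration, and the real check covers several vocabularies at once.

```typescript
// Illustrative allowed values; the real vocabulary is project-specific.
const PRICING_MODELS = new Set(["per-unit", "flat-rate", "commission"]);

type Violation = { field: string; value: string };

function validateVocabulary(frontmatter: Record<string, string>): Violation[] {
  const violations: Violation[] = [];
  if (!PRICING_MODELS.has(frontmatter.pricingModel)) {
    violations.push({ field: "pricingModel", value: frontmatter.pricingModel });
  }
  return violations;
}

// CI-style: any violation fails the run outright rather than warning.
function assertValid(frontmatter: Record<string, string>): void {
  const v = validateVocabulary(frontmatter);
  if (v.length > 0) {
    throw new Error(
      "vocabulary violation: " + v.map((x) => `${x.field}=${x.value}`).join(", ")
    );
  }
}
```

Failing hard on a legacy synonym is the whole point: a warning would let the drift back in.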

  • 05

    Scoring validation harness

    scripts/validate-scoring.js · 5 buyer test profiles

    Five realistic B2B buyer profiles covering size bands, sector settings, and preference flags. Every scoring change runs against the full matrix. The resulting ranking table is the regression artefact.
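The regression mechanism behind the harness can be sketched independently of the scoring itself: the committed ranking table is the baseline, and any change to the scoring re-runs the matrix and diffs against it. The function and type names below are stand-ins, not the validate-scoring.js API.

```typescript
// Profile name -> ranked provider ids, e.g. { "small-buyer": ["p1", "p2"] }.
type RankingTable = Record<string, string[]>;

// Re-running the buyer-profile matrix after a scoring change and
// comparing against the committed baseline is the regression check.
function diffRankings(baseline: RankingTable, current: RankingTable): string[] {
  const regressions: string[] = [];
  for (const profile of Object.keys(baseline)) {
    const before = baseline[profile].join(" > ");
    const after = (current[profile] ?? []).join(" > ");
    if (before !== after) {
      regressions.push(`${profile}: ${before} -> ${after}`);
    }
  }
  return regressions;
}
```

A non-empty diff is not automatically a bug, but it is always a conversation: either the change was intended and the baseline is updated, or a small dimension just overwhelmed a major one.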

  • 06

    Markdown → JSON export pipeline

    scripts/md-to-json.py · scripts/validate-export-integrity.py

    Research lives as reviewable Markdown with YAML frontmatter. Export normalises to JSON under the published schema. An integrity check ensures the frontend sees only values the vocabulary allows.
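A toy version of the extraction step, to make the pipeline concrete. The real pipeline (scripts/md-to-json.py) uses a proper YAML parser and the published schema; this sketch handles only flat `key: value` frontmatter and exists to show the shape of the Markdown-to-JSON boundary, nothing more.

```typescript
// Pull the YAML frontmatter block out of a research Markdown file and
// return it as a flat object; the prose below the frontmatter is the
// reviewable research and never reaches the export.
function extractFrontmatter(md: string): Record<string, string> {
  const m = md.match(/^---\n([\s\S]*?)\n---/);
  if (!m) throw new Error("missing frontmatter block");
  const out: Record<string, string> = {};
  for (const line of m[1].split("\n")) {
    const i = line.indexOf(":");
    if (i === -1) continue; // sketch: skip anything that is not key: value
    out[line.slice(0, i).trim()] = line.slice(i + 1).trim();
  }
  return out;
}
```

In the real pipeline, the extracted object is then checked against the controlled vocabulary before the frontend ever sees it; that second gate is the integrity check named above.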

Delivery & safeguards

Governance disciplines that kept the credibility constraint load-bearing.

  • Numbered defect log, not folklore.

    Every scoring defect discovered during validation was logged with an identifier (B-1 for bugs, F-1…F-5 for fixes), a one-line description, and the profile or expression that triggered it. That log is the audit trail for every subsequent change.

  • Operator-neutrality as a checkable property.

    Because the operator sits inside the comparison, the scoring harness explicitly checks buyer profiles where the operator should NOT rank first — and fails if it does without evidentiary support. The bias guardrail is enforced in tests, not in tone.
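The guardrail reduces to a small, checkable predicate. The sketch below is hypothetical in every name: the real test matrix, the operator id, and how "evidentiary support" is established are internal; what it shows is that the neutrality rule is a failing test, not a tone guideline.

```typescript
// One ranked entry; evidenceBacked is a stand-in for whatever the real
// harness uses to decide that a top rank is supported by public sources.
type Ranked = { id: string; evidenceBacked: boolean };

const OPERATOR_ID = "operator"; // stand-in for the redacted operator

// Returns the profiles on which the operator ranks first WITHOUT
// evidentiary support -- any non-empty result fails the harness.
function operatorBiasViolations(
  rankingsByProfile: Record<string, Ranked[]>
): string[] {
  const violations: string[] = [];
  for (const [profile, ranking] of Object.entries(rankingsByProfile)) {
    const top = ranking[0];
    if (top && top.id === OPERATOR_ID && !top.evidenceBacked) {
      violations.push(profile);
    }
  }
  return violations;
}
```

Note what the check does not do: it does not forbid the operator from ranking first. It forbids an unsupported first rank, which is the only version of neutrality a self-operated portal can honestly claim.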

  • Exclusion where evidence is missing.

    Providers missing a hard-filter field for a given buyer query are excluded from that query — not ranked lower with an implicit penalty. The alternative silently punishes less-documented providers; it is rejected by design.

  • Research and surfaced content are traceable.

    Every claim visible on the portal is traceable to a source-backed field in the research Markdown. A reader (or an auditor) can follow the path from a comparison bullet back to the provider profile and the underlying citation.

Outcome

Operational anchors, not inflated metrics.

The outcomes below describe properties of the system that can be reproduced by running the validation harness against the data set — not marketing claims. Commercial performance figures are not disclosed.

// Outcome anchor · under preparation

A further anchor is being prepared with the client — a released figure, a signed-off statement, or a before/after on a specific property of the system. Until it is cleared for publication, this page anchors outcomes only in the system properties above. Candidate anchor types currently under discussion:

  • Governance / review gain: operator-bias incidents prevented pre-release; audit-trail coverage of ranking decisions.
  • Adoption signal attributable to the portal without triangulation.
  • Released client statement on the scoring discipline or the confidence taxonomy.
  • Redacted but real before/after on operator ranking across the buyer-profile matrix.

This slot is held open deliberately. The alternative — publishing a plausible-looking number without sign-off — would contradict the case itself.

Boundaries

What is redacted, and why the redactions are marked.

Redaction is treated as a property of the case, not a limitation. The alternative — soft paraphrase with no named gaps — would be less credible, not more.

  • Client, sector, and competitor names are redacted. The portal can be identified by anyone who recognises the domain; this page does not name it so the write-up remains transferable.
  • Specific weighting values, dimension caps, and tiebreaker rules are not disclosed. The structure of the scoring (hard filter vs. differentiator; evidence rule; exclusion rule) is disclosed; the numbers are not.
  • The full provider list is not disclosed here. It is public on the portal itself, and visible to anyone who asks.
  • Commercial terms of the engagement are not disclosed.
  • Specific buyer test profiles used for validation are described in abstract (size band, setting, priority, region, flags); the exact parameters and expected rankings are internal to the validation suite.

// If the redactions are a blocker

The items above describe what is not on this page. What is discussable in a scoping conversation goes further:

  • Named version of this engagement, under NDA.
  • Specific scoring weights, the operator-bias test matrix, and the defect log in full.
  • Whether a similar architecture applies to a ranking, routing, or recommendation system in your own environment.

Start a scoping conversation

Stack & artefacts

The concrete surface.

// Presentation

  • Astro
  • Tailwind CSS
  • Static output
  • Filtered list · detail · compare (up to 4) · 4-step needs match

// Data layer

  • Markdown with YAML frontmatter
  • JSON export under published schema
  • Controlled vocabulary
  • Confidence taxonomy

// Scoring & check

  • TypeScript scoring module
  • JS validation harness
  • Python frontmatter + export integrity validators

// Governance

  • Numbered defect log
  • Operator-bias test
  • Evidence-before-self-interest rule
  • Exclusion-not-devaluation rule

What a stronger version of this case would need

Five facts that are not in this write-up.

Stated plainly so the reader can tell the difference between what is and is not claimed here.

  1. Pre/post metric on operator ranking across the buyer-profile matrix (release 1 → current).
  2. A published usage figure (unique buyer queries served; return rate) attributable to the portal without triangulation.
  3. Outcome signal from the operator side: did the portal measurably change inbound lead quality or shorten a sales cycle?
  4. A second independent reviewer's sign-off on the scoring harness — currently the harness is run by the engineer who wrote it.
  5. An explicit written statement from the client that the redaction here is accurate and complete — currently inferred from the engagement context.

Next step

A similar credibility problem in your system?

Operator bias, a field-level confidence taxonomy, and exclusion-not-devaluation are not domain-specific. If your ranking, routing, or recommendation system sits inside a structural conflict — self-listed marketplaces, internally scored workflows, AI-assisted decisions with an accountable owner — the same architectural moves apply.