G|AI Works

Client case · redacted

Vertical comparison portal with an operator-bias problem.

An engagement where the hardest part was not the frontend or the matching — it was answering, in code and in scoring architecture, the question: can a comparison portal be credible when its operator is also one of the listed providers?

// Engagement shape

Client
[REDACTED] — B2B provider in a vertical industrial-services sector
Setting
Operator is also one of the listed providers on the portal
Scope
Comparison and matching portal, research to production
Role
End-to-end: research, data model, scoring, validation, frontend, release
Status
Live; iterating under a validated scoring baseline
Disclosure
Redacted by mutual agreement; mechanisms and architecture disclosed

Context

A self-operated comparison portal in a vertical B2B market.

The client operates in a narrow B2B services sector — narrow enough that buyers have a handful of candidates and almost no independent way to compare them. The client decided to build the comparison portal that the market was missing.

That decision created a governance problem before a technical one. The client is also one of the providers that the portal would list. A reader who notices this — and sophisticated B2B buyers notice — will not trust the portal unless the architecture makes the bias question answerable.

The engagement was therefore scoped around credibility as a property of the system, not a claim in the copy. Everything else — research template, data schema, frontend, matching logic — followed from that constraint.

Initial problem

Four things had to change before a single page could ship.

  • Self-operated comparison had a credibility problem before it had a technical one.

    The operator is also one of the listed providers. That structural bias has to be answered in the scoring architecture, not in copywriting — otherwise the portal is a marketing surface, not a comparison tool.

  • Field-level data quality was being quietly inferred.

    Provider profiles mixed verified facts, plausible estimates, and logical inferences without a marker on each field. That made downstream filtering and scoring unsound — and invisible to the reader.

  • Vocabulary drifted across provider profiles.

    Category labels, pricing models, and service terms were paraphrased inconsistently across profiles. Filter results depended on which synonym a researcher had used that day.

  • The scoring logic was opaque even to its authors.

    No validation harness; no worked examples against realistic buyer profiles; no way to catch the case where a small dimension quietly overwhelmed a major one.

What was built

Six artefacts, each traceable to a file in the repository.

Each item names the location in the project repo where the artefact lives. The repository itself is private; the architecture here is the disclosable part.

// Artefacts

  • 01

    Credibility-first scoring principles

    docs/scoring-strategy.md

Four principles written as constraints on the scoring code: evidence before self-interest; transparency of weights; user relevance as the measure; exclusion rather than devaluation. No criterion favours the operator unless supported by a public source.

  • 02

    Hard filters vs. soft differentiators

    scoring.ts · two-stage logic

    Hard filters (service area, minimum size, cashless-only, app-required, contract term) either include or exclude a provider — they do not nudge a score. Differentiation criteria only activate when the buyer explicitly signals the preference.
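The two-stage split can be sketched in TypeScript. This is an illustrative sketch, not the real scoring.ts: the type names, fields, and the `rank` function are hypothetical stand-ins that show the separation between hard filters (include/exclude only) and soft differentiators (score only when the buyer signals the preference).

```typescript
// Hypothetical shapes; the real schema and field names are not disclosed.
type Provider = {
  id: string;
  serviceArea: string[];   // normalized region codes
  cashless?: boolean;      // undefined means the fact is not established
  appSupport?: boolean;
};

type BuyerQuery = {
  region: string;
  requiresCashless: boolean;
  prefersApp: boolean;     // soft preference; only scores when signaled
};

// Stage 1: hard filters include or exclude -- they never nudge a score.
// A provider missing a required fact is excluded, not quietly penalised.
function passesHardFilters(p: Provider, q: BuyerQuery): boolean {
  if (!p.serviceArea.includes(q.region)) return false;
  if (q.requiresCashless && p.cashless !== true) return false;
  return true;
}

// Stage 2: differentiators only activate on an explicit buyer signal.
function softScore(p: Provider, q: BuyerQuery): number {
  let score = 0;
  if (q.prefersApp && p.appSupport === true) score += 1;
  return score;
}

function rank(providers: Provider[], q: BuyerQuery): Provider[] {
  return providers
    .filter((p) => passesHardFilters(p, q))
    .sort((a, b) => softScore(b, q) - softScore(a, q));
}
```

The design point is that stage 1 never returns a number: a provider either satisfies a hard constraint or leaves the result set entirely.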

  • 03

    Typed data-confidence taxonomy

    frontmatter: verified · estimated · inferred · unknown

    Every provider field carries a confidence status. "Unknown" is a first-class value, not a blank. The frontend surfaces the status where it affects a ranking claim — readers see the confidence, not just the outcome.
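A minimal sketch of what a typed confidence field looks like, assuming the four statuses named above. The field and type names are hypothetical; which statuses may back a hard-filter decision is likewise an assumption for illustration, not the disclosed rule.

```typescript
// "unknown" is a value a researcher records deliberately, never a blank.
type Confidence = "verified" | "estimated" | "inferred" | "unknown";

type Field<T> = {
  value: T | null;         // null when confidence is "unknown"
  confidence: Confidence;
  source?: string;         // citation; expected when "verified"
};

// Illustrative profile shape carrying per-field confidence.
type ProviderProfile = {
  id: string;
  minContractMonths: Field<number>;
};

// Assumed policy for the sketch: only verified or estimated facts may
// back a hard-filter decision; inferred and unknown fields exclude the
// provider from that query instead of silently scoring against it.
function usableForHardFilter<T>(f: Field<T>): boolean {
  return f.confidence === "verified" || f.confidence === "estimated";
}
```

Carrying the status on the field itself is what lets the frontend surface it next to a ranking claim instead of hiding it in the research notes.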

  • 04

    Normalization rules as code

    docs/normalization-rules.md · validate-frontmatter.py

A controlled vocabulary for categories, pricing models, service terms, and regional coverage. Invalid or legacy values fail the validator, CI-style. Paraphrasing stops being a source of drift.
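The enforcement idea is small enough to sketch. This is not the real validate-frontmatter.py; the vocabulary values and field name below are invented for illustration, and the real check covers several vocabularies at once.

```typescript
// Illustrative allowed values; the real vocabulary is project-specific.
const PRICING_MODELS = new Set(["per-unit", "flat-rate", "commission"]);

type Violation = { field: string; value: string };

function validateVocabulary(frontmatter: Record<string, string>): Violation[] {
  const violations: Violation[] = [];
  if (!PRICING_MODELS.has(frontmatter.pricingModel)) {
    violations.push({ field: "pricingModel", value: frontmatter.pricingModel });
  }
  return violations;
}

// CI-style: any violation fails the run outright rather than warning.
function assertValid(frontmatter: Record<string, string>): void {
  const v = validateVocabulary(frontmatter);
  if (v.length > 0) {
    throw new Error(
      "vocabulary violation: " + v.map((x) => `${x.field}=${x.value}`).join(", ")
    );
  }
}
```

Failing hard on a legacy synonym is the whole point: a warning would let the drift back in.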

  • 05

    Scoring validation harness

    scripts/validate-scoring.js · 5 buyer test profiles

    Five realistic B2B buyer profiles covering size bands, sector settings, and preference flags. Every scoring change runs against the full matrix. The resulting ranking table is the regression artefact.
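The regression mechanism behind the harness can be sketched independently of the scoring itself: the committed ranking table is the baseline, and any change to the scoring re-runs the matrix and diffs against it. The function and type names below are stand-ins, not the validate-scoring.js API.

```typescript
// Profile name -> ranked provider ids, e.g. { "small-buyer": ["p1", "p2"] }.
type RankingTable = Record<string, string[]>;

// Re-running the buyer-profile matrix after a scoring change and
// comparing against the committed baseline is the regression check.
function diffRankings(baseline: RankingTable, current: RankingTable): string[] {
  const regressions: string[] = [];
  for (const profile of Object.keys(baseline)) {
    const before = baseline[profile].join(" > ");
    const after = (current[profile] ?? []).join(" > ");
    if (before !== after) {
      regressions.push(`${profile}: ${before} -> ${after}`);
    }
  }
  return regressions;
}
```

A non-empty diff is not automatically a bug, but it is always a conversation: either the change was intended and the baseline is updated, or a small dimension just overwhelmed a major one.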

  • 06

    Markdown → JSON export pipeline

    scripts/md-to-json.py · scripts/validate-export-integrity.py

    Research lives as reviewable Markdown with YAML frontmatter. Export normalises to JSON under the published schema. An integrity check ensures the frontend sees only values the vocabulary allows.
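A toy version of the extraction step, to make the pipeline concrete. The real pipeline (scripts/md-to-json.py) uses a proper YAML parser and the published schema; this sketch handles only flat `key: value` frontmatter and exists to show the shape of the Markdown-to-JSON boundary, nothing more.

```typescript
// Pull the YAML frontmatter block out of a research Markdown file and
// return it as a flat object; the prose below the frontmatter is the
// reviewable research and never reaches the export.
function extractFrontmatter(md: string): Record<string, string> {
  const m = md.match(/^---\n([\s\S]*?)\n---/);
  if (!m) throw new Error("missing frontmatter block");
  const out: Record<string, string> = {};
  for (const line of m[1].split("\n")) {
    const i = line.indexOf(":");
    if (i === -1) continue; // sketch: skip anything that is not key: value
    out[line.slice(0, i).trim()] = line.slice(i + 1).trim();
  }
  return out;
}
```

In the real pipeline, the extracted object is then checked against the controlled vocabulary before the frontend ever sees it; that second gate is the integrity check named above.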

Delivery & safeguards

Governance disciplines that kept the credibility constraint load-bearing.

  • Numbered defect log, not folklore.

    Every scoring defect discovered during validation was logged with an identifier (B-1 for bugs, F-1…F-5 for fixes), a one-line description, and the profile or expression that triggered it. That log is the audit trail for every subsequent change.

  • Operator-neutrality as a checkable property.

    Because the operator sits inside the comparison, the scoring harness explicitly checks buyer profiles where the operator should NOT rank first — and fails if it does without evidentiary support. The bias guardrail is enforced in tests, not in tone.
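The guardrail reduces to a small, checkable predicate. The sketch below is hypothetical in every name: the real test matrix, the operator id, and how "evidentiary support" is established are internal; what it shows is that the neutrality rule is a failing test, not a tone guideline.

```typescript
// One ranked entry; evidenceBacked is a stand-in for whatever the real
// harness uses to decide that a top rank is supported by public sources.
type Ranked = { id: string; evidenceBacked: boolean };

const OPERATOR_ID = "operator"; // stand-in for the redacted operator

// Returns the profiles on which the operator ranks first WITHOUT
// evidentiary support -- any non-empty result fails the harness.
function operatorBiasViolations(
  rankingsByProfile: Record<string, Ranked[]>
): string[] {
  const violations: string[] = [];
  for (const [profile, ranking] of Object.entries(rankingsByProfile)) {
    const top = ranking[0];
    if (top && top.id === OPERATOR_ID && !top.evidenceBacked) {
      violations.push(profile);
    }
  }
  return violations;
}
```

Note what the check does not do: it does not forbid the operator from ranking first. It forbids an unsupported first rank, which is the only version of neutrality a self-operated portal can honestly claim.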

  • Exclusion where evidence is missing.

    Providers missing a hard-filter field for a given buyer query are excluded from that query — not ranked lower with an implicit penalty. The alternative silently punishes less-documented providers; it is rejected by design.

  • Research and surfaced content are traceable.

    Every claim visible on the portal is traceable to a source-backed field in the research Markdown. A reader (or an auditor) can follow the path from a comparison bullet back to the provider profile and the underlying citation.

Outcome

Operational anchors, not inflated metrics.

The outcomes below describe properties of the system that can be reproduced by running the validation harness against the data set — not marketing claims. Commercial performance figures are not disclosed.

// Outcome anchor · under preparation

A further anchor is being prepared with the client — a released figure, a signed-off statement, or a before/after on a specific property of the system. Until it is cleared for publication, this page anchors outcomes only in the system properties above. Candidate anchor types currently under discussion:

  • Governance / review gain: operator-bias incidents prevented pre-release; audit-trail coverage of ranking decisions.
  • Adoption signal attributable to the portal without triangulation.
  • Released client statement on the scoring discipline or the confidence taxonomy.
  • Redacted but real before/after on operator ranking across the buyer-profile matrix.

This slot is held open deliberately. The alternative — publishing a plausible-looking number without sign-off — would contradict the case itself.

Boundaries

What is redacted, and why the redactions are marked.

Redaction is treated as a property of the case, not a limitation. The alternative — soft paraphrase with no named gaps — would be less credible, not more.

  • Client, sector, and competitor names are redacted. The portal can be identified by anyone who recognises the domain; this page does not name it so the write-up remains transferable.
  • Specific weighting values, dimension caps, and tiebreaker rules are not disclosed. The structure of the scoring (hard filter vs. differentiator; evidence rule; exclusion rule) is disclosed; the numbers are not.
  • The full provider list is not disclosed here. It is public on the portal itself, and visible to anyone who asks.
  • Commercial terms of the engagement are not disclosed.
  • Specific buyer test profiles used for validation are described in abstract (size band, setting, priority, region, flags); the exact parameters and expected rankings are internal to the validation suite.

// If the redactions are a blocker

The items above describe what is not on this page. What is discussable in a scoping conversation goes further:

  • Named version of this engagement, under NDA.
  • Specific scoring weights, the operator-bias test matrix, and the defect log in full.
  • Whether a similar architecture applies to a ranking, routing, or recommendation system in your own environment.

Start a scoping conversation

Stack & artefacts

The concrete surface.

// Presentation

  • Astro
  • Tailwind CSS
  • Static output
  • Filtered list · detail · compare (up to 4) · 4-step needs match

// Data layer

  • Markdown with YAML frontmatter
  • JSON export under published schema
  • Controlled vocabulary
  • Confidence taxonomy

// Scoring & check

  • TypeScript scoring module
  • JS validation harness
  • Python frontmatter + export integrity validators

// Governance

  • Numbered defect log
  • Operator-bias test
  • Evidence-before-self-interest rule
  • Exclusion-not-devaluation rule

What a stronger version of this case would need

Five facts that are not in this write-up.

Stated plainly so the reader can tell the difference between what is and is not claimed here.

  1. Pre/post metric on operator ranking across the buyer-profile matrix (release 1 → current).
  2. A published usage figure (unique buyer queries served; return rate) attributable to the portal without triangulation.
  3. Outcome signal from the operator side: did the portal measurably change inbound lead quality or shorten a sales cycle?
  4. A second independent reviewer's sign-off on the scoring harness — currently the harness is run by the engineer who wrote it.
  5. An explicit written statement from the client that the redaction here is accurate and complete — currently inferred from the engagement context.

Next step

A similar credibility problem in your system?

Operator bias, a field-level confidence taxonomy, and exclusion-not-devaluation are not domain-specific. If your ranking, routing, or recommendation system sits inside a structural conflict — self-listed marketplaces, internally scored workflows, AI-assisted decisions with an accountable owner — the same architectural moves apply.