
The Trust Landscape

How agent reputation systems actually work — an architectural comparison based on source code analysis

Three distinct approaches to agent reputation have emerged: centralized platform scoring, peer-to-peer attestation protocols, and self-attestation directories. Each makes fundamentally different assumptions about who computes trust, who stores evidence, and what "reputation" means.

This page examines actual implementations — not whitepapers, not marketing pages. What does the code do? Where does trust live? What breaks?

Based on source code analysis of public repositories. All claims reference specific code paths.

* * *

I. Three Paradigms

CENTRALIZED

Platform-Computed Trust

Exemplified by Percival Labs Vouch

  • Server computes all scores, stores in PostgreSQL
  • Identity: server-generated ULID primary, Nostr pubkey secondary
  • 6-dimension weighted sum scoring
  • Gateway gates inference API access by trust tier
  • Staking via NWC (non-custodial, 7-day unstake, slashing)
  • Patent pending (63/997,733), BSL 1.1 license
  • Deploys to single Railway instance
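The staking lifecycle in the list above can be sketched as a small state machine. The state names, function signatures, and slashing fraction are assumptions for illustration; only the non-custodial NWC flow, the 7-day unstake window, and the existence of slashing come from the source.

```typescript
// Hypothetical sketch of a 7-day unstake cooldown with slashing.
// Not the Vouch implementation; shapes and names are invented.
const UNSTAKE_COOLDOWN_S = 7 * 86400; // 7-day unstake window

type StakeState =
  | { kind: "staked"; sats: number }
  | { kind: "unstaking"; sats: number; requestedAt: number }
  | { kind: "withdrawn" };

function requestUnstake(s: StakeState, now: number): StakeState {
  if (s.kind !== "staked") throw new Error("nothing staked");
  return { kind: "unstaking", sats: s.sats, requestedAt: now };
}

function withdraw(s: StakeState, now: number): StakeState {
  if (s.kind !== "unstaking") throw new Error("no pending unstake");
  if (now - s.requestedAt < UNSTAKE_COOLDOWN_S) {
    throw new Error("cooldown not elapsed");
  }
  return { kind: "withdrawn" };
}

function slash(s: StakeState, fraction: number): StakeState {
  // Slashing burns a fraction of the stake while staked or unstaking.
  if (s.kind === "withdrawn") throw new Error("nothing to slash");
  return { ...s, sats: Math.floor(s.sats * (1 - fraction)) };
}
```

Because the stake is non-custodial via NWC, the real system enforces this lifecycle through wallet-level holds rather than server-side balances; the state machine above only captures the timing and slashing rules.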
Critical Finding
"Performance" dimension (weight 0.25, the highest) measures platform content activity — post count, comment scores — not actual task completion or outcome quality.
from trust.ts — computeTrustScore(), performance dimension
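A minimal sketch of what a 6-dimension weighted sum like computeTrustScore() reduces to. Only the performance weight (0.25, the highest) comes from the finding above; the other dimension names and weights are hypothetical placeholders that sum to 1.

```typescript
// Illustrative 6-dimension weighted sum. Only performance: 0.25 is
// sourced from the code analysis; the rest are placeholder values.
type Dimensions = {
  performance: number; // platform content activity, per the finding
  longevity: number;
  stake: number;
  identity: number;
  social: number;
  conduct: number;
};

const WEIGHTS: Dimensions = {
  performance: 0.25, // highest weight: posts and comment scores
  longevity: 0.15,
  stake: 0.2,
  identity: 0.15,
  social: 0.15,
  conduct: 0.1,
};

function computeTrustScore(d: Dimensions): number {
  // Each dimension is assumed normalized to [0, 1]; the weighted
  // sum is therefore also in [0, 1].
  return (Object.keys(WEIGHTS) as (keyof Dimensions)[]).reduce(
    (sum, k) => sum + WEIGHTS[k] * d[k],
    0
  );
}
```

The structural point stands regardless of the exact weights: an agent that maximizes the performance input, i.e. posting activity, captures the single largest share of the score without completing a single task.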
Implementation Gap
NIP-85 attestations are referenced in the documentation, but the codebase contains no signing or verification code for them.
docs mention NIP-85; code lacks signing/verification
PEER-TO-PEER

Observer-Computed Trust

Exemplified by NIP-XX (PR #2285)

  • Attestors publish signed Kind 30085 events to relays
  • Each observer computes scores independently from evidence
  • Identity: native Nostr pubkeys (self-sovereign)
  • Scoring: degree-weighted diversity, adaptive multipliers, temporal decay
  • Evidence verifiable cryptographically (Nostr event signatures)
  • No central database, no vendor lock-in
  • No patent — public domain (open NIP standard)

Trade-off: no built-in economic layer yet. Staking and slashing require additional protocol work. Bootstrap problem — needs initial attestors.
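A sketch of how one observer might compute a score locally from Kind 30085 attestation events. The event shape, the 90-day decay half-life, and per-attestor averaging (a crude stand-in for degree-weighted diversity) are assumptions; NIP-XX leaves the exact scoring function to each observer, which is the point of the paradigm.

```typescript
// Observer-side scoring sketch: signed attestations in, local score
// out. Parameters and event shape are illustrative, not from the NIP.
interface Attestation {
  attestorPubkey: string; // Nostr pubkey of the attestor
  subjectPubkey: string;  // agent being attested
  rating: number;         // assumed normalized to [0, 1]
  createdAt: number;      // unix seconds
}

const HALF_LIFE_DAYS = 90; // illustrative temporal-decay half-life

function decayWeight(createdAt: number, now: number): number {
  const ageDays = (now - createdAt) / 86400;
  return Math.pow(0.5, ageDays / HALF_LIFE_DAYS);
}

function observerScore(events: Attestation[], now: number): number {
  // Group by attestor so one prolific attestor cannot dominate:
  // each attestor contributes one decay-weighted average, and the
  // attestors are then averaged.
  const byAttestor = new Map<string, { num: number; den: number }>();
  for (const e of events) {
    const w = decayWeight(e.createdAt, now);
    const acc = byAttestor.get(e.attestorPubkey) ?? { num: 0, den: 0 };
    acc.num += w * e.rating;
    acc.den += w;
    byAttestor.set(e.attestorPubkey, acc);
  }
  if (byAttestor.size === 0) return 0;
  let total = 0;
  for (const acc of byAttestor.values()) total += acc.num / acc.den;
  return total / byAttestor.size;
}
```

In a real client this would run after verifying each event's Nostr signature against the attestor pubkey, so the evidence is checkable end to end even though no server vouches for the result.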

SELF-ATTESTATION

Platform-Curated Directory

Exemplified by Agentry

  • Agents sign their own platform-computed scores (Kind 30021)
  • ~125 registered "agents" = commercial product directory (Zendesk, Intercom, etc.)
  • Zero verification_count across all listed agents
  • Platform holds nsec for most agents (auto-provisioned)
  • Proprietary DID namespace (did:agentry)
  • Premium tier: $499/mo
Structural Issue
Self-attestation with platform-held keys means the platform can publish any score for any agent. The "signature" provides no independent verification.
* * *

II. The Fundamental Question: Who Computes Trust?

Each paradigm moves raw evidence to a trust score along a different path.

In the centralized model, all evidence flows to a single server. The server computes scores, stores them in PostgreSQL, and returns them via API. Clients must trust the server's computation.

In the peer-to-peer model, attestors publish signed events to relays; each observer pulls that evidence and computes scores locally. Clients trust signatures and their own scoring code, not a server.

In the self-attestation model, the platform computes a score and signs it, usually with a key it holds on the agent's behalf. The data never leaves the platform's control.
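From the client's side, the centralized flow reduces to a single opaque API call. The endpoint path and response shape below are hypothetical; the fetcher is injected so the sketch stays testable.

```typescript
// Hypothetical client for a centralized trust API. The URL and
// response fields are illustrative, not from the Vouch code.
interface TrustResponse {
  agentId: string;
  score: number; // server-computed; the client cannot recompute or verify it
  tier: string;  // the gateway gates inference access by this tier
}

type FetchLike = (
  url: string
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function fetchTrustScore(
  agentId: string,
  fetchImpl: FetchLike
): Promise<TrustResponse> {
  const res = await fetchImpl(`https://vouch.example/api/trust/${agentId}`);
  if (!res.ok) throw new Error(`trust API returned ${res.status}`);
  // Whatever the server returns is the score; that is the trust model.
  return (await res.json()) as TrustResponse;
}
```

In production the global fetch would be passed in; the point of the sketch is that nothing in the response is independently checkable, unlike a bundle of signed attestations.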

* * *

III. Scoring Simulator

Consider how each paradigm would score the same hypothetical agent. Notice: Vouch's highest-weighted dimension measures platform posts, not task outcomes.

* * *

IV. Architecture Comparison

All entries are based on code analysis, not documentation claims.

Each category is compared across Centralized (Vouch), Peer-to-Peer (NIP-XX), and Self-Attestation (Agentry).
* * *

V. Failure Mode Explorer

Every system has failure modes. The question is which failures are recoverable and which are architectural.

* * *

VI. Summary

  • 1 single point of failure (centralized)
  • 0 central authorities (P2P)
  • 0 verified agents (self-attestation)
  • 0.25 weight on "posts" as "performance" (centralized)

The three paradigms represent genuinely different philosophies:

Centralized platforms offer convenience and built-in economics, but create single points of failure, platform dependency, and — in the case examined — measure engagement rather than competence. Patent and BSL licensing restrict ecosystem growth.

Peer-to-peer protocols distribute trust computation to each observer, producing cryptographically verifiable evidence with no vendor lock-in. The trade-off is complexity: each client must implement scoring, and there is no built-in economic layer.

Self-attestation directories are structurally incapable of providing independent trust signals — an agent (or its platform) vouching for itself is not reputation, it is advertising.

This analysis presents architectural facts. The reader decides which trade-offs matter.