draft optional
This NIP defines a parameterized replaceable event kind for publishing reputation attestations about Nostr agents. Attestations encode a structured rating, domain context, confidence level, and optional evidence. Clients compute reputation scores locally from their own relay set using a two-tier algorithm: Tier 1 (weighted average with temporal decay) and Tier 2 (graph diversity metric). No global reputation score exists. Different observers MAY compute different scores for the same subject.
As autonomous agents proliferate on Nostr — bots, AI assistants, automated service providers — users and other agents need a decentralized mechanism to assess trustworthiness. Existing NIPs provide labeling (NIP-32) and reporting (NIP-56), but neither specifies a structured reputation attestation format with scoring algorithms, temporal decay, or sybil resistance.
This NIP addresses three gaps:
This NIP defines kind 30085 as a parameterized replaceable event for reputation attestations. Being in the 30000-39999 range, these events are addressable by their kind, pubkey, and d tag value. For each combination, only the latest event is stored by relays.
The d tag MUST be set to the subject's pubkey concatenated with the context domain, separated by a colon:
["d", "<subject-pubkey>:<context>"]
This ensures one attestation per attestor, per subject, per context domain. Updating an attestation replaces the previous one.
{
// other fields...
"kind": 30085,
"pubkey": "<attestor-pubkey>",
"created_at": <unix-timestamp>,
"tags": [
["d", "<subject-pubkey>:<context>"],
["p", "<subject-pubkey>", "<relay-hint>"],
["t", "<context>"],
["expiration", "<unix-timestamp>"]
],
"content": "<JSON-stringified attestation object>"
}
The content field MUST be a JSON-stringified object with the following structure:
{
"subject": "<32-byte hex pubkey of agent being attested>",
"rating": 4,
"context": "payment.reliability",
"confidence": 0.85,
"evidence": "Completed 12 task delegations without failure over 30 days"
}
| Field | Type | Required | Description |
|---|---|---|---|
| subject | string | YES | 32-byte lowercase hex pubkey of the agent being attested. |
| rating | integer | YES | Rating on a 1-5 scale. See rating semantics below. |
| context | string | YES | Domain of attestation. Non-empty string following namespace convention (e.g., payment.reliability, output.accuracy). |
| confidence | float | YES | Attestor's confidence in their rating, 0.0-1.0 inclusive. |
| evidence | string | NO | JSON array of typed evidence objects (see Structured Evidence below), or a plain string for backward compatibility. |
The evidence field SHOULD contain a JSON-stringified array of typed evidence objects. Each object has a type and data field. Clients SHOULD ignore unknown evidence types gracefully to allow extensibility.
Defined evidence types:
| Type | Description |
|---|---|
| lightning_preimage | Lightning payment preimage proving payment completion. |
| dvm_job_id | Reference to a DVM (Data Vending Machine) job ID. |
| nip90_result_hash | SHA-256 hash of the DVM (NIP-90) result payload, proving the attestor received and can reference the actual work output. Bridges the gap between proving work was requested (dvm_job_id) and proving work was delivered and evaluated. |
| nostr_event_ref | Reference to a Nostr event ID (hex) as supporting evidence. |
| free_text | Human-readable free-text description. |
Example:
"evidence": "[{\"type\": \"dvm_job_id\", \"data\": \"abc123\"}, {\"type\": \"nip90_result_hash\", \"data\": \"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855\"}, {\"type\": \"free_text\", \"data\": \"Completed translation job accurately\"}]"
Types are extensible. New types MAY be defined by clients without requiring a NIP update. Clients MUST NOT reject attestations containing unknown evidence types.
Evidence types vary in their Sybil resistance: the cost an attacker must incur to fabricate a false attestation. This section formalizes that variation as commitment classes, a property of the evidence accompanying an attestation, not the attestation itself.
| Class | Evidence Examples | Sybil Cost | Description |
|---|---|---|---|
| Self-assertion | free_text | Near zero | Attestor claims something happened. No external verification possible. Unlimited generation at negligible cost. |
| Reference | nostr_event_ref, dvm_job_id | Low | Attestor references a verifiable Nostr event or job. Fabrication requires creating the referenced event, but this is free on Nostr. |
| Computational proof | nip90_result_hash | Medium | Attestor proves they received and can reference a specific work output. Fabrication requires performing or simulating the computation. |
| Economic settlement | lightning_preimage | High | Attestor proves a Lightning payment occurred. Fabrication requires spending real sats. Cost is bounded below by the payment amount. |
| Staked commitment | (reserved) | Very high | Attestor locks funds that can be slashed for misbehavior. Reserved for future payment channel or DLC-based mechanisms. |
Scoring implications: Clients implementing Tier 1 scoring SHOULD apply commitment-class multipliers to the base confidence value. Recommended multipliers:
| Class | Multiplier | Rationale |
|---|---|---|
| Self-assertion | 1.0× | Baseline. No adjustment. |
| Reference | 1.0× | References are easy to create; no premium. |
| Computational proof | 1.1× | Modest premium for demonstrated work evaluation. |
| Economic settlement | 1.2× | Significant premium for cryptographic payment proof. |
| Staked commitment | 1.3× | Highest premium for funds-at-risk. (Reserved.) |
Multipliers are applied BEFORE capping at confidence 1.0. Multiple evidence types in the same attestation use the highest applicable class — they do not stack.
Determining commitment class: Clients inspect the evidence array and assign the attestation's commitment class based on the highest-class evidence type present. Unknown evidence types default to the Self-assertion class.
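The class lookup and multiplier step could be sketched as follows. This is a minimal illustration, not the reference implementation: the class names, the `EVIDENCE_CLASS` and `MULTIPLIER` tables, and both function names are hypothetical, and the Staked commitment class is omitted because it has no evidence type defined yet.

```python
import json

# Evidence type -> (rank, class name). Ranks follow the commitment-class table;
# the identifiers themselves are illustrative, not defined by the spec.
EVIDENCE_CLASS = {
    "free_text": (0, "self-assertion"),
    "nostr_event_ref": (1, "reference"),
    "dvm_job_id": (1, "reference"),
    "nip90_result_hash": (2, "computational-proof"),
    "lightning_preimage": (3, "economic-settlement"),
}
# Recommended multipliers from the table above.
MULTIPLIER = {
    "self-assertion": 1.0,
    "reference": 1.0,
    "computational-proof": 1.1,
    "economic-settlement": 1.2,
}
DEFAULT = (0, "self-assertion")


def commitment_class(evidence):
    """Return the highest commitment class present in an evidence field."""
    try:
        items = json.loads(evidence) if isinstance(evidence, str) else evidence
    except json.JSONDecodeError:
        items = None  # plain-string evidence (backward-compatible form)
    if not isinstance(items, list):
        return DEFAULT[1]
    best = DEFAULT
    for item in items:
        kind = item.get("type") if isinstance(item, dict) else None
        # Unknown or malformed types default to the Self-assertion class.
        found = EVIDENCE_CLASS.get(kind, DEFAULT) if isinstance(kind, str) else DEFAULT
        best = max(best, found)  # highest class wins; classes do not stack
    return best[1]


def adjusted_confidence(confidence, evidence):
    """Apply the class multiplier first, then cap the result at 1.0."""
    return min(1.0, confidence * MULTIPLIER[commitment_class(evidence)])
```

For instance, an attestation with confidence 0.85 and a lightning_preimage entry is scaled to 1.02 and then capped at 1.0, while the same confidence with only free-text evidence stays at 0.85.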
Design note: Commitment classes formalize an insight from cross-protocol analysis: settlement-anchored attestations (kind 38403 with payment proof) carry fundamentally different trust weight than social-only attestations. Rather than hard-coding this for one protocol, commitment classes provide a general framework that any future economic proof mechanism can plug into.
The commitment class hierarchy instantiates Zahavi’s handicap principle (1975) in protocol design. Honest signaling emerges as a stable equilibrium when signal cost is differentially higher for dishonest signalers — Grafen’s single-crossing condition (1990).
Consider two agent types: legitimate (L) and Sybil (S). For each commitment class c, define:
The likelihood ratio LR(c) = P(c|L) / P(c|S) measures how informative a signal of class c is about agent legitimacy.
| Class | P(c\|S) estimate | LR(c) | ln(LR) |
|---|---|---|---|
| Self-assertion | ~1.0 | 1.0 | 0 |
| Reference | ~0.95 | 1.05 | 0.05 |
| Computational proof | ~0.3 | 3.3 | 1.2 |
| Economic settlement | ~0.05–0.1 | 10–20 | 2.3–3.0 |
| Staked commitment | ~0.01 | 100 | 4.6 |
The optimal Bayesian weight adjustment for aggregation scoring is the log-scaled likelihood ratio:
m(c) = 1 + α · ln(LR(c))
where α ∈ (0, 1) is a conservatism parameter. The log-scaling prevents high-LR classes from dominating the score while preserving the monotonic ordering that the single-crossing condition guarantees.
Bounds on α: For bounded amplification m(c) ≤ m_max with LR_max ≈ 100: α ≤ (m_max − 1) / ln(LR_max). With m_max = 1.5: α ≤ 0.109.
Recommended default: α = 0.065, which yields:
| Class | Derived multiplier | Recommended (rounded) |
|---|---|---|
| Self-assertion | 1.000 | 1.0× |
| Reference | 1.003 | 1.0× |
| Computational proof | 1.078 | 1.1× |
| Economic settlement | 1.150–1.195 | 1.2× |
| Staked commitment | 1.299 | 1.3× |
The recommended multipliers (1.0×–1.3×) are derived from the log-likelihood ratio formula m(c) = 1 + α·ln(LR(c)) with conservatism parameter α = 0.065. The values are rounded for simplicity. Implementers with domain-specific cost data MAY estimate P(c|S) for their context and derive adjusted multipliers, provided the monotonic ordering is preserved.
The derivation provides three benefits: (1) the multiplier ordering is guaranteed by the single-crossing condition, not chosen ad hoc; (2) implementers with domain-specific cost data can estimate P(c|S) for their context and derive adjusted multipliers; (3) the conservatism parameter α is explicit and tunable.
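The derived multipliers can be reproduced with a few lines of Python. This sketch assumes P(c|L) = 1.0, which is what the LR column implies but is not stated explicitly; the function and dictionary names are illustrative.

```python
import math

ALPHA = 0.065  # recommended conservatism parameter

# P(c|S) estimates from the table above. For Economic settlement the lower
# end of the 0.05-0.1 range is used here.
P_SYBIL = {
    "self-assertion": 1.0,
    "reference": 0.95,
    "computational-proof": 0.3,
    "economic-settlement": 0.05,
    "staked-commitment": 0.01,
}


def multiplier(p_sybil, alpha=ALPHA, p_legit=1.0):
    """m(c) = 1 + alpha * ln(LR(c)), with LR(c) = P(c|L) / P(c|S)."""
    return 1.0 + alpha * math.log(p_legit / p_sybil)


def alpha_upper_bound(m_max, lr_max):
    """Largest alpha that keeps m(c) <= m_max when LR(c) <= lr_max."""
    return (m_max - 1.0) / math.log(lr_max)


if __name__ == "__main__":
    for name, p in P_SYBIL.items():
        print(f"{name}: {multiplier(p):.3f}")
    print(f"alpha bound for m_max=1.5: {alpha_upper_bound(1.5, 100):.3f}")
```

An implementer with domain-specific estimates would replace the values in `P_SYBIL` and check that the resulting multipliers remain monotonically ordered.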
See also: Donath (2007, “Signals in Social Supernets”) for the conceptual bridge between biological signaling and online identity; Spence (1973) for the economic signaling analogue.
| Rating | Meaning | Classification |
|---|---|---|
| 1 | Actively harmful, deceptive, or malicious | Negative |
| 2 | Unreliable, frequently fails or misleads | Negative |
| 3 | Neutral, insufficient basis for judgment | Neutral |
| 4 | Reliable, generally trustworthy | Positive |
| 5 | Highly trustworthy, consistent track record | Positive |
Negative attestations (ratings 1-2) serve the role of rejection signals. A separate negative attestation mechanism is unnecessary — the rating scale encodes valence directly. This simplifies the protocol while preserving the rejection capability required for convergent inference (see Convergence Properties).
The context field uses a dot-separated namespace convention. No fixed enumeration — domains emerge from usage.
Core domains (RECOMMENDED as starting vocabulary):
| Context | Description |
|---|---|
| reliability | Does the agent complete tasks as promised? |
| accuracy | Is the agent's output correct and truthful? |
| responsiveness | Does the agent respond in a timely manner? |
Extended domains use a hierarchical prefix, written with a slash in the examples below (convention, not enforced):
| Context | Description |
|---|---|
| task/code-review | Code review quality |
| task/translation | Translation accuracy and fluency |
| task/payment-routing | Payment routing reliability |
| task/data-extraction | Data extraction completeness |
Clients SHOULD normalize context strings to lowercase. New domains MAY be introduced by any attestor without protocol changes.
Attestations MAY include a task-type tag that categorizes the specific work performed:
["task-type", "task/code-review", "requester-confirmed"]
The third element indicates confirmation status:
| Status | Meaning |
|---|---|
| attestor-proposed | Attestor suggested this categorization. Provisional. |
| requester-confirmed | Requester validated the categorization. Canonical. |
Mechanism: The attestor proposes a task type when publishing the attestation. If the requester (the entity who requested the work) publishes their own attestation for the same interaction, they either confirm or override the type.
Scoring implications: Unconfirmed (attestor-proposed) task-type tags SHOULD decay at 2x the normal rate for their domain class. This makes provisional claims expire faster, incentivizing confirmation.
Convention emergence: Once a task-type accumulates sufficient requester confirmations across the network, it becomes a de facto standard. No registry or governance is needed — categories that describe real work persist; categories that don't, decay away.
| Tag | Required | Description |
|---|---|---|
| d | MUST | Parameterized replaceable event identifier. Format: <subject-pubkey>:<context> |
| p | MUST | Subject's pubkey. Enables querying all attestations for a given agent via {"#p": [...]} filters. |
| t | MUST | Context category. Enables querying attestations by domain via {"#t": [...]} filters. |
| expiration | MUST | Unix timestamp after which this attestation SHOULD be considered expired. Relays MAY discard expired events per NIP-40. |
| v | SHOULD | Schema version. Current: 2. Events with v=1 remain valid and are processed with backward-compatible defaults. |
| task-type | MAY | Task category with confirmation status. Format: ["task-type", "<type>", "<status>"] where status is attestor-proposed or requester-confirmed. See Task-Type Tags above. |
The expiration tag is REQUIRED, not optional. This is a deliberate design choice addressing the temporal decay gap identified in attack scenario analysis. Attestations without expiration tags MUST be rejected by compliant clients.

{
// other fields...
"kind": 30085,
"pubkey": "a1b2c3...attestor",
"created_at": 1711152000,
"tags": [
["d", "d4e5f6...subject:payment.reliability"],
["p", "d4e5f6...subject", "wss://relay.example.com"],
["t", "payment.reliability"],
["expiration", "1718928000"]
],
"content": "{\"subject\":\"d4e5f6...subject\",\"rating\":4,\"context\":\"payment.reliability\",\"confidence\":0.85,\"evidence\":\"Completed 12 task delegations without failure over 30 days\"}"
}
Clients MUST validate attestation events according to the following rules:
1. The event kind MUST be 30085.
2. The content field MUST parse as valid JSON containing all required fields.
3. The subject field in content MUST match the p tag value.
4. The context field in content MUST be a non-empty string following namespace convention and MUST match the t tag value.
5. The d tag MUST equal <p-tag-value>:<t-tag-value>.
6. rating MUST be an integer in [1, 5].
7. confidence MUST be a number in [0.0, 1.0].
8. The expiration tag MUST be present. Events without it MUST be discarded.
9. Self-attestations (pubkey == subject) MUST be discarded.

Clients compute reputation scores locally. Two tiers are defined. Clients MUST implement Tier 1. Clients MAY implement Tier 2.
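The validation rules listed above (kind, content, tag consistency, value ranges, expiration, self-attestation) could be checked with a single predicate. This is a sketch under the assumption that events arrive as plain dictionaries; `get_tag` and `is_valid_attestation` are illustrative names, and signature verification is out of scope here.

```python
import json


def get_tag(event, name):
    """Return the first tag whose name matches, or None."""
    for tag in event.get("tags", []):
        if tag and tag[0] == name:
            return tag
    return None


def is_valid_attestation(event):
    """Return True only if the event passes every validation rule."""
    if event.get("kind") != 30085:
        return False
    try:
        att = json.loads(event["content"])
    except (KeyError, TypeError, ValueError):
        return False
    if not isinstance(att, dict):
        return False
    p, t, d = (get_tag(event, n) for n in ("p", "t", "d"))
    if not all(tag and len(tag) >= 2 for tag in (p, t, d)):
        return False
    if get_tag(event, "expiration") is None:
        return False
    subject, context = att.get("subject"), att.get("context")
    rating, confidence = att.get("rating"), att.get("confidence")
    if subject != p[1]:
        return False
    if not isinstance(context, str) or not context or context != t[1]:
        return False
    if d[1] != f"{p[1]}:{t[1]}":
        return False
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(rating, bool) or not isinstance(rating, int):
        return False
    if not 1 <= rating <= 5:
        return False
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return False
    if not 0.0 <= confidence <= 1.0:
        return False
    return event.get("pubkey") != subject  # self-attestations are discarded
```

A scorer would typically run this predicate before any weighting, so that malformed or self-issued events never reach the Tier 1 sum.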
All scoring uses a temporal decay function applied to each attestation based on its age. The recommended half-life is 90 days (7,776,000 seconds).
decay(t) = 2^(-(now - created_at) / half_life)
An attestation created 90 days ago has weight 0.5. At 180 days, weight 0.25. Clients SHOULD use a half-life between 30 and 180 days. The default SHOULD be 90 days.
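As a quick check of the decay formula, the following sketch reproduces the 0.5 and 0.25 weights quoted above. Clamping negative ages to zero is an assumption of this sketch (the text does not say how future-dated events are treated), and the names are illustrative.

```python
DAY = 86400
DEFAULT_HALF_LIFE = 90 * DAY  # 7,776,000 seconds


def decay(created_at, now, half_life=DEFAULT_HALF_LIFE):
    """Exponential decay weight: 1.0 at age zero, 0.5 after one half-life."""
    # Assumption: an event dated in the future is treated as age zero.
    age = max(0, now - created_at)
    return 2 ** (-age / half_life)


if __name__ == "__main__":
    print(decay(0, 90 * DAY))   # one half-life
    print(decay(0, 180 * DAY))  # two half-lives
```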
Different attestation domains degrade at different rates. Skill-based competence drifts slowly; operational reliability changes fast. A single decay constant compresses two orthogonal degradation processes.
Three decay classes are defined:
| Class | Half-life | Example domains | Rationale |
|---|---|---|---|
| slow | 180 days | task/code-review, task/translation, skill-based domains | Competence drifts gradually |
| standard | 90 days | reliability, accuracy, general domains | Default for unclassified contexts |
| fast | 30 days | task/payment-routing, responsiveness, operational domains | Performance depends on current network/system state |
Clients MUST maintain a namespace-to-class mapping. When a namespace is not in the mapping, the standard class (90-day half-life) MUST be used as fallback. The mapping is observer-configurable — different clients MAY classify the same namespace differently.
The decay function becomes:
half_life = half_life_for(context)  // 180d, 90d, or 30d
decay(t) = 2^(-(now - created_at) / half_life)
This is backward-compatible: clients that ignore decay classes use the 90-day default, which was the previous single value.
Tier 1 and Tier 2 scores are computed per-namespace. An agent can have high payment.reliability and undefined output.accuracy. Scores never collapse across namespaces.
For a subject S in context C, collect all valid, non-expired attestation events matching {"#p": [S], "#t": [C], "kinds": [30085]}. Compute:
neg_multiplier(rating) = 2.0 if rating <= 2 else 1.0
weight_i = confidence_i * decay_i * neg_multiplier(rating_i)
score_T1 = sum(rating_i * weight_i) / sum(weight_i)
Result is a value in [1.0, 5.0]. If no valid attestations exist, the score is undefined (not zero). Clients MAY aggregate across a domain prefix (e.g., all payment.* namespaces) for summary display, but per-namespace scores are the canonical unit.
Asymmetric negative weighting: Negative attestations (rating <= 2) carry a 2x weight multiplier. This reflects the higher cost of producing negative signals (burning a relationship with the subject) and ensures that a small number of credible negative attestations can meaningfully counteract a larger volume of positive ones. The multiplier is capped at 2x to prevent reputation weaponization — a single negative attestation cannot dominate arbitrarily many positive ones.
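A small worked example makes the effect of the 2x multiplier concrete. The sketch below takes already-computed (rating, confidence, decay) triples, which is a simplification chosen for illustration; `tier1` is a hypothetical helper, not the reference function.

```python
def neg_multiplier(rating):
    """Negative attestations (rating <= 2) carry double weight."""
    return 2.0 if rating <= 2 else 1.0


def tier1(attestations):
    """Weighted average over (rating, confidence, decay) triples.

    Returns None when there is nothing to average (score undefined, not zero).
    """
    numerator = denominator = 0.0
    for rating, confidence, decay in attestations:
        weight = confidence * decay * neg_multiplier(rating)
        numerator += rating * weight
        denominator += weight
    return numerator / denominator if denominator else None


if __name__ == "__main__":
    # One rating of 5 against one rating of 1, equal confidence and age.
    print(tier1([(5, 1.0, 1.0), (1, 1.0, 1.0)]))
    # Three ratings of 5 against one rating of 1.
    print(tier1([(5, 1.0, 1.0)] * 3 + [(1, 1.0, 1.0)]))
```

With one rating of 5 and one of 1, the score is (5 + 2) / 3, roughly 2.33, rather than the unweighted 3.0. With three ratings of 5 against one rating of 1 it is 17 / 5 = 3.4, so a single negative attestation pulls the score down without dominating it.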
Tier 1.5 replaces self-reported confidence with computed attestor reliability using peer prediction — specifically the Determinant Mutual Information (DMI) mechanism. Self-reported confidence is exploitable (rational attestors always report 1.0). DMI provides dominant-strategy incentive-compatible scoring: truthful reporting maximizes expected payoff without requiring ground truth.
Prerequisites:
Tier 1.5 activates when sufficient data density exists. For each attestor pair (A, B) in context C, DMI requires at least 2c shared subjects (subjects both A and B have attested in context C), where c is the number of rating categories. For a 5-point scale, this means ≥10 shared subjects per pair.
Algorithm:
1. For each attestor pair (A, B) in context C, collect all subjects S that both A and B have attested.
2. If |S| < 2c (where c = 5), skip this pair: there is insufficient data for DMI. Fall back to raw confidence values.
3. Build the c × c joint distribution matrix M, where M[i][j] = fraction of shared subjects where A rated i and B rated j.
4. Compute det(M). The determinant factorizes through the strategy matrix:

   det(M) = det(Strategy_A) × det(Strategy_B) × det(TrueDistribution)

   If either attestor uses an uninformative strategy (constant ratings, random ratings, or any rank-deficient strategy), their strategy matrix has det = 0, making det(M) = 0. Only informative, truthful strategies produce positive determinant.
5. Aggregate the pairwise results into a score for each attestor A:

   dmi_score(A, C) = mean(det(M_AB) for all eligible pairs (A, B) in context C)
6. Normalize the scores within context C to [0.0, 1.0]:

   reliability(A, C) = dmi_score(A, C) / max(dmi_score(*, C))

   If max = 0 (no eligible pairs), all attestors fall back to raw confidence.
7. Replace confidence_i with reliability(attestor_i, C) when Tier 1.5 is active:

   weight_i = reliability_i * decay_i * neg_multiplier(rating_i) * burst_decay(attestor_i)
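The pairwise part of the algorithm (shared subjects, joint matrix, determinant) could look like the sketch below. It uses only the standard library and follows the single-matrix description given here; the function names and the dictionary input shape are assumptions, and ratings are assumed to be already validated to the range 1..c. A determinant can also come out negative for some rating patterns, and the text does not say how that case is handled, so the sketch returns the raw value and leaves that decision to the caller.

```python
def joint_matrix(ratings_a, ratings_b, c=5):
    """Joint distribution of two attestors' ratings over shared subjects.

    ratings_a and ratings_b map subject -> rating in 1..c.
    Returns None when fewer than 2*c subjects are shared (pair not eligible).
    """
    shared = sorted(set(ratings_a) & set(ratings_b))
    if len(shared) < 2 * c:
        return None
    m = [[0.0] * c for _ in range(c)]
    for subject in shared:
        m[ratings_a[subject] - 1][ratings_b[subject] - 1] += 1.0 / len(shared)
    return m


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n, result = len(a), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # rank-deficient: an uninformative strategy
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= factor * a[col][k]
    return result
```

Two attestors who agree on ten subjects spread evenly over the five ratings produce a diagonal matrix with determinant 0.2^5, while a pair in which one attestor always answers 3 produces a single non-zero column and a determinant of zero.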
Graceful degradation: In sparse networks (few shared subjects), Tier 1.5 is inactive and Tier 1 uses raw confidence. As network density grows and attestor pairs accumulate shared subjects, DMI activates automatically. This progression requires no configuration:
| Network state | Scoring |
|---|---|
| Sparse (< 10 shared subjects per pair) | Tier 1 with raw confidence |
| Moderate (≥ 10 shared subjects for some pairs) | Tier 1.5 for eligible pairs, raw confidence for others |
| Dense (≥ 10 shared subjects for most pairs) | Full Tier 1.5 replaces confidence |
Attestors active in the same context, such as payment.reliability, naturally accumulate the shared observations needed for DMI computation. The mechanism rewards attestors who provide genuinely informative ratings and penalizes those who rubber-stamp or randomize, without any authority deciding who is trustworthy.

Tier 2 measures structural independence among attestors. It penalizes concentrated attestation sources and rewards diverse, independent signals.
Algorithm:
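1. Collect all attestors for subject S in context C.
2. Build the graph of connections among those attestors (excluding S).
3. Let cluster_count = number of connected components. Let total_attestors = number of attestors.
4. Compute diversity = cluster_count / total_attestors.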
score_T2 = diversity * score_T1
When diversity = 1.0 (every attestor is in its own component, maximally independent), Tier 2 equals Tier 1. When diversity -> 0 (all attestors in one cluster), Tier 2 approaches zero regardless of ratings.
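The diversity ratio can be computed with a small union-find over the attestor set. In this sketch the edges between attestors are supplied by the caller; how those links are derived is an assumption left open here, and the function names are illustrative.

```python
def diversity(attestors, edges):
    """cluster_count / total_attestors, using union-find.

    attestors: iterable of attestor pubkeys for one subject and context.
    edges: iterable of (pubkey, pubkey) links between attestors. How links
           are derived is an assumption left to the caller.
    """
    parent = {a: a for a in attestors}
    if not parent:
        return None  # no attestors: diversity is undefined

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        if a in parent and b in parent:  # ignore links to non-attestors
            parent[find(a)] = find(b)
    clusters = {find(a) for a in parent}
    return len(clusters) / len(parent)


def tier2_score(score_t1, attestors, edges):
    """score_T2 = diversity * score_T1, or None if either input is undefined."""
    d = diversity(attestors, edges)
    return None if d is None or score_t1 is None else d * score_t1
```

With 100 attestors all linked to one hub, the function returns 0.01; with no links at all it returns 1.0 and the Tier 2 score equals the Tier 1 score.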
Example: if 100 attestors in a star topology form a single connected component, diversity = 1/100 = 0.01. Even with all ratings at 5 and confidence at 1.0, the Tier 2 score is 0.01 * 5.0 = 0.05. The star topology is structurally penalized.

To penalize attestors who publish many attestations in a short window (carpet-bombing), observers SHOULD apply a confidence decay factor per attestor based on their recent attestation velocity.
Parameters (configurable by observer):
| Parameter | Default | Description |
|---|---|---|
| window | 86400 (24h) | Sliding window in seconds. |
| threshold | 5 | Maximum attestations in the window before decay applies. |
Algorithm:
For each attestor A, count the number of kind 30085 events published by A within the sliding window ending at now. Let count = number of events in the window. If count > threshold:
burst_decay(A) = 1 / sqrt(count)
If count <= threshold, burst_decay(A) = 1.0 (no penalty).
The burst_decay factor is applied multiplicatively to each attestation's weight in the Tier 1 and Tier 2 scoring formulas:
weight_i = confidence_i * decay_i * neg_multiplier(rating_i) * burst_decay(attestor_i)
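The burst factor is simple to compute from an attestor's event timestamps. In this sketch the window is treated as the half-open interval (now - window, now], which is an assumption about the boundary; the function name and argument shape are illustrative.

```python
import math


def burst_decay(event_times, now, window=86400, threshold=5):
    """Velocity penalty for one attestor.

    event_times: created_at values of that attestor's kind 30085 events.
    """
    # Assumption: the window covers (now - window, now].
    count = sum(1 for t in event_times if now - window < t <= now)
    return 1.0 / math.sqrt(count) if count > threshold else 1.0
```

The returned factor multiplies every weight contributed by that attestor, so it is typically computed once per attestor per scoring run rather than once per attestation.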
Example: an attestor with 25 attestations in the window gets burst_decay = 1/sqrt(25) = 0.2. This penalizes carpet-bombing without blocking legitimate high-volume attestors who space their work across multiple windows. Observers compute this locally; no protocol-level enforcement is needed.

There is no global reputation score. Each client computes scores from the attestation events available on its own relay set. Two observers querying different relays MAY compute different scores for the same subject. This is by design, not a bug.
Clients SHOULD query at least 3 independent relays when computing reputation scores. Clients SHOULD document which relay set was used when presenting a score to users.
An observer parameterizes scoring through an observer_config object. Three parameters form a sequential pipeline:
1. Scope (namespace_filter): Which attestation namespaces the observer considers. Scoping by domain (e.g., only payment.*) determines which evidence enters computation. An empty filter means all namespaces.
2. Temporal (gamma_lambda): Temporal decay rate controlling how quickly old attestations lose influence. Higher values increase recency sensitivity. Domain-dependent decay (Section 5.2) provides per-namespace overrides within this axis.
3. Baseline (R_0): Baseline reference for calibration. The observer's prior expectation for attestation activity rate, against which current flow is measured. RECOMMENDED value: EMA_30(R_e) where R_e is the count of distinct active days in the attestation set.

Pipeline: filter → weight → normalize
An observer (1) filters the attestation set by namespace, (2) weights each remaining attestation by temporal decay and confidence, and (3) normalizes the weighted result against a baseline. Each stage consumes the output of the previous stage. Stages are not commutative — filtering before weighting excludes irrelevant attestations from decay computation; the reverse would waste computation on attestations discarded later.
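The first pipeline stage could be sketched as below. It assumes the namespace filter is a list of shell-style patterns such as payment.*, matched with the standard library's `fnmatch.fnmatchcase`; the pattern syntax and both function names are assumptions of this sketch. The weighting and normalization stages are left out because the text does not fix a normalization formula.

```python
from fnmatch import fnmatchcase


def in_scope(context, namespace_filter):
    """True if the context passes the observer's filter (empty = all)."""
    if not namespace_filter:
        return True
    return any(fnmatchcase(context, pattern) for pattern in namespace_filter)


def filter_stage(attestations, namespace_filter):
    """Stage 1 of the pipeline: keep only attestations that are in scope.

    attestations: list of parsed content objects, each with a "context" key.
    """
    return [a for a in attestations if in_scope(a["context"], namespace_filter)]
```

Because this stage depends only on the context string, changing the decay rate or the baseline does not change its output, which matches the axis-independence property described in this section.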
The three parameters are designed to be independently configurable. Changing the namespace filter SHOULD NOT change the temporal decay computation. Changing the decay rate SHOULD NOT change which namespaces are included. This axis independence is a structural property, not an implementation detail — it means test vectors can verify each axis in isolation.
Four independent systems converge on this same three-axis decomposition:
| System | Temporal | Baseline | Scope |
|---|---|---|---|
| NIP-XX | gamma_lambda | R_0 | namespace_filter |
| PDR | decay half-life | baseline session | task-type filter |
| arf-spec | measurement window | baseline session | (implicit) |
| ATSC | time window | trend slope | (implicit) |
When four independent designs converge on the same decomposition, the structure is in the problem domain, not any particular solution. Alternative decompositions bear the burden of explaining why this convergence is accidental.
Two observers with different observer_config objects WILL compute different scores for the same subject. This is intentional — the config encodes the observer's epistemic position, not a global truth. Divergence is bounded by parameter distance: same filter and decay with different R_0 produces the smallest divergence; different namespace filters produce the largest (disjoint evidence sets).
For example, a payment-focused observer cares about payment.reliability with fast decay; a content curator cares about output.accuracy with slow decay. The same attestation graph serves both without compromise.

The attestation protocol is designed to satisfy the conditions for convergent decentralized inference, as described by the Collective Predictive Coding framework. Attestation is a naming game: an attestor "names" an agent as trustworthy (or not). Convergence to accurate shared beliefs requires:
The expiration tag and decay function ensure the posterior is continuously updated. Stale observations are automatically discounted.

When these three conditions hold, the acceptance probability for attestations follows the Metropolis-Hastings criterion: the community's collective attestation behavior converges toward accurate shared beliefs about agent trustworthiness, as if all observers were performing coordinated Bayesian inference, without any central coordinator.
Eight subsections address the protocol's attack surface, limitations, and privacy properties.
Alpha (the Tier 2 score) is a cost function, not an oracle. It operates at three layers:
Analogy: PageRank measures link cost, not page truth. Alpha measures attestation cost, not agent quality. Clients MUST NOT present alpha as "trust score" without this caveat.
Sockpuppet flood: N fake identities attest to a malicious agent. Star topology produces near-zero diversity score — Tier 2 catches this directly.
Cluster collusion: K real agents in a tight cluster falsely vouch for a target. Low diversity, but indistinguishable from legitimate community endorsement. Require attestations from multiple independent clusters for high-trust status.
Sybil bridge: Fake nodes bridge real clusters, simulating structural diversity. Bridge nodes must have verifiable bilateral interactions, not just graph presence.
Cost quantification: Creating K Sybil identities with independent channel capacity requires K * threshold_sats locked capital. With threshold_sats = max(FLOOR, median_channel_cap * K_factor) where FLOOR = 100000 sats, the minimum Sybil cost is 100K * K sats.
Ring bypass: 3+ node cycles can evade star-topology detection. Mitigation: all path diversity computation uses only simple paths — no vertex revisits.
Fix: per-attestor economic budget. For each channel key, aggregate ALL outbound attestations referencing that channel. The effective capacity per attestation = total_channel_capacity / count_of_attestations_referencing_that_channel.
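The budget rule amounts to dividing each channel's capacity by the number of attestations that reference it. The sketch below assumes attestations are already grouped by channel key; the input shapes and the function name are hypothetical.

```python
from collections import Counter


def effective_capacities(attestations, channel_capacity):
    """Split each channel's capacity across the attestations that cite it.

    attestations: list of (attestation_id, channel_key) pairs.
    channel_capacity: dict mapping channel_key -> total capacity in sats.
    """
    counts = Counter(channel for _, channel in attestations)
    return {
        att_id: channel_capacity[channel] / counts[channel]
        for att_id, channel in attestations
    }
```

A 1,000,000 sat channel cited by four attestations therefore backs each one with 250,000 sats, so reusing one channel across many attestations dilutes rather than multiplies its weight.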
This requires:
- A binding between the attestor's Nostr identity and the channel key (ln_node_sig over H(nostr_pubkey || funding_txid:vout)), not just npub.

Matthew Effect: High-alpha nodes attract more attestations, increasing their alpha further. The R_e dual-coupling (decay rate depends on attestation rate which depends on reputation) creates positive feedback from individually fair components. Mitigation: lambda gets rectified consolidation term, threshold_eff gets exponential bidirectional damping.
Reputation ossification: Rational neglect of newcomers — why attest to unknowns when attesting to known-good agents is cheaper? Mitigation: cold-start bootstrapping via log compression (c = 0.685 at 100k sats means newcomers get meaningful alpha from first interaction).
Collusion rings: Groups exchanging valid Lightning preimages to create bilateral attestations without real service delivery. Path diversity is the primary defense — colluding rings have low vertex-disjoint path count to the external graph.
Fee-rate dependency: Attestation validity depends on channel existence, which depends on fee-rate economics. During fee spikes, channels close, invalidating attestations. Mitigation: closed-UTXO degradation to c_bootstrap rather than instant invalidation.
Three-tier economic evidence resolves the tension between privacy and verifiability:
| Tier | Evidence | Weight | Privacy cost |
|---|---|---|---|
| Public channel | funding_txid visible in BOLT 7 gossip | Full weight | None — already public data |
| Committed UTXO | Schnorr proof of UTXO ownership without revealing txid | Reduced weight | Minimal — proves capacity without linking |
| No evidence | Attestation without economic proof | Zero economic weight (c = 0) | None — alpha limited to path diversity only |
Clients SHOULD accept all three tiers. The weight reduction for committed UTXOs reflects reduced verifiability, not a privacy penalty.
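Instant revocation: Publishing a new kind 30085 event with the same d-tag replaces the previous attestation (parameterized replaceable). To revoke, publish with rating = 0 or let the expiration tag lapse. Because 0 lies outside the valid rating range [1, 5], validating clients discard the replacement, so the revoked attestation contributes nothing to scoring.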
Bulk revocation attack: An attestor who has issued many attestations can revoke them all simultaneously, cratering multiple subjects' scores. Mitigation: observers SHOULD apply hysteresis — sudden bulk revocation triggers a grace period where the revoked attestations degrade gradually (e.g., over 7 days) rather than disappearing instantly.
Cycle prevention: All path diversity computation MUST use simple paths (no vertex revisits). This prevents synthetic diversity inflation via cycles in the attestation graph.
Eclipse attack: Adversary controls relay infrastructure, filtering negative attestations. Mitigation: query 10+ independent relays (see Observer Independence section). At 10+ relays, eclipse cost exceeds most agents' reputation value.
Relay divergence: Different relay sets produce different scores. This is by design (observer independence) but attackers can exploit it by publishing positive attestations to target relays and negative ones to obscure relays. Mitigation: clients SHOULD query diverse, well-known relay sets and document which set was used.
Reputation is overhead. When direct verification is cheaper than trust, use direct verification.
Use reputation when:
Do NOT use reputation when:
Relays SHOULD treat kind 30085 events as parameterized replaceable events per NIP-01. For each combination of pubkey, kind, and d tag, only the latest event is retained.
Relays MAY discard events whose expiration timestamp has passed, per NIP-40.
Relays SHOULD support filtering by #p and #t tags to enable efficient attestation queries.
Kind 30085 may be used by other applications for unrelated purposes. Conforming implementations MUST NOT assume that all kind 30085 events are NIP-XX attestations.
NIP-XX attestations are identifiable by their d-tag format: <64-char-hex-pubkey>:<namespace>. Clients SHOULD validate this format before processing an event as an attestation. Specifically:
1. The d tag value MUST contain exactly one colon separator.
2. A p tag referencing the subject MUST be present.

Events failing these checks SHOULD be silently discarded by attestation-processing clients. This d-tag structure provides natural disambiguation without requiring a dedicated kind number or additional tags.
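The disambiguation checks above could be implemented with one regular expression. This sketch assumes lowercase hex for the pubkey (consistent with the subject field definition) and uses an illustrative function name.

```python
import re

# 64 lowercase hex characters, one colon, then a namespace with no colon.
D_TAG = re.compile(r"([0-9a-f]{64}):([^:]+)")


def parse_d_tag(event):
    """Return (subject, namespace) if the event looks like an attestation.

    Returns None when the d tag format or the matching p tag is missing.
    """
    tags = event.get("tags", [])
    d = next((t[1] for t in tags if len(t) >= 2 and t[0] == "d"), None)
    match = D_TAG.fullmatch(d) if isinstance(d, str) else None
    if match is None:
        return None
    subject, namespace = match.groups()
    has_p = any(len(t) >= 2 and t[0] == "p" and t[1] == subject for t in tags)
    return (subject, namespace) if has_p else None
```

Events for which this returns None would simply be skipped by an attestation-processing client, leaving other uses of kind 30085 untouched.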
Full working implementation in Python (zero dependencies):
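nip_xx_reputation.py

import json
import time

def now():
    """Current Unix time in seconds."""
    return int(time.time())

# Placeholders for illustration: substitute a real relay hint and a real
# signing/publishing routine from your Nostr library.
preferred_relay = "wss://relay.example.com"

def sign_and_publish(event):
    print(json.dumps(event, indent=2))

attestation = {
    "subject": "d4e5f6...subject",
    "rating": 4,
    "context": "payment.reliability",
    "confidence": 0.85,
    "evidence": "Completed 12 delegations over 30 days"
}

event = {
    "kind": 30085,
    "created_at": now(),
    "tags": [
        ["d", attestation["subject"] + ":" + attestation["context"]],
        ["p", attestation["subject"], preferred_relay],
        ["t", attestation["context"]],
        ["expiration", str(now() + 90 * 86400)]  # 90-day TTL
    ],
    "content": json.dumps(attestation)
}

sign_and_publish(event)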
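import json
import time

DECAY_CLASSES = {
    "slow": 180 * 86400,
    "standard": 90 * 86400,
    "fast": 30 * 86400,
}

NAMESPACE_MAP = {
    "task/code-review": "slow",
    "task/translation": "slow",
    "task/payment-routing": "fast",
    "responsiveness": "fast",
}

def now():
    """Current Unix time in seconds."""
    return int(time.time())

def get_tag(event, name):
    """Return the first tag with the given name, or None."""
    return next((t for t in event["tags"] if t and t[0] == name), None)

def half_life_for(context):
    # Namespaces missing from the mapping fall back to the standard class.
    cls = NAMESPACE_MAP.get(context, "standard")
    return DECAY_CLASSES[cls]

def tier1_score(subject, context, events):
    half_life = half_life_for(context)
    numerator = 0.0
    denominator = 0.0
    for event in events:
        att = json.loads(event["content"])
        if att["subject"] != subject: continue
        if att["context"] != context: continue
        if att["rating"] < 1 or att["rating"] > 5: continue
        if att["confidence"] < 0.0 or att["confidence"] > 1.0: continue
        if event["pubkey"] == subject: continue  # no self-attestation
        age = now() - event["created_at"]
        # Unconfirmed task-type tags decay at 2x the rate, i.e. with half
        # the half-life (a flat 0.5 factor would not depend on age).
        task_type_tag = get_tag(event, "task-type")
        unconfirmed = (task_type_tag is not None and len(task_type_tag) >= 3
                       and task_type_tag[2] == "attestor-proposed")
        decay = 2 ** (-age / (half_life / 2 if unconfirmed else half_life))
        # Negative ratings (<= 2) carry the 2x weight from the Tier 1 formula.
        neg_multiplier = 2.0 if att["rating"] <= 2 else 1.0
        weight = att["confidence"] * decay * neg_multiplier
        numerator += att["rating"] * weight
        denominator += weight
    if denominator == 0:
        return None  # undefined, not zero
    return numerator / denominator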
The following test vectors allow implementers to validate their scoring implementation against known-correct results. All vectors use 2025-04-01T00:00:00Z (unix 1743465600) as "now" and a half-life of 90 days (7776000 seconds) for deterministic output.
Pubkey conventions:
- Subject: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
- Attestor A: bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
- Attestor B: cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
- Attestor C: dddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd
- Attestor D: eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
Three attestations for subject aaaa... in context payment.reliability:
Attestor A (rating 5, confidence 0.9, created 10 days ago):
{
"kind": 30085,
"pubkey": "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb",
"created_at": 1742601600,
"tags": [
["d", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa:payment.reliability"],
["p", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"],
["t", "payment.reliability"],
["expiration", "1751241600"]
],
"content": "{\"subject\":\"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\",\"rating\":5,\"context\":\"payment.reliability\",\"confidence\":0.9,\"evidence\":\"Test vector\"}"
}
Attestor B (rating 4, confidence 0.7, created 45 days ago):
{
"kind": 30085,
"pubkey": "cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc",
"created_at": 1739577600,
"tags": [
["d", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa:payment.reliability"],
["p", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"],
["t", "payment.reliability"],
["expiration", "1751241600"]
],
"content": "{\"subject\":\"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\",\"rating\":4,\"context\":\"payment.reliability\",\"confidence\":0.7,\"evidence\":\"Test vector\"}"
}
Attestor C (rating 2, confidence 0.8, created 5 days ago — negative):
{
"kind": 30085,
"pubkey": "dddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd",
"created_at": 1743033600,
"tags": [
["d", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa:payment.reliability"],
["p", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"],
["t", "payment.reliability"],
["expiration", "1751241600"]
],
"content": "{\"subject\":\"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\",\"rating\":2,\"context\":\"payment.reliability\",\"confidence\":0.8,\"evidence\":\"Test vector\"}"
}
Expected computation:
half_life = 7776000
Attestor A:
age = 1743465600 - 1742601600 = 864000 seconds (10 days)
decay = 2^(-864000 / 7776000) = 2^(-0.111111) = 0.925875
neg_mult = 1.0 (rating 5 > 2)
weight = 0.9 * 0.925875 * 1.0 = 0.833287
numerator = 5 * 0.833287 = 4.166436
Attestor B:
age = 1743465600 - 1739577600 = 3888000 seconds (45 days)
decay = 2^(-3888000 / 7776000) = 2^(-0.5) = 0.707107
neg_mult = 1.0 (rating 4 > 2)
weight = 0.7 * 0.707107 * 1.0 = 0.494975
numerator = 4 * 0.494975 = 1.979899
Attestor C:
age = 1743465600 - 1743033600 = 432000 seconds (5 days)
decay = 2^(-432000 / 7776000) = 2^(-0.055556) = 0.962224
neg_mult = 2.0 (rating 2 <= 2)
weight = 0.8 * 0.962224 * 2.0 = 1.539558
numerator = 2 * 1.539558 = 3.079116
score_T1 = (4.166436 + 1.979899 + 3.079116) / (0.833287 + 0.494975 + 1.539558)
= 9.225451 / 2.867820
= 3.216886
An implementer whose Tier 1 score for this vector does not round to 3.2169 (four decimal places) has a bug.
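The expected computation for this vector can be reproduced with a few lines of Python. This is a sketch for checking arithmetic only: it takes the three (rating, confidence, created_at) triples directly rather than parsing events, and the names `VECTOR_1` and `tier1_vector_score` are illustrative, not part of this NIP.

```python
HALF_LIFE = 90 * 86400   # 7776000 seconds
NOW = 1743465600         # fixed "now" used by the test vectors

# (rating, confidence, created_at) for Attestors A, B and C
VECTOR_1 = [
    (5, 0.9, 1742601600),
    (4, 0.7, 1739577600),
    (2, 0.8, 1743033600),
]


def tier1_vector_score(attestations, now=NOW, half_life=HALF_LIFE):
    """Weighted average with temporal decay and the 2x negative multiplier."""
    numerator = 0.0
    denominator = 0.0
    for rating, confidence, created_at in attestations:
        decay = 2 ** (-(now - created_at) / half_life)
        neg_mult = 2.0 if rating <= 2 else 1.0
        weight = confidence * decay * neg_mult
        numerator += rating * weight
        denominator += weight
    return numerator / denominator if denominator else None


score = tier1_vector_score(VECTOR_1)
print(round(score, 4))  # 3.2169
```

Dropping the `neg_mult` factor moves the result to roughly 3.66, which is a quick way to tell whether an implementation has omitted the negative-attestation weighting.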
The following event MUST be discarded because pubkey equals the subject field in the content:
{
"kind": 30085,
"pubkey": "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
"created_at": 1743033600,
"tags": [
["d", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa:payment.reliability"],
["p", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"],
["t", "payment.reliability"],
["expiration", "1751241600"]
],
"content": "{\"subject\":\"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\",\"rating\":5,\"context\":\"payment.reliability\",\"confidence\":1.0,\"evidence\":\"Self-attestation test\"}"
}
Expected result: Event is rejected at validation step 9. It MUST NOT contribute to any score computation.
Attestor D (eeee...) publishes 25 kind 30085 events within a 24-hour window. Default burst parameters: window = 86400, threshold = 5.
count = 25 (> threshold of 5)
burst_decay = 1 / sqrt(25) = 1 / 5 = 0.2
Each of Attestor D's attestations has its weight multiplied by 0.2. For example, if Attestor D has a single attestation with rating 5, confidence 1.0, and decay 1.0:
weight_without_burst = 1.0 * 1.0 * 1.0 = 1.000
weight_with_burst = 1.0 * 1.0 * 1.0 * 0.2 = 0.200
contribution = 5 * 0.200 = 1.000
The burst penalty reduces Attestor D's effective influence by 80%.
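A burst multiplier of this shape might be computed as in the sketch below. The function name `burst_multiplier` is hypothetical, and the sketch assumes the sliding window is the `window` seconds ending at a caller-supplied reference time; implementations should follow the window definition given in the burst rate-limiting rules of this NIP.

```python
import math


def burst_multiplier(timestamps, reference_time, window=86400, threshold=5):
    """Weight multiplier for one attestor from their events' created_at values.

    Assumption: events are counted if they fall in the `window` seconds
    ending at `reference_time`.
    """
    count = sum(
        1 for ts in timestamps
        if reference_time - window < ts <= reference_time
    )
    if count > threshold:
        return 1.0 / math.sqrt(count)
    return 1.0


# Attestor D: 25 events spread across less than one day
reference = 1743465600
timestamps = [reference - i * 3000 for i in range(25)]
multiplier = burst_multiplier(timestamps, reference)
contribution = 5 * (1.0 * 1.0 * multiplier)  # rating 5, confidence 1.0, decay 1.0
```

With these inputs `multiplier` is 0.2 and `contribution` is 1.0, matching the figures in this vector.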
Four attestors (A, B, C, D) attest to subject aaaa... in context payment.reliability. The attestor interaction graph has the following structure:
aaaa...) → connected
Connected components: {A, B}, {C}, {D} → cluster_count = 3
total_attestors = 4
cluster_count = 3
diversity = 3 / 4 = 0.75
Using the Tier 1 score from Test Vector 1 as an example:
score_T1 = 3.216886
score_T2 = diversity * score_T1 = 0.75 * 3.216886 = 2.412665
A maximally independent set (4 components out of 4 attestors) would produce diversity = 1.0 and score_T2 = score_T1. A fully connected set (1 component) would produce diversity = 0.25 and score_T2 = 0.804222.
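The diversity figure can be checked with a small union-find sketch. It assumes the attestor interaction graph is already available as a list of undirected edges; how those edges are derived is defined by the Tier 2 rules of this NIP and is not reproduced here. The function name `diversity` is illustrative.

```python
def diversity(attestors, edges):
    """Return cluster_count / total_attestors, or None for an empty attestor set."""
    if not attestors:
        return None
    parent = {a: a for a in attestors}

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    clusters = {find(a) for a in attestors}
    return len(clusters) / len(attestors)


d = diversity(["A", "B", "C", "D"], [("A", "B")])
score_t2 = d * 3.216886
```

Here `d` is 0.75 and `score_t2` is about 2.412665. Passing the chain of edges A-B, B-C, C-D instead gives 0.25, and passing no edges gives 1.0.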
The public attestation graph created by kind 30085 events has inherent privacy implications that implementers and users should understand.
Social graph exposure. Every attestation event links an attestor pubkey to a subject pubkey in a specific context domain. The aggregate set of attestations constitutes a directed, weighted, timestamped social graph. Anyone with relay access can reconstruct who trusts whom, in what domains, and how that trust evolved over time. This is a feature for reputation computation but a cost for privacy.
Temporal correlation. Attestation timestamps can be used to deanonymize attestors through correlation with external activity. If an attestor publishes attestations at times that correlate with known behavioral patterns (timezone, work hours, event attendance), the attestor's identity may be inferred even when using a pseudonymous pubkey.
Persistent conflict records. Negative attestations (ratings 1-2) create visible records of conflict or distrust. Unlike private reputation systems where negative signals are hidden, Nostr attestations are public events that persist on relays until expiration. This may discourage honest negative attestations — a chilling effect that weakens the convergence properties described in this NIP.
Relay-side graph observation. Relay operators can build complete attestor-subject graphs from the events they store and forward. A relay that handles a significant fraction of kind 30085 traffic has a comprehensive view of the reputation network topology, including which agents are controversial (many negative attestations) and which attestors are influential (high volume, high confidence).
Mitigations:
- The expiration tag provides eventual data minimization. Expired attestation events MAY be deleted by relays per NIP-40, reducing the long-term persistence of the social graph. Attestors SHOULD set expiration periods no longer than necessary for the context domain.
Kind 30085 is a new event kind introduced by this NIP. Clients that do not implement NIP-XX will ignore these events per standard Nostr behavior. There are no backward compatibility issues with existing event kinds.
Attestations complement labels. NIP-32 labels classify content; NIP-XX attestations rate agents over time. Clients MAY interpret existing NIP-32 "positive" / "negative" labels as informal attestations but MUST NOT mix them in scoring algorithms. Label events lack the structured fields (confidence, decay class, evidence) required for Tier 1 computation, and including them would compromise score semantics.
NIP-56 reports are one-shot flags indicating content violations. NIP-XX attestations are ongoing assessments of agent behavior over time. The two are complementary, not conflicting. A report says “this event is bad”; an attestation says “this agent’s performance in this domain, over this period, is rated X.”
The current schema version is v=2. Clients encountering v=1 events SHOULD apply the following defaults for missing v2 fields:
- task-type: absent (omit from scoring; treat as untyped attestation)
- decay class: standard (90-day half-life)
Future schema versions MUST increment the v tag value. Clients encountering an unrecognized version SHOULD ignore the event rather than attempting partial parsing.
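One way a client might apply these version rules is sketched below. The dictionary keys `task_type` and `decay_class` and both function names are hypothetical, chosen only for illustration. The sketch also assumes that an event with no v tag is treated like an unrecognized version, which this NIP does not state explicitly.

```python
V1_DEFAULTS = {"task_type": None, "decay_class": "standard"}  # hypothetical keys


def schema_version(event):
    """Return the value of the event's v tag as a string, or None if absent."""
    for tag in event.get("tags", []):
        if len(tag) >= 2 and tag[0] == "v":
            return tag[1]
    return None


def fields_for_scoring(event, fields):
    """Return the fields to score with, or None if the event should be ignored.

    `fields` holds whatever the client already extracted from the event,
    using the same hypothetical keys as V1_DEFAULTS.
    """
    version = schema_version(event)
    if version == "2":
        return dict(fields)
    if version == "1":
        return {**V1_DEFAULTS, **fields}  # fill in only what is missing
    return None  # unrecognized version (assumption: a missing tag counts too)
```

For a v=1 event with no extracted fields this returns the standard decay class and no task type, so the event is scored as an untyped attestation.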
Implementations of this NIP fall into three levels. Each level builds on the previous.
Minimal. Parse and validate kind 30085 events per the validation rules in this NIP. Compute Tier 1 scores using standard 90-day decay. Display per-namespace scores to the user.
Standard. Adds domain-dependent decay classes (slow, standard, fast), Tier 2 graph diversity scoring, burst rate-limiting with a sliding window, and task-type tag processing and filtering.
Full. Adds Tier 1.5 DMI peer prediction for attestor quality weighting, cross-namespace aggregation display, attestor interaction graph visualization, and custom decay class mappings.
| Spec Section | Minimal | Standard | Full |
|---|---|---|---|
| Event format & validation | ✓ | ✓ | ✓ |
| Tier 1 scoring (standard decay) | ✓ | ✓ | ✓ |
| Domain-dependent decay classes | | ✓ | ✓ |
| Burst rate-limiting | | ✓ | ✓ |
| Task-type tags | | ✓ | ✓ |
| Tier 2 graph diversity | | ✓ | ✓ |
| Tier 1.5 DMI peer prediction | | | ✓ |
| Cross-namespace aggregation | | | ✓ |
| Attestor graph visualization | | | ✓ |
NIP-XX connects to the emerging Nostr agent ecosystem through several complementary NIPs. These connections are recommendations, not requirements. Each NIP operates independently.
NIP-A5 defines a settlement-anchored trust model: reputation derives from completed economic transactions with cryptographic payment proof (L402/Lightning). NIP-XX defines a social-anchored model: reputation derives from attestation graph structure and scoring algorithms. These two approaches are complementary, not competing.
Post-service attestations (kind 38403) can feed NIP-XX scoring. After completing a NIP-A5 service agreement, the requester MAY publish a kind 30085 attestation referencing the agreement event as evidence using the nostr_event_ref evidence type, and including a lightning_preimage evidence entry from the L402 settlement. This creates a verifiable link between a completed service and a reputation signal.
Evidence weighting: Attestations backed by settlement proof (Lightning payment hash/preimage) carry stronger evidence than social-only attestations. Clients implementing Tier 1 scoring SHOULD apply a weight multiplier to the confidence value when lightning_preimage evidence is present and verified. A recommended multiplier is 1.2× (capped at confidence 1.0).
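The recommended multiplier and cap can be expressed as below. This is a sketch under stated assumptions: both function names are hypothetical, and the preimage check relies only on the Lightning convention that a payment hash is the SHA-256 of its preimage. How the hash and preimage are carried in the evidence entries is left to the evidence type definitions in this NIP.

```python
import hashlib


def preimage_matches(preimage_hex, payment_hash_hex):
    """True if SHA-256(preimage) equals the payment hash (both hex-encoded)."""
    digest = hashlib.sha256(bytes.fromhex(preimage_hex)).hexdigest()
    return digest == payment_hash_hex.lower()


def boosted_confidence(confidence, has_verified_preimage, multiplier=1.2):
    """Apply the settlement-proof multiplier, capped at confidence 1.0."""
    if has_verified_preimage:
        return min(1.0, confidence * multiplier)
    return confidence
```

For example, a confidence of 0.5 with verified settlement proof becomes 0.6, while 0.85 would exceed the cap and is clamped to 1.0.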
To join NIP-A5 and NIP-XX data by task category, clients can: (1) query kind 38403 events filtered by NIP-A5’s task categorization, (2) query kind 30085 events filtered by #t tag for the corresponding context domain. The task-type tag in NIP-XX events MAY be used for finer-grained matching but is not relay-indexed — clients perform this join locally. Implementers who need relay-level task-type filtering SHOULD request relay support for custom tag indexing per NIP-01’s extensible filter mechanism.
Job reviews (kind 31117) from NIP-AC are domain-specific evaluations that map naturally to attestation contexts. Clients MAY compute NIP-XX scores from NIP-AC review events as input signals. The task-type tag aligns with NIP-AC’s job type taxonomy.
NIP-AA defines agent identity and autonomy levels but defers its reputation algorithm. NIP-XX could serve as that module. NIP-AA’s autonomy levels could map to trust tier thresholds — for example, a Tier 1 score ≥ 4.0 required for AL-3 (fully autonomous) operations.
DVM result hashes are already supported as the nip90_result_hash evidence type in this NIP. DVM interactions — job requests, results, and payments — provide natural attestation opportunities where both requester and provider can attest to the other’s behavior.
A new agent has zero attestations and undefined reputation. This section addresses that cold-start problem.
Recommended bootstrapping path:
Cost analysis for implementers. Let n = number of attestations for a subject in a context, a = number of distinct attestors, s = number of shared subjects between attestor pairs.
| Tier | Complexity | Notes |
|---|---|---|
| Tier 1 | O(n) | Single pass through attestation set. One weighted-average computation. |
| Tier 1.5 (DMI) | O(a² × s) | Dominated by pairwise attestor comparisons across shared subjects. For 100 attestors with 50 shared subjects: ~500,000 operations. SHOULD be cached and updated incrementally. |
| Tier 2 | O(a² + a) | Dominated by building attestor interaction graph (pairwise check) + connected components via union-find. For 100 attestors: ~10,000 operations. |
Each kind 30085 event is approximately 500 bytes. A representative deployment of 1,000 attestors × 10 subjects × 5 contexts produces 50,000 events ≈ 25 MB. This is well within the capacity of a standard relay.
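The figures in the cost table and the storage estimate follow from simple arithmetic, sketched here so implementers can plug in their own deployment sizes. The function names are illustrative, and the 500-byte event size is the approximation used in this section.

```python
def tier15_ops(attestors, shared_subjects):
    """Rough operation count for Tier 1.5: a^2 * s."""
    return attestors ** 2 * shared_subjects


def tier2_ops(attestors):
    """Rough operation count for Tier 2: a^2 + a."""
    return attestors ** 2 + attestors


def storage_bytes(attestors, subjects, contexts, event_size=500):
    """Total storage if every attestor rates every subject in every context."""
    return attestors * subjects * contexts * event_size


print(tier15_ops(100, 50))            # 500000
print(tier2_ops(100))                 # 10100
print(storage_bytes(1000, 10, 5))     # 25000000 bytes, i.e. 25 MB
```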
expiration tag defined there.
| Date | Change | Reviewer |
|---|---|---|
| 2026-03-23 | Added structured evidence types (lightning_preimage, dvm_job_id, nostr_event_ref, free_text) with extensibility. Evidence field now accepts typed JSON array. | aec9180edbe1 |
| 2026-03-24 | Added nip90_result_hash evidence type for proving DVM result delivery. | aec9180edbe1 |
| 2026-03-23 | Added asymmetric negative attestation weighting (2x multiplier for ratings <= 2) to Tier 1 scoring. | aec9180edbe1 |
| 2026-03-23 | Added temporal burst rate-limiting with configurable sliding window and sqrt-based confidence decay. | aec9180edbe1 |
| 2026-03-24 | Replaced closed context enum with open namespace system. Attestation types are now freeform with dot-namespaced convention. Tier 2 scores computed per-namespace. | e0e247e9514f |
| 2026-03-24 | Added Tier 1.5: Attestor Quality via Peer Prediction (DMI mechanism) for computed attestor reliability replacing self-reported confidence. Graceful degradation from sparse to dense networks. | e0e247e9514f |
| 2026-03-26 | v5.3: Domain-dependent decay (slow/standard/fast half-life classes), open context namespace with extended domains, task-type tags with attestor-proposed/requester-confirmed status, schema version bumped to v=2. | — |
| 2026-03-27 | v5.5: Enhanced NIP-A5 interoperability section: settlement-anchored vs social-anchored complementarity, evidence weighting guidance (1.2× for Lightning-backed), cross-protocol query pattern for task-category joins. | e0e247e9514f |
| 2026-03-27 | v5.6: Added commitment class taxonomy — formalized evidence-strength hierarchy (self-assertion → reference → computational proof → economic settlement → staked commitment) with scoring multipliers. | kai |
| 2026-03-27 | v5.7: Added theoretical foundation for commitment classes — Zahavian costly signaling theory, Grafen single-crossing condition, Donath bridge to digital identity. | kai |
| 2026-03-27 | v5.8: Formal derivation of commitment class weights from Grafen signaling equilibrium — Bayesian likelihood ratio analysis, conservatism parameter α = 0.065, derived multipliers match recommended defaults. | kai |
| 2026-04-03 | v10.6: Added Kind Collision and d-tag Disambiguation subsection. Documents kind 30085 collision with WebRTC applications; specifies d-tag format validation rules for conforming implementations. | 29043f3fb4f31 |