Configuration

registry-relay is configured by one YAML document. The binary chooses the first available source:

--config <path>
REGISTRY_RELAY_CONFIG
./config/example.yaml
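
The lookup order above can be sketched as follows. This is a minimal illustration, not the relay's own code: the helper name resolve_config_path is hypothetical, and it assumes "available" means the flag was passed or the environment variable is set and non-empty.

```python
import os
from typing import Mapping, Optional

DEFAULT_CONFIG_PATH = "./config/example.yaml"


def resolve_config_path(
    cli_path: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return the config path using the documented precedence (hypothetical helper)."""
    # 1. An explicit --config <path> wins.
    if cli_path:
        return cli_path
    # 2. Otherwise use REGISTRY_RELAY_CONFIG (assumed: set and non-empty).
    env_path = env.get("REGISTRY_RELAY_CONFIG")
    if env_path:
        return env_path
    # 3. Finally fall back to the sample path.
    return DEFAULT_CONFIG_PATH
```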

The canonical sample is config/example.yaml. Keep examples aligned with this guide and the API and operations documentation.

Root shape

instance: {}
server: {}
metadata: {}   # optional split portable metadata manifest
catalog: {}
vocabularies: {}
auth: {}
audit: {}
deployment: {}   # optional declared assurance profile, waivers, and evidence
config_trust: {} # optional governed config apply state
datasets: []
provenance: {} # optional
standards: {}  # optional, feature-gated adapters

Unknown fields are rejected for most blocks. Config validation runs after YAML parsing and checks ids, scopes, table/entity references, filter references, aggregate references, env var presence, and vocabulary prefixes.

A minimal deployment needs server (a listener), catalog (public metadata base), auth (one auth mode), audit (a sink and hash secret), and at least one entry in datasets. Every other root block is optional. This example shows the required shape. For a runnable starting point, use config/example.yaml; env-backed API key configs also need a governed fingerprint.commitment generated by the relay binary for the configured key id.

server:
  bind: 127.0.0.1:8080

catalog:
  title: Example Registry Relay
  base_url: http://127.0.0.1:8080
  publisher: Example Ministry

auth:
  mode: api_key
  api_keys:
    - id: demo_client
      fingerprint:
        provider: env
        name: API_KEY_HASH
        commitment: sha256:0000000000000000000000000000000000000000000000000000000000000000
      scopes:
        - people:metadata
        - people:rows

audit:
  sink: stdout
  hash_secret_env: REGISTRY_RELAY_AUDIT_HASH_SECRET

datasets:
  - id: people
    title: People registry
    description: Demo people records
    owner: Example Ministry
    sensitivity: personal
    access_rights: restricted
    update_frequency: monthly
    tables:
      - id: people_table
        source:
          type: file
          path: ./data/people.csv
          format:
            csv:
              header_row: 1
        primary_key: person_id
        schema:
          strict: true
          fields:
            - name: person_id
              type: string
              nullable: false
            - name: name
              type: string
              nullable: false
    entities:
      - name: person
        table: people_table
        fields:
          - name: person_id
          - name: name
        access:
          metadata_scope: people:metadata
          aggregate_scope: people:metadata
          read_scope: people:rows
        api:
          default_limit: 50
          max_limit: 100

Replace the placeholder commitment with the value emitted by registry-relay generate-api-key --id demo_client. The API_KEY_HASH environment variable must contain the emitted fingerprint in the form sha256:<64 lowercase hex chars>. The REGISTRY_RELAY_AUDIT_HASH_SECRET environment variable must contain at least 32 bytes of random secret material; startup fails closed when it is absent or weak.
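
Before starting the relay, a deployment script could check that both environment variables have the documented shape. The sketch below is an assumption-laden preflight, not part of registry-relay: preflight_env is a hypothetical helper, and it checks only the fingerprint format and the 32-byte floor, not whatever additional weakness rules the binary applies.

```python
import os
import re
from typing import List, Mapping

# Documented fingerprint shape: sha256:<64 lowercase hex chars>
FINGERPRINT_RE = re.compile(r"sha256:[0-9a-f]{64}")


def preflight_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return human-readable problems; an empty list means the shapes look right."""
    problems = []
    fingerprint = env.get("API_KEY_HASH", "")
    if not FINGERPRINT_RE.fullmatch(fingerprint):
        problems.append("API_KEY_HASH must look like sha256:<64 lowercase hex chars>")
    secret = env.get("REGISTRY_RELAY_AUDIT_HASH_SECRET", "")
    # Only the documented length floor is checked; the relay's own
    # weakness rules are not reproduced here.
    if len(secret.encode("utf-8")) < 32:
        problems.append("REGISTRY_RELAY_AUDIT_HASH_SECRET must hold at least 32 bytes")
    return problems
```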

See config/example.yaml for a larger working starting point; the sections that follow document each block in full.

Instance

instance:
  id: registry-relay-local
  environment: development
  owner: Ministry of Digital Government
  jurisdiction: example-country

instance gives posture and operations tooling a stable public identity for the running service. id defaults to registry-relay-local; environment, owner, and jurisdiction are optional public labels.

Server

server:
  bind: 0.0.0.0:8080
  admin_bind: 127.0.0.1:8081
  openapi_requires_auth: true
  cache_dir: ./cache
  max_source_file_bytes: 268435456
  xlsx_max_file_bytes: 268435456
  request_timeout: 30s
  request_body_timeout: 10s
  http1_header_read_timeout: 10s
  max_connections: 1024
  cors:
    allowed_origins:
      - https://portal.example.gov
  trust_proxy:
    enabled: false
    trusted_proxies: []

bind is the public data-plane listener. admin_bind is optional and must be private in production. cache_dir must be writable by the process. Source data must be mounted read-only.

openapi_requires_auth defaults to true. Set it to false only for local testing or controlled tooling environments that need unauthenticated access to /openapi.json; the unauthenticated document includes the full configured OpenAPI surface.

request_timeout bounds total request service time after HTTP headers are parsed. request_body_timeout bounds body reads for handlers that consume a request body. http1_header_read_timeout closes incomplete HTTP/1 headers before request work is admitted, and max_connections caps concurrent accepted sockets per listener. All timeouts must be non-zero and max_connections must be greater than zero.

HTTP/2 connections use the same finite connection cap and keepalive timeout. If production terminates HTTP/2 at a reverse proxy, configure bounded proxy header/body read timeouts and per-client connection limits before forwarding to Registry Relay.

The default CORS policy is deny by omission. Add explicit trusted origins only.

Governed config apply

Most deployments can skip this section. config_trust is optional; it governs signed, threshold-approved config changes for high-assurance deployments. Simple local deployments omit it and keep using the local YAML loaded at startup.

This governed example is syntactically valid but illustrative. Generate the tuf_root_sha256 and targets-role signer key IDs from your own trusted TUF repository before using governed apply in an environment.

config_trust:
  antirollback_state_path: /var/lib/registry-relay/config-antirollback.json
  local_approval_state_path: /var/lib/registry-relay/config-local-approvals.json
  break_glass_rate_limit:
    max_accepted: 1
    window_seconds: 3600
  accepted_roots:
    - root_id: ops-root
      production: true
      tuf_root_sha256: sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
      valid_from_unix_seconds: 1770000000
      valid_until_unix_seconds: 1772592000
      high_risk_change_classes:
        - auth_scopes
        - signing_key_cleanup
        - signing_key_rotation
      signers:
        "1111111111111111111111111111111111111111111111111111111111111111":
          kid: "1111111111111111111111111111111111111111111111111111111111111111"
          enabled: true
        "2222222222222222222222222222222222222222222222222222222222222222":
          kid: "2222222222222222222222222222222222222222222222222222222222222222"
          enabled: true
      roles:
        - name: config-admin
          threshold: 2
          signer_kids:
            - "1111111111111111111111111111111111111111111111111111111111111111"
            - "2222222222222222222222222222222222222222222222222222222222222222"
          allowed_change_classes:
            - public_metadata

Governed config apply requires antirollback_state_path and local_approval_state_path, which must point to durable local state such as a mounted volume. break_glass_rate_limit is the trusted local rolling-window policy used for break-glass apply requests; when omitted it defaults to one accepted request per rate-limit identity per hour. Registry Relay fails closed for apply when required local state is absent, unreadable, stale, or inconsistent; verify and dry-run remain available.

Signed apply also requires at least one local accepted_roots entry that authorizes every change class in the signed target metadata. Registry Relay uses the local root only after TUF target verification succeeds. Verified TUF targets-role signature key IDs, not target-declared custom metadata, must satisfy one role threshold per change class. Inline admin YAML can be used for verify/dry-run checks, but apply requires a signed local TUF target and never treats raw inline YAML as signed governance input.
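
The "one role threshold per change class" rule can be illustrated with a small model of an accepted_roots entry. This is a sketch under stated assumptions: root_authorizes is a hypothetical helper, the dictionary shape mirrors the YAML example, disabled signers are assumed not to count toward a threshold, and high_risk_change_classes and the validity window are not modelled.

```python
from typing import Any, Iterable, Mapping


def root_authorizes(
    root: Mapping[str, Any],
    verified_kids: Iterable[str],
    change_classes: Iterable[str],
) -> bool:
    """True when every change class is covered by a role whose threshold is met."""
    verified = set(verified_kids)
    # Assumption: only signers marked enabled may contribute approvals.
    enabled = {kid for kid, s in root.get("signers", {}).items() if s.get("enabled")}
    for change_class in change_classes:
        satisfied = False
        for role in root.get("roles", []):
            if change_class not in role.get("allowed_change_classes", []):
                continue
            approvals = verified & enabled & set(role.get("signer_kids", []))
            if len(approvals) >= role["threshold"]:
                satisfied = True
                break
        if not satisfied:
            return False
    # Note: an empty change-class list passes vacuously in this sketch.
    return True
```

The verified_kids argument stands for key IDs taken from verified TUF targets-role signatures, never from target-declared custom metadata.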

For TUF root rotation, add the new final tuf_root_sha256 as another local accepted_roots entry before applying bundles that verify through the rotated root. valid_from_unix_seconds and valid_until_unix_seconds are optional local bounds for overlap windows. Omit them for an indefinite local root; set them when old and new roots must both authorize bundles only during a planned transition window. Expired or not-yet-valid roots fail authorization even when the TUF metadata and signer quorum are otherwise valid.
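
The local validity window can be read as a simple bounds check. The sketch below assumes both bounds are inclusive, which the text does not state, and root_is_current is a hypothetical name.

```python
from typing import Optional


def root_is_current(
    now: int,
    valid_from: Optional[int] = None,
    valid_until: Optional[int] = None,
) -> bool:
    """Check a Unix timestamp against optional local root bounds."""
    # Omitted bounds mean the root is valid indefinitely on that side.
    if valid_from is not None and now < valid_from:
        return False  # not yet valid
    if valid_until is not None and now > valid_until:
        return False  # expired
    return True
```

With the example values, 1770000000 to 1772592000 is a 30-day overlap window.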

Catalog and vocabularies

catalog:
  title: Internal Government Registry Relay
  base_url: https://data.example.gov
  publisher: Ministry of Digital Government
  participant_id: did:web:data.example.gov

vocabularies:
  psc: https://publicschema.org/
  m8g: http://data.europa.eu/m8g/

base_url is used in generated catalog links, OpenAPI servers, and provenance subject URIs. participant_id is optional; when omitted, it is derived from the catalog base URL.

Vocabulary prefixes let entity fields and dataset metadata use compact semantic references such as psc:concepts/Person.

Split metadata manifest

metadata:
  source:
    path: ./metadata.yaml

metadata.source.path points at a portable metadata manifest. Relative paths are resolved from the runtime config file. At startup, Registry Relay compiles the manifest and validates that runtime datasets, entities, fields, filters, and relationships are present in the metadata model. Add metadata.source.digest: sha256:<digest> when the deployment must pin the exact reviewed manifest.
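
A value for metadata.source.digest could be produced with a short script like the one below. It assumes the digest is the SHA-256 of the manifest file's raw bytes in lowercase hex; that derivation is not confirmed by this guide, so check metadata.md before relying on it. manifest_digest is a hypothetical helper.

```python
import hashlib
from pathlib import Path


def manifest_digest(path: str) -> str:
    """Hash a file in chunks and format the result as sha256:<hex>."""
    hasher = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read in fixed-size chunks so large manifests are not loaded at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            hasher.update(chunk)
    return "sha256:" + hasher.hexdigest()
```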

Mode	Required config	Digest rule	Delivery
Simple local	`metadata.source.path`	Optional	Local file read at startup
Pinned local	`metadata.source.path`, `metadata.source.digest`	Must match the local manifest	Local file read at startup
Governed	`config_trust`, `metadata.source.path`, `metadata.source.digest`	Required before startup and apply	Signed config target plus signed metadata target; optional signed package index when `package_digest` is claimed

Keep operational details in this runtime config: sources, tables, physical columns, scopes, filters, aggregates, standards adapters, ingest, and refresh. Keep standard-facing meaning in the manifest: catalog, datasets, entities, fields, constraints, vocabularies, codelists, profiles, conformance claims, and descriptive ODRL policy metadata.

See metadata.md for the manifest schema, static publication, and the metadata.manifest.* / runtime.binding.* startup error codes.

ODRL policy belongs in the portable metadata manifest, not in runtime dataset bindings. A dataset policy block is published as an odrl:Offer for discovery and review evidence only. It does not change API-key scopes, OIDC authorization, row filtering, evidence verification, SP DCI behavior, or any other runtime access decision.

metadata:
  source:
    path: ./disability_registry.metadata.yaml

# In disability_registry.metadata.yaml:
datasets:
  - id: disability_registry
    policy:
      uid: https://demo.example.gov/datasets/disability_registry#illustrative-offer
      assigner: did:web:social-affairs.demo.example.gov
      permissions:
        - action: odrl:use
          constraints:
            - left_operand: odrl:purpose
              operator: odrl:isA
              right_operand:
                iri: https://demo.example.gov/purpose/disability-benefit-eligibility
          duties:
            - action: odrl:attribute
      prohibitions:
        - action: odrl:sell

The demo policy IRIs under demo.example.gov are hypothetical examples for catalog consumers. They are not official policy, legal advice, or a declaration that a client has been approved to use the data.

SP DCI sync adapter

SP DCI (the Social Protection Digital Convergence Initiative) sync adapters are optional and feature-gated. Build with --features spdci-api-standards to enable them. Without that feature, any standards.spdci config is rejected with spdci.config.feature_disabled.

The adapter does not add new storage semantics. Configure a normal Registry Relay entity, often backed by an XLSX worksheet, then bind the SP DCI sync routes to it:

standards:
  spdci:
    disability_registry:
      dataset: disability_registry
      entity: disabled_person
      query_key: member.member_identifier
      query_field: id
      disabled_status_field: disability_status
      disabled_positive_values: [approved, "yes"]  # quoted so YAML 1.1 parsers do not read it as a boolean
    registries:
      dr:
        dataset: disability_registry
        entity: disabled_person
        registry_type: ns:org:RegistryType:DR
        record_type: spdci-extensions-dci:DisabledPerson
        identifiers:
          DISABILITY_ID: id
          MEMBER_ID: id
        expression_fields:
          disability_status: disability_status
          disability_details.impairment_type: impairment_type

When enabled and configured, Registry Relay serves these SP DCI sync endpoints on the protected data-plane listener:

POST /dci/{registry}/registry/sync/search
POST /dci/{registry}/registry/sync/disabled
POST /dci/{registry}/registry/sync/get-disability-details
POST /dci/{registry}/registry/sync/get-disability-support

For sync/search, the {registry} segment selects any named standards.spdci.registries entry such as dr, sr, crvs, or fr, which lets one listener host multiple DCI registry APIs without path ambiguity. The disabled, get-disability-details, and get-disability-support routes are Disability Registry-specific and resolve only when the named registry entry points at the same dataset/entity as standards.spdci.disability_registry. The async /registry/search, subscribe, callback, and transaction-status APIs are intentionally not implemented by this sync adapter.

For generic sync search, identifiers maps DCI idtype-value query types to entity fields. expression_fields maps DCI expression or predicate attribute names to entity fields. Mapped fields must be exposed entity fields and allowed filters. The adapter currently supports three query forms: idtype-value; expression queries that use $and with the eq, in, ge, and le operators; and predicate conditions joined with and.

query_key is read from message.disabled_criteria.query in the SP DCI request envelope. It may be represented as a literal dotted JSON key ("member.member_identifier") or as nested objects ({"member": {"member_identifier": ...}}). query_field must be an allowed entity filter because the adapter delegates reads to the normal entity query engine.
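
The two accepted representations of query_key can be resolved with one lookup. This is an illustrative sketch, not the adapter's code: read_query_key is a hypothetical helper, and checking the literal dotted key before the nested form is an assumption.

```python
from collections.abc import Mapping
from typing import Any


def read_query_key(query: Mapping, query_key: str) -> Any:
    """Return the value for query_key, or None when neither form is present."""
    # Form 1: the literal dotted key, e.g. {"member.member_identifier": ...}.
    if query_key in query:
        return query[query_key]
    # Form 2: nested objects, one level per dotted segment.
    current: Any = query
    for segment in query_key.split("."):
        if not isinstance(current, Mapping) or segment not in current:
            return None
        current = current[segment]
    return current
```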

For /dci/{registry}/registry/sync/disabled, the caller needs the entity evidence_verification_scope. Generic search, details, and support need the entity read_scope. API-key authentication is still Registry Relay’s normal auth layer. If a registry entry uses response_mapping_path, the binary must also be built with --features standards-cel-mapping; otherwise config validation fails with spdci.config.mapping_feature_disabled.

API keys

auth:
  mode: api_key
  api_keys:
    - id: program_system
      fingerprint:
        provider: env
        name: PROGRAM_SYSTEM_API_KEY_HASH
        commitment: sha256:0000000000000000000000000000000000000000000000000000000000000000
      scopes:
        - social_registry:metadata
        - social_registry:rows

The YAML stores committed fingerprint references, never raw API keys. Each env var value must be:

sha256:<64 lowercase hex chars>

Generate a raw key, its fingerprint, and the matching commitment:

registry-relay generate-api-key --id program_system

The command emits four shell-friendly lines:

api_key_id=program_system
api_key=<send-this-raw-key-to-the-client>
fingerprint=sha256:<store-this-in-the-secret-store>
commitment=sha256:<paste-this-into-config>

Store the emitted fingerprint in the platform secret store under the configured fingerprint.name. Paste the emitted commitment into fingerprint.commitment. Give the raw key only to the authorized client.

Worked standalone example, using demo_client and API_KEY_HASH:

api_key_id=demo_client
api_key=registry-relay-standalone-example-key-0001
fingerprint=sha256:db3f2a02c6ead9bf0387e8a97ec090a549daa46610ca87bd4b651631b2411def
commitment=sha256:1ee555b85da34dced897bc690053aaaedd4716f4b2972d556fa688b64ff55213

export API_KEY_HASH='sha256:db3f2a02c6ead9bf0387e8a97ec090a549daa46610ca87bd4b651631b2411def'

auth:
  mode: api_key
  api_keys:
    - id: demo_client
      fingerprint:
        provider: env
        name: API_KEY_HASH
        commitment: sha256:1ee555b85da34dced897bc690053aaaedd4716f4b2972d556fa688b64ff55213
      scopes:
        - people:metadata
        - people:rows

Do not reuse the example raw key in a real deployment.

OIDC (OAuth2)

Set auth.mode: oidc to verify bearer JWTs against an external OpenID Connect / OAuth2 IdP. Registry Relay is a resource server: it validates inbound tokens against the IdP’s JWKS but never mints, refreshes, or stores tokens. A given deployment runs in exactly one auth mode at a time; mixed-mode operation is not supported.

OIDC field names follow the shared Registry service runtime configuration conventions. Removed pre-convention names are rejected before deserialization with an error naming the replacement field.

auth:
  mode: oidc
  oidc:
    issuer: https://idp.example.gov
    audiences:
      - registry-relay
    discovery_url: https://idp.example.gov/.well-known/openid-configuration
    allowed_algorithms:
      - RS256
    jwks_cache_ttl: 10m
    leeway: 60s
    scope_claim: scope
    scope_map:
      "role:social-registry-reader": "social_registry:rows"
    scope_object_required_keys: []
    allowed_clients: []
    allowed_token_types:
      - JWT
      - at+jwt

A full drop-in alternative to config/example.yaml lives at config/example.oidc.yaml. It targets a local Zitadel instance and is what the integration test consumes.

Field	Purpose
`issuer`	Compared verbatim against the JWT `iss` claim. Must match the IdP’s published issuer URL.
`audiences`	One or more accepted `aud` values. Tokens whose `aud` does not intersect this list are rejected.
`jwks_url`	Explicit JWKS endpoint. Exactly one of `jwks_url` and `discovery_url` must be set; the validator rejects configs that supply both or neither.
`discovery_url`	OIDC discovery document (`.well-known/openid-configuration`). The JWKS URL is resolved from `jwks_uri` at startup.
`allow_dev_insecure_fetch_urls`	Development-only opt-in for loopback HTTP issuer, discovery, and JWKS URLs. Defaults to `false`; non-loopback private and metadata IPs remain denied by the platform fetch policy.
`allowed_algorithms`	Signature algorithms accepted by the verifier. Supported values: RS256, ES256, EdDSA. HS* and `none` are intentionally absent.
`jwks_cache_ttl`	Steady-state JWKS cache TTL. The cache also refreshes on an unknown `kid` (rate-limited), so this is an upper bound on cache staleness, not the rotation pickup latency.
`leeway`	Clock skew tolerance on `exp` and `nbf`. Bounded at 5 minutes by validation.
`scope_claim`	Name of the JWT claim to read scopes from (the config field itself is always a single string; defaults to `scope`). The claim’s value in the token may be a space-separated string (RFC 8693 / RFC 9068), a JSON array of strings, or a JSON object whose keys are the scope names. The `aud` claim is rejected as a scope source because it is used only for token audience validation. Object-valued role keys grant scopes only when `scope_object_required_keys` names a key present in the role value and that nested value is active: `true`, a non-empty string, or a non-empty object/array containing an active value.
`scope_map`	Optional rename map applied before scope-based access checks. Adapt IdP role names to Registry Relay’s `<dataset_id>:<level>` shape.
`scope_object_required_keys`	Allowlist of keys that must appear inside object-valued role claim values before the role key is accepted. For Zitadel organization-scoped role objects, set this to the expected organization id key or keys. Defaults to empty, which means object-valued claims grant no scopes. String and array scope claims do not require this setting.
`allowed_clients`	Optional allowlist matched against the token’s `azp` (preferred) or `client_id`. Empty list means any client is accepted.
`allowed_token_types`	Accepted JOSE `typ` header values. Defaults to `JWT` and `at+jwt` (RFC 9068). ID tokens (`id+jwt`) are intentionally rejected by default, and tokens without `typ` are rejected by the shared verifier.
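
The scope_claim, scope_map, and scope_object_required_keys rules in the table can be modelled together. The sketch below is one reading of those rules, not the shared verifier: both function names are hypothetical, and passing unmapped names through scope_map unchanged is an assumption.

```python
from typing import Any, Dict, Iterable, List


def is_active(value: Any) -> bool:
    """Active means true, a non-empty string, or a container holding an active value."""
    if value is True:
        return True
    if isinstance(value, str):
        return value != ""
    if isinstance(value, dict):
        return any(is_active(v) for v in value.values())
    if isinstance(value, list):
        return any(is_active(v) for v in value)
    return False


def scopes_from_claim(
    claim: Any,
    scope_map: Dict[str, str],
    required_keys: Iterable[str],
) -> List[str]:
    """Normalise a string, array, or object claim value into scope names."""
    required = set(required_keys)
    names: List[str] = []
    if isinstance(claim, str):
        names = claim.split()  # space-separated string form
    elif isinstance(claim, list):
        names = [item for item in claim if isinstance(item, str)]
    elif isinstance(claim, dict):
        for role, detail in claim.items():
            # A role key counts only if a required key is present and active.
            if isinstance(detail, dict) and any(
                key in required and is_active(val) for key, val in detail.items()
            ):
                names.append(role)
    # Assumption: names without a scope_map entry are kept as-is.
    return [scope_map.get(name, name) for name in names]
```

With an empty scope_object_required_keys list, the object branch never matches, which reflects the documented default that object-valued claims grant no scopes.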

Discovery vs explicit JWKS

discovery_url triggers a single discovery fetch at startup to resolve jwks_uri; a failure here aborts the binary so an operator sees the IdP wiring problem instead of a process that runs but silently rejects every token. The JWKS document itself is fetched lazily on first verify, so a transient JWKS outage at boot does not block startup. Production defaults require HTTPS; local loopback HTTP requires allow_dev_insecure_fetch_urls: true.

Resource-server semantics

Registry Relay never mints or refreshes tokens. Operators are responsible for provisioning OIDC applications, machine users, and grant types on the IdP. The Principal’s principal_id is taken from the token’s sub (preferred), then client_id, then azp; auth_mode=oidc is recorded on every audit record.

Granular failure codes

Token verification failures map to specific auth.* codes so audit pipelines can distinguish IdP outages from bad tokens from policy denials:

Code	HTTP	Meaning
`auth.missing_credential`	401	No `Authorization` header
`auth.malformed_credential`	401	Wrong scheme, empty bearer, or unparseable JWT structure
`auth.token_expired`	401	`exp` claim is in the past (after `leeway`)
`auth.token_not_yet_valid`	401	`nbf` claim is in the future (after `leeway`)
`auth.token_signature_invalid`	401	JWKS key found but signature did not verify
`auth.issuer_mismatch`	401	`iss` claim does not match `oidc.issuer`
`auth.audience_mismatch`	401	`aud` claim does not intersect `oidc.audiences`
`auth.kid_unknown`	401	Header `kid` is absent from the JWKS even after one refresh
`auth.algorithm_not_allowed`	401	Header `alg` is not in the configured allowlist
`auth.client_not_allowed`	403	`azp` / `client_id` is not in the configured `allowed_clients`
`auth.invalid_credential`	401	JWT decode failure not covered by a more specific variant
`auth.jwks_unavailable`	503	JWKS fetch failed; Registry Relay cannot verify any token

For a worked example of running Registry Relay against a local OIDC provider (using the project’s dev Zitadel stack), see development.md.

Audit

audit:
  sink: stdout
  format: jsonl
  hash_secret_env: REGISTRY_RELAY_AUDIT_HASH_SECRET
  chain: true
  include_health: false

Supported sinks:

audit:
  sink: stdout
  format: jsonl
  hash_secret_env: REGISTRY_RELAY_AUDIT_HASH_SECRET

audit:
  sink: file
  format: jsonl
  hash_secret_env: REGISTRY_RELAY_AUDIT_HASH_SECRET
  path: /var/log/registry-relay/audit.jsonl
  rotate:
    max_size_mb: 100
    max_files: 14

audit:
  sink: syslog
  format: jsonl
  hash_secret_env: REGISTRY_RELAY_AUDIT_HASH_SECRET

hash_secret_env is required at runtime and must name an environment variable containing at least 32 bytes of deployment-specific random secret material. Startup fails closed when it is missing, empty, unset, or weak.

Audit output uses registry-platform-audit envelopes with prev_hash and record_hash on every record. These fields detect ordering gaps and accidental corruption in retained logs, but they do not protect against an actor who can rewrite the audit sink. Use an append-only external sink or independent tail-hash anchoring when stronger integrity is required. chain is retained in config for compatibility with older deployments, but platform audit envelopes are always chained.
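
A retained JSONL log can be checked for ordering gaps by comparing each record's prev_hash with the preceding record_hash. The sketch assumes both fields sit at the top level of each JSON line, which this guide does not specify, and find_chain_breaks is a hypothetical helper. It does not recompute record_hash, since that would need the keyed secret and the exact envelope encoding.

```python
import json
from typing import Iterable, List


def find_chain_breaks(lines: Iterable[str]) -> List[int]:
    """Return 1-based line numbers whose prev_hash differs from the prior record_hash."""
    breaks = []
    previous_hash = None
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines in retained files
        record = json.loads(line)
        if previous_hash is not None and record.get("prev_hash") != previous_hash:
            breaks.append(number)
        previous_hash = record.get("record_hash")
    return breaks
```

As the paragraph above notes, a clean result only shows that the links are consistent; it does not prove the sink was never rewritten.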

A normally booted relay always reports keyed integrity hmac in its posture because startup requires the audit hash secret (hash_secret_env); the none value appears only in dev or test configurations that build the posture without that secret.

Audit records are separate from operational logs, which go to stderr as readable text by default. Set REGISTRY_RELAY_LOG_FORMAT=json or REGISTRY_RELAY_LOG_FORMAT=jsonl to emit operational logs as JSON Lines for collection or redirected files.

Write policy

write_policy selects what happens when an audit record cannot be written (for example the sink is unreachable or the disk is full):

audit:
  sink: file
  hash_secret_env: REGISTRY_RELAY_AUDIT_HASH_SECRET
  path: /var/log/registry-relay/audit.jsonl
  write_policy: availability_first   # availability_first | fail_closed

availability_first (default): an audit write failure is logged and the request returns its original outcome unchanged. The deployment stays available even when audit is degraded. This is the historical behavior.
fail_closed: a request whose audit record cannot be written fails with HTTP 503 and the stable error code audit.write_failed (application/problem+json). No request outcome is returned without a durable audit record. Choose this when audit completeness is a hard requirement.
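
The difference between the two policies comes down to what the caller sees when the audit write fails. A minimal model, with final_response as a hypothetical helper that returns the HTTP status and an optional error code:

```python
from typing import Optional, Tuple


def final_response(
    outcome_status: int,
    audit_written: bool,
    write_policy: str = "availability_first",
) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, error code) after the audit write attempt."""
    if write_policy not in ("availability_first", "fail_closed"):
        raise ValueError(f"unknown write_policy: {write_policy}")
    # availability_first keeps the original outcome even when audit is degraded.
    if audit_written or write_policy == "availability_first":
        return outcome_status, None
    # fail_closed: no outcome is returned without a durable audit record.
    return 503, "audit.write_failed"
```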

The policy applies to every audited route. Per-route-family selection is not configurable. The selected policy is reported truthfully as the write_policy fact in the operations posture audit block, so a deployment cannot claim a stronger guarantee than it runs.

Deployment profile

The deployment block lets an operator declare the assurance level a deployment claims. The profile is never inferred from hostname, environment, or network position: it is an explicit statement. Each profile binds a set of gates that check the running configuration and contribute findings at a defined severity.

deployment:
  profile: production        # local | hosted_lab | production | evidence_grade
  evidence:
    ingress_rate_limit: true # operator asserts a gateway enforces rate limiting
    api_key_rotation: true   # operator asserts an API-key rotation process exists
  waivers:
    - finding: relay.openapi.public
      reason: public API catalog is intentional for this deployment
      expires: 2026-12-31

The deployment block is optional. When it is omitted, no gates bind and the deployment keeps its existing behavior exactly; the posture reports a single deployment.profile_undeclared warning so the choice is visible. An unknown profile value is rejected at startup.

Profiles and severities

Each gate maps to one of four severities per profile:

startup_fail: the process refuses to start. Never waivable.
readiness_fail: the readiness endpoint reports not-ready; the process keeps running.
finding_error / finding_warn: a posture finding only.

The four profiles escalate from local (binds no hard gates) through hosted_lab and production to evidence_grade (the strictest). For example, evidence_grade requires a signed, governed config bundle: running it from a plain local YAML file trips a startup_fail gate (relay.config.unsigned) and the process refuses to start.

Evidence declarations

Some controls live outside the relay and cannot be observed by the process (for example ingress rate limiting enforced by a gateway, or an API-key rotation process). The evidence flags let the operator assert those controls are in place. Each flag defaults to false, which leaves the corresponding gate active until the operator declares the control.

Waivers

A triggered, waivable finding can be suppressed by a waiver that names the finding id and carries a free-text reason and a mandatory expiry date (YYYY-MM-DD):

deployment:
  profile: hosted_lab
  waivers:
    - finding: relay.ingress.rate_limit_missing
      reason: rate limiting is handled by the lab gateway
      expires: 2026-09-30

A waived finding reports status waived instead of its severity effect. Once the expiry date passes, the waiver stops suppressing the finding and the posture additionally raises deployment.waiver_expired. The expiry date is mandatory; reasons must be non-empty and must not contain secrets. startup_fail gates are never waivable.
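
The waiver rules can be expressed as a small decision function. This sketch assumes a waiver still applies on its expiry date and lapses the day after, which is one reading of "once the expiry date passes". finding_status is a hypothetical helper, and it simply ignores a waiver attached to a startup_fail gate rather than modelling how the relay reports that case.

```python
from datetime import date
from typing import Optional, Tuple


def finding_status(
    severity: str,
    waiver_expires: Optional[str],
    today: date,
) -> Tuple[str, bool]:
    """Return (reported status, whether deployment.waiver_expired is also raised)."""
    # startup_fail gates are never waivable; no waiver means nothing to apply.
    if severity == "startup_fail" or waiver_expires is None:
        return severity, False
    # Assumption: the waiver is still in force on the expiry date itself.
    if today <= date.fromisoformat(waiver_expires):
        return "waived", False
    return severity, True
```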

Waiver reasons are only visible in the restricted posture tier; the default tier reports finding id, severity, and status but not the reason.

Findings catalog

Finding id	hosted_lab	production	evidence_grade
`relay.admin.public_exposure`	error	readiness_fail	startup_fail
`relay.openapi.public`	warn	error	error
`relay.ingress.rate_limit_missing`	warn	error	error
`relay.oidc.client_allowlist_empty`	warn	error	readiness_fail
`relay.auth.api_key_no_rotation_evidence`	warn	error	error
`relay.config.unsigned`	warn	error	startup_fail
`relay.audit.best_effort`	(not bound)	warn	readiness_fail
`relay.audit.sink_missing`	error	readiness_fail	startup_fail

The current deployment profile, its findings, and active waivers are reported under deployment in the operations posture (GET /admin/v1/posture).

Datasets

Each dataset combines private storage tables with public entities:

datasets:
  - id: social_registry
    title: Social Registry
    description: Registry of households participating in Program X
    owner: Ministry of Social Affairs
    sensitivity: personal
    access_rights: restricted
    update_frequency: monthly
    conforms_to:
      - psc:concepts/Person
    defaults:
      materialization: snapshot
    tables: []
    entities: []

sensitivity, access_rights, and update_frequency are catalog metadata. They also make review conversations concrete; do not leave them vague in production configs. Allowed values:

sensitivity: public, internal, personal, confidential, or secret.
access_rights: public, restricted, or non_public.
update_frequency: continuous, daily, weekly, termly, monthly, quarterly, annual, irregular, as_needed, or unknown.

defaults is optional. It may provide materialization and refresh defaults for tables in the same dataset. Source configuration stays table-level.

Sources

Sources are configured on each private table. File sources read CSV, XLSX, or Parquet data:

source:
  type: file
  path: ./data/social_registry.xlsx
  format:
    xlsx:
      sheet: Individuals
      header_row: 1
      data_range: A1:E100000

For CSV files, set format.csv.header_row: 1 when the first row contains column names. For XLSX files, header_row and data_range can be used when a worksheet has notes or title rows around the rectangular table. Source configuration is table-local: put file/database settings and format hints under each tables[].source.

Postgres snapshot and live table sources are supported. Credentials are never stored in YAML:

source:
  type: postgres
  connection_env: SOCIAL_REGISTRY_DATABASE_URL
  table:
    schema: public
    name: individuals
  change_token_sql: "select max(updated_at)::text from public.individuals"

connection_env is the environment variable name containing the connection string. Validation and logs may mention the env var name but must not read or print its value. The connection string must set sslmode=require; missing sslmode, sslmode=prefer, and sslmode=disable are rejected when the connector reads the environment variable. The native TLS connector validates the server certificate and hostname against the system trust store. Use read-only database credentials. Registry Relay opens read-only Postgres sessions for live scans, but credentials must enforce the same boundary at the database. table and query are mutually exclusive; prefer structured table configs for production.
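
A deployment check can confirm the sslmode requirement without printing the connection string. The sketch below assumes a URL-style connection string (postgres://...); keyword/value strings are not handled, and sslmode_is_required is a hypothetical helper rather than the connector's own validation.

```python
from urllib.parse import parse_qs, urlsplit


def sslmode_is_required(connection_url: str) -> bool:
    """True only when the URL carries exactly one sslmode parameter set to require."""
    query = parse_qs(urlsplit(connection_url).query)
    # A missing parameter yields None; prefer and disable yield other values.
    return query.get("sslmode") == ["require"]
```

The function returns only a boolean, so callers can log the environment variable name and the result without exposing credentials.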

Snapshot ingest reads Postgres through COPY (SELECT ...) TO STDOUT WITH CSV HEADER, then applies the same declared-schema coercion and validation as CSV files. The exported snapshot is bounded by server.max_source_file_bytes. For table sources, Registry Relay projects the declared schema fields from the table and casts them to CSV-friendly values. Extra database columns are ignored. For query sources, write a single SELECT or WITH statement without semicolons; public request input is never interpolated into SQL.

Live materialization is supported for structured table sources only. Each DataFusion scan opens a read-only Postgres session and exports data from the configured table. Simple column projection is pushed into the generated COPY query only when the scan has no filters. Filter-free limited scans may use the requested limit as a physical fetch bound; filtered scans, joins, and semantic limit enforcement remain gateway-side and may read up to the configured live row cap before local filtering. This keeps the live path bounded and safe without accepting caller-controlled SQL. Live row responses do not advertise snapshot-style strong validators or cursor version tokens, because upstream rows can change between requests without a Registry Relay ingest event. Live exports are also bounded by server.max_source_file_bytes. Use connect_timeout, query_timeout, live_max_connections, and live_max_rows to bound upstream behavior.

For production live sources, keep the contract deliberately narrow:

tables:
  - id: individuals_table
    materialization: live
    primary_key: individual_id
    refresh:
      mode: manual
    schema:
      strict: true
      fields:
        - name: individual_id
          type: string
          nullable: false
        - name: household_id
          type: string
          nullable: false
        - name: updated_at
          type: timestamp
          nullable: true
    source:
      type: postgres
      connection_env: SOCIAL_REGISTRY_DATABASE_URL
      table:
        schema: public
        name: individuals
      connect_timeout: 5s
      query_timeout: 30s
      live_max_connections: 8

The connection string must include sslmode=require and point to a read-only database role that can SELECT only the configured table or view. Do not use query sources, change_token_sql, or refresh.mode: mtime with live materialization; those are snapshot-only controls. Declared schema fields are the exported contract, and extra database columns are ignored unless an entity query needs a full local scan to evaluate filters.

Minimal source-only form:

source:
  type: postgres
  connection_env: SOCIAL_REGISTRY_DATABASE_URL
  table:
    schema: public
    name: individuals
  connect_timeout: 5s
  query_timeout: 30s
  live_max_connections: 8

Supported Postgres field mappings are:

string -> text
integer -> bigint
number -> double precision
boolean -> boolean
date -> date
timestamp -> timestamptz rendered as RFC 3339 UTC text

Refresh

refresh:
  mode: mtime
  interval: 60s

refresh:
  mode: interval
  interval: 1h

refresh:
  mode: manual

mtime reloads when the source change token changes. It is supported for file sources, and for Postgres snapshot sources only when change_token_sql is configured. interval reloads each time the configured interval elapses. manual reloads only through the admin listener’s table reload route.
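The three refresh modes can be summarized as a decision function. The Python sketch below is illustrative only: should_reload and its parameters are hypothetical, and it assumes that mtime mode compares change tokens once per interval, which the examples suggest but this guide does not state.

```python
from typing import Optional


def should_reload(
    mode: str,
    elapsed_s: float,
    interval_s: Optional[float] = None,
    previous_token: Optional[str] = None,
    current_token: Optional[str] = None,
) -> bool:
    """Decide whether a scheduled check should trigger a table reload."""
    if mode == "manual":
        # Manual tables reload only through the admin reload route.
        return False
    if mode not in ("mtime", "interval"):
        raise ValueError(f"unknown refresh mode: {mode}")
    if interval_s is None or elapsed_s < interval_s:
        return False
    if mode == "interval":
        return True
    # mtime: reload only when the source change token has changed.
    return current_token != previous_token
```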

Tables

Tables are private storage resources. Their ids do not appear in public URLs.

tables:
  - id: individuals_table
    materialization: snapshot
    source:
      type: file
      path: ./data/social_registry.xlsx
      format:
        xlsx:
          sheet: Individuals
    refresh:
      mode: mtime
      interval: 1h
    primary_key: individual_id
    schema:
      strict: true
      fields:
        - name: individual_id
          type: string
          nullable: false
        - name: payment_amount
          type: number
          nullable: true
          unit: EUR

Supported formats are csv, xlsx, and parquet. If format is omitted, the loader infers from the source file extension where possible.

materialization may be snapshot or live. File sources support snapshot only. Postgres sources support snapshot; Postgres structured table sources also support live.

Datasource capability matrix

Registry Relay derives datasource capabilities from source.type and materialization. Operators do not configure these flags directly.

Source	Materialization	Filters	Projection	Limit	Validators and cursors	Provenance
`file`	`snapshot`	gateway-side	gateway-side	gateway-side	strong snapshot tokens	snapshot-backed
`postgres` `table` or `query`	`snapshot`	gateway-side	gateway-side	gateway-side	strong snapshot tokens	snapshot-backed
`postgres` `table`	`live`	gateway-side	Postgres column pushdown for filter-free scans, otherwise gateway-side	gateway-side	no strong snapshot tokens	not snapshot-backed

Unsupported combinations are rejected at config load: file live, Postgres live with a configured query, and live with mtime refresh. Postgres query sources stay snapshot-only so operator SQL is executed only during controlled ingest or refresh, never per public request. Future datasource connectors must follow the same convention: only generated SQL over structured table metadata may receive pushdown, and unsupported operations must fall back to gateway-side execution or be rejected explicitly.
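The matrix and the rejected combinations can be expressed as one derivation function. This Python sketch only restates the table above. The function name, argument names, and returned dictionary keys are hypothetical and do not reflect Registry Relay's internal types.

```python
from typing import Dict, Optional

GATEWAY = "gateway-side"


def derive_capabilities(
    source_type: str,
    materialization: str,
    shape: Optional[str] = None,  # "table" or "query" for postgres sources
    refresh_mode: str = "manual",
) -> Dict[str, str]:
    """Derive capabilities, rejecting unsupported combinations."""
    if source_type not in ("file", "postgres"):
        raise ValueError(f"unknown source type: {source_type}")
    if materialization not in ("snapshot", "live"):
        raise ValueError(f"unknown materialization: {materialization}")
    if source_type == "postgres" and shape not in ("table", "query"):
        raise ValueError("postgres sources need a table or a query")
    if materialization == "live":
        if source_type == "file":
            raise ValueError("file sources do not support live")
        if shape == "query":
            raise ValueError("postgres query sources are snapshot-only")
        if refresh_mode == "mtime":
            raise ValueError("mtime refresh is snapshot-only")
        return {
            "filters": GATEWAY,
            "projection": "postgres column pushdown for filter-free scans",
            "limit": GATEWAY,
            "validators": "no strong snapshot tokens",
            "provenance": "not snapshot-backed",
        }
    return {
        "filters": GATEWAY,
        "projection": GATEWAY,
        "limit": GATEWAY,
        "validators": "strong snapshot tokens",
        "provenance": "snapshot-backed",
    }
```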

At startup, Registry Relay logs one ingest.datasource_capabilities event per configured table. For Postgres live scans, the admin listener’s /metrics route also exports low-cardinality live scan metrics for scan duration, concurrency wait time, exported rows, and exported bytes. These metrics intentionally do not include dataset ids, table names, SQL, env vars, request ids, or row values.

Field types:

string, number, integer, boolean, date, timestamp

Set sensitive: true on source or entity fields whose query values must be redacted or deterministically hashed in audit records. This flag is audit-only in beta: it does not hide a field from API responses and does not grant or deny read access.
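To make "deterministically hashed" concrete, here is a minimal Python sketch. It assumes a keyed HMAC-SHA256 over the value with the audit hash secret; this guide does not state the actual algorithm or output format, so the function name audit_value and the hmac-sha256: prefix are hypothetical.

```python
import hashlib
import hmac


def audit_value(value: str, sensitive: bool, secret: bytes) -> str:
    """Return the form of a query value that would be written to audit."""
    if not sensitive:
        return value
    # Assumption: a keyed hash, so equal inputs correlate across audit
    # records without the raw identifier being stored.
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"hmac-sha256:{digest}"
```

The same value and secret always produce the same token, which keeps audit records linkable. The API response itself is unaffected by this flag.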

Entities

Entities are the public REST resources:

entities:
  - name: individual
    title: Individual
    description: A person enrolled in Program X
    table: individuals_table
    concept_uri: psc:concepts/Person
    fields:
      - name: id
        from: individual_id
        sensitive: true
      - name: payment_amount
        from: payment_amount
    relationships:
      - name: household
        kind: belongs_to
        target: household
        foreign_key: household_id
    access:
      metadata_scope: social_registry:metadata
      aggregate_scope: social_registry:aggregate
      read_scope: social_registry:rows
      evidence_verification_scope: social_registry:evidence_verification
    api:
      default_limit: 100
      max_limit: 1000
      require_purpose_header: true
      required_filters:
        - id
      allowed_filters:
        - field: id
          ops: [eq, in]
      allowed_expansions:
        - household
    publicschema:
      target: Person
      mapping_path: mappings/individual-person.publicschema.yaml
      schema_validation_path: ../publicschema.org/dist/schemas/Person.schema.json

When fields is present, only listed fields are exposed. When it is omitted, every table column is exposed. For sensitive datasets, prefer an explicit field list. Use entity read_scope, required filters, purpose-header requirements, and explicit field projection for exposure control; sensitive: true controls audit redaction only.
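The api controls in the entity example can be read as a request check. The Python sketch below is a hedged illustration: check_request is a hypothetical helper, the purpose header is modeled as a boolean because this section does not name the header, and clamping an oversized limit to max_limit (rather than rejecting it) is an assumption.

```python
from typing import Dict, List, Optional, Tuple

# Mirrors the api block of the individual entity above.
API: Dict = {
    "default_limit": 100,
    "max_limit": 1000,
    "require_purpose_header": True,
    "required_filters": ["id"],
    "allowed_filters": [{"field": "id", "ops": ["eq", "in"]}],
}


def check_request(
    api: Dict,
    filters: List[Tuple[str, str]],  # (field, op) pairs from the request
    limit: Optional[int],
    has_purpose: bool,
) -> int:
    """Validate a row request and return the effective limit."""
    if api.get("require_purpose_header") and not has_purpose:
        raise ValueError("purpose header is required")
    allowed = {f["field"]: set(f["ops"]) for f in api.get("allowed_filters", [])}
    for field, op in filters:
        if op not in allowed.get(field, set()):
            raise ValueError(f"filter {field}:{op} is not allowed")
    present = {field for field, _ in filters}
    missing = [f for f in api.get("required_filters", []) if f not in present]
    if missing:
        raise ValueError(f"missing required filters: {missing}")
    if limit is None:
        return api["default_limit"]
    if limit < 1:
        raise ValueError("limit must be positive")
    # Assumption: oversized limits are clamped, not rejected.
    return min(limit, api["max_limit"])
```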

Row-level authorization scopes are not supported. The row_scope resource setting is rejected by config parsing; model row exposure with dataset/entity read scopes, required filters, purpose headers, and projected fields instead.

Relationships are dataset-local in V1. Cross-dataset workflows must compose client-side with separate scoped calls and separate audit records.

OGC API features

Build with --features ogcapi-features to expose spatial entities through the protected /ogc/v1 surface. The feature does not add a top-level standards config block. Instead, opt in per entity with spatial:

spatial:
  collection_id: facilities
  title: Public facilities
  description: Public facility locations from the civic registry.
  geometry:
    kind: point
    longitude_field: lon
    latitude_field: lat
    crs: http://www.opengis.net/def/crs/OGC/1.3/CRS84
  datetime_field: updated_at
  max_bbox_degrees: 5.0
  max_geometry_vertices: 10000

Phase 1 supports kind: point and kind: geojson. Point longitude, point latitude, datetime, and bbox helper fields must be exposed entity fields with compatible types. kind: geojson may use optional precomputed bbox fields:

spatial:
  collection_id: parcels
  geometry:
    kind: geojson
    field: geometry
    crs: http://www.opengis.net/def/crs/OGC/1.3/CRS84
  bbox_fields:
    min_x: bbox_min_x
    min_y: bbox_min_y
    max_x: bbox_max_x
    max_y: bbox_max_y

Only CRS84 is accepted. wkt and wkb parse as reserved geometry kinds but are rejected by V1 validation. Collection ids default to the entity name and must be unique within a dataset. OGC discovery uses metadata scope; feature item reads use read_scope and preserve entity required filters, purpose-header requirements, projection, and audit behavior.
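For intuition about max_bbox_degrees, the Python sketch below validates a CRS84 bounding box given as (min longitude, min latitude, max longitude, max latitude). It assumes the setting caps the box span on each axis, which this guide does not spell out. The function names are hypothetical, and antimeridian-crossing boxes are rejected here for simplicity.

```python
from typing import Sequence, Tuple

BBox = Tuple[float, float, float, float]


def check_bbox(bbox: Sequence[float], max_bbox_degrees: float) -> BBox:
    """Validate a CRS84 bbox and enforce a per-axis span cap."""
    if len(bbox) != 4:
        raise ValueError("bbox needs four numbers")
    min_x, min_y, max_x, max_y = (float(v) for v in bbox)
    if not (-180 <= min_x <= 180 and -180 <= max_x <= 180):
        raise ValueError("longitude out of range")
    if not (-90 <= min_y <= 90 and -90 <= max_y <= 90):
        raise ValueError("latitude out of range")
    if min_x > max_x or min_y > max_y:
        # Simplification: antimeridian-crossing boxes are not handled.
        raise ValueError("bbox minimum exceeds maximum")
    # Assumption: the cap applies to width and height independently.
    if max_x - min_x > max_bbox_degrees or max_y - min_y > max_bbox_degrees:
        raise ValueError("bbox exceeds max_bbox_degrees")
    return (min_x, min_y, max_x, max_y)


def point_in_bbox(lon: float, lat: float, bbox: BBox) -> bool:
    """Test whether a point feature falls inside a validated bbox."""
    min_x, min_y, max_x, max_y = bbox
    return min_x <= lon <= max_x and min_y <= lat <= max_y
```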

Evidence verification

Evidence offerings expose Registry Notary discovery metadata:

GET /metadata/evidence-offerings
GET /metadata/evidence-offerings/{offering_id}

Relay does not verify claims or evidence. registry-notary is the only claim/evidence verifier. The portable metadata manifest declares public offerings with access.kind: registry-notary, endpoint_url, discovery_url, and ruleset so clients can discover the Notary service that owns verification.

access:
  evidence_verification_scope: social_registry:evidence_verification

evidence_verification_scope remains a scope label for standards adapters and integrations that need to distinguish evidence-oriented access from row reads. It does not enable a Relay-local verification endpoint.

PublicSchema VC mapping

This block requires a build with --features publicschema-cel. When an entity declares publicschema, entity-record VC issuance uses the mapping file to produce a PublicSchema.org credential subject instead of the default entity JSON shape.

publicschema:
  target: Person                                          # required; PublicSchema concept name
  mapping_path: mappings/individual-person.publicschema.yaml  # required; CEL mapping document
  schema_validation_path: ../publicschema.org/dist/schemas/Person.schema.json  # optional; validates subject before signing
  context_url: https://publicschema.org/ctx/draft.jsonld  # optional; overrides default context
  schema_url: https://publicschema.org/schemas/Person.schema.json  # optional; overrides default credentialSchema.id
  credential_type: Person                                 # optional; overrides default VC type[1]

Field	Default	Notes
`target`	(required)	PublicSchema concept name; drives `credential_type` and `schema_url` defaults
`mapping_path`	(required)	Path to a CEL mapping YAML document; compiled at startup
`schema_validation_path`	absent	Local JSON Schema; when set, every mapped subject is validated before signing
`context_url`	`https://publicschema.org/ctx/draft.jsonld`	JSON-LD context URL in the issued VC
`schema_url`	`https://publicschema.org/schemas/{target}.schema.json`	`credentialSchema.id` in the issued VC
`credential_type`	`{target}`	`type[1]` value in the issued VC

See provenance.md for CEL context variables, issuance behavior, audit records, and the build and test commands for this feature.
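The defaults in the table above can be restated as a small resolver. This Python sketch only mirrors the documented defaults; resolve_publicschema is a hypothetical helper, not part of Registry Relay.

```python
from typing import Dict, Optional


def resolve_publicschema(block: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Fill in optional publicschema fields from the documented defaults."""
    for key in ("target", "mapping_path"):
        if not block.get(key):
            raise ValueError(f"publicschema.{key} is required")
    target = block["target"]
    return {
        "target": target,
        "mapping_path": block["mapping_path"],
        # No default: subjects are validated only when a schema is set.
        "schema_validation_path": block.get("schema_validation_path"),
        "context_url": block.get(
            "context_url", "https://publicschema.org/ctx/draft.jsonld"
        ),
        "schema_url": block.get(
            "schema_url", f"https://publicschema.org/schemas/{target}.schema.json"
        ),
        "credential_type": block.get("credential_type", target),
    }
```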

Aggregates

Aggregates are declared on datasets and name their source entity:

aggregates:
  - id: by_municipality
    title: Individuals by municipality
    description: Number of individuals by municipality
    source_entity: individual
    default_group_by:
      - municipality_code
    dimensions:
      - id: municipality_code
        label: Municipality
        field: municipality_code
    indicators:
      - id: individual_count
        label: Individuals
        function: count
        column: id
        unit_measure: people
    allowed_filters:
      - field: municipality_code
        ops: [eq, in]
      - field: enrolled_on
        ops: [gte, lte, between]
    temporal_field: enrolled_on
    disclosure_control:
      min_group_size: 5
      suppression: omit

Supported aggregate functions are the V1 set exercised by the tests and examples, including count, sum, and avg. The runtime config key remains indicators for compatibility; public aggregate APIs expose these configured series as measures. temporal_field is optional; when present, native aggregate temporal.from and temporal.to are translated into the declared range-capable allowed filter for that source-entity field. Dataset measure and dimension discovery is derived from these aggregate declarations, so keep ids stable and labels consumer-friendly. Keep disclosure thresholds explicit and reviewable.
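As an illustration of disclosure_control with suppression: omit, the Python sketch below drops small groups before they are returned. It assumes the group size is the number of underlying rows and that a group exactly at min_group_size is kept; neither detail is stated in this guide. The function name and the row_count key are hypothetical.

```python
from typing import Dict, List


def apply_disclosure_control(
    groups: List[Dict], min_group_size: int, suppression: str = "omit"
) -> List[Dict]:
    """Drop groups whose underlying row count is below the threshold."""
    if suppression != "omit":
        raise ValueError("only suppression: omit is sketched here")
    # Assumption: a group exactly at the threshold is disclosed.
    return [g for g in groups if g["row_count"] >= min_group_size]
```

With min_group_size: 5, a municipality that has 3 matching individuals would not appear in the response at all under these assumptions.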

Spatial EDR aggregates

Spatial EDR exposure is opt-in per aggregate and requires a build with --features ogcapi-edr.

aggregates:
  - id: by_admin_area
    description: Individuals by administrative area
    source_entity: individual
    # ...dimensions, indicators, disclosure_control as normal...
    spatial:
      mode: admin_area
      collection_id: by_admin_area   # optional; defaults to aggregate id
      dimension: municipality_code   # declared dimension id used to join geometry
      geometry_entity: municipality  # entity name that holds geometry rows
      geometry_id_field: code        # field in geometry_entity matching the dimension values
      geometry_field: geometry       # geojson field in geometry_entity
      bbox_fields:                   # optional precomputed bbox fields in geometry_entity
        min_x: bbox_min_x
        min_y: bbox_min_y
        max_x: bbox_max_x
        max_y: bbox_max_y
      max_geometry_vertices: 10000   # optional; defaults to 10000

Field	Default	Notes
`mode`	(required)	Must be `admin_area`
`collection_id`	aggregate id	OGC collection identifier; must be unique within the dataset
`dimension`	(required)	Declared aggregate dimension id whose values are joined to geometry
`geometry_entity`	(required)	Entity that holds one geometry row per dimension value
`geometry_id_field`	(required)	Field in `geometry_entity` that matches dimension values
`geometry_field`	(required)	GeoJSON geometry field in `geometry_entity`
`bbox_fields`	absent	Optional precomputed bbox columns; same subkeys as entity `spatial.bbox_fields`
`max_geometry_vertices`	10000	Cap on GeoJSON vertices decoded from `geometry_field`

geometry_entity must be an entity declared in the same dataset. geometry_id_field and geometry_field must be exposed entity fields with compatible types (string/integer for id, geojson-typed string for geometry). Only kind: geojson geometry is supported for spatial aggregates in V1.
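The join between aggregate rows and geometry rows, together with the vertex cap, can be sketched in Python. This is an illustration under assumptions: the function names are hypothetical, rows without a matching geometry are skipped, and an oversized geometry raises an error. This guide does not say how Registry Relay handles either case. GeometryCollection values are not handled.

```python
from typing import Dict, List


def count_vertices(coords) -> int:
    """Count positions in a GeoJSON coordinates array of any nesting depth."""
    # A position is a list of numbers; anything else nests positions.
    if coords and isinstance(coords[0], (int, float)):
        return 1
    return sum(count_vertices(c) for c in coords)


def join_geometry(
    rows: List[Dict],
    geometries: Dict[str, Dict],  # geometry_id_field value -> GeoJSON geometry
    dimension: str,
    max_geometry_vertices: int = 10000,
) -> List[Dict]:
    """Attach a geometry to each aggregate row by its dimension value."""
    features = []
    for row in rows:
        geometry = geometries.get(row[dimension])
        if geometry is None:
            continue  # assumption: rows without a geometry are skipped
        if count_vertices(geometry["coordinates"]) > max_geometry_vertices:
            raise ValueError("geometry exceeds max_geometry_vertices")
        features.append({"type": "Feature", "geometry": geometry, "properties": row})
    return features
```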

Provenance (response-credential issuer configuration)

The provenance block is optional. When it is absent or sets enabled: false, the gateway behaves as a plain JSON service. When it is enabled, callers can opt in to signed response credentials (W3C VCDM 2.0 VC-JWT) by sending Accept: application/vc+jwt. V1 supports local Ed25519 signing from either a software env-var JWK or a file_watch JWK file.

The key is named provenance for compatibility; it governs the response-credential issuer (DID, signing key, claim validity, and accepted media types). These credentials are W3C VCDM 2.0 VC-JWT with a Registry Relay JSON-LD context; they are not W3C PROV-O.

See provenance.md for the full signer, DID, schema, context, and rotation contract.

Production checklist

Source files are read-only to the process.
cache_dir is writable and on a filesystem with enough space.
Every env-backed fingerprint.name exists in the runtime environment.
No raw key, fingerprint, private JWK, or full environment dump is logged.
Admin listener, if enabled, is private.
CORS origins are explicit.
Personal-data entities use explicit field projections.
Row and evidence-verification routes that need purpose tracking set require_purpose_header: true.
Sensitive identifier fields are marked sensitive: true where audit redaction is required.
Audit sink and retention match the deployment’s governance requirements.
For Postgres live tables, scrape /metrics from the admin listener and alert on live scan timeout/error growth, exported bytes, and concurrency wait time.