
Configuration

registry-relay is configured by one YAML document. The binary chooses the first available source:

  1. --config <path>
  2. REGISTRY_RELAY_CONFIG
  3. ./config/example.yaml

The canonical sample is config/example.yaml. Keep examples aligned with this guide and the API and operations documentation.
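
The source precedence listed above can be sketched as a small lookup. This is a minimal illustration, not the binary's implementation: the function name and the treatment of an empty value as "not set" are assumptions.

```python
from typing import Mapping, Optional

DEFAULT_CONFIG_PATH = "./config/example.yaml"


def resolve_config_source(cli_path: Optional[str], env: Mapping[str, str]) -> str:
    """Return the config path using the documented precedence.

    Hypothetical helper: an empty string is treated as "not provided",
    which is an assumption rather than documented behavior.
    """
    # 1. An explicit --config <path> wins.
    if cli_path:
        return cli_path
    # 2. Otherwise fall back to REGISTRY_RELAY_CONFIG.
    env_path = env.get("REGISTRY_RELAY_CONFIG")
    if env_path:
        return env_path
    # 3. Otherwise use the bundled sample path.
    return DEFAULT_CONFIG_PATH
```

For example, `resolve_config_source(None, os.environ)` would return the environment value when one is set and the sample path otherwise.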

instance: {}
server: {}
metadata: {} # optional split portable metadata manifest
catalog: {}
vocabularies: {}
auth: {}
audit: {}
deployment: {} # optional declared assurance profile, waivers, and evidence
config_trust: {} # optional governed config apply state
datasets: []
provenance: {} # optional
standards: {} # optional, feature-gated adapters

Unknown fields are rejected for most blocks. Config validation runs after YAML parsing and checks ids, scopes, table/entity references, filter references, aggregate references, env var presence, and vocabulary prefixes.

A minimal deployment needs server (a listener), catalog (public metadata base), auth (one auth mode), audit (a sink and hash secret), and at least one entry in datasets. Every other root block is optional. This example shows the required shape. For a runnable starting point, use config/example.yaml; env-backed API key configs also need a governed fingerprint.commitment generated by the relay binary for the configured key id.

server:
  bind: 127.0.0.1:8080
catalog:
  title: Example Registry Relay
  base_url: http://127.0.0.1:8080
  publisher: Example Ministry
auth:
  mode: api_key
  api_keys:
    - id: demo_client
      fingerprint:
        provider: env
        name: API_KEY_HASH
        commitment: sha256:0000000000000000000000000000000000000000000000000000000000000000
      scopes:
        - people:metadata
        - people:rows
audit:
  sink: stdout
  hash_secret_env: REGISTRY_RELAY_AUDIT_HASH_SECRET
datasets:
  - id: people
    title: People registry
    description: Demo people records
    owner: Example Ministry
    sensitivity: personal
    access_rights: restricted
    update_frequency: monthly
    tables:
      - id: people_table
        source:
          type: file
          path: ./data/people.csv
          format:
            csv:
              header_row: 1
        primary_key: person_id
        schema:
          strict: true
          fields:
            - name: person_id
              type: string
              nullable: false
            - name: name
              type: string
              nullable: false
    entities:
      - name: person
        table: people_table
        fields:
          - name: person_id
          - name: name
        access:
          metadata_scope: people:metadata
          aggregate_scope: people:metadata
          read_scope: people:rows
        api:
          default_limit: 50
          max_limit: 100

Replace the placeholder commitment with the value emitted by registry-relay generate-api-key --id demo_client. The API_KEY_HASH environment variable must contain the emitted fingerprint in the form sha256:<64 lowercase hex chars>. The REGISTRY_RELAY_AUDIT_HASH_SECRET environment variable must contain at least 32 bytes of random secret material; startup fails closed when it is absent or weak.
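
A deployment script can check both environment variables before starting the relay. The sketch below is a hypothetical preflight helper, not part of registry-relay. It checks only the documented fingerprint shape and the 32-byte minimum; whatever additional weakness checks the binary applies are not modeled here.

```python
import re
from typing import List, Mapping

# Documented shape: sha256:<64 lowercase hex chars>
FINGERPRINT_RE = re.compile(r"sha256:[0-9a-f]{64}")
MIN_SECRET_BYTES = 32


def preflight_env(
    env: Mapping[str, str],
    fingerprint_var: str = "API_KEY_HASH",
    secret_var: str = "REGISTRY_RELAY_AUDIT_HASH_SECRET",
) -> List[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    fingerprint = env.get(fingerprint_var, "")
    if not FINGERPRINT_RE.fullmatch(fingerprint):
        problems.append(f"{fingerprint_var} is not sha256:<64 lowercase hex chars>")
    secret = env.get(secret_var, "")
    # Length is measured in bytes; UTF-8 encoding is an assumption.
    if len(secret.encode("utf-8")) < MIN_SECRET_BYTES:
        problems.append(f"{secret_var} has fewer than {MIN_SECRET_BYTES} bytes")
    return problems
```

Note that the helper never prints the variable values, only their names.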

See config/example.yaml for a larger working starting point; the sections that follow document each block in full.

instance:
  id: registry-relay-local
  environment: development
  owner: Ministry of Digital Government
  jurisdiction: example-country

instance gives posture and operations tooling a stable public identity for the running service. id defaults to registry-relay-local; environment, owner, and jurisdiction are optional public labels.

server:
  bind: 0.0.0.0:8080
  admin_bind: 127.0.0.1:8081
  openapi_requires_auth: true
  cache_dir: ./cache
  max_source_file_bytes: 268435456
  xlsx_max_file_bytes: 268435456
  request_timeout: 30s
  request_body_timeout: 10s
  http1_header_read_timeout: 10s
  max_connections: 1024
  cors:
    allowed_origins:
      - https://portal.example.gov
  trust_proxy:
    enabled: false
    trusted_proxies: []

bind is the public data-plane listener. admin_bind is optional and must be private in production. cache_dir must be writable by the process. Source data must be mounted read-only.

openapi_requires_auth defaults to true. Set it to false only for local testing or controlled tooling environments that need unauthenticated access to /openapi.json; the unauthenticated document includes the full configured OpenAPI surface.

request_timeout bounds total request service time after HTTP headers are parsed. request_body_timeout bounds body reads for handlers that consume a request body. http1_header_read_timeout closes incomplete HTTP/1 headers before request work is admitted, and max_connections caps concurrent accepted sockets per listener. All timeouts must be non-zero and max_connections must be greater than zero.

HTTP/2 connections use the same finite connection cap and keepalive timeout. If production terminates HTTP/2 at a reverse proxy, configure bounded proxy header/body read timeouts and per-client connection limits before forwarding to Registry Relay.

The default CORS policy is deny by omission. Add explicit trusted origins only.

Most deployments can skip this section. config_trust is optional; it governs signed, threshold-approved config changes for high-assurance deployments. Simple local deployments omit it and keep using the local YAML loaded at startup.

This governed example is syntactically valid but illustrative. Generate the tuf_root_sha256 and targets-role signer key IDs from your own trusted TUF repository before using governed apply in an environment.

config_trust:
  antirollback_state_path: /var/lib/registry-relay/config-antirollback.json
  local_approval_state_path: /var/lib/registry-relay/config-local-approvals.json
  break_glass_rate_limit:
    max_accepted: 1
    window_seconds: 3600
  accepted_roots:
    - root_id: ops-root
      production: true
      tuf_root_sha256: sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
      valid_from_unix_seconds: 1770000000
      valid_until_unix_seconds: 1772592000
      high_risk_change_classes:
        - auth_scopes
        - signing_key_cleanup
        - signing_key_rotation
      signers:
        "1111111111111111111111111111111111111111111111111111111111111111":
          kid: "1111111111111111111111111111111111111111111111111111111111111111"
          enabled: true
        "2222222222222222222222222222222222222222222222222222222222222222":
          kid: "2222222222222222222222222222222222222222222222222222222222222222"
          enabled: true
      roles:
        - name: config-admin
          threshold: 2
          signer_kids:
            - "1111111111111111111111111111111111111111111111111111111111111111"
            - "2222222222222222222222222222222222222222222222222222222222222222"
          allowed_change_classes:
            - public_metadata

Governed config apply requires antirollback_state_path and local_approval_state_path, which must point to durable local state such as a mounted volume. break_glass_rate_limit is the trusted local rolling-window policy used for break-glass apply requests; when omitted it defaults to one accepted request per rate-limit identity per hour. Registry Relay fails closed for apply when required local state is absent, unreadable, stale, or inconsistent; verify and dry-run remain available.

Signed apply also requires at least one local accepted_roots entry that authorizes every change class in the signed target metadata. Registry Relay uses the local root only after TUF target verification succeeds. Verified TUF targets-role signature key IDs, not target-declared custom metadata, must satisfy one role threshold per change class. Inline admin YAML can be used for verify/dry-run checks, but apply requires a signed local TUF target and never treats raw inline YAML as signed governance input.
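
The "one role threshold per change class" rule can be illustrated with a short check. This is a sketch under stated assumptions: the Role type and function name are hypothetical, the signer key IDs are assumed to be already verified by TUF, and per-signer enabled flags are not modeled.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Sequence


@dataclass(frozen=True)
class Role:
    """Hypothetical in-memory view of one roles entry."""
    name: str
    threshold: int
    signer_kids: FrozenSet[str]
    allowed_change_classes: FrozenSet[str]


def unauthorized_change_classes(
    change_classes: Iterable[str],
    verified_kids: Iterable[str],
    roles: Sequence[Role],
) -> List[str]:
    """Return the change classes that no role authorizes at its threshold."""
    verified = set(verified_kids)
    missing = []
    for change_class in change_classes:
        satisfied = any(
            change_class in role.allowed_change_classes
            and len(verified & role.signer_kids) >= role.threshold
            for role in roles
        )
        if not satisfied:
            missing.append(change_class)
    return missing
```

With the config-admin role from the example above (threshold 2, public_metadata only), a bundle signed by a single key, or one that also claims auth_scopes, would leave at least one change class unauthorized.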

For TUF root rotation, add the new final tuf_root_sha256 as another local accepted_roots entry before applying bundles that verify through the rotated root. valid_from_unix_seconds and valid_until_unix_seconds are optional local bounds for overlap windows. Omit them for an indefinite local root; set them when old and new roots must both authorize bundles only during a planned transition window. Expired or not-yet-valid roots fail authorization even when the TUF metadata and signer quorum are otherwise valid.
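
The optional validity bounds behave like a time window around each accepted root. A minimal sketch, assuming both bounds are inclusive (the text above does not say whether they are) and using a hypothetical function name:

```python
from typing import Optional


def root_is_active(
    now_unix_seconds: int,
    valid_from_unix_seconds: Optional[int] = None,
    valid_until_unix_seconds: Optional[int] = None,
) -> bool:
    """Return True when the local root may authorize bundles at `now`.

    Omitted bounds mean the root is valid indefinitely on that side.
    Inclusive comparison is an assumption of this sketch.
    """
    if valid_from_unix_seconds is not None and now_unix_seconds < valid_from_unix_seconds:
        return False  # not yet valid
    if valid_until_unix_seconds is not None and now_unix_seconds > valid_until_unix_seconds:
        return False  # expired
    return True
```

During a planned rotation, the old and new roots would both return True only inside the overlap of their two windows.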

catalog:
  title: Internal Government Registry Relay
  base_url: https://data.example.gov
  publisher: Ministry of Digital Government
  participant_id: did:web:data.example.gov
vocabularies:
  psc: https://publicschema.org/
  m8g: http://data.europa.eu/m8g/

base_url is used in generated catalog links, OpenAPI servers, and provenance subject URIs. participant_id is optional and defaults from the catalog base URL when omitted.

Vocabulary prefixes let entity fields and dataset metadata use compact semantic references such as psc:concepts/Person.

metadata:
  source:
    path: ./metadata.yaml

metadata.source.path points at a portable metadata manifest. Relative paths are resolved from the runtime config file. At startup, Registry Relay compiles the manifest and validates that runtime datasets, entities, fields, filters, and relationships are present in the metadata model. Add metadata.source.digest: sha256:<digest> when the deployment must pin the exact reviewed manifest.

| Mode | Required config | Digest rule | Delivery |
| --- | --- | --- | --- |
| Simple local | metadata.source.path | Optional | Local file read at startup |
| Pinned local | metadata.source.path, metadata.source.digest | Must match the local manifest | Local file read at startup |
| Governed | config_trust, metadata.source.path, metadata.source.digest | Required before startup and apply | Signed config target plus signed metadata target; optional signed package index when package_digest is claimed |

Keep operational details in this runtime config: sources, tables, physical columns, scopes, filters, aggregates, standards adapters, ingest, and refresh. Keep standard-facing meaning in the manifest: catalog, datasets, entities, fields, constraints, vocabularies, codelists, profiles, conformance claims, and descriptive ODRL policy metadata.

See metadata.md for the manifest schema, static publication, and the metadata.manifest.* / runtime.binding.* startup error codes.

ODRL policy belongs in the portable metadata manifest, not in runtime dataset bindings. A dataset policy block is published as an odrl:Offer for discovery and review evidence only. It does not change API-key scopes, OIDC authorization, row filtering, evidence verification, SP DCI behavior, or any other runtime access decision.

metadata:
  source:
    path: ./disability_registry.metadata.yaml

# In disability_registry.metadata.yaml:
datasets:
  - id: disability_registry
    policy:
      uid: https://demo.example.gov/datasets/disability_registry#illustrative-offer
      assigner: did:web:social-affairs.demo.example.gov
      permissions:
        - action: odrl:use
          constraints:
            - left_operand: odrl:purpose
              operator: odrl:isA
              right_operand:
                iri: https://demo.example.gov/purpose/disability-benefit-eligibility
          duties:
            - action: odrl:attribute
      prohibitions:
        - action: odrl:sell

The demo policy IRIs under demo.example.gov are hypothetical examples for catalog consumers. They are not official policy, legal advice, or a declaration that a client has been approved to use the data.

SP DCI (the Social Protection Digital Convergence Initiative) sync adapters are optional and feature-gated. Build with --features spdci-api-standards to enable them. Without that feature, any standards.spdci config is rejected with spdci.config.feature_disabled.

The adapter does not add new storage semantics. Configure a normal Registry Relay entity, often backed by an XLSX worksheet, then bind the SP DCI sync routes to it:

standards:
  spdci:
    disability_registry:
      dataset: disability_registry
      entity: disabled_person
      query_key: member.member_identifier
      query_field: id
      disabled_status_field: disability_status
      disabled_positive_values: [approved, yes]
    registries:
      dr:
        dataset: disability_registry
        entity: disabled_person
        registry_type: ns:org:RegistryType:DR
        record_type: spdci-extensions-dci:DisabledPerson
        identifiers:
          DISABILITY_ID: id
          MEMBER_ID: id
        expression_fields:
          disability_status: disability_status
          disability_details.impairment_type: impairment_type

When enabled and configured, Registry Relay serves these SP DCI sync endpoints on the protected data-plane listener:

POST /dci/{registry}/registry/sync/search
POST /dci/{registry}/registry/sync/disabled
POST /dci/{registry}/registry/sync/get-disability-details
POST /dci/{registry}/registry/sync/get-disability-support

For sync/search, the {registry} segment selects any named standards.spdci.registries entry such as dr, sr, crvs, or fr, which lets one listener host multiple DCI registry APIs without path ambiguity. The disabled, get-disability-details, and get-disability-support routes are Disability Registry-specific and resolve only when the named registry entry points at the same dataset/entity as standards.spdci.disability_registry. The async /registry/search, subscribe, callback, and transaction-status APIs are intentionally not implemented by this sync adapter.

For generic sync search, identifiers maps DCI idtype-value query types to entity fields. expression_fields maps DCI expression or predicate attribute names to entity fields. Mapped fields must be exposed entity fields and allowed filters. The adapter currently supports idtype-value, expression $and with eq, in, ge, and le, and predicate conditions joined with and.

query_key is read from message.disabled_criteria.query in the SP DCI request envelope. It may be represented as a literal dotted JSON key ("member.member_identifier") or as nested objects ({"member": {"member_identifier": ...}}). query_field must be an allowed entity filter because the adapter delegates reads to the normal entity query engine.
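
The two accepted query_key representations can be resolved with one lookup. The sketch below is illustrative only: the function name is hypothetical, and checking the literal dotted key before the nested form is an assumption about precedence.

```python
from typing import Any, Mapping, Optional


def lookup_query_value(query: Mapping[str, Any], query_key: str) -> Optional[Any]:
    """Find query_key in a disabled_criteria.query object.

    Accepts either a literal dotted key ("member.member_identifier")
    or nested objects ({"member": {"member_identifier": ...}}).
    """
    # Literal dotted key, checked first (assumed precedence).
    if query_key in query:
        return query[query_key]
    # Nested objects: walk one segment at a time.
    node: Any = query
    for part in query_key.split("."):
        if not isinstance(node, Mapping) or part not in node:
            return None
        node = node[part]
    return node
```

Both `{"member.member_identifier": "A1"}` and `{"member": {"member_identifier": "A1"}}` resolve to the same value, which would then be matched against query_field.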

For /dci/{registry}/registry/sync/disabled, the caller needs the entity evidence_verification_scope. Generic search, details, and support need the entity read_scope. API-key authentication is still Registry Relay’s normal auth layer. If a registry entry uses response_mapping_path, the binary must also be built with --features standards-cel-mapping; otherwise config validation fails with spdci.config.mapping_feature_disabled.

auth:
  mode: api_key
  api_keys:
    - id: program_system
      fingerprint:
        provider: env
        name: PROGRAM_SYSTEM_API_KEY_HASH
        commitment: sha256:0000000000000000000000000000000000000000000000000000000000000000
      scopes:
        - social_registry:metadata
        - social_registry:rows

The YAML stores committed fingerprint references, never raw API keys. Each env var value must be:

sha256:<64 lowercase hex chars>

Generate a raw key, its fingerprint, and the matching commitment:

registry-relay generate-api-key --id program_system

The command emits four shell-friendly lines:

api_key_id=program_system
api_key=<send-this-raw-key-to-the-client>
fingerprint=sha256:<store-this-in-the-secret-store>
commitment=sha256:<paste-this-into-config>

Store the emitted fingerprint in the platform secret store under the configured fingerprint.name. Paste the emitted commitment into fingerprint.commitment. Give the raw key only to the authorized client.
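
Provisioning scripts can split the four emitted lines into the values that go to the client, the secret store, and the config. A minimal sketch with a hypothetical helper name; it assumes each line is a plain key=value pair as shown above.

```python
from typing import Dict

EXPECTED_KEYS = {"api_key_id", "api_key", "fingerprint", "commitment"}


def parse_generated_key(output: str) -> Dict[str, str]:
    """Parse the key=value lines emitted by generate-api-key."""
    values: Dict[str, str] = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("expected key=value line")
        values[key] = value
    missing = EXPECTED_KEYS - values.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return values
```

The fingerprint value goes to the secret store under fingerprint.name, the commitment value goes into the YAML, and the api_key value goes only to the client. Avoid logging the parsed dictionary, since it contains the raw key.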

Worked standalone example, using demo_client and API_KEY_HASH:

api_key_id=demo_client
api_key=registry-relay-standalone-example-key-0001
fingerprint=sha256:db3f2a02c6ead9bf0387e8a97ec090a549daa46610ca87bd4b651631b2411def
commitment=sha256:1ee555b85da34dced897bc690053aaaedd4716f4b2972d556fa688b64ff55213

export API_KEY_HASH='sha256:db3f2a02c6ead9bf0387e8a97ec090a549daa46610ca87bd4b651631b2411def'

auth:
  mode: api_key
  api_keys:
    - id: demo_client
      fingerprint:
        provider: env
        name: API_KEY_HASH
        commitment: sha256:1ee555b85da34dced897bc690053aaaedd4716f4b2972d556fa688b64ff55213
      scopes:
        - people:metadata
        - people:rows

Do not reuse the example raw key in a real deployment.

Set auth.mode: oidc to verify bearer JWTs against an external OpenID Connect / OAuth2 IdP. Registry Relay is a resource server: it validates inbound tokens against the IdP’s JWKS but never mints, refreshes, or stores tokens. A given deployment runs in exactly one auth mode at a time; mixed-mode operation is not supported.

OIDC field names follow the shared Registry service runtime configuration conventions. Removed pre-convention names are rejected before deserialization with an error naming the replacement field.

auth:
  mode: oidc
  oidc:
    issuer: https://idp.example.gov
    audiences:
      - registry-relay
    discovery_url: https://idp.example.gov/.well-known/openid-configuration
    allowed_algorithms:
      - RS256
    jwks_cache_ttl: 10m
    leeway: 60s
    scope_claim: scope
    scope_map:
      "role:social-registry-reader": "social_registry:rows"
    scope_object_required_keys: []
    allowed_clients: []
    allowed_token_types:
      - JWT
      - at+jwt

A full drop-in alternative to config/example.yaml lives at config/example.oidc.yaml. It targets a local Zitadel instance and is what the integration test consumes.

| Field | Purpose |
| --- | --- |
| issuer | Compared verbatim against the JWT iss claim. Must match the IdP’s published issuer URL. |
| audiences | One or more accepted aud values. Tokens whose aud does not intersect this list are rejected. |
| jwks_url | Explicit JWKS endpoint. Exactly one of jwks_url and discovery_url must be set; the validator rejects configs that supply both or neither. |
| discovery_url | OIDC discovery document (.well-known/openid-configuration). The JWKS URL is resolved from jwks_uri at startup. |
| allow_dev_insecure_fetch_urls | Development-only opt-in for loopback HTTP issuer, discovery, and JWKS URLs. Defaults to false; non-loopback private and metadata IPs remain denied by the platform fetch policy. |
| allowed_algorithms | Signature algorithms accepted by the verifier. RS256, ES256, EdDSA. HS* and none are intentionally absent. |
| jwks_cache_ttl | Steady-state JWKS cache TTL. The cache also refreshes on unknown kid (rate-limited), so this is the rotation pickup latency, not the upper bound. |
| leeway | Clock skew tolerance on exp and nbf. Bounded at 5 minutes by validation. |
| scope_claim | Name of the JWT claim to read scopes from (the config field itself is always a single string; defaults to scope). The claim’s value in the token may be a space-separated string (RFC 8693 / RFC 9068), a JSON array of strings, or a JSON object whose keys are the scope names. The aud claim is rejected as a scope source because it is used only for token audience validation. Object-valued role keys grant scopes only when scope_object_required_keys names a key present in the role value and that nested value is active: true, a non-empty string, or a non-empty object/array containing an active value. |
| scope_map | Optional rename map applied before scope-based access checks. Adapt IdP role names to Registry Relay’s <dataset_id>:<level> shape. |
| scope_object_required_keys | Allowlist of keys that must appear inside object-valued role claim values before the role key is accepted. For Zitadel organization-scoped role objects, set this to the expected organization id key or keys. Defaults to empty, which means object-valued claims grant no scopes. String and array scope claims do not require this setting. |
| allowed_clients | Optional allowlist matched against the token’s azp (preferred) or client_id. Empty list means any client is accepted. |
| allowed_token_types | Accepted JOSE typ header values. Defaults to JWT and at+jwt (RFC 9068). ID tokens (id+jwt) are intentionally rejected by default, and tokens without typ are rejected by the shared verifier. |

discovery_url triggers a single discovery fetch at startup to resolve jwks_uri; a failure here aborts the binary so an operator sees the IdP wiring problem instead of a process that runs but silently rejects every token. The JWKS document itself is fetched lazily on first verify, so a transient JWKS outage at boot does not block startup. Production defaults require HTTPS; local loopback HTTP requires allow_dev_insecure_fetch_urls: true.

Registry Relay never mints or refreshes tokens. Operators are responsible for provisioning OIDC applications, machine users, and grant types on the IdP. The Principal’s principal_id is taken from the token’s sub (preferred), then client_id, then azp; auth_mode=oidc is recorded on every audit record.
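
The claim precedence for allowed_clients and principal_id can be sketched as two small helpers. These are hypothetical illustrations over already-verified claims, not the relay's verifier; treating an empty string as absent is an assumption.

```python
from typing import Any, Mapping, Optional, Sequence


def client_allowed(claims: Mapping[str, Any], allowed_clients: Sequence[str]) -> bool:
    """Apply the allowed_clients allowlist: azp preferred, then client_id."""
    if not allowed_clients:
        return True  # empty list means any client is accepted
    client = claims.get("azp") or claims.get("client_id")
    return client in allowed_clients


def principal_id(claims: Mapping[str, Any]) -> Optional[str]:
    """Pick the principal id: sub preferred, then client_id, then azp."""
    for name in ("sub", "client_id", "azp"):
        value = claims.get(name)
        if isinstance(value, str) and value:
            return value
    return None
```

Note the two lookups use different orders: the allowlist prefers azp, while the principal id prefers sub.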

Token verification failures map to specific auth.* codes so audit pipelines can distinguish IdP outages from bad tokens from policy denials:

| Code | HTTP | Meaning |
| --- | --- | --- |
| auth.missing_credential | 401 | No Authorization header |
| auth.malformed_credential | 401 | Wrong scheme, empty bearer, or unparseable JWT structure |
| auth.token_expired | 401 | exp claim is in the past (after leeway) |
| auth.token_not_yet_valid | 401 | nbf claim is in the future (after leeway) |
| auth.token_signature_invalid | 401 | JWKS key found but signature did not verify |
| auth.issuer_mismatch | 401 | iss claim does not match oidc.issuer |
| auth.audience_mismatch | 401 | aud claim does not intersect oidc.audiences |
| auth.kid_unknown | 401 | Header kid is absent from the JWKS even after one refresh |
| auth.algorithm_not_allowed | 401 | Header alg is not in the configured allowlist |
| auth.client_not_allowed | 403 | azp / client_id is not in the configured allowed_clients |
| auth.invalid_credential | 401 | JWT decode failure not covered by a more specific variant |
| auth.jwks_unavailable | 503 | JWKS fetch failed; Registry Relay cannot verify any token |

For a worked example of running Registry Relay against a local OIDC provider (using the project’s dev Zitadel stack), see development.md.

audit:
  sink: stdout
  format: jsonl
  hash_secret_env: REGISTRY_RELAY_AUDIT_HASH_SECRET
  chain: true
  include_health: false

Supported sinks:

audit:
  sink: stdout
  format: jsonl
  hash_secret_env: REGISTRY_RELAY_AUDIT_HASH_SECRET

audit:
  sink: file
  format: jsonl
  hash_secret_env: REGISTRY_RELAY_AUDIT_HASH_SECRET
  path: /var/log/registry-relay/audit.jsonl
  rotate:
    max_size_mb: 100
    max_files: 14

audit:
  sink: syslog
  format: jsonl
  hash_secret_env: REGISTRY_RELAY_AUDIT_HASH_SECRET

hash_secret_env is required at runtime and must name an environment variable containing at least 32 bytes of deployment-specific random secret material. Startup fails closed when the variable is unset, empty, or weak.

Audit output uses registry-platform-audit envelopes with prev_hash and record_hash on every record. These fields detect ordering gaps and accidental corruption in retained logs, but they do not protect against an actor who can rewrite the audit sink. Use an append-only external sink or independent tail-hash anchoring when stronger integrity is required. chain is retained in config for compatibility with older deployments, but platform audit envelopes are always chained.
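
To show how chained prev_hash and record_hash fields expose gaps, here is a toy verifier. Everything about the hash input is an assumption made for illustration: the real registry-platform-audit envelope layout and hashing scheme are not specified in this guide, and the body field and function names are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Any, Dict, List, Optional


def compute_record_hash(secret: bytes, prev_hash: str, body: Dict[str, Any]) -> str:
    """Toy scheme (assumed): HMAC-SHA256 over prev_hash plus canonical JSON."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    message = prev_hash.encode("utf-8") + b"\n" + canonical.encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def first_broken_link(secret: bytes, records: List[Dict[str, Any]]) -> Optional[int]:
    """Return the index of the first record that breaks the chain, or None."""
    expected_prev: Optional[str] = None
    for index, record in enumerate(records):
        # A gap or reordering shows up as a prev_hash mismatch.
        if expected_prev is not None and record["prev_hash"] != expected_prev:
            return index
        # Corruption shows up as a record_hash mismatch.
        recomputed = compute_record_hash(secret, record["prev_hash"], record["body"])
        if not hmac.compare_digest(recomputed, record["record_hash"]):
            return index
        expected_prev = record["record_hash"]
    return None
```

As the paragraph above notes, a check like this detects missing or corrupted records but not a rewrite by someone who holds the secret and can replace the whole log.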

A normally booted relay always reports keyed integrity hmac in its posture because startup requires the audit hash secret (hash_secret_env); the none value appears only in dev or test configurations that build the posture without that secret.

Audit records are separate from operational logs, which go to stderr as readable text by default. Set REGISTRY_RELAY_LOG_FORMAT=json or REGISTRY_RELAY_LOG_FORMAT=jsonl to emit operational logs as JSON Lines for log collectors or redirected files.

write_policy selects what happens when an audit record cannot be written (for example the sink is unreachable or the disk is full):

audit:
  sink: file
  hash_secret_env: REGISTRY_RELAY_AUDIT_HASH_SECRET
  path: /var/log/registry-relay/audit.jsonl
  write_policy: availability_first # availability_first | fail_closed

  • availability_first (default): an audit write failure is logged and the request returns its original outcome unchanged. The deployment stays available even when audit is degraded. This is the historical behavior.
  • fail_closed: a request whose audit record cannot be written fails with HTTP 503 and the stable error code audit.write_failed (application/problem+json). No request outcome is returned without a durable audit record. Choose this when audit completeness is a hard requirement.

The policy applies to every audited route. Per-route-family selection is not configurable. The selected policy is reported truthfully as the write_policy fact in the operations posture audit block, so a deployment cannot claim a stronger guarantee than it runs.

The deployment block lets an operator declare the assurance level a deployment claims. The profile is never inferred from hostname, environment, or network position: it is an explicit statement. Each profile binds a set of gates that check the running configuration and contribute findings at a defined severity.

deployment:
  profile: production # local | hosted_lab | production | evidence_grade
  evidence:
    ingress_rate_limit: true # operator asserts a gateway enforces rate limiting
    api_key_rotation: true # operator asserts an API-key rotation process exists
  waivers:
    - finding: relay.openapi.public
      reason: public API catalog is intentional for this deployment
      expires: 2026-12-31

The deployment block is optional. When it is omitted, no gates bind and the deployment keeps its existing behavior exactly; the posture reports a single deployment.profile_undeclared warning so the choice is visible. An unknown profile value is rejected at startup.

Each gate maps to one of four severities per profile:

  • startup_fail: the process refuses to start. Never waivable.
  • readiness_fail: the readiness endpoint reports not-ready; the process keeps running.
  • finding_error / finding_warn: a posture finding only.

The four profiles escalate from local (binds no hard gates) through hosted_lab and production to evidence_grade (the strictest). For example, evidence_grade requires a signed, governed config bundle: running it from a plain local YAML file trips a startup_fail gate (relay.config.unsigned) and the process refuses to start.

Some controls live outside the relay and cannot be observed by the process (for example ingress rate limiting enforced by a gateway, or an API-key rotation process). The evidence flags let the operator assert those controls are in place. Each flag defaults to false, which leaves the corresponding gate active until the operator declares the control.

A triggered, waivable finding can be suppressed by a waiver that names the finding id and carries a free-text reason and a mandatory expiry date (YYYY-MM-DD):

deployment:
  profile: hosted_lab
  waivers:
    - finding: relay.ingress.rate_limit_missing
      reason: rate limiting is handled by the lab gateway
      expires: 2026-09-30

A waived finding reports status waived instead of its severity effect. Once the expiry date passes, the waiver stops suppressing the finding and the posture additionally raises deployment.waiver_expired. The expiry date is mandatory; reasons must be non-empty and must not contain secrets. startup_fail gates are never waivable.
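
The waiver lifecycle described above can be summarized in a few lines. This is a sketch with a hypothetical function name; it assumes the waiver still applies on the expiry date itself and lapses the day after, which is one reading of "once the expiry date passes".

```python
from datetime import date
from typing import List, Tuple


def apply_waiver(severity: str, expires: date, today: date) -> Tuple[str, List[str]]:
    """Return (reported status, extra finding ids) for a waived finding."""
    # startup_fail gates are never waivable, so the waiver has no effect.
    if severity == "startup_fail":
        return severity, []
    # Active waiver: the finding reports "waived" instead of its severity.
    if today <= expires:
        return "waived", []
    # Lapsed waiver: the severity applies again and an extra finding is raised.
    return severity, ["deployment.waiver_expired"]
```

For the hosted_lab example above, the rate-limit finding would report waived through 2026-09-30 and revert to its bound severity afterwards.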

Waiver reasons are only visible in the restricted posture tier; the default tier reports finding id, severity, and status but not the reason.

| Finding id | hosted_lab | production | evidence_grade |
| --- | --- | --- | --- |
| relay.admin.public_exposure | error | readiness_fail | startup_fail |
| relay.openapi.public | warn | error | error |
| relay.ingress.rate_limit_missing | warn | error | error |
| relay.oidc.client_allowlist_empty | warn | error | readiness_fail |
| relay.auth.api_key_no_rotation_evidence | warn | error | error |
| relay.config.unsigned | warn | error | startup_fail |
| relay.audit.best_effort | (not bound) | warn | readiness_fail |
| relay.audit.sink_missing | error | readiness_fail | startup_fail |

The current deployment profile, its findings, and active waivers are reported under deployment in the operations posture (GET /admin/v1/posture).

Each dataset combines private storage tables with public entities:

datasets:
  - id: social_registry
    title: Social Registry
    description: Registry of households participating in Program X
    owner: Ministry of Social Affairs
    sensitivity: personal
    access_rights: restricted
    update_frequency: monthly
    conforms_to:
      - psc:concepts/Person
    defaults:
      materialization: snapshot
    tables: []
    entities: []

sensitivity, access_rights, and update_frequency are catalog metadata. They also make review conversations concrete; do not leave them vague in production configs. Allowed values:

  • sensitivity: public, internal, personal, confidential, or secret.
  • access_rights: public, restricted, or non_public.
  • update_frequency: continuous, daily, weekly, termly, monthly, quarterly, annual, irregular, as_needed, or unknown.

defaults is optional. It may provide materialization and refresh defaults for tables in the same dataset. Source configuration stays table-level.

Sources are configured on each private table. File sources read CSV, XLSX, or Parquet data:

source:
  type: file
  path: ./data/social_registry.xlsx
  format:
    xlsx:
      sheet: Individuals
      header_row: 1
      data_range: A1:E100000

For CSV files, set format.csv.header_row: 1 when the first row contains column names. For XLSX files, header_row and data_range can be used when a worksheet has notes or title rows around the rectangular table. Source configuration is table-local: put file/database settings and format hints under each tables[].source.

Postgres snapshot and live table sources are supported. Credentials are never stored in YAML:

source:
  type: postgres
  connection_env: SOCIAL_REGISTRY_DATABASE_URL
  table:
    schema: public
    name: individuals
  change_token_sql: "select max(updated_at)::text from public.individuals"

connection_env is the environment variable name containing the connection string. Validation and logs may mention the env var name but must not read or print its value. The connection string must set sslmode=require; missing sslmode, sslmode=prefer, and sslmode=disable are rejected when the connector reads the environment variable. The native TLS connector validates the server certificate and hostname against the system trust store. Use read-only database credentials. Registry Relay opens read-only Postgres sessions for live scans, but credentials must enforce the same boundary at the database. table and query are mutually exclusive; prefer structured table configs for production.
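
Operators can check the sslmode requirement before deploying. The sketch below is a hypothetical preflight helper, not the connector's validation: it assumes a URL-style connection string (keyword/value strings are not handled) and accepts only sslmode=require, as stated above.

```python
from urllib.parse import parse_qs, urlsplit


def sslmode_is_acceptable(connection_url: str) -> bool:
    """Return True only when the URL sets sslmode=require exactly once."""
    query = parse_qs(urlsplit(connection_url).query)
    # Missing, prefer, disable, and repeated conflicting values all fail.
    return query.get("sslmode") == ["require"]
```

Run a check like this against the secret in a controlled environment only, and report the env var name rather than the connection string itself.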

Snapshot ingest reads Postgres through COPY (SELECT ...) TO STDOUT WITH CSV HEADER, then applies the same declared-schema coercion and validation as CSV files. The exported snapshot is bounded by server.max_source_file_bytes. For table sources, Registry Relay projects the declared schema fields from the table and casts them to CSV-friendly values. Extra database columns are ignored. For query sources, write a single SELECT or WITH statement without semicolons; public request input is never interpolated into SQL.

Live materialization is supported for structured table sources only. Each DataFusion scan opens a read-only Postgres session and exports data from the configured table. Simple column projection is pushed into the generated COPY query only when the scan has no filters. Filter-free limited scans may use the requested limit as a physical fetch bound; filtered scans, joins, and semantic limit enforcement remain gateway-side and may read up to the configured live row cap before local filtering. This keeps the live path bounded and safe without accepting caller-controlled SQL. Live row responses do not advertise snapshot-style strong validators or cursor version tokens, because upstream rows can change between requests without a Registry Relay ingest event. Live exports are also bounded by server.max_source_file_bytes. Use connect_timeout, query_timeout, live_max_connections, and live_max_rows to bound upstream behavior.

For production live sources, keep the contract deliberately narrow:

tables:
  - id: individuals_table
    materialization: live
    primary_key: individual_id
    refresh:
      mode: manual
    schema:
      strict: true
      fields:
        - name: individual_id
          type: string
          nullable: false
        - name: household_id
          type: string
          nullable: false
        - name: updated_at
          type: timestamp
          nullable: true
    source:
      type: postgres
      connection_env: SOCIAL_REGISTRY_DATABASE_URL
      table:
        schema: public
        name: individuals
      connect_timeout: 5s
      query_timeout: 30s
      live_max_connections: 8

The connection string must include sslmode=require and point to a read-only database role that can SELECT only the configured table or view. Do not use query sources, change_token_sql, or refresh.mode: mtime with live materialization; those are snapshot-only controls. Declared schema fields are the exported contract, and extra database columns are ignored unless an entity query needs a full local scan to evaluate filters.

Minimal source-only form:

source:
  type: postgres
  connection_env: SOCIAL_REGISTRY_DATABASE_URL
  table:
    schema: public
    name: individuals
  connect_timeout: 5s
  query_timeout: 30s
  live_max_connections: 8

Supported Postgres field mappings are:

string -> text
integer -> bigint
number -> double precision
boolean -> boolean
date -> date
timestamp -> timestamptz rendered as RFC 3339 UTC text

Refresh modes:

refresh:
  mode: mtime
  interval: 60s

refresh:
  mode: interval
  interval: 1h

refresh:
  mode: manual
mtime reloads when the source change token changes. It is supported for file sources and for Postgres snapshot sources only when change_token_sql is configured. interval reloads on every interval. manual reloads only through the admin listener’s table reload route.
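The three refresh modes can be summarized as a decision made at each scheduled check. The sketch below is illustrative Python with a hypothetical function name. It assumes the check runs once per configured interval and that manual reloads arrive separately through the admin route.

```python
from typing import Optional


def reload_on_tick(
    mode: str,
    previous_token: Optional[str],
    current_token: Optional[str],
) -> bool:
    """Decide whether a scheduled check should trigger a reload."""
    if mode == "mtime":
        # Reload only when the source change token has changed.
        return current_token != previous_token
    if mode == "interval":
        # Reload on every interval, regardless of the token.
        return True
    if mode == "manual":
        # Never reload on a schedule; the admin route is the only trigger.
        return False
    raise ValueError(f"unknown refresh mode: {mode}")
```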

Tables are private storage resources. Their ids do not appear in public URLs.

tables:
  - id: individuals_table
    materialization: snapshot
    source:
      type: file
      path: ./data/social_registry.xlsx
      format:
        xlsx:
          sheet: Individuals
    refresh:
      mode: mtime
      interval: 1h
    primary_key: individual_id
    schema:
      strict: true
      fields:
        - name: individual_id
          type: string
          nullable: false
        - name: payment_amount
          type: number
          nullable: true
          unit: EUR

Supported formats are csv, xlsx, and parquet. If format is omitted, the loader infers from the source file extension where possible.

materialization may be snapshot or live. File sources support snapshot. Postgres sources support snapshot; Postgres structured table sources also support live.

Registry Relay derives datasource capabilities from source.type and materialization. Operators do not configure these flags directly.

| Source | Materialization | Filters | Projection | Limit | Validators and cursors | Provenance |
| --- | --- | --- | --- | --- | --- | --- |
| file | snapshot | gateway-side | gateway-side | gateway-side | strong snapshot tokens | snapshot-backed |
| postgres table or query | snapshot | gateway-side | gateway-side | gateway-side | strong snapshot tokens | snapshot-backed |
| postgres table | live | gateway-side | Postgres column pushdown for filter-free scans, otherwise gateway-side | gateway-side | no strong snapshot tokens | not snapshot-backed |

Unsupported combinations are rejected at config load: file live, Postgres live with a configured query, and live with mtime refresh. Postgres query sources stay snapshot-only so operator SQL is executed only during controlled ingest or refresh, never per public request. Future datasource connectors must follow the same convention: only generated SQL over structured table metadata may receive pushdown, and unsupported operations must fall back to gateway-side execution or be rejected explicitly.
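The rejected combinations can be expressed as a short check, which may help when reviewing configs before deployment. This is a sketch in Python with a hypothetical function name and error wording. The relay's own validator and messages may differ.

```python
from typing import List


def check_materialization(
    source_type: str,
    materialization: str,
    has_query: bool = False,
    refresh_mode: str = "manual",
) -> List[str]:
    """Return the documented reasons a table config would be rejected."""
    errors: List[str] = []
    if materialization not in ("snapshot", "live"):
        errors.append("materialization must be snapshot or live")
        return errors
    if materialization == "live":
        if source_type == "file":
            errors.append("file sources support snapshot only")
        if source_type == "postgres" and has_query:
            errors.append("postgres query sources are snapshot-only")
        if refresh_mode == "mtime":
            errors.append("mtime refresh is a snapshot-only control")
    return errors
```

An empty list means the combination is one of the supported rows in the capability table.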

At startup, Registry Relay logs one ingest.datasource_capabilities event per configured table. For Postgres live scans, the admin listener’s /metrics route also exports low-cardinality live scan metrics for scan duration, concurrency wait time, exported rows, and exported bytes. These metrics intentionally do not include dataset ids, table names, SQL, env vars, request ids, or row values.

Field types:

string, number, integer, boolean, date, timestamp

Set sensitive: true on source or entity fields whose query values must be redacted or deterministically hashed in audit records. This flag is audit-only in beta: it does not hide a field from API responses and does not grant or deny read access.
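As an illustration of what deterministic hashing means for audit records, the sketch below keys the hash with a secret so that equal query values produce equal audit tokens without recording the raw value. HMAC-SHA256, the function name, and the hmac-sha256: prefix are assumptions made for this example. The document does not specify the relay's algorithm or record format.

```python
import hashlib
import hmac


def audit_value(value: str, sensitive: bool, secret: bytes) -> str:
    """Return the value to record in an audit entry (hypothetical helper)."""
    if not sensitive:
        # Non-sensitive query values are recorded as-is in this sketch.
        return value
    # Keyed hash: stable for the same secret and value, not reversible
    # without the secret. The algorithm choice is an assumption.
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "hmac-sha256:" + digest
```

Because the hash is deterministic, two audit records for the same identifier can still be correlated. The API response itself is unaffected, consistent with the audit-only behavior described above.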

Entities are the public REST resources:

entities:
  - name: individual
    title: Individual
    description: A person enrolled in Program X
    table: individuals_table
    concept_uri: psc:concepts/Person
    fields:
      - name: id
        from: individual_id
        sensitive: true
      - name: payment_amount
        from: payment_amount
    relationships:
      - name: household
        kind: belongs_to
        target: household
        foreign_key: household_id
    access:
      metadata_scope: social_registry:metadata
      aggregate_scope: social_registry:aggregate
      read_scope: social_registry:rows
      evidence_verification_scope: social_registry:evidence_verification
    api:
      default_limit: 100
      max_limit: 1000
      require_purpose_header: true
      required_filters:
        - id
      allowed_filters:
        - field: id
          ops: [eq, in]
      allowed_expansions:
        - household
    publicschema:
      target: Person
      mapping_path: mappings/individual-person.publicschema.yaml
      schema_validation_path: ../publicschema.org/dist/schemas/Person.schema.json

When fields is present, only listed fields are exposed. When it is omitted, every table column is exposed. For sensitive datasets, prefer an explicit field list. Use entity read_scope, required filters, purpose-header requirements, and explicit field projection for exposure control; sensitive: true controls audit redaction only.
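The field-list rule can be sketched as a projection over one table row. This is illustrative Python with a hypothetical function name. It assumes that a field without from reads the table column of the same name, which the document does not state.

```python
from typing import Any, Dict, List, Optional


def expose_row(
    row: Dict[str, Any],
    fields: Optional[List[Dict[str, str]]],
) -> Dict[str, Any]:
    """Project a table row into the public entity shape."""
    if fields is None:
        # No explicit list: every table column is exposed.
        return dict(row)
    # Explicit list: only listed fields, renamed from their source column.
    return {f["name"]: row[f.get("from", f["name"])] for f in fields}
```

For a row with individual_id, household_id, and payment_amount, the field list from the entity example yields only id and payment_amount, so household_id never reaches the response.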

Row-level authorization scopes are not supported. The row_scope resource setting is rejected by config parsing; model row exposure with dataset/entity read scopes, required filters, purpose headers, and projected fields instead.

Relationships are dataset-local in V1. Cross-dataset workflows must compose client-side with separate scoped calls and separate audit records.

Build with --features ogcapi-features to expose spatial entities through the protected /ogc/v1 surface. The feature does not add a top-level standards config block. Instead, opt in per entity with spatial:

spatial:
  collection_id: facilities
  title: Public facilities
  description: Public facility locations from the civic registry.
  geometry:
    kind: point
    longitude_field: lon
    latitude_field: lat
  crs: http://www.opengis.net/def/crs/OGC/1.3/CRS84
  datetime_field: updated_at
  max_bbox_degrees: 5.0
  max_geometry_vertices: 10000

Phase 1 supports kind: point and kind: geojson. Point longitude, point latitude, datetime, and bbox helper fields must be exposed entity fields with compatible types. kind: geojson may use optional precomputed bbox fields:

spatial:
  collection_id: parcels
  geometry:
    kind: geojson
    field: geometry
  crs: http://www.opengis.net/def/crs/OGC/1.3/CRS84
  bbox_fields:
    min_x: bbox_min_x
    min_y: bbox_min_y
    max_x: bbox_max_x
    max_y: bbox_max_y

Only CRS84 is accepted. wkt and wkb parse as reserved geometry kinds but are rejected by V1 validation. Collection ids default to the entity name and must be unique within a dataset. OGC discovery uses metadata scope; feature item reads use read_scope and preserve entity required filters, purpose-header requirements, projection, and audit behavior.
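For kind: point, a row's longitude and latitude fields map naturally onto a GeoJSON Point, where CRS84 fixes the axis order as longitude first. The sketch below is illustrative Python with a hypothetical function name. Dropping the coordinate fields from properties is an assumption, not documented relay behavior.

```python
from typing import Any, Dict


def point_feature(
    row: Dict[str, Any],
    longitude_field: str,
    latitude_field: str,
) -> Dict[str, Any]:
    """Build a GeoJSON Feature from an entity row with lon/lat fields."""
    lon = float(row[longitude_field])
    lat = float(row[latitude_field])
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError("coordinates outside the CRS84 range")
    # Assumption: coordinate fields are not repeated in properties.
    properties = {
        k: v for k, v in row.items()
        if k not in (longitude_field, latitude_field)
    }
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }
```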

Evidence offerings expose Registry Notary discovery metadata:

GET /metadata/evidence-offerings
GET /metadata/evidence-offerings/{offering_id}

Relay does not verify claims or evidence. registry-notary is the only claim/evidence verifier. The portable metadata manifest declares public offerings with access.kind: registry-notary, endpoint_url, discovery_url, and ruleset so clients can discover the Notary service that owns verification.

access:
  evidence_verification_scope: social_registry:evidence_verification

evidence_verification_scope remains a scope label for standards adapters and integrations that need to distinguish evidence-oriented access from row reads. It does not enable a Relay-local verification endpoint.

The entity publicschema block requires --features publicschema-cel. When the block is present, entity-record VC issuance uses the mapping file to produce a PublicSchema.org credential subject instead of the default entity JSON shape.

publicschema:
  target: Person # required; PublicSchema concept name
  mapping_path: mappings/individual-person.publicschema.yaml # required; CEL mapping document
  schema_validation_path: ../publicschema.org/dist/schemas/Person.schema.json # optional; validates subject before signing
  context_url: https://publicschema.org/ctx/draft.jsonld # optional; overrides default context
  schema_url: https://publicschema.org/schemas/Person.schema.json # optional; overrides default credentialSchema.id
  credential_type: Person # optional; overrides default VC type[1]

| Field | Default | Notes |
| --- | --- | --- |
| target | (required) | PublicSchema concept name; drives credential_type and schema_url defaults |
| mapping_path | (required) | Path to a CEL mapping YAML document; compiled at startup |
| schema_validation_path | absent | Local JSON Schema; when set, every mapped subject is validated before signing |
| context_url | https://publicschema.org/ctx/draft.jsonld | JSON-LD context URL in the issued VC |
| schema_url | https://publicschema.org/schemas/{target}.schema.json | credentialSchema.id in the issued VC |
| credential_type | {target} | type[1] value in the issued VC |
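The defaults in the table can be read as a simple resolution step over the publicschema block. The sketch below is illustrative Python with a hypothetical function name. It applies the documented defaults and does not compile mappings or validate schemas.

```python
from typing import Any, Dict

DEFAULT_CONTEXT_URL = "https://publicschema.org/ctx/draft.jsonld"


def resolve_publicschema(block: Dict[str, Any]) -> Dict[str, Any]:
    """Fill in the documented defaults for an entity publicschema block."""
    for key in ("target", "mapping_path"):
        if not block.get(key):
            raise ValueError(f"publicschema.{key} is required")
    target = block["target"]
    return {
        "target": target,
        "mapping_path": block["mapping_path"],
        # Absent by default: no subject validation before signing.
        "schema_validation_path": block.get("schema_validation_path"),
        "context_url": block.get("context_url", DEFAULT_CONTEXT_URL),
        "schema_url": block.get(
            "schema_url",
            f"https://publicschema.org/schemas/{target}.schema.json",
        ),
        "credential_type": block.get("credential_type", target),
    }
```

With only target: Person and a mapping_path set, the resolved block carries credential_type Person and schema_url https://publicschema.org/schemas/Person.schema.json.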

See provenance.md for CEL context variables, issuance behavior, audit records, and the build and test commands for this feature.

Aggregates are declared on datasets and name their source entity:

aggregates:
  - id: by_municipality
    title: Individuals by municipality
    description: Number of individuals by municipality
    source_entity: individual
    default_group_by:
      - municipality_code
    dimensions:
      - id: municipality_code
        label: Municipality
        field: municipality_code
    indicators:
      - id: individual_count
        label: Individuals
        function: count
        column: id
        unit_measure: people
    allowed_filters:
      - field: municipality_code
        ops: [eq, in]
      - field: enrolled_on
        ops: [gte, lte, between]
    temporal_field: enrolled_on
    disclosure_control:
      min_group_size: 5
      suppression: omit

Supported aggregate functions include the configured V1 set used by tests and examples, such as count, sum, and avg. The runtime config key remains indicators for compatibility; public aggregate APIs expose these configured series as measures. temporal_field is optional; when present, native aggregate temporal.from and temporal.to are translated into the declared range-capable allowed filter for that source-entity field. Dataset measure and dimension discovery is derived from these aggregate declarations, so keep ids stable and labels consumer-friendly. Keep disclosure thresholds explicit and reviewable.
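To make the disclosure threshold concrete, the sketch below applies min_group_size to grouped counts. It is illustrative Python with a hypothetical function name. It assumes that suppression: omit drops any group whose count is below the threshold, and it does not model other suppression values.

```python
from typing import Any, Dict, List


def apply_disclosure_control(
    groups: List[Dict[str, Any]],
    count_key: str,
    min_group_size: int,
    suppression: str = "omit",
) -> List[Dict[str, Any]]:
    """Drop groups that are too small to publish (assumed semantics)."""
    if suppression != "omit":
        raise ValueError("this sketch only models suppression: omit")
    # Keep a group only when its count meets the configured threshold.
    return [g for g in groups if g[count_key] >= min_group_size]
```

With min_group_size: 5, a municipality with 3 individuals would be left out of the response, while one with exactly 5 would be kept.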

Spatial EDR exposure of aggregates is opt-in and requires building with --features ogcapi-edr.

aggregates:
  - id: by_admin_area
    description: Individuals by administrative area
    source_entity: individual
    # ...dimensions, indicators, disclosure_control as normal...
    spatial:
      mode: admin_area
      collection_id: by_admin_area # optional; defaults to aggregate id
      dimension: municipality_code # declared dimension id used to join geometry
      geometry_entity: municipality # entity name that holds geometry rows
      geometry_id_field: code # field in geometry_entity matching the dimension values
      geometry_field: geometry # geojson field in geometry_entity
      bbox_fields: # optional precomputed bbox fields in geometry_entity
        min_x: bbox_min_x
        min_y: bbox_min_y
        max_x: bbox_max_x
        max_y: bbox_max_y
      max_geometry_vertices: 10000 # optional; defaults to 10000

| Field | Default | Notes |
| --- | --- | --- |
| mode | (required) | Must be admin_area |
| collection_id | aggregate id | OGC collection identifier; must be unique within the dataset |
| dimension | (required) | Declared aggregate dimension id whose values are joined to geometry |
| geometry_entity | (required) | Entity that holds one geometry row per dimension value |
| geometry_id_field | (required) | Field in geometry_entity that matches dimension values |
| geometry_field | (required) | GeoJSON geometry field in geometry_entity |
| bbox_fields | absent | Optional precomputed bbox columns; same subkeys as entity spatial.bbox_fields |
| max_geometry_vertices | 10000 | Cap on GeoJSON vertices decoded from geometry_field |

geometry_entity must be an entity declared in the same dataset. geometry_id_field and geometry_field must be exposed entity fields with compatible types (string/integer for id, geojson-typed string for geometry). Only kind: geojson geometry is supported for spatial aggregates in V1.
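The admin_area join can be pictured as matching each aggregate row's dimension value to one geometry row. The sketch below is illustrative Python with a hypothetical function name. Comparing ids as strings and skipping rows without a matching geometry are assumptions, not documented relay behavior.

```python
import json
from typing import Any, Dict, List


def join_admin_areas(
    aggregate_rows: List[Dict[str, Any]],
    geometry_rows: List[Dict[str, Any]],
    dimension: str,
    geometry_id_field: str,
    geometry_field: str,
) -> List[Dict[str, Any]]:
    """Attach GeoJSON geometry to aggregate rows by dimension value."""
    # One geometry row per dimension value; ids compared as strings.
    by_id = {str(g[geometry_id_field]): g[geometry_field] for g in geometry_rows}
    features = []
    for row in aggregate_rows:
        raw = by_id.get(str(row[dimension]))
        if raw is None:
            continue  # assumption: rows without geometry are skipped
        features.append({
            "type": "Feature",
            # geometry_field holds GeoJSON serialized as a string.
            "geometry": json.loads(raw),
            "properties": dict(row),
        })
    return features
```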

Provenance (response-credential issuer configuration)

The provenance block is optional. When absent or enabled: false, the gateway behaves as a plain JSON service. When enabled, callers can opt in to signed response credentials (W3C VCDM 2.0 VC-JWT) with Accept: application/vc+jwt. V1 supports local Ed25519 signing from either a software env-var JWK or a file_watch JWK file.

The key is named provenance for compatibility; it governs the response-credential issuer (DID, signing key, claim validity, and accepted media types). These credentials are W3C VCDM 2.0 VC-JWT with a Registry Relay JSON-LD context; they are not W3C PROV-O.

See provenance.md for the full signer, DID, schema, context, and rotation contract.

  • Source files are read-only to the process.
  • cache_dir is writable and on a filesystem with enough space.
  • Every env-backed fingerprint.name exists in the runtime environment.
  • No raw key, fingerprint, private JWK, or full environment dump is logged.
  • Admin listener, if enabled, is private.
  • CORS origins are explicit.
  • Personal-data entities use explicit field projections.
  • Row and evidence-verification routes that need purpose tracking set require_purpose_header: true.
  • Sensitive identifier fields are marked sensitive: true where audit redaction is required.
  • Audit sink and retention match the deployment’s governance requirements.
  • For Postgres live tables, scrape /metrics from the admin listener and alert on live scan timeout/error growth, exported bytes, and concurrency wait time.