Preview release. These docs are a work in progress. Pages are still being written, links may break, and structure may shift without notice. Treat everything here as a draft and report issues on GitHub.
registry-relay is configured by one YAML document. The binary chooses the first available source:
1. --config <path>
2. REGISTRY_RELAY_CONFIG
3. ./config/example.yaml
The canonical sample is config/example.yaml. Keep examples aligned with this guide and the API and operations documentation.
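The lookup order above can be sketched in a few lines of Python. This is an illustration only, not the binary's implementation: the function name is hypothetical, and it assumes "first available" means the first source that is set and non-empty.

```python
import os
from typing import Mapping, Optional

DEFAULT_CONFIG_PATH = "./config/example.yaml"


def resolve_config_path(cli_path: Optional[str],
                        env: Mapping[str, str] = os.environ) -> str:
    """Return the first available config source in the documented order.

    Assumption: an empty value is treated the same as an unset one.
    """
    if cli_path:                                  # 1. --config <path>
        return cli_path
    env_path = env.get("REGISTRY_RELAY_CONFIG")
    if env_path:                                  # 2. REGISTRY_RELAY_CONFIG
        return env_path
    return DEFAULT_CONFIG_PATH                    # 3. ./config/example.yaml
```

Passing the environment as a parameter keeps the sketch easy to exercise without touching the real process environment.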
Root shape

    instance: {}
    server: {}
    metadata: {}       # optional split portable metadata manifest
    catalog: {}
    vocabularies: {}
    auth: {}
    audit: {}
    deployment: {}     # optional declared assurance profile, waivers, and evidence
    config_trust: {}   # optional governed config apply state
    datasets: []
    provenance: {}     # optional
    standards: {}      # optional, feature-gated adapters

Unknown fields are rejected for most blocks. Config validation runs after YAML parsing and checks ids, scopes, table/entity references, filter references, aggregate references, env var presence, and vocabulary prefixes.
A minimal deployment needs server (a listener), catalog (public metadata base), auth (one auth mode), audit (a sink and hash secret), and at least one entry in datasets. Every other root block is optional. This example shows the required shape. For a runnable starting point, use config/example.yaml; env-backed API key configs also need a governed fingerprint.commitment generated by the relay binary for the configured key id.

    server:
      bind: 127.0.0.1:8080

    catalog:
      title: Example Registry Relay
      base_url: http://127.0.0.1:8080
      publisher: Example Ministry

    auth:
      mode: api_key
      api_keys:
        - id: demo_client
          fingerprint:
            provider: env
            name: API_KEY_HASH
            commitment: sha256:0000000000000000000000000000000000000000000000000000000000000000
          scopes:
            - people:metadata
            - people:rows

    audit:
      sink: stdout
      hash_secret_env: REGISTRY_RELAY_AUDIT_HASH_SECRET

    datasets:
      - id: people
        title: People registry
        description: Demo people records
        owner: Example Ministry
        sensitivity: personal
        access_rights: restricted
        update_frequency: monthly
        tables:
          - id: people_table
            source:
              type: file
              path: ./data/people.csv
              format:
                csv:
                  header_row: 1
            primary_key: person_id
            schema:
              strict: true
              fields:
                - name: person_id
                  type: string
                  nullable: false
                - name: name
                  type: string
                  nullable: false
        entities:
          - name: person
            table: people_table
            fields:
              - name: person_id
              - name: name
            access:
              metadata_scope: people:metadata
              aggregate_scope: people:metadata
              read_scope: people:rows
            api:
              default_limit: 50
              max_limit: 100

Replace the placeholder commitment with the value emitted by registry-relay generate-api-key --id demo_client. The API_KEY_HASH environment variable must contain the emitted fingerprint in the form sha256:<64 lowercase hex chars>. The REGISTRY_RELAY_AUDIT_HASH_SECRET environment variable must contain at least 32 bytes of random secret material; startup fails closed when it is absent or weak.
See config/example.yaml for a larger working starting point; the sections that follow document each block in full.
Instance

    instance:
      id: registry-relay-local
      environment: development
      owner: Ministry of Digital Government
      jurisdiction: example-country

instance gives posture and operations tooling a stable public identity for the
running service. id defaults to registry-relay-local; environment, owner,
and jurisdiction are optional public labels.
Server

    server:
      bind: 0.0.0.0:8080
      admin_bind: 127.0.0.1:8081
      openapi_requires_auth: true
      cache_dir: ./cache
      max_source_file_bytes: 268435456
      xlsx_max_file_bytes: 268435456
      request_timeout: 30s
      request_body_timeout: 10s
      http1_header_read_timeout: 10s
      max_connections: 1024
      cors:
        allowed_origins:
          - https://portal.example.gov
      trust_proxy:
        enabled: false
        trusted_proxies: []

bind is the public data-plane listener. admin_bind is optional and must be private in production. cache_dir must be writable by the process. Source data must be mounted read-only.
openapi_requires_auth defaults to true. Set it to false only for local testing or controlled tooling environments that need unauthenticated access to /openapi.json; the unauthenticated document includes the full configured OpenAPI surface.
request_timeout bounds total request service time after HTTP headers are parsed. request_body_timeout bounds body reads for handlers that consume a request body. http1_header_read_timeout closes incomplete HTTP/1 headers before request work is admitted, and max_connections caps concurrent accepted sockets per listener. All timeouts must be non-zero and max_connections must be greater than zero.
HTTP/2 connections use the same finite connection cap and keepalive timeout. If production terminates HTTP/2 at a reverse proxy, configure bounded proxy header/body read timeouts and per-client connection limits before forwarding to Registry Relay.
The default CORS policy is deny by omission. Add explicit trusted origins only.
Governed config apply
Most deployments can skip this section. config_trust is optional; it governs
signed, threshold-approved config changes for high-assurance deployments. Simple
local deployments omit it and keep using the local YAML loaded at startup.
This governed example is syntactically valid but illustrative. Generate the
tuf_root_sha256 and targets-role signer key IDs from your own trusted TUF
repository before using governed apply in an environment.

    config_trust:
      antirollback_state_path: /var/lib/registry-relay/config-antirollback.json
      local_approval_state_path: /var/lib/registry-relay/config-local-approvals.json
      break_glass_rate_limit:
        max_accepted: 1
        window_seconds: 3600
      accepted_roots:
        - root_id: ops-root
          production: true
          tuf_root_sha256: sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
          valid_from_unix_seconds: 1770000000
          valid_until_unix_seconds: 1772592000
          high_risk_change_classes:
            - auth_scopes
            - signing_key_cleanup
            - signing_key_rotation
          signers:
            "1111111111111111111111111111111111111111111111111111111111111111":
              kid: "1111111111111111111111111111111111111111111111111111111111111111"
              enabled: true
            "2222222222222222222222222222222222222222222222222222222222222222":
              kid: "2222222222222222222222222222222222222222222222222222222222222222"
              enabled: true
          roles:
            - name: config-admin
              threshold: 2
              signer_kids:
                - "1111111111111111111111111111111111111111111111111111111111111111"
                - "2222222222222222222222222222222222222222222222222222222222222222"
              allowed_change_classes:
                - public_metadata

Governed config apply requires antirollback_state_path and
local_approval_state_path, which must point to durable local state such as a
mounted volume. break_glass_rate_limit is the trusted local rolling-window
policy used for break-glass apply requests; when omitted it defaults to one
accepted request per rate-limit identity per hour. Registry Relay fails closed
for apply when required local state is absent, unreadable, stale, or
inconsistent; verify and dry-run remain available.
Signed apply also requires at least one local accepted_roots entry that
authorizes every change class in the signed target metadata. Registry Relay uses
the local root only after TUF target verification succeeds. Verified TUF
targets-role signature key IDs, not target-declared custom metadata, must
satisfy one role threshold per change class. Inline admin YAML can be used for
verify/dry-run checks, but apply requires a signed local TUF target and never
treats raw inline YAML as signed governance input.
For TUF root rotation, add the new final tuf_root_sha256 as another local
accepted_roots entry before applying bundles that verify through the rotated
root. valid_from_unix_seconds and valid_until_unix_seconds are optional
local bounds for overlap windows. Omit them for an indefinite local root; set
them when old and new roots must both authorize bundles only during a planned
transition window. Expired or not-yet-valid roots fail authorization even when
the TUF metadata and signer quorum are otherwise valid.
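The two local checks described above, the root validity window and the per-change-class role threshold, can be sketched as follows. This is a simplified illustration under stated assumptions: the type and function names are hypothetical, verified_kids is assumed to already contain only enabled signers whose TUF targets-role signatures verified, and the window bounds are treated as inclusive.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class Role:
    name: str
    threshold: int
    signer_kids: Set[str]
    allowed_change_classes: Set[str]


def root_is_valid(now: int, valid_from: Optional[int],
                  valid_until: Optional[int]) -> bool:
    """Omitted bounds mean an indefinite root; bounds are assumed inclusive."""
    if valid_from is not None and now < valid_from:
        return False          # not yet valid
    if valid_until is not None and now > valid_until:
        return False          # expired
    return True


def change_class_authorized(change_class: str, verified_kids: Set[str],
                            roles: Iterable[Role]) -> bool:
    """One role must allow the class and reach its signer threshold."""
    for role in roles:
        if change_class not in role.allowed_change_classes:
            continue
        if len(verified_kids & role.signer_kids) >= role.threshold:
            return True
    return False


def bundle_authorized(change_classes: Iterable[str], verified_kids: Set[str],
                      roles: Iterable[Role], now: int,
                      valid_from: Optional[int] = None,
                      valid_until: Optional[int] = None) -> bool:
    """Every change class in the signed target metadata must be authorized."""
    roles = list(roles)
    return root_is_valid(now, valid_from, valid_until) and all(
        change_class_authorized(c, verified_kids, roles) for c in change_classes
    )
```

With the example config above, a bundle that changes only public_metadata would need both configured signers, because the config-admin role has a threshold of 2.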
Catalog and vocabularies

    catalog:
      title: Internal Government Registry Relay
      base_url: https://data.example.gov
      publisher: Ministry of Digital Government
      participant_id: did:web:data.example.gov

    vocabularies:
      psc: https://publicschema.org/
      m8g: http://data.europa.eu/m8g/

base_url is used in generated catalog links, OpenAPI servers, and provenance subject URIs. participant_id is optional and is derived from the catalog base URL when omitted.
Vocabulary prefixes let entity fields and dataset metadata use compact semantic references such as psc:concepts/Person.
Split metadata manifest

    metadata:
      source:
        path: ./metadata.yaml

metadata.source.path points at a portable metadata manifest. Relative paths
are resolved from the runtime config file. At startup, Registry Relay compiles
the manifest and validates that runtime datasets, entities, fields, filters, and
relationships are present in the metadata model. Add
metadata.source.digest: sha256:<digest> when the deployment must pin the
exact reviewed manifest.
| Mode | Required config | Digest rule | Delivery |
|---|---|---|---|
| Simple local | metadata.source.path | Optional | Local file read at startup |
| Pinned local | metadata.source.path, metadata.source.digest | Must match the local manifest | Local file read at startup |
| Governed | config_trust, metadata.source.path, metadata.source.digest | Required before startup and apply | Signed config target plus signed metadata target; optional signed package index when package_digest is claimed |
Keep operational details in this runtime config: sources, tables, physical columns, scopes, filters, aggregates, standards adapters, ingest, and refresh. Keep standard-facing meaning in the manifest: catalog, datasets, entities, fields, constraints, vocabularies, codelists, profiles, conformance claims, and descriptive ODRL policy metadata.
See metadata.md for the manifest schema, static publication, and
the metadata.manifest.* / runtime.binding.* startup error codes.
ODRL policy belongs in the portable metadata manifest, not in runtime dataset
bindings. A dataset policy block is published as an odrl:Offer for discovery
and review evidence only. It does not change API-key scopes, OIDC authorization,
row filtering, evidence verification, SP DCI behavior, or any other runtime access
decision.

    metadata:
      source:
        path: ./disability_registry.metadata.yaml

    # In disability_registry.metadata.yaml:
    datasets:
      - id: disability_registry
        policy:
          uid: https://demo.example.gov/datasets/disability_registry#illustrative-offer
          assigner: did:web:social-affairs.demo.example.gov
          permissions:
            - action: odrl:use
              constraints:
                - left_operand: odrl:purpose
                  operator: odrl:isA
                  right_operand:
                    iri: https://demo.example.gov/purpose/disability-benefit-eligibility
              duties:
                - action: odrl:attribute
          prohibitions:
            - action: odrl:sell

The demo policy IRIs under demo.example.gov are hypothetical examples for
catalog consumers. They are not official policy, legal advice, or a declaration
that a client has been approved to use the data.
SP DCI sync adapter
SP DCI (the Social Protection Digital Convergence Initiative) sync adapters are optional and feature-gated. Build with --features spdci-api-standards to enable them. Without that feature, any standards.spdci config is rejected with spdci.config.feature_disabled.
The adapter does not add new storage semantics. Configure a normal Registry Relay entity, often backed by an XLSX worksheet, then bind the SP DCI sync routes to it:

    standards:
      spdci:
        disability_registry:
          dataset: disability_registry
          entity: disabled_person
          query_key: member.member_identifier
          query_field: id
          disabled_status_field: disability_status
          disabled_positive_values: [approved, yes]
        registries:
          dr:
            dataset: disability_registry
            entity: disabled_person
            registry_type: ns:org:RegistryType:DR
            record_type: spdci-extensions-dci:DisabledPerson
            identifiers:
              DISABILITY_ID: id
              MEMBER_ID: id
            expression_fields:
              disability_status: disability_status
              disability_details.impairment_type: impairment_type

When enabled and configured, Registry Relay serves these SP DCI sync endpoints on the protected data-plane listener:
- POST /dci/{registry}/registry/sync/search
- POST /dci/{registry}/registry/sync/disabled
- POST /dci/{registry}/registry/sync/get-disability-details
- POST /dci/{registry}/registry/sync/get-disability-support
For sync/search, the {registry} segment selects any named standards.spdci.registries entry such as dr, sr, crvs, or fr, which lets one listener host multiple DCI registry APIs without path ambiguity. The disabled, get-disability-details, and get-disability-support routes are Disability Registry-specific and resolve only when the named registry entry points at the same dataset/entity as standards.spdci.disability_registry. The async /registry/search, subscribe, callback, and transaction-status APIs are intentionally not implemented by this sync adapter.
For generic sync search, identifiers maps DCI idtype-value query types to entity fields. expression_fields maps DCI expression or predicate attribute names to entity fields. Mapped fields must be exposed entity fields and allowed filters. The adapter currently supports idtype-value, expression $and with eq, in, ge, and le, and predicate conditions joined with and.
query_key is read from message.disabled_criteria.query in the SP DCI request envelope. It may be represented as a literal dotted JSON key ("member.member_identifier") or as nested objects ({"member": {"member_identifier": ...}}). query_field must be an allowed entity filter because the adapter delegates reads to the normal entity query engine.
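The two accepted representations of query_key can be illustrated with a small lookup helper. This is a sketch, not the adapter's code: the function name is hypothetical, and checking the literal dotted key before the nested form is an assumption about precedence.

```python
from collections.abc import Mapping
from typing import Any, Optional


def read_query_value(query: Mapping, query_key: str) -> Optional[Any]:
    """Look up query_key inside message.disabled_criteria.query.

    Form 1: a literal dotted JSON key, {"member.member_identifier": ...}
    Form 2: nested objects, {"member": {"member_identifier": ...}}
    """
    if query_key in query:                 # literal dotted key
        return query[query_key]
    current: Any = query
    for part in query_key.split("."):      # walk the nested objects
        if not isinstance(current, Mapping) or part not in current:
            return None
        current = current[part]
    return current
```

Either request shape yields the same value, which would then be matched against the configured query_field through the normal entity query engine.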
For /dci/{registry}/registry/sync/disabled, the caller needs the entity evidence_verification_scope. Generic search, details, and support need the entity read_scope. API-key authentication is still Registry Relay’s normal auth layer. If a registry entry uses response_mapping_path, the binary must also be built with --features standards-cel-mapping; otherwise config validation fails with spdci.config.mapping_feature_disabled.
API keys

    auth:
      mode: api_key
      api_keys:
        - id: program_system
          fingerprint:
            provider: env
            name: PROGRAM_SYSTEM_API_KEY_HASH
            commitment: sha256:0000000000000000000000000000000000000000000000000000000000000000
          scopes:
            - social_registry:metadata
            - social_registry:rows

The YAML stores committed fingerprint references, never raw API keys. Each env var value must be:

    sha256:<64 lowercase hex chars>

Generate a raw key, its fingerprint, and the matching commitment:

    registry-relay generate-api-key --id program_system

The command emits four shell-friendly lines:

    api_key_id=program_system
    api_key=<send-this-raw-key-to-the-client>
    fingerprint=sha256:<store-this-in-the-secret-store>
    commitment=sha256:<paste-this-into-config>

Store the emitted fingerprint in the platform secret store under the configured fingerprint.name. Paste the emitted commitment into fingerprint.commitment. Give the raw key only to the authorized client.
Worked standalone example, using demo_client and API_KEY_HASH:

    api_key_id=demo_client
    api_key=registry-relay-standalone-example-key-0001
    fingerprint=sha256:db3f2a02c6ead9bf0387e8a97ec090a549daa46610ca87bd4b651631b2411def
    commitment=sha256:1ee555b85da34dced897bc690053aaaedd4716f4b2972d556fa688b64ff55213

    export API_KEY_HASH='sha256:db3f2a02c6ead9bf0387e8a97ec090a549daa46610ca87bd4b651631b2411def'

    auth:
      mode: api_key
      api_keys:
        - id: demo_client
          fingerprint:
            provider: env
            name: API_KEY_HASH
            commitment: sha256:1ee555b85da34dced897bc690053aaaedd4716f4b2972d556fa688b64ff55213
          scopes:
            - people:metadata
            - people:rows

Do not reuse the example raw key in a real deployment.
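Before starting the relay, the documented value format can be checked with a short pre-flight script. The sketch below is a hypothetical helper, not part of registry-relay. It assumes the commitment uses the same sha256:<64 lowercase hex chars> shape shown in the examples, and it does not verify that a commitment matches its fingerprint, because that derivation is not described here.

```python
import re
from typing import List, Mapping

FINGERPRINT_RE = re.compile(r"sha256:[0-9a-f]{64}")
PLACEHOLDER = "sha256:" + "0" * 64


def fingerprint_problems(env: Mapping[str, str], var_name: str,
                         commitment: str) -> List[str]:
    """Return human-readable problems; an empty list means the shapes look right.

    Messages name the env var but never echo its value.
    """
    problems = []
    value = env.get(var_name)
    if value is None:
        problems.append(f"{var_name} is not set")
    elif not FINGERPRINT_RE.fullmatch(value):
        problems.append(f"{var_name} is not sha256:<64 lowercase hex chars>")
    if not FINGERPRINT_RE.fullmatch(commitment):
        problems.append("commitment is not sha256:<64 lowercase hex chars>")
    elif commitment == PLACEHOLDER:
        problems.append("commitment is still the all-zero placeholder")
    return problems
```

A deployment script could call this with os.environ and the commitment read from the YAML, and refuse to continue when the list is non-empty.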
OIDC (OAuth2)
Set auth.mode: oidc to verify bearer JWTs against an external OpenID Connect / OAuth2 IdP. Registry Relay is a resource server: it validates inbound tokens against the IdP’s JWKS but never mints, refreshes, or stores tokens. A given deployment runs in exactly one auth mode at a time; mixed-mode operation is not supported.
OIDC field names follow the shared Registry service runtime configuration conventions. Removed pre-convention names are rejected before deserialization with an error naming the replacement field.

    auth:
      mode: oidc
      oidc:
        issuer: https://idp.example.gov
        audiences:
          - registry-relay
        discovery_url: https://idp.example.gov/.well-known/openid-configuration
        allowed_algorithms:
          - RS256
        jwks_cache_ttl: 10m
        leeway: 60s
        scope_claim: scope
        scope_map:
          "role:social-registry-reader": "social_registry:rows"
        scope_object_required_keys: []
        allowed_clients: []
        allowed_token_types:
          - JWT
          - at+jwt

A full drop-in alternative to config/example.yaml lives at config/example.oidc.yaml. It targets a local Zitadel instance and is what the integration test consumes.
| Field | Purpose |
|---|---|
| issuer | Compared verbatim against the JWT iss claim. Must match the IdP’s published issuer URL. |
| audiences | One or more accepted aud values. Tokens whose aud does not intersect this list are rejected. |
| jwks_url | Explicit JWKS endpoint. Exactly one of jwks_url and discovery_url must be set; the validator rejects configs that supply both or neither. |
| discovery_url | OIDC discovery document (.well-known/openid-configuration). The JWKS URL is resolved from jwks_uri at startup. |
| allow_dev_insecure_fetch_urls | Development-only opt-in for loopback HTTP issuer, discovery, and JWKS URLs. Defaults to false; non-loopback private and metadata IPs remain denied by the platform fetch policy. |
| allowed_algorithms | Signature algorithms accepted by the verifier: RS256, ES256, EdDSA. HS* and none are intentionally absent. |
| jwks_cache_ttl | Steady-state JWKS cache TTL. The cache also refreshes on unknown kid (rate-limited), so this is the rotation pickup latency, not the upper bound. |
| leeway | Clock skew tolerance on exp and nbf. Capped at 5 minutes by validation. |
| scope_claim | Name of the JWT claim to read scopes from (the config field itself is always a single string; defaults to scope). The claim’s value in the token may be a space-separated string (RFC 8693 / RFC 9068), a JSON array of strings, or a JSON object whose keys are the scope names. The aud claim is rejected as a scope source because it is used only for token audience validation. Object-valued role keys grant scopes only when scope_object_required_keys names a key present in the role value and that nested value is active: true, a non-empty string, or a non-empty object/array containing an active value. |
| scope_map | Optional rename map applied before scope-based access checks. Adapt IdP role names to Registry Relay’s <dataset_id>:<level> shape. |
| scope_object_required_keys | Allowlist of keys that must appear inside object-valued role claim values before the role key is accepted. For Zitadel organization-scoped role objects, set this to the expected organization id key or keys. Defaults to empty, which means object-valued claims grant no scopes. String and array scope claims do not require this setting. |
| allowed_clients | Optional allowlist matched against the token’s azp (preferred) or client_id. Empty list means any client is accepted. |
| allowed_token_types | Accepted JOSE typ header values. Defaults to JWT and at+jwt (RFC 9068). ID tokens (id+jwt) are intentionally rejected by default, and tokens without typ are rejected by the shared verifier. |
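The three claim shapes accepted for scope_claim, together with the scope_map rename, can be sketched as below. This is a deliberately narrowed illustration, not the shared verifier's logic. Names are hypothetical; scopes without a scope_map entry are assumed to pass through unchanged; and for object-valued roles the sketch accepts only a nested value of true or a non-empty string, leaving out the nested object/array case described in the table.

```python
from typing import Any, Iterable, Mapping, Set


def _is_active(value: Any) -> bool:
    # Narrowed rule: true or a non-empty string. Nested objects/arrays
    # containing an active value are NOT handled in this sketch.
    return value is True or (isinstance(value, str) and value != "")


def extract_scopes(claim_value: Any, scope_map: Mapping[str, str],
                   required_keys: Iterable[str] = ()) -> Set[str]:
    """Turn a scope claim value into a set of Registry Relay scopes."""
    required = list(required_keys)
    if isinstance(claim_value, str):           # space-separated string
        raw = claim_value.split()
    elif isinstance(claim_value, list):        # JSON array of strings
        raw = [item for item in claim_value if isinstance(item, str)]
    elif isinstance(claim_value, dict):        # JSON object keyed by role
        raw = [
            role for role, detail in claim_value.items()
            if isinstance(detail, dict)
            and any(_is_active(detail.get(key)) for key in required)
        ]
    else:
        raw = []
    # Assumption: names without a scope_map entry are kept as-is.
    return {scope_map.get(name, name) for name in raw}
```

With an empty required_keys list, an object-valued claim grants nothing, which mirrors the documented default for scope_object_required_keys.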
Discovery vs explicit JWKS
discovery_url triggers a single discovery fetch at startup to resolve jwks_uri; a failure here aborts the binary so an operator sees the IdP wiring problem instead of a process that runs but silently rejects every token. The JWKS document itself is fetched lazily on first verify, so a transient JWKS outage at boot does not block startup. Production defaults require HTTPS; local loopback HTTP requires allow_dev_insecure_fetch_urls: true.
Resource-server semantics
Registry Relay never mints or refreshes tokens. Operators are responsible for provisioning OIDC applications, machine users, and grant types on the IdP. The Principal’s principal_id is taken from the token’s sub (preferred), then client_id, then azp; auth_mode=oidc is recorded on every audit record.
Granular failure codes
Token verification failures map to specific auth.* codes so audit pipelines can distinguish IdP outages, bad tokens, and policy denials:
| Code | HTTP | Meaning |
|---|---|---|
| auth.missing_credential | 401 | No Authorization header |
| auth.malformed_credential | 401 | Wrong scheme, empty bearer, or unparseable JWT structure |
| auth.token_expired | 401 | exp claim is in the past (after leeway) |
| auth.token_not_yet_valid | 401 | nbf claim is in the future (after leeway) |
| auth.token_signature_invalid | 401 | JWKS key found but signature did not verify |
| auth.issuer_mismatch | 401 | iss claim does not match oidc.issuer |
| auth.audience_mismatch | 401 | aud claim does not intersect oidc.audiences |
| auth.kid_unknown | 401 | Header kid is absent from the JWKS even after one refresh |
| auth.algorithm_not_allowed | 401 | Header alg is not in the configured allowlist |
| auth.client_not_allowed | 403 | azp / client_id is not in the configured allowed_clients |
| auth.invalid_credential | 401 | JWT decode failure not covered by a more specific variant |
| auth.jwks_unavailable | 503 | JWKS fetch failed; Registry Relay cannot verify any token |
For a worked example of running Registry Relay against a local OIDC provider (using the project’s dev Zitadel stack), see development.md.
Audit

    audit:
      sink: stdout
      format: jsonl
      hash_secret_env: REGISTRY_RELAY_AUDIT_HASH_SECRET
      chain: true
      include_health: false

Supported sinks:

    audit:
      sink: stdout
      format: jsonl
      hash_secret_env: REGISTRY_RELAY_AUDIT_HASH_SECRET

    audit:
      sink: file
      format: jsonl
      hash_secret_env: REGISTRY_RELAY_AUDIT_HASH_SECRET
      path: /var/log/registry-relay/audit.jsonl
      rotate:
        max_size_mb: 100
        max_files: 14

    audit:
      sink: syslog
      format: jsonl
      hash_secret_env: REGISTRY_RELAY_AUDIT_HASH_SECRET

hash_secret_env is required at runtime and must name an environment variable containing at least 32 bytes of deployment-specific random secret material. Startup fails closed when the variable is unset, empty, or weak.
Audit output uses registry-platform-audit envelopes with prev_hash and record_hash on every record. These fields detect ordering gaps and accidental corruption in retained logs, but they do not protect against an actor who can rewrite the audit sink. Use an append-only external sink or independent tail-hash anchoring when stronger integrity is required. chain is retained in config for compatibility with older deployments, but platform audit envelopes are always chained.
A normally booted relay always reports keyed integrity hmac in its posture because startup requires the audit hash secret (hash_secret_env); the none value appears only in dev or test configurations that build the posture without that secret.
Audit records are separate from operational logs, which go to stderr as readable text by default. Set REGISTRY_RELAY_LOG_FORMAT=json or REGISTRY_RELAY_LOG_FORMAT=jsonl to emit operational logs as JSON Lines, for example for log collection or redirected files.
Write policy
write_policy selects what happens when an audit record cannot be written (for example the sink is unreachable or the disk is full):

    audit:
      sink: file
      hash_secret_env: REGISTRY_RELAY_AUDIT_HASH_SECRET
      path: /var/log/registry-relay/audit.jsonl
      write_policy: availability_first   # availability_first | fail_closed

- availability_first (default): an audit write failure is logged and the request returns its original outcome unchanged. The deployment stays available even when audit is degraded. This is the historical behavior.
- fail_closed: a request whose audit record cannot be written fails with HTTP 503 and the stable error code audit.write_failed (application/problem+json). No request outcome is returned without a durable audit record. Choose this when audit completeness is a hard requirement.
The policy applies to every audited route. Per-route-family selection is not configurable. The selected policy is reported truthfully as the write_policy fact in the operations posture audit block, so a deployment cannot claim a stronger guarantee than it runs.
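The difference between the two policies can be sketched as a small wrapper around a request outcome. This is an illustration only: the function, the outcome tuple, and the response body shape are hypothetical, and treating OSError as the audit write failure is an assumption made for the example.

```python
import sys
from typing import Any, Callable, Dict, Tuple

Outcome = Tuple[int, Dict[str, Any]]   # (HTTP status, body), hypothetical shape


def finish_request(outcome: Outcome, write_audit: Callable[[], None],
                   policy: str = "availability_first") -> Outcome:
    """Apply write_policy when the audit record for a request is written."""
    if policy not in ("availability_first", "fail_closed"):
        raise ValueError(f"unknown write_policy: {policy}")
    try:
        write_audit()
    except OSError as err:             # e.g. disk full, sink unreachable
        if policy == "fail_closed":
            # No outcome is returned without a durable audit record.
            return 503, {"code": "audit.write_failed"}
        # availability_first: log the failure, keep the original outcome.
        print(f"audit write failed: {err}", file=sys.stderr)
    return outcome
```

Under availability_first the caller still sees the original response when the sink is down; under fail_closed the same request surfaces as a 503 with audit.write_failed.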
Deployment profile
The deployment block lets an operator declare the assurance level a deployment claims. The profile is never inferred from hostname, environment, or network position: it is an explicit statement. Each profile binds a set of gates that check the running configuration and contribute findings at a defined severity.

    deployment:
      profile: production   # local | hosted_lab | production | evidence_grade
      evidence:
        ingress_rate_limit: true   # operator asserts a gateway enforces rate limiting
        api_key_rotation: true     # operator asserts an API-key rotation process exists
      waivers:
        - finding: relay.openapi.public
          reason: public API catalog is intentional for this deployment
          expires: 2026-12-31

The deployment block is optional. When it is omitted, no gates bind and the deployment keeps its existing behavior exactly; the posture reports a single deployment.profile_undeclared warning so the choice is visible. An unknown profile value is rejected at startup.
Profiles and severities
Each gate maps to one of four severities per profile:
- startup_fail: the process refuses to start. Never waivable.
- readiness_fail: the readiness endpoint reports not-ready; the process keeps running.
- finding_error / finding_warn: a posture finding only.
The four profiles escalate from local (binds no hard gates) through hosted_lab and production to evidence_grade (the strictest). For example, evidence_grade requires a signed, governed config bundle: running it from a plain local YAML file trips a startup_fail gate (relay.config.unsigned) and the process refuses to start.
Evidence declarations
Some controls live outside the relay and cannot be observed by the process (for example ingress rate limiting enforced by a gateway, or an API-key rotation process). The evidence flags let the operator assert those controls are in place. Each flag defaults to false, which leaves the corresponding gate active until the operator declares the control.
Waivers
A triggered, waivable finding can be suppressed by a waiver that names the finding id and carries a free-text reason and a mandatory expiry date (YYYY-MM-DD):

    deployment:
      profile: hosted_lab
      waivers:
        - finding: relay.ingress.rate_limit_missing
          reason: rate limiting is handled by the lab gateway
          expires: 2026-09-30

A waived finding reports status waived instead of its severity effect. Once the expiry date passes, the waiver stops suppressing the finding and the posture additionally raises deployment.waiver_expired. The expiry date is mandatory; reasons must be non-empty and must not contain secrets. startup_fail gates are never waivable.
Waiver reasons are only visible in the restricted posture tier; the default tier reports finding id, severity, and status but not the reason.
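The waiver rules above reduce to a small decision function. The sketch below is illustrative rather than the relay's implementation: the function name and return shape are hypothetical, and it assumes a waiver still applies on its expiry date and lapses the day after.

```python
from datetime import date
from typing import List, Optional, Tuple


def evaluate_waiver(severity: str, expires: Optional[date],
                    today: date) -> Tuple[str, List[str]]:
    """Return (reported status, extra finding ids) for a triggered finding.

    expires is None when no waiver names the finding.
    """
    if expires is None or severity == "startup_fail":
        return severity, []            # startup_fail gates are never waivable
    if today <= expires:               # assumption: expiry date is inclusive
        return "waived", []
    # Lapsed waiver: the finding takes effect again and the posture also
    # raises deployment.waiver_expired.
    return severity, ["deployment.waiver_expired"]
```

For the hosted_lab example above, the rate-limit finding would report waived through 2026-09-30 and revert to its normal severity afterwards.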
Findings catalog
| Finding id | hosted_lab | production | evidence_grade |
|---|---|---|---|
| relay.admin.public_exposure | error | readiness_fail | startup_fail |
| relay.openapi.public | warn | error | error |
| relay.ingress.rate_limit_missing | warn | error | error |
| relay.oidc.client_allowlist_empty | warn | error | readiness_fail |
| relay.auth.api_key_no_rotation_evidence | warn | error | error |
| relay.config.unsigned | warn | error | startup_fail |
| relay.audit.best_effort | (not bound) | warn | readiness_fail |
| relay.audit.sink_missing | error | readiness_fail | startup_fail |
The current deployment profile, its findings, and active waivers are reported under deployment in the operations posture (GET /admin/v1/posture).
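For review tooling, the findings catalog can be transcribed into a lookup table. The sketch below copies the severities from the table as written; the helper names are hypothetical, and the local profile is left out because the table has no column for it.

```python
from typing import Dict, List, Optional, Tuple

PROFILES = ("hosted_lab", "production", "evidence_grade")

# Severity per profile, in PROFILES order; None means "(not bound)".
CATALOG: Dict[str, Tuple[Optional[str], str, str]] = {
    "relay.admin.public_exposure": ("error", "readiness_fail", "startup_fail"),
    "relay.openapi.public": ("warn", "error", "error"),
    "relay.ingress.rate_limit_missing": ("warn", "error", "error"),
    "relay.oidc.client_allowlist_empty": ("warn", "error", "readiness_fail"),
    "relay.auth.api_key_no_rotation_evidence": ("warn", "error", "error"),
    "relay.config.unsigned": ("warn", "error", "startup_fail"),
    "relay.audit.best_effort": (None, "warn", "readiness_fail"),
    "relay.audit.sink_missing": ("error", "readiness_fail", "startup_fail"),
}


def severity_for(finding_id: str, profile: str) -> Optional[str]:
    """Severity of a finding under a profile, or None when not bound."""
    return CATALOG[finding_id][PROFILES.index(profile)]


def startup_blockers(profile: str) -> List[str]:
    """Findings that would stop the process from starting under a profile."""
    return sorted(f for f in CATALOG if severity_for(f, profile) == "startup_fail")
```

Calling startup_blockers("evidence_grade") lists the findings an operator must clear before that profile can boot, since startup_fail gates cannot be waived.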
Datasets
Each dataset combines private storage tables with public entities:

    datasets:
      - id: social_registry
        title: Social Registry
        description: Registry of households participating in Program X
        owner: Ministry of Social Affairs
        sensitivity: personal
        access_rights: restricted
        update_frequency: monthly
        conforms_to:
          - psc:concepts/Person
        defaults:
          materialization: snapshot
        tables: []
        entities: []

sensitivity, access_rights, and update_frequency are catalog metadata. They also make review conversations concrete; do not leave them vague in production configs. Allowed values:
- sensitivity: public, internal, personal, confidential, or secret.
- access_rights: public, restricted, or non_public.
- update_frequency: continuous, daily, weekly, termly, monthly, quarterly, annual, irregular, as_needed, or unknown.
defaults is optional. It may provide materialization and refresh defaults for tables in the same dataset. Source configuration stays table-level.
Sources
Sources are configured on each private table. File sources read CSV, XLSX, or Parquet data:

    source:
      type: file
      path: ./data/social_registry.xlsx
      format:
        xlsx:
          sheet: Individuals
          header_row: 1
          data_range: A1:E100000

For CSV files, set format.csv.header_row: 1 when the first row contains column names. For XLSX files, header_row and data_range can be used when a worksheet has notes or title rows around the rectangular table. Source configuration is table-local: put file/database settings and format hints under each tables[].source.
Postgres snapshot and live table sources are supported. Credentials are never stored in YAML:

    source:
      type: postgres
      connection_env: SOCIAL_REGISTRY_DATABASE_URL
      table:
        schema: public
        name: individuals
      change_token_sql: "select max(updated_at)::text from public.individuals"

connection_env is the environment variable name containing the connection string. Validation and logs may mention the env var name but must not read or print its value. The connection string must set sslmode=require; missing sslmode, sslmode=prefer, and sslmode=disable are rejected when the connector reads the environment variable. The native TLS connector validates the server certificate and hostname against the system trust store. Use read-only database credentials. Registry Relay opens read-only Postgres sessions for live scans, but credentials must enforce the same boundary at the database. table and query are mutually exclusive; prefer structured table configs for production.
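The sslmode rule can be checked ahead of deployment with a few lines of Python. This is a sketch under assumptions: the helper name is hypothetical, it handles only URL-style connection strings (not keyword/value strings), and it treats any value other than require as rejected because that is the only mode named as accepted above.

```python
from urllib.parse import parse_qs, urlsplit


def sslmode_is_required(connection_url: str) -> bool:
    """True only when a URL-style connection string sets sslmode=require.

    Returns a bool instead of raising so callers can report the env var
    name without ever printing the connection string itself.
    """
    query = parse_qs(urlsplit(connection_url).query)
    return query.get("sslmode", []) == ["require"]
```

A pre-flight script could read the variable named by connection_env, call this helper, and report only the variable name on failure.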
Snapshot ingest reads Postgres through COPY (SELECT ...) TO STDOUT WITH CSV HEADER, then applies the same declared-schema coercion and validation as CSV files. The exported snapshot is bounded by server.max_source_file_bytes. For table sources, Registry Relay projects the declared schema fields from the table and casts them to CSV-friendly values. Extra database columns are ignored. For query sources, write a single SELECT or WITH statement without semicolons; public request input is never interpolated into SQL.
Live materialization is supported for structured table sources only. Each DataFusion scan opens a read-only Postgres session and exports data from the configured table. Simple column projection is pushed into the generated COPY query only when the scan has no filters. Filter-free limited scans may use the requested limit as a physical fetch bound; filtered scans, joins, and semantic limit enforcement remain gateway-side and may read up to the configured live row cap before local filtering. This keeps the live path bounded and safe without accepting caller-controlled SQL. Live row responses do not advertise snapshot-style strong validators or cursor version tokens, because upstream rows can change between requests without a Registry Relay ingest event. Live exports are also bounded by server.max_source_file_bytes. Use connect_timeout, query_timeout, live_max_connections, and live_max_rows to bound upstream behavior.
For production live sources, keep the contract deliberately narrow:
```yaml
tables:
  - id: individuals_table
    materialization: live
    primary_key: individual_id
    refresh:
      mode: manual
    schema:
      strict: true
      fields:
        - name: individual_id
          type: string
          nullable: false
        - name: household_id
          type: string
          nullable: false
        - name: updated_at
          type: timestamp
          nullable: true
    source:
      type: postgres
      connection_env: SOCIAL_REGISTRY_DATABASE_URL
      table:
        schema: public
        name: individuals
      connect_timeout: 5s
      query_timeout: 30s
      live_max_connections: 8
```

The connection string must include sslmode=require and point to a read-only database role that can SELECT only the configured table or view. Do not use query sources, change_token_sql, or refresh.mode: mtime with live materialization; those are snapshot-only controls. Declared schema fields are the exported contract, and extra database columns are ignored unless an entity query needs a full local scan to evaluate filters.
Minimal source-only form:
```yaml
source:
  type: postgres
  connection_env: SOCIAL_REGISTRY_DATABASE_URL
  table:
    schema: public
    name: individuals
  connect_timeout: 5s
  query_timeout: 30s
  live_max_connections: 8
```

Supported Postgres field mappings are:

- string -> text
- integer -> bigint
- number -> double precision
- boolean -> boolean
- date -> date
- timestamp -> timestamptz rendered as RFC 3339 UTC text

Refresh
```yaml
refresh:
  mode: mtime
  interval: 60s
```

```yaml
refresh:
  mode: interval
  interval: 1h
```

```yaml
refresh:
  mode: manual
```

mtime reloads when the source change token changes. It is supported for file sources, and for Postgres snapshot sources only when change_token_sql is configured. interval reloads on every interval. manual reloads only through the admin listener’s table reload route.
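To illustrate the mtime rule for Postgres, the following sketch combines a snapshot table with a change token query. It reuses only keys shown elsewhere in this guide; the table id and the 60s interval are illustrative, and primary_key and schema are left out for brevity.

```yaml
tables:
  - id: individuals_table
    materialization: snapshot
    refresh:
      mode: mtime
      interval: 60s
    source:
      type: postgres
      connection_env: SOCIAL_REGISTRY_DATABASE_URL
      table:
        schema: public
        name: individuals
      # Needed for mtime refresh on a Postgres snapshot source.
      change_token_sql: "select max(updated_at)::text from public.individuals"
    # primary_key and schema omitted for brevity
```

Without change_token_sql, a Postgres snapshot table would need mode: interval or mode: manual instead.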
Tables
Tables are private storage resources. Their ids do not appear in public URLs.
```yaml
tables:
  - id: individuals_table
    materialization: snapshot
    source:
      type: file
      path: ./data/social_registry.xlsx
    format:
      xlsx:
        sheet: Individuals
    refresh:
      mode: mtime
      interval: 1h
    primary_key: individual_id
    schema:
      strict: true
      fields:
        - name: individual_id
          type: string
          nullable: false
        - name: payment_amount
          type: number
          nullable: true
          unit: EUR
```

Supported formats are csv, xlsx, and parquet. If format is omitted, the loader infers it from the source file extension where possible.
materialization may be snapshot or live. File sources support snapshot. Postgres sources support snapshot; Postgres structured table sources also support live.
Datasource capability matrix
Registry Relay derives datasource capabilities from source.type and materialization. Operators do not configure these flags directly.
| Source | Materialization | Filters | Projection | Limit | Validators and cursors | Provenance |
|---|---|---|---|---|---|---|
| file | snapshot | gateway-side | gateway-side | gateway-side | strong snapshot tokens | snapshot-backed |
| postgres table or query | snapshot | gateway-side | gateway-side | gateway-side | strong snapshot tokens | snapshot-backed |
| postgres table | live | gateway-side | Postgres column pushdown for filter-free scans, otherwise gateway-side | gateway-side | no strong snapshot tokens | not snapshot-backed |
Unsupported combinations are rejected at config load: file live, Postgres live with a configured query, and live with mtime refresh. Postgres query sources stay snapshot-only so operator SQL is executed only during controlled ingest or refresh, never per public request. Future datasource connectors must follow the same convention: only generated SQL over structured table metadata may receive pushdown, and unsupported operations must fall back to gateway-side execution or be rejected explicitly.
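For illustration, the fragment below sketches two of the rejected combinations listed above. The table ids are hypothetical and the fragment is deliberately invalid: it is meant to show what config load should refuse, not something to deploy.

```yaml
# Deliberately invalid: each table shows one combination rejected at config load.
tables:
  - id: file_live_table          # hypothetical id
    materialization: live        # rejected: file sources support snapshot only
    source:
      type: file
      path: ./data/people.csv

  - id: live_mtime_table         # hypothetical id
    materialization: live
    refresh:
      mode: mtime                # rejected: mtime refresh is snapshot-only
      interval: 60s
    source:
      type: postgres
      connection_env: SOCIAL_REGISTRY_DATABASE_URL
      table:
        schema: public
        name: individuals
```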
At startup, Registry Relay logs one ingest.datasource_capabilities event per configured table. For Postgres live scans, the admin listener’s /metrics route also exports low-cardinality live scan metrics for scan duration, concurrency wait time, exported rows, and exported bytes. These metrics intentionally do not include dataset ids, table names, SQL, env vars, request ids, or row values.
Field types:
string, number, integer, boolean, date, timestamp

Use sensitive: true on source or entity fields whose query values are redacted or deterministically hashed in audit records. This flag is audit-only in beta: it does not hide a field from API responses and does not grant or deny read access.
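A minimal sketch of the flag on a source (table schema) field, using field names that appear in other examples in this guide:

```yaml
schema:
  strict: true
  fields:
    - name: individual_id
      type: string
      nullable: false
      sensitive: true   # audit redaction only; the field is still returned by the API
    - name: enrolled_on
      type: date
      nullable: true
```

To keep a field out of responses, leave it out of the entity's fields list rather than relying on this flag.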
Entities
Entities are the public REST resources:
```yaml
entities:
  - name: individual
    title: Individual
    description: A person enrolled in Program X
    table: individuals_table
    concept_uri: psc:concepts/Person
    fields:
      - name: id
        from: individual_id
        sensitive: true
      - name: payment_amount
        from: payment_amount
    relationships:
      - name: household
        kind: belongs_to
        target: household
        foreign_key: household_id
    access:
      metadata_scope: social_registry:metadata
      aggregate_scope: social_registry:aggregate
      read_scope: social_registry:rows
      evidence_verification_scope: social_registry:evidence_verification
    api:
      default_limit: 100
      max_limit: 1000
      require_purpose_header: true
      required_filters:
        - id
      allowed_filters:
        - field: id
          ops: [eq, in]
      allowed_expansions:
        - household
    publicschema:
      target: Person
      mapping_path: mappings/individual-person.publicschema.yaml
      schema_validation_path: ../publicschema.org/dist/schemas/Person.schema.json
```

When fields is present, only listed fields are exposed. When it is omitted, every table column is exposed. For sensitive datasets, prefer an explicit field list. Use entity read_scope, required filters, purpose-header requirements, and explicit field projection for exposure control; sensitive: true controls audit redaction only.
Row-level authorization scopes are not supported. The row_scope resource setting is rejected by config parsing; model row exposure with dataset/entity read scopes, required filters, purpose headers, and projected fields instead.
Relationships are dataset-local in V1. Cross-dataset workflows must compose client-side with separate scoped calls and separate audit records.
OGC API features
Build with --features ogcapi-features to expose spatial entities through the protected /ogc/v1 surface. The feature does not add a top-level standards config block. Instead, opt in per entity with spatial:
```yaml
spatial:
  collection_id: facilities
  title: Public facilities
  description: Public facility locations from the civic registry.
  geometry:
    kind: point
    longitude_field: lon
    latitude_field: lat
  crs: http://www.opengis.net/def/crs/OGC/1.3/CRS84
  datetime_field: updated_at
  max_bbox_degrees: 5.0
  max_geometry_vertices: 10000
```

Phase 1 supports kind: point and kind: geojson. Point longitude, point latitude, datetime, and bbox helper fields must be exposed entity fields with compatible types. kind: geojson may use optional precomputed bbox fields:
```yaml
spatial:
  collection_id: parcels
  geometry:
    kind: geojson
    field: geometry
  crs: http://www.opengis.net/def/crs/OGC/1.3/CRS84
  bbox_fields:
    min_x: bbox_min_x
    min_y: bbox_min_y
    max_x: bbox_max_x
    max_y: bbox_max_y
```

Only CRS84 is accepted. wkt and wkb parse as reserved geometry kinds but are rejected by V1 validation. Collection ids default to the entity name and must be unique within a dataset. OGC discovery uses metadata scope; feature item reads use read_scope and preserve entity required filters, purpose-header requirements, projection, and audit behavior.
Evidence verification
Evidence offerings expose Registry Notary discovery metadata:
```text
GET /metadata/evidence-offerings
GET /metadata/evidence-offerings/{offering_id}
```

Relay does not verify claims or evidence. registry-notary is the only claim/evidence verifier. The portable metadata manifest declares public offerings with access.kind: registry-notary, endpoint_url, discovery_url, and ruleset so clients can discover the Notary service that owns verification.
```yaml
access:
  evidence_verification_scope: social_registry:evidence_verification
```

evidence_verification_scope remains a scope label for standards adapters and integrations that need to distinguish evidence-oriented access from row reads. It does not enable a Relay-local verification endpoint.
PublicSchema VC mapping
Requires --features publicschema-cel. When present, entity-record VC issuance uses the mapping file to produce a PublicSchema.org credential subject instead of the default entity JSON shape.
```yaml
publicschema:
  target: Person  # required; PublicSchema concept name
  mapping_path: mappings/individual-person.publicschema.yaml  # required; CEL mapping document
  schema_validation_path: ../publicschema.org/dist/schemas/Person.schema.json  # optional; validates subject before signing
  context_url: https://publicschema.org/ctx/draft.jsonld  # optional; overrides default context
  schema_url: https://publicschema.org/schemas/Person.schema.json  # optional; overrides default credentialSchema.id
  credential_type: Person  # optional; overrides default VC type[1]
```

| Field | Default | Notes |
|---|---|---|
| target | (required) | PublicSchema concept name; drives credential_type and schema_url defaults |
| mapping_path | (required) | Path to a CEL mapping YAML document; compiled at startup |
| schema_validation_path | absent | Local JSON Schema; when set, every mapped subject is validated before signing |
| context_url | https://publicschema.org/ctx/draft.jsonld | JSON-LD context URL in the issued VC |
| schema_url | https://publicschema.org/schemas/{target}.schema.json | credentialSchema.id in the issued VC |
| credential_type | {target} | type[1] value in the issued VC |
See provenance.md for CEL context variables, issuance behavior, audit records, and the build and test commands for this feature.
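As a sketch of how the defaults play out, a block that sets only the two required keys would resolve as noted in the comments, assuming the default rules in the field table apply exactly as written there:

```yaml
publicschema:
  target: Person
  mapping_path: mappings/individual-person.publicschema.yaml
  # Defaults expected from the field table:
  #   context_url     -> https://publicschema.org/ctx/draft.jsonld
  #   schema_url      -> https://publicschema.org/schemas/Person.schema.json
  #   credential_type -> Person
  # schema_validation_path is absent, so no local JSON Schema validation runs before signing.
```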
Aggregates
Aggregates are declared on datasets and name their source entity:
```yaml
aggregates:
  - id: by_municipality
    title: Individuals by municipality
    description: Number of individuals by municipality
    source_entity: individual
    default_group_by:
      - municipality_code
    dimensions:
      - id: municipality_code
        label: Municipality
        field: municipality_code
    indicators:
      - id: individual_count
        label: Individuals
        function: count
        column: id
        unit_measure: people
    allowed_filters:
      - field: municipality_code
        ops: [eq, in]
      - field: enrolled_on
        ops: [gte, lte, between]
    temporal_field: enrolled_on
    disclosure_control:
      min_group_size: 5
      suppression: omit
```

Supported aggregate functions include the configured V1 set used by tests and examples, such as count, sum, and avg. The runtime config key remains indicators for compatibility; public aggregate APIs expose these configured series as measures.

temporal_field is optional; when present, native aggregate temporal.from and temporal.to are translated into the declared range-capable allowed filter for that source-entity field.

Dataset measure and dimension discovery is derived from these aggregate declarations, so keep ids stable and labels consumer-friendly. Keep disclosure thresholds explicit and reviewable.
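As a sketch of a non-count series, the indicators list of the aggregate above could carry a sum next to the count. The second indicator's id, label, and unit are hypothetical, and the example assumes the source entity exposes a numeric payment_amount field as in the entity example earlier in this guide.

```yaml
indicators:
  - id: individual_count
    label: Individuals
    function: count
    column: id
    unit_measure: people
  - id: total_payment_amount   # hypothetical id
    label: Total payment amount
    function: sum
    column: payment_amount     # assumes an exposed numeric entity field
    unit_measure: EUR          # assumed unit label
```

Both series are subject to the aggregate's disclosure_control settings.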
Spatial EDR aggregates
Spatial EDR exposure is opt-in. Requires --features ogcapi-edr.
```yaml
aggregates:
  - id: by_admin_area
    description: Individuals by administrative area
    source_entity: individual
    # ...dimensions, indicators, disclosure_control as normal...
    spatial:
      mode: admin_area
      collection_id: by_admin_area   # optional; defaults to aggregate id
      dimension: municipality_code   # declared dimension id used to join geometry
      geometry_entity: municipality  # entity name that holds geometry rows
      geometry_id_field: code        # field in geometry_entity matching the dimension values
      geometry_field: geometry       # geojson field in geometry_entity
      bbox_fields:                   # optional precomputed bbox fields in geometry_entity
        min_x: bbox_min_x
        min_y: bbox_min_y
        max_x: bbox_max_x
        max_y: bbox_max_y
      max_geometry_vertices: 10000   # optional; defaults to 10000
```

| Field | Default | Notes |
|---|---|---|
| mode | (required) | Must be admin_area |
| collection_id | aggregate id | OGC collection identifier; must be unique within the dataset |
| dimension | (required) | Declared aggregate dimension id whose values are joined to geometry |
| geometry_entity | (required) | Entity that holds one geometry row per dimension value |
| geometry_id_field | (required) | Field in geometry_entity that matches dimension values |
| geometry_field | (required) | GeoJSON geometry field in geometry_entity |
| bbox_fields | absent | Optional precomputed bbox columns; same subkeys as entity spatial.bbox_fields |
| max_geometry_vertices | 10000 | Cap on GeoJSON vertices decoded from geometry_field |
geometry_entity must be an entity declared in the same dataset. geometry_id_field and geometry_field must be exposed entity fields with compatible types (string/integer for id, geojson-typed string for geometry). Only kind: geojson geometry is supported for spatial aggregates in V1.
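A geometry entity satisfying these rules might be declared roughly as follows. This is a sketch: the table id is hypothetical, the field names mirror the spatial example values used in this section, and the access and api blocks are omitted.

```yaml
entities:
  - name: municipality
    table: municipalities_table   # hypothetical table id in the same dataset
    fields:
      - name: code          # matched against municipality_code dimension values
      - name: geometry      # GeoJSON geometry stored as a string
      - name: bbox_min_x    # optional precomputed bbox helpers
      - name: bbox_min_y
      - name: bbox_max_x
      - name: bbox_max_y
    # access and api blocks omitted for brevity
```

Listing the fields explicitly keeps every field the aggregate refers to exposed, which the validation rules above require.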
Provenance (response-credential issuer configuration)
The provenance block is optional. When absent or enabled: false, the gateway behaves as a plain JSON service. When enabled, callers can opt in to signed response credentials (W3C VCDM 2.0 VC-JWT) with Accept: application/vc+jwt. V1 supports local Ed25519 signing from either a software env-var JWK or a file_watch JWK file.
The key is named provenance for compatibility; it governs the response-credential issuer (DID, signing key, claim validity, and accepted media types). These credentials are W3C VCDM 2.0 VC-JWT with a Registry Relay JSON-LD context; they are not W3C PROV-O.
See provenance.md for the full signer, DID, schema, context, and rotation contract.
Production checklist
- Source files are read-only to the process.
- cache_dir is writable and on a filesystem with enough space.
- Every env-backed fingerprint.name exists in the runtime environment.
- No raw key, fingerprint, private JWK, or full environment dump is logged.
- Admin listener, if enabled, is private.
- CORS origins are explicit.
- Personal-data entities use explicit field projections.
- Row and evidence-verification routes that need purpose tracking set require_purpose_header: true.
- Sensitive identifier fields are marked sensitive: true where audit redaction is required.
- Audit sink and retention match the deployment’s governance requirements.
- For Postgres live tables, scrape /metrics from the admin listener and alert on live scan timeout/error growth, exported bytes, and concurrency wait time.