# Configuration

`registry-relay` is configured by one YAML document. The binary chooses the first available source:

1. `--config <path>`
2. `REGISTRY_RELAY_CONFIG`
3. `./config/example.yaml`
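
The precedence above can be sketched as a small lookup. This is an illustration only: `resolve_config_path` is a hypothetical helper, and it assumes "available" means the flag was passed or the variable is set to a non-empty value.

```python
import os
from typing import Mapping, Optional

DEFAULT_CONFIG_PATH = "./config/example.yaml"


def resolve_config_path(
    cli_path: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return the first available config source in the documented order."""
    if cli_path:  # 1. --config <path>
        return cli_path
    env_path = env.get("REGISTRY_RELAY_CONFIG")
    if env_path:  # 2. REGISTRY_RELAY_CONFIG
        return env_path
    return DEFAULT_CONFIG_PATH  # 3. bundled example path
```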

The canonical sample is [config/example.yaml](https://github.com/jeremi/registry-relay/blob/479b27709f8f2653d1fd0a882ed809d6185a2d2b/config/example.yaml). Keep examples aligned with this guide and the API and operations documentation.

## Root shape

```yaml
instance: {}
server: {}
metadata: {}   # optional split portable metadata manifest
catalog: {}
vocabularies: {}
auth: {}
audit: {}
deployment: {}   # optional declared assurance profile, waivers, and evidence
config_trust: {} # optional governed config apply state
datasets: []
provenance: {} # optional
standards: {}  # optional, feature-gated adapters
```

Unknown fields are rejected for most blocks. Config validation runs after YAML parsing and checks ids, scopes, table/entity references, filter references, aggregate references, env var presence, and vocabulary prefixes.

A minimal deployment needs `server` (a listener), `catalog` (public metadata base), `auth` (one auth mode), `audit` (a sink and hash secret), and at least one entry in `datasets`. Every other root block is optional. This example shows the required shape. For a runnable starting point, use `config/example.yaml`; env-backed API key configs also need a governed `fingerprint.commitment` generated by the relay binary for the configured key id.

```yaml
server:
  bind: 127.0.0.1:8080

catalog:
  title: Example Registry Relay
  base_url: http://127.0.0.1:8080
  publisher: Example Ministry

auth:
  mode: api_key
  api_keys:
    - id: demo_client
      fingerprint:
        provider: env
        name: API_KEY_HASH
        commitment: sha256:0000000000000000000000000000000000000000000000000000000000000000
      scopes:
        - people:metadata
        - people:rows

audit:
  sink: stdout
  hash_secret_env: REGISTRY_RELAY_AUDIT_HASH_SECRET

datasets:
  - id: people
    title: People registry
    description: Demo people records
    owner: Example Ministry
    sensitivity: personal
    access_rights: restricted
    update_frequency: monthly
    tables:
      - id: people_table
        source:
          type: file
          path: ./data/people.csv
          format:
            csv:
              header_row: 1
        primary_key: person_id
        schema:
          strict: true
          fields:
            - name: person_id
              type: string
              nullable: false
            - name: name
              type: string
              nullable: false
    entities:
      - name: person
        table: people_table
        fields:
          - name: person_id
          - name: name
        access:
          metadata_scope: people:metadata
          aggregate_scope: people:metadata
          read_scope: people:rows
        api:
          default_limit: 50
          max_limit: 100
```

Replace the placeholder `commitment` with the value emitted by `registry-relay generate-api-key --id demo_client`. The `API_KEY_HASH` environment variable must contain the emitted fingerprint in the form `sha256:<64 lowercase hex chars>`. The `REGISTRY_RELAY_AUDIT_HASH_SECRET` environment variable must contain at least 32 bytes of random secret material; startup fails closed when it is absent or weak.

See [config/example.yaml](https://github.com/jeremi/registry-relay/blob/479b27709f8f2653d1fd0a882ed809d6185a2d2b/config/example.yaml) for a larger working starting point; the sections that follow document each block in full.

## Instance

```yaml
instance:
  id: registry-relay-local
  environment: development
  owner: Ministry of Digital Government
  jurisdiction: example-country
```

`instance` gives posture and operations tooling a stable public identity for the
running service. `id` defaults to `registry-relay-local`; `environment`, `owner`,
and `jurisdiction` are optional public labels.

## Server

```yaml
server:
  bind: 0.0.0.0:8080
  admin_bind: 127.0.0.1:8081
  openapi_requires_auth: true
  cache_dir: ./cache
  max_source_file_bytes: 268435456
  xlsx_max_file_bytes: 268435456
  request_timeout: 30s
  request_body_timeout: 10s
  http1_header_read_timeout: 10s
  max_connections: 1024
  cors:
    allowed_origins:
      - https://portal.example.gov
  trust_proxy:
    enabled: false
    trusted_proxies: []
```

`bind` is the public data-plane listener. `admin_bind` is optional and must be private in production. `cache_dir` must be writable by the process. Source data must be mounted read-only.

`openapi_requires_auth` defaults to `true`. Set it to `false` only for local testing or controlled tooling environments that need unauthenticated access to `/openapi.json`; the unauthenticated document includes the full configured OpenAPI surface.

`request_timeout` bounds total request service time after HTTP headers are parsed. `request_body_timeout` bounds body reads for handlers that consume a request body. `http1_header_read_timeout` closes incomplete HTTP/1 headers before request work is admitted, and `max_connections` caps concurrent accepted sockets per listener. All timeouts must be non-zero and `max_connections` must be greater than zero.
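
A pre-flight check for these rules might look like the sketch below. It is not the relay's validator: the helper names are hypothetical, and the accepted duration suffixes (`ms`, `s`, `m`, `h`) are an assumption based on the values shown in this guide.

```python
import re
from typing import Dict, List

_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600}
_TIMEOUT_KEYS = ("request_timeout", "request_body_timeout", "http1_header_read_timeout")


def parse_duration_seconds(text: str) -> float:
    """Parse strings such as '30s' or '10m' (assumed syntax)."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h)", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def check_server_limits(server: Dict[str, object]) -> List[str]:
    """Return one message per violated rule; an empty list means OK."""
    errors = []
    for key in _TIMEOUT_KEYS:
        if key in server and parse_duration_seconds(str(server[key])) <= 0:
            errors.append(f"{key} must be non-zero")
    if "max_connections" in server and int(server["max_connections"]) <= 0:
        errors.append("max_connections must be greater than zero")
    return errors
```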

HTTP/2 connections use the same finite connection cap and keepalive timeout. If production terminates HTTP/2 at a reverse proxy, configure bounded proxy header/body read timeouts and per-client connection limits before forwarding to Registry Relay.

The default CORS policy is deny by omission. Add explicit trusted origins only.

## Governed config apply

Most deployments can skip this section. `config_trust` is optional; it governs
signed, threshold-approved config changes for high-assurance deployments. Simple
local deployments omit it and keep using the local YAML loaded at startup.

This governed example is syntactically valid but illustrative. Generate the
`tuf_root_sha256` and targets-role signer key IDs from your own trusted TUF
repository before using governed apply in an environment.

```yaml
config_trust:
  antirollback_state_path: /var/lib/registry-relay/config-antirollback.json
  local_approval_state_path: /var/lib/registry-relay/config-local-approvals.json
  break_glass_rate_limit:
    max_accepted: 1
    window_seconds: 3600
  accepted_roots:
    - root_id: ops-root
      production: true
      tuf_root_sha256: sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
      valid_from_unix_seconds: 1770000000
      valid_until_unix_seconds: 1772592000
      high_risk_change_classes:
        - auth_scopes
        - signing_key_cleanup
        - signing_key_rotation
      signers:
        "1111111111111111111111111111111111111111111111111111111111111111":
          kid: "1111111111111111111111111111111111111111111111111111111111111111"
          enabled: true
        "2222222222222222222222222222222222222222222222222222222222222222":
          kid: "2222222222222222222222222222222222222222222222222222222222222222"
          enabled: true
      roles:
        - name: config-admin
          threshold: 2
          signer_kids:
            - "1111111111111111111111111111111111111111111111111111111111111111"
            - "2222222222222222222222222222222222222222222222222222222222222222"
          allowed_change_classes:
            - public_metadata
```

Governed config apply requires `antirollback_state_path` and
`local_approval_state_path`, which must point to durable local state such as a
mounted volume. `break_glass_rate_limit` is the trusted local rolling-window
policy for break-glass apply requests; when omitted, it defaults to one accepted
request per rate-limit identity per hour. Registry Relay fails closed for apply
when required local state is absent, unreadable, stale, or inconsistent; verify
and dry-run remain available.

Signed apply also requires at least one local `accepted_roots` entry that
authorizes every change class in the signed target metadata. Registry Relay uses
the local root only after TUF target verification succeeds. Verified TUF
targets-role signature key IDs, not target-declared custom metadata, must
satisfy one role threshold per change class. Inline admin YAML can be used for
verify/dry-run checks, but apply requires a signed local TUF target and never
treats raw inline YAML as signed governance input.
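
The per-change-class threshold rule can be illustrated with a short sketch. It assumes the root dictionary mirrors the YAML shape above, that disabled signers do not count toward a threshold, and that `verified_kids` already holds the key IDs from verified TUF targets-role signatures. `root_authorizes` is a hypothetical name, and `high_risk_change_classes` is not modeled.

```python
from typing import Dict, Iterable, Set


def root_authorizes(root: Dict, verified_kids: Set[str], change_classes: Iterable[str]) -> bool:
    """True when every change class is covered by a role whose threshold is met."""
    enabled = {kid for kid, signer in root["signers"].items() if signer.get("enabled")}
    usable = verified_kids & enabled
    for change_class in change_classes:
        satisfied = any(
            change_class in role["allowed_change_classes"]
            and len(usable & set(role["signer_kids"])) >= role["threshold"]
            for role in root["roles"]
        )
        if not satisfied:
            return False
    return True
```

With the example root above, two verified signatures authorize a `public_metadata` change, while a single signature or an `auth_scopes` change does not.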

For TUF root rotation, add the new final `tuf_root_sha256` as another local
`accepted_roots` entry before applying bundles that verify through the rotated
root. `valid_from_unix_seconds` and `valid_until_unix_seconds` are optional
local bounds for overlap windows. Omit them for an indefinite local root; set
them when old and new roots must both authorize bundles only during a planned
transition window. Expired or not-yet-valid roots fail authorization even when
the TUF metadata and signer quorum are otherwise valid.
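
The validity window reduces to two optional bounds. In the sketch below, `root_is_valid_at` is a hypothetical helper, and treating both bounds as inclusive is an assumption; this guide does not state the boundary behavior.

```python
from typing import Dict


def root_is_valid_at(root: Dict, now_unix: int) -> bool:
    """Check the optional local validity bounds of an accepted root."""
    start = root.get("valid_from_unix_seconds")
    end = root.get("valid_until_unix_seconds")
    if start is not None and now_unix < start:
        return False  # not yet valid
    if end is not None and now_unix > end:
        return False  # expired
    return True  # omitted bounds mean an indefinite local root
```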

## Catalog and vocabularies

```yaml
catalog:
  title: Internal Government Registry Relay
  base_url: https://data.example.gov
  publisher: Ministry of Digital Government
  participant_id: did:web:data.example.gov

vocabularies:
  psc: https://publicschema.org/
  m8g: http://data.europa.eu/m8g/
```

`base_url` is used in generated catalog links, OpenAPI servers, and provenance subject URIs. `participant_id` is optional and defaults from the catalog base URL when omitted.

Vocabulary prefixes let entity fields and dataset metadata use compact semantic references such as `psc:concepts/Person`.
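
As an illustration, a compact reference can be expanded by joining the prefix IRI and the local part. This assumes plain concatenation, which this guide does not confirm; `expand_compact_ref` is a hypothetical helper.

```python
from typing import Mapping


def expand_compact_ref(ref: str, vocabularies: Mapping[str, str]) -> str:
    """Expand 'prefix:local' using the configured vocabulary prefixes."""
    prefix, sep, local = ref.partition(":")
    if not sep or prefix not in vocabularies:
        raise ValueError(f"unknown vocabulary prefix in {ref!r}")
    return vocabularies[prefix] + local
```

Under that assumption, `psc:concepts/Person` expands to `https://publicschema.org/concepts/Person` with the `vocabularies` block shown above.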

## Split metadata manifest

```yaml
metadata:
  source:
    path: ./metadata.yaml
```

`metadata.source.path` points at a portable metadata manifest. Relative paths
are resolved from the runtime config file. At startup, Registry Relay compiles
the manifest and validates that runtime datasets, entities, fields, filters, and
relationships are present in the metadata model. Add
`metadata.source.digest: sha256:<digest>` when the deployment must pin the
exact reviewed manifest.
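
One way to produce a value in the `sha256:<digest>` form is sketched below. It assumes the digest is the hex SHA-256 of the raw manifest bytes; confirm against [metadata.md](https://github.com/jeremi/registry-relay/blob/3938fdf3930e134d6dca97360baf64ac3a16bed2/docs/metadata.md) before relying on it, since the relay may hash a canonical form instead. Both function names are hypothetical.

```python
import hashlib
import hmac


def manifest_digest(data: bytes) -> str:
    """Return 'sha256:<hex>' over the given bytes (assumed digest input)."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def digest_matches(data: bytes, pinned: str) -> bool:
    """Compare a computed digest with the pinned value in constant time."""
    return hmac.compare_digest(manifest_digest(data), pinned)
```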

| Mode | Required config | Digest rule | Delivery |
| --- | --- | --- | --- |
| Simple local | `metadata.source.path` | Optional | Local file read at startup |
| Pinned local | `metadata.source.path`, `metadata.source.digest` | Must match the local manifest | Local file read at startup |
| Governed | `config_trust`, `metadata.source.path`, `metadata.source.digest` | Required before startup and apply | Signed config target plus signed metadata target; optional signed package index when `package_digest` is claimed |

Keep operational details in this runtime config: sources, tables, physical
columns, scopes, filters, aggregates, standards adapters, ingest, and refresh.
Keep standard-facing meaning in the manifest: catalog, datasets, entities,
fields, constraints, vocabularies, codelists, profiles, conformance claims, and
descriptive ODRL policy metadata.

See [metadata.md](https://github.com/jeremi/registry-relay/blob/3938fdf3930e134d6dca97360baf64ac3a16bed2/docs/metadata.md) for the manifest schema, static publication, and
the `metadata.manifest.*` / `runtime.binding.*` startup error codes.

ODRL policy belongs in the portable metadata manifest, not in runtime dataset
bindings. A dataset `policy` block is published as an `odrl:Offer` for discovery
and review evidence only. It does not change API-key scopes, OIDC authorization,
row filtering, evidence verification, SP DCI behavior, or any other runtime access
decision.

```yaml
metadata:
  source:
    path: ./disability_registry.metadata.yaml

# In disability_registry.metadata.yaml:
datasets:
  - id: disability_registry
    policy:
      uid: https://demo.example.gov/datasets/disability_registry#illustrative-offer
      assigner: did:web:social-affairs.demo.example.gov
      permissions:
        - action: odrl:use
          constraints:
            - left_operand: odrl:purpose
              operator: odrl:isA
              right_operand:
                iri: https://demo.example.gov/purpose/disability-benefit-eligibility
          duties:
            - action: odrl:attribute
      prohibitions:
        - action: odrl:sell
```

The demo policy IRIs under `demo.example.gov` are hypothetical examples for
catalog consumers. They are not official policy, legal advice, or a declaration
that a client has been approved to use the data.

## SP DCI sync adapter

SP DCI (the Social Protection Digital Convergence Initiative) sync adapters are optional and feature-gated. Build with `--features spdci-api-standards` to enable them. Without that feature, any `standards.spdci` config is rejected with `spdci.config.feature_disabled`.

The adapter does not add new storage semantics. Configure a normal Registry Relay entity, often backed by an XLSX worksheet, then bind the SP DCI sync routes to it:

```yaml
standards:
  spdci:
    disability_registry:
      dataset: disability_registry
      entity: disabled_person
      query_key: member.member_identifier
      query_field: id
      disabled_status_field: disability_status
      disabled_positive_values: [approved, yes]
    registries:
      dr:
        dataset: disability_registry
        entity: disabled_person
        registry_type: ns:org:RegistryType:DR
        record_type: spdci-extensions-dci:DisabledPerson
        identifiers:
          DISABILITY_ID: id
          MEMBER_ID: id
        expression_fields:
          disability_status: disability_status
          disability_details.impairment_type: impairment_type
```

When enabled and configured, Registry Relay serves these SP DCI sync endpoints on the protected data-plane listener:

```text
POST /dci/{registry}/registry/sync/search
POST /dci/{registry}/registry/sync/disabled
POST /dci/{registry}/registry/sync/get-disability-details
POST /dci/{registry}/registry/sync/get-disability-support
```

For `sync/search`, the `{registry}` segment selects any named `standards.spdci.registries` entry such as `dr`, `sr`, `crvs`, or `fr`, which lets one listener host multiple DCI registry APIs without path ambiguity. The `disabled`, `get-disability-details`, and `get-disability-support` routes are Disability Registry-specific and resolve only when the named registry entry points at the same dataset/entity as `standards.spdci.disability_registry`. The async `/registry/search`, subscribe, callback, and transaction-status APIs are intentionally not implemented by this sync adapter.

For generic sync search, `identifiers` maps DCI `idtype-value` query types to entity fields. `expression_fields` maps DCI expression or predicate attribute names to entity fields. Mapped fields must be exposed entity fields and allowed filters. The adapter currently supports `idtype-value`, expression `$and` with `eq`, `in`, `ge`, and `le`, and predicate conditions joined with `and`.

`query_key` is read from `message.disabled_criteria.query` in the SP DCI request envelope. It may be represented as a literal dotted JSON key (`"member.member_identifier"`) or as nested objects (`{"member": {"member_identifier": ...}}`). `query_field` must be an allowed entity filter because the adapter delegates reads to the normal entity query engine.
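
The two accepted representations of `query_key` can be resolved with a lookup like the one below. This is a client-side sketch, not the adapter's code; `lookup_query_value` is a hypothetical name, and trying the literal key before the nested path is an assumption.

```python
from typing import Any, Dict, Optional


def lookup_query_value(query: Dict[str, Any], query_key: str) -> Optional[Any]:
    """Find query_key as a literal dotted key or as nested objects."""
    if query_key in query:  # literal form: {"member.member_identifier": ...}
        return query[query_key]
    node: Any = query
    for part in query_key.split("."):  # nested form: {"member": {...}}
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node
```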

For `/dci/{registry}/registry/sync/disabled`, the caller needs the entity `evidence_verification_scope`. Generic search, details, and support need the entity `read_scope`. API-key authentication is still Registry Relay's normal auth layer. If a registry entry uses `response_mapping_path`, the binary must also be built with `--features standards-cel-mapping`; otherwise config validation fails with `spdci.config.mapping_feature_disabled`.

## API keys

```yaml
auth:
  mode: api_key
  api_keys:
    - id: program_system
      fingerprint:
        provider: env
        name: PROGRAM_SYSTEM_API_KEY_HASH
        commitment: sha256:0000000000000000000000000000000000000000000000000000000000000000
      scopes:
        - social_registry:metadata
        - social_registry:rows
```

The YAML stores committed fingerprint references, never raw API keys. Each env var value must be:

```text
sha256:<64 lowercase hex chars>
```
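
A deployment script can check this format before exporting the variable. The sketch only validates the shape of the string; it does not derive or verify a fingerprint, and `is_valid_fingerprint` is a hypothetical helper.

```python
import re

_FINGERPRINT_RE = re.compile(r"sha256:[0-9a-f]{64}")


def is_valid_fingerprint(value: str) -> bool:
    """True when value is 'sha256:' followed by 64 lowercase hex characters."""
    return _FINGERPRINT_RE.fullmatch(value) is not None
```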

Generate a raw key, its fingerprint, and the matching commitment:

```sh
registry-relay generate-api-key --id program_system
```

The command emits four shell-friendly lines:

```text
api_key_id=program_system
api_key=<send-this-raw-key-to-the-client>
fingerprint=sha256:<store-this-in-the-secret-store>
commitment=sha256:<paste-this-into-config>
```

Store the emitted fingerprint in the platform secret store under the configured `fingerprint.name`. Paste the emitted commitment into `fingerprint.commitment`. Give the raw key only to the authorized client.

Worked standalone example, using `demo_client` and `API_KEY_HASH`:

```text
api_key_id=demo_client
api_key=registry-relay-standalone-example-key-0001
fingerprint=sha256:db3f2a02c6ead9bf0387e8a97ec090a549daa46610ca87bd4b651631b2411def
commitment=sha256:1ee555b85da34dced897bc690053aaaedd4716f4b2972d556fa688b64ff55213
```

```sh
export API_KEY_HASH='sha256:db3f2a02c6ead9bf0387e8a97ec090a549daa46610ca87bd4b651631b2411def'
```

```yaml
auth:
  mode: api_key
  api_keys:
    - id: demo_client
      fingerprint:
        provider: env
        name: API_KEY_HASH
        commitment: sha256:1ee555b85da34dced897bc690053aaaedd4716f4b2972d556fa688b64ff55213
      scopes:
        - people:metadata
        - people:rows
```

Do not reuse the example raw key in a real deployment.

## OIDC (OAuth2)

Set `auth.mode: oidc` to verify bearer JWTs against an external OpenID Connect / OAuth2 IdP. Registry Relay is a resource server: it validates inbound tokens against the IdP's JWKS but never mints, refreshes, or stores tokens. A given deployment runs in exactly one auth mode at a time; mixed-mode operation is not supported.

OIDC field names follow the shared Registry service runtime configuration conventions.
Removed pre-convention names are rejected before deserialization with an error
naming the replacement field.

```yaml
auth:
  mode: oidc
  oidc:
    issuer: https://idp.example.gov
    audiences:
      - registry-relay
    discovery_url: https://idp.example.gov/.well-known/openid-configuration
    allowed_algorithms:
      - RS256
    jwks_cache_ttl: 10m
    leeway: 60s
    scope_claim: scope
    scope_map:
      "role:social-registry-reader": "social_registry:rows"
    scope_object_required_keys: []
    allowed_clients: []
    allowed_token_types:
      - JWT
      - at+jwt
```

A full drop-in alternative to `config/example.yaml` lives at `config/example.oidc.yaml`. It targets a local Zitadel instance and is what the integration test consumes.

| Field             | Purpose                                                                                                                                                       |
| ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `issuer`          | Compared verbatim against the JWT `iss` claim. Must match the IdP's published issuer URL.                                                                     |
| `audiences`       | One or more accepted `aud` values. Tokens whose `aud` does not intersect this list are rejected.                                                              |
| `jwks_url`        | Explicit JWKS endpoint. Exactly one of `jwks_url` and `discovery_url` must be set; the validator rejects configs that supply both or neither.                 |
| `discovery_url`   | OIDC discovery document (`.well-known/openid-configuration`). The JWKS URL is resolved from `jwks_uri` at startup.                                            |
| `allow_dev_insecure_fetch_urls` | Development-only opt-in for loopback HTTP issuer, discovery, and JWKS URLs. Defaults to `false`; non-loopback private and metadata IPs remain denied by the platform fetch policy. |
| `allowed_algorithms`      | Signature algorithms accepted by the verifier. RS256, ES256, EdDSA. HS\* and `none` are intentionally absent.                                                 |
| `jwks_cache_ttl`  | Steady-state JWKS cache TTL. The cache also refreshes on unknown `kid` (rate-limited), so this is the rotation pickup latency, not the upper bound.           |
| `leeway`          | Clock skew tolerance on `exp` and `nbf`. Bounded at 5 minutes by validation.                                                                                  |
| `scope_claim`     | Name of the JWT claim to read scopes from (the config field itself is always a single string; defaults to `scope`). The claim's *value* in the token may be a space-separated string (RFC 8693 / RFC 9068), a JSON array of strings, or a JSON object whose keys are the scope names. The `aud` claim is rejected as a scope source because it is used only for token audience validation. Object-valued role keys grant scopes only when `scope_object_required_keys` names a key present in the role value and that nested value is active: `true`, a non-empty string, or a non-empty object/array containing an active value. |
| `scope_map`       | Optional rename map applied before scope-based access checks. Adapt IdP role names to Registry Relay's `<dataset_id>:<level>` shape.                               |
| `scope_object_required_keys` | Allowlist of keys that must appear inside object-valued role claim values before the role key is accepted. For Zitadel organization-scoped role objects, set this to the expected organization id key or keys. Defaults to empty, which means object-valued claims grant no scopes. String and array scope claims do not require this setting. |
| `allowed_clients` | Optional allowlist matched against the token's `azp` (preferred) or `client_id`. Empty list means any client is accepted.                                     |
| `allowed_token_types`     | Accepted JOSE `typ` header values. Defaults to `JWT` and `at+jwt` (RFC 9068). ID tokens (`id+jwt`) are intentionally rejected by default, and tokens without `typ` are rejected by the shared verifier. |
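
The three claim shapes described for `scope_claim` can be illustrated as follows. This is a reading of the table, not the shared verifier: function names are hypothetical, and passing unmapped names through `scope_map` unchanged is an assumption.

```python
from typing import Any, Dict, Iterable, Set


def is_active(value: Any) -> bool:
    """True for `true`, a non-empty string, or a container holding an active value."""
    if value is True:
        return True
    if isinstance(value, str):
        return value != ""
    if isinstance(value, dict):
        return any(is_active(v) for v in value.values())
    if isinstance(value, list):
        return any(is_active(v) for v in value)
    return False


def scopes_from_claim(claim: Any, scope_map: Dict[str, str], required_keys: Iterable[str]) -> Set[str]:
    """Normalize a string, array, or object claim value into mapped scopes."""
    required = list(required_keys)
    if isinstance(claim, str):
        names = claim.split()  # space-separated string
    elif isinstance(claim, list):
        names = [n for n in claim if isinstance(n, str)]
    elif isinstance(claim, dict):
        # A role key counts only when a required key is present and active.
        names = [
            role for role, value in claim.items()
            if isinstance(value, dict)
            and any(k in value and is_active(value[k]) for k in required)
        ]
    else:
        names = []
    return {scope_map.get(name, name) for name in names}
```

With an empty `scope_object_required_keys`, the object branch yields no scopes, which matches the documented default.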

### Discovery vs explicit JWKS

`discovery_url` triggers a single discovery fetch at startup to resolve `jwks_uri`; a failure here aborts the binary so an operator sees the IdP wiring problem instead of a process that runs but silently rejects every token. The JWKS document itself is fetched lazily on first verify, so a transient JWKS outage at boot does not block startup. Production defaults require HTTPS; local loopback HTTP requires `allow_dev_insecure_fetch_urls: true`.

### Resource-server semantics

Registry Relay never mints or refreshes tokens. Operators are responsible for provisioning OIDC applications, machine users, and grant types on the IdP. The Principal's `principal_id` is taken from the token's `sub` (preferred), then `client_id`, then `azp`; `auth_mode=oidc` is recorded on every audit record.
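
The claim precedence for `principal_id` can be sketched as below. The helper name is hypothetical, and skipping empty or non-string claim values is an assumption.

```python
from typing import Any, Dict, Optional


def principal_id_from_claims(claims: Dict[str, Any]) -> Optional[str]:
    """Pick the principal id: `sub` first, then `client_id`, then `azp`."""
    for name in ("sub", "client_id", "azp"):
        value = claims.get(name)
        if isinstance(value, str) and value:
            return value
    return None
```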

### Granular failure codes

Token verification failures map to specific `auth.*` codes so audit pipelines can distinguish IdP outages from bad tokens from policy denials:

| Code                            | HTTP | Meaning                                                       |
| ------------------------------- | ---- | ------------------------------------------------------------- |
| `auth.missing_credential`       | 401  | No `Authorization` header                                     |
| `auth.malformed_credential`     | 401  | Wrong scheme, empty bearer, or unparseable JWT structure      |
| `auth.token_expired`            | 401  | `exp` claim is in the past (after `leeway`)                   |
| `auth.token_not_yet_valid`      | 401  | `nbf` claim is in the future (after `leeway`)                 |
| `auth.token_signature_invalid`  | 401  | JWKS key found but signature did not verify                   |
| `auth.issuer_mismatch`          | 401  | `iss` claim does not match `oidc.issuer`                      |
| `auth.audience_mismatch`        | 401  | `aud` claim does not intersect `oidc.audiences`                |
| `auth.kid_unknown`              | 401  | Header `kid` is absent from the JWKS even after one refresh   |
| `auth.algorithm_not_allowed`    | 401  | Header `alg` is not in the configured allowlist               |
| `auth.client_not_allowed`       | 403  | `azp` / `client_id` is not in the configured `allowed_clients`|
| `auth.invalid_credential`       | 401  | JWT decode failure not covered by a more specific variant      |
| `auth.jwks_unavailable`         | 503  | JWKS fetch failed; Registry Relay cannot verify any token     |
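
An audit pipeline could bucket these codes as sketched below. The status values come from the table above; the three bucket names and the grouping by HTTP status are hypothetical conveniences, not part of Registry Relay.

```python
AUTH_CODE_STATUS = {
    "auth.missing_credential": 401,
    "auth.malformed_credential": 401,
    "auth.token_expired": 401,
    "auth.token_not_yet_valid": 401,
    "auth.token_signature_invalid": 401,
    "auth.issuer_mismatch": 401,
    "auth.audience_mismatch": 401,
    "auth.kid_unknown": 401,
    "auth.algorithm_not_allowed": 401,
    "auth.client_not_allowed": 403,
    "auth.invalid_credential": 401,
    "auth.jwks_unavailable": 503,
}


def classify_auth_failure(code: str) -> str:
    """Bucket an auth.* code for alerting (bucket names are illustrative)."""
    status = AUTH_CODE_STATUS[code]
    if status == 503:
        return "idp_outage"
    if status == 403:
        return "policy_denial"
    return "credential_rejected"
```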

For a worked example of running Registry Relay against a local OIDC provider (using the project's dev Zitadel stack), see [development.md](https://github.com/jeremi/registry-relay/blob/3938fdf3930e134d6dca97360baf64ac3a16bed2/docs/development.md).

## Audit

```yaml
audit:
  sink: stdout
  format: jsonl
  hash_secret_env: REGISTRY_RELAY_AUDIT_HASH_SECRET
  chain: true
  include_health: false
```

Supported sinks:

```yaml
audit:
  sink: stdout
  format: jsonl
  hash_secret_env: REGISTRY_RELAY_AUDIT_HASH_SECRET
```

```yaml
audit:
  sink: file
  format: jsonl
  hash_secret_env: REGISTRY_RELAY_AUDIT_HASH_SECRET
  path: /var/log/registry-relay/audit.jsonl
  rotate:
    max_size_mb: 100
    max_files: 14
```

```yaml
audit:
  sink: syslog
  format: jsonl
  hash_secret_env: REGISTRY_RELAY_AUDIT_HASH_SECRET
```

`hash_secret_env` is required at runtime and must name an environment variable containing at least 32 bytes of deployment-specific random secret material. Startup fails closed when it is missing, empty, unset, or weak.

Audit output uses `registry-platform-audit` envelopes with `prev_hash` and `record_hash` on every record. These fields detect ordering gaps and accidental corruption in retained logs, but they do not protect against an actor who can rewrite the audit sink. Use an append-only external sink or independent tail-hash anchoring when stronger integrity is required. `chain` is retained in config for compatibility with older deployments, but platform audit envelopes are always chained.
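
To show how `prev_hash` and `record_hash` expose gaps, here is a generic keyed hash chain. It is not the `registry-platform-audit` envelope format: the record layout, the all-zero genesis value, the JSON canonicalization, and the use of HMAC-SHA-256 over `prev_hash` plus body are all assumptions made for illustration.

```python
import hashlib
import hmac
import json
from typing import Dict, List, Optional

GENESIS = "0" * 64  # hypothetical starting value for the first record


def compute_record_hash(secret: bytes, prev_hash: str, body: Dict) -> str:
    payload = prev_hash.encode() + json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def append_record(chain: List[Dict], secret: bytes, body: Dict) -> None:
    """Append a record linked to the previous record's hash."""
    prev = chain[-1]["record_hash"] if chain else GENESIS
    chain.append({"prev_hash": prev, "record_hash": compute_record_hash(secret, prev, body), "body": body})


def first_break(chain: List[Dict], secret: bytes) -> Optional[int]:
    """Return the index of the first record that fails the chain, or None."""
    prev = GENESIS
    for index, record in enumerate(chain):
        expected = compute_record_hash(secret, prev, record["body"])
        if record["prev_hash"] != prev or not hmac.compare_digest(record["record_hash"], expected):
            return index
        prev = record["record_hash"]
    return None
```

Deleting or reordering a retained record breaks the link at the next record. An actor who holds the secret and can rewrite the sink can still rebuild a consistent chain, which is why external anchoring is recommended for stronger integrity.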

A normally booted relay always reports keyed integrity `hmac` in its posture because startup requires the audit hash secret (`hash_secret_env`); the `none` value appears only in dev or test configurations that build the posture without that secret.

Audit records are separate from operational logs, which go to stderr as readable text by default. Set `REGISTRY_RELAY_LOG_FORMAT=json` or `REGISTRY_RELAY_LOG_FORMAT=jsonl` to emit operational logs as JSON Lines, for example when they are collected or redirected to files.

### Write policy

`write_policy` selects what happens when an audit record cannot be written (for example the sink is unreachable or the disk is full):

```yaml
audit:
  sink: file
  hash_secret_env: REGISTRY_RELAY_AUDIT_HASH_SECRET
  path: /var/log/registry-relay/audit.jsonl
  write_policy: availability_first   # availability_first | fail_closed
```

* `availability_first` (default): an audit write failure is logged and the request returns its original outcome unchanged. The deployment stays available even when audit is degraded. This is the historical behavior.
* `fail_closed`: a request whose audit record cannot be written fails with HTTP `503` and the stable error code `audit.write_failed` (`application/problem+json`). No request outcome is returned without a durable audit record. Choose this when audit completeness is a hard requirement.

The policy applies to every audited route. Per-route-family selection is not configurable. The selected policy is reported truthfully as the `write_policy` fact in the operations posture audit block, so a deployment cannot claim a stronger guarantee than it runs.
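
The difference between the two policies can be summarized in a few lines. The function and the dictionary response shape are hypothetical; only the policy names, the `503` status, the `audit.write_failed` code, and the media type come from this guide.

```python
from typing import Dict

_POLICIES = ("availability_first", "fail_closed")


def finalize_response(outcome: Dict, audit_written: bool, write_policy: str = "availability_first") -> Dict:
    """Decide what the caller sees after the audit write attempt."""
    if write_policy not in _POLICIES:
        raise ValueError(f"unknown write_policy: {write_policy!r}")
    if audit_written or write_policy == "availability_first":
        return outcome  # original outcome is returned unchanged
    return {
        "status": 503,
        "content_type": "application/problem+json",
        "code": "audit.write_failed",
    }
```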

## Deployment profile

The `deployment` block lets an operator declare the assurance level a deployment claims. The profile is never inferred from hostname, environment, or network position: it is an explicit statement. Each profile binds a set of gates that check the running configuration and contribute findings at a defined severity.

```yaml
deployment:
  profile: production        # local | hosted_lab | production | evidence_grade
  evidence:
    ingress_rate_limit: true # operator asserts a gateway enforces rate limiting
    api_key_rotation: true   # operator asserts an API-key rotation process exists
  waivers:
    - finding: relay.openapi.public
      reason: public API catalog is intentional for this deployment
      expires: 2026-12-31
```

The `deployment` block is optional. When it is omitted, no gates bind and the deployment keeps its existing behavior exactly; the posture reports a single `deployment.profile_undeclared` warning so the choice is visible. An unknown profile value is rejected at startup.

### Profiles and severities

Each gate maps to one of four severities per profile:

* `startup_fail`: the process refuses to start. Never waivable.
* `readiness_fail`: the readiness endpoint reports not-ready; the process keeps running.
* `finding_error` / `finding_warn`: a posture finding only.

The four profiles escalate from `local` (binds no hard gates) through `hosted_lab` and `production` to `evidence_grade` (the strictest). For example, `evidence_grade` requires a signed, governed config bundle: running it from a plain local YAML file trips a `startup_fail` gate (`relay.config.unsigned`) and the process refuses to start.

### Evidence declarations

Some controls live outside the relay and cannot be observed by the process (for example ingress rate limiting enforced by a gateway, or an API-key rotation process). The `evidence` flags let the operator assert those controls are in place. Each flag defaults to `false`, which leaves the corresponding gate active until the operator declares the control.

### Waivers

A triggered, waivable finding can be suppressed by a waiver that names the finding id and carries a free-text reason and a mandatory expiry date (`YYYY-MM-DD`):

```yaml
deployment:
  profile: hosted_lab
  waivers:
    - finding: relay.ingress.rate_limit_missing
      reason: rate limiting is handled by the lab gateway
      expires: 2026-09-30
```

A waived finding reports status `waived` instead of its severity effect. Once the expiry date passes, the waiver stops suppressing the finding and the posture additionally raises `deployment.waiver_expired`. The expiry date is mandatory; reasons must be non-empty and must not contain secrets. `startup_fail` gates are never waivable.
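
Waiver expiry can be checked ahead of time with a date comparison. The sketch assumes a waiver still applies on its expiry date and stops applying the day after, which is one reading of "once the expiry date passes"; `waiver_is_active` is a hypothetical helper.

```python
from datetime import date
from typing import Dict


def waiver_is_active(waiver: Dict[str, str], today: date) -> bool:
    """True while the waiver's YYYY-MM-DD expiry date has not passed."""
    expires = date.fromisoformat(waiver["expires"])
    return today <= expires
```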

Waiver reasons are only visible in the restricted posture tier; the default tier reports finding id, severity, and status but not the reason.

### Findings catalog

| Finding id | hosted_lab | production | evidence_grade |
| --- | --- | --- | --- |
| `relay.admin.public_exposure` | error | readiness_fail | startup_fail |
| `relay.openapi.public` | warn | error | error |
| `relay.ingress.rate_limit_missing` | warn | error | error |
| `relay.oidc.client_allowlist_empty` | warn | error | readiness_fail |
| `relay.auth.api_key_no_rotation_evidence` | warn | error | error |
| `relay.config.unsigned` | warn | error | startup_fail |
| `relay.audit.best_effort` | (not bound) | warn | readiness_fail |
| `relay.audit.sink_missing` | error | readiness_fail | startup_fail |
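
For tooling that needs the catalog as data, the table above can be transcribed into a lookup. The cell values are copied as written; returning `None` for `local` and for unbound cells is an assumption, since the table has no `local` column.

```python
from typing import Optional

_COLUMNS = {"hosted_lab": 0, "production": 1, "evidence_grade": 2}

FINDINGS = {
    "relay.admin.public_exposure": ("error", "readiness_fail", "startup_fail"),
    "relay.openapi.public": ("warn", "error", "error"),
    "relay.ingress.rate_limit_missing": ("warn", "error", "error"),
    "relay.oidc.client_allowlist_empty": ("warn", "error", "readiness_fail"),
    "relay.auth.api_key_no_rotation_evidence": ("warn", "error", "error"),
    "relay.config.unsigned": ("warn", "error", "startup_fail"),
    "relay.audit.best_effort": (None, "warn", "readiness_fail"),
    "relay.audit.sink_missing": ("error", "readiness_fail", "startup_fail"),
}


def severity_for(finding: str, profile: str) -> Optional[str]:
    """Look up the severity a finding carries under a declared profile."""
    if profile not in _COLUMNS:
        return None  # e.g. `local`, which has no column in the table
    return FINDINGS[finding][_COLUMNS[profile]]
```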

The current deployment profile, its findings, and active waivers are reported under `deployment` in the operations posture (`GET /admin/v1/posture`).

## Datasets

Each dataset combines private storage tables with public entities:

```yaml
datasets:
  - id: social_registry
    title: Social Registry
    description: Registry of households participating in Program X
    owner: Ministry of Social Affairs
    sensitivity: personal
    access_rights: restricted
    update_frequency: monthly
    conforms_to:
      - psc:concepts/Person
    defaults:
      materialization: snapshot
    tables: []
    entities: []
```

`sensitivity`, `access_rights`, and `update_frequency` are catalog metadata. They also make review conversations concrete; do not leave them vague in production configs. Allowed values:

- `sensitivity`: `public`, `internal`, `personal`, `confidential`, or `secret`.
- `access_rights`: `public`, `restricted`, or `non_public`.
- `update_frequency`: `continuous`, `daily`, `weekly`, `termly`, `monthly`, `quarterly`, `annual`, `irregular`, `as_needed`, or `unknown`.

`defaults` is optional. It may provide `materialization` and `refresh` defaults for tables in the same dataset. Source configuration stays table-level.
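
As a hedged sketch of how dataset-level defaults might sit next to table-level settings, the fragment below sets a default materialization and refresh and lets one table declare its own refresh. It assumes a table-level `refresh` takes precedence over the dataset default, which this guide does not state explicitly. The `households_table` id and the file paths are illustrative.

```yaml
datasets:
  - id: social_registry
    # ...title, owner, sensitivity, and other catalog metadata...
    defaults:
      materialization: snapshot
      refresh:
        mode: interval
        interval: 1h
    tables:
      - id: individuals_table
        # No materialization or refresh here: the dataset defaults apply.
        source:
          type: file
          path: ./data/individuals.csv   # illustrative path
        # ...primary_key and schema...
      - id: households_table
        # Assumption: a table-level refresh overrides the dataset default.
        refresh:
          mode: manual
        source:
          type: file
          path: ./data/households.csv    # illustrative path
        # ...primary_key and schema...
```

Note that `source` still appears on every table, because source configuration is never inherited from `defaults`.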

### Sources

Sources are configured on each private table. File sources read CSV, XLSX, or Parquet data:

```yaml
source:
  type: file
  path: ./data/social_registry.xlsx
  format:
    xlsx:
      sheet: Individuals
      header_row: 1
      data_range: A1:E100000
```

For CSV files, set `format.csv.header_row: 1` when the first row contains column names. For XLSX files, `header_row` and `data_range` can be used when a worksheet has notes or title rows around the rectangular table. Source configuration is table-local: put file/database settings and format hints under each `tables[].source`.
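
For comparison with the XLSX example, a CSV source with a header row might look like this minimal sketch. The file path is illustrative.

```yaml
source:
  type: file
  path: ./data/households.csv   # illustrative path
  format:
    csv:
      header_row: 1   # the first row contains column names
```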

Postgres snapshot and live table sources are supported. Credentials are never stored in YAML:

```yaml
source:
  type: postgres
  connection_env: SOCIAL_REGISTRY_DATABASE_URL
  table:
    schema: public
    name: individuals
  change_token_sql: "select max(updated_at)::text from public.individuals"
```

`connection_env` is the environment variable name containing the connection string. Validation and logs may mention the env var name but must not read or print its value.

The connection string must set `sslmode=require`; missing `sslmode`, `sslmode=prefer`, and `sslmode=disable` are rejected when the connector reads the environment variable. The native TLS connector validates the server certificate and hostname against the system trust store.

Use read-only database credentials. Registry Relay opens read-only Postgres sessions for live scans, but credentials must enforce the same boundary at the database.

`table` and `query` are mutually exclusive; prefer structured `table` configs for production.

Snapshot ingest reads Postgres through `COPY (SELECT ...) TO STDOUT WITH CSV HEADER`, then applies the same declared-schema coercion and validation as CSV files. The exported snapshot is bounded by `server.max_source_file_bytes`. For `table` sources, Registry Relay projects the declared schema fields from the table and casts them to CSV-friendly values. Extra database columns are ignored. For `query` sources, write a single `SELECT` or `WITH` statement without semicolons; public request input is never interpolated into SQL.
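
A snapshot `query` source might look like the following sketch. It assumes `query` takes the SQL text as a string next to `connection_env`, in the same way `change_token_sql` does; the selected columns and the `status` filter are illustrative and not taken from the sample config.

```yaml
source:
  type: postgres
  connection_env: SOCIAL_REGISTRY_DATABASE_URL
  # One SELECT or WITH statement, no semicolons. `table` must be omitted
  # because `table` and `query` are mutually exclusive.
  query: "select individual_id, household_id, updated_at from public.individuals where status = 'active'"
```

Because `query` sources are snapshot-only, the owning table would use `materialization: snapshot`.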

Live materialization is supported for structured `table` sources only. Each DataFusion scan opens a read-only Postgres session and exports data from the configured table. Simple column projection is pushed into the generated `COPY` query only when the scan has no filters. Filter-free limited scans may use the requested limit as a physical fetch bound; filtered scans, joins, and semantic limit enforcement remain gateway-side and may read up to the configured live row cap before local filtering. This keeps the live path bounded and safe without accepting caller-controlled SQL.

Live row responses do not advertise snapshot-style strong validators or cursor version tokens, because upstream rows can change between requests without a Registry Relay ingest event.

Live exports are also bounded by `server.max_source_file_bytes`. Use `connect_timeout`, `query_timeout`, `live_max_connections`, and `live_max_rows` to bound upstream behavior.

For production live sources, keep the contract deliberately narrow:

```yaml
tables:
  - id: individuals_table
    materialization: live
    primary_key: individual_id
    refresh:
      mode: manual
    schema:
      strict: true
      fields:
        - name: individual_id
          type: string
          nullable: false
        - name: household_id
          type: string
          nullable: false
        - name: updated_at
          type: timestamp
          nullable: true
    source:
      type: postgres
      connection_env: SOCIAL_REGISTRY_DATABASE_URL
      table:
        schema: public
        name: individuals
      connect_timeout: 5s
      query_timeout: 30s
      live_max_connections: 8
```

The connection string must include `sslmode=require` and point to a read-only database role that can `SELECT` only the configured table or view. Do not use `query` sources, `change_token_sql`, or `refresh.mode: mtime` with live materialization; those are snapshot-only controls. Declared schema fields are the exported contract, and extra database columns are ignored unless an entity query needs a full local scan to evaluate filters.

Minimal source-only form:

```yaml
source:
  type: postgres
  connection_env: SOCIAL_REGISTRY_DATABASE_URL
  table:
    schema: public
    name: individuals
  connect_timeout: 5s
  query_timeout: 30s
  live_max_connections: 8
```

Supported Postgres field mappings are:

```text
string -> text
integer -> bigint
number -> double precision
boolean -> boolean
date -> date
timestamp -> timestamptz rendered as RFC 3339 UTC text
```

### Refresh

```yaml
refresh:
  mode: mtime
  interval: 60s
```

```yaml
refresh:
  mode: interval
  interval: 1h
```

```yaml
refresh:
  mode: manual
```

`mtime` reloads when the source change token changes. It is supported for file sources, and for Postgres snapshot sources only when `change_token_sql` is configured; it is not available with live materialization. `interval` reloads each time the configured interval elapses. `manual` reloads only through the admin listener's table reload route.
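
To illustrate the Postgres case, the sketch below pairs `mtime` refresh with `change_token_sql` on a snapshot table. It reuses the table and query from the Sources section. The comment on `interval` is an assumption: this guide does not say how `interval` is interpreted under `mtime`.

```yaml
tables:
  - id: individuals_table
    materialization: snapshot   # mtime refresh is a snapshot-only control
    refresh:
      mode: mtime
      interval: 60s   # assumption: how often the change token is checked
    source:
      type: postgres
      connection_env: SOCIAL_REGISTRY_DATABASE_URL
      table:
        schema: public
        name: individuals
      # Change token for this source; required for mtime refresh on Postgres.
      change_token_sql: "select max(updated_at)::text from public.individuals"
    # ...primary_key and schema...
```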

## Tables

Tables are private storage resources. Their ids do not appear in public URLs.

```yaml
tables:
  - id: individuals_table
    materialization: snapshot
    source:
      type: file
      path: ./data/social_registry.xlsx
      format:
        xlsx:
          sheet: Individuals
    refresh:
      mode: mtime
      interval: 1h
    primary_key: individual_id
    schema:
      strict: true
      fields:
        - name: individual_id
          type: string
          nullable: false
        - name: payment_amount
          type: number
          nullable: true
          unit: EUR
```

Supported formats are `csv`, `xlsx`, and `parquet`. If `format` is omitted, the loader infers the format from the source file extension where possible.
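
As a small sketch of format inference, a Parquet table could omit `format` and rely on the file extension. The `households_table` id and the path are illustrative.

```yaml
tables:
  - id: households_table
    materialization: snapshot
    source:
      type: file
      path: ./data/households.parquet   # no `format` key; inferred from the extension
    # ...refresh, primary_key, and schema...
```

Declaring `format` explicitly remains the safer choice when a file extension does not match its contents.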

`materialization` may be `snapshot` or `live`. File sources support only `snapshot`. Postgres sources support `snapshot`, and Postgres structured `table` sources also support `live`.

### Datasource capability matrix

Registry Relay derives datasource capabilities from `source.type` and `materialization`. Operators do not configure these flags directly.

| Source | Materialization | Filters | Projection | Limit | Validators and cursors | Provenance |
| --- | --- | --- | --- | --- | --- | --- |
| `file` | `snapshot` | gateway-side | gateway-side | gateway-side | strong snapshot tokens | snapshot-backed |
| `postgres` `table` or `query` | `snapshot` | gateway-side | gateway-side | gateway-side | strong snapshot tokens | snapshot-backed |
| `postgres` `table` | `live` | gateway-side | Postgres column pushdown for filter-free scans, otherwise gateway-side | gateway-side | no strong snapshot tokens | not snapshot-backed |

Unsupported combinations are rejected at config load: file `live`, Postgres `live` with a configured `query`, and `live` with `mtime` refresh. Postgres `query` sources stay snapshot-only so operator SQL is executed only during controlled ingest or refresh, never per public request. Future datasource connectors must follow the same convention: only generated SQL over structured table metadata may receive pushdown, and unsupported operations must fall back to gateway-side execution or be rejected explicitly.

At startup, Registry Relay logs one `ingest.datasource_capabilities` event per configured table. For Postgres live scans, the admin listener's `/metrics` route also exports low-cardinality live scan metrics for scan duration, concurrency wait time, exported rows, and exported bytes. These metrics intentionally do not include dataset ids, table names, SQL, env vars, request ids, or row values.

Field types:

```text
string, number, integer, boolean, date, timestamp
```

Set `sensitive: true` on source or entity fields whose query values should be redacted or deterministically hashed in audit records. This flag is audit-only in beta: it does not hide a field from API responses and does not grant or deny read access.
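
On a source field, the flag sits next to the other schema attributes, as in this minimal sketch. The `national_id` field name is illustrative.

```yaml
schema:
  strict: true
  fields:
    - name: national_id        # illustrative field name
      type: string
      nullable: false
      sensitive: true          # audit-only: query values are redacted or hashed in audit records
```

The field is still returned by the API if an entity exposes it, so exposure has to be controlled separately through the entity configuration.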

## Entities

Entities are the public REST resources:

```yaml
entities:
  - name: individual
    title: Individual
    description: A person enrolled in Program X
    table: individuals_table
    concept_uri: psc:concepts/Person
    fields:
      - name: id
        from: individual_id
        sensitive: true
      - name: payment_amount
        from: payment_amount
    relationships:
      - name: household
        kind: belongs_to
        target: household
        foreign_key: household_id
    access:
      metadata_scope: social_registry:metadata
      aggregate_scope: social_registry:aggregate
      read_scope: social_registry:rows
      evidence_verification_scope: social_registry:evidence_verification
    api:
      default_limit: 100
      max_limit: 1000
      require_purpose_header: true
      required_filters:
        - id
      allowed_filters:
        - field: id
          ops: [eq, in]
      allowed_expansions:
        - household
    publicschema:
      target: Person
      mapping_path: mappings/individual-person.publicschema.yaml
      schema_validation_path: ../publicschema.org/dist/schemas/Person.schema.json
```

When `fields` is present, only listed fields are exposed. When it is omitted, every table column is exposed. For sensitive datasets, prefer an explicit field list. Use entity `read_scope`, required filters, purpose-header requirements, and explicit field projection for exposure control; `sensitive: true` controls audit redaction only.

Row-level authorization scopes are not supported. The `row_scope` resource setting is rejected by config parsing; model row exposure with dataset/entity read scopes, required filters, purpose headers, and projected fields instead.

Relationships are dataset-local in V1. Cross-dataset workflows must compose client-side with separate scoped calls and separate audit records.

### OGC API features

Build with `--features ogcapi-features` to expose spatial entities through the protected `/ogc/v1` surface. The feature does not add a top-level `standards` config block. Instead, opt in per entity with `spatial`:

```yaml
spatial:
  collection_id: facilities
  title: Public facilities
  description: Public facility locations from the civic registry.
  geometry:
    kind: point
    longitude_field: lon
    latitude_field: lat
    crs: http://www.opengis.net/def/crs/OGC/1.3/CRS84
  datetime_field: updated_at
  max_bbox_degrees: 5.0
  max_geometry_vertices: 10000
```

Phase 1 supports `kind: point` and `kind: geojson`. Point longitude, point latitude, datetime, and bbox helper fields must be exposed entity fields with compatible types. `kind: geojson` may use optional precomputed bbox fields:

```yaml
spatial:
  collection_id: parcels
  geometry:
    kind: geojson
    field: geometry
    crs: http://www.opengis.net/def/crs/OGC/1.3/CRS84
  bbox_fields:
    min_x: bbox_min_x
    min_y: bbox_min_y
    max_x: bbox_max_x
    max_y: bbox_max_y
```

Only CRS84 is accepted. `wkt` and `wkb` parse as reserved geometry kinds but are rejected by V1 validation. Collection ids default to the entity name and must be unique within a dataset. OGC discovery uses metadata scope; feature item reads use `read_scope` and preserve entity required filters, purpose-header requirements, projection, and audit behavior.
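
The following sketch places a `spatial` block in the context of a full entity, to show that the longitude, latitude, and datetime fields it names are exposed entity fields. The entity name, table id, source column names, and scope labels are all illustrative, and other `access` and `api` settings are elided.

```yaml
entities:
  - name: facility
    table: facilities_table          # illustrative table id
    fields:
      - name: id
        from: facility_id
      - name: lon                    # exposed field referenced by longitude_field
        from: longitude
      - name: lat                    # exposed field referenced by latitude_field
        from: latitude
      - name: updated_at             # exposed field referenced by datetime_field
        from: updated_at
    access:
      metadata_scope: civic_registry:metadata   # used for OGC discovery
      read_scope: civic_registry:rows           # used for feature item reads
      # ...other access settings as needed...
    spatial:
      collection_id: facilities
      geometry:
        kind: point
        longitude_field: lon
        latitude_field: lat
        crs: http://www.opengis.net/def/crs/OGC/1.3/CRS84
      datetime_field: updated_at
```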

### Evidence verification

Evidence offerings expose Registry Notary discovery metadata:

```http
GET /metadata/evidence-offerings
GET /metadata/evidence-offerings/{offering_id}
```

Relay does not verify claims or evidence. `registry-notary` is the only claim/evidence verifier. The portable metadata manifest declares public offerings with `access.kind: registry-notary`, `endpoint_url`, `discovery_url`, and `ruleset` so clients can discover the Notary service that owns verification.

```yaml
access:
  evidence_verification_scope: social_registry:evidence_verification
```

`evidence_verification_scope` remains a scope label for standards adapters and integrations that need to distinguish evidence-oriented access from row reads. It does not enable a Relay-local verification endpoint.

### PublicSchema VC mapping

Requires `--features publicschema-cel`. When an entity declares a `publicschema` block, entity-record VC issuance uses the mapping file to produce a PublicSchema.org credential subject instead of the default entity JSON shape.

```yaml
publicschema:
  target: Person                                          # required; PublicSchema concept name
  mapping_path: mappings/individual-person.publicschema.yaml  # required; CEL mapping document
  schema_validation_path: ../publicschema.org/dist/schemas/Person.schema.json  # optional; validates subject before signing
  context_url: https://publicschema.org/ctx/draft.jsonld  # optional; overrides default context
  schema_url: https://publicschema.org/schemas/Person.schema.json  # optional; overrides default credentialSchema.id
  credential_type: Person                                 # optional; overrides default VC type[1]
```

| Field | Default | Notes |
| --- | --- | --- |
| `target` | (required) | PublicSchema concept name; drives `credential_type` and `schema_url` defaults |
| `mapping_path` | (required) | Path to a CEL mapping YAML document; compiled at startup |
| `schema_validation_path` | absent | Local JSON Schema; when set, every mapped subject is validated before signing |
| `context_url` | `https://publicschema.org/ctx/draft.jsonld` | JSON-LD context URL in the issued VC |
| `schema_url` | `https://publicschema.org/schemas/{target}.schema.json` | `credentialSchema.id` in the issued VC |
| `credential_type` | `{target}` | `type[1]` value in the issued VC |

See [provenance.md](https://github.com/jeremi/registry-relay/blob/3938fdf3930e134d6dca97360baf64ac3a16bed2/docs/provenance.md) for CEL context variables, issuance behavior, audit records, and the build and test commands for this feature.

## Aggregates

Aggregates are declared on datasets and name their source entity:

```yaml
aggregates:
  - id: by_municipality
    title: Individuals by municipality
    description: Number of individuals by municipality
    source_entity: individual
    default_group_by:
      - municipality_code
    dimensions:
      - id: municipality_code
        label: Municipality
        field: municipality_code
    indicators:
      - id: individual_count
        label: Individuals
        function: count
        column: id
        unit_measure: people
    allowed_filters:
      - field: municipality_code
        ops: [eq, in]
      - field: enrolled_on
        ops: [gte, lte, between]
    temporal_field: enrolled_on
    disclosure_control:
      min_group_size: 5
      suppression: omit
```

The V1 aggregate functions are those exercised by the tests and examples, including `count`, `sum`, and `avg`. The runtime config key remains `indicators` for compatibility; public aggregate APIs expose these configured series as measures.

`temporal_field` is optional. When present, native aggregate `temporal.from` and `temporal.to` are translated into the declared range-capable allowed filter for that source-entity field.

Dataset measure and dimension discovery is derived from these aggregate declarations, so keep ids stable and labels consumer-friendly. Keep disclosure thresholds explicit and reviewable.
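
As a sketch of the other functions, an aggregate using `sum` and `avg` over a numeric field might look like this. The aggregate id and indicator ids are illustrative, and the fragment assumes that `payment_amount` and `municipality_code` are exposed fields of the `individual` entity.

```yaml
aggregates:
  - id: payments_by_municipality     # illustrative id
    title: Payments by municipality
    description: Total and average payment amount by municipality
    source_entity: individual
    default_group_by:
      - municipality_code
    dimensions:
      - id: municipality_code
        label: Municipality
        field: municipality_code
    indicators:
      - id: payment_total
        label: Total payments
        function: sum
        column: payment_amount
        unit_measure: EUR
      - id: payment_average
        label: Average payment
        function: avg
        column: payment_amount
        unit_measure: EUR
    disclosure_control:
      min_group_size: 5
      suppression: omit
```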

### Spatial EDR aggregates

Spatial EDR exposure is opt-in and requires `--features ogcapi-edr`.

```yaml
aggregates:
  - id: by_admin_area
    description: Individuals by administrative area
    source_entity: individual
    # ...dimensions, indicators, disclosure_control as normal...
    spatial:
      mode: admin_area
      collection_id: by_admin_area   # optional; defaults to aggregate id
      dimension: municipality_code   # declared dimension id used to join geometry
      geometry_entity: municipality  # entity name that holds geometry rows
      geometry_id_field: code        # field in geometry_entity matching the dimension values
      geometry_field: geometry       # geojson field in geometry_entity
      bbox_fields:                   # optional precomputed bbox fields in geometry_entity
        min_x: bbox_min_x
        min_y: bbox_min_y
        max_x: bbox_max_x
        max_y: bbox_max_y
      max_geometry_vertices: 10000   # optional; defaults to 10000
```

| Field | Default | Notes |
| --- | --- | --- |
| `mode` | (required) | Must be `admin_area` |
| `collection_id` | aggregate id | OGC collection identifier; must be unique within the dataset |
| `dimension` | (required) | Declared aggregate dimension id whose values are joined to geometry |
| `geometry_entity` | (required) | Entity that holds one geometry row per dimension value |
| `geometry_id_field` | (required) | Field in `geometry_entity` that matches dimension values |
| `geometry_field` | (required) | GeoJSON geometry field in `geometry_entity` |
| `bbox_fields` | absent | Optional precomputed bbox columns; same subkeys as entity `spatial.bbox_fields` |
| `max_geometry_vertices` | 10000 | Cap on GeoJSON vertices decoded from `geometry_field` |

`geometry_entity` must be an entity declared in the same dataset. `geometry_id_field` and `geometry_field` must be exposed entity fields with compatible types (string/integer for id, geojson-typed string for geometry). Only `kind: geojson` geometry is supported for spatial aggregates in V1.
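
The geometry entity referenced by the aggregate above could be declared along these lines. This is a sketch: the table id and source column names are illustrative, and `access` and `api` settings are elided.

```yaml
entities:
  - name: municipality               # matches spatial.geometry_entity
    table: municipalities_table      # illustrative table id
    fields:
      - name: code                   # matches spatial.geometry_id_field
        from: municipality_code
      - name: geometry               # matches spatial.geometry_field; GeoJSON stored as a string
        from: geometry_geojson
      - name: bbox_min_x             # optional precomputed bbox fields
        from: bbox_min_x
      - name: bbox_min_y
        from: bbox_min_y
      - name: bbox_max_x
        from: bbox_max_x
      - name: bbox_max_y
        from: bbox_max_y
    # ...access and api settings...
```

Each value of the `municipality_code` dimension is expected to match one `code` row in this entity.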

## Provenance (response-credential issuer configuration)

The `provenance` block is optional. When absent or `enabled: false`, the gateway behaves as a plain JSON service. When enabled, callers can opt in to signed response credentials (W3C VCDM 2.0 VC-JWT) with `Accept: application/vc+jwt`. V1 supports local Ed25519 signing from either a `software` env-var JWK or a `file_watch` JWK file.

The key is named `provenance` for compatibility; it governs the response-credential issuer (DID, signing key, claim validity, and accepted media types). These credentials are W3C VCDM 2.0 VC-JWT with a Registry Relay JSON-LD context; they are not W3C PROV-O.

See [provenance.md](https://github.com/jeremi/registry-relay/blob/3938fdf3930e134d6dca97360baf64ac3a16bed2/docs/provenance.md) for the full signer, DID, schema, context, and rotation contract.

## Production checklist

- Source files are read-only to the process.
- `cache_dir` is writable and on a filesystem with enough space.
- Every env-backed `fingerprint.name` exists in the runtime environment.
- No raw key, fingerprint, private JWK, or full environment dump is logged.
- Admin listener, if enabled, is private.
- CORS origins are explicit.
- Personal-data entities use explicit field projections.
- Row and evidence-verification routes that need purpose tracking set `require_purpose_header: true`.
- Sensitive identifier fields are marked `sensitive: true` where audit redaction is required.
- Audit sink and retention match the deployment's governance requirements.
- For Postgres live tables, scrape `/metrics` from the admin listener and alert on live scan timeout/error growth, exported bytes, and concurrency wait time.