Relay protected read flow

Registry Relay is a read-only HTTP gateway that fronts sensitive tabular sources with entity-shaped routes and caller-specific access controls. Storage table identifiers stay private; callers see only declared entities, fields, filters, and aggregates. For the broader scope of what each project owns, see the boundaries and map page.

Request lifecycle

The lifecycle has one startup phase (Bind) and four per-request phases (Match scope, Query, Audit, Respond).

Startup: Bind

At startup, Registry Relay loads config.yaml and validates it against metadata.yaml. Binding maps each logical dataset and entity from the manifest to physical sources, physical table columns, declared filters, and scopes. If a dataset in the manifest has no matching config.yaml entry, startup fails with runtime.binding.dataset_missing and the process exits non-zero.

API-key auth has its own startup prerequisite: each api_keys entry needs a configured fingerprint reference, and the referenced value must resolve to a canonical sha256:<64 lowercase hex chars> fingerprint. Relay fails closed at startup if the reference is missing or malformed; see the deployment guide.

Bind runs once per process; the per-request path assumes the bound state is valid.
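The two startup checks described above can be sketched as follows. This is a minimal illustration, not Relay's implementation: the function names, the BindError type, the argument shapes, and the example.api_key.fingerprint_invalid code are all hypothetical. Only the runtime.binding.dataset_missing code and the sha256:<64 lowercase hex chars> format come from the text.

```python
import re

# Canonical form stated in the text: "sha256:" plus 64 lowercase hex characters.
FINGERPRINT_RE = re.compile(r"sha256:[0-9a-f]{64}")


class BindError(Exception):
    """Startup failure with a machine-readable code (hypothetical type)."""

    def __init__(self, code, detail):
        super().__init__(f"{code}: {detail}")
        self.code = code


def is_canonical_fingerprint(value):
    """True only for a string that exactly matches sha256:<64 lowercase hex>."""
    return isinstance(value, str) and FINGERPRINT_RE.fullmatch(value) is not None


def check_bind(manifest_datasets, config_datasets, api_key_fingerprints):
    """Raise BindError on the first violation; return None when binding is valid."""
    for dataset_id in manifest_datasets:
        if dataset_id not in config_datasets:
            raise BindError("runtime.binding.dataset_missing", dataset_id)
    for key_name, fingerprint in api_key_fingerprints.items():
        # None models a missing reference; anything non-canonical is malformed.
        if not is_canonical_fingerprint(fingerprint):
            # Hypothetical code: the source does not name this error.
            raise BindError("example.api_key.fingerprint_invalid", key_name)
```

A caller would run check_bind once before serving traffic and exit non-zero on BindError, which mirrors the fail-closed behavior described above.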

Per request: four phases in sequence

1. Match scope. Relay extracts the credential from the Authorization: Bearer header (bearer token) or the x-api-key header (API key), verifies it, and resolves the caller’s principal. It then compares the caller’s scopes against the scope the route requires. A caller that lacks the required <dataset_id>:<suffix> scope receives 403 Forbidden before any data access occurs.

2. Query. Relay translates the request path and query parameters into an internal query plan over the bound source tables. Only fields declared in entities[].fields are projected. Only filter operations declared in entities[].api.allowed_filters are accepted. Relay builds the query from its own plan; it never forwards raw SQL or arbitrary filter expressions to the source.

3. Audit. Before returning the response, Relay writes one JSONL record to the configured audit sink (stdout, file, or syslog). The record carries principal_id, request_id (a ULID that is also returned in the x-request-id response header), dataset_id, entity_name, scopes_used, query_params (parameter names and operators, plus a value_hash for fields marked sensitive: true), status_code, and duration_ms. Raw submitted values are never written to the audit sink. Registry Platform wraps the record with prev_hash and record_hash for tamper evidence. For deployment, set audit.hash_secret_env to a deployment-specific secret and mark identifier fields with sensitive: true. Relay uses the secret to write stable hmac-sha256: handles for sensitive lookups, record identifiers, and storage table identifiers. An auditor can then correlate repeated access to the same subject within one deployment, while the log retains no names, national IDs, dates of birth, addresses, or source table names. Protect the audit secret with the same controls as the audit log. Rotating the secret changes the handles, so records written after rotation no longer correlate with earlier ones. The Relay configuration page documents audit.hash_secret_env and the sensitive: true field flag. For the full field list, see the Relay operations runbook.

4. Respond. Relay returns the entity-shaped response. The response body contains only projected fields for the authenticated caller’s visible datasets and entities. Storage table identifiers never appear in responses.
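The first three per-request phases above can be sketched in a few functions. This is an illustrative sketch under stated assumptions, not Relay's code: the function names, the (field, operator, value) filter tuples, the {field: [operators]} shape for allowed filters, and the name/op keys inside query_params are all hypothetical. The record field names and the hmac-sha256: prefix come from the text; request_id, scopes_used, and duration_ms are omitted for brevity.

```python
import hashlib
import hmac
import json


def has_scope(caller_scopes, dataset_id, suffix):
    """Phase 1: the caller must hold exactly <dataset_id>:<suffix>."""
    return f"{dataset_id}:{suffix}" in caller_scopes


def plan_query(entity, filters):
    """Phase 2: build a plan from declared fields and allowed filters only."""
    allowed = entity["allowed_filters"]  # assumed shape: {field: [operators]}
    for field, op, _value in filters:
        if op not in allowed.get(field, []):
            raise ValueError(f"filter not allowed: {field} {op}")
    return {"project": list(entity["fields"]), "where": list(filters)}


def keyed_handle(secret, value):
    """Phase 3 helper: stable keyed hash that stands in for a raw value."""
    digest = hmac.new(secret, str(value).encode("utf-8"), hashlib.sha256)
    return "hmac-sha256:" + digest.hexdigest()


def audit_line(secret, principal_id, dataset_id, entity_name,
               filters, sensitive_fields, status_code):
    """Phase 3: one JSON line with operators and hashes, never raw values."""
    params = []
    for field, op, value in filters:
        entry = {"name": field, "op": op}
        if field in sensitive_fields:
            entry["value_hash"] = keyed_handle(secret, value)
        params.append(entry)
    record = {
        "principal_id": principal_id,
        "dataset_id": dataset_id,
        "entity_name": entity_name,
        "query_params": params,
        "status_code": status_code,
    }
    return json.dumps(record, sort_keys=True)
```

Because the handle is keyed, the same lookup value produces the same handle only while the secret is unchanged, which is what allows correlation inside one deployment and breaks it across a rotation.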

Entity-shaped routes vs storage tables

Storage tables are an implementation detail. Public routes reference datasets and entity names, never table ids. Relay reads private CSV, XLSX, Parquet, or PostgreSQL sources and serves them only as declared entities: the caller must hold the dataset’s rows scope, only declared fields are projected, and every request writes one audit JSONL record. Callers reach the entities over the native Registry Data API; optional feature-gated adapters can also expose them over SP-DCI sync and OGC API Records.

Figure: Registry Relay sits between private storage and a public API. Private storage holds csv, xlsx, parquet, and postgres sources with their own file and table names. Registry Relay applies a scope check (the caller must hold the dataset rows scope), projects the entity shape (only declared fields, no arbitrary SQL), and offers adapters: the Registry Data API, plus optional SP-DCI sync and OGC API Records. The public API exposes entity cards such as household and individual with declared fields and relationship links. Every request writes one audit JSONL record carrying timestamp, principal, scopes, status, and duration.

The four primary data route shapes, three for row access and one for aggregates, are:

GET /v1/datasets/{dataset_id}/entities/{entity}/records : paginated list of entity records.
GET /v1/datasets/{dataset_id}/entities/{entity}/records/{id} : single record by identifier.
GET /v1/datasets/{dataset_id}/entities/{entity}/records/{id}/relationships/{relationship} : related records for a declared relationship.
GET /v1/datasets/{dataset_id}/aggregates/{aggregate_id} : a pre-configured aggregate over entity records.
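As a client-side illustration, the four route shapes above can be assembled with small path helpers. The helper names are hypothetical and the example identifiers are placeholders; only the path templates come from the list above. Each segment is percent-encoded so that an identifier containing a slash cannot change the route shape.

```python
from urllib.parse import quote

BASE = "/v1/datasets"


def _seg(value):
    """Percent-encode one path segment, including any slash it contains."""
    return quote(str(value), safe="")


def records_path(dataset_id, entity):
    """Paginated list of entity records."""
    return f"{BASE}/{_seg(dataset_id)}/entities/{_seg(entity)}/records"


def record_path(dataset_id, entity, record_id):
    """Single record by identifier."""
    return f"{records_path(dataset_id, entity)}/{_seg(record_id)}"


def relationship_path(dataset_id, entity, record_id, relationship):
    """Related records for a declared relationship."""
    base = record_path(dataset_id, entity, record_id)
    return f"{base}/relationships/{_seg(relationship)}"


def aggregate_path(dataset_id, aggregate_id):
    """Pre-configured aggregate over entity records."""
    return f"{BASE}/{_seg(dataset_id)}/aggregates/{_seg(aggregate_id)}"
```

For example, records_path("social_registry", "household") yields /v1/datasets/social_registry/entities/household/records.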

Metadata routes under /metadata/* follow the same auth path but require a metadata scope, not a rows or aggregate scope. The full route inventory is in the Registry Relay API reference.

Scope model

Data-plane scopes follow the pattern <dataset_id>:<suffix>, where the six suffix values are metadata, rows, aggregate, verify, evidence_verification, and identity_release. A data-plane scope is always bound to a specific dataset; a principal holding social_registry:metadata cannot access vital_events:metadata unless that scope is also assigned. Three scopes sit outside the pattern and are service-level rather than dataset-scoped: registry_relay:admin grants the administrative surface, registry_relay:ops_read grants read-only operational endpoints, and registry_relay:metrics_read grants the metrics endpoint; none of the three is ever implied by a dataset scope. Full scope configuration, API-key provisioning, and example principal assignments are in Client integration.
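The scope model above can be illustrated with a small classifier. This is a sketch, not Relay's parser: the function names and the returned tuple shape are hypothetical, while the six suffixes and the three service-level scope names are taken from the paragraph above.

```python
DATASET_SUFFIXES = frozenset({
    "metadata", "rows", "aggregate",
    "verify", "evidence_verification", "identity_release",
})

SERVICE_SCOPES = frozenset({
    "registry_relay:admin",
    "registry_relay:ops_read",
    "registry_relay:metrics_read",
})


def classify_scope(scope):
    """Return (kind, dataset_id, suffix); dataset_id is None for service scopes."""
    if scope in SERVICE_SCOPES:
        return ("service", None, scope.split(":", 1)[1])
    dataset_id, sep, suffix = scope.rpartition(":")
    if sep and dataset_id and suffix in DATASET_SUFFIXES:
        return ("dataset", dataset_id, suffix)
    raise ValueError(f"unrecognized scope: {scope!r}")


def datasets_with(scopes, suffix):
    """Dataset ids for which the given suffix is explicitly granted."""
    granted = set()
    for scope in scopes:
        kind, dataset_id, found = classify_scope(scope)
        if kind == "dataset" and found == suffix:
            granted.add(dataset_id)
    return granted
```

With the scopes social_registry:metadata and registry_relay:admin, datasets_with(..., "metadata") contains only social_registry: the grant does not extend to vital_events, and the service-level scope contributes no dataset access.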

Closed surfaces

Five constraints a reviewer can verify against the code:

Read-only in v1: entity paths expose only GET. No POST, PUT, PATCH, or DELETE route is registered; provisioning is out of scope.
Closed filter set: callers cannot submit arbitrary query expressions. Relay accepts only the operators declared in entities[].api.allowed_filters and builds queries from its own plan.
Storage details stay private: table names, column names, and connection strings never appear in responses or error messages.
Raw lookup values stay out of audit: audit records capture operators for ordinary filters and keyed hashes for configured sensitive lookups, record identifiers, and table identifiers; raw submitted values are never written.
No token minting: in OIDC mode, Relay acts as a resource server. It verifies tokens issued by an external identity provider and never mints or refreshes them.