Configuring your Apicurio Registry deployment

This chapter explains how to set important configuration options for your Apicurio Registry deployment. This includes features such as the Apicurio Registry web console, logging, health checks, and observability.

For a list of all available configuration options, see Apicurio Registry configuration reference.

Configuring the Apicurio Registry web console

You can set optional environment variables to configure the Apicurio Registry web console specifically for your deployment environment or to customize its behavior.

Prerequisites
  • You have already installed Apicurio Registry.

Configuring the web console deployment environment

When you access the Apicurio Registry web console in your browser, some initial configuration settings are loaded. The following configuration settings are required:

  • URL for core Apicurio Registry server REST API v3

Typically, the Apicurio Registry Operator automatically configures the UI component with the REST API v3 URL. However, you can override this value by setting the appropriate environment variable in the UI component deployment configuration.

Procedure

Configure the following environment variables to override the default URL:

  • REGISTRY_API_URL: Specifies the URL for the core Apicurio Registry server REST API v3. For example, https://registry-api.my-domain.com/apis/registry/v3
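For example, in the same environment style used by the other examples in this chapter (the URL shown is illustrative):

Example: Overriding the REST API URL
environment:
  REGISTRY_API_URL: "https://registry-api.my-domain.com/apis/registry/v3"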

Configuring the web console in read-only mode

You can configure the Apicurio Registry web console in read-only mode as an optional feature. This mode disables all features in the Apicurio Registry web console that allow users to make changes to registered artifacts, including the following:

  • Creating a group

  • Creating an artifact

  • Uploading a new artifact version

  • Updating artifact metadata

  • Deleting an artifact

Procedure

Configure the following environment variable:

  • REGISTRY_FEATURE_READ_ONLY: Set to true to enable read-only mode. Defaults to false.
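The following example shows this setting in the same environment style used by the other examples in this chapter:

Example: Enabling read-only mode
environment:
  REGISTRY_FEATURE_READ_ONLY: "true"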

Configuring Apicurio Registry observability with OpenTelemetry

You can configure Apicurio Registry to export telemetry data using OpenTelemetry (OTel) for comprehensive observability. This includes distributed tracing, metrics export over the OTLP protocol, and log correlation with trace context.

Apicurio Registry is built with OpenTelemetry support and all telemetry signals (traces, metrics, logs) are enabled at build time. However, the OpenTelemetry SDK is disabled by default at runtime. When enabled, Apicurio Registry exports telemetry data to an OpenTelemetry-compatible collector such as Jaeger, Grafana Tempo, or the OpenTelemetry Collector.

The individual signal properties (QUARKUS_OTEL_TRACES_ENABLED, QUARKUS_OTEL_METRICS_ENABLED, QUARKUS_OTEL_LOGS_ENABLED) are build-time properties and cannot be changed at runtime. All signals are already enabled in the Apicurio Registry build. Use QUARKUS_OTEL_SDK_DISABLED=false to enable telemetry at runtime.

Prerequisites
  • You have already installed Apicurio Registry.

  • You have an OpenTelemetry-compatible backend available (for example, Jaeger, Grafana Tempo, or OpenTelemetry Collector).

Enabling OpenTelemetry

To enable OpenTelemetry observability, configure the following environment variables:

Table 1. Environment variables for enabling OpenTelemetry

  • QUARKUS_OTEL_SDK_DISABLED: Set to false to enable the OpenTelemetry SDK and all telemetry signals. Default is true (disabled).

  • QUARKUS_OTEL_EXPORTER_OTLP_ENDPOINT: The endpoint URL of your OpenTelemetry collector. For example, http://jaeger:4317 for gRPC or http://jaeger:4318 for HTTP.

Example: Enabling OpenTelemetry with Jaeger
environment:
  QUARKUS_OTEL_SDK_DISABLED: "false"
  QUARKUS_OTEL_EXPORTER_OTLP_ENDPOINT: "http://jaeger:4317"

Configuring trace sampling for production

In production environments, you should configure trace sampling to reduce overhead and control the volume of trace data:

Table 2. Environment variables for trace sampling

  • QUARKUS_OTEL_TRACES_SAMPLER: The sampling strategy. Use parentbased_traceidratio for production.

  • QUARKUS_OTEL_TRACES_SAMPLER_ARG: The sampling ratio (0.0 to 1.0). A value of 0.1 samples 10% of traces.

Example: Production sampling configuration (10% of traces)
environment:
  QUARKUS_OTEL_SDK_DISABLED: "false"
  QUARKUS_OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector:4317"
  QUARKUS_OTEL_TRACES_SAMPLER: "parentbased_traceidratio"
  QUARKUS_OTEL_TRACES_SAMPLER_ARG: "0.1"

Configuring structured logging with trace context

When using JSON logging format, Apicurio Registry automatically includes trace context (trace ID and span ID) in log entries. This enables correlation between logs and traces.

Example: Enabling structured logging with trace context
environment:
  QUARKUS_OTEL_SDK_DISABLED: "false"
  QUARKUS_LOG_CONSOLE_JSON: "true"
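An illustrative log entry with injected trace context might look like the following (field names and values are examples only; the exact format depends on your Quarkus JSON logging configuration):

{"timestamp":"2024-01-01T12:00:00.000Z","level":"INFO","message":"Artifact created","traceId":"4bf92f3577b34da6a3ce929d0e0e4736","spanId":"00f067aa0ba902b7"}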

OpenTelemetry features in Apicurio Registry

When OpenTelemetry is enabled, Apicurio Registry provides the following observability features:

  • Distributed tracing: All REST API requests are automatically traced with spans containing request details, path parameters, and Apicurio-specific attributes such as groupId, artifactId, and version.

  • Storage layer tracing: All storage operations create child spans, enabling you to trace the complete request flow from REST API to database.

  • Kafka tracing: When using KafkaSQL storage, Kafka operations are automatically traced with context propagation.

  • Custom metrics: OpenTelemetry metrics for artifact operations, schema validations, and search requests are exported alongside existing Prometheus metrics.

  • Log correlation: When JSON logging is enabled, trace context is automatically injected into log entries for easy correlation.

Performance considerations

OpenTelemetry instrumentation adds a small performance overhead. The following table shows the measured impact when all telemetry signals are enabled with 100% sampling:

Table 3. Performance impact with OpenTelemetry enabled (100% sampling)

Operation          Latency Increase   Throughput Decrease   Impact Level
System Info        +9% (+0.35ms)      -7%                   Low
Create Artifact    +4% (+0.13ms)      -1%                   Minimal
Get Artifact       +5% (+0.05ms)      -6%                   Minimal
Search Artifacts   +1% (+0.04ms)      -1%                   Minimal
List Groups        +6% (+0.16ms)      -5%                   Low

Key findings:

  • Average overhead is approximately 4-6% in throughput with 100% trace sampling.

  • With the recommended 10% sampling ratio (QUARKUS_OTEL_TRACES_SAMPLER_ARG=0.1), the overhead is reduced to less than 1%.

  • OpenTelemetry signals are disabled by default to avoid any overhead for users who do not require observability features.

To reproduce these benchmarks, run the following commands from the project root:

# Build the project
./mvnw clean install -DskipTests

# Run benchmark with OTEL disabled (baseline)
./mvnw test -pl app -Dtest=OpenTelemetryPerformanceTest \
    -DOpenTelemetryPerformanceTest=enabled

# Run benchmark with OTEL enabled
./mvnw test -pl app -Dtest=OpenTelemetryPerformanceEnabledTest \
    -DOpenTelemetryPerformanceTest=enabled

Backwards compatibility

OpenTelemetry support is fully backwards compatible:

  • The existing Prometheus metrics endpoint (/q/metrics) remains available and unchanged.

  • Health check endpoints (/q/health/*) continue to work as before.

  • All existing Micrometer-based metrics continue to function.
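You can verify this after enabling OpenTelemetry by querying the existing endpoints, for example (assuming Apicurio Registry is reachable at localhost:8080):

curl -s http://localhost:8080/q/metrics
curl -s http://localhost:8080/q/health/ready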

Configuring Apicurio Registry health checks on OpenShift

You can configure optional environment variables for liveness and readiness probes to monitor the health of the Apicurio Registry server on OpenShift:

  • Liveness probes test if the application can make progress. If the application cannot make progress, OpenShift automatically restarts the failing Pod.

  • Readiness probes test if the application is ready to process requests. If the application is not ready, it can become overwhelmed by requests, and OpenShift stops sending requests for the time that the probe fails. If other Pods are OK, they continue to receive requests.

The default values of the liveness and readiness environment variables are designed for most cases and should only be changed if required by your environment. Any changes to the defaults depend on your hardware, network, and amount of data stored. These values should be kept as low as possible to avoid unnecessary overhead.

Prerequisites
  • You must have an OpenShift cluster with cluster administrator access.

  • You must have already installed Apicurio Registry on OpenShift.

  • You must have already installed and configured your chosen Apicurio Registry storage in either Strimzi or PostgreSQL.

Procedure
  1. In the OpenShift Container Platform web console, log in using an account with cluster administrator privileges.

  2. Click Installed Operators > Apicurio Registry.

  3. On the ApicurioRegistry tab, click the ApicurioRegistry custom resource for your deployment, for example, example-apicurioregistry.

  4. In the main overview page, find the Deployment Name section and the corresponding DeploymentConfig name for your Apicurio Registry deployment, for example, example-apicurioregistry.

  5. In the left navigation menu, click Workloads > Deployment Configs, and select your DeploymentConfig name.

  6. Click the Environment tab, and enter your environment variables in the Single values env section, for example:

    • NAME: LIVENESS_STATUS_RESET

    • VALUE: 350

  7. Click Save at the bottom.

    Alternatively, you can perform these steps using the OpenShift oc command. For more details, see the OpenShift CLI documentation.
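    For example, the environment variable from the previous step can be set with a single command (the DeploymentConfig name example-apicurioregistry is illustrative):

    oc set env dc/example-apicurioregistry LIVENESS_STATUS_RESET=350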

Environment variables for Apicurio Registry health checks

This section describes the available environment variables for Apicurio Registry health checks on OpenShift. These include liveness and readiness probes to monitor the health of the Apicurio Registry server on OpenShift. For an example procedure, see Configuring Apicurio Registry health checks on OpenShift.

The following environment variables are provided for reference only. The default values are designed for most cases and should only be changed if required by your environment. Any changes to the defaults depend on your hardware, network, and amount of data stored. These values should be kept as low as possible to avoid unnecessary overhead.

Liveness environment variables

Table 4. Environment variables for Apicurio Registry liveness probes

  • LIVENESS_ERROR_THRESHOLD (Integer, default: 1): Number of liveness issues or errors that can occur before the liveness probe fails.

  • LIVENESS_COUNTER_RESET (Seconds, default: 60): Period in which the threshold number of errors must occur. For example, if this value is 60 and the threshold is 1, the check fails after two errors occur in 1 minute.

  • LIVENESS_STATUS_RESET (Seconds, default: 300): Number of seconds that must elapse without any more errors for the liveness probe to reset to OK status.

  • LIVENESS_ERRORS_IGNORED (String, default: io.grpc.StatusRuntimeException,org.apache.kafka.streams.errors.InvalidStateStoreException): Comma-separated list of ignored liveness exceptions.

Because OpenShift automatically restarts a Pod that fails a liveness check, the liveness settings, unlike readiness settings, do not directly affect behavior of Apicurio Registry on OpenShift.
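For example, you can set several liveness variables at once with the oc command (the DeploymentConfig name and values are illustrative, not recommendations):

oc set env dc/example-apicurioregistry LIVENESS_ERROR_THRESHOLD=3 LIVENESS_COUNTER_RESET=120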

Readiness environment variables

Table 5. Environment variables for Apicurio Registry readiness probes

  • READINESS_ERROR_THRESHOLD (Integer, default: 1): Number of readiness issues or errors that can occur before the readiness probe fails.

  • READINESS_COUNTER_RESET (Seconds, default: 60): Period in which the threshold number of errors must occur. For example, if this value is 60 and the threshold is 1, the check fails after two errors occur in 1 minute.

  • READINESS_STATUS_RESET (Seconds, default: 300): Number of seconds that must elapse without any more errors for the readiness probe to reset to OK status. In this case, this means how long the Pod stays not ready until it returns to normal operation.

  • READINESS_TIMEOUT (Seconds, default: 5): Readiness tracks the timeout of two operations: how long it takes for storage requests to complete, and how long it takes for HTTP REST API requests to return a response. If these operations take longer than the configured timeout, this counts as a readiness issue or error. This value controls the timeout for both operations.
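For example, you can set readiness variables with the oc command (the DeploymentConfig name and values are illustrative, not recommendations):

oc set env dc/example-apicurioregistry READINESS_ERROR_THRESHOLD=3 READINESS_TIMEOUT=10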