Validating schemas using Kafka serializers/deserializers in Java clients

Apicurio Registry provides client serializers/deserializers (SerDes) for Kafka producer and consumer applications written in Java. Kafka producer applications use serializers to encode messages that conform to a specific event schema. Kafka consumer applications use deserializers to validate that messages have been serialized using the correct schema, based on a specific schema ID. This ensures consistent schema use and helps to prevent data errors at runtime.

This chapter explains how to use Kafka client SerDe in your producer and consumer client applications:

Kafka client applications and Apicurio Registry
Strategies to look up a schema in Apicurio Registry
Registering a schema in Apicurio Registry
Using a schema from a Kafka consumer client
Using a schema from a Kafka producer client
Using a schema from a Kafka Streams application

Prerequisites

You have read Introduction to Apicurio Registry
You have installed Apicurio Registry
You have created Kafka producer and consumer client applications

Kafka client applications and Apicurio Registry

Apicurio Registry decouples schema management from client application configuration. You can enable a Java client application to use a schema from Apicurio Registry by specifying its URL in your client code.

You can store the schemas in the registry to serialize and deserialize messages, which are referenced from your client applications to ensure that the messages that they send and receive are compatible with those schemas. Kafka client applications can push or pull their schemas from Apicurio Registry at runtime.

Schemas can evolve, so you can define rules in Apicurio Registry, for example, to ensure that schema changes are valid and do not break previous versions used by applications. Apicurio Registry checks for compatibility by comparing a modified schema with previous schema versions.

Apicurio Registry schema technologies

Apicurio Registry provides schema registry support for schema technologies such as:

Avro
Protobuf
JSON Schema

These schema technologies can be used by client applications through the Kafka client serializer/deserializer (SerDe) services provided by Apicurio Registry. The maturity and usage of the SerDe classes provided by Apicurio Registry might vary. The sections that follow provide more details about each schema type.

Producer schema configuration

A producer client application uses a serializer to put the messages that it sends to a specific broker topic into the correct data format.

To enable a producer to use Apicurio Registry for serialization:

Define and register your schema with Apicurio Registry (if it does not already exist).
Configure your producer client code with the following:
- URL of Apicurio Registry
- Apicurio Registry serializer to use with messages
- Strategy to map the Kafka message to a schema artifact in Apicurio Registry
- Strategy to look up or register the schema used for serialization in Apicurio Registry

After registering your schema, when you start Kafka and Apicurio Registry, you can access the schema to format messages sent to the Kafka broker topic by the producer. Alternatively, depending on configuration, the producer can automatically register the schema on first use.

If a schema already exists, you can create a new version using the registry REST API based on compatibility rules defined in Apicurio Registry. Versions are used for compatibility checking as a schema evolves. A group ID, artifact ID, and version represents a unique tuple that identifies a schema.

Consumer schema configuration

A consumer client application uses a deserializer to get the messages that it consumes from a specific broker topic into the correct data format.

To enable a consumer to use Apicurio Registry for deserialization:

Define and register your schema with Apicurio Registry (if it does not already exist)
Configure the consumer client code with the following:
- URL of Apicurio Registry
- Apicurio Registry deserializer to use with the messages
- Input data stream for deserialization

Retrieve schemas using a global ID

By default, the schema is retrieved from Apicurio Registry by the deserializer using a global ID, which is specified in the message being consumed. The schema global ID can be located in the message headers or in the message payload, depending on the configuration of the producer application.

When locating the global ID in the message payload, the format of the data begins with a magic byte, used as a signal to consumers, followed by the global ID, and the message data as normal. For example:

# ...
[MAGIC_BYTE]
[GLOBAL_ID]
[MESSAGE DATA]

Then when you start Kafka and Apicurio Registry, you can access the schema to format messages received from the Kafka broker topic.

Retrieve schemas using a content ID

Alternatively, you can configure to retrieve schemas from Apicurio Registry based on the content ID, which is the unique ID of the artifact content. While the global ID is the unique ID of an artifact version.

The content ID does not uniquely identify a version, but uniquely identifies the version content only. If multiple versions share the exact same content, they have a different global ID but the same content ID. Confluent Schema Registry uses content ID by default.

Strategies to look up a schema in Apicurio Registry

The Kafka client serializer uses lookup strategies to determine the artifact ID and global ID under which the message schema is registered in Apicurio Registry. For a given topic and message, you can use different implementations of the ArtifactResolverStrategy Java interface to return a reference to an artifact in the registry.

The classes for each strategy are in the io.apicurio.registry.serde.strategy package. Specific strategy classes for Avro SerDe are in the io.apicurio.registry.serde.avro.strategy package. The default strategy is the TopicIdStrategy, which looks for Apicurio Registry artifacts with the same name as the Kafka topic receiving messages.

Example

public ArtifactReference artifactReference(String topic, boolean isKey, T schema) {
        return ArtifactReference.builder()
                .groupId(null)
                .artifactId(String.format("%s-%s", topic, isKey ? "key" : "value"))
                .build();

The topic parameter is the name of the Kafka topic receiving the message.
The isKey parameter is true when the message key is serialized, and false when the message value is serialized.
The schema parameter is the schema of the message serialized or deserialized.
The ArtifactReference returned contains the artifact ID under which the schema is registered.

Which lookup strategy you use depends on how and where you store your schema. For example, you might use a strategy that uses a record ID if you have different Kafka topics with the same Avro message type.

ArtifaceResolverStrategy interface

The artifact resolver strategy provides a way to map the Kafka topic and message information to an artifact in Apicurio Registry. The common convention for the mapping is to combine the Kafka topic name with the key or value, depending on whether the serializer is used for the Kafka message key or value.

However, you can use alternative conventions for the mapping by using a strategy provided by Apicurio Registry, or by creating a custom Java class that implements io.apicurio.registry.serde.strategy.ArtifactResolverStrategy.

Strategies to return an artifact reference

Apicurio Registry provides the following strategies to return an artifact reference based on an implementation of ArtifaceResolverStrategy:

RecordIdStrategy: Avro-specific strategy that uses the full name of the schema.
TopicRecordIdStrategy: Avro-specific strategy that uses the topic name and the full name of the schema.
TopicIdStrategy: Default strategy that uses the topic name and key or value suffix.
SimpleTopicIdStrategy: Simple strategy that only uses the topic name.

DefaultSchemaResolver interface

The default schema resolver locates and identifies the specific version of the schema registered under the artifact reference provided by the artifact resolver strategy. Every version of every artifact has a single globally unique identifier that can be used to retrieve the content of that artifact. This global ID is included in every Kafka message so that a deserializer can properly fetch the schema from Apicurio Registry.

The default schema resolver can look up an existing artifact version, or it can register one if not found, depending on which strategy is used. You can also provide your own strategy by creating a custom Java class that implements io.apicurio.registry.serde.SchemaResolver. However, it is recommended to use the DefaultSchemaResolver and specify configuration properties instead.

Configuration for registry lookup options

When using the DefaultSchemaResolver, you can configure its behavior using application properties. The following table shows some commonly used examples:

Table 1. Apicurio Registry lookup configuration options
Property	Type	Description	Default
`apicurio.registry.find-latest`	`boolean`	Specify whether the serializer tries to find the latest artifact in the registry for the corresponding group ID and artifact ID.	`false`
`apicurio.registry.use-id`	`String`	Instructs the serializer to write the specified ID to Kafka and instructs the deserializer to use this ID to find the schema.	None
`apicurio.registry.auto-register`	`boolean`	Specify whether the serializer tries to create an artifact in the registry. The JSON Schema serializer does not support this.	`false`
`apicurio.registry.check-period-ms`	`String`	Specify how long to cache the global ID in milliseconds. If not configured, the global ID is fetched every time.	None

Registering a schema in Apicurio Registry

After you have defined a schema in the appropriate format, such as Apache Avro, you can add the schema to Apicurio Registry.

You can add the schema using the following approaches:

Apicurio Registry web console
curl command using the Apicurio Registry REST API
Maven plug-in supplied with Apicurio Registry
Schema configuration added to your client code

Client applications cannot use Apicurio Registry until you have registered your schemas.

Apicurio Registry web console

When Apicurio Registry is installed, you can connect to the web console from the ui endpoint:

http://MY-REGISTRY-URL/ui

From the console, you can add, view and configure schemas. You can also create the rules that prevent invalid content being added to the registry.

Curl command example

 curl -X POST -H "Content-type: application/json; artifactType=AVRO" \
   -H "X-Registry-ArtifactId: share-price" \ (1)
   --data '{
     "type":"record",
     "name":"price",
     "namespace":"com.example",
     "fields":[{"name":"symbol","type":"string"},
     {"name":"price","type":"string"}]}'
   https://my-cluster-my-registry-my-project.example.com/apis/registry/v2/groups/my-group/artifacts -s (2)

1	Simple Avro schema artifact.
2	OpenShift route name that exposes Apicurio Registry.

Maven plug-in example

<plugin>
  <groupId>io.apicurio</groupId>
  <artifactId>apicurio-registry-maven-plugin</artifactId>
  <version>${apicurio.version}</version>
  <executions>
      <execution>
        <phase>generate-sources</phase>
        <goals>
            <goal>register</goal>  (1)
        </goals>
        <configuration>
            <registryUrl>http://REGISTRY-URL/apis/registry/v2</registryUrl> (2)
            <artifacts>
                <artifact>
                    <groupId>TestGroup</groupId> (3)
                    <artifactId>FullNameRecord</artifactId>
                    <file>${project.basedir}/src/main/resources/schemas/record.avsc</file>
                    <ifExists>FAIL</ifExists>
                </artifact>
                <artifact>
                    <groupId>TestGroup</groupId>
                    <artifactId>ExampleAPI</artifactId> (4)
                    <type>GRAPHQL</type>
                    <file>${project.basedir}/src/main/resources/apis/example.graphql</file>
                    <ifExists>RETURN_OR_UPDATE</ifExists>
                    <canonicalize>true</canonicalize>
                </artifact>
            </artifacts>
        </configuration>
    </execution>
  </executions>
 </plugin>

1	Specify `register` as the execution goal to upload the schema artifact to the registry.
2	Specify the Apicurio Registry URL with the `../apis/registry/v2` endpoint.
3	Specify the Apicurio Registry artifact group ID.
4	You can upload multiple artifacts using the specified group ID, artifact ID, and location.

Configuration using a producer client example

String registryUrl_node1 = PropertiesUtil.property(clientProperties, "registry.url.node1",
    "https://my-cluster-service-registry-myproject.example.com/apis/registry/v2"); (1)
try (RegistryService service = RegistryClient.create(registryUrl_node1)) {
    String artifactId = ApplicationImpl.INPUT_TOPIC + "-value";
    try {
        service.getArtifactMetaData(artifactId); (2)
    } catch (WebApplicationException e) {
        CompletionStage <ArtifactMetaData> csa = service.createArtifact(
            ArtifactType.AVRO,
            artifactId,
            new ByteArrayInputStream(LogInput.SCHEMA$.toString().getBytes())
        );
        csa.toCompletableFuture().get();
    }
}

1	You can register properties against more than one URL node.
2	Check to see if the schema already exists based on the artifact ID.

Using a schema from a Kafka consumer client

This procedure describes how to configure a Kafka consumer client written in Java to use a schema from Apicurio Registry.

Prerequisites

Apicurio Registry is installed
The schema is registered with Apicurio Registry

Procedure

Configure the client with the URL of Apicurio Registry. For example:

String registryUrl = "https://registry.example.com/apis/registry/v2";
Properties props = new Properties();
props.putIfAbsent(SerdeConfig.REGISTRY_URL, registryUrl);

Configure the client with the Apicurio Registry deserializer. For example:

// Configure Kafka settings
props.putIfAbsent(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, SERVERS);
props.putIfAbsent(ConsumerConfig.GROUP_ID_CONFIG, "Consumer-" + TOPIC_NAME);
props.putIfAbsent(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
props.putIfAbsent(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "1000");
props.putIfAbsent(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
// Configure deserializer settings
props.putIfAbsent(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
    AvroKafkaDeserializer.class.getName()); (1)
props.putIfAbsent(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
    AvroKafkaDeserializer.class.getName()); (2)

1	The deserializer provided by Apicurio Registry.
2	The deserialization is in Apache Avro JSON format.

Using a schema from a Kafka producer client

This procedure describes how to configure a Kafka producer client written in Java to use a schema from Apicurio Registry.

Prerequisites

Apicurio Registry is installed
The schema is registered with Apicurio Registry

Procedure

Configure the client with the URL of Apicurio Registry. For example:

String registryUrl = "https://registry.example.com/apis/registry/v2";
Properties props = new Properties();
props.putIfAbsent(SerdeConfig.REGISTRY_URL, registryUrl);

Configure the client with the serializer, and the strategy to look up the schema in Apicurio Registry. For example:

props.put(CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG, "my-cluster-kafka-bootstrap:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, AvroKafkaSerializer.class.getName()); (1)
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, AvroKafkaSerializer.class.getName()); (2)
props.put(SerdeConfig.FIND_LATEST_ARTIFACT, Boolean.TRUE); (3)

1	The serializer for the message key provided by Apicurio Registry.
2	The serializer for the message value provided by Apicurio Registry.
3	The lookup strategy to find the global ID for the schema.

Using a schema from a Kafka Streams application

This procedure describes how to configure a Kafka Streams client written in Java to use an Apache Avro schema from Apicurio Registry.

Prerequisites

Apicurio Registry is installed
The schema is registered with Apicurio Registry

Procedure

Create and configure a Java client with the Apicurio Registry URL:

String registryUrl = "https://registry.example.com/apis/registry/v2";

RegistryService client = RegistryClient.cached(registryUrl);

Configure the serializer and deserializer:

Serializer<LogInput> serializer = new AvroKafkaSerializer<LogInput>(); (1)

Deserializer<LogInput> deserializer = new AvroKafkaDeserializer <LogInput>(); (2)

Serde<LogInput> logSerde = Serdes.serdeFrom(
    serializer,
    deserializer
);

Map<String, Object> config = new HashMap<>();
config.put(SerdeConfig.REGISTRY_URL, registryUrl);
config.put(AvroKafkaSerdeConfig.USE_SPECIFIC_AVRO_READER, true);
logSerde.configure(config, false); (3)

1	The Avro serializer provided by Apicurio Registry.
2	The Avro deserializer provided by Apicurio Registry.
3	Configures the Apicurio Registry URL and the Avro reader for deserialization in Avro format.

Create the Kafka Streams client:

KStream<String, LogInput> input = builder.stream(
    INPUT_TOPIC,
    Consumed.with(Serdes.String(), logSerde)
);