Exporting Apicurio Registry Kafka topic data
One of the supported Apicurio Registry storage options is Apache Kafka, which uses a Kafka topic named kafkasql-journal to store data.
If you encounter a problem when using this storage option and want to report it to the Apicurio Registry developers, you might need to provide an export of the data present in the kafkasql-journal topic for analysis.
This document describes how to create such a topic export using the kcat tool (formerly known as kafkacat).
- Kafka has been installed and is running in your environment.
- You have deployed Apicurio Registry with data stored in the kafkasql-journal topic.
- The kafkasql-journal topic is still present (see the verification example after this list).
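To confirm that the topic is still present, you can list the topics from inside a Kafka broker pod. This is only a sketch: it assumes a Strimzi-managed cluster, and the pod name my-kafka-cluster-kafka-0 and the script location are examples that will differ in your environment.
# List topics from a broker pod and look for kafkasql-journal (pod name is an assumption)
kubectl exec -it my-kafka-cluster-kafka-0 -- bin/kafka-topics.sh --bootstrap-server localhost:9092 --list | grep kafkasql-journal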
Setting up kcat on a Kubernetes work pod
- Your environment is Kubernetes.
- You have logged in to the cluster using the kubectl command line interface.
- Select a namespace where an ephemeral work pod will be started. This can be the same or a different namespace from where the Kafka cluster is deployed:
  kubectl config set-context --current --namespace=default
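  If you want to confirm which namespace is currently selected, one way is to print it from your kubeconfig:
  # Show the namespace of the current context
  kubectl config view --minify --output 'jsonpath={..namespace}'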
- Create an ephemeral work pod using the latest Fedora image, and connect to the pod using your terminal:
  kubectl run work-pod -it --rm --image=fedora --restart=Never
  If you keep the --rm flag, the work pod will be deleted when you disconnect from the remote terminal.
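  If you need a second shell in the pod while the original session is still open, or if you created the pod without --rm and want to reconnect later, you can attach another terminal. This is a sketch, assuming the pod is still named work-pod:
  # Open an additional shell in the running work pod
  kubectl exec -it work-pod -- bash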
- You can install kcat using the dnf package manager. However, that version does not have JSON support enabled. Because you want to export the topic data in a JSON format with additional metadata, you must build the kcat executable from source.
  In addition, while the kcat project is widely used for this use case, the project appears to be dormant, and you require an additional feature for the kafkasql-journal topic export to work properly. This feature is support for base64-encoded keys and values, which is important because the topic includes raw binary data that might not be correctly encoded in the JSON output. Therefore, you must build kcat from a source branch that includes base64 support, which has not been merged into the main project yet.
  Install git, and check out the kcat repository:
  dnf install -y git
  git clone https://github.com/edenhill/kcat.git
  cd kcat
  git remote add jjlin https://github.com/jjlin/kcat.git
  git fetch jjlin
  git checkout jjlin/base64
- Install the dependencies and build kcat:
  dnf install -y gcc librdkafka-devel yajl-devel
  ./configure
  make
- Copy the executable to /usr/bin so that it is available in $PATH:
  cp kcat /usr/bin
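  To check that the freshly built executable is the one on your $PATH, you can print its version information. The exact output depends on the build, but it should list JSON among the enabled features:
  # Print kcat version and enabled features
  kcat -V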
- Configure environment variables that will be used in subsequent examples:
  export KAFKA_BOOTSTRAP_SERVER="my-kafka-cluster-kafka-bootstrap.default.svc:9092"
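  The value shown is only an example. If your Kafka cluster is managed by an operator such as Strimzi, one way to find the bootstrap service is to list services and filter for "bootstrap" (the naming pattern is an assumption):
  # Locate the Kafka bootstrap service across namespaces
  kubectl get svc --all-namespaces | grep bootstrap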
If you do not require JSON support, you can install kcat using the dnf package manager instead of building it from source.
Examples of using kcat
The following are several examples of how to use kcat, including the creation of a topic export:
- List Kafka topics:
  kcat -b $KAFKA_BOOTSTRAP_SERVER -L | grep "topic " | sed 's#\([^"]*"\)\([^"]*\)\(".*\)#\2#'
  The sed command filters out extra information in this example.
- Export data from the kafkasql-journal topic in JSON format, with envelope, and base64-encoded keys and values:
  kcat -b $KAFKA_BOOTSTRAP_SERVER -C -t kafkasql-journal -S base64 -Z -D \\n -e -J \
    > kafkasql-journal.topicdump
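  Because the -D \\n option writes one JSON envelope per line, you can do a quick sanity check of the export file, for example:
  # Count exported records and inspect the first envelope
  wc -l kafkasql-journal.topicdump
  head -n 1 kafkasql-journal.topicdump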
- Create an export file for each listed topic by combining the preceding commands:
  mkdir dump
  for t in $(kcat -b $KAFKA_BOOTSTRAP_SERVER -L | grep "topic " | sed 's#\([^"]*"\)\([^"]*\)\(".*\)#\2#'); do \
    kcat -b $KAFKA_BOOTSTRAP_SERVER -C -t $t -S base64 -Z -D \\n -e -J > dump/$t.topicdump; \
  done
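  You can then confirm that one export file was created per topic:
  # List the generated export files
  ls -lh dump/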
Copy topic export files from the work pod
After the topic export files have been created, you can run the following command on your local machine to copy the files from the work pod:
kubectl cp work-pod:/kcat/dump .
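The path /kcat/dump assumes that you ran the export commands from the kcat build directory, as in the preceding steps. If the work pod runs in a namespace other than your current one, you can qualify the pod name with its namespace, for example:
# Copy the export directory from a pod in the "default" namespace (namespace is an example)
kubectl cp default/work-pod:/kcat/dump ./dump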
Importing the kafkasql-journal topic data
To import kafkasql-journal topic data that has been created with kcat, use an application from the Apicurio Registry examples repository as follows:
git clone https://github.com/Apicurio/apicurio-registry.git
cd apicurio-registry/examples/tools/kafkasql-topic-import
mvn clean install
export VERSION=$(mvn help:evaluate -Dexpression=project.version -q -DforceStdout)
java -jar target/apicurio-registry-tools-kafkasql-topic-import-$VERSION-jar-with-dependencies.jar -b <optional-kafka-bootstrap-server-url> -f <path-to-topic-dump-file>
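For example, using the bootstrap server and export file names from the preceding sections (the values are illustrative and depend on your environment):
# Import the kafkasql-journal export created earlier (paths and server are examples)
java -jar target/apicurio-registry-tools-kafkasql-topic-import-$VERSION-jar-with-dependencies.jar \
  -b my-kafka-cluster-kafka-bootstrap.default.svc:9092 \
  -f ./dump/kafkasql-journal.topicdump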
- For more details about kcat, see the kcat repository.
- You can provide additional parameters to configure kcat for accessing Kafka, in the -X property=value format. For the list of parameters, see the librdkafka configuration reference.
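  For example, a minimal sketch of connecting to a cluster that requires SASL over TLS (the property values are placeholders):
  # Pass librdkafka security properties with -X (values are examples)
  kcat -b $KAFKA_BOOTSTRAP_SERVER -L \
    -X security.protocol=SASL_SSL \
    -X sasl.mechanisms=SCRAM-SHA-512 \
    -X sasl.username=my-user \
    -X sasl.password=my-password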