Exporting Apicurio Registry Kafka topic data
One of the supported Apicurio Registry storage options is Apache Kafka, which uses a Kafka topic named
kafkasql-journal to store data.
If you encounter a problem when using this storage option and want to report it to Apicurio Registry developers, you might need to provide an export of the data present in the
kafkasql-journal topic for analysis.
This document describes how to create such a topic export using the kcat tool (formerly known as kafkacat).
Prerequisites:
Kafka has been installed and is running in your environment.
You have deployed Apicurio Registry with data stored in the kafkasql-journal topic, and the topic is still present.
Your environment is Kubernetes.
You have logged in to the cluster using the kubectl command-line interface.
Procedure:
Select a namespace in which to start an ephemeral work pod. This can be the same namespace where the Kafka cluster is deployed, or a different one:
kubectl config set-context --current --namespace=default
Create an ephemeral work pod using the latest Fedora image, and connect to the pod using your terminal:
kubectl run work-pod -it --rm --image=fedora --restart=Never
If you keep the --rm flag, the work pod is deleted when you disconnect from the remote terminal.
You can install kcat using the dnf package manager. However, that version does not have JSON support enabled. Because you want to export the topic data in JSON format with additional metadata, you must build the kcat executable from source.
In addition, while the kcat project is widely used for this use case, the project appears to be dormant, and you require an additional feature for the kafkasql-journal topic export to work properly: support for base64-encoded keys and values. This is important because the topic includes raw binary data, which might not be correctly encoded in the JSON output. Therefore, you must build kcat from a source branch that includes base64 support, which has not yet been merged into the main project.
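A quick illustration of why base64 matters here (a generic shell sketch, not part of the original procedure): arbitrary message bytes are not safe to embed in a JSON string, but their base64 encoding is plain ASCII and can be decoded back losslessly.

```shell
# Plain text encodes to a compact ASCII form:
printf 'hello' | base64
# → aGVsbG8=

# Decoding reverses the transformation losslessly:
printf 'hello' | base64 | base64 -d
# → hello
```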
Install git, clone the kcat repository, and check out the branch that adds base64 support:
dnf install -y git
git clone https://github.com/edenhill/kcat.git
cd kcat
git remote add jjlin https://github.com/jjlin/kcat.git
git fetch jjlin
git checkout jjlin/base64
Install the dependencies and build kcat:
dnf install -y gcc librdkafka-devel yajl-devel
./configure
make
Copy the executable to /usr/bin so that it is available on your PATH:
cp kcat /usr/bin
Configure environment variables that will be used in subsequent examples:
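The KAFKA_BOOTSTRAP_SERVER variable below is the one referenced by the later kcat examples; the address shown is a placeholder, so substitute your own cluster's bootstrap address:

```shell
# Placeholder value; replace with your cluster's bootstrap address.
# For a Strimzi-managed cluster it typically has the form
# <cluster-name>-kafka-bootstrap.<namespace>.svc:9092.
export KAFKA_BOOTSTRAP_SERVER=my-cluster-kafka-bootstrap:9092
```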
If you do not require JSON support, you can install kcat directly with the dnf package manager instead of building it from source.
The following are several examples of how to use kcat, including creation of a topic export:
List Kafka topics:
kcat -b $KAFKA_BOOTSTRAP_SERVER -L | grep "topic " | sed 's#\([^"]*"\)\([^"]*\)\(".*\)#\2#'
The sed command filters out extra information in this example.
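To see what the sed expression does, you can run it on a sample line shaped like kcat's metadata output (the sample line below is illustrative, not captured from a real cluster):

```shell
# kcat -L prints topic metadata lines such as:  topic "<name>" with N partitions:
# The sed expression keeps only the text between the first pair of quotes.
echo '  topic "kafkasql-journal" with 3 partitions:' \
  | sed 's#\([^"]*"\)\([^"]*\)\(".*\)#\2#'
# → kafkasql-journal
```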
Export data from the kafkasql-journal topic in JSON format, with envelope, and base64-encoded keys and values:
kcat -b $KAFKA_BOOTSTRAP_SERVER -C -t kafkasql-journal -S base64 -Z -D \\n -e -J > kafkasql-journal.topicdump
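To sanity-check an export, you can decode a single record. The envelope field names used below ("key", "payload") are assumptions based on kcat's -J output format, and the one-record sample file is fabricated for illustration; verify both against your actual dump file:

```shell
# Fabricated one-record dump; a real kafkasql-journal.topicdump has one
# JSON envelope per line with base64-encoded "key" and "payload" fields.
echo '{"topic":"kafkasql-journal","partition":0,"offset":0,"key":"a2V5","payload":"dmFsdWU="}' \
  > sample.topicdump

# Decode the payload of the first record (python3 is available on the Fedora image):
head -n 1 sample.topicdump \
  | python3 -c 'import sys, json, base64; print(base64.b64decode(json.load(sys.stdin)["payload"]).decode())'
# → value
```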
Create an export file for each listed topic by combining the preceding commands:
mkdir dump
for t in $(kcat -b $KAFKA_BOOTSTRAP_SERVER -L | grep "topic " | sed 's#\([^"]*"\)\([^"]*\)\(".*\)#\2#'); do
  kcat -b $KAFKA_BOOTSTRAP_SERVER -C -t $t -S base64 -Z -D \\n -e -J > dump/$t.topicdump
done
After the topic export files have been created, you can run the following command on your local machine to copy the files from the work pod:
kubectl cp work-pod:/kcat/dump .
To import the kafkasql-journal topic data that has been created with kcat, use an application from the Apicurio Registry examples repository as follows:
git clone https://github.com/Apicurio/apicurio-registry-examples.git
cd apicurio-registry-examples/tools/kafkasql-topic-import
mvn clean install
export VERSION=$(mvn help:evaluate -Dexpression=project.version -q -DforceStdout)
java -jar target/apicurio-registry-tools-kafkasql-topic-import-$VERSION-jar-with-dependencies.jar -b <optional-kafka-bootstrap-server-url> -f <path-to-topic-dump-file>