Exporting Apicurio Registry Kafka topic data

One of the supported Apicurio Registry storage options is Apache Kafka, which uses a Kafka topic named kafkasql-journal to store data. If you encounter a problem when using this storage option and want to report it to Apicurio Registry developers, you might need to provide an export of the data present in the kafkasql-journal topic for analysis. This document contains information on how to create such a topic export using the kcat tool (formerly known as kafkacat).

Prerequisites
  • Kafka has been installed and is running in your environment.

  • You have deployed Apicurio Registry with data stored in the kafkasql-journal topic.

  • The kafkasql-journal topic is still present.

Setting up kcat on a Kubernetes work pod

Prerequisites
  • Your environment is Kubernetes.

  • You have logged in to the cluster using the kubectl command line interface.

Procedure
  1. Select the namespace in which to start an ephemeral work pod. This can be the namespace where the Kafka cluster is deployed or a different one:

    kubectl config set-context --current --namespace=default
  2. Create an ephemeral work pod using the latest Fedora image, and connect to the pod using your terminal:

    kubectl run work-pod -it --rm --image=fedora --restart=Never

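    If you keep the --rm flag, the work pod is deleted when you disconnect from the remote terminal, together with any files created in it, such as topic exports. Copy the export files to your local machine before you disconnect.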

  3. kcat is available through the dnf package manager, but that build does not have JSON support enabled. Because you want to export the topic data in JSON format with additional metadata, you must build the kcat executable from source.

    In addition, the kafkasql-journal topic export requires a feature that the main kcat project does not provide: support for base64-encoded keys and values. This support is important because the topic contains raw binary data, which might not be encoded correctly in the JSON output. Although kcat is widely used for this kind of task, the project appears to be inactive, and base64 support has not yet been merged into it. Therefore, you must build kcat from a fork that includes base64 support.

    Install git, clone the kcat repository, and check out the base64 branch from the jjlin fork:

    dnf install -y git
    git clone https://github.com/edenhill/kcat.git
    cd kcat
    git remote add jjlin https://github.com/jjlin/kcat.git
    git fetch jjlin
    git checkout jjlin/base64
  4. Install the dependencies and build kcat:

    dnf install -y gcc make librdkafka-devel yajl-devel
    ./configure
    make
  5. Copy the executable to /usr/bin so that it is available in $PATH:

    cp kcat /usr/bin
  6. Set the environment variable used in the subsequent examples, replacing the value with the bootstrap address of your Kafka cluster:

    export KAFKA_BOOTSTRAP_SERVER="my-kafka-cluster-kafka-bootstrap.default.svc:9092"
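Before exporting anything, it can help to confirm that the variable points at the right listener and that the kcat build has JSON enabled. The following is a minimal sketch, not part of kcat or Apicurio Registry: the bootstrap_address and has_json_support helpers are hypothetical. bootstrap_address assumes the Strimzi-style <cluster>-kafka-bootstrap.<namespace>.svc:9092 Service naming used in the example address, and has_json_support assumes that the version banner printed by kcat -V lists compiled-in features such as JSON.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; bootstrap_address assumes Strimzi-style Service naming.

# Build a bootstrap address from a Kafka cluster name and namespace.
# Usage: bootstrap_address <cluster-name> <namespace> [port]
bootstrap_address() {
  local cluster="$1" namespace="$2" port="${3:-9092}"
  printf '%s-kafka-bootstrap.%s.svc:%s\n' "$cluster" "$namespace" "$port"
}

# Succeed if the version banner read on stdin mentions JSON support.
# Assumes the `kcat -V` banner lists compiled-in features.
has_json_support() {
  grep -q 'JSON'
}

# Example usage inside the work pod:
#   export KAFKA_BOOTSTRAP_SERVER="$(bootstrap_address my-kafka-cluster default)"
#   kcat -V | has_json_support || echo "kcat was built without JSON support" >&2
#   kcat -b "$KAFKA_BOOTSTRAP_SERVER" -L -t kafkasql-journal
```

The last usage line restricts the metadata listing to the kafkasql-journal topic, which is a quick way to check both connectivity and that the topic is still present.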

If you do not require JSON or base64 support, for example if you only need to list topics, you can instead install a prebuilt kcat package from the bvn13/kcat Copr repository:

dnf install -y "dnf-command(copr)"
dnf copr enable bvn13/kcat
dnf update
dnf install -y kafkacat
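Depending on how it was installed, the executable might be called kcat or kafkacat. A small helper such as the hypothetical kcat_cmd below can pick whichever name is on $PATH, so that the examples can be run with either one. This is only a convenience sketch; a package installed this way is not expected to include the JSON and base64 options used for the topic export.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of the first kcat-compatible executable
# found on $PATH, preferring the newer "kcat" name over "kafkacat".
kcat_cmd() {
  local candidate
  for candidate in kcat kafkacat; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  echo "neither kcat nor kafkacat was found on PATH" >&2
  return 1
}

# Example usage:
#   KCAT="$(kcat_cmd)" && "$KCAT" -b "$KAFKA_BOOTSTRAP_SERVER" -L
```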

Examples of using kcat

The following examples show how to use kcat, including how to create a topic export:

  • List Kafka topics:

    kcat -b $KAFKA_BOOTSTRAP_SERVER -L | grep "topic " | sed 's#\([^"]*"\)\([^"]*\)\(".*\)#\2#'

    The sed command keeps only the topic name from each matching line of the metadata listing.

  • Export data from the kafkasql-journal topic in JSON format, with a metadata envelope and base64-encoded keys and values:

    kcat -b $KAFKA_BOOTSTRAP_SERVER -C -t kafkasql-journal -S base64 -Z -D \\n -e -J \
      > kafkasql-journal.topicdump
  • Create an export file for each listed topic by combining the preceding commands:

    mkdir dump
    for t in $(kcat -b $KAFKA_BOOTSTRAP_SERVER -L | grep "topic " | sed 's#\([^"]*"\)\([^"]*\)\(".*\)#\2#'); do \
      kcat -b $KAFKA_BOOTSTRAP_SERVER -C -t $t -S base64 -Z -D \\n -e -J > dump/$t.topicdump; \
    done
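The same export loop can be written as two small shell functions, which makes the topic-name extraction easier to check on its own. This is a sketch, not part of kcat: topic_names and dump_topics are hypothetical names, topic_names uses a single sed -n expression instead of the grep and sed pair, and both assume that the kcat -L listing prints one line of the form topic "<name>" with N partitions: per topic. The kcat options are the same ones used in the export examples.

```shell
#!/usr/bin/env bash
# Read `kcat -L` metadata output on stdin and print one topic name per line.
# Assumes lines of the form:   topic "<name>" with <N> partitions:
topic_names() {
  sed -n 's/^[[:space:]]*topic "\([^"]*\)".*/\1/p'
}

# Export every topic to <output-dir>/<topic>.topicdump
# (JSON envelope, base64-encoded keys and values, exit at end of topic).
# Usage: dump_topics <bootstrap-server> <output-dir>
dump_topics() {
  local bootstrap="$1" outdir="$2"
  mkdir -p "$outdir" || return 1
  kcat -b "$bootstrap" -L | topic_names | while IFS= read -r t; do
    # Redirect stdin so the inner kcat cannot consume the remaining topic names.
    kcat -b "$bootstrap" -C -t "$t" -S base64 -Z -D '\n' -e -J \
      < /dev/null > "$outdir/$t.topicdump" || echo "export of $t failed" >&2
  done
}

# Example usage inside the work pod:
#   dump_topics "$KAFKA_BOOTSTRAP_SERVER" dump
```

Quoting the topic name and reading it line by line keeps the loop safe even if a topic name contains characters that the unquoted for loop would split on.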

Copy topic export files from the work pod

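After the topic export files have been created, run the following command on your local machine to copy the dump directory from the work pod. Run it from a separate terminal: if you started the work pod with the --rm flag, the pod and the files in it are deleted when you disconnect from the pod session. The kubectl cp command requires the tar executable in the work pod; if it is missing, install it in the pod with dnf install -y tar.

kubectl cp work-pod:/kcat/dump ./dump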
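Before sending the files, a quick sanity check can catch an empty or truncated export. The sketch below is an illustration rather than part of any Apicurio tooling: the hypothetical summarize_dumps function prints the number of records in each .topicdump file, relying on the -D \\n option used in the export commands to write one JSON envelope per line, and returns a non-zero status if any file is empty or no dump files are found.

```shell
#!/usr/bin/env bash
# Hypothetical check: print "<file> <record-count>" for every .topicdump file
# in a directory. Assumes one record per line (kcat -D '\n').
# Returns 1 if any file is empty or the directory contains no dump files.
summarize_dumps() {
  local dir="${1:-dump}" status=0 found=0 f count
  for f in "$dir"/*.topicdump; do
    [ -e "$f" ] || continue   # the pattern matched no files
    found=1
    count=$(wc -l < "$f" | tr -d ' ')
    printf '%s %s\n' "$(basename "$f")" "$count"
    # An empty file means no records were exported for that topic.
    [ -s "$f" ] || status=1
  done
  [ "$found" -eq 1 ] || status=1
  return "$status"
}

# Example usage on the local machine, passing the directory with the copied files:
#   summarize_dumps dump
```

Other topics can legitimately be empty, but an empty kafkasql-journal export usually means that the export did not read the stored registry data.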

Importing the kafkasql-journal topic data

To import kafkasql-journal topic data that has been exported with kcat, use the kafkasql-topic-import application from the examples directory of the Apicurio Registry repository:

git clone https://github.com/Apicurio/apicurio-registry.git
cd apicurio-registry
git checkout {registry-version}.x
cd examples/tools/kafkasql-topic-import
mvn clean install
export VERSION=$(mvn help:evaluate -Dexpression=project.version -q -DforceStdout)
java -jar target/apicurio-registry-tools-kafkasql-topic-import-$VERSION-jar-with-dependencies.jar -b <optional-kafka-bootstrap-server-url> -f <path-to-topic-dump-file>
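Because the -b argument is optional, it is easy to pass an empty value by accident. The following hypothetical run_import wrapper, intended to be run from the kafkasql-topic-import directory after mvn clean install, checks that VERSION is set and that the dump file exists, and adds -b only when a bootstrap server is given. It assumes the JAR name used in the import command; it is a convenience sketch, not part of the example application.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the kafkasql-topic-import JAR.
# Usage: run_import <path-to-topic-dump-file> [kafka-bootstrap-server-url]
# Assumes VERSION is set and the JAR was built into ./target.
run_import() {
  local dump_file="$1" bootstrap="${2:-}" jar
  if [ -z "${VERSION:-}" ]; then
    echo "VERSION is not set; export it before running the import" >&2
    return 1
  fi
  jar="target/apicurio-registry-tools-kafkasql-topic-import-${VERSION}-jar-with-dependencies.jar"
  if [ ! -f "$dump_file" ]; then
    echo "dump file not found: $dump_file" >&2
    return 1
  fi
  if [ -n "$bootstrap" ]; then
    java -jar "$jar" -b "$bootstrap" -f "$dump_file"
  else
    java -jar "$jar" -f "$dump_file"
  fi
}

# Example usage:
#   run_import ../../../../dump/kafkasql-journal.topicdump
#   run_import ../../../../dump/kafkasql-journal.topicdump my-kafka:9092
```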
Additional resources