Tutorial on querying live streams of data with ksql (Kafka SQL)

Published on 20 October 2020 in Tutorial / Version 4.9 / ksqlDB - 3 minutes read - Last modified on 22 March 2023 - Read in jp

Real time data querying is becoming a modern standard. Who wants to wait until the next day or week when needing to take decision now?

The stream of data comes from an Apache Kafka topic which can be queried via ksqlDB.

Components

To keep things simple, all the pieces have been put together in a “one-click” Docker Compose project which contains:

Stream SQL Editor

One-line setup

For fetching the configurations and starting everything:

mkdir stream-sql-demo
cd stream-sql-demo
wget https://raw.githubusercontent.com/romainr/query-demo/master/stream-sql-demo/docker-compose.yml


docker-compose up -d
>
Creating network "stream-sql-demo_default" with the default driver
Creating hue-database                  ... done
Creating stream-sql-demo_jobmanager_1 ... done
Creating stream-sql-demo_mysql_1       ... done
Creating ksqldb-server                 ... done
Creating stream-sql-demo_zookeeper_1   ... done
Creating flink-sql-api                 ... done
Creating stream-sql-demo_taskmanager_1 ... done
Creating hue                           ... done
Creating ksqldb-cli                    ... done
Creating stream-sql-demo_kafka_1       ... done
Creating stream-sql-demo_datagen_1     ... done

Then the Hue Editor will be up at http://localhost:8888/.

As well as the ksqlDB API:

curl http://localhost:8088/info
> {"KsqlServerInfo":{"version":"0.12.0","kafkaClusterId":"DJzUX-zaTDCC5lqfVwf8kw","ksqlServiceId":"default_","serverStatus":"RUNNING"}}

For stopping everything:

docker-compose down

Query Experience

Notice that the Live SQL requires the New Editor which is in beta. In addition to soon offer multiple statements running at the same time on the same editor page and more robustness, it also bring the live result grid.

More improvements are on the way, in particular in the SQL autocomplete and Editor 2. In the future, the Task Server with Web Sockets will allow long running queries to run as separate tasks and prevent them from timing-out in the API server.

Note

In case you have an existing Hue Editor and want to point to the ksqlDB, just activate it via this config change:

[desktop]
enable_hue_5=true

[[interpreters]]

[[[ksqlDB]]]
name=ksqlDB
interface=ksql
options='{"url": "http://localhost:8088"}'

ksql

One nicety of ksqDB is its close integration with Kafka, for example we can list the topics:

SHOW TOPICS

The SQL syntax is a bit different but here is one way to create a similar table as above:

CREATE STREAM user_behavior (
  user_id BIGINT,
  item_id BIGINT,
  category_id BIGINT,
  behavior STRING,
  ts STRING
) WITH (kafka_topic='user_behavior', value_format='json', partitions=1)

And peek at it:

SELECT *
FROM user_behavior
EMIT CHANGES
LIMIT 30

In another statement within Hue's Editor or by booting the SQL shell:

docker exec -it ksqldb-cli ksql http://ksqldb-server:8088

You can also insert your own records and notice the live updates of the results:

INSERT INTO user_behavior (
  user_id ,
  item_id ,
  category_id ,
  behavior ,
  ts
)
VALUES
(1, 10, 20, 'buy', '1602998392')

 

In the next episodes, we will demo how to easily create tables directly from raw streams of data via the importer.

Any feedback or question? Feel free to comment here!

All these projects are also open source and welcome feedback and contributions. In the case of the Hue editor the Forum or Github issues are good places for that.

Onwards!

Romain


comments powered by Disqus

More recent stories

03 May 2023
Discover the power of Apache Ozone using the Hue File Browser
Read More
23 January 2023
Hue 4.11 and its new dialects and features are out!
Read More