Tutorial on querying live streams of data with ksql (Kafka SQL)

Published on 20 October 2020 in Querying / Tutorial / Version 4.9 - 3 minutes read - Last modified on 23 October 2020

Real time data querying is becoming a modern standard. Who wants to wait until the next day or week when needing to take decision now?

The stream of data comes from an Apache Kafka topic which can be queried via ksqlDB.

Components

To keep things simple, all the pieces have been put together in a “one-click” Docker Compose project which contains:

Stream SQL Editor

One-line setup

For fetching the configurations and starting everything:

mkdir stream-sql-demo
cd stream-sql-demo
wget https://raw.githubusercontent.com/romainr/query-demo/master/stream-sql-demo/docker-compose.yml


docker-compose up -d
>
Creating network "stream-sql-demo_default" with the default driver
Creating hue-database                  ... done
Creating stream-sql-demo_jobmanager_1 ... done
Creating stream-sql-demo_mysql_1       ... done
Creating ksqldb-server                 ... done
Creating stream-sql-demo_zookeeper_1   ... done
Creating flink-sql-api                 ... done
Creating stream-sql-demo_taskmanager_1 ... done
Creating hue                           ... done
Creating ksqldb-cli                    ... done
Creating stream-sql-demo_kafka_1       ... done
Creating stream-sql-demo_datagen_1     ... done

Then the Hue Editor will be up at http://localhost:8888/.

As well as the ksqlDB API:

curl http://localhost:8088/info
> {"KsqlServerInfo":{"version":"0.12.0","kafkaClusterId":"DJzUX-zaTDCC5lqfVwf8kw","ksqlServiceId":"default_","serverStatus":"RUNNING"}}

For stopping everything:

docker-compose down

Query Experience

Notice that the Live SQL requires the New Editor which is in beta. In addition to soon offer multiple statements running at the same time on the same editor page and more robustness, it also bring the live result grid.

More improvements are on the way, in particular in the SQL autocomplete and Editor 2. In the future, the Task Server with Web Sockets will allow long running queries to run as separate tasks and prevent them from timing-out in the API server.

Note

In case you have an existing Hue Editor and want to point to the ksqlDB, just activate it via this config change:

[notebook]
enable_notebook_2=true

[[interpreters]]

[[[ksqlDB]]]
name=ksqlDB
interface=ksql
options='{"url": "http://localhost:8088"}'

ksql

One nicety of ksqDB is its close integration with Kafka, for example we can list the topics:

SHOW TOPICS

The SQL syntax is a bit different but here is one way to create a similar table as above:

CREATE STREAM user_behavior (
  user_id BIGINT,
  item_id BIGINT,
  category_id BIGINT,
  behavior STRING,
  ts STRING
) WITH (kafka_topic='user_behavior', value_format='json', partitions=1)

And peek at it:

SELECT *
FROM user_behavior
EMIT CHANGES
LIMIT 30

In another statement within Hue's Editor or by booting the SQL shell:

docker exec -it ksqldb-cli ksql http://ksqldb-server:8088

You can also insert your own records and notice the live updates of the results:

INSERT INTO user_behavior (
  user_id ,
  item_id ,
  category_id ,
  behavior ,
  ts
)
VALUES
(1, 10, 20, 'buy', '1602998392')

 

In the next episodes, we will demo how to easily create tables directly from raw streams of data via the importer.

Any feedback or question? Feel free to comment here!

All these projects are also open source and welcome feedback and contributions. In the case of the Hue editor the Forum or Github issues are good places for that.

Onwards!

Romain


comments powered by Disqus

More recent stories

17 November 2020
Easy Querying of live Kafka data in a Big Table like HBase with SQL
Read More
20 October 2020
Tutorial on querying live streams of data with Flink SQL
Read More
20 October 2020
Tutorial on querying live streams of data with ksql (Kafka SQL)
Read More