How to use the Livy Spark REST Job Server API for doing some interactive Spark with curl


Livy is an open source REST interface for using Spark from anywhere.

Note: Livy is not supported in CDH, only in the upstream Hue community.


It supports executing snippets of code or programs in a SparkContext that runs locally or in YARN. This makes it ideal for building applications or Notebooks that interact with Spark in real time. For example, it currently powers the Spark snippets of the Hadoop Notebook in Hue.

In this post we will see how to execute some Spark 1.5 snippets in Python.



Livy sits between the remote users and the Spark cluster


Starting the REST server

Following the README, we check out Livy’s code. For simplicity it currently lives in the Hue repository, but it will hopefully graduate into its own top-level project eventually.

git clone git@github.com:cloudera/hue.git

Then we compile Livy with

cd hue/apps/spark/java
mvn -DskipTests clean package

Export these variables

export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf

And start it


Note: Livy defaults to Spark local mode. To use YARN mode, copy the configuration template file apps/spark/java/conf/livy-defaults.conf.tmpl to livy-defaults.conf and set the property:

livy.server.session.factory = yarn
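If you prefer to script this configuration step, here is a minimal Python sketch. It assumes the template uses plain "key = value" lines with "#" comments; enable_yarn_mode is a hypothetical helper name, not part of Livy.

```python
from pathlib import Path


def enable_yarn_mode(conf_dir: str) -> Path:
    """Copy livy-defaults.conf.tmpl to livy-defaults.conf with the YARN session factory set.

    Assumption: the template is made of "key = value" lines, optionally commented out with "#".
    """
    conf = Path(conf_dir)
    lines = (conf / "livy-defaults.conf.tmpl").read_text().splitlines()
    key = "livy.server.session.factory"
    # Drop any existing setting of the key, commented out or not, so it appears only once.
    kept = [line for line in lines if line.lstrip("# ").split("=")[0].strip() != key]
    kept.append(f"{key} = yarn")
    target = conf / "livy-defaults.conf"
    target.write_text("\n".join(kept) + "\n")
    return target
```

Calling enable_yarn_mode("apps/spark/java/conf") from the Hue checkout would then produce the file Livy reads at startup.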


Executing some Spark

Now that the REST server is running, we can communicate with it. We are on the same machine, so we use ‘localhost’ as the address of Livy.

Let’s list our open sessions

curl localhost:8998/sessions


You can use

 | python -m json.tool

at the end of the command to prettify the output, e.g.:

curl localhost:8998/sessions/0 | python -m json.tool
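For readers scripting these calls from Python instead of curl, here is a minimal sketch using only the standard library. It assumes Livy listens on localhost:8998 as in this post; livy_get and pretty are hypothetical helper names. pretty indents with 4 spaces, which matches what python -m json.tool prints by default.

```python
import json
import urllib.request

LIVY_URL = "http://localhost:8998"  # assumption: Livy on its default port, same machine


def livy_get(path: str, base_url: str = LIVY_URL):
    """GET a Livy REST path such as "/sessions" and decode the JSON body."""
    with urllib.request.urlopen(base_url + path) as resp:
        return json.loads(resp.read().decode("utf-8"))


def pretty(obj) -> str:
    """Indent JSON with 4 spaces, like python -m json.tool."""
    return json.dumps(obj, indent=4)
```

With the server running, print(pretty(livy_get("/sessions"))) is the Python counterpart of the curl command above.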


There are no sessions yet, so we create an interactive PySpark session:

curl -X POST --data '{"kind": "pyspark"}' -H "Content-Type: application/json" localhost:8998/sessions
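The same session creation can be sketched in Python. livy_request is a hypothetical generic JSON helper built on urllib, and create_session takes it as a parameter so it can be swapped for a stub; we assume, as the session output shown below suggests, that the reply contains the new session's "id".

```python
import json
import urllib.request

LIVY_URL = "http://localhost:8998"  # assumption: local Livy on its default port


def livy_request(method: str, path: str, body=None, base_url: str = LIVY_URL):
    """Send a JSON request to Livy and return the decoded reply (None for an empty body)."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(base_url + path, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else None


def create_session(kind: str = "pyspark", send=livy_request) -> int:
    """POST /sessions with the given kind, like the curl call above, and return the session id."""
    session = send("POST", "/sessions", {"kind": kind})
    return session["id"]
```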



Session ids are incrementing numbers starting from 0, and we reference a session by its id in later requests.

Livy supports the three languages of Spark:

Kind       Language
spark      Scala
pyspark    Python
sparkr     R
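The table above maps directly onto a small lookup. This sketch (body_for_language is a hypothetical name) builds the POST /sessions body from a language name and rejects anything outside the three kinds listed.

```python
# Session kinds from the table above, keyed by the language they run.
LIVY_KINDS = {"Scala": "spark", "Python": "pyspark", "R": "sparkr"}


def body_for_language(language: str) -> dict:
    """Return the POST /sessions body for a language, e.g. "Python" -> {"kind": "pyspark"}."""
    if language not in LIVY_KINDS:
        raise ValueError(f"unsupported language {language!r}; expected one of {sorted(LIVY_KINDS)}")
    return {"kind": LIVY_KINDS[language]}
```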


We check the status of the session until its state becomes idle, which means it is ready to execute PySpark snippets:

curl localhost:8998/sessions/0 | python -m json.tool


{
    "id": 0,
    "kind": "pyspark",
    "log": [
        "15/09/03 17:44:14 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.",
        "15/09/03 17:44:14 INFO ui.SparkUI: Started SparkUI at",
        "15/09/03 17:44:14 INFO spark.SparkContext: Added JAR file:/home/romain/projects/hue/apps/spark/java-lib/livy-assembly.jar at with timestamp 1441327454666",
        "15/09/03 17:44:14 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because is not set.",
        "15/09/03 17:44:14 INFO executor.Executor: Starting executor ID driver on host localhost",
        "15/09/03 17:44:14 INFO util.Utils: Successfully started service '' on port 54584.",
        "15/09/03 17:44:14 INFO netty.NettyBlockTransferService: Server created on 54584",
        "15/09/03 17:44:14 INFO storage.BlockManagerMaster: Trying to register BlockManager",
        "15/09/03 17:44:14 INFO storage.BlockManagerMasterEndpoint: Registering block manager localhost:54584 with 530.3 MB RAM, BlockManagerId(driver, localhost, 54584)",
        "15/09/03 17:44:15 INFO storage.BlockManagerMaster: Registered BlockManager"
    ],
    "state": "idle"
}
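Checking "until its state becomes idle" is a polling loop. Here is a minimal sketch of it; get_session is any zero-argument callable returning the decoded session JSON (for example one wrapping a GET of localhost:8998/sessions/0), and treating "error" and "dead" as terminal failures is our assumption.

```python
import time
from typing import Callable

# Assumption: "idle" means ready; "error" and "dead" mean the session will never become ready.
FAILED_STATES = {"error", "dead"}


def wait_until_idle(get_session: Callable[[], dict], timeout: float = 120.0,
                    interval: float = 1.0) -> dict:
    """Poll a session until its state is "idle", raising on failure states or timeout."""
    deadline = time.monotonic() + timeout
    while True:
        session = get_session()
        state = session.get("state")
        if state == "idle":
            return session
        if state in FAILED_STATES:
            raise RuntimeError(f"session {session.get('id')} ended in state {state!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"session still {state!r} after {timeout} s")
        time.sleep(interval)
```

Passing the fetch function in, rather than calling HTTP directly, keeps the loop independent of the HTTP library you use.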




In YARN mode, Livy creates a remote Spark Shell in the cluster that can be accessed easily with REST


When the session state is idle, it is ready to accept statements! Let’s compute 1 + 1:

curl localhost:8998/sessions/0/statements -X POST -H 'Content-Type: application/json' -d '{"code":"1 + 1"}'
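In Python, the request behind this curl command can be built explicitly, which makes its URL, method, header and body easy to inspect. statement_request and submit_statement are hypothetical names, and localhost:8998 is assumed as above.

```python
import json
import urllib.request

LIVY_URL = "http://localhost:8998"  # assumption: local Livy on its default port


def statement_request(session_id: int, code: str, base_url: str = LIVY_URL) -> urllib.request.Request:
    """Build the same request as the curl command: POST /sessions/<id>/statements."""
    return urllib.request.Request(
        f"{base_url}/sessions/{session_id}/statements",
        data=json.dumps({"code": code}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_statement(session_id: int, code: str) -> dict:
    """Send the statement and return Livy's decoded JSON reply, which includes the statement id."""
    with urllib.request.urlopen(statement_request(session_id, code)) as resp:
        return json.loads(resp.read())
```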


We then check the result of statement 0, which is ready once its state is available:

curl localhost:8998/sessions/0/statements/0



If the statement takes less than a few milliseconds, Livy returns the result directly in the response to the POST command.
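Both cases can be handled in one routine: use the POST reply if it is already available, otherwise poll the statement URL. This is a sketch; post and get are caller-supplied callables (assumed to wrap the curl calls above), and the output shape {"status": "ok", "data": {"text/plain": ...}} is our assumption about Livy's reply.

```python
import time
from typing import Callable, Optional


def statement_text(statement: dict) -> Optional[str]:
    """Return the plain-text result of a successful statement, or None.

    Assumed shape: {"state": "available", "output": {"status": "ok", "data": {"text/plain": "2"}}}
    """
    output = statement.get("output") or {}
    if output.get("status") != "ok":
        return None
    return output.get("data", {}).get("text/plain")


def run_statement(post: Callable[[], dict], get: Callable[[int], dict],
                  interval: float = 0.5, max_polls: int = 240) -> dict:
    """POST a statement, then poll GET .../statements/<id> until its state is "available".

    Fast statements come back available in the POST reply, so no polling happens for them.
    """
    statement = post()
    polls = 0
    while statement.get("state") != "available":
        if polls >= max_polls:
            raise TimeoutError(f"statement {statement.get('id')} not available after {max_polls} polls")
        time.sleep(interval)
        statement = get(statement["id"])
        polls += 1
    return statement
```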

Statement ids are incrementing too, and all statements share the same context, so we can run a sequence of statements:

curl localhost:8998/sessions/0/statements -X POST -H 'Content-Type: application/json' -d '{"code":"a = 10"}'


The variable a is still defined in the next statement:

curl localhost:8998/sessions/0/statements -X POST -H 'Content-Type: application/json' -d '{"code":"a + 1"}'
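Submitting a sequence of snippets to one session can be sketched as below. run_sequence is a hypothetical helper; send(method, path, body) is an assumed JSON transport, and we assume Livy executes a session's statements in the order they are submitted, which is what lets a + 1 see a = 10.

```python
from typing import Callable, List


def run_sequence(session_id: int, snippets: List[str],
                 send: Callable[[str, str, dict], dict]) -> List[dict]:
    """POST each snippet, in order, to the same session and return Livy's replies.

    All statements of a session share one interpreter context, so names defined
    by an earlier snippet stay visible to later ones.
    """
    path = f"/sessions/{session_id}/statements"
    return [send("POST", path, {"code": code}) for code in snippets]
```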



Let’s close the session to free up the cluster. Note that Livy automatically times out idle sessions after 1 hour (configurable).

curl localhost:8998/sessions/0 -X DELETE
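From a script, a context manager is a convenient way to guarantee the DELETE happens even when a statement fails, rather than waiting for the idle timeout. A sketch, where livy_session is a hypothetical name and send(method, path, body=None) an assumed JSON transport:

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Optional


@contextmanager
def livy_session(send: Callable[..., Optional[dict]], kind: str = "pyspark") -> Iterator[int]:
    """Create a session, yield its id, and always DELETE it afterwards."""
    session_id = send("POST", "/sessions", {"kind": kind})["id"]
    try:
        yield session_id
    finally:
        # Runs even if the body raised, so the cluster resources are released.
        send("DELETE", f"/sessions/{session_id}")
```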




Let’s say we want to create a shell running as the user bob. This is particularly useful when multiple users are sharing a Notebook server:

curl -X POST --data '{"kind": "pyspark", "proxyUser": "bob"}' -H "Content-Type: application/json" localhost:8998/sessions
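On a shared notebook server, one natural pattern is one proxied session per end user, created on first use and reused afterwards. This is a sketch of that idea, not how Hue implements it; PerUserSessions is a hypothetical class and send(method, path, body) an assumed JSON transport.

```python
from typing import Callable, Dict


class PerUserSessions:
    """Hypothetical helper: lazily create one Livy session per user, run as that user."""

    def __init__(self, send: Callable[[str, str, dict], dict], kind: str = "pyspark"):
        self._send = send
        self._kind = kind
        self._ids: Dict[str, int] = {}

    def session_for(self, user: str) -> int:
        """Return the user's session id, creating it with "proxyUser" the first time."""
        if user not in self._ids:
            body = {"kind": self._kind, "proxyUser": user}
            self._ids[user] = self._send("POST", "/sessions", body)["id"]
        return self._ids[user]
```

Reusing the session avoids starting a new Spark context for every request from the same user.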

Do not forget to add the user running Hue (your current login in dev, or hue in production) to the Hadoop proxy user list in /etc/hadoop/conf/core-site.xml.
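Hadoop's proxy user list is made of hadoop.proxyuser.<user>.hosts and hadoop.proxyuser.<user>.groups properties. This sketch renders those two entries for a given user so they can be merged into the existing configuration element of core-site.xml; the "*" (allow everything) defaults are illustrative only, and production clusters usually restrict both.

```python
import xml.etree.ElementTree as ET


def proxyuser_properties(user: str, hosts: str = "*", groups: str = "*") -> str:
    """Render the core-site.xml property entries that let `user` impersonate other users."""
    root = ET.Element("configuration")
    for suffix, value in (("hosts", hosts), ("groups", groups)):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = f"hadoop.proxyuser.{user}.{suffix}"
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

For a production install, proxyuser_properties("hue") yields the entries for the hue user.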


Additional properties

All the properties supported by Spark shells, like the number of executors or the memory, can be changed at session creation. Their format is the same as when typing spark-shell -h:

curl -X POST --data '{"kind": "pyspark", "numExecutors": "3", "executorMemory": "2G"}' -H "Content-Type: application/json" localhost:8998/sessions
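Catching a malformed value before it reaches the server saves a failed session start. This sketch builds the body of the curl call above and checks it first; session_body is a hypothetical name, and the memory pattern (digits plus an optional unit such as k, m, g, t, optionally followed by b, as in 2G) is our assumption about the spark-shell format.

```python
import re

# Assumption: spark-shell style sizes such as "1000M" or "2G" (case-insensitive unit, optional "b").
_MEMORY = re.compile(r"^\d+(?:[kmgtp]b?|b)?$", re.IGNORECASE)


def session_body(kind: str = "pyspark", num_executors=None, executor_memory=None) -> dict:
    """Build the POST /sessions body with optional Spark resources, validating them first."""
    body = {"kind": kind}
    if num_executors is not None:
        if int(num_executors) < 1:
            raise ValueError("numExecutors must be at least 1")
        body["numExecutors"] = str(num_executors)  # sent as a string, like the curl example
    if executor_memory is not None:
        if not _MEMORY.match(executor_memory):
            raise ValueError(f"bad memory size {executor_memory!r}, expected e.g. '2G'")
        body["executorMemory"] = executor_memory
    return body
```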


And that’s it! Next time we will explore some more advanced features like the magic keywords for introspecting data or printing images. Then, we will detail how to do batch submissions in compiled Scala, Java or Python (i.e. jar or py files).

The architecture of Livy was presented for the first time at Big Data Scala by the Bay last August, and the next updates will be presented at the Spark meetup before Strata NYC and at Spark Summit in Amsterdam.


Feel free to ask any questions about the architecture or usage of the server in the comments, on @gethue or on the hue-user list. And pull requests are always welcome!