How to use the Livy Spark REST Job Server API for doing some interactive Spark with curl


Livy is an open source REST interface for using Spark from anywhere.

Note: Livy is not supported in CDH, only in the upstream Hue community.


It supports executing snippets of code or programs in a Spark Context that runs locally or in YARN. This makes it ideal for building applications or Notebooks that can interact with Spark in real time. For example, it is currently used for powering the Spark snippets of the Hadoop Notebook in Hue.

In this post we see how we can execute some Spark 1.5 snippets in Python.



Livy sits between the remote users and the Spark cluster


Starting the REST server

Following the README, we check out Livy’s code. It currently lives in the Hue repository for simplicity but will hopefully graduate into its own top-level project.

git clone git@github.com:cloudera/hue.git

Then we compile Livy with

cd hue/apps/spark/java
mvn -DskipTests clean package

Export these variables

export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf

And start it


Note: Livy defaults to Spark local mode. To use YARN mode, copy the configuration template file apps/spark/java/conf/livy-defaults.conf.tmpl to livy-defaults.conf and set the property:

livy.server.session.factory = yarn


Executing some Spark

Now that the REST server is running, we can communicate with it. We are on the same machine, so we will use ‘localhost’ as the address of Livy.

Let’s list our open sessions

curl localhost:8998/sessions


You can use

 | python -m json.tool

at the end of the command to prettify the output, e.g.:

curl localhost:8998/sessions/0 | python -m json.tool
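The same listing can be scripted with Python's standard library. This is only a sketch: it assumes Livy is listening on localhost:8998 as in the curl examples, and the helper names `pretty` and `list_sessions` are hypothetical, not part of Livy.

```python
import json
import urllib.request

LIVY_URL = "http://localhost:8998"  # assumption: same host and port as the curl examples


def pretty(body):
    """Re-indent a JSON document, similar to piping it through `python -m json.tool`."""
    return json.dumps(json.loads(body), indent=4)


def list_sessions(base_url=LIVY_URL):
    """GET /sessions and return the re-indented JSON body (needs a running Livy server)."""
    with urllib.request.urlopen(base_url + "/sessions") as response:
        return pretty(response.read().decode("utf-8"))
```

With the server running, `print(list_sessions())` shows the session list in indented form.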


There are no sessions yet. We create an interactive PySpark session:

curl -X POST --data '{"kind": "pyspark"}' -H "Content-Type: application/json" localhost:8998/sessions



Session ids are incrementing numbers starting from 0. We can then reference the session later by its id.

Livy supports the three languages of Spark:

Kind      Language
spark     Scala
pyspark   Python
sparkr    R
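When sessions are created from a script, it can help to reject an unknown kind before anything is sent. The sketch below builds the same POST request as the curl command, without sending it. The function name `new_session_request` and the default URL are assumptions made for illustration.

```python
import json
import urllib.request

# The three session kinds listed in the table above.
SESSION_KINDS = {"spark": "Scala", "pyspark": "Python", "sparkr": "R"}


def new_session_request(kind, base_url="http://localhost:8998"):
    """Build (but do not send) the POST /sessions request for the given kind."""
    if kind not in SESSION_KINDS:
        raise ValueError("unknown kind %r, expected one of %s" % (kind, sorted(SESSION_KINDS)))
    return urllib.request.Request(
        base_url + "/sessions",
        data=json.dumps({"kind": kind}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends it to the server.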


We check the status of the session until its state becomes idle, which means it is ready to execute snippets of PySpark:

curl localhost:8998/sessions/0 | python -m json.tool

% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                Dload  Upload   Total   Spent    Left  Speed
100  1185    0  1185    0     0  72712      0 --:--:-- --:--:-- --:--:-- 79000
{
    "id": 5,
    "kind": "pyspark",
    "log": [
       "15/09/03 17:44:14 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.",
       "15/09/03 17:44:14 INFO ui.SparkUI: Started SparkUI at",
       "15/09/03 17:44:14 INFO spark.SparkContext: Added JAR file:/home/romain/projects/hue/apps/spark/java-lib/livy-assembly.jar at with timestamp 1441327454666",
       "15/09/03 17:44:14 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because is not set.",
       "15/09/03 17:44:14 INFO executor.Executor: Starting executor ID driver on host localhost",
       "15/09/03 17:44:14 INFO util.Utils: Successfully started service '' on port 54584.",
       "15/09/03 17:44:14 INFO netty.NettyBlockTransferService: Server created on 54584",
       "15/09/03 17:44:14 INFO storage.BlockManagerMaster: Trying to register BlockManager",
       "15/09/03 17:44:14 INFO storage.BlockManagerMasterEndpoint: Registering block manager localhost:54584 with 530.3 MB RAM, BlockManagerId(driver, localhost, 54584)",
       "15/09/03 17:44:15 INFO storage.BlockManagerMaster: Registered BlockManager"
    ],
    "state": "idle"
}




In YARN mode, Livy creates a remote Spark Shell in the cluster that can be accessed easily with REST
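Waiting for the idle state can be automated with a small polling loop instead of re-running curl by hand. The sketch below is deliberately independent of the HTTP layer: `get_state` is any function returning the session's "state" string, for example one that GETs /sessions/0 and reads that field. The name `wait_until_idle`, the timeout and the interval are assumptions. The "idle" and "error" states appear in this post and its comments; any other value is treated as "not ready yet".

```python
import time


def wait_until_idle(get_state, timeout=60.0, interval=1.0, sleep=time.sleep):
    """Poll get_state() until it returns "idle" and return the seconds waited.

    Raises RuntimeError if the session reports "error" and TimeoutError
    if it is still not idle once `timeout` seconds of waiting have passed.
    """
    waited = 0.0
    while True:
        state = get_state()
        if state == "idle":
            return waited
        if state == "error":
            raise RuntimeError("session entered the error state")
        if waited >= timeout:
            raise TimeoutError("session still %r after %.0fs" % (state, timeout))
        sleep(interval)
        waited += interval
```

The `sleep` argument only exists so the loop can be exercised without real delays.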


When the session state is idle, it is ready to accept statements! Let’s compute 1 + 1:

curl localhost:8998/sessions/0/statements -X POST -H 'Content-Type: application/json' -d '{"code":"1 + 1"}'


We check the result of statement 0 when its state is available

curl localhost:8998/sessions/0/statements/0



If the statement takes less than a few milliseconds, Livy returns the result directly in the response to the POST request.
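Because the result may arrive either in the POST response or in a later GET, a client can run both through the same check. The sketch below takes the decoded JSON of a statement and returns its text once the state is available. The error shape follows a response quoted in the comments below. The "running" state in the example and the `data["text/plain"]` location of a successful result are assumptions, as is the helper name `statement_value`.

```python
def statement_value(statement):
    """Return the text result of a finished statement, or None if it is not finished.

    `statement` is the decoded JSON returned by POST /sessions/<id>/statements
    or by GET /sessions/<id>/statements/<statement id>.
    """
    if statement.get("state") != "available":
        return None  # not finished yet: GET the statement again later
    output = statement["output"]
    if output["status"] == "error":
        raise RuntimeError("%s: %s" % (output.get("ename"), output.get("evalue")))
    # Assumption: successful results carry their text under data["text/plain"].
    return output["data"]["text/plain"]
```

A caller would first apply it to the POST response and only fall back to polling the statement URL when it returns None.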

Statement ids are incrementing too, and all statements in a session share the same context, so we can run a sequence of them:

curl localhost:8998/sessions/0/statements -X POST -H 'Content-Type: application/json' -d '{"code":"a = 10"}'


Spanning multiple statements

curl localhost:8998/sessions/5/statements -X POST -H 'Content-Type: application/json' -d '{"code":"a + 1"}'



Let’s close the session to free up the cluster. Note that Livy will automatically close sessions that have been inactive for 1 hour (configurable).

curl localhost:8998/sessions/0 -X DELETE
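In a script it is easy to forget the DELETE when something fails halfway. One way to guard against that is a context manager that creates the session and always deletes it on exit. This is a sketch under assumptions: the names `livy_call` and `livy_session` are hypothetical, the server is taken to be at localhost:8998, and the POST response is assumed to contain the new session's "id" as in the status output shown earlier.

```python
import contextlib
import json
import urllib.request


def livy_call(method, url, payload=None):
    """Send one JSON request and return the decoded JSON response (None if the body is empty)."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method=method
    )
    with urllib.request.urlopen(request) as response:
        body = response.read().decode("utf-8")
    return json.loads(body) if body else None


@contextlib.contextmanager
def livy_session(kind="pyspark", base_url="http://localhost:8998", call=livy_call):
    """Create a session, yield its URL, and DELETE it on exit even after an exception."""
    session = call("POST", base_url + "/sessions", {"kind": kind})
    session_url = "%s/sessions/%s" % (base_url, session["id"])
    try:
        yield session_url
    finally:
        call("DELETE", session_url)
```

Statements are then posted to `session_url + "/statements"` inside a `with livy_session() as session_url:` block.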




Let’s say we want to create a shell running as the user bob. This is particularly useful when multiple users share a Notebook server:

curl -X POST --data '{"kind": "pyspark", "proxyUser": "bob"}' -H "Content-Type: application/json" localhost:8998/sessions

Do not forget to add the user running Hue (your current login in dev or hue in production) in the Hadoop proxy user list (/etc/hadoop/conf/core-site.xml):


Additional properties

All the properties supported by Spark shells, such as the number of executors or the memory, can be changed at session creation. Their format is the same as the one shown by spark-shell -h:

curl -X POST --data '{"kind": "pyspark", "numExecutors": "3", "executorMemory": "2G"}' -H "Content-Type: application/json" localhost:8998/sessions
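When these options come from user input, the JSON body is easier to assemble in code than by hand. The sketch below uses only the field names that appear in this post (kind, proxyUser, numExecutors, executorMemory) and leaves out any option that was not set. The function name `session_payload` and its keyword arguments are hypothetical.

```python
import json


def session_payload(kind, proxy_user=None, num_executors=None, executor_memory=None):
    """Build the JSON body for POST /sessions, omitting options that were not set."""
    options = {
        "kind": kind,
        "proxyUser": proxy_user,
        "numExecutors": num_executors,
        "executorMemory": executor_memory,
    }
    return json.dumps({key: value for key, value in options.items() if value is not None})
```

The returned string can be passed to curl with --data or used as the request body in any HTTP client.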


And that’s it! Next time we will explore some more advanced features like the magic keywords for introspecting data or printing images. Then, we will detail how to do batch submissions in compiled Scala, Java or Python (i.e. jar or py files).

The architecture of Livy was presented for the first time at Big Data Scala by the Bay last August and next updates will be at the Spark meetup before Strata NYC and Spark Summit in Amsterdam.


Feel free to ask any questions about the architecture or usage of the server in the comments, on @gethue or on the hue-user list. Pull requests are always welcome!




  1. Peter Rudenko 4 years ago

    Does it support multiple spark contexts?

    • Hue Team 4 years ago

      Livy can manage multiple spark sessions, which each have their own contexts, but at the moment it doesn’t support a single session having multiple contexts.

  2. Ruslan 4 years ago

    That’s a great feature.
    Would Hue / Livy Server close a Spark Context once user closes a Spark Notebook page?
    Similarly as it closes e.g. connection to Impala when a page with SQL is closed.
    Otherwise I can see eventually we’ll have a lot of orphan sessions, and yarn resources will be exhausted quickly.

    • Hue Team 4 years ago

      You shouldn’t have to worry about that. Livy has two mechanisms to deal with this. First, closing a session will tear down the Spark Context. Second, there is a timer that will kill sessions if they haven’t received any activity in the past hour. This is configurable with the `livy.server.session.timeout` option.

      • Ruslan 4 years ago

        Thanks for explaining. That sounds great. Hadoop Notebooks is the most anticipated feature of Hue 3.9 release (we wait for CDH 5.5 to be released). Livy Server is available in 3.9 too, not just 3.10, right?

        • Hue Team 4 years ago

          Yes, Livy is available in 3.9. It’s in active development, so if you do run into any problems, please also check the master branch to see if we’ve already fixed whatever problems you may encounter.

  3. sumit 4 years ago

    Does it need – Cloudera to be installed
    Can I try it on a Cloudera VM just for testing purposes ?

  4. sumit 4 years ago

    Thanks for the answers – appreciated – some more questions please

    – I have explored Spark Job Server from OOYALA – how is this different currently or in the future – from Spark Job Server ?

    • Hue Team 4 years ago

      The main use case of Livy is to launch interactive Spark shells inside YARN, whereas the last I checked, the Ooyala Job Server is mainly about launching batch jar Spark jobs (which Livy can also do). So no need to implement an interface and compile your code, you can just submit snippets. Livy supports PySpark and R too. Also Livy runs the Drivers in the YARN cluster, so if Livy crashes we don’t lose the current jobs. There are plans to integrate with additional backends/protocols.

  5. Ashish 4 years ago

    Is there a way to submit python script (.py) with the post request instead of writing raw code as {“code”: “a+1”}?

  6. Damien Carol 4 years ago

    Does Livy can use dynamic allocation with YARN?

  7. lonely7345 3 years ago

    when i use hue notebook to execute a scala program for spark,it’s successful ,
    but when i open another browser to execute the same programe ,it’s faiulture.
    the /api/newsessions return the code 504 Gateway Timout.

    I must kill the session in the first page.and the other will be running.
    the livy server only support one session?

    • Hue Team 3 years ago

      It supports multiple sessions. Are you using the very latest version?

  8. sashi 3 years ago

    Hi Im getting the following error when trying to access spark notebook from hue.

    Error: org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at: org.apache.spark.SparkContext.(SparkContext.scala:81) com.cloudera.hue.livy.repl.scala.interpreter.Interpreter.start(Interpreter.scala:75) com.cloudera.hue.livy.repl.scala.SparkSession.(SparkSession.scala:41) com.cloudera.hue.livy.repl.scala.SparkSession$.create(SparkSession.scala:31) com.cloudera.hue.livy.repl.ScalatraBootstrap.init(Main.scala:106) org.scalatra.servlet.ScalatraListener.configureCycleClass(ScalatraListener.scala:67)

    Can you please help me in resolving the error :

  9. Ravi 3 years ago

    I am getting gateway timeout when I am running curl -X POST --data '{"kind": "pyspark"}' -H "Content-Type: application/json" localhost:8998/sessions
    before that every thing working perfectly fine. What could be the reason.

  10. Federico Ponzi 3 years ago

    For some reason, this code is not working:
    curl localhost:8998/sessions/2/statements -X POST -H 'Content-Type: application/json' -d '{"code":"def r():\n i=1 + 1\n return i\n"}'

    Basically it dosen’t work if a statement like def or for have more than one line of definition. I need to run a larger program using livy. How to resolve this? I need also Impersonation feature

    Thanks a lot for help

  11. Riya 3 years ago

    Hi team,

    What is the difference between “livy.server.session.factory = yarn” mode and “livy.server.session.factory = process” mode. And I believe if this property is not set value defaults to local.How the execution is different in these three cases ?

    • Hue Team 3 years ago

      Like the Spark options: yarn-cluster mode or local mode

    • Hue Team 3 years ago

      Thanks & Updated!

  12. lidl 3 years ago

    Hi team.
    I set conf `ivy.server.session.factory = yarn`
    But when I create session, response message is :
    `URI ‘local:spark’ is not supported by any registered client factories.`

    Any advise?

  13. Mahdi 3 years ago


    I have couple of questions

    1) How do I run livy server so when I close my session it doesn’t stop? Right now, I have to SSH to one of the hosts and run Livy server and then I can use it.

    2) which node of the cluster I should do the livy server installation? which role? My Spark home is set to /opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/spark/ is this a correct practice ?

    • Author
      Hue Team 3 years ago

      #1 Hue automatically sends a close to tell Livy to close the session when leaving the Notebook
      #2 Livy is not in CDH yet, so it does not have any role. Any host with a Spark Gateway will work

  14. Carlos Barichello 3 years ago

    This web page shows that statements are incrementing and all share the same context. It then goes on to show an example where the variable “a” gets a value of 10 in session 0. Next a statement in session 5 adds “a” to 1. That implies that a variable spans sessions. Did you mean for the second line (a+1) to be in session 0 instead of session 5?

    • Author
      Hue Team 3 years ago

      There is only one session here but multiple snippets of code. Snippets could be all in the same box, but by putting them in individual box they can be executed not all at the same time.

  15. hwy 3 years ago

    Hi ,I am using hue3.11,how can i config spark sql on hue with impersonation? i hava configed spark sql throuth spark thriftserver,thanks ,best regard!

    • Author
      Hue Team 3 years ago

      In the hue.ini config, you would need to configure and uncomment:



  16. Diego 2 years ago

    Regarding comparison with Spark Job Server.

    I’d like a live application that would allow me to run jars, but keeping a Spark Context alive, mainly to keep some variables broadcast in memory alive between Spark jobs. This is possible in Spark Job Server. How would this be done in Livy?

    • Author
      Hue Team 2 years ago

      I would recommend to ask the people, the project was moved there!

  17. Azer ILA 2 years ago


    I want to write an interactive spark driver that accomplish some operations such as reading from hdfs to data frame and execute some query on it and I want an external client (e.g. an open source dashboard or a java program ) to communicate with driver program, send request and receive results in form of some format such as JSON. Can I develop this scenario with livy.

  18. shaozhipeng 2 years ago

    I need help.When livy server started.I exe curl bigdata1:8998/sessions then
    curl -X POST –data ‘{“kind”: “pyspark”}’ -H “Content-Type: application/json” bigdata1:8998/sessions
    curl bigdata1:8998/sessions/0 | python -m json.tool
    % Total % Received % Xferd Average Speed Time Time Time Current
    Dload Upload Total Spent Left Speed
    100 309 100 309 0 0 10953 0 –:–:– –:–:– –:–:– 11444
    “appId”: “application_1482889733217_0003”,
    “appInfo”: {
    “driverLogUrl”: “http://bigdata3:8042/node/containerlogs/container_1482889733217_0003_01_000001/hadoop”,
    “sparkUiUrl”: “http://bigdata1:8088/proxy/application_1482889733217_0003/”
    “id”: 0,
    “kind”: “pyspark”,
    “log”: [],
    “owner”: null,
    “proxyUser”: null,
    “state”: “idle”

    But when curl bigdata1:8998/sessions/0/statements -X POST -H ‘Content-Type: application/json’ -d ‘{“code”:”1 + 1″}’
    curl bigdata1:8998/sessions/0 | python -m json.tool
    % Total % Received % Xferd Average Speed Time Time Time Current
    Dload Upload Total Spent Left Speed
    100 310 100 310 0 0 8478 0 –:–:– –:–:– –:–:– 9117
    “id”: 0,
    “appId”: “application_1482975521942_0001”,
    “owner”: null,
    “proxyUser”: null,
    “state”: “error”,
    “kind”: “pyspark”,
    “appInfo”: {
    “driverLogUrl”: “http://bigdata3:8042/node/containerlogs/container_1482975521942_0001_01_000001/hadoop”,
    “sparkUiUrl”: “http://bigdata1:8088/proxy/application_1482975521942_0001/”
    “log”: []

    • Author
      Hue Team 2 years ago

      Livy was moved to its own project:, we recommend to ask Livy specific questions there.

  19. wuchang 2 years ago

    I have configured hue proxyuser in core-site.xml ,but any user login into hue can remove other users’ data



    from notebook pyspark:
    For example , user named ‘hue’ login into hue system, and he new a notebook and the pyspark code is like this:
    import os
    os.system(‘hadoop fs -rm -r /user/appuser/test.dat’)

    this pyspark code can remove the data of user named ‘appuser’.
    the permission info of is:
    -rw-r–r– 2 appuser supergroup 0 2017-05-02 10:12 /user/appuser/test.

    my hue version is 3.11.0

    Anyone can give me some suggestions?

    • Author
      Hue Team 2 years ago

      In Job Browser, what is the username of the running Spark job?

  20. David 2 years ago

    When I try to use … POST –data ‘{“kind”:”pyspark”}’ -H “Content-Type: application/json” … (as was instructed in the create an interactive pyspark section), I am getting an error that says that ‘kind’ is not a recognizable field. Why could this be?

  21. Ute 2 years ago

    Can I run python code using the same Livy session in parallel in Spark ?

  22. Harun 9 months ago


    I am trying to create an application using livy where different users can run their spark code. The setup consists basically of one server that submits statements to the livy server and I’ve set up a KDC server to authenticate the server that submits the jobs.

    But my question would be, how is security handled by livy. As far as I tested, the JVM running the session has file access and also network access, so how could I restrict the created sessions to access only specific folders and specific IP addresses? Is that possible?

    I set up livy with the following configuration btw:
    livy.environment production
    livy.impersonation.enabled true
    livy.server.port 8998
    livy.server.session.timeout 3600000
    livy.server.auth.kerberos.keytab /path/to/your/livy_bin/key.keytab/new.keytab
    livy.server.auth.kerberos.principal HTTP/[email protected]
    livy.server.auth.type kerberos
    livy.server.access_control.enabled = true
    livy.server.access_control.users = livy

    And also, if I used livy with multiple kerberos principals and an HDFS Active Directory, would that be safe enough to allow people impersonate and only access their own files,

    Any help appriciated,

  23. Harun 9 months ago


    We are trying to build a setup where we have a server that submits jobs of different users to the Livy server via the REST API. We established a kerberos server to authenticate against livy. But we want to prohibit the users to access a different users’ data, the filesystem, and the network.

    My question would then be, how secure is livy? Users can inject custom code to run on livy, but this gives them the ability to access the filesystem on the host the livy server resides in. That could be potentially dangerous from my point of view, they could potentially access the keytab on the livy server also.

    I know that the session created creates also a JVM, so one session lives in a JVM, and could I change the security settings of that JVM to only access specific paths and specific IP addresses only? Would that mean for me to change the source code of livy?

    And in the case of using HDFS with active directory to secure the datasystem, so that users need to specify a kerberos key to access their files, how could I manage multiple principals in one server, to get this working?

    Any help to any of the questions is very much appriciated,

    Thanks in forehand

  24. turbo 5 months ago

    how much data can be returned by livy ?
    exm: i want to use livy session query hive but i worried the result is very big can be returned or not
    may be TB level or GB lervel
    thanks guys!

  25. Geena Fernandez 2 months ago

    I am trying to run the below code,
    curl localhost:8998/sessions/0/statements -X POST -H ‘Content-Type: application/json’ -d ‘{“code”:”import org.apache.spark.SparkConf\nimportContext\nval sqlContext = new HiveContext(sc)\nimport sqlContext.implicits._\nval resultDF = sqlContext.sql(“show databases”)\”}’

    But i am getting an error

    “Unexpected character (‘s’ (code 115)): was expecting comma to separate Object entries\n at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 1, column: 230]”
    line:1, column 230 is “val sqlContext”

    Can you please help?

    Also is there a different way to run spark-sql queries through Livy?

  26. Shyam 1 month ago

    I tried this both on local and yarn mode and it kept giving me this error. Would anyone know why this happens?
    I run this commands in sequence and the respective outputs

    [[email protected] bin]# curl localhost:8998/sessions/0 | python -m json.tool
    % Total % Received % Xferd Average Speed Time Time Time Current
    Dload Upload Total Spent Left Speed
    100 24 100 24 0 0 132 0 –:–:– –:–:– –:–:– 134
    “Session ‘0’ not found.”

    [[email protected] bin]# curl localhost:8998/sessions/0 | python -m json.tool
    % Total % Received % Xferd Average Speed Time Time Time Current
    Dload Upload Total Spent Left Speed
    100 142 100 142 0 0 6095 0 –:–:– –:–:– –:–:– 6454
    “appId”: null,
    “appInfo”: {
    “driverLogUrl”: null,
    “sparkUiUrl”: null
    “id”: 0,
    “kind”: “pyspark”,
    “log”: [],
    “owner”: null,
    “proxyUser”: null,
    “state”: “idle”

    [[email protected] bin]# curl localhost:8998/sessions/0/statements -X POST -H ‘Content-Type: application/json’ -d ‘{“code”:”1 + 1″}’
    {“id”:0,”state”:”available”,”output”:{“status”:”error”,”execution_count”:0,”ename”:”Error”,”evalue”:”Interpreter died:\n”,”traceback”:[]}}
