Hadoop / Spark Notebook and Livy REST Job Server improvements!

The Notebook application as well as the REST Spark Job Server are being revamped. The goal of these two components is to let users execute Spark from their browser or from anywhere. They are still in beta, but the next version of Hue will see them graduate. Here is a list of the improvements, followed by a video demo:

  • Revamp of the snippets of the Notebook UI
  • Support for Spark 1.3, 1.4, 1.5
  • Impersonation with YARN
  • Support for R shell
  • Support for submitting jars or Python apps (see the batch-submission sketch after this list)
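
As a taste of the jar and Python app submission, here is a minimal sketch against Livy’s batch API in Python. The host, port, jar path and class name are placeholder assumptions (8998 is Livy’s default port):

    import requests

    LIVY_URL = "http://localhost:8998"  # assumed Livy host and port

    # Submit a jar as a batch job; for a Python app, point "file" at a .py instead.
    batch = requests.post(LIVY_URL + "/batches", json={
        "file": "hdfs:///user/hue/my-app.jar",  # hypothetical application jar on HDFS
        "className": "com.example.MyApp",       # hypothetical main class
    }).json()
    print(batch["id"], batch["state"])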

How to play with it?

See this post on how to use the Notebook UI, and this page on how to use the REST Spark Job Server, named Livy. The architecture of Livy was recently detailed in a presentation at Big Data Scala by the Bay. The next updates will be presented at the Spark meetup before Strata NYC and at Spark Summit in Amsterdam.
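
For a taste of what the Livy API looks like, here is a hedged sketch in Python of creating an interactive session and running a statement through it. The host and port are assumptions, and the flow follows Livy’s documented REST interface:

    import time
    import requests

    LIVY_URL = "http://localhost:8998"  # assumed Livy host and port (8998 is the default)

    # Create an interactive Scala session.
    session = requests.post(LIVY_URL + "/sessions", json={"kind": "spark"}).json()
    session_url = "%s/sessions/%s" % (LIVY_URL, session["id"])

    # Wait until the session is ready to accept statements.
    while requests.get(session_url).json()["state"] != "idle":
        time.sleep(1)

    # Submit a statement and poll until its result is available.
    stmt = requests.post(session_url + "/statements", json={"code": "1 + 1"}).json()
    stmt_url = "%s/statements/%s" % (session_url, stmt["id"])
    while requests.get(stmt_url).json()["state"] != "available":
        time.sleep(1)
    print(requests.get(stmt_url).json()["output"])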

Slicker snippets interface

The snippets now have a new code editor, autocomplete and syntax highlighting. Shortcut links to HDFS paths and Hive tables have been added.

[Screenshot: notebook]

R support

The SparkR shell is now available, and plots can be displayed inline.

[Screenshot: spark-r-snippet]

Support for closing sessions and specifying Spark properties

All the spark-submit, spark-shell, pyspark and sparkR properties of jobs and shells can be added to the sessions of a Notebook. This will, for example, let you add files and modules, and tweak the memory and number of executors.
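
These properties map onto the fields of Livy’s session-creation request. A minimal sketch in Python, with made-up values (the field names follow Livy’s documented REST API; host and port are assumptions):

    import requests

    LIVY_URL = "http://localhost:8998"  # assumed Livy host and port

    # Open a PySpark session with tuned resources and extra files shipped to it.
    session = requests.post(LIVY_URL + "/sessions", json={
        "kind": "pyspark",
        "executorMemory": "4g",               # memory per executor
        "numExecutors": 8,                    # number of executors
        "files": ["hdfs:///tmp/lookup.csv"],  # hypothetical file to distribute
        "pyFiles": ["hdfs:///tmp/utils.py"],  # hypothetical Python module
    }).json()
    print(session["id"], session["state"])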

[Screenshot: notebook-sessions]

So give this new Spark integration a try and feel free to send feedback on the hue-user list or @gethue!

22 Comments

  1. Jimmy Song 2 years ago

    Hi, I have watched the video; it’s cool! But where can I find how to configure Hue to support the Spark notebook? I found this page http://gethue.com/new-notebook-application-for-spark-sql-2/?lang=jp. I am surprised there is only a Japanese page and no English one; I remember there used to be one.
    The supported languages listed on the page are:
    # List of available types of snippets
    languages='[{"name": "Scala", "type": "scala"},{"name": "Python", "type": "python"},{"name": "Impala SQL", "type": "impala"},{"name": "Hive SQL", "type": "hive"},{"name": "Text", "type": "text"}]'
    But the videos show a lot of languages; how do I configure the other languages? I want to integrate the R shell.
    Can anyone answer me?

  2. Jimmy Song 2 years ago

    Thank you for your reply.
    BTW, attached my environment:
    Centos 7.1, Cloudera Manager 5.5.1, CDH5.5.1 installed with parcels.
    I have a question: how can I kill a Spark notebook session gracefully without killing the Spark job? As far as I know, every time I open a new notebook a session is created at the same time. Am I right?

    • Hue Team 2 years ago

      Right now, when you exit the Notebook by clicking on another tab, Hue tries to close the Spark sessions of the Notebook.

      If not, you can click on the little cogs icon in the menu and click close. Or, if using YARN, kill the application in the Job Browser.
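
      For reference, a session can also be closed directly through Livy’s REST API. A minimal sketch in Python, assuming Livy on its default port and a known session id:

      import requests

      LIVY_URL = "http://localhost:8998"  # assumed Livy host and port
      session_id = 0                      # hypothetical id of the session to close

      # DELETE /sessions/{id} shuts the session down and frees its resources.
      requests.delete("%s/sessions/%s" % (LIVY_URL, session_id))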

  3. Jimmy Song 2 years ago

    Thank you for your reply.
    It doesn’t work: when I close the notebook, the job is still running. The only way to kill the job is through the Job Browser.
    When I run Hive on Spark snippets in the notebook:
    set hive.execution.engine=spark;
    select * from web_logs where web_logs.city="Shanghai";
    I then get the following error:
    Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
    What’s the matter with Hue?

  4. Václav Surovec 2 years ago

    Hi to all!

    Do you know why I keep getting the error below when running a SparkWordCount jar app in the Hue Spark Notebook? My Hue version is 3.9.0 (CDH 5.5.1). Do you think you can please help me?

    Path: /user/hdfs/SparkWordCount-1.0-SNAPSHOT.jar
    Class: cz.tmobile.surovecv.sparkwordcount.JavaWordCount

    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/pkg/moip/data2_mzlp/mzpl/work/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/jars/avro-tools-1.7.6-cdh5.5.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/pkg/moip/data2_mzlp/mzpl/work/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    Warning: Local jar /user/hdfs/SparkWordCount-1.0-SNAPSHOT.jar does not exist, skipping.
    java.lang.ClassNotFoundException: cz.tmobile.surovecv.sparkwordcount.JavaWordCount
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:173)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:639)
    at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:162)
    at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:160)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:160)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

    • Hue Team 2 years ago

      Are you running Livy in the default local mode? (not YARN)
      What did you put in the jar input field?

      • Václav Surovec 1 year ago

        Hi!
        I am running in local mode and I did not put anything in the jar input field…

        Thank you for any help!

  5. poling 1 year ago

    Hi:
    I want to ask: can I run Spark and Hue on different nodes, and run a local livy-server on the Hue node to connect to the remote Spark?

    thanks

  6. Riya 1 year ago

    Hi,
    I have installed Hue 3.9. I am getting the error below when I try to use the R shell from the Spark Notebook:

    File "/usr/local/hue/apps/spark/src/spark/decorators.py", line 77, in decorator
    return func(*args, **kwargs)
    File "/usr/local/hue/apps/spark/src/spark/api.py", line 53, in create_session
    response['session'] = get_api(request.user, session).create_session(lang=session['type'], properties=properties)
    File "/usr/local/hue/apps/spark/src/spark/models.py", line 320, in create_session
    raise QueryError('\n'.join(status['log']))
    QueryError: at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:152)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.io.IOException: error=13, Permission denied
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    … 37 more

    Please help me here.

    Thanks,

  7. Ama 1 year ago

    How do I import a package (e.g. spark-csv) into the pyspark shell?

    • Mike 1 year ago

      Hi Ama,
      did you already manage to import the packages?
      I’m having exactly the same problem right now.
      I don’t know how to get the extra packages running in the pySpark notebook.

      Best,
      Mike

  8. lee 1 year ago

    Hi. I want to play with Spark in the notebook. When I watched the video, a Spark (Beta) menu existed,
    but when I run Livy with a YARN session and check the notebook menu, there is no Spark menu.

    Is it not supported, or can I just not find it? Please help!

    • Hue Team 1 year ago

      If the Spark app is not visible in the ‘Editor’ menu, you will need to unblacklist it from the hue.ini:

      [desktop]
      app_blacklist=

      Note: To override a value in Cloudera Manager, you need to enter each such mini section verbatim into the Hue Safety Valve: Hue Service → Configuration → Service-Wide → Advanced → Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini

  9. GiaBar 11 months ago

    I’ve tried to override the value in CM as you described but it doesn’t work yet: Spark app is not visible.
    Do you have any idea?

    • Hue Team (Author) 11 months ago

      Spark now shows up as a Notebook with a Spark or PySpark snippet. Do you also have the following?

      [notebook]
      show_notebooks=true

  10. Sarkar Sudakshina 3 months ago

    I was able to configure the notebook and can run Scala programs. Now I want to use some classes from external jars, but when I import the package, it cannot find the corresponding jars.
    I am not sure how to add a jar to the classpath from the Hue notebook web UI. I tried to find the solution in the blog and the Hue documentation, but there is nothing mentioned about it.

    Hue team, I need your suggestion and guidance on how to achieve this.

  11. Sarkar Sudakshina 3 months ago

    I have tried to set 'spark.driver.extraClassPath' and 'spark.executor.extraClassPath' in spark-defaults to point to a directory on HDFS:
    I used hdfs:///tmp/spark/* for both configurations in spark-defaults.xml.
    Then I added the corresponding jars to that directory on HDFS; for example, I put aaa.jar at hdfs:///tmp/spark/aaa.jar.

    Then I wrote in the Hue notebook (Scala interpreter):
    from aaa import org.barboliled._

    and executed it. It throws an error.
