Beta of new Notebook Application for Spark & SQL

Hue 3.8 brings a new way to directly submit Spark jobs from a Web UI.

Last year we released Spark Igniter to enable developers to submit Spark jobs through a Web Interface. While this approach worked, the UX left a lot to be desired. Programs had to implement an interface, be compiled beforehand and YARN support was lacking. We also wanted to add support for Python and Scala, focusing on delivering an interactive and iterative programming experience similar to using a REPL.

This is why we started developing a new Spark REST Job Server that could provide these missing capabilities. On top of it, we revamped the UI to provide a Python Notebook-like experience.

Note that this application is brand new and labeled as 'Beta'. This means we recommend trying it out and contributing, but its usage is not officially supported yet, as the UX is going to evolve a lot!

This post describes the Web Application part. We are using Spark 1.3 and Hue master branch.

Based on the new Spark REST Job Server, the application supports:

  • Scala
  • Python
  • Java
  • SQL
  • YARN

 

If the Spark app is not visible in the 'Editor' menu, you will need to remove it from the blacklist in the hue.ini:

[desktop]
app_blacklist=

Note: To override a value in Cloudera Manager, you need to enter verbatim each mini section from below into the Hue Safety Valve: Hue Service → Configuration → Service-Wide → Advanced → Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
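Conversely, apps that are not needed can be blacklisted to lighten a deployment. For example (a sketch; which apps to list depends on your setup):

```ini
[desktop]
# Hide apps you do not use; keep this list empty to show everything, including Spark.
app_blacklist=hbase,zookeeper,security,impala,oozie,pig,rdbms,sqoop
```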

 

On the same machine as Hue, go to the Hue home directory.

If using the packaged install:

cd /usr/lib/hue

 

Recommended

Use the Livy Spark Job Server from the Hue master repository instead of the CDH one (it is currently much more advanced): see 'build & start the latest Livy'

 

Otherwise, if using Cloudera Manager:

cd /opt/cloudera/parcels/CDH/lib/
HUE_CONF_DIR=/var/run/cloudera-scm-agent/process/-hue-HUE_SERVER-#
echo $HUE_CONF_DIR
export HUE_CONF_DIR
Where # is replaced by the latest process number, e.g. hue-HUE_SERVER-65
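The latest process directory can also be picked up automatically; a minimal sketch (assuming the standard Cloudera Manager agent layout, with `latest_hue_conf_dir` as a hypothetical helper name):

```shell
# Pick the most recently modified Hue server process directory
# (adjust the glob if your agent layout differs).
latest_hue_conf_dir() {
  ls -dt "$1"/*hue-HUE_SERVER* 2>/dev/null | head -n 1
}

HUE_CONF_DIR="$(latest_hue_conf_dir /var/run/cloudera-scm-agent/process)"
export HUE_CONF_DIR
echo "$HUE_CONF_DIR"
```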

Then cd to the Hue directory and start the Spark Job Server from the Hue home:

./build/env/bin/hue livy_server
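Only one Livy server should run at a time. If a previous instance (or the shells it spawned) is still around, kill it before starting again; a minimal helper (a sketch; `restart_livy_server` is a hypothetical name, and it assumes the server is launched from the Hue home as above):

```shell
# There is no built-in restart command; this sketch stops any Livy server
# (and child Scala/Python shells) left over from a previous run, then
# starts a fresh one. Run it from the Hue home directory.
restart_livy_server() {
  pkill -f 'hue livy_server' 2>/dev/null
  sleep 2
  ./build/env/bin/hue livy_server
}
```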

 

You can customize the setup by modifying these properties in the hue.ini:

[spark]
# URL of the REST Spark Job Server.
server_url=http://localhost:8090/

# List of available types of snippets
languages='[{"name": "Scala", "type": "scala"},{"name": "Python", "type": "python"},{"name": "Impala SQL", "type": "impala"},{"name": "Hive SQL", "type": "hive"},{"name": "Text", "type": "text"}]'

# Uncomment to use the YARN mode
## livy_server_session_kind=yarn
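Before restarting Hue, it can be worth checking that the `languages` value is well-formed JSON, for example by piping it through Python's built-in json.tool (a sketch):

```shell
# The 'languages' value must be valid JSON or the snippet list will not load;
# json.tool exits non-zero (and prints nothing useful) on malformed input.
languages='[{"name": "Scala", "type": "scala"},{"name": "Python", "type": "python"},{"name": "Impala SQL", "type": "impala"},{"name": "Hive SQL", "type": "hive"},{"name": "Text", "type": "text"}]'
echo "$languages" | python3 -m json.tool
```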

 

Next

This Beta version brings a good set of features, and a lot more is on the way. In the long term, we expect all the query editors (e.g. Pig, DBquery, Phoenix…) to use this common interface. Later, individual snippets could be dragged & dropped to build visual dashboards, and notebooks could be embedded like in Dropbox or Google Docs.

We are also interested in getting feedback on the new Spark REST Job Server and seeing what the community thinks about it (contributions are welcome ;).

As usual, feel free to comment on the hue-user list or @gethue!

115 Comments

  1. Andrew Mo 2 years ago

    #nice

  2. iain wright 2 years ago

    I have been eagerly waiting for this release! Very excited for the improved yarn support and new spark job server. going to roll this tonight!

    • Hue Team 2 years ago

      🙂

      And we have some upcoming updates and blog post series 😉

  3. Louis 2 years ago

    I’ve just made the jump from HDP 2.2 and this is a huge improvement. Are there plans to incorporate Spark SQL into the notebook?

    • Hue Team 2 years ago

      If you boot the Spark SQL Server, the Hive or Impala configs can point to it and it will work. It will probably become a bit cleaner at some point.

  4. Minh Ha 2 years ago

    Hi, I had installed HUE 3.8 on CDH 5.4

    PySpark works well but I cannot use HiveContext on Spark.

    from pyspark.sql import HiveContext
    sqlCtx = HiveContext(sc)
    results = sqlCtx.sql("SELECT * FROM sample_07").collect()
    results

    I put the traceback logs here: http://pastebin.com/V5CxHen4

  5. dale 2 years ago

    cool! I am using HDP, do you know where the respective files are stored to export the variable?

    • Hue Team 2 years ago

      We don’t know 🙂 If it is Spark 1.3, using Hue from the github master branch should work though!

  6. Vladimir Metodiev 2 years ago

    Hi, I have tried to download the example Spark notebooks. I have clicked download but nothing has shown up in my saved notebooks.

    • Hue Team 2 years ago

      You mean install from Step 2: Examples tab? Are you running the latest Hue? Did you have any error while installing the samples?

  7. Vladimir Metodiev 2 years ago

    HI,
    Version is 3.8.1, and there were no errors on the Install.

    • Hue Team 2 years ago

      There is no ‘Download’, do you mean downloading the result of a snippet?

  8. Vladimir Metodiev 2 years ago
    • Hue Team 2 years ago

      Ha, so just after installing, do you see this on the /logs page?
      {code}
      [22/Jun/2015 08:29:21 -0700] models ERROR error sharing sample user documents
      Traceback (most recent call last):
      File "/home/romain/projects/hue/desktop/core/src/desktop/models.py", line 364, in sync
      tag = DocumentTag.objects.get_example_tag(user=job.owner)
      UnboundLocalError: local variable 'job' referenced before assignment
      {code}

      If you logged in as a hue admin, do you see it?

  9. Yixiao Lin 2 years ago

    Hi, I had installed HUE 3.8 on CDH 5.4.2, but I can’t find the spark notebook examples in the example install page. Is it part of 3.8? How can I get it if it is not part of 3.8?

    • Hue Team 2 years ago

      Did you un-blacklist it (cf. above)? It is in CDH 5.4 but hidden, as the Spark Server is not supported by CM yet.

      FYI the Spark Notebook is much better on master and evolving quickly there.

      • Yixiao Lin 2 years ago

        I switched to the master branch now but the spark notebook example is not showing up.
        I did un-blacklist it. I can open the spark app. I can see all the other examples in step 2: Hive/Impala/Pig.
        I got the examples by importing from json; the sample is at apps/spark/src/spark/fixtures/initial_spark_examples.json

        • Hue Team 2 years ago

          I just tried from master with a fresh DB and it worked. What do you see on the /logs page after you clicked on install the Spark example?

  10. Wyatt 2 years ago

    I am trying to use the Cassandra Spark connector with your notebook by passing all of the jars in the spark-submit command that livy_server calls. However, when I try to import com.datastax.spark.connector._ I get `error: object datastax is not a member of package com import com.datastax.spark.connector._^`.
    I can get this to work in the spark-shell and by submitting to the master. I have no issue running file-based RDDs from the notebook.

    • Hue Team 2 years ago

      We do not yet expose the command parameters to add jars (or change the number of executors, memory…) for spark-shell, pyspark or spark-submit: https://issues.cloudera.org/browse/HUE-2727

      This is almost there though! (next week)

      • Riya 2 years ago

        I am just curious to know whether this issue is fixed in version 3.9?
        I tried to add an external jar using
        spark_config = (SparkConf().set("spark.jars", jar_path))
        sc = SparkContext(conf=spark_config)

        But I got java.lang.ClassNotFoundException when I was trying to use the jar which I set using "spark.jars"

        Thanks,

  11. Cleo 2 years ago

    I'm using CDH 5.4. I installed Spark on Hue, but when I want to execute a Scala or Python program there are problems: firstly, it takes too much time to create a session, and once it is created, even if I launch the command line, nothing happens, and I have errors in the livy_server. This error also appears in the Spark web UI: java.lang.IllegalStateException: Session is in state Starting() (error 500)
    Another question is about launching a Spark program written in Scala (I built a jar and stored the input file in HDFS): how can I execute it?

  12. Bala 2 years ago

    Hi,
    I am using CDH 5.4.2 with Hue 3.7 and Spark installed. I run the livy server under hue. When I run any code under Scala in the notebook, I get this error.

    ERROR com.cloudera.hue.livy.server.WebApp$ – internal error
    java.lang.IllegalStateException: Session is in state Starting()

    I have edited hue.ini to include
    livy_server_session_kind=yarn

    as per this https://groups.google.com/a/cloudera.org/forum/#!topic/hue-user/1Fa0qDYDaxU

    But I still get the same error. I have tested my spark installation works through the interactive shell. Appreciate your help on this! thank you!

    • Hue Team 2 years ago

      Did it work first in the original "livy_server_session_kind=process" mode?
      Are you using the Hue from CDH or the latest from github? (we recommend using the one from github; it has improved a lot since)

      • Bala 2 years ago

        Hey hue team, thanks for your response. hue.ini did not have livy_server_session_kind earlier at all. I did not see a spark section in the hue.ini file.
        It did not work even then. I added them. And yes, I am using the hue that came with CDH 5.4.2. It will be great if I can make it work with what I have right now in CDH 5.4.2 🙁 . Otherwise, I have to talk to many other people to agree upon upgrading hue to the latest from github.

        • Hue Team 2 years ago

          It is not supported in CDH yet; the recommended way is to use it from github and send feedback, so that we can improve it!

  13. dale 2 years ago

    My Notebook application was working fine and then it suddenly stopped working.

    I killed the livy_server process and started it again using ./hue livy_server. When I go to open a Python notebook in the WebUI I see the following log (console output for livy_server):

    https://www.dropbox.com/s/pxje6amo2wfltwh/livy_server_log.txt?dl=0

    Are there other processes that need to be killed?
    Thanks.

    • Hue Team 2 years ago

      It looks like the process is still running; try 'ps -ef | grep livy' and kill the Livy server process if it's still there

      • dale 2 years ago

        Ok sure, so there are a ton of processes running still when I do ‘ps -ef | grep livy’ : https://www.dropbox.com/s/s00cuy4popko1ob/grep_livy?dl=0

        Would all of these need to be killed? How come it is creating so many processes?

        Thanks.

        • Hue Team 2 years ago

          Probably because you tried to start it many times and the processes got stuck?

          • dale 2 years ago

            Yep, killed them all and it worked again.

          • Hue Team 2 years ago

            For your information, each of those is created when you open a Python or Scala shell. We don't close them yet when you close the workflow, cf. https://github.com/cloudera/hue/issues/198

            Restarting the Livy Server itself will stop them all too.

          • dale 2 years ago

            Is there a command to restart the livy_server directly? Rather than killing all the processes directly and then issuing ‘./hue livy_server’ is there a command LIKE ‘./hue livy_server restart’ ?

          • Hue Team 2 years ago

            There is no restart command (as it is more of a packaging thing), but killing the livy process manually or with CTRL+C will stop everything (and this could be quickly automated in bash)

  14. Cleo 2 years ago

    Hi Hue team,
    Considering that Spark notebook isn’t supported by CDH5.4, how I can use it from github in my CDH ?
    thanks 🙂

  15. Cleo 2 years ago

    Hi,
    I have CentOS 6.6; is it the same guide as Ubuntu?

  16. Cleo 2 years ago

    Hi,
    I have installed Hue 3.8 from a guide in the comment above.
    when I attempt to choose a language for a snippet (I'm interested in Scala), there is an error:
    No usable value for lang Did not find value which can be converted into java.lang.String (error 400)
    Any suggestions please 🙂

  17. Cleo 2 years ago

    Hi Hue users,
    I still get the same error when executing scala or python programs in notebooks :/
    The same problem was already posted here http://stackoverflow.com/questions/30628143/spark-notebook-not-working-on-hue-for-emr but without any answer .
    Appreciate your help on this! thank you!

  18. Chliao 2 years ago

    Now I'm trying the Beta of the new Notebook Application for Spark & SQL.
    When I try to install Livy,
    I can't find the $HUE_HOME/apps/spark/java path.
    My path is like this:
    [[email protected] spark]# pwd
    /opt/cloudera/parcels/CDH-5.4.1-1.cdh5.4.1.p0.6/lib/hue/apps/spark
    [[email protected] spark]# ll
    total 32
    -rw-r--r-- 1 root root 62 May 8 14:31 babel.cfg
    drwxr-xr-x 2 root root 4096 May 8 14:31 ext-eggs
    -rw-r--r-- 1 root root 932 May 8 14:31 hueversion.py
    drwxr-xr-x 2 root root 4096 May 8 14:39 java-lib
    -rwxr-xr-x 1 root root 2680 May 8 14:31 livy-client.py
    -rw-r--r-- 1 root root 1745 May 8 14:31 Makefile
    -rw-r--r-- 1 root root 1224 May 8 14:31 setup.py
    drwxr-xr-x 4 root root 4096 May 8 14:31 src
    drwxr-xr-x 4 root root 4096 May 8 14:31 src
    [[email protected] spark]#

    Which software do I need to install?
    thank you
    Env. = CDH 5.4.1 (includes Spark 1.3) + apache-maven-3.3.3 + java-1.7.0-openjdk

    • Hue Team 2 years ago

      Does the ‘$HUE_HOME/build/env/bin/hue livy_server’ work for you?

      FYI this is also not supported in CDH yet

  19. Chliao 2 years ago

    Results of the ‘$HUE_HOME/build/env/bin/hue livy_server’ like below
    [[email protected] java]# cd /opt/cloudera/parcels/CDH-5.4.1-1.cdh5.4.1.p0.6/lib/hue
    [[email protected] hue]# ./build/env/bin/hue livy_server
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.1-1.cdh5.4.1.p0.6/jars/livy-assembly-3.7.0-cdh5.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.4.1-1.cdh5.4.1.p0.6/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    01:53:26.139 [main] INFO com.cloudera.hue.jetty.server.Server – jetty-8.y.z-SNAPSHOT
    01:53:26.185 [main] INFO o.scalatra.servlet.ScalatraListener – Initializing life cycle class: ScalatraBootstrap
    01:53:26.186 [main] INFO c.c.h.livy.server.ScalatraBootstrap – Using process sessions
    01:53:26,297 |-INFO in null – Will use configuration resource [/logback-access.xml]
    01:53:26,298 |-INFO in [email protected] – URL [jar:file:/opt/cloudera/parcels/CDH-5.4.1-1.cdh5.4.1.p0.6/jars/livy-assembly-3.7.0-cdh5.4.1.jar!/logback-access.xml] is not of type file
    01:53:26,304 |-INFO in ch.qos.logback.access.joran.action.ConfigurationAction – debug attribute not set
    01:53:26,304 |-INFO in ch.qos.logback.core.joran.action.StatusListenerAction – Added status listener of type [ch.qos.logback.core.status.OnConsoleStatusListener]
    01:53:26,305 |-INFO in ch.qos.logback.core.joran.action.AppenderAction – About to instantiate appender of type [ch.qos.logback.core.ConsoleAppender]
    01:53:26,305 |-INFO in ch.qos.logback.core.joran.action.AppenderAction – Naming appender as [STDOUT]
    01:53:26,305 |-INFO in ch.qos.logback.core.joran.action.NestedComplexPropertyIA – Assuming default type [ch.qos.logback.access.PatternLayoutEncoder] for [encoder] property
    01:53:26,319 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction – Attaching appender named [STDOUT] to null
    01:53:26,319 |-INFO in ch.qos.logback.access.joran.action.ConfigurationAction – End of configuration.
    01:53:26,319 |-INFO in [email protected] – Registering current configuration as safe fallback point
    01:53:26,297 |-INFO in LogbackRequestLog – Will use configuration resource [/logback-access.xml]
    01:53:26,298 |-INFO in [email protected] – URL [jar:file:/opt/cloudera/parcels/CDH-5.4.1-1.cdh5.4.1.p0.6/jars/livy-assembly-3.7.0-cdh5.4.1.jar!/logback-access.xml] is not of type file
    01:53:26,304 |-INFO in ch.qos.logback.access.joran.action.ConfigurationAction – debug attribute not set
    01:53:26,304 |-INFO in ch.qos.logback.core.joran.action.StatusListenerAction – Added status listener of type [ch.qos.logback.core.status.OnConsoleStatusListener]
    01:53:26,305 |-INFO in ch.qos.logback.core.joran.action.AppenderAction – About to instantiate appender of type [ch.qos.logback.core.ConsoleAppender]
    01:53:26,305 |-INFO in ch.qos.logback.core.joran.action.AppenderAction – Naming appender as [STDOUT]
    01:53:26,305 |-INFO in ch.qos.logback.core.joran.action.NestedComplexPropertyIA – Assuming default type [ch.qos.logback.access.PatternLayoutEncoder] for [encoder] property
    01:53:26,319 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction – Attaching appender named [STDOUT] to null
    01:53:26,319 |-INFO in ch.qos.logback.access.joran.action.ConfigurationAction – End of configuration.
    01:53:26,319 |-INFO in [email protected] – Registering current configuration as safe fallback point

    01:53:26.342 [main] INFO c.c.h.jetty.server.AbstractConnector – Started [email protected]:8998
    01:53:26.344 [main] INFO com.cloudera.hue.livy.WebServer – Starting server on 8998
    01:54:10.470 [ForkJoinPool-1-worker-1] INFO c.c.hue.livy.server.SessionManager – created session 6c92d431-3a43-45ff-901b-8ed3946bd8f6
    127.0.0.1 - - - 06/Aug/2015:01:54:10 -0700 "POST /sessions HTTP/1.1" 201 64
    127.0.0.1 - - - 06/Aug/2015:01:54:10 -0700 "GET /sessions/6c92d431-3a43-45ff-901b-8ed3946bd8f6 HTTP/1.1" 200 64
    127.0.0.1 - - - 06/Aug/2015:01:54:10 -0700 "GET /sessions/6c92d431-3a43-45ff-901b-8ed3946bd8f6 HTTP/1.1" 200 64
    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
    at org.apache.spark.deploy.SparkSubmitArguments.parse$1(SparkSubmitArguments.scala:432)
    at org.apache.spark.deploy.SparkSubmitArguments.parseOpts(SparkSubmitArguments.scala:288)
    at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:87)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:105)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    … 5 more
    127.0.0.1 - - - 06/Aug/2015:01:54:11 -0700 "GET /sessions/6c92d431-3a43-45ff-901b-8ed3946bd8f6 HTTP/1.1" 200 64
    127.0.0.1 - - - 06/Aug/2015:01:54:12 -0700 "GET /sessions/6c92d431-3a43-45ff-901b-8ed3946bd8f6 HTTP/1.1" 200 64
    127.0.0.1 - - - 06/Aug/2015:01:54:13 -0700 "GET /sessions/6c92d431-3a43-45ff-901b-8ed3946bd8f6 HTTP/1.1" 200 64

    • Hue Team 2 years ago

      If you do ‘spark-shell’ or ‘pyspark’ on the machine, does it work?

    • Tim 2 years ago

      Hi there,
      I have exactly the same issue, have you been able to sort it out? If yes, what were the steps? The rest of the apps work well apart from Spark… We are using a Hadoop 2.2.6 mini cluster with Ambari 2.1 and Hue 3.8.1, Ubuntu 12.04… Any ideas how to fix that? Cheers

      • Hue Team 2 years ago

        Which Spark version are you using? Does the regular Spark shell in the command line work?

  20. Dale 2 years ago

    I am using the PySpark notebook application within 3.8 on HDP.

    When I run the notebook my logs are saying that it is being executed as user “root”. Do you know why this happens as opposed to the user that I am logged into Hue as: “User1”?

    This is an issue because I wish to restrict certain users accessing certain files/tables.

  21. Dale 2 years ago

    As a separate issue, I am persistently seeing the following error in the livy_server log:

    15:48:43.265 [qtp281893917-14] ERROR com.cloudera.hue.livy.server.WebApp$ – internal error
    java.util.concurrent.ExecutionException: java.net.ConnectException: Connection refused: /**.**.*.***:46184 to http://**.**.*.***:46184/execute.
    Do you know what this means? Thanks.

    • Hue Team 2 years ago

      Dale: That error suggests that the interactive session worker has died. I’ve recently improved error handling in this case, so if you update to the latest git version of Hue and livy you should hopefully see a better error message when that occurs.

  22. david 2 years ago

    Are there any logs for the Spark livy_server? I would like to see the output of what is shown when I run "./hue livy_server".
    Cheers

  23. zhouhao 2 years ago

    hello Hue Team. I have a problem when I click the + in my browser to create a Scala snippet.
    The logs of the terminal where I started the livy_server are:
    15/09/15 02:09:41 INFO ServerConnector: Started [email protected]{HTTP/1.1}{0.0.0.0:43452}
    15/09/15 02:09:41 INFO Server: Started @3041ms
    15/09/15 02:09:41 INFO WebServer: Starting server on 43452
    Starting livy-repl on http://10.1.7.110:43452
    15/09/15 02:09:41 INFO HttpServer: Starting HTTP Server
    15/09/15 02:09:42 INFO Utils: Successfully started service ‘HTTP class server’ on port 41096.
    15/09/15 02:09:47 INFO SparkContext: Running Spark version 1.4.1
    15/09/15 02:09:47 ERROR SparkContext: Error initializing SparkContext.
    java.lang.IllegalArgumentException: For input string: "$SPARK_HOME/logs"
    at scala.collection.immutable.StringLike$class.parseBoolean(StringLike.scala:238)
    at scala.collection.immutable.StringLike$class.toBoolean(StringLike.scala:226)
    at scala.collection.immutable.StringOps.toBoolean(StringOps.scala:31)
    at org.apache.spark.SparkConf$$anonfun$getBoolean$2.apply(SparkConf.scala:309)
    at org.apache.spark.SparkConf$$anonfun$getBoolean$2.apply(SparkConf.scala:309)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.SparkConf.getBoolean(SparkConf.scala:309)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:382)
    at com.cloudera.hue.livy.repl.scala.SparkInterpreter.start(SparkInterpreter.scala:69)
    at com.cloudera.hue.livy.repl.Session$$anonfun$2.apply$mcV$sp(Session.scala:57)
    at com.cloudera.hue.livy.repl.Session$$anonfun$2.apply(Session.scala:55)
    at com.cloudera.hue.livy.repl.Session$$anonfun$2.apply(Session.scala:55)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
    15/09/15 02:09:47 INFO SparkContext: Successfully stopped SparkContext

    thanks ~

    • Hue Team 2 years ago

      Hi, which version of Hue are you running? Compiled yourself from master / a release? Thanks!

  24. zhouhao 2 years ago

    Hi Hue Team! Thanks for your reply. I cloned from the master branch of GitHub a few days ago. The version is 3.9.0. And I started it on one machine of the spark cluster, but not the master one. Thanks~ looking forward to your reply

    • zhouhao 2 years ago

      By the way, I didn't clone the livy_server from github. I just cloned hue from the master branch and built it. I started the livy_server with the command: ./hue livy_server.
      thanks~

      • Hue Team 2 years ago

        Is your Spark shell working normally? ($SPARK_HOME/logs might be related to Spark itself)

  25. Serena 2 years ago

    I am using CDH 5.4.2 with Hue 3.9. After I run ./build/env/bin/hue livy_server, the error message is:

    Exception thrown when initializing server: java.lang.IllegalArgumentException: requirement failed: pyspark.zip not found in Spark environment; cannot run pyspark application in YARN mode

    I do have a problem running Spark jobs in YARN mode, but I can run in local mode.
    I set livy_server_session_kind=process in the hue.ini file and ran build/env/bin/supervisor again; it still gives me the same error. It didn't pick up the local mode.

    btw, I am using Spark 1.3.

    • Serena 2 years ago

      Is it possible that Hue 3.9 has to use Spark 1.5?

      • Hue Team 2 years ago

        Not really; Spark 1.5 will work with current Hue master, upcoming Hue 3.10 / CDH 5.7.
        Spark and the Notebook app are moving too fast to be backward compatible currently.

        • Serena 2 years ago

          I installed the new Spark 1.5.1 and ran Hue 3.9 with it, and it works.

  26. Nilesh Saratkar 2 years ago

    Does the Hue Notebook support an R kernel?

    • Hue Team 2 years ago

      Yes, in local mode

  27. Sarthak Saxena 2 years ago

    Hi,
    When I build the livy server using Maven I get the following issue:
    mvn -DskipTests clean package

    [ERROR] Plugin net.alchim31.maven:scala-maven-plugin:3.2.2 or one of its dependencies could not be resolved: Failed to read artifact descriptor for net.alchim31.maven:scala-maven-plugin:jar:3.2.2: Could not transfer artifact net.alchim31.maven:scala-maven-plugin:pom:3.2.2 from/to central (https://repo.maven.apache.org/maven2): java.security.ProviderException: java.security.KeyException -> [Help 1]

    Kindly if you could help.

    • Obaid 2 years ago

      Hi Sarthak Saxena,

      I faced the same issue and resolved it just by installing nss:
      yum install nss

  28. Sarthak Saxena 2 years ago

    Hi,

    When I start a new notebook and run it I get Connection refused error messages.

    Could someone please help.

  29. ema 2 years ago

    hello,

    I've installed CDH 5.5.1 on a small test cluster with three CentOS 7 nodes.

    I added the following services because I'm not sure which are necessary for the spark-notebook in hue… [HDFS, HBase, Hive, Impala, Oozie, Spark, YARN (MR2), ZooKeeper].
    I cloned the master hue from git and ran "make apps".
    It compiled, but not without errors.
    I have managed to get the configuration errors @hue-node:8888/about down to just one, "SQLITE_NOT_FOR_PRODUCTION_USE".
    When I open my test Scala notebook, which has just "1 + 1 + 1" in it, I get "Gateway timeout (error 504)", and when I try to run it, I get errors referring to SLF4J ("Class path contains multiple SLF4J bindings") and "user root is not allowed to impersonate fred". Please

    FIX MY CLUSTER!..no, just kidding, any help is appreciated, really!

    • Hue Team 2 years ago

      SQLITE_NOT_FOR_PRODUCTION_USE is just a warning: Hue prefers to be used with a MySQL or PostgreSQL database.

      About impersonation, you need something like http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cdh_ig_cdh_hue_configure.html?scroll=topic_15_4#topic_15_4_1_unique_1

      <property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.proxyuser.root.groups</name>
        <value>*</value>
      </property>
      
      • ema 2 years ago

        hi there HueTeam,

        i would like to stop all services that are not necessary to run the spark notebook, which fails due to memory issues. could you tell me which ones should be kept up and running & which i can shut down?

        thanks for the prompt reply!

        • Hue Team 2 years ago

          This will depend on what mode you’re running Spark in (local or YARN) and whether you need to access Hive data and/or data in HDFS. Assuming that you’re running Spark in YARN mode, and don’t need Sentry security enabled, you can at least blacklist: hbase, zookeeper, security, impala, oozie, pig, rdbms, and sqoop.

          • ema 2 years ago

            perfect, thank you

  30. ema 2 years ago

    i’m having an issue using the spark notebook as follows,

    when i connect from hue, livy says the following

    changing view acls to root, fred
    changing modify acls to root, fred
    …(where fred is my hue login) then comes;
    ERROR SparkContext: Error initializing SparkContext.

    this is similar to what i get when using the spark-shell as follows;
    spark-shell
    but goes away when i do it as follows;
    su hue -s /bin/bash -c spark-shell

    I have checked both the configuration in cloudera manager and the /etc/hue/conf/hue.ini file & they both have hue as all users except for hdfs

    any clue?

  31. prashant 2 years ago

    >> Did it work first in the original ‘livy_server_session_kind=process’ mode?

    I’m trying to run statements as part of a yarn session. Session starts and statements are not accepted as the server says session is Starting. And it soon finishes to successful completion. I checked the session timeout is set to an hour, and I used session kind to be yarn. However using session kind to be process works absolutely fine. Anything I might be missing?

  32. Obaid 2 years ago

    Hi,
    I am running livy + hue Notebook for Spark as shown in this document (CDH 5.5.2)

    My problem is that when I try to run the pyspark sample (just 1+1), it fails with the below error:

    java.net.ConnectException: Call From dvhdmgt1.example.com/192.168.56.201 to dvhdnn1.example.com:8032 failed on connection exception: java.net.ConnectException: Connection refused

    Note: My yarn is running a HA and dvhdnn1.example.com is my standby resource manager.
    I got the above error when I put "livy_server_session_kind=process". When I use "livy_server_session_kind=yarn" I get "gateway timeout"; I feel it is similar: it always tries to use the standby resource manager instead of the active one.

    Anybody has solution on this issue ?

    Thanks

  33. Chris Horvath 2 years ago

    Is there potential to leverage the Spark Thrift Server? Similar to HiveSQL or ImpalaSQL.

    I don't mean the "Hive on Spark" model, but accessing a separate JDBC Thrift Server

  34. Riya 2 years ago

    Hi,

    I am running Hue 3.9. On the first screen of Hue, where it checks the configuration, everything seems to be fine: "All OK. Configuration check passed". But when I select the R shell or PySpark from the Notebook options I am getting the below error, which prevents me from trying out anything in the Spark Notebook of Hue.

    at java.lang.reflect.Method.invoke(Method.java:497) at
    org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665) at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:154) at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:152) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) … 4 more 16/03/07 05:48:48 INFO util.Utils: Shutdown hook called 16/03/07 05:48:48 INFO util.Utils: Deleting directory /tmp/spark-5f1abf76-3f05-4b22-91d3-021872ac133d

    Please help me with this.
    Thanks,

  35. Riya 2 years ago

    Thanks Hue team. I am getting an error when I try the curl statements. Does it mean the Livy server is not started properly?
    How can I fix this?

    curl localhost:8998/sessions/0 | python -m json.tool

    % Total % Received % Xferd Average Speed Time Time Time Current
    Dload Upload Total Spent Left Speed
    112 784 0 784 0 0 105k 0 --:--:-- --:--:-- --:--:-- 127k
    {
    "id": 0,
    "kind": "pyspark",
    "log": [
    "\tat java.lang.reflect.Method.invoke(Method.java:497)",
    "\tat org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)",
    "\tat org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:154)",
    "\tat org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:152)",
    "\tat java.security.AccessController.doPrivileged(Native Method)",
    "\tat javax.security.auth.Subject.doAs(Subject.java:422)",
    "\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)",
    "\t... 4 more",
    "16/03/09 01:39:39 INFO util.Utils: Shutdown hook called",
    "16/03/09 01:39:39 INFO util.Utils: Deleting directory /tmp/spark-fd297776-73be-479a-a7c8-72bde849f9f6"
    ],
    "proxyUser": "hue",
    "state": "error"
    }

  36. Markovich 2 years ago

    Hi Hue Team,

    I have some questions about the Spark Notebook:
    1) How can I change the Python version used by PySpark?
    I've installed Anaconda for CDH and now I'd like to point PySpark at Anaconda's Python.
    2) How can I enable code highlighting and autocomplete in the Spark Notebook? (Currently only local autocomplete works.)

    The option is great. When is it going to be fully integrated into CDH?

    Regards,
    Markovich

    • Hue Team 2 years ago

      Hi Markovich, regarding 1) there should be some options for PySpark to pick things up from your env, i.e. PYSPARK_DRIVER_PYTHON and PYSPARK_PYTHON.
      We didn't quite get 2): what would you like the autocomplete on? Can you give an example? Thanks!
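      As a sketch, assuming Anaconda lives under /opt/anaconda (adjust the path for your install), exporting those variables before starting Livy might look like this:

```shell
# Hypothetical Anaconda location -- adjust for your install.
export PYSPARK_PYTHON=/opt/anaconda/bin/python
export PYSPARK_DRIVER_PYTHON=/opt/anaconda/bin/python

# Sanity check: confirm the variables are set before starting Livy.
echo "PYSPARK_PYTHON=$PYSPARK_PYTHON"
echo "PYSPARK_DRIVER_PYTHON=$PYSPARK_DRIVER_PYTHON"
```

      Both executors and the driver will then use the Anaconda interpreter for PySpark sessions.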

  37. Simon 2 years ago

    Dear Hue Team,

    when trying to run a Scala command, I get:

    The Spark session could not be created in the cluster: WARNING: User-defined SPARK_HOME (/opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/spark) overrides detected (/opt/cloudera/parcels/CDH/lib/spark). WARNING: Running spark-class from user-defined location. 16/03/20 12:19:25 INFO RMProxy: Connecting to ResourceManager at cluster1/192.168.24.201:8032 16/03/20 12:19:25 WARN UserGroupInformation: PriviledgedActionException as:admin (auth:PROXY) via root (auth:SIMPLE) cause:org.apache.hadoop.security.authorize.AuthorizationException: User: root is not allowed to impersonate admin ERROR: org.apache.hadoop.security.authorize.AuthorizationException: User: root is not allowed to impersonate admin

    I am running the livy-server with the following command:

    env LIVY_SERVER_JAVA_OPTS="-Dlivy.server.session.factory=yarn" CLASSPATH=`hadoop classpath` $LIVY_HOME/bin/livy-server

    Do you have any ideas on how to solve this issue?

    It would be nice if you could help me with that.
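    The "User: root is not allowed to impersonate admin" part of the error means HDFS is refusing to let the user running Livy (root here) impersonate the Hue user. A minimal sketch of the usual fix, assuming Livy indeed runs as root (wildcards shown for brevity; restrict hosts/groups in production), is to add proxyuser entries to core-site.xml and restart HDFS/YARN:

```xml
<!-- core-site.xml: allow the user running Livy (root in this case)
     to impersonate other users. Narrow hosts/groups in production. -->
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>
```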

  38. Krish 1 year ago

    How do I import a Jupyter Notebook / *.ipynb file?

    • Hue Team 1 year ago

      Drag & drop was implemented some time ago and should still work.

      • Krish 1 year ago

        I am using Hue 3.9; the import option only allows a JSON file and I am not seeing any drag-and-drop option.

        • Hue Team 1 year ago

          You would need to open a Notebook and drag & drop the ipython file inside?

  39. Ashwin 1 year ago

    I'm using CDH 5.2. I've added "[desktop] app_blacklist=" to the Service-Wide configuration tab of Hue in Cloudera Manager, but Spark still does not show up in Hue.

    I also tried editing the hue.ini in /etc/hue/conf, but nothing happens either; the desktop configuration tab in Hue still shows this:
    app_blacklist

    • Hue Team 1 year ago

      Could you add it to the section with the name 'Safety Valve for hue.ini' instead?

      • Ashwin 1 year ago

        I'm using CDH 5.4. I installed Spark on Hue, but when I want to execute a Scala or Python program there are problems: first, it takes too much time to create a session, and once it is created, even if I launch a simple program like println(1 + 1), nothing happens, and I get errors in the livy_server. This error also appears in the Spark web UI:

        java.lang.IllegalStateException: Session is in state starting (error 500)
        Any ideas 🙂

        • Hue Team 1 year ago

          It means the Spark on YARN session is taking too long to boot. Did you try to create a shell just with the API? If it works, you could then bump the timeout in Hue.
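          A minimal sketch of that API check, assuming Livy listens on localhost:8998 (adjust host/port for your cluster): the block below only prints the request it would send; the commented curl lines show how to actually submit and poll it against a running server.

```shell
# Hypothetical Livy endpoint -- adjust host/port for your cluster.
LIVY_URL="http://localhost:8998"

# Request body for POST /sessions: ask Livy for a Scala Spark shell.
BODY='{"kind": "spark"}'
echo "POST $LIVY_URL/sessions body=$BODY"

# Against a running Livy server, submit it and then poll the session state:
#   curl -s -X POST -H "Content-Type: application/json" -d "$BODY" "$LIVY_URL/sessions"
#   curl -s "$LIVY_URL/sessions/0" | python -m json.tool
```

          If the session reaches the "idle" state via the API but not through Hue, raising Hue's session timeout is the next step.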

  40. Miles Y. 1 year ago

    Hi, can you update your document for the current Hue/CDH, and clarify that the Livy server has to be downloaded/installed separately? An example install would be even more helpful.

    Thanks!

  41. Miles Y. 1 year ago

    Cloudera Manager's Hue configuration section already has "Blacklist" defined as a Service-Wide property, with an empty default value. However, Hue is still blacklisting Spark and Impala (as defined during installation). Are we to ignore it and just follow your instructions above to manually add it to hue_safety_valve_server.ini?

    • Author
      Hue Team 1 year ago

      If using CM, you would need to unblacklist by adding something like this in the Hue safety valve:
      [desktop]
      app_blacklist=

      • Miles Y. 1 year ago

        Still not showing up in menu. Below is my config:

        root:/var/run/cloudera-scm-agent/process/1151-hue-HUE_SERVER$ grep blacklist hue.ini
        app_blacklist=spark,zookeeper,impala,search,indexer,sqoop,security

        root:/var/run/cloudera-scm-agent/process/1151-hue-HUE_SERVER$ cat hue_safety_valve.ini
        [desktop]
        app_blacklist=impala,search,indexer

        [notebook]
        show_notebooks=true

        [spark]
        server_url=http://{HOST}:8998/
        livy_server_host={HOST}
        livy_server_port=8998
        languages='[{"name": "Scala", "type": "scala"},{"name": "Python", "type": "python"},{"name": "Hive SQL", "type": "hive"},{"name": "Text", "type": "text"}]'
        livy_server_session_kind=yarn

        The CM config override and processing itself seem to work: fiddling with app_blacklist was able to make Impala and Search appear/disappear from the Hue top menu. And the Hue Quick Start wizard complains that the Livy server is not reachable if the livy_server_host and livy_server_port properties are not included.
        I manually verified that the Livy server (the new livy-server-0.2.0) is serving requests. The various Hue log files show no errors related to the Spark notebook ("POST /notebook/api/autocomplete/ HTTP/1.1", "POST /notebook/api/create_session HTTP/1.1"), only some non-critical errors from HBase and Oozie.

        I also have a hue_safety_valve_server.ini containing Hue DB and librdbms setup.

        So far I am staying away from directly changing the master copy at /opt/cloudera/parcels/CDH/etc/hue/conf.empty/hue.ini; I hope I don't have to go that far.

        Thanks!

        • Author
          Hue Team 1 year ago

          Weird, I literally just put
          [notebook]
          show_notebooks=true
          in the Hue service safety valve, restarted Hue and I could see the Notebook menu.

  42. Reetika 10 months ago

    I am not able to execute any script (e.g. Python, R, PySpark) through the Spark Notebook option, though Hive queries through the Spark Notebook do work. Any idea what the issue can be?
    Using:
    Hue version: 3.9.0
    Spark version: 1.6.1

    • Author
      Hue Team 10 months ago

      Do you have Livy up and running? livy.io

  43. Reetika 10 months ago

    One question: executing Hive queries through the Spark Notebook does not need Livy? Because I am able to run Hive queries through the Spark Notebook option.

    • Author
      Hue Team 10 months ago

      As long as you use the Hive snippet type, the queries will be directly sent to HiveServer2.
