Get started with Spark: deploy Spark Server and compute Pi from your Web Browser

Note: This post is deprecated as of Hue 3.8 / April 24th 2015. Hue now has a new Spark Notebook application.

 

Hue ships with a Spark Application that lets you submit Scala and Java Spark jobs directly from your Web browser.

The open source Spark Job Server is used for communicating with Spark (e.g. for listing, submitting Spark jobs, retrieving the results, creating contexts…).

Here are more details about how to run the Spark Job Server as a service. This setup is better suited for production than the development mode detailed in the previous post. We are using CDH 5.0 and Spark 0.9.

Package and Deploy the server

Most of the instructions are on the GitHub page.

We start by checking out the repository and building the project (note: if you are on Ubuntu and have encrypted your disk, you will need to build from /tmp). Then, from the Spark Job Server root directory:

mkdir bin/config
cp config/local.sh.template bin/config/settings.sh

Then set these two variables in settings.sh:

LOG_DIR=/var/log/job-server
SPARK_HOME=/usr/lib/spark  # or SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark

Then package everything:

bin/server_deploy.sh settings.sh
[info] - should return error message if classPath does not match
[info] - should error out if loading garbage jar
[info] - should error out if job validation fails
...
[info] Packaging /tmp/spark-jobserver/job-server/target/spark-job-server.jar ...
[info] Done packaging.
[success] Total time: 149 s, completed Jun 2, 2014 5:15:14 PM
/tmp/job-server /tmp/spark-jobserver
log4j-server.properties
server_start.sh
spark-job-server.jar
/tmp/spark-jobserver
Created distribution at /tmp/job-server/job-server.tar.gz

We have our main tarball /tmp/job-server/job-server.tar.gz, ready to be copied to a server.
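
Copying it over and unpacking it could look like this (hue@server.com is the example host used below, and spark-server is just a destination folder name matching the one visible in the process listing further down):

scp /tmp/job-server/job-server.tar.gz hue@server.com:
ssh hue@server.com 'mkdir -p spark-server && tar -xzf job-server.tar.gz -C spark-server'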

Note:
You could also automatically copy the files with server_deploy.sh.
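
In that case the deploy host and install directory also come from settings.sh. A sketch, assuming the variable names from config/local.sh.template (they may differ slightly between versions; the values are just examples):

# in bin/config/settings.sh, used by server_deploy.sh to push the build
DEPLOY_HOSTS="server.com"
APP_USER=hue
APP_GROUP=hue
INSTALL_DIR=/home/hue/spark-server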

Start the Spark Job Server

We then extract job-server.tar.gz on the server and copy over our application.conf. Make sure that ‘master’ points to the correct Spark Master URL.

scp /tmp/spark-jobserver/job-server/src/main/resources/application.conf hue@server.com:

Edit application.conf to point to the master:

# Settings for safe local mode development
spark {
  master = "spark://spark-host:7077"
  …
}
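
The same application.conf also defines the HTTP port the Job Server listens on. A sketch of the relevant block, assuming the 8090 default from the template:

spark {
  jobserver {
    # port used by the REST API (and by Hue to talk to the Job Server)
    port = 8090
  }
}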

Here is the content of our jobserver folder:

ls -l
total 25208
-rw-rw-r-- 1 ubuntu ubuntu     2015 Jun  9 23:05 demo.conf
-rw-rw-r-- 1 ubuntu ubuntu     2563 Jun 11 16:32 gc.out
-rw-rw-r-- 1 ubuntu ubuntu      588 Jun  9 23:05 log4j-server.properties
-rwxrwxr-x 1 ubuntu ubuntu     2020 Jun  9 23:05 server_start.sh
-rw-rw-r-- 1 ubuntu ubuntu      366 Jun  9 23:13 settings.sh
-rw-rw-r-- 1 ubuntu ubuntu 13673788 Jun  9 23:05 spark-job-server.jar

Note:
You can get the Spark master URL by looking at the Spark Master Web UI.

Also make sure that you see at least one Spark worker: "Workers: 1".

In the past, we had some problems (e.g. the Spark worker not starting) when trying to bind Spark to localhost. We fixed it by hardcoding the host in spark-env.sh:

sudo vim /etc/spark/conf/spark-env.sh

export STANDALONE_SPARK_MASTER_HOST=spark-host

Now just start the server and the process will run in the background:

./server_start.sh

You can check that it is alive by grepping for it:

ps -ef | grep 9999

ubuntu   28755     1  2 01:41 pts/0    00:00:11 java -cp /home/ubuntu/spark-server:/home/ubuntu/spark-server/spark-job-server.jar::/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/spark/conf:/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/spark/assembly/lib/*:/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/spark/examples/lib/*:/etc/hadoop/conf:/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hadoop/*:/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hadoop/../hadoop-hdfs/*:/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hadoop/../hadoop-yarn/*:/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hadoop/../hadoop-mapreduce/*:/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/spark/lib/scala-library.jar:/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/spark/lib/scala-compiler.jar:/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/spark/lib/jline.jar -XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCTimeStamps -Xloggc:/home/ubuntu/spark-server/gc.out -XX:MaxPermSize=512m -XX:+CMSClassUnloadingEnabled -Xmx5g -XX:MaxDirectMemorySize=512M -XX:+HeapDumpOnOutOfMemoryError -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=9999 -Dcom.sun.manage
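
You can also check that the REST API answers, assuming the default 8090 port from application.conf:

curl http://localhost:8090/jars
# returns the uploaded jars as JSON (empty right after installation)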

That’s it!

Run the Pi example!

The Spark Job Server comes with a few examples that you can build with one command. Let’s run the Pi job.
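
That one command is the sbt package target of the tests sub-project. From the Job Server root directory (the exact jar name under job-server-tests/target/ depends on the version):

sbt job-server-tests/package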

We open up the Spark App at http://hue:8888/spark, go to the application tab and upload the job-server-tests-0.3.x.jar.

Now, in the editor, specify the class to run, here spark.jobserver.LongPiJob, and execute it!

You will see the Spark application running on the Spark Master UI too. If you want a long-running application, create a context first, then assign this context to the application in the editor.

[Screenshot: the Pi job running on the Spark Master UI]
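
For reference, this is roughly what the Spark App does for you over the Job Server REST API. A sketch with curl, where the 8090 port, the "test" application name and the "pi-context" context name are just example values:

# upload the examples jar under the application name "test"
curl --data-binary @job-server-tests/target/job-server-tests-0.3.x.jar http://server.com:8090/jars/test

# optionally create a long-running context that several jobs can share
curl -d '' 'http://server.com:8090/contexts/pi-context?num-cpu-cores=2&memory-per-node=512m'

# submit the Pi job (append &context=pi-context to run it in that context), then fetch its status and result
curl -d '' 'http://server.com:8090/jobs?appName=test&classPath=spark.jobserver.LongPiJob'
curl 'http://server.com:8090/jobs/<job-id>'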

 

Sum-up

This is how we set up the Spark Server on demo.gethue.com/spark. As usual, feel free to comment on the hue-user list or @gethue!

Happy Sparking!

PS: We hope to see you in person at the Hue or Job Server talks at the upcoming Spark Summit!

12 Comments

  1. liyang 4 years ago

    I have started spark-jobserver, but when I use Hue Spark, I get:
    HTTP method not allowed, supported methods: GET (error 405)
    Spark Editor: The app won’t work without a running Livy Spark Server

    • Hue Team 4 years ago

      That’s because you are using the upstream master version of Hue, and the Spark Server has changed in the meantime. You can run the new server with ./build/env/bin/hue livy_server

      • obins 4 years ago

        Thanks Hue team. I had this error “The app won’t work without a running Livy Spark Server” on Hue 3.8, but it was sorted after starting livy_server (./build/env/bin/hue livy_server).

  2. Hue Team 4 years ago

    The current Spark Livy Server works with Spark 1.2, and soon 1.3.

    Which Hue are you using?

    • liyang 4 years ago

      My Hue is Hue-3.7.0 (branch-3.7.1), with Hadoop 2.2.0, scala-2.11.2 and spark-0.9.2-bin-hadoop2.
      I have now tested Spark and Scala and they are OK,
      but Hue does not work.

  3. Tarek Abouzeid 4 years ago

    I am working on a cluster of 3 nodes (1 master and 2 workers). I deployed the Job Server on the master, opened the Hue Spark UI, uploaded the Job Server example jar and tried to submit the LongPiJob. The job failed because the spark.jobserver classes could not be found on the workers, giving this error:

    Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 4, bdvm03.ejada.com): java.lang.NoClassDefFoundError: spark/jobserver/SparkJob at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:800) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) at java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at spark.jobserver.LongPiJob$$anonfun$estimatePi$1.apply(LongPiJob.scala:59) at spark.jobserver.LongPiJob$$anonfun$estimatePi$1.apply(LongPiJob.scala:57) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at scala.collection.TraversableOnce$class.reduceLeft(TraversableOnce.scala:172) at scala.collection.AbstractIterator.reduceLeft(Iterator.scala:1157) at org.apache.spark.rdd.RDD$$anonfun$18.apply(RDD.scala:853) at org.apache.spark.rdd.RDD$$anonfun$18.apply(RDD.scala:851) at org.apache.spark.SparkContext$$anonfun$29.apply(SparkContext.scala:1350) at org.apache.spark.SparkContext$$anonfun$29.apply(SparkContext.scala:1350) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.lang.ClassNotFoundException: spark.jobserver.SparkJob at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) … 28 more Driver stacktrace:

  4. sidahmed benkhaoua 4 years ago

    Could not find or load main class spark.jobserver.JobServer
