YARN Resource Manager High Availability (HA) in MR2

Similar to the JobTracker High Availability configuration for MR1, Hue supports (in master, and starting with Hue 3.7 or CDH 5.1) more than one Resource Manager, in case the active Resource Manager goes down.

Hue will automatically pick up the active Resource Manager, even after a failover. This is possible because:

  • When submitting Oozie jobs, the logical name of the Resource Manager is used instead of the hostname of the current Resource Manager
  • Job Browser will automatically look for the active Resource Manager API if needed

Here is an example configuration for the [[yarn_clusters]] section in hue.ini:

[hadoop]

  # Configuration for YARN (MR2)
  # ------------------------------------------------------------------------
  [[yarn_clusters]]

    [[[default]]]

      # Whether to submit jobs to this cluster
      submit_to=True

      # Name used when submitting jobs
      logical_name=ha-rm

      # URL of the ResourceManager API
      resourcemanager_api_url=http://gethue-1.com:8088

      # URL of the ProxyServer API
      proxy_api_url=http://gethue-1.com:8088

      # URL of the HistoryServer API
      history_server_api_url=http://gethue-1.com:19888

    [[[ha]]]
      # Enter the host on which you are running the failover Resource Manager
      resourcemanager_api_url=http://gethue-2.com:8088
      logical_name=ha-rm
      submit_to=True
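For reference, this hue.ini example assumes Resource Manager HA is already enabled on the Hadoop side. A matching yarn-site.xml fragment typically looks like the following sketch (the cluster id, rm ids, and hostnames are illustrative and must match your own cluster):

```xml
<!-- Illustrative yarn-site.xml fragment; ids and hostnames are placeholders -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>ha-rm</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>gethue-1.com</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>gethue-2.com</value>
</property>
```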

We hope that multi-Resource Manager support will make your life with Hadoop easier!

As usual, feel free to send feedback on the hue-user list or @gethue!

11 Comments

  1. John 3 years ago

    Would like to inform anybody who doesn’t know what the logical name is: it is the yarn.resourcemanager.cluster-id found in /etc/hadoop/conf/yarn-site.xml.

  2. Rahul 3 years ago

    I have the below settings in yarn-site.xml

    yarn.resourcemanager.ha.rm-ids=rm1,rm2
    yarn.resourcemanager.hostname=HOSTNAME-OF-RM1
    yarn.resourcemanager.hostname.rm1=HOSTNAME-OF-RM1
    yarn.resourcemanager.hostname.rm2=HOSTNAME-OF-RM2

    so in hue.ini I should have the below configuration, right?

    [[yarn_clusters]]
    [[[default]]]
    logical_name=rm1

    [[[HA]]]
    resourcemanager_api_url=HOSTNAME-OF-RM2
    logical_name=rm2
    submit_to=True

    Please let me know if these values are correct.

    • Hue Team 3 years ago

      Do not forget the other fields, e.g.:

      [[yarn_clusters]]
      [[[default]]]
      logical_name=rm1
      resourcemanager_api_url=http://HOSTNAME-OF-RM1:8088
      history_server_api_url=http://HOSTNAME-OF-HS:19888
      submit_to=True
      [[[HA]]]
      logical_name=rm2
      resourcemanager_api_url=http://HOSTNAME-OF-RM2:8088
      history_server_api_url=http://HOSTNAME-OF-HS:19888
      submit_to=True

  3. Jorge 2 years ago

    I’ve installed Hue 3.9 and I have a problem when I try to run Pig code in the Pig Editor or a Job Designer example, but it works if I run the “oozie job” command. Error in oozie.log:

    2015-10-08 04:36:07,155 WARN ActionStartXCommand:523 – SERVER[m1.novalocal] USER[bbvoop] GROUP[-] TOKEN[] APP[Shell] JOB[0000001-151008034921005-oozie-bbvo-W] ACTION[0000001-151008034921005-oozie-bbvo-W@Shell] Error starting action [Shell]. ErrorType [TRANSIENT], ErrorCode [JA009], Message [JA009: Cannot initialize Cluster. Please check your configuration
    for mapreduce.framework.name and the correspond server addresses.]
    org.apache.oozie.action.ActionExecutorException: JA009: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
    at org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:454)
    at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:434)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1130)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1299)
    at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:250)
    at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:64)
    at org.apache.oozie.command.XCommand.call(XCommand.java:286)
    at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
    Caused by: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
    at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
    at org.apache.hadoop.mapred.JobClient.init(JobClient.java:475)
    at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:454)
    at org.apache.oozie.service.HadoopAccessorService$3.run(HadoopAccessorService.java:452)
    at org.apache.oozie.service.HadoopAccessorService$3.run(HadoopAccessorService.java:450)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.oozie.service.HadoopAccessorService.createJobClient(HadoopAccessorService.java:450)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.createJobClient(JavaActionExecutor.java:1342)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1078)
    … 8 more

    The configuration in hue.ini:

    [[yarn_clusters]]

    [[[default]]]
    # Enter the host on which you are running the ResourceManager
    ## resourcemanager_host=m2.novalocal

    # The port where the ResourceManager IPC listens on
    ## resourcemanager_port=8032

    # Whether to submit jobs to this cluster
    submit_to=True

    # Resource Manager logical name (required for HA)
    logical_name=rm1

    # Change this if your YARN cluster is Kerberos-secured
    security_enabled=true

    # URL of the ResourceManager API
    resourcemanager_api_url=http://m2.novalocal:8088

    # URL of the ProxyServer API
    proxy_api_url=http://m2.novalocal:8088

    # URL of the HistoryServer API
    history_server_api_url=http://m3.novalocal:19888

    # In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
    # have to be verified against certificate authority
    ## ssl_cert_ca_verify=True

    # HA support by specifying multiple clusters
    # e.g.

    [[[ha]]]
    # Resource Manager logical name (required for HA)
    logical_name=rm2
    # URL of the ResourceManager API
    resourcemanager_api_url=http://m3.novalocal:8088

    # URL of the HistoryServer API
    history_server_api_url=http://m3.novalocal:19888

    # URL of the ProxyServer API
    proxy_api_url=http://m2.novalocal:8088

    # Change this if your YARN cluster is Kerberos-secured
    security_enabled=true

    # Whether to submit jobs to this cluster
    submit_to=True

    Other Hue applications (Hive Editor, Spark notebooks, …) work fine. Any idea?

  4. dale 2 years ago

    For an HA cluster, should the “[[[default]]] resourcemanager_host” & “[[[default]]] resourcemanager_port” be commented out in the hue.ini file?

    • dale 2 years ago

      In fact it may just be easier for me to show my .ini config for YARN as a whole. I’ve encountered a lot of issues since my Active/Standby RM switched over, and queries are no longer running.

      [[yarn_clusters]]

      [[[default]]]
      # Enter the host on which you are running the ResourceManager
      resourcemanager_host=x.y.z.224

      # The port where the ResourceManager IPC listens on
      resourcemanager_port=8032

      # Whether to submit jobs to this cluster
      submit_to=True

      # Resource Manager logical name (required for HA)
      logical_name=rm1

      # Change this if your YARN cluster is Kerberos-secured
      ## security_enabled=false

      # URL of the ResourceManager API
      resourcemanager_api_url=http://x.y.z.224:8088

      # URL of the ProxyServer API
      ## proxy_api_url=http://x.y.z.224:8088

      # URL of the HistoryServer API
      history_server_api_url=http://x.y.z.223:19888

      # In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
      # have to be verified against certificate authority
      ## ssl_cert_ca_verify=True

      # HA support by specifying multiple clusters
      # e.g.

      [[[ha]]]
      # Resource Manager logical name (required for HA)
      logical_name=rm2

      resourcemanager_api_url=http://x.y.z.223:8088

      history_server_api_url=http://x.y.z.223:19888

      submit_to=True

      Thanks.

    • Hue Team 2 years ago

      Yes, if you have 2 RMs, you need to specify both hosts/ports in the config by un-commenting them (and filling in their info, so that Hue knows where to point).
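      A minimal sketch of the resulting hue.ini shape, with placeholder hostnames (the logical_name values follow the rm1/rm2 naming used in the config above; adapt every value to your own cluster):

```ini
[[yarn_clusters]]

  [[[default]]]
    resourcemanager_host=RM1-HOST
    resourcemanager_port=8032
    logical_name=rm1
    resourcemanager_api_url=http://RM1-HOST:8088
    proxy_api_url=http://RM1-HOST:8088
    history_server_api_url=http://HISTORY-HOST:19888
    submit_to=True

  [[[ha]]]
    resourcemanager_host=RM2-HOST
    resourcemanager_port=8032
    logical_name=rm2
    resourcemanager_api_url=http://RM2-HOST:8088
    proxy_api_url=http://RM2-HOST:8088
    history_server_api_url=http://HISTORY-HOST:19888
    submit_to=True
```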

  5. Marek 1 year ago

    Something is not correct with this configuration. I have HUE 3.8.1.

    I have the following setup and I can see in tcpdump that hue is always trying to get to default node on port 8088, it receives 307 redirect and then it is following this redirect, but next call is made again to default node even if it is not an active rm.

    Sample redirect follow:

    09:18:09.480630 IP hue.48928 > rm1.radan-http:
    GET /ws/v1/cluster/apps?user=maslanm&finalStatus=UNDEFINED HTTP/1.1
    Host: rm1:8088
    Connection: keep-alive
    Accept-Encoding: gzip, deflate
    Accept: application/json
    User-Agent: python-requests/2.6.0 CPython/2.6.6 Linux/2.6.32-431.20.3.el6.x86_64

    09:18:09.482260 IP rm1.radan-http > hue.48928:

    HTTP/1.1 307 TEMPORARY_REDIRECT
    Cache-Control: no-cache
    Expires: Wed, 10 Aug 2016 08:18:09 GMT
    Date: Wed, 10 Aug 2016 08:18:09 GMT
    Pragma: no-cache
    Expires: Wed, 10 Aug 2016 08:18:09 GMT
    Date: Wed, 10 Aug 2016 08:18:09 GMT
    Pragma: no-cache
    Content-Type: text/plain; charset=UTF-8
    Location: http://rm2:8088/ws/v1/cluster/apps
    Content-Length: 98
    Server: Jetty(6.1.26.hwx)

    09:18:09.485800 IP hue.48928 > rm2.radan-http:
    GET /ws/v1/cluster/apps HTTP/1.1
    Host: rm2:8088
    Connection: keep-alive
    Accept-Encoding: gzip, deflate
    Accept: application/json
    User-Agent: python-requests/2.6.0 CPython/2.6.6 Linux/2.6.32-431.20.3.el6.x86_64

    and we are losing the GET parameters from the URL – so in this case Hue is showing all the apps instead of the ones specific to the user in the query.

    My current hue.ini part:
    [[yarn_clusters]]
    [[[default]]]
    logical_name=rm1
    resourcemanager_api_url=http://rm1:8088
    history_server_api_url=http://rm1:19888
    proxy_api_url=http://rm1:8088
    submit_to=True
    resourcemanager_host=rm1
    resourcemanager_port=8032
    security_enabled=true

    [[[ha]]]
    logical_name=rm2
    resourcemanager_api_url=http://rm2:8088
    proxy_api_url=http://rm2:8088
    submit_to=True
    resourcemanager_host=rm2
    resourcemanager_port=8032
    security_enabled=true

    Can you guys also check “logical_name=ha-rm” in the post? It is set to the same value in both sections, while according to the comments above it should have 2 different values.

    • Hue Team (Author) 1 year ago

      Are you using Kerberos? (It seems so, according to the config.) There were a bunch of recent improvements regarding RM HA support in Hue 3.11 and CDH 5.8.

      About logical_name, I believe it should be the same name.

      • Marek 1 year ago

        Yes, I’m using Kerberos. We will be updating Hue to the newest stable version in the upcoming weeks and I can recheck it then.

        Just to clarify logical_name:
        in the first comment John wrote that logical_name should be the same as “yarn.resourcemanager.cluster-id”; in reply, the Hue Team suggested “yarn.resourcemanager.ha.rm-ids”. In all the examples above, people were using the “yarn.resourcemanager.ha.rm-ids” values for logical_name – different for the [[[default]]] and [[[ha]]] sections. What is the correct value for the logical_name property in both sections?
