Export and import your Oozie workflows

August 7th 2015 update: this post is deprecated as of Hue 3.9. See http://gethue.com/exporting-and-importing-oozie-workflows/ instead.

 

Until Hue 4 and HUE-1660, there is no handy way to import and export your Oozie workflows, but here is a manual workaround that has been possible since Hue 3.8/CDH5.4 and its new Oozie Editor.

The previous methods were very error-prone, as they required inserting data into multiple tables at the same time. Now there is only one record per workflow.

 

Export all workflows

./build/env/bin/hue dumpdata desktop.Document2 --indent 2 --natural > data.json

 

Export specific workflows

20000013 is the id you can see in the URL of the workflow.

./build/env/bin/hue dumpdata desktop.Document2 --indent 2 --pks=20000013 --natural > data.json

You can specify more than one id:

--pks=20000013,20000014,20000015

 

Load the workflows

Then load the exported file into the target Hue:

./build/env/bin/hue loaddata data.json
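
Before loading, it can help to check whether the fixture is self-contained. The following is a minimal sketch, not part of Hue: it assumes the layout produced by dumpdata with --natural (a JSON list of records whose "dependencies" entries are [uuid, version, is_history] triples, as in the snippets quoted in the comments), and the function names are hypothetical. It lists dependencies whose UUID is not in the file. Such a dependency may still exist in the target database, so treat the result as a hint only. Run it with a recent Python on any machine; it does not need the Hue environment.

```python
import json


def missing_dependencies(records):
    """Map each workflow name to the dependency UUIDs absent from the fixture."""
    present = set(r["fields"]["uuid"] for r in records)
    missing = {}
    for record in records:
        fields = record["fields"]
        # Assumption: each dependency is a natural key [uuid, version, is_history].
        absent = [dep[0] for dep in fields.get("dependencies", [])
                  if dep[0] not in present]
        if absent:
            missing[fields.get("name", fields["uuid"])] = absent
    return missing


def check_fixture(path="data.json"):
    """Load a dumpdata fixture and report dependencies it does not contain."""
    with open(path) as f:
        return missing_dependencies(json.load(f))
```

Calling check_fixture("data.json") returns an empty dict when every dependency is included in the export.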

 

Refresh the documents

Until we hit Hue 4, this step is required in order to make the imported documents appear:

./build/env/bin/hue sync_documents

 

And that’s it: the workflows with the same IDs will be overwritten by the imported ones!

Note:

If a document with the same id already exists in the database and you do not want to overwrite it, set its pk to null in data.json and it will be inserted as a new document.

vim data.json

then change

"pk": 16,

to

"pk": null,

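For more than a couple of records, the same edit can be scripted. This is only a sketch under assumptions: data.json is the list of records written by dumpdata, and the function names and the data_new.json output name are made up for illustration.

```python
import json


def reset_pks(records, pks_to_reset=None):
    """Set "pk" to None (JSON null) on the selected records, or on all of them."""
    for record in records:
        if pks_to_reset is None or record.get("pk") in pks_to_reset:
            record["pk"] = None
    return records


def rewrite_fixture(src="data.json", dst="data_new.json", pks_to_reset=None):
    """Read a fixture, null the chosen pks, and write the result to a new file."""
    with open(src) as f:
        records = json.load(f)
    with open(dst, "w") as f:
        json.dump(reset_pks(records, pks_to_reset), f, indent=2)
```

For example, rewrite_fixture(pks_to_reset={16}) writes a copy in which only the record with pk 16 has its pk set to null. The copy can then be passed to loaddata instead of data.json.
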
 

Note:

If using Cloudera Manager (CM), export this variable in order to point to the correct database:

HUE_CONF_DIR=/var/run/cloudera-scm-agent/process/<id>-hue-HUE_SERVER
echo $HUE_CONF_DIR
export HUE_CONF_DIR

Where <id> is the most recent ID in that process directory for hue-HUE_SERVER.

Or, even quicker:

export HUE_CONF_DIR="/var/run/cloudera-scm-agent/process/`ls -alrt /var/run/cloudera-scm-agent/process | grep HUE | tail -1 | awk '{print $9}'`"
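
The same lookup can be sketched in Python. This is an illustration, not a Hue utility, and the function name is hypothetical. It assumes the process directories are named like 851-hue-HUE_SERVER and, like the one-liner, picks the most recently modified one. Unlike the one-liner, which matches any entry containing HUE, it only matches names containing hue-HUE_SERVER.

```python
import os


def latest_hue_conf_dir(process_dir="/var/run/cloudera-scm-agent/process"):
    """Return the most recently modified hue-HUE_SERVER directory, or None."""
    candidates = []
    for name in os.listdir(process_dir):
        path = os.path.join(process_dir, name)
        # Assumption: Hue server process directories contain "hue-HUE_SERVER".
        if "hue-HUE_SERVER" in name and os.path.isdir(path):
            candidates.append(path)
    if not candidates:
        return None
    # Newest modification time wins, like `ls -rt | tail -1`.
    return max(candidates, key=os.path.getmtime)
```

The returned path is the value to export as HUE_CONF_DIR before running the hue commands.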

 

Have any questions? Feel free to contact us on hue-user or @gethue!

22 Comments

  1. Miles Y. 3 years ago

    Tried this procedure to migrate workflows from one CDH 5.4.4 (Hue 3.7.0) cluster to a new CDH 5.6 (Hue 3.9) one, but loaddata fails with the following:

    [root ~]# $CLOUDERA_HOME/lib/hue/build/env/bin/hue loaddata ETA_RackRpt_Wf.json
    Traceback (most recent call last):
      File "/opt/cloudera/parcels/CDH/lib/hue/build/env/bin/hue", line 12, in <module>
        load_entry_point('desktop==3.9.0', 'console_scripts', 'hue')()
      File "/opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/hue/desktop/core/src/desktop/manage_entry.py", line 57, in entry
        execute_from_command_line(sys.argv)
      File "/opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/core/management/__init__.py", line 399, in execute_from_command_line
        utility.execute()
      File "/opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/core/management/__init__.py", line 392, in execute
        self.fetch_command(subcommand).run_from_argv(self.argv)
      File "/opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/core/management/base.py", line 242, in run_from_argv
        self.execute(*args, **options.__dict__)
      File "/opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/core/management/base.py", line 285, in execute
        output = self.handle(*args, **options)
      File "/opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/core/management/commands/loaddata.py", line 55, in handle
        self.loaddata(fixture_labels)
      File "/opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/core/management/commands/loaddata.py", line 84, in loaddata
        self.load_label(fixture_label)
      File "/opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/core/management/commands/loaddata.py", line 134, in load_label
        for obj in objects:
      File "/opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/core/serializers/json.py", line 76, in Deserializer
        six.reraise(DeserializationError, DeserializationError(e), sys.exc_info()[2])
      File "/opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/core/serializers/json.py", line 70, in Deserializer
        for obj in PythonDeserializer(objects, **options):
      File "/opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/core/serializers/python.py", line 117, in Deserializer
        m2m_data[field.name] = [m2m_convert(pk) for pk in field_value]
      File "/opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/core/serializers/python.py", line 112, in m2m_convert
        return field.rel.to._default_manager.db_manager(db).get_by_natural_key(*value).pk
      File "/opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/hue/desktop/core/src/desktop/models.py", line 731, in get_by_natural_key
        return self.get(uuid=uuid, version=version, is_history=is_history)
      File "/opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/db/models/manager.py", line 151, in get
        return self.get_queryset().get(*args, **kwargs)
      File "/opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/db/models/query.py", line 310, in get
        self.model._meta.object_name)
    django.core.serializers.base.DeserializationError: Problem installing fixture 'ETA_RackRpt_Wf.json': Document2 matching query does not exist.

    Is this due to incompatible object format between the two versions? Any workaround? (except upgrading the CDH 5.4.4 cluster)

    • Hue Team 3 years ago

      Does this workflow have a subworkflow?

      It seems that one of its dependencies does not exist in the target Hue.

      • Miles Y. 3 years ago

        No, it’s the bi-directional dependency set when a workflow is used in a coordinator.

        Standalone workflows without coordinators import with no issue. I manually removed the values from the “dependencies” field in all workflow JSON files, and was able to import them via the Hue Workflow Editor UI. On the other hand, command-line import via loaddata doesn’t work. The Hue UI cannot see them, apparently because a separate reference needs to be added to the desktop_document table.

        Please clarify and update your documentation accordingly.

        Thanks!

  2. Rich vh. 3 years ago

    I am having the same problem as above with workflows containing subworkflows (Hue 3.9). The workflow JSON dump file seems to contain two-way dependencies: the master workflow references the subworkflow, and the subworkflow references the master workflow back, as shown in the JSON snippet below. Attempting to load either of these independently fails because the process expects the other workflow to already exist.

    {
      "pk": 9,
      "model": "desktop.document2",
      "fields": {
        "uuid": "eba3df3d-db09-d148-d932-390fda33c39c",
        "extra": "",
        "type": "oozie-workflow2",
        "description": "Diff Ratio Analytics Calculation Engine",
        "tags": [],
        "is_history": false,
        "last_modified": "2016-03-07T15:36:07.279",
        "version": 1,
        "owner": [
          "admin"
        ],
        "dependencies": [
          [
            "89c8f74e-bd6b-ea03-9366-7a3529539549",
            1,
            false
          ]
        ],
        "data": "{}",
        "name": "Diff Ratio Analytics – Live"
      }
    },
    {
      "pk": 19,
      "model": "desktop.document2",
      "fields": {
        "uuid": "89c8f74e-bd6b-ea03-9366-7a3529539549",
        "extra": "",
        "type": "oozie-workflow2",
        "description": "Master workflow for running weekly workflows",
        "tags": [],
        "is_history": false,
        "last_modified": "2016-03-29T10:55:44.753",
        "version": 1,
        "owner": [
          "admin"
        ],
        "dependencies": [
          [
            "eba3df3d-db09-d148-d932-390fda33c39c",
            1,
            false
          ],
          [
            "89efbbc6-f906-d83c-964a-0b49f8b7ebbc",
            1,
            false
          ]
        ],
        "data": "{}",
        "name": "Weekly Loads – Live"
      }
    }

  3. Raj 3 years ago

    I get the following error when exporting an Oozie workflow from CDH 5.1 to CDH 5.9:

        return self.config.get_value(data, present=present, prefix=self.prefix, coerce_type=True)
      File "/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/hue/desktop/core/src/desktop/lib/conf.py", line 263, in get_value
        return self._coerce_type(raw_val, prefix)
      File "/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/hue/desktop/core/src/desktop/lib/conf.py", line 283, in _coerce_type
        return self.type(raw)
      File "/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/hue/desktop/core/src/desktop/lib/conf.py", line 712, in coerce_password_from_script
        raise subprocess.CalledProcessError(p.returncode, script)
    subprocess.CalledProcessError: Command '/var/run/cloudera-scm-agent/process/851-hue-HUE_SERVER/altscript.sh sec-8-password' returned non-zero exit status 1

  4. Miles Y. 3 years ago

    The v3.9 shipped with CDH 5.7.1 seems to have introduced another bug: export/import workflow between two CDH 5.7.1 clusters failed with error “Problem installing fixture ‘***.json’: Document2 matching query does not exist.” Same behavior through either web UI or command line.

    Experimenting revealed the probable cause: new composite field “parent_directory” introduced since CDH 5.6:

    {
      "pk": 50112,
      "model": "desktop.document2",
      "fields": {
        "uuid": "bfc4ca31-9497-40e5-a322-cc57550c4c8e",
        "extra": "",
        "type": "oozie-workflow2",
        "description": "…..",
        "is_history": false,
        "parent_directory": [
          "d19cd8da-56f4-415b-bb87-61e9e16d6c9c",
          1,
          false
        ],
        "last_modified": "2016-07-29T14:13:20.889",
        "version": 1,

      }
    }

    Checking desktop_document2 schema shows that it has ‘parent_directory_id’, but not ‘parent_directory’. Manually deleting ‘parent_directory’ allowed import to succeed. However, this design seems similar to ‘owner’ field, which also doesn’t exist in the table.

    Any suggestion?

    • Author
      Hue Team 3 years ago

      CDH 5.8 has a bunch of improvements for exporting and importing documents.

  5. Suyog 2 years ago

    Hi,

    I want to export oozie workflows/jobs from hue 2.6 to hue 3.9.

    ./hue dumpdata desktop.Document2 --indent 2 --pks=298 --natural > data.json gives the error below:

    ./hue: error: no such option: --pks

    Please guide me through the process.

    • Author
      Hue Team 2 years ago

      Have you tried without --pks?

      • Suyog 2 years ago

        Yes, but it was giving me an error:

        ./hue: error: no such option: -i.

        So I tried without -indent as well…it worked.

        Thanks…!!

        But am I doing anything wrong to export specific workflows ??

        • Suyog 2 years ago

          And while importing in Hue 3.9, I am getting below error:

          /usr/local/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/core/handlers/base.py 112 get_response

          /usr/local/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/db/transaction.py 371 inner

          /usr/local/hue/desktop/core/src/desktop/api2.py 136 import_documents

          • Suyog 2 years ago

            Sorry guys,

            missed the ‘hue loaddata’ step.

            Data is imported properly and I can see the workflows in newly installed HUE.

            Thanks for the help…!!

  6. Amandeep 1 year ago

    Hi,

    I was able to dump data from Hue 3.7 and import the data in Hue 4.1 but it says

    This workflow was imported from an old Hue version, save it to create a copy in the new format or open it in the old editor.

    When I save it to create a copy in the new format, I get the exception below, so the workflows are not saved in the new format. Saving in the old format deletes the arguments and file paths in the Shell command.

    [17/Nov/2017 04:12:22 -0800] access INFO 172.31.18.115 hadoop - "POST /oozie/editor/workflow/save/ HTTP/1.1"
    [17/Nov/2017 04:12:22 -0800] middleware INFO Processing exception: 'NoneType' object has no attribute 'do_as_user': Traceback (most recent call last):
      File "/usr/lib/hue/build/env/lib/python2.7/site-packages/Django-1.6.10-py2.7.egg/django/core/handlers/base.py", line 112, in get_response
        response = wrapped_callback(request, *callback_args, **callback_kwargs)
      File "/usr/lib/hue/build/env/lib/python2.7/site-packages/Django-1.6.10-py2.7.egg/django/db/transaction.py", line 371, in inner
        return func(*args, **kwargs)
      File "/usr/lib/hue/apps/oozie/src/oozie/decorators.py", line 113, in decorate
        return view_func(request, *args, **kwargs)
      File "/usr/lib/hue/apps/oozie/src/oozie/decorators.py", line 103, in decorate
        return view_func(request, *args, **kwargs)
      File "/usr/lib/hue/apps/oozie/src/oozie/views/editor2.py", line 206, in save_workflow
        workflow_doc = _save_workflow(workflow, layout, request.user)
      File "/usr/lib/hue/apps/oozie/src/oozie/models2.py", line 3659, in _save_workflow
        _import_workspace(fs, user, workflow_instance)
      File "/usr/lib/hue/apps/oozie/src/oozie/models2.py", line 3635, in _import_workspace
        job.check_workspace(fs, user)
      File "/usr/lib/hue/apps/oozie/src/oozie/models2.py", line 94, in check_workspace
        create_directories(fs, [REMOTE_SAMPLE_DIR.get()])
      File "/usr/lib/hue/desktop/libs/liboozie/src/liboozie/submission2.py", line 512, in create_directories
        if not fs.do_as_user(fs.DEFAULT_USER, fs.exists, directory):
    AttributeError: 'NoneType' object has no attribute 'do_as_user'
