Object/File Storage public REST API

Published on 13 August 2021 in Version 4.11 / Development / Browsing - 5 minutes read

Leveraging more REST API into your own project: list, upload, download… data files.

This post comes with a live tutorial of the file listing API.

The Hue SQL Editor project has been evolving for more than 10 years and allows you to query any Database or Data Warehouse.

Like previously described in the SQL Editor API post, all the end user functionalities and under the cover grunt work of integration can be simply reused programmatically (freeing up time to let you focus on the data work itself instead).

The main use cases for the File API is to upload files and create an SQL Table on top of them or retrieve those pesky URIs: Filebrowser Copy Path and Open In Importer Options

The API leverages the standard credentials of your users (SSO via LDAP, SAML…) and is the same as if they were interacting via the Web UI directly. In bonus, it is Cloud agnostic so end users don't need to learn about the intricacies of each provided and use an interface they are already familiar with.

Object Storage API

The simplest operation is to list the content of your buckets or directories.

Start by asking for an API access token (also known as JWT):

curl -X POST https://demo.gethue.com/api/token/auth -d 'username=demo&password=demo'

{"refresh":"eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ0b2tlbl90eXBlIjoicmVmcmVzaCIsImV4cCI6MTYyOTQ3MTE0MiwianRpIjoiYjNkMDUzN2I1OGU5NDNlZGE0OTJiYzVmOTkzMDEwOTEiLCJ1c2VyX2lkIjoyfQ._MXo09PzisvqY7-1NMVIaLiUCVksYx2ZA5v_PWTk0TY","access":"eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ0b2tlbl90eXBlIjoiYWNjZXNzIiwiZXhwIjoxNjI4OTUyNzQyLCJqdGkiOiJkYTEzZjI2OWY2N2M0MTNiODNiNGYwNzY1ZDA3NzdmMCIsInVzZXJfaWQiOjJ9.47gnDdIwVSo_cULXU856WUgW8FW7UHXMg7FH-dDpoRc"}

Then provide the access value in each following calls. In your case, update the examples below with your own:

Authorization: Bearer <Your access value here>"

Here is how to list the content of a path, here a S3 bucket s3a://demo-gethue:

curl -X GET https://demo.gethue.com/api/storage/view=s3a://demo-gethue -H "Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ0b2tlbl90eXBlIjoiYWNjZXNzIiwiZXhwIjoxNjI4OTUyNzQyLCJqdGkiOiJkYTEzZjI2OWY2N2M0MTNiODNiNGYwNzY1ZDA3NzdmMCIsInVzZXJfaWQiOjJ9.47gnDdIwVSo_cULXU856WUgW8FW7UHXMg7FH-dDpoRc"

{
  "path": "s3a://demo-gethue",
  "breadcrumbs": [
    {
      "url": "s3a%3A%2F%2F",
      "label": "s3a://"
    },
    {
      "url": "s3a%3A%2F%2Fdemo-gethue",
      "label": "demo-gethue"
    }
  ],
  "current_request_path": "/filebrowser/view=s3a%3A%2F%2Fdemo-gethue",
  "is_trash_enabled": false,
  "files": [
    {
      "path": "s3a://",
      "name": "..",
      "stats": {
        "path": "s3a://",
        "size": 0,
        "atime": null,
        "mtime": null,
        "mode": 16895,
        "user": "",
        "group": "",
        "aclBit": false
      },
      "mtime": "",
      "humansize": "0 bytes",
      "type": "dir",
      "rwx": "drwxrwxrwx",
      "mode": "40777",
      "url": "/filebrowser/view=s3a%3A%2F%2F",
      "is_sentry_managed": false
    },
    {
      "path": "s3a://demo-gethue",
      "name": ".",
      "stats": {
        "path": "s3a://demo-gethue",
        "size": 0,
        "atime": 1628866612,
        "mtime": 1628866612,
        "mode": 16895,
        "user": "",
        "group": "",
        "aclBit": false
      },
      "mtime": "August 13, 2021 02:56 PM",
      "humansize": "0 bytes",
      "type": "dir",
      "rwx": "drwxrwxrwx",
      "mode": "40777",
      "url": "/filebrowser/view=s3a%3A%2F%2Fdemo-gethue",
      "is_sentry_managed": false
    },
    {
      "path": "s3a://demo-gethue/data",
      "name": "data",
      "stats": {
        "path": "s3a://demo-gethue/data/",
        "size": 0,
        "atime": null,
        "mtime": null,
        "mode": 16895,
        "user": "",
        "group": "",
        "aclBit": false
      },
      "mtime": "",
      "humansize": "0 bytes",
      "type": "dir",
      "rwx": "drwxrwxrwx",
      "mode": "40777",
      "url": "/filebrowser/view=s3a%3A%2F%2Fdemo-gethue%2Fdata",
      "is_sentry_managed": false
    }
  ],
  "page": {
    "number": 1,
    "num_pages": 1,
    "previous_page_number": 0,
    "next_page_number": 0,
    "start_index": 1,
    "end_index": 1,
    "total_count": 1
  },
  "pagesize": 30,
  "home_directory": null,
  "descending": null,
  "cwd_set": true,
  "file_filter": "any",
  "current_dir_path": "s3a://demo-gethue",
  "is_fs_superuser": false,
  "groups": [],
  "users": [],
  "superuser": null,
  "supergroup": null,
  "is_sentry_managed": false,
  "apps": [
    "filebrowser",
    "metastore",
    "useradmin",
    "indexer",
    "notebook"
  ],
  "show_download_button": true,
  "show_upload_button": true,
  "is_embeddable": false,
  "s3_listing_not_allowed": ""
}

Some of the parameters:

  • pagesize=45
  • pagenum=1
  • filter=
  • sortby=name
  • descending=false

e.g. pagesize=45&pagenum=1&filter=&sortby=name&descending=false

Then peek at the data of the s3a://demo-gethue/data/web_logs/index_data.csv file:

curl -X GET https://demo.gethue.com/api/storage/view=s3a://demo-gethue/data/web_logs/index_data.csv -H "Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ0b2tlbl90eXBlIjoiYWNjZXNzIiwiZXhwIjoxNjI4OTUyNzQyLCJqdGkiOiJkYTEzZjI2OWY2N2M0MTNiODNiNGYwNzY1ZDA3NzdmMCIsInVzZXJfaWQiOjJ9.47gnDdIwVSo_cULXU856WUgW8FW7UHXMg7FH-dDpoRc"

{
  "show_download_button": true,
  "is_embeddable": false,
  "editable": false,
  "mtime": "October 31, 2016 03:34 PM",
  "rwx": "-rw-rw-rw-",
  "path": "s3a://demo-gethue/data/web_logs/index_data.csv",
  "stats": {
  "size": 6199593,
  "aclBit": false,
  ...............
  "contents": "code,protocol,request,app,user_agent_major,region_code,country_code,id,city,subapp,latitude,method,client_ip,  user_agent_family,bytes,referer,country_name,extension,url,os_major,longitude,device_family,record,user_agent,time,os_family,country_code3
    200,HTTP/1.1,GET /metastore/table/default/sample_07 HTTP/1.1,metastore,,00,SG,8836e6ce-9a21-449f-a372-9e57641389b3,Singapore,table,1.2931000000000097,GET,128.199.234.236,Other,1041,-,Singapore,,/metastore/table/default/sample_07,,103.85579999999999,Other,"demo.gethue.com:80 128.199.234.236 - - [04/May/2014:06:35:49 +0000] ""GET /metastore/table/default/sample_07 HTTP/1.1"" 200 1041 ""-"" ""Mozilla/5.0 (compatible; phpservermon/3.0.1; +http://www.phpservermonitor.org)""
    ",Mozilla/5.0 (compatible; phpservermon/3.0.1; +http://www.phpservermonitor.org),2014-05-04T06:35:49Z,Other,SGP
    200,HTTP/1.1,GET /metastore/table/default/sample_07 HTTP/1.1,metastore,,00,SG,6ddf6e38-7b83-423c-8873-39842dca2dbb,Singapore,table,1.2931000000000097,GET,128.199.234.236,Other,1041,-,Singapore,,/metastore/table/default/sample_07,,103.85579999999999,Other,"demo.gethue.com:80 128.199.234.236 - - [04/May/2014:06:35:50 +0000] ""GET /metastore/table/default/sample_07 HTTP/1.1"" 200 1041 ""-"" ""Mozilla/5.0 (compatible; phpservermon/3.0.1; +http://www.phpservermonitor.org)""
    ",Mozilla/5.0 (compatible; phpservermon/3.0.1; +http://www.phpservermonitor.org),2014-05-04T06:35:50Z,Other,SGP
  ...............
}

Some of the parameters:

  • offset=0
  • length=204800
  • compression=none
  • mode=text

e.g. ?offset=0&length=204800&compression=none&mode=text

And then decide to download it:

curl -X GET https://demo.gethue.com/api/storage/download=s3a://demo-gethue/data/web_logs/index_data.csv -H "Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ0b2tlbl90eXBlIjoiYWNjZXNzIiwiZXhwIjoxNjI4OTUyNzQyLCJqdGkiOiJkYTEzZjI2OWY2N2M0MTNiODNiNGYwNzY1ZDA3NzdmMCIsInVzZXJfaWQiOjJ9.47gnDdIwVSo_cULXU856WUgW8FW7UHXMg7FH-dDpoRc"
  • download: file path from any configured remote file system

It is also possible to upload your data directly (if you have the proper write permissions in the remote destination folder):

curl -X POST https://demo.gethue.com/api/storage/upload/file?dest=s3a://demo-gethue/web_log_data/ --form [email protected]
  • dest: folder path will be created if it does not exist yet
  • hdfs_file: relative or absolute path to a file. It should be read more like local_file, it is not related to HDFS

When the data is stored in the cloud, it becomes easy to create a SQL table so that it can be queried. One way it to open up the File Browser and copy the path of the data into a CREATE TABLE or go via the Create table wizard which will do all the work for you.

Note that small data files don't even need to go via the cloud storage and can be directly uploaded via drag & drop in the Web interface or Importer API. Something that will be demoed next so stay tuned!

Importer direct upload steps gif

Proper security

It is also a good timing. The file listing (for HDFS, the Hadoop file system) has be present since day one. Later on AWS S3, Azure Storage, Google Cloud Storage have been added but were lacking fine grained security (i.e. all the users were using the same credentials).

This is not true anymore as recently the shared signed URL technology of these cloud storages is being leveraged under the hood to have each user perform file operations under their own distinct credentials. This allows true self service instead of restricting data uploads to only admin. Users can upload their own files and analyze them without contacting anybody else. Another bottleneck removed!

If interested in more technical details, read more about AWS Shared Signature or Azure Signed URLs.

Open the Create Table Wizard or copy a file URI
Open the Create Table Wizard or copy a file URI

Now there is no excuses to not be data driven via self service ;)

Using GCP or other storage? Let us know!

And In case you missed it, the coolest API is the Execute a SQL query, play with it!

Onwards!

Romain from the Hue Team


comments powered by Disqus

More recent stories

03 May 2023
Discover the power of Apache Ozone using the Hue File Browser
Read More
23 January 2023
Hue 4.11 and its new dialects and features are out!
Read More